A comparative study of age-related stereotypes between OpenAI's GPT-4o and DeepSeek's chat model
W. Hong, M. Choi.
Abstract
PURPOSE: Generative artificial intelligence, particularly large language models (LLMs), has become part of daily life and is rapidly transforming how people communicate, access information, and perceive others. [1] Age-related biases in LLMs remain understudied, although Hong and Choi's recent semantic analysis of age-related stereotypes in OpenAI's GPT-4 model revealed subtle age stereotypes even in predominantly positive language. [2] Because LLMs are developed in different contexts, this study aims to compare GPT-4o and DeepSeek with respect to age-related stereotypes within the framework of the Stereotype Content Model. [3]
METHOD: Data were collected in November 2025 through the APIs of both chat models using the prompt "Describe the personality of a [AGE]-year-old person," with AGE varied from 10 to 90 in 10-year increments. The prompt did not directly request biased content, yet both models still generated coherent responses. Parameters for both LLMs were set to match their web-based interfaces. As in the previous work, the analysis was guided by the Stereotype Content Model, which describes how social groups are perceived along two dimensions: warmth (friendliness and trustworthiness) and competence (perceived ability and effectiveness). [2] After sentence-level filtering, stereotype content was quantified using sentence embeddings and a Partial Least Squares (PLS) regression model trained on validated seed adjectives related to warmth and competence. This approach yielded continuous warmth and competence scores, ranging from -1 to 1, for each sentence. The scored descriptions were then compared across age groups and between models.
RESULTS AND DISCUSSION: Both LLMs described individuals aged 60 and older as warmer but less competent than younger age groups. Warmth and competence distributions differed significantly across age groups and between the two models.
Notably, DeepSeek's scores showed greater variability than GPT-4o's. An analysis of stereotype-related word use indicated that GPT-4o characterized adults over 60 as "accepting," "communal," and "tolerant," yet comparatively low in competence, particularly in assertiveness, using descriptors such as "cautious" and "vulnerable." In contrast, DeepSeek generated fewer positive warmth-related terms and more descriptors associated with reduced sociability, such as "quiet." Overall, whereas GPT-4o tends to depict older adults with the paternalistic stereotype of high warmth and low competence, DeepSeek produces a more muted characterization that portrays older adults as relatively less sociable. This difference may stem from differences in training data or reflect cultural variation embedded during the models' reinforcement learning process. However, both models consistently describe older adults as having fewer ambitions and goals, which is concerning given LLMs' capacity to shape and reinforce age-based perceptions. These findings support the development of age-inclusive gerontechnology by informing the auditing and calibration of LLM-based systems before deployment. Designers and evaluators can use this framework to detect paternalistic framing and thereby mitigate age bias in LLM-driven technologies.
Keywords: ageism; generative artificial intelligence; large language model; bias; stereotype
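The scoring step in the Method section (sentence embeddings fed to a PLS regression trained on seed adjectives) can be illustrated with a small sketch. This is a hypothetical, dependency-free Python example, not the authors' code. It assumes toy 3-dimensional vectors in place of real sentence embeddings, labels seed adjectives +1 (high) or -1 (low) on each dimension, fits one single-response PLS model (PLS1, via the NIPALS algorithm) per dimension, which may differ from the PLS configuration the study used, and clips predictions to [-1, 1] as an assumed way of bounding the scores. The seed words, vectors, and the names `PLS1`, `train`, and `score` are illustrative only.

```python
# Hypothetical sketch. Assumptions: toy embeddings, one PLS1 model per
# dimension, and clipping to [-1, 1]; none of these come from the study.
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class PLS1:
    """Single-response PLS regression fitted with the NIPALS algorithm."""

    def __init__(self, n_components: int = 2, tol: float = 1e-10) -> None:
        self.n_components = n_components
        self.tol = tol

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "PLS1":
        n, p = len(X), len(X[0])
        self.x_mean = [sum(row[j] for row in X) / n for j in range(p)]
        self.y_mean = sum(y) / n
        xr = [[row[j] - self.x_mean[j] for j in range(p)] for row in X]  # centered X
        yr = [v - self.y_mean for v in y]  # centered y
        self.components: List[Tuple[Vector, Vector, float]] = []
        for _ in range(self.n_components):
            # Weight vector: direction of maximal covariance between X and y.
            w = [sum(xr[i][j] * yr[i] for i in range(n)) for j in range(p)]
            norm = _dot(w, w) ** 0.5
            if norm < self.tol:  # nothing left to explain; stop early
                break
            w = [v / norm for v in w]
            t = [_dot(row, w) for row in xr]  # scores on this component
            tt = _dot(t, t)
            load = [sum(xr[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
            q = _dot(yr, t) / tt  # regression of y on the scores
            # Deflate X and y before extracting the next component.
            xr = [[xr[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
            yr = [yr[i] - q * t[i] for i in range(n)]
            self.components.append((w, load, q))
        return self

    def predict(self, x: Sequence[float]) -> float:
        xr = [v - m for v, m in zip(x, self.x_mean)]
        y_hat = self.y_mean
        for w, load, q in self.components:
            t = _dot(xr, w)
            y_hat += q * t
            xr = [xr[j] - t * load[j] for j in range(len(xr))]  # same deflation as fit
        return y_hat


def score(model: PLS1, embedding: Sequence[float]) -> float:
    """Predicted score clipped to [-1, 1] (assumed bounding rule)."""
    return max(-1.0, min(1.0, model.predict(embedding)))


def train(seeds: Dict[str, Tuple[Vector, float]]) -> PLS1:
    """Fit on seed adjectives labelled +1 (high) or -1 (low) on one dimension."""
    X = [vec for vec, _ in seeds.values()]
    y = [label for _, label in seeds.values()]
    return PLS1().fit(X, y)


# Hypothetical 3-d vectors standing in for real sentence embeddings.
WARMTH_SEEDS = {
    "friendly": ([1.0, 0.1, 0.0], 1.0), "kind": ([0.9, 0.0, 0.1], 1.0),
    "cold": ([-1.0, 0.1, 0.0], -1.0), "hostile": ([-0.9, 0.0, 0.1], -1.0),
}
COMPETENCE_SEEDS = {
    "capable": ([0.0, 1.0, 0.1], 1.0), "skilled": ([0.1, 0.9, 0.0], 1.0),
    "incompetent": ([0.0, -1.0, 0.1], -1.0), "helpless": ([0.1, -0.9, 0.0], -1.0),
}

warmth_model = train(WARMTH_SEEDS)
competence_model = train(COMPETENCE_SEEDS)
sentence = [0.7, -0.3, 0.2]  # hypothetical embedding of one filtered sentence
print(round(score(warmth_model, sentence), 3), round(score(competence_model, sentence), 3))
# prints: 0.735 -0.315
```

In this toy setup the example sentence scores as warm but low in competence, mirroring the kind of pattern the abstract reports. A real pipeline would instead embed each filtered sentence with a sentence-embedding model and would likely choose the number of PLS components by validation; both are assumptions here rather than details from the study.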
W. Hong, M. Choi. (2026). A comparative study of age-related stereotypes between OpenAI's GPT-4o and DeepSeek's chat model. Gerontechnology, 25(2), 1-10
https://doi.org/10.4017/gt.2026.25.2.1460.3