
DIF Depression Abstracts:

Scale Author

Azocar F, Areán P, Miranda J, Muñoz RF. Differential Item Functioning in a Spanish Translation of the Beck Depression Inventory. Journal of Clinical Psychology. (2001) 57(3):355–365.

A differential item functioning (DIF) analysis, applying the Mantel-Haenszel method, examined differences in the endorsement of Beck Depression Inventory (BDI) items in a sample of 292 English- and Spanish-speaking medical outpatients. Ethnic differences were found: Latinos were more likely than non-Latinos to endorse items reflecting tearfulness and punishment, and less likely to endorse inability to work. The article provides a good contextual discussion, anchored in Latino cultural norms, of putative causes of DIF at the item level.
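As an illustration of the Mantel-Haenszel approach named in this abstract, the sketch below computes the common odds ratio and the continuity-corrected chi-square for one dichotomized item, stratifying respondents by matched total score. It is not the authors' code: the function name, the "ref"/"focal" group labels, and the record layout are assumptions made for the example.

```python
from collections import defaultdict


def mantel_haenszel(records):
    """Mantel-Haenszel DIF statistics for a single dichotomous item.

    records: iterable of (stratum, group, endorsed), where stratum is the
    matching variable (e.g. total-score band), group is "ref" or "focal"
    (hypothetical labels), and endorsed is 0/1.
    Returns (common odds ratio, chi-square with continuity correction).
    """
    # One 2x2 table per stratum: [ref yes, ref no, focal yes, focal no]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, group, endorsed in records:
        idx = (0 if group == "ref" else 2) + (0 if endorsed else 1)
        tables[stratum][idx] += 1

    num = den = 0.0                    # odds-ratio numerator and denominator
    sum_a = sum_exp = sum_var = 0.0    # pieces of the chi-square statistic
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:
            continue                   # stratum too small to contribute
        num += a * d / n
        den += b * c / n
        n_ref, n_focal = a + b, c + d
        n_yes, n_no = a + c, b + d
        sum_a += a
        sum_exp += n_ref * n_yes / n
        sum_var += n_ref * n_focal * n_yes * n_no / (n * n * (n - 1))

    odds_ratio = num / den if den > 0 else float("inf")
    if sum_var > 0:
        chi2 = max(0.0, abs(sum_a - sum_exp) - 0.5) ** 2 / sum_var
    else:
        chi2 = 0.0
    return odds_ratio, chi2


if __name__ == "__main__":
    # Made-up counts for one score stratum, purely for demonstration.
    demo = ([(0, "ref", 1)] * 5 + [(0, "ref", 0)] * 15
            + [(0, "focal", 1)] * 15 + [(0, "focal", 0)] * 5)
    print(mantel_haenszel(demo))
```

An odds ratio near 1 means that the two groups endorse the item at similar rates once matched on total score. Values well below or above 1, together with a chi-square that is large against one degree of freedom, would flag the item for possible DIF.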

Associated Measurement Evaluation Grids (MEGs)

Kim Y, Pilkonis PA, Frank E, Thase ME, Reynolds CF. Differential Functioning of the Beck Depression Inventory in Late-Life Patients: Use of Item Response Theory. Psychology and Aging. (2002) 17(3):379–391.

“The present analyses examined age-related measurement bias in responses to items on the revised Beck Depression Inventory (BDI) in depressed late-life patients versus midlife patients. Item response theory (IRT) models were used to equate the scale and to differentiate true-group differences from bias in measurement in the 2 samples. Baseline BDI data (218 late life and 613 midlife) were used for the present analysis. IRT results indicated that late-life patients tended to report fewer cognitive symptoms, especially at low to average levels of depression. Conversely, they tended to report more somatic symptoms, especially at higher levels of depression. Adjusted cutoff scores in the late-life group are provided, and possible reasons for age-related differences in the performance of the BDI are discussed.” [PUBLICATION ABSTRACT]

Associated Measurement Evaluation Grids (MEGs)


Chan KS, Orlando M, Ghosh-Dastidar B, Duan N, Sherbourne CD. The Interview Mode Effect on the Center for Epidemiological Studies Depression (CES-D) Scale: An Item Response Theory Analysis. Medical Care. (2004) 42(3): 281-289.

 Background: Evidence of a mode effect has raised concerns about the comparability and validity of self- versus interviewer-administered versions of the Center for Epidemiological Studies Depression (CES-D) scale. Response anonymity has been proposed to explain this effect. However, the factors that contribute to this mode effect are not well understood. We used item response theory (IRT) to examine the nature of the CES-D mode effect. Methods: A sample of depressed primary care patients from the Partners-in-Care Study were randomized to receive either a phone interview (N = 139) or a mail survey (N = 139) of the CES-D. We used likelihood ratio tests to identify differentially functioning items in the 2 groups. Category response curves are used to describe these effects. Results: Twelve items manifested differential functioning. Category response curves consistently indicate that phone respondents had a lower probability of endorsing the third of 4 response categories than mail respondents, suggesting a possible cognitive effect. Conclusion: Although response anonymity could be important in mode effects observed in surveys of sensitive topics, cognitive factors appear more important to the mode effect in the CES-D. [PUBLICATION ABSTRACT]

Associated Measurement Evaluation Grids (MEGs)

Cole SR, Kawachi I, Maller SJ, Berkman LF. Test of item-response bias in the CES-D scale: experience from the New Haven EPESE Study. Journal of Clinical Epidemiology. (2000) 53: 285–289.

Secondary data analyses of the New Haven component (N=2340) of the Established Populations for Epidemiologic Studies of the Elderly (EPESE) examined item-response bias associated with age, gender, and race in the Center for Epidemiologic Studies Depression Scale (CES-D). Race-related bias was found for the items “people are unfriendly” and “people dislike me”: the proportional odds of blacks responding higher were two and three times those of whites matched on overall depressive symptoms, respectively. Gender bias was evidenced by women scoring higher on the CES-D item “crying spells”.

Associated Measurement Evaluation Grids (MEGs)

Grayson DA, Mackinnon A, Jorm AF, Creasey H, Broe GA. Item Bias in the Center for Epidemiologic Studies Depression Scale: Effects of Physical Disorders and Disability in an Elderly Community Sample. Journal of Gerontology: Psychological Sciences. (2000) 55B(5): P273–P282.

 “The Center for Epidemiologic Studies Depression Scale (CES-D) is frequently used in studies of elderly individuals. One controversy regarding its use turns on the issue of whether the effect of physical disorder on the CES-D total score reflects genuine effects on depression or item-level artifacts. The present article addresses this issue using medical examination data from 506 community-dwelling individuals aged 75 or older. A form of structural equation modeling, the MIMIC model, is used, enabling the effect of a physical disorder on CES-D total score to be partitioned into bias and genuine depression components. The results show substantial physical disorder related artifacts with the CES-D total score. Caution is required in the use of CES-D (and possibly other) depression scales in groups in which physical disorders are present, such as in elderly individuals.” [PUBLICATION ABSTRACT]

Associated Measurement Evaluation Grids (MEGs)

Pickard AS, Dalal MR, Bushnell DM. A Comparison of Depressive Symptoms in Stroke and Primary Care: Applying Rasch Models to Evaluate the Center for Epidemiologic Studies-Depression Scale. Value in Health. (2006) 9(1):59-64.

“Objectives: Clinical trials and community-based studies often include the Center for Epidemiologic Studies-Depression scale (CES-D) as a measure of depression outcome. We compared responses to symptom-related items on the CES-D by depressed stroke and primary-care patients for several purposes: 1) to illustrate the use of Item Response Theory (IRT)-based (Rasch) models for comparing scale functioning across different patient subgroups; and 2) to inform clinicians and outcome researchers about scale functioning and depressive symptomatology in stroke- compared with primary care-based depression. Methods: Two data sources were analyzed, including 32 depressed patients who were 3 months poststroke, and 366 depressed primary-care patients. Presence of depression was based on a CES-D score 16 or higher. Rasch models were used to assess item fit and compare item hierarchies between depressed primary-care and stroke patients. Results: Item hierarchies were similar for poststroke depression and primary care-based depression. Interpersonal disruption items were the most difficult to endorse for both groups. No items misfit the scale in primary-care depression. Items relating to restless sleep, unfriendliness, and crying slightly misfit the scale in stroke patients, that is, may measure a different trait. Differential item functioning (DIF) between the groups was identified for items relating to appetite, restless sleep, crying, and feeling disliked. Conclusions: Results generally supported the use of the CESD as measure of depression outcome, particularly in primary care-based depression. DIF may imply that slightly different clusters of depressive symptoms are reported by depressed stroke patients compared with primary care, but this is conjectural given the small stroke sample size and the same items have been previously associated with bias in studies of large nonstroke samples. This study found Rasch models to be useful tools to investigate scale performance for different clinical applications.” [PUBLICATION ABSTRACT]

Associated Measurement Evaluation Grids (MEGs)


Gallo JJ, Cooper-Patrick L, Lasikar S. Depressive symptoms of whites and African Americans aged 60 years and older. The Journals of Gerontology. (1998) 53B(5):277-285.

“Consistent with prior work, our hypothesis was that older African Americans are less likely to report dysphoria than are older Whites. Study subjects were 968 participants aged 60 years and older in Baltimore, Maryland, and 1,486 participants aged 60 years and older in the Durham-Piedmont region of North Carolina who identified themselves as African American or White and who had complete data on symptoms of depression active in the one month prior to interview, as well as several covariates thought to be related to depression. The effect of self-reported race on the endorsement of symptoms from the section on Major Depression in the Diagnostic Interview Schedule was estimated employing structural equations with a measurement model. Older African Americans were less likely to report dysphoria than older Whites, although this only achieved statistical significance by conventional standards at the Durham-Piedmont site. Older African Americans at both sites were significantly more likely to report thoughts of death.” [PUBLICATION ABSTRACT]

Associated Measurement Evaluation Grids (MEGs)


Tang WK, Wong E, Chiu HFK, Lum CM, Ungvari GS. The Geriatric Depression Scale should be shortened: results of Rasch analysis. International Journal of Geriatric Psychiatry. (2005) 20:783–789.

The article examines unidimensionality, item fit, redundancy and differential item functioning (DIF) of the 15-item Geriatric Depression Scale (GDS) in a sample of community-residing patients with pneumoconiosis in Hong Kong. No DIF was found for age, level of education or cognitive impairment. Most of the GDS items (11/15, or 73.3%) had INFIT/OUTFIT statistics between 0.7 and 1.3. Misfitting and redundant items were removed, and the resulting shortened version of the scale showed overall performance similar to that of the original 15-item GDS.
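For readers unfamiliar with the INFIT/OUTFIT statistics cited in this abstract, the sketch below shows how the two mean-square fit values are commonly computed for one yes/no item under the dichotomous Rasch model. It is a minimal illustration, not the study's analysis: it assumes person abilities and the item difficulty have already been estimated, and all function and parameter names are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of endorsing an item with difficulty b at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses: 0/1 answers, one per person
    thetas:    person ability estimates (assumed already available)
    b:         item difficulty estimate (assumed already available)
    """
    sq_resid_sum = 0.0   # sum of squared raw residuals
    var_sum = 0.0        # sum of model variances p * (1 - p)
    z2_sum = 0.0         # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        r2 = (x - p) ** 2
        sq_resid_sum += r2
        var_sum += var
        z2_sum += r2 / var
    infit = sq_resid_sum / var_sum    # information-weighted mean square
    outfit = z2_sum / len(responses)  # unweighted mean square
    return infit, outfit


def within_range(value, low=0.7, high=1.3):
    """Check a fit statistic against the 0.7 to 1.3 range cited in the abstract."""
    return low <= value <= high


if __name__ == "__main__":
    # Made-up responses and abilities, purely for demonstration.
    infit, outfit = item_fit([1, 0, 1, 1], [0.5, -0.2, 1.0, -1.5], b=0.0)
    print(infit, outfit, within_range(infit), within_range(outfit))
```

Both statistics have an expected value of 1 when the data fit the model. Outfit is more sensitive to unexpected responses from people far from the item's difficulty, while infit weights responses from people near it more heavily.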

Associated Measurement Evaluation Grids (MEGs)


Osborne RH, Elsworth GR, Sprangers MAG, Oort FJ, Hopper JL. The value of the Hospital Anxiety and Depression Scale (HADS) for comparing women with early onset breast cancer with population-based reference women. Quality of Life Research. (2004) 13:191-206.

 “Background: The Hospital Anxiety and Depression Scale (HADS) is frequently used in cancer studies, yet its utility for comparing people with cancer with people in the community is uncertain. Methods: HADS scores were obtained from population-based samples of women with (n=731) and without (n=158) early onset breast cancer. Psychometric properties were examined using differential item functioning (DIF) which is the presence of systematic group differences in certain response items independent of the trait being measured. Results: Women with breast cancer scored lower than reference women on anxiety (mean (SD) 7.5 (4.3) vs. 8.2 (4.0); p=0.06) and depression (3.3 (3.2) vs. 4.2 (3.0); p= 0.003). Group differences remained following adjustment for demographics. Time since diagnosis was not related to anxiety or depression scores. DIF was present in two anxiety and five depression items. Adjustment for DIF did not substantially change the anxiety or depression group differences. Conclusion: Specific sampling or DIF effects do not explain the observation that women with breast cancer have lower levels of anxiety and depression than population controls. The psychometric properties of the HADS appear to be acceptable in these groups.” [PUBLICATION ABSTRACT]

Associated Measurement Evaluation Grids (MEGs)
