Sometimes Means Some of the Time: Residents' Overlapping Responses to Vague Quantifiers on the ACGME-I Resident Survey

Yvonne Yock; Issac Lim; Yong Hao Lim; Wee Shiong Lim; Nicholas Chew; Sophia Archuleta

doi:10.4300/JGME-D-17-00187.1

ABSTRACT

Background

Vague quantifiers used in the Accreditation Council for Graduate Medical Education–International (ACGME-I) resident survey are open to interpretation, raising concerns about the validity of survey scores. Residency programs may be unduly cited if survey responses are affected by differing judgments of vague quantifiers.

Objective

We assessed the validity of vague quantifiers by quantifying variation and overlap in residents' frequency judgments of the following response options: never, rarely, sometimes, very often, and extremely often.

Methods

We conducted a cross-sectional survey of residents in 2 ACGME-I accredited institutions in Singapore. Participants assigned a frequency judgment to response options in 8 questions in the ACGME-I Resident Survey. Overlap in frequency judgment was computed using the minimum and maximum frequency judgment for each response option. Overlap was deemed to have occurred when the maximum frequency of the preceding category exceeded the minimum frequency of a downstream category. The percentage of participants whose frequency judgment overlapped was computed.

Results

Of 652 residents, 289 (44%) responded; after exclusions of incomplete and careless responses, 119 responses (18%) were included in the study. Frequency judgment overlap was more frequent for vague quantifiers that are adjacent, ranging from 11% to 50% for questions in faculty, evaluation, and resources domains. The percentage of frequency judgment overlap was greatest for duty hour questions, with an overlap between 21% and 47% for adjacent categories.

Conclusions

Residents demonstrated wide variation in frequency judgment of vague quantifiers, especially on the duty hour questions in the ACGME-I resident survey.

Introduction

The Accreditation Council for Graduate Medical Education–International (ACGME-I) Resident Survey is an important monitoring tool for evaluating residency programs and making accreditation-related decisions. An annual survey that gathers perceptions of clinical education and the learning environment has been conducted in the United States since 2004.¹ Since 2010, the survey has been administered in English in Singapore and 4 other countries that have adopted the ACGME-I accreditation framework.^2,3 Studies looking at the reliability and validity of scores on the resident survey have yielded mixed results.^4–7

For questions related to frequency and occurrences in the survey, residents respond by selecting 1 of the 5 following options: never, rarely, sometimes, very often, and extremely often. These response options have been termed vague quantifiers, as they denote quantification but lack concrete numerical quantities. Vague quantifiers have been found to be subject to a wide range of frequency judgments with considerable overlaps, especially among those that are semantically adjacent.^4,5 Internal medicine program directors reported that resident survey terms are “vague/ambiguous/misinterpreted by residents,”^5(p3) and indicated that the response option of sometimes can be problematic.⁵

The resident survey is used by ACGME-I as a screening tool to assess compliance; the importance of this screening is increasing with Singapore residency programs' move toward a new accreditation system with annual data screening and less frequent site visits. Concerned about the effect that residents' varied frequency judgment may have on survey results, we aimed to quantify the variation in residents' frequency judgment for vague quantifier response options.

Methods

Study Setting and Data Collection

We conducted an anonymous, cross-sectional survey with residents enrolled full time in ACGME-I accredited residency programs as of March 1, 2014, at 2 sponsoring institutions in Singapore. All residents are proficient in English, and English is the lingua franca and medium of instruction within educational institutions in Singapore. E-mail invitations with an anonymous electronic link to the survey platform Qualtrics (Qualtrics LLC, Provo, UT) were sent to residents via their respective program coordinators between March and May 2014. When residents clicked on the link, they were directed to a participant information sheet explaining study aims and details, with a link to the survey. Consent to participate was implied if they proceeded with the survey.

Eight questions from the domains of educational content, faculty, duty hours, and resources were taken from the ACGME-I Resident Survey (table 1). Participants were instructed that never refers to 0% of the time, and to provide their frequency judgment of rarely, sometimes, very often, and extremely often by moving a slider between numerical values of 0 to 100 on the survey interface.

The National University of Singapore Institutional Review Board reviewed this study and determined it to be exempt.

Data Analysis

Previous studies sought to understand validity through expert validation,^8–10 interviews, and focus group discussions.^4,5,7 By getting participants to provide numerical frequency judgments corresponding to the various vague quantifiers, this study presents another way of looking at the issue of validity through eliciting comprehension of the vague quantifiers. This is akin to a cognitive interview without additional probing.^10–12

Data were first screened for logical consistency of the responses. Based on the phrasing of the study questions and option labels, the frequency judgment of rarely should be the smallest and the frequency judgment of extremely often should be the greatest. Participants who, for example, gave frequency judgments of rarely that were greater than those of sometimes, very often, and extremely often would be providing logically inconsistent responses. Logically inconsistent responses could be a result of inattentive or insufficient effort, and can be exacerbated by the anonymous nature of web-based surveys.¹³
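The screening rule described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the authors' analysis code (the study was analyzed in RStudio). The function names, the dictionary layout of a response, and the choice to treat tied values as inconsistent (`strict=True`) are assumptions made for the example.

```python
# Hypothetical sketch of the logical-consistency screen; all names are illustrative.
QUANTIFIERS = ("rarely", "sometimes", "very often", "extremely often")


def is_logically_consistent(response, strict=True):
    """Return True if the judgments rise from rarely to extremely often.

    `response` maps each quantifier label to a 0-100 slider value for one
    question. With strict=True (an assumption), tied values count as
    inconsistent; with strict=False, ties are tolerated.
    """
    values = [response[label] for label in QUANTIFIERS]
    pairs = list(zip(values, values[1:]))
    if strict:
        return all(lower < higher for lower, higher in pairs)
    return all(lower <= higher for lower, higher in pairs)


def screen_responses(responses, strict=True):
    """Split a list of responses into (kept, excluded) lists."""
    kept, excluded = [], []
    for response in responses:
        if is_logically_consistent(response, strict):
            kept.append(response)
        else:
            excluded.append(response)
    return kept, excluded


# Made-up example data: the second response rates rarely above everything else.
sample = [
    {"rarely": 10, "sometimes": 40, "very often": 70, "extremely often": 90},
    {"rarely": 80, "sometimes": 40, "very often": 30, "extremely often": 10},
]
kept, excluded = screen_responses(sample)
```

In this made-up sample, the first response is kept and the second is excluded as logically inconsistent.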

In this study, we assumed response patterns¹³ that did not show an incremental increase in the frequency judgment of adjacent vague quantifiers (consistent with the measurement of event occurrence from the lowest to the highest intensity) to be careless responses, and removed them from further analyses. Next, we tabulated the minimum and maximum frequency for each of the vague quantifier response options (table 1). Frequency judgment overlap was ascertained to have occurred when the maximum frequency of the preceding vague quantifier exceeded the minimum frequency of a downstream vague quantifier. For example, frequency judgment overlap occurs when the maximum frequency for sometimes is 70 but the minimum frequency for very often is 50. Values in the overlapped region between 50 and 70 could mean either sometimes or very often. The percentage of participants whose frequency judgment falls within this overlapped region constitutes the percentage of overlap between the 2 vague quantifiers. To calculate the percentage of overlap, the number of participants in the overlapped region for 2 vague quantifiers was divided by the total number of participants rating both quantifiers.
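The overlap calculation can be illustrated with a short Python sketch. It is not the authors' code, and it rests on one interpretive assumption that the text does not settle: a participant is counted as being in the overlapped region when at least one of his or her 2 judgments (for the lower or the higher quantifier) falls inside that region, endpoints included. The function name and the sample values are hypothetical.

```python
# Hypothetical sketch of the overlap percentage between 2 vague quantifiers.
def overlap_percentage(lower_judgments, higher_judgments):
    """Percentage of participants whose judgments fall in the overlapped region.

    lower_judgments[i] and higher_judgments[i] are participant i's 0-100
    judgments for the lower quantifier (e.g., sometimes) and the higher
    quantifier (e.g., very often). Only participants rating both are passed in.
    """
    if len(lower_judgments) != len(higher_judgments) or not lower_judgments:
        raise ValueError("need one judgment per quantifier for each participant")

    region_start = min(higher_judgments)  # minimum of the downstream quantifier
    region_end = max(lower_judgments)     # maximum of the preceding quantifier

    # Overlap exists only when the preceding maximum exceeds the downstream minimum.
    if region_end <= region_start:
        return 0.0

    def in_region(value):
        return region_start <= value <= region_end

    # Assumption: a participant counts if either judgment lies in the region.
    count = sum(
        1
        for low, high in zip(lower_judgments, higher_judgments)
        if in_region(low) or in_region(high)
    )
    return 100.0 * count / len(lower_judgments)


# Made-up data reproducing the 50-to-70 region from the example in the text.
sometimes = [20, 30, 70, 40]
very_often = [50, 80, 90, 75]
print(overlap_percentage(sometimes, very_often))  # 50.0
```

Here the overlapped region runs from 50 (the minimum for very often) to 70 (the maximum for sometimes). Two of the 4 made-up participants have a judgment in that region, which gives 50%.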

Intraclass correlation was calculated to determine whether participants were consistent in their frequency judgment of vague quantifiers across all questions, for instance, whether a participant who equated rarely with a frequency judgment of 15 in one question did so across questions. Descriptive statistics and a figure illustrating frequency judgment overlap without outliers were also produced to understand how outliers affect frequency judgment overlap. Outliers were identified as the observations that were furthest away from the mean, and they were replaced by the sample mean.
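As an illustration of what such a consistency check involves, the sketch below computes a one-way random-effects intraclass correlation, often written ICC(1,1), from a table with one row per participant and one column per question. The text does not state which form of intraclass correlation was used, so the choice of the one-way model, the function name, and the sample values are all assumptions made for this example.

```python
# Hypothetical sketch: one-way random-effects ICC(1,1); the ICC form is an assumption.
def icc_one_way(ratings):
    """ICC(1,1) for a table of ratings.

    ratings[i][j] is participant i's frequency judgment of one vague
    quantifier (e.g., rarely) on question j. Every row must have the same
    number of questions, with no missing values.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 participants")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 questions and equal-length rows")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-participant and within-participant mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ratings have no variance")
    return (ms_between - ms_within) / denominator


# Made-up judgments of rarely from 3 participants across 3 questions.
rarely = [
    [15, 14, 16],
    [30, 31, 29],
    [5, 6, 4],
]
icc = icc_one_way(rarely)
```

A value close to 1 indicates that each participant gives nearly the same number for the quantifier in every question, while differences between participants account for most of the variation.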

Figure. Illustration of Frequency Judgment Overlap Across Vague Quantifiers
Citation: Journal of Graduate Medical Education 9, 6; 10.4300/JGME-D-17-00187.1

Data were analyzed using RStudio 0.98.162 (RStudio, Boston, MA).

Results

A total of 289 of 652 eligible participants (44%) responded to the study, and 186 (64%) completed all study questions. We excluded 67 (23%) participants due to careless responses. We included 119 surveys (18%) in the final analysis.

Of these 119, 66 (55%) were from medical residency programs, 28 (24%) were from surgical residency programs, and 13 (11%) were from all other residency programs. Twelve (10%) did not indicate their residency program. Sixty participants (50%) were in postgraduate year 1 (PGY-1) to PGY-3; 49 (41%) were in PGY-4 to PGY-6; and 10 (8%) were in PGY-7 and above. The final sample is comparable to the total population (652 residents) and the participants (289 residents) in terms of representation from the 2 sponsoring institutions as well as types of residency programs. In addition, the PGY breakdown was similar between the sample that responded to the survey and the final sample.

Table 1 summarizes the frequency judgment of the vague quantifiers for the 8 survey questions. There was a steady increase in the mean of frequency judgment from rarely to extremely often. The standard deviation tended to be smallest for rarely and larger for the other vague quantifiers. When compared across domains, the standard deviation for very often and extremely often was greater in the domains of resources and duty hours.

In general, frequency judgment overlap occurs at a higher percentage for vague quantifiers that are adjacent, with overlap between 38% and 82% for questions in the faculty, educational content, and resources domains (see the figure; details are provided as online supplemental material). Percentage of frequency judgment overlap was also higher for duty hours questions, with overlap between 58% and 95% for adjacent vague quantifiers. In contrast, the percentage of frequency judgment overlap was considerably lower for nonadjacent vague quantifiers. For instance, the overlap was between 1% and 49% for rarely and very often, between 1% and 16% for rarely and extremely often, and between 5% and 29% for sometimes and extremely often for questions in the faculty, educational content, and resources domains. Similarly, the percentage of frequency judgment overlap for nonadjacent vague quantifiers was also higher for duty hours questions: between 18% and 45% for rarely and very often, between 3% and 18% for rarely and extremely often, and between 53% and 56% for sometimes and extremely often.

After removing the outliers, frequency judgment overlaps between adjacent vague quantifiers, although reduced, were still substantial, with overlaps ranging from 21% to 72% for questions in the faculty, educational content, and resources domains (descriptive statistics and illustration are provided as online supplemental material). Good intraclass correlation within participants was found (table 2).

Discussion

Our results suggest that there was considerable frequency judgment overlap of adjacent vague quantifiers in the ACGME-I resident survey, attributable to participants perceiving adjacent vague quantifiers to be similar in meaning. Participants were consistent in their frequency judgment of the vague quantifiers, as evidenced by the good intraclass correlation coefficient. Participants who gave a frequency judgment of 15 to rarely for question 1 were likely to give a similar frequency judgment for rarely in the other questions.

Disconcertingly, standard deviations and frequency judgment overlaps were greater for questions about duty hours. It is unclear why this is the case. One possibility could be the confusing and difficult phrasing of questions in the duty hour domain. Questions in this domain require participants to recall various frequencies and undertake various calculations in their heads, in contrast with questions from other domains where they are asked to recall instances. For instance, 1 question requires residents to think of instances when they break the duty hours rule, which presupposes that they are guilty of breaking the rule. Residents then have to recall these instances over a 4-week period. The effort it takes to process this question is likely to be a burden on the residents' working memory, with subsequent greater variability in their recall.¹⁴ This increase in cognitive effort, coupled with a wide range of interpretations for vague quantifiers, may have implications for interpreting the results of the questions in the ACGME-I Resident Survey, in particular with regard to duty hours violations. In our study, we found that the percentage of frequency judgment overlap was greater for rarely and sometimes (39.9% to 46.6%), rarely and very often (15.6% to 41.6%), and rarely and extremely often (1.7% to 15.9%) for questions in the duty hours domain. Residency programs would be flagged for noncompliance if a substantial number of residents answered sometimes, very often, or extremely often to these questions. The high percentage of frequency judgment overlap may cause the reported incidence of duty hours violations to exceed the actual incidence, which could lead to residency programs being unduly flagged for noncompliance.

Our findings are similar to those of other studies on frequency judgment. While the other studies set out to understand the average frequency for each vague quantifier, we went a step further to understand the percentage of frequency judgment overlap, which allowed us to quantify variation in residents' frequency judgment for vague quantifier response options in the ACGME-I Resident Survey.

Our study has limitations, including the low response rate and small final sample due to nonresponse and exclusion of careless responses, which reduce the generalizability and validity of the results. While the final sample is comparable to the population, we cannot rule out systematic differences between those who were included in the final sample and those who were not. The substantial proportion of careless responses (23% of respondents) may suggest a larger problem with study fidelity. Including careless responses in the analysis would have increased the percentage of frequency judgment overlap, as the majority of the careless respondents gave frequency judgments that were greatest for rarely and smallest for extremely often.

A larger follow-up study is needed to ascertain whether the phrasing of survey questions in the ACGME-I Resident Survey or the vague quantifiers themselves lead to variation in frequency judgment. Future studies could replace vague quantifiers with response options that are more specific, for example, less than once a week for rarely. This way, a reference period and actual numerical benchmark of event occurrence could be established.¹⁵

Conclusion

In this study, residents were asked to give their frequency judgment of the vague quantifier response options used in the ACGME-I Resident Survey. Considerable variation in residents' frequency judgments was found, which could affect the validity of survey results.

References

1. Holt KD, Miller RS. The ACGME Resident Survey aggregate reports: an analysis and assessment of overall program compliance. J Grad Med Educ. 2009;1(2):327–333.

2. Huggan PJ, Samarasekara DD, Archuleta S, et al. The successful, rapid transition to a new model of graduate medical education in Singapore. Acad Med. 2012;87(9):1268–1273.

3. ACGME International. Where we are. http://www.acgme-i.org/about-us/where-we-are. Accessed September 20, 2017.

4. Sticca RP, MacGregor JM, Szlabick RE. Is the Accreditation Council for Graduate Medical Education (ACGME) Resident/Fellow Survey a valid tool to assess general surgery residency programs compliance with work hours regulations? J Surg Educ. 2010;67(6):406–411.

5. Adams M, Willett LL, Wahi-Gururaj S, et al. Usefulness of the ACGME Resident Survey: a view from internal medicine program directors. Am J Med. 2014;127(4):351–355.

6. Holt KD, Miller RS, Philibert I, et al. Residents' perspectives on the learning environment: data from the Accreditation Council for Graduate Medical Education Resident Survey. Acad Med. 2010;85(3):512–518.

7. Ibrahim H, Lindeman B, Matarelli SA, et al. International residency program evaluation: assessing the reliability and initial validity of the ACGME-I Resident Survey in Abu Dhabi, United Arab Emirates. J Grad Med Educ. 2014;6(3):517–520.

8. Schwarz N. What respondents learn from questionnaires: the survey interview and the logic of conversation. Int Stat Rev. 1995;63:153–168.

9. Artino AR Jr, La Rochelle JS, Dezee KJ, et al. Developing questionnaires for educational research: AMEE Guide No. 87. Med Teach. 2014;36(6):463–474.

10. Bocklisch F, Bocklisch SF, Krems JF. Sometimes, often, and always: exploring the vague meanings of frequency expressions. Behav Res Methods. 2012;44(1):144–157.

11. Curran PG. Methods for the detection of carelessly invalid responses in survey data. J Exp Soc Psychol. 2016;66:4–19.

12. Sudman S. Mail surveys of reluctant professionals. Eval Rev. 1985;9(3):349–360.

13. Johnson JA. Ascertaining the validity of individual protocols from web-based personality inventories. J Res Personal. 2005;39(1):103–129.

14. Tourangeau R, Rips LJ, Rasinski K. The Psychology of Survey Response. Cambridge, UK: Cambridge University Press; 2000.

15. Lietz P. Research into questionnaire design. Intl J Market Res. 2010;52(2):249–272.
