Association Between Internal Medicine Residency Applicant Characteristics and Performance on ACGME Milestones During Intern Year
ABSTRACT
Background
Residency programs apply varying criteria to the resident selection process. However, it is unclear which applicant characteristics reflect preparedness for residency.
Objective
We determined the applicant characteristics associated with first-year performance in internal medicine residency as assessed by performance on Accreditation Council for Graduate Medical Education (ACGME) Milestones.
Methods
We examined the association between applicant characteristics and performance on ACGME Milestones during intern year for individuals entering Northwestern University's internal medicine residency between 2013 and 2018. We used bivariate analysis and a multivariable linear regression model to determine the association between individual factors and Milestone performance.
Results
Of 203 eligible residents, 198 (98%) were included in the final sample. One hundred fourteen residents (58%) were female, and 116 residents (59%) were White. Mean Step 1 and Step 2 CK scores were 245.5 (SD 12.0) and 258.0 (SD 10.8), respectively. Step 1 scores, Alpha Omega Alpha membership, medicine clerkship grades, and interview scores were not associated with Milestone performance in the bivariate analysis and were not included in the multivariable model. In the multivariable model, overall clerkship grades, ranking of the medical school, and year entering residency were significantly associated with Milestone performance (P ≤ .04).
Conclusions
Most traditional metrics used in residency selection were not associated with early performance on ACGME Milestones during internal medicine residency.
Introduction
Residency programs devote considerable thought and apply varying criteria to resident selection. However, prior studies have yielded inconsistent results regarding applicant factors (such as clerkship grades or standardized test scores) associated with residency performance in internal medicine1–4 and other specialties.5–9 The predictive value of such factors is low. Fine and Hayward reported that residency selection committee ranking was only moderately correlated with subsequent performance assessments,1 and Neely et al found that applicant characteristics explained a minority of the variance in third-year resident performance rating in internal medicine.2 Prior studies have relied largely on internally developed benchmarks, limiting generalizability and reproducibility.1,2
In 2013, the Accreditation Council for Graduate Medical Education (ACGME) introduced Milestones for internal medicine residents, outlining competency-based expectations across programs and training years.10,11 The Milestones represent an attempt to standardize expected educational outcomes, enabling educators to study factors that contribute to performance.12–15 While studies have examined the relationship between applicant factors and performance at the conclusion of residency,1,2 factors during residency that are difficult to quantify (such as mentorship and peer support) also contribute to resident success.
The aim of this study was to determine the resident applicant characteristics associated with preparedness for internal medicine internship using performance on ACGME Milestones during intern year.
Methods
We performed a retrospective cohort study examining the association between residency application characteristics and subsequent performance on ACGME Milestones among internal medicine residents at McGaw Medical Center, Northwestern University. All data were deidentified prior to analysis.
The study population consisted of residents entering our categorical internal medicine residency program from 2013 to 2018. These classes were selected because the 2013–2014 intern class was the first assessed using the ACGME Milestones and competency frameworks. Individuals with incomplete data or who transferred in or out of the program outside of the match process were excluded.
We examined resident factors that are used in the Northwestern internal medicine residency selection process or that have been shown in prior studies to be influential.1,2,4,7,16–18 We obtained applicant characteristics from residency files derived from Electronic Residency Application Service (ERAS) applications. Self-reported demographic data were obtained from residency records.
Because the incremental difference of a 1-point increase on USMLE examinations is likely small, Step 1 and Step 2 CK scores were defined as categorical variables with 10-point ranges (Table 1). Step 2 CK scores are not required as part of the residency application; individuals who did not submit scores were categorized as “unknown.”
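The 10-point categorization can be sketched in a few lines. This is an illustration only, not the authors' code: the function name `step_score_category`, the assumption that bins start on multiples of 10 (eg, 240-249), and the label format are all hypothetical, and the ranges actually used are those listed in Table 1.

```python
from typing import Optional


def step_score_category(score: Optional[int]) -> str:
    """Map a USMLE score to a 10-point category label.

    Assumption: bins start on multiples of 10 (e.g., 240-249). The cut
    points used in the study are given in its Table 1 and may differ.
    """
    if score is None:
        # Step 2 CK was optional, so missing scores form their own category.
        return "unknown"
    lower = (score // 10) * 10
    return f"{lower}-{lower + 9}"


print(step_score_category(245))   # 240-249
print(step_score_category(None))  # unknown
```

Treating "unknown" as its own level keeps applicants without a Step 2 CK score in the analysis rather than dropping them.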
As part of the residency selection process, each applicant's internal medicine clerkship grade was assigned a value from 5 to 90 (with 5 representing the top fifth percentile) using information available in the medical student performance evaluation to account for the variability in grade distribution between schools. This number was based on the individual's medicine clerkship grade compared to the overall distribution of grades within their medical school class. Applicants were assumed to be in the median percentile within a given grade category (ie, if the top 30% of a medical school class earns honors, an individual earning honors was assumed to be at the top 15th percentile). Numbers were adjusted if more granular data were available within the medicine clerkship letter. A similar process was used to determine each applicant's average grade across all core clinical clerkships.
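The midpoint rule in this paragraph can be illustrated with a short sketch. The function `grade_percentile` and its input format are hypothetical: it assumes a school's grade distribution is supplied as an ordered list of (grade, percent of class) pairs from best to worst. The "high pass" and "pass" shares below are invented for the example, and the sketch does not model the manual adjustments made from clerkship letters or the 5-to-90 bounds.

```python
def grade_percentile(distribution, grade):
    """Return the midpoint percentile of `grade` within a class.

    `distribution` is a list of (grade_name, percent_of_class) pairs
    ordered from the best grade to the worst. Lower values are better,
    so 15 means "top 15th percentile".
    """
    cumulative = 0.0
    for name, share in distribution:
        if name == grade:
            # Assume the student sits at the median of the grade category.
            return cumulative + share / 2
        cumulative += share
    raise ValueError(f"grade {grade!r} not found in distribution")


# Example from the text: the top 30% of the class earns honors.
school = [("honors", 30), ("high pass", 40), ("pass", 30)]
print(grade_percentile(school, "honors"))     # 15.0
print(grade_percentile(school, "high pass"))  # 50.0
```

Because the value depends on each school's own distribution, the same nominal grade maps to different percentiles at schools with different grading practices, which is the point of the adjustment.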
We used US News & World Report (USNWR) “Best Medical Schools: Research Rankings” as a surrogate for perceived medical school competitiveness.19 We used the most recent rankings for consistency because ranking volatility is relatively low. Medical school ranking was treated as a categorical variable with 20 schools in each category.
Alpha Omega Alpha (AOA) membership is an optional question in ERAS, and non-responders were assumed to be non-members. The few residents who reported that their school held elections senior year or did not have a chapter were classified as non-members given that some students from these schools explicitly indicated this while others left the question blank.
Gold Humanism Honor Society (GHHS) membership was determined using a publicly available database of GHHS members and chapters.20 A resident was considered to be eligible for GHHS if a chapter existed at their medical school at least 1 year prior to their medical school graduation. Because many residents attended a school without a chapter, membership was categorized into 3 groups: members, non-members who were eligible, and non-members who were not eligible.
Each applicant was interviewed by 2 faculty. Interviewers were not provided with applicant grades or test scores. Each interviewer gave an overall interview score from 1 to 5 using a standardized rubric, with 1 being the strongest score. We averaged interview scores for each applicant.
Age and gender are not used in our residency selection process, but have been correlated with resident performance assessments elsewhere.7,21–25 Race was not used as a predictor but was included as a covariate in the multivariable model to account for possible implicit bias in the assessment process.21,26
The primary outcome was mean performance across all 22 ACGME subcompetencies on the midyear assessment in December of intern year. For the 2013–2014 intern class, we used the year-end assessment because a midyear assessment was not completed.
As part of the ongoing resident development process, the clinical competency committee assessed each resident across the 22 ACGME subcompetencies. Performance for each subcompetency was determined using attending evaluations, resident evaluations, and a summative assessment tool generated from the electronic assessment system. Further input was derived from nurse evaluations, conference attendance, evaluation completion rate, scholarly productivity, and extracurricular activities. Residents were rated on each subcompetency using the Milestone-based scale (1–9), with 9 representing the aspirational Milestone. The program director, an associate program director, and a program coordinator approved the final Milestone ratings for each resident.
Each of the 22 ACGME subcompetencies is grouped under 1 of 6 core competencies (patient care, medical knowledge, systems-based practice, practice-based learning and improvement, professionalism, and interpersonal and communication skills). Mean performance for each core competency was assessed as a secondary outcome by averaging the performance on the subcompetencies that comprise the broader core competency.
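As a rough sketch of how the primary and secondary outcomes relate, the averaging could be written as follows. The labels (such as "PC1"), the per-competency subcompetency counts, and the function name `summarize_milestones` are assumptions made for illustration; the example ratings are invented and are not study data.

```python
from statistics import mean

# Assumption: subcompetency counts per core competency (5+2+4+4+4+3 = 22).
SUBCOMPETENCY_COUNTS = {
    "PC": 5,    # patient care
    "MK": 2,    # medical knowledge
    "SBP": 4,   # systems-based practice
    "PBLI": 4,  # practice-based learning and improvement
    "PROF": 4,  # professionalism
    "ICS": 3,   # interpersonal and communication skills
}


def summarize_milestones(ratings):
    """Return (primary, secondary) outcomes for one resident.

    `ratings` maps hypothetical labels such as "PC1" to a rating on the
    1-9 Milestone scale.
    """
    labels = {
        core: [f"{core}{i}" for i in range(1, n + 1)]
        for core, n in SUBCOMPETENCY_COUNTS.items()
    }
    expected = {label for group in labels.values() for label in group}
    if set(ratings) != expected:
        raise ValueError("ratings must cover exactly the 22 subcompetencies")
    # Primary outcome: mean across all 22 subcompetencies.
    primary = mean(ratings.values())
    # Secondary outcomes: mean within each core competency.
    secondary = {
        core: mean(ratings[label] for label in group)
        for core, group in labels.items()
    }
    return primary, secondary


# Illustrative resident: rated 5 everywhere except 6 on professionalism.
example = {
    f"{core}{i}": (6 if core == "PROF" else 5)
    for core, n in SUBCOMPETENCY_COUNTS.items()
    for i in range(1, n + 1)
}
primary, secondary = summarize_milestones(example)
print(round(primary, 2), secondary["PROF"], secondary["PC"])
```

Note that the primary outcome averages the 22 subcompetencies directly, so core competencies with more subcompetencies carry more weight than they would in an average of the 6 core competency means.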
We first performed an exploratory bivariate analysis to understand the impact of individual predictor variables in isolation. We used Pearson's correlation coefficients to evaluate the correlation between continuous predictors and outcome measures and 2-sample t tests to assess the association between binary predictors (gender and AOA membership) and outcome measures. One-way analysis of variance (ANOVA) tested for differences in mean outcome scores between non-binary categorical predictor groups.
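For readers less familiar with these tests, the three statistics can be sketched using only the Python standard library. This is not the authors' Stata code: the function names are hypothetical, the t statistic assumes equal variances (pooled), and P values are omitted because they require a t or F distribution function that the standard library does not provide. The example inputs are toy numbers, not study data.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pooled_t(group_a, group_b):
    """Two-sample t statistic assuming equal variances."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data for illustration only.
a, b = [1, 2, 3], [2, 3, 4]
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pooled_t(a, b))
print(anova_f([a, b]))
```

With exactly two groups, the ANOVA F statistic equals the square of the pooled t statistic, which is why binary predictors can be handled with t tests and predictors with more than two levels with ANOVA.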
We then performed a multivariable linear regression analysis. Given the moderate sample size, we initially included all predictors in our model to account for potentially unmeasured confounders, irrespective of whether predictors were statistically significant in bivariate analysis. We then used a backward stepwise approach to refine our model, removing individual predictors sequentially. We determined the Akaike information criterion (an estimate of model fit) for each sequential model to select the set of covariates that demonstrated the best regression model fit for the primary outcome.27 Gender, age, intern academic year, Step 2 CK score, overall clerkship grades, and USNWR rankings were retained in the final model. This set of covariates consistently demonstrated relatively high goodness of fit across regression models for secondary outcomes.
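For reference, the standard definition of the Akaike information criterion is shown below. The paper does not state which form was computed, so this is the textbook definition rather than a description of the authors' exact calculation; the symbols k (number of estimated parameters), \hat{L} (maximized likelihood), n (sample size), and RSS (residual sum of squares) are introduced here for illustration.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}

% For a linear regression with Gaussian errors, this reduces to
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + C

% where C depends only on n, so it cancels when comparing models
% fitted to the same residents.
```

Lower values indicate a better trade-off between fit and complexity: removing a predictor lowers the 2k penalty but raises RSS, so a removal improves AIC only when the loss of fit is small.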
A significance level of .05 was used for all analyses. Analyses were conducted in Stata 15.1 (StataCorp LLC, College Station, TX).
This study was determined to be exempt by Northwestern University's Institutional Review Board with a waiver of informed consent.
Results
Of 203 eligible residents, 198 were included in the final analysis. Of the 5 excluded residents, 4 transferred in or out of the program and 1 had incomplete data. Residents' demographic and academic characteristics are summarized in Table 1. The study population was majority female (114 residents, 58%) and White (116 residents, 59%). Mean Step 1 and Step 2 CK scores were 245.5 (SD 12.0) and 258.0 (SD 10.8), respectively. Approximately 41% (82 of 198) of residents were in AOA. Only 12% (24 of 198) of residents were in GHHS, but approximately 37% (73 of 198) of residents attended a school without a chapter.
The mean score across all subcompetencies was 5.22 (SD 0.51), and there was an approximately normal distribution of scores. For individual core competencies, the mean score ranged from 4.98 (SD 0.64) for patient care to 5.54 (SD 0.60) for professionalism.
The results of the bivariate analysis are presented in Table 2. The year entering residency had a statistically significant association with the primary outcome and all secondary outcomes (P < .001 for all), although no trend was observed across years. The 2013–2014 intern class had the highest mean Milestone score (5.81), whereas the 2018–2019 class had the lowest (4.81). Women were assessed as having lower performance on medical knowledge Milestones compared to men (4.90 vs 5.10, P = .033). USNWR medical school ranking was associated with performance on patient care and medical knowledge competencies, with residents who attended a school ranked 1 to 20 having the highest mean score (5.16, P = .036 and 5.11, P = .021, respectively). Milestone performance was not significantly associated with Step 1 scores, Step 2 CK scores, or AOA membership.
For continuous variables, interview score, age, and performance in the medicine clerkship were not associated with Milestone performance. Performance across all core clerkships was correlated with performance on professionalism subcompetencies (r = -0.14, P = .045), but was not significantly associated with other outcomes.
In multivariable regression analysis, only a few predictors were significantly associated with Milestone performance (Table 3). There were statistically significant differences between each year entering residency compared to the referent group (2018–2019 intern year) for the primary outcome (P < .001 to .03) and many of the secondary outcomes. Male gender was associated with 0.14 points higher performance on medical knowledge Milestones (95% CI 0.01–0.26, P = .031). Core clerkship grades were significantly associated with the primary outcome as well as performance on professionalism, interpersonal and communication skills, and practice-based learning and improvement (P = .01 to .02). Each 1 percentile point worsening in clerkship grades was associated with a -0.01 change in overall Milestone score (95% CI -0.01 to -0.001).
Compared to attending a medical school ranked in the top 20 by USNWR, attending a school ranked 21 to 40 was associated with lower performance for the primary outcome and all core competencies except patient care and systems-based practice. Attending the lowest ranked category of school (> 60 or unranked) was also associated with lower overall performance on the Milestones (-0.23; 95% CI -0.42 to -0.04; P = .019) as well as lower performance on medical knowledge (-0.36; 95% CI -0.57 to -0.15; P = .001), patient care (-0.23; 95% CI -0.44 to -0.02; P = .034), and practice-based learning and improvement (-0.22; 95% CI -0.43 to -0.005; P = .045).
Discussion
Most internal medicine residency applicant factors (including Step 1 scores, medicine clerkship grades, interview performance, and AOA membership) were not associated with Milestone performance during intern year. Attending a medical school ranked in the top 20 by USNWR was associated with a statistically significant improvement in overall performance on ACGME Milestones, but the absolute difference was minimal (0.19 points higher compared to those who attended a school ranked 21 to 40) and was not statistically significant when compared to individuals who attended a school ranked 41 to 60. Core clerkship grades were significantly associated with mean Milestone performance. While the effect size (a 0.01 change in Milestone score per 1 percentile improvement in grade) appears small, it may suggest meaningful differences in residency performance: a student ranked in the middle of the first quartile (12.5th percentile) of their medical school may perform 0.5 points higher on Milestones as an intern compared to a student ranked in the middle of the third quartile (62.5th percentile).
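The quartile comparison above follows directly from the regression coefficient. As a worked check, using the rounded coefficient of 0.01 Milestone points per percentile point (so the result is approximate):

```latex
\Delta_{\text{percentile}} = 62.5 - 12.5 = 50

\Delta_{\text{Milestone}} \approx 50 \times 0.01 = 0.5
```

Because the reported 95% CI for the coefficient runs from -0.01 to -0.001, the same 50-point percentile gap is compatible with Milestone differences ranging from roughly 0.05 to 0.5 points.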
This study builds on prior studies of internal medicine residency programs, suggesting that most traditional residency selection criteria do not predict resident performance.1,2,4 A University of Michigan study found that only internal medicine clerkship honors and medical school were significantly associated with third-year resident performance in a multivariable model.1 A prior study of Northwestern internal medicine residents graduating between 2000 and 2005 found that medical school quality and overall clerkship grades were most strongly associated with residency performance.2 Research from other clinical specialties also has suggested that commonly used metrics (such as USMLE scores and interview performance) are not strongly predictive of residency success.28–30
These findings may also reflect the challenges of using the ACGME Milestones as an assessment tool. The Milestones, while reflecting the theoretical educational outcomes of a program, may be an imperfect measure of resident performance in the real world.31,32 Further work is needed to understand how Milestone performance correlates with patient outcomes and other measures of clinical competency.15
Female gender was negatively associated with performance on medical knowledge subcompetencies. This may reflect gender bias within the assessment process. Studies from internal medicine and other specialties on gender bias within the resident assessment process have had mixed findings.21,22,33,34 Dayal et al found potential gender bias in faculty assessments of emergency medicine residents,21 but Santen et al subsequently found in a larger national study that male and female emergency medicine residents had similar Milestone ratings for the majority of competencies.35
This study has several limitations. First, this is a single-institution study in which the residency selection process favors competitive applicants whose characteristics (eg, USMLE scores, percent of students in AOA) do not mirror the general applicant population. Second, midyear Milestone scores were used to estimate an intern's initial performance in residency; faculty may not have had sufficient interactions at that point to make accurate assessments.18 However, we reached the same conclusions for the year in which year-end assessments were used in the absence of midyear assessments (2013–2014). Third, we excluded residents who joined or left the program outside of the Match; although this was a small number of trainees, these individuals may be important outliers. Fourth, our summative assessment tool averages rotation-based global ratings and served as an anchor for the final Milestone assessment; the tool may not be as accurate as a deconstructed rating system. It is also possible that Milestone assessments are proxies for preexisting global assessments of competence that are influenced by criteria other than the Milestones themselves.36 Fifth, we defined AOA as a binary variable given limitations in ERAS data. However, we performed a sensitivity analysis in which students who indicated that they attended a school with no AOA chapter or with elections during senior year were excluded and found that the final multivariable regression model had similar findings. Finally, many residents were not eligible for GHHS, limiting our ability to assess this factor.
This study supports the need for reform within the medical student assessment and residency admissions processes.37 The USMLE recently announced that Step 1 will transition to a pass/fail format, underscoring the limitations of this assessment. “Traditional” metrics, such as standardized test scores and AOA membership, produce anxiety for medical students without delivering reliable assessment information. Novel and holistic assessment methods of medical students (eg, those assessing entrustable professional activities) have the potential to benefit both students and residency programs alike.38
Conclusions
Most selection criteria for internal medicine residency applicants are poorly predictive of intern year performance as measured by performance on the ACGME Milestones. This may be due to imperfect selection criteria, the limitations of the Milestones as measurements of intern year performance (in our residency program, or perhaps globally), or both.
Author Notes
Funding: The authors report no external funding source for this study.
Conflict of interest: The authors declare they have no competing interests.



