ABSTRACT
Background Learner assessments of faculty are widespread in medicine, yet concerns are growing about possible biases in these assessments and their associations with gender disparities.
Objective To investigate gender-based differences in how residents and fellows describe faculty (rater effect) and how faculty are described (ratee effect) in faculty assessments, and their associations with teaching effectiveness ratings.
Methods We analyzed 2164 trainee assessments of University of Minnesota Medical School faculty from 2019 to 2023 that included trainee and faculty gender information and narrative comments. Using natural language processing, we categorized words and 2-word groups (n-grams) into communal (eg, caring, kind), standout (eg, outstanding, amazing), and agentic/ability (eg, assertive, controlling) groups. We examined gender-based differences in n-grams used by trainees (rater effect) and received by faculty (ratee effect), and relationships between n-gram frequency and teaching effectiveness ratings.
Results Women trainees used more communal (rater effect, incidence rate ratio [IRR]=1.36; 95% CI, 1.27-1.47), standout (IRR=1.20; 95% CI, 1.08-1.34), and agentic/ability words (IRR=1.37; 95% CI, 1.26-1.49; P<.001) than men trainees. Women faculty received fewer agentic/ability words than men faculty (ratee effect, IRR=0.83; 95% CI, 0.77-0.90; P<.001). Women trainees used fewer communal words when describing women faculty (interaction effect, IRR=0.84; 95% CI, 0.73-0.98; P<.05). Teaching effectiveness ratings correlated with faculty n-gram word frequency in standout (men: rs=0.29, women: rs=0.28, P<.001) and communal categories (men: rs=0.23, P=.003; women: rs=0.22, P=.01).
Conclusions Women trainees used more communal, standout, and agentic/ability descriptors than men trainees, while women faculty received fewer agentic/ability descriptors than men faculty. Women trainees used fewer communal words when describing women faculty. Standout and communal word frequency correlated positively with teaching effectiveness ratings for faculty of both genders.
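
As an illustration of the word and 2-word n-gram categorization described in the Methods, the following is a minimal sketch only, not the study's actual pipeline. The single-word lexicon entries are the examples given in the abstract; the 2-word entry ("role model"), the tokenization rule, and the function names are assumptions introduced here for illustration.

```python
import re
from collections import Counter

# Illustrative lexicon only. Single words are the examples from the abstract;
# "role model" is a hypothetical 2-word entry, not taken from the study.
LEXICON = {
    "communal": {"caring", "kind"},
    "standout": {"outstanding", "amazing", "role model"},
    "agentic/ability": {"assertive", "controlling"},
}


def ngrams(tokens, n):
    """Return all contiguous n-word sequences, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def category_counts(comment):
    """Count lexicon matches per category among the words and 2-word n-grams of a comment."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", comment.lower())
    candidates = tokens + ngrams(tokens, 2)
    counts = Counter()
    for category, terms in LEXICON.items():
        counts[category] = sum(1 for c in candidates if c in terms)
    return counts


counts = category_counts("An outstanding, caring teacher and an amazing role model.")
print(dict(counts))
```

Per-comment counts of this kind could then be aggregated by trainee and faculty gender; the count-based regression that would yield incidence rate ratios is not shown here.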