This paper studies the association between GSA (Gender and Sex Analysis) and author gender.
Data were obtained from PubMed's MEDLINE, which provides MeSH (Medical Subject Headings) terms. The dataset covers papers published in 2008–2015 (~2.5M records), of which 2.1M were matched to Web of Science (WoS) records.
Author gender was inferred from first name and country using Gender API (https://gender-api.com).
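A minimal sketch of how a first-name-plus-country lookup can assign gender. It does not call Gender API: the `lookup` dictionary is a hypothetical stand-in for its responses, and both the (gender, confidence) record format and the `min_confidence` threshold are assumptions, not the paper's settings. Country is part of the key because the same first name can lean differently by country.

```python
from typing import Optional

# Hypothetical stand-in for Gender API results:
# (lower-cased first name, upper-cased country code) -> (gender, confidence in [0, 1]).
GenderLookup = dict[tuple[str, str], tuple[str, float]]


def infer_gender(first_name: str, country: str, lookup: GenderLookup,
                 min_confidence: float = 0.8) -> Optional[str]:
    """Return the looked-up gender, or None if the pair is unknown or too uncertain.

    min_confidence is an illustrative cut-off, not the paper's setting.
    """
    key = (first_name.strip().lower(), country.strip().upper())
    result = lookup.get(key)
    if result is None:
        return None  # name/country pair not covered by the lookup
    gender, confidence = result
    return gender if confidence >= min_confidence else None


# Toy entries, made up for illustration (not Gender API output).
lookup: GenderLookup = {
    ("andrea", "IT"): ("male", 0.95),
    ("andrea", "DE"): ("female", 0.97),
    ("kim", "US"): ("female", 0.60),
}
print(infer_gender("Andrea", "it", lookup))  # male
print(infer_gender("Andrea", "DE", lookup))  # female
print(infer_gender("Kim", "US", lookup))     # None (below the cut-off)
```

Returning `None` leaves ambiguous authors unclassified rather than forcing a guess; whether the paper handled low-confidence names this way is not stated in these notes.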
The final sample is about 1.5M papers.
The WoS sample was matched with the GenderMed Database (GenderMedDB) to identify papers with GSA.
GenderMed includes 4,830 studies from 2008–2015 with MeSH terms subordinate to a disease category; 3,394 of these matched our WoS sample. ...
The GenderMed Database limits its scope to selected diseases that field experts have deemed epidemiologically relevant for GSA. Thus, we excluded all studies in our final sample (n=1,542,690) that did not overlap with studies in the GenderMed subsample (n=3,394) with respect to disease-specific MeSH terms. This exclusion resulted in a reduced sample of 1,513,638 unique disease-specific papers, which was used in the logistic regression analyses.
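The exclusion step can be sketched as a set-overlap filter, assuming that "overlap" means sharing at least one disease-specific MeSH term with any paper in the GenderMed subsample. The input format (paper ID mapped to a set of disease MeSH terms) and the function name are hypothetical.

```python
def restrict_to_gendermed_diseases(papers: dict[str, set[str]],
                                   gendermed_ids: set[str]) -> dict[str, set[str]]:
    """Keep papers that share at least one disease MeSH term with the GenderMed subsample.

    papers: paper ID -> set of disease-specific MeSH terms (hypothetical format)
    gendermed_ids: IDs of papers matched to GenderMed
    """
    # Pool every disease term used by at least one GenderMed paper.
    gendermed_terms: set[str] = set()
    for pid in gendermed_ids:
        gendermed_terms |= papers.get(pid, set())
    # A non-empty intersection means the paper overlaps the GenderMed disease scope.
    return {pid: terms for pid, terms in papers.items() if terms & gendermed_terms}


# Toy input: p1 is the only GenderMed paper.
papers = {
    "p1": {"Breast Neoplasms", "Hypertension"},
    "p2": {"Hypertension"},
    "p3": {"Dental Caries"},
    "p4": set(),
}
kept = restrict_to_gendermed_diseases(papers, {"p1"})
print(sorted(kept))  # ['p1', 'p2']
```

Papers with no disease terms at all (like `p4`) drop out automatically, since an empty set never intersects the pool.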
GSA is the binary outcome variable. ++3,394 GenderMed articles have GSA=1 and others have GSA=0?++
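Under the tentative reading above (GenderMed-matched papers coded GSA=1, all others 0), the outcome and a single-predictor logistic regression can be sketched in plain Python. This is an illustration only: `gsa_labels` and `fit_logistic` are hypothetical names, the fit uses simple batch gradient descent on the mean log-loss rather than whatever software the authors used, and the paper's models also include the covariates listed under predictors and covariates.

```python
import math


def gsa_labels(paper_ids: list[str], gendermed_ids: set[str]) -> list[int]:
    """Tentative coding: 1 if the paper is GenderMed-matched, else 0."""
    return [1 if pid in gendermed_ids else 0 for pid in paper_ids]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x: list[float], y: list[int],
                 lr: float = 0.5, steps: int = 5000) -> tuple[float, float]:
    """Fit P(GSA=1) = sigmoid(b0 + b1 * x) by batch gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # log-loss gradient w.r.t. the linear score
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Toy data: six papers with made-up fw values; p3, p5, p6 stand in for GenderMed matches.
paper_ids = ["p1", "p2", "p3", "p4", "p5", "p6"]
fw = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = gsa_labels(paper_ids, {"p3", "p5", "p6"})
b0, b1 = fit_logistic(fw, y)
print(y)       # [0, 0, 1, 0, 1, 1]
print(b1 > 0)  # True: higher fw goes with higher odds of GSA=1 in this toy data
```

The toy labels are deliberately not separable by a single fw threshold, so the maximum-likelihood estimate is finite and gradient descent converges.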
Predictors and covariates
fw: fraction of women authors
fw MeSH: average fw scores for the MeSH disease terms
fw SC: average fw scores for the WoS Subject Categories
fw country: average fw score for the last author's country
f_last MeSH: average f_last scores for the MeSH disease terms (terms whose MeSH tree numbers start with "C", the Diseases category).
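One plausible way to compute fw and a field-level covariate such as fw MeSH or f_last MeSH: score each paper, average that score over all papers tagged with each disease term, then give each paper the mean of its terms' averages. Several details are assumptions, not taken from the paper: authors with unknown gender are dropped from fw, a paper's own score is included in its terms' averages (no leave-one-out), and all function names are hypothetical. `is_disease_term` applies the "C" rule from the notes to a descriptor's tree numbers.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def fraction_women(genders: list[Optional[str]]) -> Optional[float]:
    """fw for one paper: share of authors inferred as women; unknown genders are ignored."""
    known = [g for g in genders if g in ("female", "male")]
    if not known:
        return None
    return sum(g == "female" for g in known) / len(known)


def is_disease_term(tree_numbers: list[str]) -> bool:
    """True if any of a descriptor's MeSH tree numbers lies in category C (Diseases)."""
    return any(t.startswith("C") for t in tree_numbers)


def field_average(paper_scores: dict[str, Optional[float]],
                  paper_terms: dict[str, list[str]]) -> dict[str, float]:
    """Hypothetical fw MeSH / f_last MeSH style covariate.

    1. For each disease term, average the score over all scored papers tagged with it.
    2. For each paper, take the mean of its terms' averages.
    Papers with no term that received an average are left out.
    """
    by_term: dict[str, list[float]] = defaultdict(list)
    for pid, terms in paper_terms.items():
        score = paper_scores.get(pid)
        if score is None:
            continue  # papers without a score do not contribute to term averages
        for term in set(terms):
            by_term[term].append(score)
    term_avg = {term: mean(vals) for term, vals in by_term.items()}
    return {
        pid: mean(term_avg[t] for t in set(terms) if t in term_avg)
        for pid, terms in paper_terms.items()
        if any(t in term_avg for t in terms)
    }


# Toy data, made up for illustration.
fw = {
    "p1": fraction_women(["female", "male"]),          # 0.5
    "p2": fraction_women(["female", "female", None]),  # 1.0 (unknown author dropped)
    "p3": fraction_women(["male"]),                    # 0.0
}
terms = {"p1": ["Hypertension"], "p2": ["Hypertension", "Obesity"], "p3": ["Obesity"]}
print(field_average(fw, terms))  # {'p1': 0.75, 'p2': 0.625, 'p3': 0.5}
```

The same `field_average` call works for f_last MeSH by passing per-paper f_last scores instead of fw.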