Publications by Year: 2016

Sisti JS, Collins LC, Beck AH, Tamimi RM, Rosner BA, Eliassen HA. Reproductive risk factors in relation to molecular subtypes of breast cancer: Results from the nurses' health studies. Int J Cancer 2016;138(10):2346-56.
Several intrinsic breast cancer subtypes, possibly representing unique etiologic processes, have been identified by gene expression profiles. Evidence suggests that associations with reproductive risk factors may vary by breast cancer subtype. In the Nurses' Health Studies, we prospectively examined associations of reproductive factors with breast cancer subtypes defined using immunohistochemical staining of tissue microarrays. Multivariable-adjusted Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs). Over follow-up, we identified 2,063 luminal A, 1,008 luminal B, 209 HER2-enriched, 378 basal-like and 110 unclassified tumors. Many factors appeared to be associated with luminal A tumors, including ages at menarche (p(heterogeneity) = 0.65) and menopause (p(heterogeneity) = 0.05), and current HT use (p(heterogeneity) = 0.33). Increasing parity was not associated with any subtype (p(heterogeneity) = 0.76), though age at first birth was associated with luminal A tumors only (per 1-year increase HR = 1.03, 95% CI (1.02-1.05), p(heterogeneity) = 0.04). Though heterogeneity was not observed, duration of lactation was inversely associated with risk of basal-like tumors only (7+ months vs. never HR = 0.65, 95% CI (0.49-0.87), p(trend) = 0.02; p(heterogeneity) = 0.27). Years between menarche and first birth were strongly positively associated with luminal A and non-luminal subtypes (e.g., 22-year interval vs. nulliparous HR = 1.80, 95% CI (1.08-3.00) for basal-like tumors; p(heterogeneity) = 0.003), and evidence of effect modification by breastfeeding was observed. In summary, many reproductive risk factors for breast cancer appeared most strongly associated with the luminal A subtype. Our results support previous reports that lactation is protective against basal-like tumors, representing a potential modifiable risk factor for this aggressive subtype.
Smirnov P, Safikhani Z, El-Hachem N, Wang D, She A, Olsen C, Freeman M, Selby H, Gendoo DM, Grossmann P, Beck AH, Aerts HJWL, Lupien M, Goldenberg A, Haibe-Kains B. PharmacoGx: an R package for analysis of large pharmacogenomic datasets. Bioinformatics 2016;32(8):1244-6.
Pharmacogenomics holds great promise for the development of biomarkers of drug response and the design of new therapeutic options, which are key challenges in precision medicine. However, such data are scattered and lack standards for efficient access and analysis, consequently preventing the realization of the full potential of pharmacogenomics. To address these issues, we implemented PharmacoGx, an easy-to-use, open source package for integrative analysis of multiple pharmacogenomic datasets. We demonstrate the utility of our package in comparing large drug sensitivity datasets, such as the Genomics of Drug Sensitivity in Cancer and the Cancer Cell Line Encyclopedia. Moreover, we show how to use our package to easily perform Connectivity Map analysis. With increasing availability of drug-related data, our package will open new avenues of research for meta-analysis of pharmacogenomic data. AVAILABILITY AND IMPLEMENTATION: PharmacoGx is implemented in R and can be easily installed on any system. The package is available from CRAN and its source code is available from GitHub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Nabavi S, Schmolze D, Maitituoheti M, Malladi S, Beck AH. EMDomics: a robust and powerful method for the identification of genes differentially expressed between heterogeneous classes. Bioinformatics 2016;32(4):533-41.
MOTIVATION: A major goal of biomedical research is to identify molecular features associated with a biological or clinical class of interest. Differential expression analysis has long been used for this purpose; however, conventional methods perform poorly when applied to data with high within-class heterogeneity. RESULTS: To address this challenge, we developed EMDomics, a new method that uses the Earth mover's distance to measure the overall difference between the distributions of a gene's expression in two classes of samples and uses permutations to obtain q-values for each gene. We applied EMDomics to the challenging problem of identifying genes associated with drug resistance in ovarian cancer. We also used simulated data to evaluate the performance of EMDomics, in terms of sensitivity and specificity for identifying differentially expressed genes in classes with high within-class heterogeneity. In both the simulated and real biological data, EMDomics outperformed competing approaches for the identification of differentially expressed genes, and EMDomics was significantly more powerful than conventional methods for the identification of drug resistance-associated gene sets. EMDomics represents a new approach for the identification of genes differentially expressed between heterogeneous classes and has utility in a wide range of complex biomedical conditions in which sample classes show within-class heterogeneity. AVAILABILITY AND IMPLEMENTATION: The R package is available at
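The two ingredients named in the EMDomics abstract, a per-gene Earth mover's distance between class-wise expression distributions and label permutations to obtain q-values, can be illustrated with a small Python sketch. This is not the EMDomics R implementation: it assumes exact empirical CDFs (the package may bin expression into histograms) and a SAM-style permutation FDR estimate, and all function names here are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def emd_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Earth mover's (Wasserstein-1) distance between two 1-D samples.

    For equally weighted empirical samples this equals the area between
    their empirical CDFs, which is what the loop accumulates.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = sorted(set(a_sorted) | set(b_sorted))
    ia = ib = 0
    area = 0.0
    for left, right in zip(points, points[1:]):
        # Count how many values in each sample are <= left (CDF at `left`).
        while ia < len(a_sorted) and a_sorted[ia] <= left:
            ia += 1
        while ib < len(b_sorted) and b_sorted[ib] <= left:
            ib += 1
        # Both CDFs are constant on [left, right), so add one rectangle.
        area += abs(ia / len(a_sorted) - ib / len(b_sorted)) * (right - left)
    return area


def emd_scores(expr: Dict[str, Sequence[float]], labels: Sequence[int]) -> Dict[str, float]:
    """EMD between class-1 and class-0 samples for every gene."""
    scores = {}
    for gene, values in expr.items():
        class1 = [v for v, lab in zip(values, labels) if lab == 1]
        class0 = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = emd_1d(class1, class0)
    return scores


def permutation_qvalues(
    expr: Dict[str, Sequence[float]],
    labels: Sequence[int],
    n_perm: int = 100,
    seed: int = 0,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Observed EMD scores plus permutation-based q-value estimates (assumed scheme)."""
    rng = random.Random(seed)
    observed = emd_scores(expr, labels)
    # Null distribution: recompute every gene's score after shuffling class labels.
    null_scores: List[float] = []
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_scores.extend(emd_scores(expr, shuffled).values())
    fdr = {}
    for gene, s in observed.items():
        # Expected false calls at threshold s (averaged over permutations)
        # divided by the number of genes actually called at that threshold.
        expected_false = sum(x >= s for x in null_scores) / n_perm
        called = sum(x >= s for x in observed.values())
        fdr[gene] = min(1.0, expected_false / called)
    # q-value: the smallest FDR over all thresholds that still call the gene.
    qvalues, running = {}, 1.0
    for gene in sorted(observed, key=observed.get):
        running = min(running, fdr[gene])
        qvalues[gene] = running
    return observed, qvalues
```

Because the EMD compares whole distributions rather than means, a gene whose expression splits into subgroups within one class can still score highly, which is the within-class heterogeneity the abstract targets.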
Hirko KA, Chen WY, Willett WC, Rosner BA, Hankinson SE, Beck AH, Tamimi RM, Eliassen HA. Alcohol consumption and risk of breast cancer by molecular subtype: Prospective analysis of the nurses' health study after 26 years of follow-up. Int J Cancer 2016;138(5):1094-101.
Alcohol consumption is a consistent risk factor for breast cancer, although it is unclear whether the association varies by breast cancer molecular subtype. We investigated associations between cumulative average alcohol intake and risk of breast cancer by molecular subtype among 105,972 women in the prospective Nurses' Health Study cohort, followed from 1980 to 2006. Breast cancer molecular subtypes were defined according to estrogen receptor (ER), progesterone receptor, human epidermal growth factor receptor 2 (HER2), cytokeratin 5/6, and epidermal growth factor receptor status from immunostained tumor microarrays in combination with histologic grade. Multivariable Cox proportional hazards models were used to estimate hazard ratios (HR) and 95% confidence intervals (CI). Competing risk analyses were used to assess heterogeneity by subtype. We observed suggestive heterogeneity in associations between alcohol and breast cancer by subtype (phet = 0.06). Alcohol consumers had an increased risk of luminal A breast cancers [n = 1,628 cases, per 10 g/day increment HR (95% CI) = 1.10 (1.05-1.15)], and an increased risk that was suggestively stronger for HER2-type breast cancer [n = 160 cases, HR (95% CI) = 1.16 (1.02-1.33)]. We did not observe statistically significant associations between alcohol and risk of luminal B [n = 631 cases, HR (95% CI) = 1.08 (0.99-1.16)], basal-like [n = 254 cases, HR (95% CI) = 0.90 (0.77-1.04)], or unclassified [n = 87 cases, HR (95% CI) = 0.90 (0.71-1.14)] breast cancer. Alcohol consumption was associated with increased risk of luminal A and HER2-type breast cancer, but not significantly associated with other subtypes. Given that ERs are expressed in luminal A but not in HER2-type tumors, our findings suggest that other mechanisms may play a role in the association between alcohol and breast cancer.
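The hazard ratios above are reported per 10 g/day increment. Under the log-linear form of a Cox model, HR for an increment d is exp(beta * d), so an HR and its CI bounds can be re-expressed for another increment by raising them to the power (new increment / old increment). The Python sketch below assumes that linearity holds over the range of interest; the function name is hypothetical, and because published HRs are rounded the rescaled values are approximate.

```python
import math
from typing import Tuple


def rescale_hr(
    hr: float, ci_low: float, ci_high: float, per: float, new_per: float
) -> Tuple[float, float, float]:
    """Re-express a Cox hazard ratio (and its 95% CI) for a new exposure increment.

    Assumes log(hazard) is linear in the exposure, so HR(per) = exp(beta * per)
    and HR(new_per) = HR(per) ** (new_per / per); CI bounds scale the same way
    because they are exp((beta +/- 1.96 * SE) * per).
    """
    if per <= 0 or new_per <= 0:
        raise ValueError("increments must be positive")
    k = new_per / per
    return hr ** k, ci_low ** k, ci_high ** k


# Per-gram log-hazard slope implied by HR = 1.10 per 10 g/day (luminal A estimate).
beta_per_gram = math.log(1.10) / 10
```

For example, `rescale_hr(1.10, 1.05, 1.15, 10, 20)` gives roughly 1.21 (1.10-1.32) per 20 g/day for luminal A tumors, valid only to the extent the log-linear dose-response assumption holds.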
Sangoi AR, Shrestha B, Yang G, Mego O, Beck AH. The Novel Marker GATA3 is Significantly More Sensitive Than Traditional Markers Mammaglobin and GCDFP15 for Identifying Breast Cancer in Surgical and Cytology Specimens of Metastatic and Matched Primary Tumors. Appl Immunohistochem Mol Morphol 2016;24(4):229-37.
Traditional markers mammaglobin and GCDFP15 show good specificity but lack sensitivity and can be difficult to interpret in small tissue samples. We undertook a comparative study of the novel nuclear marker GATA3 (expression typically restricted to breast and urothelial carcinomas) versus GCDFP15 and mammaglobin. We first compared quantitative mRNA expression levels of these 3 markers across a diverse set of over 6000 tumors and 500 normal samples from The Cancer Genome Atlas, which showed dramatically higher GATA3 expression (>10-fold higher) in breast cancer than GCDFP15 or mammaglobin expression (both P<2.2e-16), suggesting that GATA3 may represent a more sensitive marker of breast cancer than GCDFP15 or mammaglobin. We next examined protein expression by immunohistochemistry in 166 cases (including surgical and cytology specimens) of metastatic breast carcinoma and 54 cases with available matched primaries. One whole-slide section from each case was stained for monoclonal GATA3 (L50-823), monoclonal mammaglobin (31A5), and monoclonal GCDFP15 (EP1582Y). Staining intensity (0 to 3+) and extent (0% to 100%) were scored, and an H-score (range, 0 to 300) was calculated. Sensitivities at varying H-score cutoffs for a positive result in metastatic breast carcinoma for GATA3/GCDFP15/mammaglobin, respectively, were as follows: any H-score = 95%/65%/78%, H-score>50 = 93%/37%/47%, H-score>100 = 90%/25%/27%, H-score>150 = 86%/21%/19%, H-score>200 = 73%/18%/9%, H-score>250 = 66%/14%/6%. Significant staining differences by specimen type, tumor subtype/grade, or ER/PR/HER2 status were not identified. Significantly stronger correlation was observed between primary/metastatic GATA3 expression [Pearson's correlation = 0.81 (0.68-0.89)] than between the primary/metastatic expression of GCDFP15 [Pearson's correlation = 0.57 (0.33-0.74)] or mammaglobin [Pearson's correlation = 0.50 (0.24-0.70)] (both P<0.05).
In conclusion, the novel marker GATA3 stains a significantly higher proportion of both primary and metastatic breast carcinomas than GCDFP15 or mammaglobin, with stronger and more diffuse staining, which is helpful in cases with small tissue samples. The matched primary/metastatic expression of GATA3 is also more consistent. We propose that GATA3 be included among a panel of confirmatory markers for metastatic breast carcinoma.
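The abstract combines staining intensity (0 to 3+) and extent (0% to 100%) into an H-score of 0 to 300, then reports sensitivity as the share of metastatic breast carcinomas above each cutoff. It does not spell out the formula, so the Python sketch below assumes the conventional weighted-sum H-score (1 x % weak + 2 x % moderate + 3 x % strong). The function names and case values are hypothetical and are not study data.

```python
from typing import Dict, Sequence


def h_score(percent_by_intensity: Dict[int, float]) -> float:
    """Weighted-sum H-score: sum of intensity (0-3) x % of cells at that intensity.

    Ranges from 0 (no staining) to 300 (100% of cells at 3+).
    """
    total_pct = 0.0
    score = 0.0
    for intensity, pct in percent_by_intensity.items():
        if intensity not in (0, 1, 2, 3) or not 0 <= pct <= 100:
            raise ValueError(f"invalid entry {intensity}: {pct}")
        total_pct += pct
        score += intensity * pct
    # Small tolerance for percentages such as 33.3 + 33.3 + 33.4.
    if total_pct > 100 + 1e-9:
        raise ValueError("percentages sum to more than 100")
    return score


def sensitivity_above(scores: Sequence[float], cutoff: float) -> float:
    """Share of known-positive cases whose H-score is strictly above `cutoff`.

    Every case is a confirmed breast carcinoma, so this fraction is the
    sensitivity; cutoff=0 corresponds to an 'any staining' rule.
    """
    if not scores:
        raise ValueError("no cases")
    return sum(s > cutoff for s in scores) / len(scores)
```

For instance, a tumor with 60% of cells at 3+, 20% at 2+ and 20% unstained scores 220, and four hypothetical cases scoring 0, 40, 120 and 220 give a sensitivity of 0.75 for "any H-score" but only 0.5 at H-score>100, which mirrors how the reported sensitivities fall as the cutoff rises.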