Bioinformatics Analysis to Identify of Hub Genes and Pathways in Her-2 Positive Breast Cancer by Xinshuai Wang* in Open Access Journal of Biogeneric Science and Research
Abstract
Objective: The aim of this study is to identify key genes and pathways as potential prognostic markers of HER-2 positive breast cancer.
Methods: RNA sequence data with HER-2 positive breast cancer and patient clinic traits were obtained from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) were screened by Gene ontology (GO) and KEGG pathway enrichment analysis. Weighted gene co-expression network analysis (WGCNA) and protein-protein interaction (PPI) were performed for the key genes and pathways. The identified hub genes were testified for survival curve using the Kaplan–Meier plotter database.
Results: After removing the outliers from the GSE65212 datasets, 30 HER-2 positive breast cancer samples and 11 normal samples were screened. A total of 628 upregulated and 595 downregulated DEGs were identified in HER-2 positive breast cancer and used to construct the PPI network. 47 modules were constructed by WGCNA, and the genes in the blue module with the threshold (|GS|>0.6) were selected. Nine hub genes were screened out, including AURKA, CCNB1, CCNB2, CENPF, KIF11, KIF20A, MAD2L1, NUSAP and PBK. Among them, only MAD2L1 and PBK were found to be a poor prognostic factor according to survival analysis.
Conclusion: These findings further confirmed the rationality of PI3K/Akt/mTOR inhibitors on the treatment of HER-2 positive breast cancer; and we discovered two novel prognostic markers related to HER-2 positive breast cancer, PBK and MAD2L1, providing new diagnosis and treatment strategies for patients with HER-2 positive breast cancer.
Keywords: HER-2 positive breast cancer, hub genes, key pathways, survival analysis, prognostic markers
Introduction
Breast cancer, as the most commonly diagnosed malignancy, is the leading cause of cancer mortality in women worldwide, according to the Agency for Research on Cancer. It was reported that the incidence rate of breast cancer continued to increase, with an estimated 2088849 new breast cancer cases in 2018 [1]. Different diagnostic and therapeutic methods are required for every subtype, since breast cancer is clinically and histopathologically heterogeneous. Among the subtypes, tumors with positive for human epidermal growth factor receptor 2 (HER2) account for approximately 15% to 20% of breast carcinomas. Due to the higher recurrence and metastasis, the subset of HER-2 positive breast cancer was known to the most aggressive type [2].
With the advent of HER-2-targeted therapy in the late 1990s, the remarkable advances were achieved in the treatment of HER-2-positive breast cancer [3]. Although it has been made a breakthrough on anti-HER2 therapy, because of novo or acquired drug resistance, almost all patients with HER-2-positive breast cancer eventually made progress [4]. The potential resistance mechanisms of anti-HER2 therapy are indeed complex and there is a wide range of mechanisms of resistance may coexist in the same cell. Therefore, exploring novel predictive biomarkers other than HER2 will help us to better select patients and improve their survival.
Numbers of studies have shown that the early stage of HER-2-positive breast cancer was a complicated process, associated with multiple cellular pathways and numerous genes alterations. Additionally, gene expression profiles of various cancers can be obtained from Gene Expression Omnibus (GEO). Based on microarray technology, gene expression analysis is a widely used, high-throughput and powerful research method, which can simultaneously detect expression change of thousands of genes at the mRNA level. In the current study, we aimed to identify the differentially expressed genes and key pathways between cancerous and normal mammary tissues. Bioinformatics analysis including GO term enrichment analysis, KEGG pathway analysis. PPI network analysis and gene co-expression network analysis were performed to discover other potential key genes and pathways in HER-2 positive breast cancer.
Materials and Methods
Acquisition Of Microarray Data And Data Processing
The dataset GSE65212 submitted by Thierry Dubois and based on GPL14877 platform (Affymetrix Human Genome U133 Plus 2.0 Array) was downloaded from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) database. The gene expression profile of GSE65212 included 30 HER-2 positive breast cancer samples and 11 normal samples. According to micro-expression matrix chip data, we deleted missing values, processed log2, standardized data and took the probe with the highest average expression to match the probe ID and gene name. Then, we used the screened gene expression matrix data as the data of this study.
Differentially Expressed Genes (Degs) Screening
Statistical software R (version 3.6.1, https://www.r-project.org/) and packages of Bioconductor (http://www.bioconductor.org/) were applied for the significant analysis of DEGs between cancerous and normal mammary tissues. The “limma” (linear models for microarray data) R package was used to screen the DEGs between normal mammary tissues and HER-2 positive breast cancer tissues. The SAM (significance analysis of microarrays) with FDR (false discovery rate) <0.01and |log2 fold change (FC)|≥2 were chosen as the cut-off criteria.
Go Term and Kegg Pathway Enrichment Analysis
Biological significance of DEGs was explored by GO term enrichment analysis including biological process, cellular component and molecular function. KEGG pathway enrichment analysis of DEGs was performed to explore the critical pathways closely related to HER-2 positive breast cancer initiation and progression. We used the “ggplot2” package and “pathview” package (version 1.24.0), which were based on Bioconductor, to make the GO function enrichment analysis and the KEGG pathway enrichment analysis visualization.
Protein–Protein Interaction (PPI) Network Analysis
PPI network was used identify the important gene modules and key genes which are involved in HER-2 positive breast cancer development from interaction level. The STRING database (https://string-db.org/) is an online database for constructing protein interaction networks and analyzing and exploring predicted protein interaction networks. The screened differentially expressed genes were submitted to STRING database (version11). Then, Cytoscape software (version3.7.1) was used for construction of PPI network. MCODE, a plug in Cytoscape, was used for the visual analysis with "Combined score>0.4 and MCODE> 61" as the threshold.
Weighted Correlation Network Analysis of Degs
Gene co-expression network analysis is a systematic biological method to screen highly correlated gene modules for finding candidate biomarkers and drug targets. In this co-expression network, the nodes represent DEGs and the correlated gene expression patterns were defined as connectivity in genes. In short, the excessive value was missing for the first time by detecting outliers and extracting the expression matrix. The soft threshold power was determined through analysis of the network topology. The co-expression similarity and adjacency of genes were sequentially increased using soft threshold power for calculation. Then, the adjacency was transformed into a topologically overlapping matrix (TOM). Finally, hierarchical clustering was performed to select the modules by using TOM and dynamic tree cutting algorithms, and then Go enrichment analysis was performed for the characteristics of gene modules related toHER-2-positive breast tumor.
Survival prognostic analysis
The prognostic value of critical genes was analyzed by Kaplan-Meier plotter. Overall survival for patients with breast cancer was obtained by the Cancer Genome Atlas (TCGA) in Kaplan Meier - plotter (http://kmplot.com/analysis/) database.
Figure 1A: Volcano map with differential expression.
The red lines represent 628 up-regulated genes and the blue lines represent 595 down-regulated genes.
Figure 1B: Heat maps of differentially expressed genes.
Figure 2: GO enrichment analysis result of DEGs with |logFC|≥2.
Figure 2A: Cellular component.
Figure 2B: Biological process.
Figure 2C: Molecular function.
Figure 2D: Visualization of KEGG pathway enrichment of DEGs in normal and HER-2 positive breast cancer tissues (showing hsa04151 pathway).
Figure 3A: Sample clustering to detect outliers and the trait heatmap to display the sample traits
Figure 3B: Analysis of the network topology for various soft thresholding powers.
The left panel shows the scale-free fit index (y-axis) as a function of the soft-thresholding power (x-axis). The right panel displays the mean connectivity (degree, y-axis) as a function of the soft-thresholding power (x-axis). The power was set as 4 for further analysis.
Figure 4A: Clustering dendrograms for the 17522 genes with dissimilarity based on the topological overlap together with the assigned module colors. 47 co-expression modules were constructed with various colors. The relationship between gene dendrogram and gene modules was up and down of then image.
Figure 4B: Module–trait relationships. Each row corresponds to a module eigengene, each column corresponds to a trait, and each cell consists of the corresponding correlation and P-value, which are color-coded by correlated according to the color legend. Among them, Blue module was the most relevant modules to the HER-2 positive breast cancer clinical subtypes.(“H” means HER-2 positive breast cancer tissues).
Figure 4C: Visualizing 1,000 random genes from the network using a heatmap plot to depict the TOM among the genes in the analysis. Then depth of the red color is positively correlated with the strength of the correlation between the pairs of modules on a linear scale. The gene dendrogram and module assignment are shown along the left side and the top.( TOM, topological overlap matrix).
Figure 4D: The eigengene dendrogram and heatmap identify groups of correlated eigengenes termed meta-modules. The dendrogram indicated that the Blue module was highly correlated with HER-2 positive breast cancer The heatmap in the panel shows the eigengene adjacency.
Results
Identification of Degs Between Her-2 Positive Breast Cancer Samples and Normal Mammary Samples
After removing batch differences and data normalization, a total of 1226 DEGs were obtained, including 628 up-regulated genes and 595 down-regulated genes based on cut-off criteria (P <0.01 and |logFC|≥2). All abnormally expressed genes with the log2 FC score and -log10 P value were used to generate Volcano diagram in R, which is an intuitive tool to show the overall gene expression level of the DEGs (Figure 1A). The gene expression pattern was consistent with Volcano diagram, as presented in the heatmap (Figure 1B).
Degs GO and KEGG Pathway Enrichment Analysis
GO term enrichment analysis results varied from GO classification and expression change of DEGs. For cellular component, the DEGs were significantly enriched in extracellular matrix, collagen−containing extracellular matrix, spindle, condensed chromosome, centromeric region and condensed chromosome kinetochore (Figure 2A). As to biological process, the DEGs were significantly enriched in mitotic nuclear division, extracellular structure organization, mitotic sister chromatid segregation, regulation of mitotic nuclear division and extracellular matrix organization (Figure 2B). About molecular function, the upregulated genes were significantly enriched in heparin binding, extracellular matrix structural constituent, sulfur compound binding, glycosaminoglycan binding and chemokine receptor binding (Figure 2C). These significantly enriched pathways and terms could help us to further understand the role of DEGs in HER-2 positive breast cancer occurrence and progress. In addition, KEGG pathway enrichment analysis was used to examine the function of the DEGs. The results of KEGG analysis showed that the upregulated genes were significantly enriched in pathways including Neuroactive ligand-receptor interaction, Human papillomavirus infection and PI3K-Akt signaling pathway and so on. (Table 1 and Figure 2D).
Construction of Co-Expression Network
After eliminating the batch effect, WGCNA analysis was performed to identify modules containing highly correlated genes. Sample clustering was used to detect outliers (Figure 3A) and set the power to 4, 47 modules were mined (Figure 3B). Gene co-expression network was constructed to detect genes showing similar trends in the same module (Figure 4A). Among these modules, the blue module was the most closely related to HER-2 positive breast cancer (Figure 4B). 1,000 genes were randomly selected for heat map analysis (Figure 4C). Trait gene trees and heat maps analysis revealed that the tree indicating modules were significantly associated with the clinical phenotype of HER-2 positive breast cancer (Figure 4D). Next, we performed intramodular analysis of genes in 47 modules in GS and MM. The resluts of GS and MM showed a very significant correlation, indicating that genes in the blue module were highly correlated with HER-2 positive breast cancer. Then, the genes in these modules with relevant cut-off value ≥0.6 were selected as the central genes.
PPI Network And Cluster Analysis To Identify Key Genes
Using the STRING online database, 1226 DEGs were filtered into the DEGs PPI network complex containing nodes and edges with a combined score>0.4 (medium confidence). The Cytoscape software was employed to analyze the interactive relationship between the candidate protein and then a cluster containing 74 nodes was selected with a cut-off k-score=2 by the MCODE scoring system. The results showed that the hub genes were the intersections between the initial key genes and the modules highly correlated with phenotypes from the WGCNA.
Survival Prognostic Analysis
To evaluate prognosis value of the potential hub genes with HER-2 positive breast cancer, survival prognosis analysis was performed by Kaplan-Meier plotter. Nine genes (AURKA、CCNB1、CCNB2、CENPF、KIF11、KIF20A、MAD2L1、NUSAP and PBK) were correlated with survival prognosis . Following all the analyses, only MAD2L1 and PBK were closely related to the prognosis of patients (P<0.05).
Table1: KEGG pathway enrichment analysis of DEGs in normal and HER-2 positive breast cancer tissues.
Discussion
Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related mortality in women worldwide [5]. HER-2-overexpressing cancer is characterized by a highly aggressive phenotype and associated with metastasis to the lymph nodes and distant organs. HER-2 overexpression is due to the amplification of genes derived from oncogenic signaling in adenocarcinomas including breast cancer [6-8]. At present, anti-HER-2 antibodies, as a molecular target-based therapy, ameliorated the prognosis of HER-2-positive patients.
GO analysis proved that the key genes were enriched in heparin binding, extracellular matrix structural constituent, sulfur compound binding, glycosaminoglycan binding and chemokine receptor binding. According to the KEGG pathway analysis, the differentially expressed genes were enriched in pathways such as neuroactive ligand-receptor interaction, human papillomavirus infection and PI3K-Akt signaling pathway.
The results of KEGG analysis showed that the upregulated genes were significantly enriched in pathways in PI3K-Akt signaling pathway, which is a major signal pathway involved in cellular proliferation, survival, metabolism and motility. This result is consistent with existing researches. PI3K-Akt-mTOR (PAM) pathway is the most frequently altered pathway in human cancers, in which PIK3CA2 and PTEN3 are the most frequently altered oncogenes and tumor suppressor genes respectively. It is estimated that the activation rate of PAM pathway is as frequent as 70% in patients with HER-2-overexpression [9,10]. Resistance to trastuzumab in patients with HER-2-positive breast cancer is associated with higher risk of progression or cancer death, which may be related to the activation of PI3K-AKT-mTOR signaling cascade and the decreased expression level of inhibitor PTEN [11]. Currently, clinical trials of anti-HER2 drugs combined with PI3K/AKT/mTOR inhibitors are undergoing extensive evaluation [4].
Nine hub genes and several pathways were identified by the methods of DEGs and WGCNA. All key genes were selected for survival analysis in the study. However, only MAD2L1 and PBK were significantly related to the prognosis of patients with HER-2 positive breast cancer (P <0.05), which may serve as biomarker for treatment and prognosis. Although there were no significant differences in the survival prognosis analysis of the other seven genes in patients with HER-2 positive breast cancer, it was interesting that there were significant differences among them when we put these key genes in breast cancer patients for survival prognosis analysis, these genes all had statistical differences. We speculated that this may be related to the insufficient sample size of HER-2 positive breast cancer. When the survival prognosis database is expanded, the survival prognosis of the other seven genes may be statistically different. Then we conducted in-depth analysis of the excavated genes from enrichment analysis and found that AURKA, CCNB1, CCNB2, CENPF, MAD2L1 and PBK participated in molecular functions; KIF11 and KIF20A participated in cellular components; NUSAP participated in biological processes.
PBK (PDZ-binding kinase), also known as T-lymphokine-activated killer cell-originated protein kinase (TOPK), is a 322 amino-acid and mitogen-activated serine/threonine protein kinase. PBK/TOPK protein is generally difficult to detect in normal tissues, but frequently elevated in cancer tissues, like breast cancer [12-16]. Furthermore, PBK/TOPK plays vital roles in inflammation, cell apoptosis, and cell-cycle regulation and high expression of PBK/TOPK contributes to tumor growth, proliferation, and metastasis. PBK/TOPK has been repeatedly reported and is correlated with clinical outcomes in various solid cancers [17-22]. In addition, PBK/TOPK acts as an oncogene in multiple cancers and contributes to tumor progression and high expression of PBK/TOPK tends to indicate higher biological malignant aggressiveness [23-26]. Knockout of TOPK could significantly inhibit tumor growth, invasion and metastasis in vitro and in vivo. Conversely, the upregulation of PBK could promote cell growth, metastasis and enhance tumor transformation [24,27,28]. PBK/TOPK also correlates with drug response and tumor resistance, indicating that it is a valid target for anti-neoplastic kinase inhibitors and a potential therapeutic target for various cancers, including breast cancer. Generally, PBK/TOPK, as a candidate molecular marker, exhibits the potential clinical utility for HER-2 positive breast cancer.
Mitotic arrest deficient 2-like 1 (MAD2L1), as a component of spindle checkpoint, plays an essential role in supervising chromosomal segregation during mitosis [29-31]. Dysregulation of MAD2L1 could lead to chromosomal instability and aneuploidy, which might promote formation of human cancers. Overexpression of MAD2L1 has been discovered in several cancers including breast cancer [32,33]. It was reported that the expression level of MAD2L1 in breast cancer was higher than that in normal tissues, and MAD2L1 was associated with malignant progression and poor disease-free survival of breast cancer patients(30). In addition, MAD2L1 expression also associated with HER-2 positive breast cancer. In summary, overexpression of MAD2L1 plays a key role in the development of HER-2 positive breast cancer, suggesting that MAD2L1 can be used as an effective therapeutic and diagnostic marker for HER-2 positive breast cancer. However, this study only verified the overexpression of MAD2L1 in HER-2 positive breast cancer and its clinical significance. It is necessary to further study the molecular basis of the oncogenic impact of MAD2L1 in HER-2 positive breast cancer.
In this study, gene expression profile was obtained from GEO. WGCNA demonstrated that the blue modules displayed the highest correlation with HER-2 positive breast cancer, and nine hub genes including AURKA、CCNB1、CCNB2、CENPF、KIF11、KIF20A、MAD2L1、NUSAP and PBK were identified. Survival analysis revealed that MAD2L1 and PBK in HER-2 positive breast cancer may be a poor prognosis marker. These findings provided the framework for the identification of important gene modules and identified key pathways and driving genes that may be novel therapeutic targets for this disease.
More information regarding this Article visit: OAJBGSR