Breast Cancer Gene-Expression Miner v4.0
(bc-GenExMiner v4.0)



Glossary


[ Published annotated data ][ Published genomic data ][ Data pre-processing ][ Molecular subtype classification ]
[ Statistical analyses ][ Survival statistical tests ][ Gene expression ][ Correlation map ][ Biological validation ]


Published annotated data:

bc-GenExMiner version v4.0

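Clinical annotation columns (availability per study was shown by icons on the original page): Nodal status, ER status, PR status, HER2 status, SBR status, Age at diagnosis, NPI status, AOL status, SSPs status, SCMs status.

# | Reference | No. patients | Footnote | Event status: MR | Event status: AE
1 | Van de Vijver et al., 2002 | 295 | - | 101 | 122
2 | Sotiriou et al., 2003 | 99 | - | 30 | 53
3 | Ma et al., 2004 | 59 | - | - | 27
4 | Minn et al., 2005 | 82 | - | 27 | 27
5 | Pawitan et al., 2005 | 159 | 1 | 40 | 50
6 | Wang et al., 2005 | 286 | - | 107 | 107
7 | Weigelt et al., 2005 | 50 | 2 | 13 | 13
8 | Bild et al., 2006 | 158 | 1 | - | 50
9 | Chin et al., 2006 | 112 | 2 | 21 | 42
10 | Ivshina et al., 2006 | 249 | 2 | - | 89
11 | Chin et al., 2007 | 171 | - | 38 | 56
12 | Desmedt et al., 2007 | 198 | - | 62 | 91
13 | Loi et al., 2007 | 401 | 2 | 101 | 139
14 | Minn et al., 2007 | 58 | - | 11 | 11
15 | Naderi et al., 2007 | 135 | - | - | 65
16 | Zhou et al., 2007 | 54 | - | 9 | 9
17 | Anders et al., 2008 | 75 | - | 14 | 14
18 | Chanrion et al., 2008 | 155 | - | 48 | 57
19 | Loi et al., 2008 | 77 | 2 | 10 | 13
20 | Schmidt et al., 2008 | 200 | 1 | 46 | 46
21 | Calabrò et al., 2009 | 139 | - | - | 96
22 | Desmedt et al., 2009 | 55 | - | - | 55
23 | Jézéquel et al., 2009 | 252 | - | 65 | 68
24 | Zhang et al., 2009 | 136 | - | 20 | 20
25 | Jönsson et al., 2010 | 346 | - | - | 151
26 | Li et al., 2010 | 115 | 2 | 14 | 14
27 | Sircoulomb et al., 2010 | 55 | - | 17 | 17
28 | Buffa et al., 2011 | 216 | - | 82 | 82
29 | Dedeurwaerder et al., 2011 | 85 | - | - | 36
30 | Filipits et al., 2011 | 277 | - | 58 | 58
31 | Hatzis et al., 2011 | 309 | - | 65 | 65
32 | Kao et al., 2011 | 296 | 1 | 63 | 73
33 | Sabatier et al., 2011 | 266 | - | - | 83
34 | Wang et al., 2011 | 149 | - | - | 10
35 | Kuo et al., 2012 | 51 | - | 12 | 12
36 | Nagalla et al., 2013 | 41 | - | 14 | 14
Total | | 5 861 | | 1 088 | 1 935

Totals of the clinical annotation columns (extracted as a single run of digits, column boundaries lost): 2936191526261783332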

1 ER status was determined based on 205225_at Affymetrix probe (HG-U133) or on the median value of Affymetrix probes representing ESR1 (HG-U95A v2) using a 2-component Gaussian mixture distribution model. Lehmann et al. J Clin Invest. 2011 Jul 1;121(7):2750-67
2 NPI score could be computed only for node negative patients

    Legend
 No.: number of
 ER: oestrogen receptor by IHC
 PR: progesterone receptor by IHC
 HER2: HER2 receptor by IHC
 IHC: ImmunoHistoChemistry
 SBR: Scarff Bloom and Richardson grade
 NPI: Nottingham prognostic index
 AOL: Adjuvant! Online
 SSPs: Single Sample Predictors (Sorlie, Hu and PAM50)
 SCMs: Subtype Clustering Models (SCMOD1, SCMOD2, SCMGENE)
 MR: metastatic relapse
 AE: any event (any pejorative event: local relapse, metastatic relapse or death.)
 : available information
 : unavailable information



Published genomic data:

bc-GenExMiner version v4.0

# | Reference | No. patients | Study code | Platform origin | Platform code | DNA chip | No. unique genes (2015) | Processing * | bc-GenExMiner version
1 | Van de Vijver et al., 2002 | 295 | Rosetta2002 | Agilent | - | 25k oligo custom | 15 031 | log2 ratio | 1.0
2 | Sotiriou et al., 2003 | 99 | PNAS1732912100 | NCI | - | 8k cDNA custom | 4 368 | log2 ratio | 1.0
3 | Ma et al., 2004 | 59 | GSE1378 | Arcturus | GPL1223 | 22k oligo custom | 15 558 | log2 ratio | 1.0
4 | Minn et al., 2005 | 82 | GSE2603 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 1.0
5 | Pawitan et al., 2005 | 159 | GSE1456 | Affymetrix | GPL96 - GPL97 | HG-U133A + B | 19 894 | MAS5 and log2 | 1.0
6 | Wang et al., 2005 | 286 | GSE2034 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 1.0
7 | Weigelt et al., 2005 | 50 | GSE2741 | Agilent | GPL1390 | Human 1A oligo UNC custom | 13 927 | log2 ratio | 1.0
8 | Bild et al., 2006 | 158 | GSE3143 | Affymetrix | GPL91 | HG-U95A v2 | 9 076 | MAS5 and log2 | 1.0
9 | Chin et al., 2006 | 112 | E_TABM_158 | Affymetrix | A-AFFY-76 | HG-U133A v2 | 13 226 | MAS5 and log2 | 1.0
10 | Ivshina et al., 2006 | 249 | GSE4922 | Affymetrix | GPL96 - GPL97 | HG-U133A + B | 19 894 | MAS5 and log2 | 1.0
11 | Chin et al., 2007 | 171 | GSE8757 | VUMC Microarray | GPL5737 | Human 30K 60-mer oligo array | 18 363 | log2 ratio | 3.1
12 | Desmedt et al., 2007 | 198 | GSE7390 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 1.0
13 | Loi et al., 2007 | 401 | GSE6532 | Affymetrix | GPL96 - GPL97 - GPL570 | HG U133A + B + P2 | 22 847 | MAS5 and log2 | 1.0
14 | Minn et al., 2007 | 58 | GSE5327 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 1.0
15 | Naderi et al., 2007 | 135 | E_UCON_1 | Agilent | A-AGIL-14 | Human 1A oligo G4110A | 14 268 | log2 ratio | 1.0
16 | Zhou et al., 2007 | 54 | GSE7378 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 3.1
17 | Anders et al., 2008 | 75 | GSE7849 | Affymetrix | GPL91 | HG-U95A v2 | 9 076 | MAS5 and log2 | 1.0
18 | Chanrion et al., 2008 | 155 | GSE9893 | MLRG | GPL5049 | Human 21k v12.0 | 15 184 | MAS5 and log2 | 1.0
19 | Loi et al., 2008 | 77 | GSE9195 | Affymetrix | GPL570 | HG-U133P2 | 22 847 | MAS5 and log2 | 1.0
20 | Schmidt et al., 2008 | 200 | GSE11121 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 1.1
21 | Calabrò et al., 2009 | 139 | GSE10510 | DKFZ | GPL6486 | 35k oligo | 16 536 | log2 ratio | 1.0
22 | Desmedt et al., 2009 | 55 | GSE16391 | Affymetrix | GPL570 | HG-U133P2 | 22 847 | MAS5 and log2 | 3.1
23 | Jézéquel et al., 2009 | 252 | GSE11264 | UMGC-IRCNA | GPL4819 | 9k cDNA custom | 1 814 | log2 ratio | 1.0
24 | Zhang et al., 2009 | 136 | GSE12093 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 1.1
25 | Jönsson et al., 2010 | 346 | GSE22133 | SweGene | GPL5345 | H_v2.1.1 55K | 9 281 | log2 ratio | 3.1
26 | Li et al., 2010 | 115 | GSE19615 | Affymetrix | GPL570 | HG-U133P2 | 22 847 | MAS5 and log2 | 3.1
27 | Sircoulomb et al., 2010 | 55 | GSE17907 | Affymetrix | GPL570 | HG-U133P2 | 22 847 | MAS5 and log2 | 3.1
28 | Buffa et al., 2011 | 216 | GSE22219 | Illumina | GPL6098 | HumanRef-8 v1.0 expr-bc | 15 623 | log2 ratio | 3.1
29 | Dedeurwaerder et al., 2011 | 85 | GSE20711 | Affymetrix | GPL570 | HG-U133P2 | 22 847 | MAS5 and log2 | 3.1
30 | Filipits et al., 2011 | 277 | GSE26971 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 3.1
31 | Hatzis et al., 2011 | 309 | GSE25055 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 3.1
32 | Kao et al., 2011 | 296 | GSE20685 | Affymetrix | GPL570 | HG-U133P2 | 22 847 | MAS5 and log2 | 3.1
33 | Sabatier et al., 2011 | 266 | GSE21653 | Affymetrix | GPL570 | HG-U133P2 | 22 847 | MAS5 and log2 | 3.1
34 | Wang et al., 2011 | 149 | GSE16987 | Illumina | GPL6104 | HumanRef-8 v2.0 expr-bc | 16 976 | log2 ratio | 3.1
35 | Kuo et al., 2012 | 51 | GSE33926 | Agilent | GPL7264 | Human 1A Microarray (V2) G4110B | 16 754 | log2 ratio | 3.1
36 | Nagalla et al., 2013 | 41 | GSE45255 | Affymetrix | GPL96 | HG-U133A | 13 226 | MAS5 and log2 | 3.1
Total | | 5 861

* Data have been converted to a common scale (median equal to 0 and standard deviation equal to 1).



Data pre-processing:


1.1 Affymetrix pre-processing:

Before being log2-transformed, Affymetrix raw CEL data were MAS5.0-normalised using the Affymetrix Expression Console.

1.2 Non-Affymetrix pre-processing:

Data have been downloaded as they were deposited in the public databases. When the patient-to-reference ratio and its log2-transformation had not already been calculated, we performed the complete process.

2 All data:

Finally, in order to merge all studies data and create pooled cohorts, we converted studies data to a common scale (median equal to 0 and standard deviation equal to 1) (a).





a Shabalin et al. Bioinformatics. 2008; 24,1154-1160
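
The common-scale conversion can be sketched as follows. This is a minimal illustration only: it assumes the scaling is applied to one vector of log2 values from one study, and it uses the sample standard deviation; neither detail is specified above, and the function name and data are hypothetical.

```python
import statistics

def to_common_scale(values):
    """Shift values to median 0 and rescale them to standard deviation 1.

    Assumption: the sample standard deviation is used for rescaling.
    """
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return [(v - median) / sd for v in values]

# Hypothetical log2 expression values from a single study.
study_values = [7.1, 8.4, 9.0, 10.2, 12.3]
scaled = to_common_scale(study_values)

print(statistics.median(scaled))  # median of the scaled values
print(statistics.stdev(scaled))   # standard deviation of the scaled values
```

After this step, values from different studies share the same centre and spread, which is what makes pooling them meaningful.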




Molecular subtype classification:



RMSPC (Robust Molecular Subtype Predictors Classification): patients classified in the same molecular subtype with the six molecular subtype predictors (MSP).


Table 1: Molecular subtyping methods

Platform correspondence (all MSPs): Gene symbols; probes median (if multiple probes for a same gene)

Single sample predictors (SSP)
  Molecular subtype predictor (MSP) | No. genes in MSP | Reference
  Sorlie's SSP | 500 | Sorlie et al, 2003
  Hu's SSP | 306 | Hu et al, 2006
  PAM50 SSP | 50 | Parker et al, 2009
  R script reference: Weigelt et al, 2010
  Statistics: Nearest centroid classifier; highest correlation coefficient between patient profile and the 5 centroids
  Subtypes: Basal-like, HER2-E, Luminal A, Luminal B, Normal breast-like

Subtype clustering models (SCM)
  Molecular subtype predictor (MSP) | No. genes in MSP
  SCMOD1 | 726
  SCMOD2 | 663
  SCMGENE | 3
  Reference: Desmedt et al, 2008; Wirapati et al, 2008
  R script reference: subtype.cluster function, R package genefu
  Statistics: Mixture of three Gaussians; use of ESR1, ERBB2 and AURKA modules
  Subtypes: ER-/HER2-, HER2-E, ER+/HER2- low proliferation, ER+/HER2- high proliferation




Table 2: Molecular subtyping of 5 861 breast cancer patients included in bc-GenExMiner v3.1 according to 6 molecular subtype predictors

Cells give the number of patients, with the percentage in parentheses where available.

MSP | Basal-like | HER2-E | Luminal A | Luminal B | Normal breast-like | unclassified
Sorlie's SSP | 795 (13.6) | 606 (10.3) | 1503 (25.6) | 637 (10.9) | 663 (11.3) | 1657 (28.3)
Hu's SSP | 1268 (21.6) | 502 (8.6) | 1339 (22.8) | 989 (16.9) | 808 (13.8) | 955 (16.3)
PAM50 SSP | 1144 (19.5) | 828 (14.1) | 1581 (27) | 1068 (18.2) | 728 (12.4) | 512 (8.7)
RSSPC | 703 | 190 | 761 | 190 | 335 | -

MSP | ER-/HER2- | HER2-E | ER+/HER2- low proliferation | ER+/HER2- high proliferation | unclassified
SCMOD1 | 929 (15.9) | 861 (14.7) | 1653 (28.2) | 1499 (25.6) | 919 (15.7)
SCMOD2 | 996 (17) | 1027 (17.5) | 1588 (27.1) | 1418 (24.2) | 832 (14.2)
SCMGENE | 2038 (34.8) | 911 (15.5) | 1048 (17.9) | 945 (16.1) | 919 (15.7)
RSCMC | 699 | 373 | 656 | 524 | -
RMSPC | 580 | 124 | 324 | 80 | -


    Legend
 MSP: Molecular Subtype Predictor (SSPs + SCMs)
 No: number of patients
 SSP: Single Sample Predictor
 RSSPC: Robust SSP Classification based on patients classified in the same subtype with the three SSPs
 SCM: Subtype Clustering Model
 RSCMC: Robust SCM Classification based on patients classified in the same subtype with the three SCMs
 RMSPC: Robust Molecular Subtype Predictors Classification
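
The robust classifications (RSSPC, RSCMC, RMSPC) keep only patients on whom the predictors agree. The sketch below illustrates that intersection logic. The mapping between SCM and SSP labels is an assumption inferred from the column alignment of Table 2, and all function and variable names are hypothetical.

```python
# Assumed correspondence between SCM and SSP subtype labels
# (inferred from the column alignment of Table 2, not stated explicitly).
SCM_TO_SSP = {
    "ER-/HER2-": "Basal-like",
    "HER2-E": "HER2-E",
    "ER+/HER2- low proliferation": "Luminal A",
    "ER+/HER2- high proliferation": "Luminal B",
}

def concordant(calls):
    """Return the shared subtype if every call is identical, else None."""
    unique = set(calls)
    if len(unique) == 1 and None not in unique:
        return calls[0]
    return None

def rmspc(ssp_calls, scm_calls):
    """Robust call across the three SSPs and the three SCMs."""
    # Labels without an SSP counterpart (e.g. "unclassified") map to None.
    mapped = [SCM_TO_SSP.get(call) for call in scm_calls]
    return concordant(list(ssp_calls) + mapped)

agreeing = rmspc(["Luminal A"] * 3, ["ER+/HER2- low proliferation"] * 3)
discordant = rmspc(["Luminal A", "Luminal B", "Luminal A"],
                   ["ER+/HER2- low proliferation"] * 3)
print(agreeing, discordant)
```

Calling `concordant` on the three SSP calls alone, or on the three SCM calls alone, gives the RSSPC and RSCMC analogues.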





Statistical analyses:


Several types of analyses are available: prognostic analyses, correlation analyses and expression analyses, all of which have different subtypes.

EXPRESSION ANALYSES

Targeted expression analysis:

Once the analysis criteria have been chosen (gene(s) to be tested, clinical criterion (criteria) to test the gene against), the distribution of the gene in the available population (all cohorts with availability of required information pooled together) according to the clinical criterion (criteria) is illustrated by box and whiskers plots. To assess the significance of the difference in gene distributions in between the different groups, a Welch's test is performed, as well as Dunnett-Tukey-Kramer's tests when appropriate.

Exhaustive expression analysis:

Box and whiskers plots are displayed, along with Welch's (and Dunnett-Tukey-Kramer's) tests, for every available clinical criterion for a single gene.

Customised expression analysis:

Similarly to targeted analysis, distribution of a chosen gene is compared in between groups, but here, the groups are defined based on another gene: the population (all cohorts with both gene values available pooled together) is split according to the median of the latter gene, resulting in 2 groups.
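
The median split and the two-group comparison can be sketched as follows. This is an illustration under stated assumptions: patients whose value equals the median are placed in the "low" group (tie handling is not specified above), only the two-group Welch statistic and its degrees of freedom are computed, and all names and data are hypothetical.

```python
import statistics

def median_split(target, splitter):
    """Split target-gene values by the median of the splitting gene."""
    cutoff = statistics.median(splitter)
    low = [t for t, s in zip(target, splitter) if s <= cutoff]  # ties go low (assumption)
    high = [t for t, s in zip(target, splitter) if s > cutoff]
    return low, high

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical scaled expression values for six patients.
gene_of_interest = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
splitting_gene = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]

low_group, high_group = median_split(gene_of_interest, splitting_gene)
t_stat, dof = welch_t(low_group, high_group)
print(low_group, high_group, t_stat, dof)
```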



PROGNOSTIC ANALYSES

Targeted prognostic analysis:

Once the analysis criteria have been chosen (gene(s) to be tested, nodal and oestrogen receptor status of the cohorts to be explored and event), several statistical tests are conducted on each cohort and on all cohorts pooled.
The prognostic impact of each gene is evaluated by means of univariate Cox proportional hazards model. Results are displayed by cohorts (including pool) and are illustrated in a forest plot.
Kaplan-Meier curves are then computed on the pool with the gene values dichotomised according to the gene median (calculated from the pool). Cox results corresponding to dichotomised values are displayed on the curve. In order to minimize unreliability at the end of the curve, the 15% of patients with the longest follow-up are not plotted (a).
To evaluate the independent prognostic impact of gene(s) relative to the well-established clinical markers NPI (b) and AOL (c) (10-year overall survival) and to a proliferation score (d), adjusted Cox proportional hazards models are performed on the pool's patients with available data.

Exhaustive prognostic analysis:

A univariate Cox proportional hazards model is fitted on each of the 18 possible pools corresponding to every combination of population (nodal and oestrogen receptor status) and event criteria (metastatic relapse [MR], any event [AE]) to assess the prognostic impact of a single gene. Results are displayed by population and event criteria and are ordered by p-value (smallest to largest).
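
The count of 18 pools is consistent with three levels for nodal status, three for oestrogen receptor status and two event criteria (3 x 3 x 2). The enumeration below is a sketch; the level labels are assumptions chosen to match that count, not the tool's own identifiers.

```python
from itertools import product

# Assumed levels: positive, negative, or no restriction on the status.
NODAL_STATUS = ("N+", "N-", "all")
ER_STATUS = ("ER+", "ER-", "all")
EVENTS = ("MR", "AE")  # metastatic relapse, any event

pools = list(product(NODAL_STATUS, ER_STATUS, EVENTS))

def order_by_p_value(results):
    """Order (pool, p_value) pairs from smallest to largest p-value."""
    return sorted(results, key=lambda item: item[1])

print(len(pools))
for nodal, er, event in pools[:3]:
    print(nodal, er, event)
```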

Molecular subtype prognostic analysis:

Patients are pooled according to their molecular subtypes, based on three single sample predictors (SSPs) and three subtype clustering models (SCMs), and on three supplementary robust molecular subtype classifications consisting of the intersections of the 3 SSPs and/or of the 3 SCMs classifications: only patients with concordant molecular subtype assignment for the 3 SSPs (RSSPC), for the 3 SCMs (RSCMC), or for all predictors (RMSPC), are kept. Univariate Cox proportional hazards analysis is performed for the chosen gene in each of the molecular subtype populations. Kaplan-Meier curves are also computed.

Basal-like/TNBC prognostic analysis:

Univariate Cox proportional hazards analyses are performed, for the chosen gene, on Basal-like (BL) patients (as defined by PAM50), on Triple-Negative breast cancer (TNBC) patients (as defined by immunohistochemistry [IHC]) and on patients both BL and TNBC. Kaplan-Meier curves are also computed.


CORRELATION ANALYSES

Gene correlation targeted analysis:

Pearson's correlation coefficient is computed with associated p-value for each pair of genes based on ten different populations: all patients pooled together, patients with positive oestrogen receptor status, patients with negative oestrogen receptor status, Basal-like patients, HER2-E patients, Luminal A patients and Luminal B patients (the last 4 subgroups being determined by the RMSPC), Basal-like (PAM50) patients, Triple-Negative (IHC) patients and the intersection of the 2 latter populations.
Results are displayed in a correlation map, where each cell corresponds to a pairwise correlation and is coloured according to the correlation coefficient value, from dark blue (coefficient = -1) to dark red (coefficient = 1).
Pearson's pairwise correlation plots are also computed to illustrate each pairwise correlation.

Gene correlation exhaustive analysis:

Pearson's correlation coefficient is computed, with associated p-value, between the chosen gene and all other genes that are present in the database, based on different populations: all patients pooled together, Basal-like patients, HER2-E patients, Luminal A patients and Luminal B patients, the last 4 subgroups being determined by the RMSPC.
Genes with correlation above 0.40 in absolute value and with associated p-value less than 0.05 are retained and the genes with best correlation coefficients are displayed in two different tables: one for the first 50 (or less) positive correlations, one for the first 50 (or less) negative ones.
The lists with all genes fulfilling criteria of correlation coefficient above 0.40 in absolute value and associated p-value less than 0.05 can be downloaded from the results page.
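
The selection rule described above (absolute correlation above 0.40, p-value below 0.05, at most 50 genes per table) can be sketched as follows. Gene names, values and the function name are hypothetical.

```python
def select_correlated(results, r_min=0.40, alpha=0.05, top=50):
    """Filter (gene, r, p) tuples and return the top positive and negative lists."""
    kept = [(g, r, p) for g, r, p in results if abs(r) > r_min and p < alpha]
    # Strongest positive correlations first.
    positive = sorted((x for x in kept if x[1] > 0), key=lambda x: -x[1])[:top]
    # Strongest negative correlations first.
    negative = sorted((x for x in kept if x[1] < 0), key=lambda x: x[1])[:top]
    return positive, negative

# Hypothetical correlation results against a chosen gene.
example_results = [
    ("GENE_A", 0.82, 1e-9),
    ("GENE_B", -0.55, 1e-4),
    ("GENE_C", 0.35, 1e-3),   # rejected: |r| not above 0.40
    ("GENE_D", 0.45, 0.20),   # rejected: p-value not below 0.05
    ("GENE_E", 0.61, 1e-5),
]
top_positive, top_negative = select_correlated(example_results)
print([g for g, _, _ in top_positive], [g for g, _, _ in top_negative])
```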

Gene Ontology analysis:

As a complement to this "screening" analysis, an analysis is performed to find Gene Ontology enrichment terms. This analysis focuses on significantly under- or over-represented terms present in the list of genes most positively correlated with the chosen gene, including itself, in the list of genes most negatively correlated with the chosen gene and in the union of these two lists.
For each term of each of the Gene Ontology trees (biological process, molecular function and cellular component), comparison is done between the number of occurrences of this term in the "target list", i.e. the number of times this term is directly linked to a gene, and the number of occurrences of this term in the "gene universe" (all of the genes that are expressed in the database) by means of Fisher's exact test. Terms with associated p-values less than 0.01 are kept.
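
For a single Gene Ontology term, the comparison amounts to a hypergeometric tail probability, which is what Fisher's exact test computes for a 2x2 table. The sketch below returns both one-sided p-values (over- and under-representation). The counts are made up, and how the tool combines or reports the two tails is not stated above.

```python
from math import comb

def hypergeom_pmf(k, universe, annotated, drawn):
    """P(exactly k annotated genes among `drawn` genes taken from the universe)."""
    return comb(annotated, k) * comb(universe - annotated, drawn - k) / comb(universe, drawn)

def enrichment_p_values(k, universe, annotated, drawn):
    """One-sided Fisher exact p-values for over- and under-representation."""
    lowest = max(0, drawn - (universe - annotated))
    highest = min(annotated, drawn)
    over = sum(hypergeom_pmf(i, universe, annotated, drawn) for i in range(k, highest + 1))
    under = sum(hypergeom_pmf(i, universe, annotated, drawn) for i in range(lowest, k + 1))
    return over, under

# Hypothetical counts: 20 genes in the universe, 5 annotated with the term,
# a target list of 5 genes, all 5 of which carry the term.
p_over, p_under = enrichment_p_values(k=5, universe=20, annotated=5, drawn=5)
print(p_over, p_under)
```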

Gene correlation analysis by chromosomal location:

Pearson's correlation coefficient is computed, with associated p-value, between the chosen gene and genes located around the chosen gene (up to 15 up and 15 down) on the same chromosome, based on seven different populations: all patients pooled together, patients with positive oestrogen receptor status, patients with negative oestrogen receptor status, Basal-like patients, HER2-E patients, Luminal A patients and Luminal B patients, the last 4 subgroups being determined by the RMSPC.
Detailed results are displayed in a table for each population. Pearson's pairwise correlation plots are also performed to illustrate correlation of each gene with the chosen one.

Targeted correlation analysis (TCA):

As a complement, results of gene correlation analysis for genes selected via the "TCA" column can be displayed.
Targeted correlation analysis ("TCA" button), which aims at evaluating the robustness of clusters, is proposed: correlation analyses are automatically computed between all possible pairs of genes that compose a selected cluster.



a Pocock et al. Lancet. 2002; 359(9318):1686-9
b Galea et al. Breast Cancer Res Treat. 1982; 45(3):361-6.
c Adjuvant! Online
d Dexter et al. BMC Syst Biol. 2010; 4:127.




Nota bene:
  • When working with gene symbols and in case of multiple probesets for the same gene, probeset values median is taken as unique value for the gene.
  • Cox models performed on pool(s) are stratified by cohort.
  • The gene median used as a cutoff to dichotomise gene expression values and compute Kaplan-Meier curves on the pool is an arbitrary value and may not be (and in most cases is not) the best cutoff for the specific gene. Hence, a gene that is significant when considering continuous values might not remain significant after dichotomisation.
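
The first point of the list above (one value per gene, taken as the median of its probesets) can be sketched like this. The probeset identifiers and the mapping are placeholders, not real annotations.

```python
import statistics
from collections import defaultdict

def collapse_probesets(probe_values, probe_to_gene):
    """Return one value per gene: the median of its probeset values."""
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        by_gene[probe_to_gene[probe]].append(value)
    return {gene: statistics.median(vals) for gene, vals in by_gene.items()}

# Placeholder probeset IDs and gene symbols for one patient.
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A",
                 "probe_3": "GENE_A", "probe_4": "GENE_B"}
probe_values = {"probe_1": 0.2, "probe_2": 1.4, "probe_3": 0.9, "probe_4": -0.3}

gene_values = collapse_probesets(probe_values, probe_to_gene)
print(gene_values)
```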





Statistical tests:


  Survival statistical tests
Cox model

  - Aim of the Cox model:
Cox model is a regression model to express the relation between a covariate, either continuous (e.g. G gene) or ordered discrete (e.g. SBR grade), and the risk of occurrence of a certain event (e.g. metastatic relapse).
Its simplified formula for G gene can be written as follows:
h(t,g) = h0(t)*exp(β.g), where h is the hazard function of the event occurrence at time t, dependent on the value g of G, and h0(t) is the positive baseline hazard function, shared by all patients.
β is the regression coefficient associated with G, the parameter one wants to evaluate.

  - Interpretation of Cox model results:
There are two particularly interesting results when building a Cox model: the p-value associated with β, which tells us whether the covariate (e.g. gene) has a significant impact on the event-free survival (if the p-value is less than a certain threshold, usually 5%), and the hazard ratio (HR) (equal to exp(β)), sometimes summed up by its direction (sign of β).


The HR, which is mainly of interest when the p-value is significant, is a ratio of the risk of event occurrence between patients according to their respective measurements for the gene under study. More specifically, the HR is the factor by which the risk of occurrence of the event is multiplied when the risk factor increases by one unit: h(t,g+1) = h(t,g)*exp(β).
The direction of this HR therefore indicates how the gene will generally affect the patients' event-free survival.
For example, saying that the parameter β associated with the gene G under study is negative (thus exp(β) < 1) means that the greater the value of G, the lower the risk of event: if A and B are two patients such that A's G value gA is greater than B's G value gB, then one can say that patient A has a lower risk of metastatic relapse than patient B:
    gA > gB, β < 0
 ⇒ β.gA < β.gB
 ⇒ exp(β.gA) < exp(β.gB)
 ⇒ h0(t)*exp(β.gA) < h0(t)*exp(β.gB), that is, h(t, gA) < h(t, gB).
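
A small numeric check of this interpretation follows. The coefficient, baseline hazard and gene values are arbitrary illustrative numbers, not estimates from the database.

```python
import math

def hazard(baseline, beta, g):
    """Cox hazard h(t, g) = h0(t) * exp(beta * g) at a fixed time t."""
    return baseline * math.exp(beta * g)

beta = -0.5        # hypothetical negative regression coefficient
baseline = 0.02    # hypothetical value of h0(t) at some time t
hazard_ratio = math.exp(beta)

g_a, g_b = 3.0, 2.0  # patient A has the higher gene value
h_a = hazard(baseline, beta, g_a)
h_b = hazard(baseline, beta, g_b)

# A one-unit increase in g multiplies the hazard by exp(beta).
print(hazard_ratio, h_a / h_b)
```

Because the baseline hazard cancels in the ratio, the comparison between the two patients does not depend on the time t or on the chosen value of h0(t).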



Kaplan-Meier curves

  - The Kaplan-Meier estimator:
Kaplan-Meier method, also known as the product-limit method, is a non-parametric method to estimate the survival function S(t) (= Pr(T > t): probability of having a survival time T longer than time t) of a given population. It is based on the idea that being alive at time t means being alive just before t and staying alive at t.
Suppose we have a population of n patients, among whom k patients have experienced an event (metastatic relapse or death for instance) at distinct times t1 < t2 < ... < tm (m=k if all events occurred at different times). For each time ti, let ni denote the number of patients still at risk just before ti, that is, patients who have not yet experienced the event and are not censored, and let ei denote the number of events that occurred at ti. The event-free survival probability at time ti, S(ti), is then the probability S(ti-1) of not experiencing the event before time ti (at time ti-1) multiplied by the probability (ni-ei)/ni of not experiencing the event at time ti (which by definition of ti corresponds to the probability of not experiencing the event during the interval between ti-1 and ti): S(ti) = S(ti-1) x (ni-ei)/ni.
The Kaplan-Meier estimator of the survival function S(t) is thus the cumulative product:

S(t) = ∏ (ni-ei)/ni, the product being taken over all event times ti ≤ t.
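
The product-limit recursion S(ti) = S(ti-1) x (ni-ei)/ni can be sketched in a few lines. The follow-up times and event indicators below are invented for illustration, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S(t_i))] at each distinct event time.

    times:  follow-up time of each patient
    events: 1 if the event occurred at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= (at_risk - n_events) / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve

# Invented example: six patients, two of them censored (event = 0).
follow_up = [2, 3, 3, 5, 8, 9]
event_flag = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(follow_up, event_flag)
print(km_curve)
```

Censored patients contribute to the number at risk up to their censoring time but never trigger a step, which is why the estimate becomes less stable as the risk set shrinks.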

  - The curve:
The Kaplan-Meier survival curve, i.e. the plot of the survival function, shows how the estimated survival function evolves over time. The curve is shaped like a staircase, with a step corresponding to events at the end of each [ti-1, ti) interval.

The illustration of the Kaplan-Meier survival estimator by the Kaplan-Meier survival curve becomes especially interesting when there are different groups of patients (e.g. according to different treatments or different values of biological markers) and one wants to compare their relative event-free survival. The different survival curves are then plotted together and can be visually compared.

  - Reliability of the estimation:
Caution must be taken when interpreting the survival curve, especially at its end: censored patients induce a loss of information and reduce the sample size, making the survival curve less reliable, and the end of the curve is particularly affected. For our analyses, in order to minimize unreliability at the end of the curve, the 15% of patients with the longest event-free survival or follow-up are not plotted (a).

Forest plot

A forest plot is a graphical means to view results, i.e. a score (odds or hazard ratio) and a confidence interval (CI), of the same analysis applied to different populations (studies). In particular it permits, via Cox HRs, to survey the impact of a gene on survival in different cohorts all at once, and thus to get a better (visual) idea of how the results vary between studies.
A forest plot is organized as follows: for each study, the score (eg. HR) is represented by a square centred on the value of the score (HR) and whose size depends on the precision of the score estimation (the more precise the estimation, the bigger the square). A horizontal line passing through the square represents the (usually 95%) CI. At the bottom of the forest plot are represented the score (HR) and CI obtained by the pool (i.e. all cohorts pooled) in the shape of a diamond with the centre representing the score (HR) and the right and left ends representing the CI limits. Finally, a vertical line representing a no effect score (HR=1) is drawn.

a Pocock et al. Lancet. 2002; 359(9318):1686-9




  Gene expression correlations
Pearson correlation

  - The coefficient:
Pearson correlation coefficient, also known as the Pearson's product moment correlation coefficient and denoted by r, measures the linear dependence (correlation) between two variables (e.g. genes).
It is obtained by the formula r = cov(G1,G2) / (std(G1)*std(G2)), where cov(G1,G2) is the covariance between the variables G1 and G2 and std denotes the standard deviation of each variable.
r values can vary from -1 to 1. A negative r means that when the first variable increases, the second one decreases; a positive r means that both variables increase or decrease simultaneously. The greater the r in absolute value, the stronger the linear dependence between the two variables, with the extreme values of -1 or 1 meaning a perfect linear dependence between the two variables, in which case, if the two variables are plotted, all data points lie on a line.
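
The formula r = cov(G1,G2) / (std(G1)*std(G2)) translates directly into code. The sketch below uses the sample covariance and sample standard deviation (the n-1 factors cancel, so the choice does not affect r); the data are invented.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

gene_1 = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(gene_1, [2.0, 4.0, 6.0, 8.0])    # perfectly increasing line
r_down = pearson_r(gene_1, [8.0, 6.0, 4.0, 2.0])  # perfectly decreasing line
r_noisy = pearson_r(gene_1, [1.0, 3.0, 2.0, 4.0])
print(r_up, r_down, r_noisy)
```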

  - The associated p-value:
Along with the Pearson correlation coefficient, one can test whether this coefficient is different from 0, knowing that the statistic
t = r*√(n-2)/√(1-r²) follows a Student distribution with (n-2) degrees of freedom, n being the number of values.
The p-value associated with the Pearson correlation coefficient thus indicates whether a linear dependence exists between the two variables.
Note that one has to be careful when interpreting the p-value associated with a Pearson correlation coefficient: a significant p-value means that a linear dependence exists between two variables but does not mean that this linear dependence is strong; for example, a coefficient of 0.05 with 1600 data points is associated with a significant p-value (p = 0.046), but one can certainly not conclude that there is a strong linear dependence between the two variables.
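
The example with r = 0.05 and n = 1600 can be checked numerically. The sketch below computes the t statistic exactly and then approximates the two-sided p-value with the standard normal distribution, which is an assumption made here to stay within the Python standard library; with 1598 degrees of freedom it is very close to the Student distribution.

```python
import math

def correlation_t(r, n):
    """t statistic for testing whether a Pearson correlation differs from 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t_value = correlation_t(0.05, 1600)

# Normal approximation to the two-sided p-value (assumption: n is large).
p_approx = math.erfc(abs(t_value) / math.sqrt(2))

print(round(t_value, 3), round(p_approx, 3))
```

The approximate p-value falls just under 0.05, in line with the figure quoted above, while r itself stays close to 0.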



Correlation map

A correlation map illustrates pairwise correlations among a given group of genes.
A correlation map is a square table where each line and each column represent a gene. Each cell represents an "interaction" between two genes and is coloured according to the value of the Pearson correlation coefficient between these two genes, from dark blue (coefficient = -1) to dark red (coefficient = 1).
Cells on the diagonal of the correlation map represent the "interaction" of a gene with itself and are coloured in black.

Pairwise correlation plot

On a correlation plot, the least-squares regression line is plotted along with the data points to illustrate the correlation between two given genes.




  Gene expression analyses
Box and whiskers plots

Box and whiskers plots graphically represent descriptive statistics of a continuous variable (e.g. gene): the box goes from the lower quartile (Q1) to the upper quartile (Q3), with a horizontal line marking the median. Below and above the box, whiskers extend from Q1 and Q3, respectively, to a distance of 1.5 times the interquartile range, that is, to Q1-1.5*(Q3-Q1) and Q3+1.5*(Q3-Q1). Finally, stars indicate outliers, if there are any, that is, patients with values beyond the ends of the whiskers.
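
The quantities drawn on such a plot can be computed as follows. The quartile convention is an assumption (Python's default "exclusive" method); the plotting software behind the tool may use a different one, and the data are invented.

```python
import statistics

def box_stats(values):
    """Quartiles, whisker limits at 1.5 * IQR, and the resulting outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3,
            "lower": lower, "upper": upper, "outliers": outliers}

# Invented expression values with one extreme patient.
expression = [1, 2, 3, 4, 5, 6, 7, 50]
stats = box_stats(expression)
print(stats)
```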



Box and whiskers plots make it possible to visually compare the distributions of a gene among the different population groups. When there is more than one group, Welch's test is used to evaluate the difference in the gene's expression between the groups. Moreover, when there are at least three different groups and Welch's p-value is significant (indicating that the gene's expression differs between at least two subpopulations), Dunnett-Tukey-Kramer's test is used for two-by-two comparisons (this test gives the significance level but not a precise p-value).



Biological validation:


The complexity of bioinformatics processing may distort genomic data and, downstream, statistics applied to these data and meta-data may lead to erroneous results. That is why biological validation of our tool is needed.
Since 2010, we have been screening breast cancer markers (RNA and protein) referenced in PubMed (keywords: breast cancer marker/biomarker). The significance of these genes is then tested in our tool.
These tests showed that bc-GenExMiner captures the biological meaning contained in annotated genomic data and preserves it from bioinformatics biases, even when data are merged into new cohorts, and that its results are pertinent. The following tables list the recently published candidate breast cancer markers for which the tool's conclusions about significance agree with the publication.

1) Prognostic module validation

1-1) Exhaustive
Table 3: Tested genes for biological validation of prognostic module
# | Gene symbol | Authors | Year | Journal
1 | APOBEC3B | Sieuwerts AM et al. | 2014 | Horm Cancer
2 | ARHGDIB | Moon HG et al. | 2010 | Cancer Res Treat
3 | AURKA | Siggelkow W et al. | 2012 | BMC Cancer
4 | AZGP1 | Parris TZ et al. | 2013 | Int J Cancer
5 | BECN1 | He Y et al. | 2014 | Tumour Biol
6 | BIRC5 (Survivin) | Xu C et al. | 2012 | Breast Cancer
7 | BTG2 | Möllerström E et al. | 2010 | BMC Cancer
8 | C2orf40 | Lu J et al. | 2013 | Epigenetics
9 | CA9 | Lancashire LJ et al. | 2010 | Breast Cancer Res Treat
10 | CA9 (CAIX) | Pinheiro C et al. | 2011 | Histol Histopathol
11 | CADM4 | Jang SM et al. | 2013 | J Clin Pathol
12 | CARM1 | Habashy HO et al. | 2013 | Breast Cancer Res Treat
13 | CARM1 | Cheng H et al. | 2013 | Diagn Pathol
14 | CCNB1 | Niméus-Malmström E et al. | 2010 | Int J Cancer
15 | CCNB1 | Ding K et al. | 2014 | Med Hypotheses
16 | CCNB2 | Shubbar E et al. | 2013 | BMC Cancer
17 | CCNE1 | Lundgren C et al. | 2014 | Acta Oncol
18 | CDO1 | Jeschke J et al. | 2013 | Clin Cancer Res
19 | CENPA | McGovern S et al. | 2012 | Breast Cancer Res
20 | CKAP2 | Kim HS et al. | 2014 | PLoS One
21 | CRYAB | Koletsa T et al. | 2014 | BMC Clin Pathol
22 | CX3CL1 | Park MH et al. | 2012 | J Surg Oncol
23 | CXCL12 | Lv ZD et al. | 2014 | Int J Clin Exp Pathol
24 | CXXC5 | Knappskog S et al. | 2011 | Ann Oncol
25 | DEK | Liu S et al. | 2012 | Pathol Int
26 | DUSP1 (MKP1) | Hou MF et al. | 2012 | World J Surg
27 | ELAVL1 | Wang J et al. | 2013 | Breast Cancer Res Treat
28 | ERCC1 | Gerhard R et al. | 2013 | Pathol Res Pract
29 | FEN1 | Abdel-Fatah TMA et al. | 2014 | Mol Oncol
30 | FGF19 | Buhmeida A et al. | 2013 | Tumour Biol
31 | FLOT2 | Wang X et al. | 2013 | J Transl Med
32 | GATA3 | Yoon NK et al. | 2010 | Hum Pathol
33 | GBP2 | Godoy P et al. | 2012 | Breast Cancer
34 | GGH | Shubbar E et al. | 2013 | BMC Cancer
35 | GSDMB | Hergueta-Redondo M et al. | 2014 | PLoS One
36 | HJURP | Hu Z et al. | 2010 | Breast Cancer Res
37 | HOTAIR | Sørensen KP et al. | 2013 | Breast Cancer Res Treat
38 | IGKC | Chen Z et al. | 2012 | PLoS One
39 | IGKC | Schmidt M et al. | 2012 | Clin Cancer Res
40 | IL8 | Milovanovic J et al. | 2013 | J BUON
41 | JMJD6 | Lee YF et al. | 2012 | Breast Cancer Res
42 | KIAA1199 | Jami MS et al. | 2014 | BMC Cancer
43 | KIF2A | Wang J et al. | 2014 | BMC Cancer
44 | LAPTM4B | Xiao M et al. | 2013 | J Cancer Res Clin Oncol
45 | LOXL2 | Ahn SG et al. | 2013 | Breast Cancer Res Treat
46 | LRIG1 | Krig SR et al. | 2011 | Mol Cancer Res
47 | MAGEA3 | Balafoutas D et al. | 2013 | BMC Cancer
48 | MAP1LC3B (LC3B) | He Y et al. | 2014 | Tumour Biol
49 | MAPT | Baquero MT et al. | 2012 | Cancer
50 | MMP1 | Boström P et al. | 2011 | BMC Cancer
51 | MMP9 | Merdad A et al. | 2014 | Anticancer Res
52 | MMP9 | Yousef EM et al. | 2014 | BMC Cancer
53 | MTUS1 (ATIP3) | Molina A et al. | 2013 | Cancer Res
54 | MYCBP | Presti M et al. | 2010 | PLoS One
55 | NCOA3 (AIB1) | Burandt E et al. | 2013 | Breast Cancer Res Treat
56 | NEFL | Li XQ et al. | 2012 | PLoS One
57 | NEFL | Kang S et al. | 2013 | Int J Oncol
58 | NPAS2 | Yi C et al. | 2010 | Breast Cancer Res Treat
59 | NQO1 | Yang Y et al. | 2014 | J Exp Clin Cancer Res
60 | P4HA1 | Gilkes D et al. | 2013 | Cancer Res
61 | P4HA2 | Gilkes D et al. | 2013 | Cancer Res
62 | P4HA2 | Xiong G et al. | 2014 | BMC Cancer
63 | PARP1 | Rojo F et al. | 2012 | Ann Oncol
64 | PIP | Parris TZ et al. | 2013 | Int J Cancer
65 | PIP (GCDFP15) | Darb-Esfahani S et al. | 2014 | BMC Cancer
66 | POLQ | Lemée F et al. | 2010 | Proc Natl Acad Sci U S A
67 | PTK6 | Regan Anderson TM et al. | 2013 | Cancer Res
68 | RACGAP1 | Pliarchopoulou K et al. | 2013 | Cancer Chemother Pharmacol
69 | RRM2 | Putluri N et al. | 2014 | Neoplasia
70 | S100A8 | Parris TZ et al. | 2013 | Int J Cancer
71 | SDC1 | Nguyen TL et al. | 2013 | Am J Clin Pathol
72 | SHFM1 | Rezano A et al. | 2013 | BMC Cancer
73 | SIX1 | Jin H et al. | 2014 | Exp Mol Pathol
74 | SKP2 | Liu J et al. | 2012 | PLoS One
75 | SLC2A1 (GLUT1) | Pinheiro C et al. | 2011 | Histol Histopathol
76 | SLC35B2 | Chim-Ong A et al. | 2014 | Asian Pac J Cancer Prev
77 | SLC9A3R1 (NHERF1) | Malfettone A et al. | 2012 | BMC Cancer
78 | SMAD4 | Liu NN et al. | 2013 | Tumour Biol
79 | SMARCA4 (BRG1) | Bai J et al. | 2013 | PLoS One
80 | SPHK1 | Zhang Y et al. | 2014 | PLoS One
81 | SPRY2 | Faratian D et al. | 2011 | PLoS One
82 | SRPK1 | Li X-H et al. | 2014 | Med Oncol
83 | STC1 | Murai R et al. | 2014 | Clin Exp Metastasis
84 | STMN1 | Baquero MT et al. | 2012 | Cancer
85 | TIMM17A | Salhab M et al. | 2010 | Breast Cancer
86 | TMPRSS4 | Liang B et al. | 2013 | Med Oncol
87 | TNFRSF12A | Wang J et al. | 2013 | Histol Histopathol
88 | TOP2A | Sparano JA et al. | 2012 | Breast Cancer Res Treat
89 | TXNIP | Cadenas C et al. | 2010 | Breast Cancer Res
90 | TXNRD1 | Cadenas C et al. | 2010 | Breast Cancer Res
91 | UBE2C | Parris TZ et al. | 2013 | Int J Cancer
92 | UCHL1 | Schröder C et al. | 2013 | J Cancer Res Clin Oncol
93 | VIM | Ulirsch J et al. | 2013 | Breast Cancer Res Treat
94 | YWHAG (14-3-3G) | Song Y et al. | 2012 | Cancer Epidemiol

1-2) By molecular subtype
Table 4: Tested genes for biological validation of prognostic module by molecular subtype
#   Gene symbol   Authors   Year   Journal   PubMed
1   CRYAB   Malin D et al.   2013   Clin Cancer Res   link to article
2   PIK3R1   Cizkova M et al.   2013   BMC Cancer   link to article

2) Correlation module validation

2-1) Targeted
Table 5: Tested genes for biological validation of targeted correlation analysis
#   Gene(s) symbol   Authors   Year(s)   Journal(s)   PubMed
1   ESR1; GATA3; FOXA1; XBP1   Lacroix M et al.   2004   Mol Cell Endocrinol   link to article
2   FBP1; ESR1   Dong C et al.   2013   Cancer Cell   link to article
3   MKI67; AURKA; UBE2C   Wirapati P et al. / Jézéquel P et al. / Loussouarn D et al.   2008 / 2009 / 2009   Breast Cancer Res / Breast Cancer Res Treat / Br J Cancer   link to article / link to article / link to article
4   PIP (GCDFP15); AR   Darb-Esfahani S et al.   2014   BMC Cancer   link to article
5   TNFAIP1; POLDIP2 / RAF1; MKRN2 / TBCB; POLR2I   Grinchuk OV et al.   2010   BMC Genomics   link to article

2-2) Exhaustive and Gene ontology analysis
Table 6: Tested genes for biological validation of exhaustive correlation and gene ontology analyses
#   Gene symbol   Authors   Year   Journal   PubMed
1   AURKA   Dexter TJ et al.   2010   BMC Syst Biol   link to article
2   ESR1   --   link to article
3   FTL   Jézéquel P et al.   2012   Int J Cancer   link to article

2-3) By chromosomal location
Table 7: Tested genes for biological validation of correlation analysis by chromosomal location
#   Gene(s) symbol   Authors   Year(s)   Journal(s)   PubMed
1   ESR1; C6orf97; C6orf211; RMND1   Dunbier AK et al.   2011   PLoS Genet   link to article
2   LSM1; BAG4; DDHD2; PPAPDC1B; WHSC1L1   Bernard-Pierrot I et al. / André F et al.   2008 / 2009   Cancer Res / Clin Cancer Res   link to article / link to article
3   Numerous genes   Buness A et al. / Jézéquel P et al.   2007 / 2013   Bioinformatics / Database (Oxford)   link to article / link to article
4   TRAF4; MED24; GGA3   Bergamaschi A et al. / Buness A et al. / Hu X et al.   2006 / 2007 / 2009   Genes Chromosomes Cancer / Bioinformatics / Mol Cancer Res   link to article / link to article / link to article

3) Expression map module validation

By molecular subtype
Table 8: Tested genes for biological validation of expression map analysis
#   Gene(s) symbol   Authors   Year(s)   Journal(s)   PubMed
1   CALB2   Taliano RJ et al.   2013   Hum Pathol   link to article
2   CDH3   Liu N et al.   2012   Med Oncol   link to article
3   CDH3   Tsang JYS et al.   2013   Hum Pathol   link to article
4   CEACAM6   Tsang JYS et al.   2013   Breast Cancer Res Treat   link to article
5   CEACAM6   Balk-Møller E et al.   2014   Am J Pathol   link to article
6   CKAP2   Kim HS et al.   2014   PLoS One   link to article
7   CRYAB   Malin D et al.   2013   Clin Cancer Res   link to article
8   CRYAB   Koletsa T et al.   2014   BMC Clin Pathol   link to article
9   CXCR4   Zhang M et al.   2012   Ultrastruct Pathol   link to article
10   DACH1   Powe DG et al.   2014   PLoS One   link to article
11   ERCC1   Gerhard R et al.   2013   Pathol Res Pract   link to article
12   FBP1   Dong C et al.   2013   Cancer Cell   link to article
13   FEN1   Abdel-Fatah TMA et al.   2014   Mol Oncol   link to article
14   FOXC1   Ray PS et al.   2011   Ann Surg Oncol   link to article
15   FSCN1   Esnakula AK et al.   2013   J Clin Pathol   link to article
16   FZD7   Yang L et al.   2011   Oncogene   link to article
17   LDHB   McCleland ML et al.   2012   Cancer Res   link to article
18   LRP6   Yang L et al.   2011   Oncogene   link to article
19   MED1; STARD3; TCAP; PNMT; PGAP3; C17orf37; ORMDL3; PSMD3; NR1D1   Kauraniemi P et al.   2006   Endocr Relat Cancer   link to article
20   MET; ETS1; KRT6A; KRT6B; ANXA8; MMP9   Charafe-Jauffret E et al.   2006   Oncogene   link to article
21   MKI67; AURKA; UBE2C   Wirapati P et al. / Jézéquel P et al. / Loussouarn D et al.   2008 / 2009 / 2009   Breast Cancer Res / Breast Cancer Res Treat / Br J Cancer   link to article / link to article / link to article
22   PI3   Labidi-Galy SI et al.   2014   Oncogene   link to article
23   PIP (GCDFP15)   Darb-Esfahani S et al.   2014   BMC Cancer   link to article
24   PRLR; KRT19   Charafe-Jauffret E et al.   2006   Oncogene   link to article
25   SDC1   Nguyen TL et al.   2013   Am J Clin Pathol   link to article
26   SFRP1   Jeong YJ et al.   2013   Oncol Rep   link to article
27   SOX10   Cimino-Mathews A et al.   2012   Hum Pathol   link to article
28   SPDEF   Buchwalter G et al.   2013   Cancer Cell   link to article
29   TCF7   Yang L et al.   2011   Oncogene   link to article
30   VIM   Tsang JYS et al.   2013   Hum Pathol   link to article


© 2009 bc-GenExMiner team. Last update: March 2016