Project Description

Objective:
The outcome of patients with HR+/HER2- metastatic breast cancer is highly heterogeneous: a subset of patients are long-term survivors while, at the other extreme, a group of patients die within two years. The objective of this project is to characterise this group with very poor outcome.

Specific Aim 1: to identify a genomic profile associated with poor outcome. We have already performed whole exome sequencing of 617 metastatic breast cancer patients (Bertucci, Nature, 2019). By the end of 2020, we will have sequenced 1,500 patients in total. We believe this cohort will represent patient heterogeneity sufficiently well to develop useful statistical models of patient outcomes. Genomic data are often characterised by complex interactions, so we plan to use graph-based genomic data analysis to model these interactions (Pirayre 2017).

Specific Aim 2: to identify a proteomic profile associated with poor outcome. To address this question, we will profile the same samples as in Specific Aim 1 at a proteomics facility (liquid chromatography-mass spectrometry, n=6,000 proteins, AstraZeneca). We plan to profile 1,000 samples with a high degree of precision. From a statistical point of view, using so many variables to characterise a relatively small number of patients may introduce bias. We will specifically use ensemble techniques from machine learning, such as bagging and boosting (Bühlmann 2012), to reduce bias and variance and so produce more reliable models and profiles.

Specific Aim 3: to integrate genomic and proteomic data using representation learning methods for multi-modal data, and to develop a predictor of patient outcomes with artificial intelligence models. A special focus will be placed on identifying biological markers with a strong influence on patient outcomes, using explainable AI techniques. Combining data from these two very different sources is a challenge from a data analytics point of view. We plan to leverage the power of graph neural networks to this end (Kipf 2017).

Specific Aim 4: to validate the predictor in an independent dataset (PADA1 trial, n=300 for the validation set).
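
As an illustration of the graph-based approach cited for Specific Aim 3, the following is a minimal sketch of a single graph-convolution layer in the style of Kipf 2017, written with NumPy. It is not the project's model: the toy graph, the feature matrix, the weight matrix and the function names (normalized_adjacency, gcn_layer) are all hypothetical, and what the nodes would represent in practice (for example genes, proteins or patients) is an assumption left open here.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix A."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so each node keeps its own signal
    deg = a_hat.sum(axis=1)              # node degrees (at least 1 thanks to self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(normalized_adjacency(adj) @ features @ weights, 0.0)


# Hypothetical toy data: a 4-node path graph, 3 input features, 2 hidden units.
rng = np.random.default_rng(0)
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
features = rng.normal(size=(4, 3))   # one row of features per node
weights = rng.normal(size=(3, 2))    # would be learned in a real model; random here
hidden = gcn_layer(adj, features, weights)
```

Each row of hidden mixes a node's own features with those of its neighbours, weighted by the normalised adjacency. This is the mechanism by which such models can propagate information across interacting entities. A real predictor would stack several such layers and train the weights against patient outcomes.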