\begin{abstract}
%\New{\bf Update with theorems}
Existing techniques to reconstruct tree models of progression for
accumulative processes, such as cancer, seek to estimate \cause
by combining correlation and a frequentist notion of temporal
priority. In this paper, we define a novel theoretical framework
\New{called CAPRESE (CAncer PRogression Extraction with Single Edges)}
to reconstruct such models based on the notion of probabilistic \cause
defined by Suppes.  %\Old{which differs fundamentally from that based on correlation.} % which is more suitable than correlation to infer causal structures.
%We consider a general reconstruction setting complicated by the presence of noise in the data, owing to the intrinsic variability of
%biological processes as well as experimental or measurement errors. To gain immunity to noise in the reconstruction performance
%we use a weighted estimator.
We consider a general reconstruction setting complicated by the
presence of noise in the data due to
biological variation, as well as experimental or measurement
errors. %\Old{To increase the resistance against}
\New{To improve
  tolerance to} noise, we define and use a
shrinkage-like estimator.
\New{We prove the correctness of our algorithm by showing asymptotic
  convergence to the correct tree under mild constraints on the level
  of noise. Moreover,} on synthetic data, we show that our approach
outperforms the state of the art, that it is efficient even with a
relatively small number of samples, and that its performance quickly
converges to its asymptote as the number of samples increases. For real
cancer datasets obtained with different technologies, we highlight
biologically significant differences between the progressions we infer
and those inferred by competing techniques, and we show how
progression models can be used to validate conjectured biological
relations.
\end{abstract}