\begin{abstract} Most methods for learning causal structures from non-experimental data rely on some assumption of simplicity, the most famous of which is the Faithfulness condition. Without assuming any such condition to begin with, we develop a learning theory for inferring the structure of a causal Bayesian network, and we use the theory to provide a novel justification of a certain assumption of simplicity that is closely related to Faithfulness. Here is the idea. With {\em only} the Markov and IID assumptions, causal learning is notoriously too hard for statistical consistency to be achievable, but we show that it can still achieve a quite desirable ``combined'' mode of stochastic convergence to the truth: almost sure convergence to the true causal hypothesis with respect to {\em almost all} causal Bayesian networks, together with a certain kind of {\em locally uniform} convergence. Furthermore, {\em every} learning algorithm achieving at least that combined mode of convergence has the following property: it converges stochastically to the truth with respect to a causal Bayesian network $N$ {\em only if} $N$ satisfies a certain variant of Faithfulness, known as Pearl's Minimality condition, as if the learning algorithm had been designed by assuming that condition. This explains, for the first time, why it is not merely optional but mandatory to assume the Minimality condition, or to proceed as if we assumed it.
\end{abstract}