q-bio0604017/eccs.tex
1: \documentclass[11pt,a4paper]{article}
2: \usepackage{amsfonts,amssymb,latexsym, amsmath,epsfig,theorem}
3: \usepackage[latin1]{inputenc}
4: \usepackage{fullpage}
5: \usepackage{algorithm2e}
6: %\usepackage{multicol}
7: 
8:  % \usepackage{times}
9:   \usepackage{mathptmx}
10:   \usepackage{textcomp}
11: 
12: \newcommand{\rem}{\bf \em}
13: \newcommand{\vect}[1]{{\bf #1}}
14: \newcommand{\ord}[1]{\ensuremath{\mathcal{O}\!\left(#1 \right)} }
15: \newcommand{\DP}[2]{ \ensuremath{ \frac{\partial #1 }{\partial #2 } } }
16: \newcommand{\D}[2]{ \ensuremath{ \frac{d #1 }{d #2 } }}
17: \newcommand{\correction}[1]{ {\bf #1}}
18: \newcommand{\comment}[1]{ \begin{center} \fbox{\begin{minipage}[h]{.9\linewidth} #1 \end{minipage}} \end{center}}
19: 
20: \newcommand{\noun}[1]{\textsc{#1}}
21: %% Bold symbol macro for standard LaTeX users
22: \providecommand\boldsymbol[1]{\mbox{\boldmath $#1$}}
23: 
24: \newcommand\BS[1]{\mbox{\fontseries{b}\selectfont #1}}
25: \newcommand\plus{\BS{+}}
26: \newcommand\moins{\BS{\textminus}}
27: \newcommand\plusmoins{\BS{\textpm}}
28: \newcommand\indet{\BS{?}}
29: \newcommand\zero{\BS{0}}
30: 
31: \newcommand\DEF{\stackrel{\text{\tiny def}}{=}}
32: \newcommand\actif[1]{\ensuremath{\overline{\text{#1}}}}
33: 
34: 
35: \theoremstyle{break}\newtheorem{Theorem}{Theorem}[section]
36: \theoremstyle{break}\newtheorem{Prop}[Theorem]{Property}
37: \theoremstyle{break}\newtheorem{Lemma}[Theorem]{Lemma}
38: \theoremstyle{break}\newtheorem{Def}[Theorem]{Definition}
39: 
40: \theoremstyle{break}\newtheorem{System}{System}
41: \theoremstyle{break}\newtheorem{Data}{Data}
42: \theoremstyle{break}\newtheorem{Observations}{Observations}
43: 
44: \DeclareMathOperator{\pred}{pred}
45: \DeclareMathOperator{\sgn}{sgn}
46: 
47: % ************ Brouillon *************
48: %\setlength\overfullrule{5pt}
49: %\newcommand\XXX[1]{[XXX #1]}
50: 
51: %\pagestyle{fancy}
52: %\lhead{}
53: %\chead{}
54: %\rhead{}
55: %\renewcommand\headrulewidth{0pt}
56: %\renewcommand\footrulewidth{0pt}
57: %\addtolength\footskip{\baselineskip}
58: %\addtolength\footskip{\baselineskip}
59: %\lfoot{Soumission à \textsc{Jobim}}
60: %\cfoot{\today}
61: %\rfoot{\thepage}
62: 
63: \begin{document}
64: 
65: \title{Complex Qualitative Models in Biology: a new approach}
66: 
67: 
68: \author{ P. \noun{Veber}$\ ^1$
69: \and M. \noun{Le Borgne}$\ ^1$ \and A. \noun{Siegel}$\ ^1$
70: \and S. \noun{Lagarrigue}$\ ^3$ \and
71: O. \noun{Radulescu}$\ ^2$ }
72: 
73: \date{}
74: %\affiliation{$^{*}$ ENSA Rennes\\
75: %$^{\dag}$ UMR de génétique animale, INRA, Rennes\\
76: %$^{\ddag}$ Projet Symbiose, IRISA, Rennes\\
77: %$^{\mathsection}$ IRMAR UMR-CNRS 6625, Rennes\\
78: %$^{**}$ INSERM U620, Rennes}
79: 
80: 
81: \maketitle
82: 
83: 
84: 
85: {\footnotesize $^1$ Projet Symbiose.  Institut de Recherche en
86: Informatique et Syst\`emes Al\'eatoires, IRISA-CNRS 6074-Université de
87: Rennes 1, Campus de Beaulieu, 35042 Rennes Cedex, France} 
88: 
89: {\footnotesize $^2$ Institut de Recherche
90: Math\'ematique de Rennes, UMR-CNRS 6625, Université de Rennes 1, Campus de Beaulieu, 35042 Rennes Cedex, France}
91: 
92: {\footnotesize  $^3$ UMR Génétique animale, Agrocampus Rennes-INRA,  65 rue de Saint-Brieuc,  CS 84215 Rennes, France}
93: 
94: \smallskip
95: \paragraph{Abstract.} 
96: %\abstract{
97: {\em 
98: We advocate the use of qualitative models in the analysis of large biological
99: systems. We show how qualitative
100: models are linked to theoretical differential models and practical
101: graphical models of biological networks. A new technique for analyzing qualitative models is
102: introduced, which is based on an efficient representation of
103: qualitative systems. As shown through several applications, this
104: representation is a relevant tool for the understanding and testing of
105: large and complex biological networks.
106: }
107: 
108: 
109:  
110: \section{Introduction}
111: \label{sec:intro}
112: 
113: Understanding the behavior of a biological system from the interplay
114: of its molecular components is a particularly difficult task. A
115: model-based approach proposes a framework to express some hypotheses
116: about a system and make some predictions out of it, in order to compare
117: with experimental observations. Traditional approaches (see
118: \cite{DJ02} for
119: an interesting review) include ordinary differential equations or
120: stochastic processes. While they are powerful tools to acquire a fine
121: grained knowledge of the system at hand, these frameworks need
122: accurate experimental data on chemical reaction kinetics, which are
123: rarely available. Furthermore, they are also computationally
124: demanding and their practical use is restricted to a limited number of
125: variables.
126: 
127: In response to these issues, many approaches have been proposed that
128: abstract away from the quantitative details of the system. Among others, let us
129: stress the work done on gene regulation dynamics \cite{DJGHP04+}, hybrid
130: systems \cite{GT04} or discrete event systems \cite{CRCDF04+},
131: \cite{CRRT04}. The goal of such
132: qualitative frameworks is to enable system-level analysis of a
133: biological phenomenon. This is a relevant answer to recent
134: technical breakthroughs in experimental biology:
135: \begin{itemize}
136: \item microarrays, mass spectrometry and protein chips now make it possible to
137:   measure thousands of variables simultaneously,
138: \item obtained measurements are rather noisy, and may not be
139:   quantitatively reliable.
140: \end{itemize}
141: 
142: Microarrays, for instance, are used to compare the activity of genes
143: between two experimental settings. A microarray experiment gives
144: a differential measure:
145: it delivers information on the relative activity of each
146: gene represented on the array. Despite many attempts to
147: quantify the output of microarrays, the essential output of the
148: technique is a statement such as: gene
149: G is more active in situation A than in situation B.
150: 
151: In this paper, we use a framework developed in \cite{Biosystems} for
152: the comparison of two experimental conditions, in order to derive
153: qualitative constraints on the possible variations of the
154: variables. Our main contribution is the use of an efficient
155: representation for the set of solutions of a qualitative system.
156: This representation makes it possible to solve systems with
157: hundreds of variables. Moreover, it opens the way to a finer
158: analysis of qualitative systems. This new approach is
159: illustrated by solving three important problems:
160: \begin{itemize}
161: \item checking the consistency of a qualitative system with qualitative
162:   experimental data,
163: \item minimally correcting corrupted data that are inconsistent with a
164:   model,
165: \item helping in the design of experiments.
166: \end{itemize}
167: 
168: Our main focus here is to show how to use large qualitative models and
169: qualitative interpretations of experimental data. In this respect our
170: work could be used as an extension of what was proposed in \cite{GRRLH03+},
171: where the authors analyze pangenomic gene
172: expression arrays in \emph{E.~coli} using simple qualitative rules.
173: 
174: In the next section we establish links between differential,
175:  graphical and qualitative models.
176: 
177: \section{Mathematical modeling}
178: \label{sec:maths} In this section we show how qualitative models can
179: be linked to more traditional differential models. Differential
180: models are central to the theory of metabolic control
181: \cite{meta-control,meta-control2}. They have also been
182: applied to various aspects of gene network dynamics.
183: %(cycles (TODO citer
184: %tyson, goldbeter), switches (TODO citer  hasty),etc.).
185: The purpose of this section is to lay down a set of qualitative
186: equations describing steady-state shifts of differential models.
187: For the sake of completeness, we rederive in a simpler case results
188: that have been established in greater generality in
189: \cite{Biosystems,Radulescu05}.
190: 
191: \subsection{Modeling assumptions}
192: 
193: Let us consider a network of interacting cellular constituents,
194: numbered from 1 to $n$. These constituents may be proteins, RNA
195: transcripts or metabolites for instance. The state vector $X$
196: denotes the concentration of each constituent.
197: 
198: 
199: \paragraph{Differential dynamics}
200: 
201: $X$ is assumed to evolve according to the following differential
202: equation:
203: 
204: $$ \D{X}{t}=F(X) $$
205: 
206: \noindent where $F$ is an (unknown) nonlinear, differentiable function. A
207: steady state $X_{eq}$ of the system is a solution of the algebraic
208: equation:
209: 
210: $$ F(X_{eq}) = 0.$$
211: 
212: 
213: A steady state is asymptotically stable if it attracts all nearby
214: trajectories. A steady state is non-degenerated if the determinant of the Jacobian
215: matrix evaluated at that steady state is non-vanishing.
216:  According to the Grobman-Hartman theorem, a sufficient condition for a
217:  non-degenerated asymptotically stable
218:  steady state is
219: $\mathrm{Re}(\lambda_i) < -C$ for some $C>0$ and all $i=1,\ldots,n$, where the $\lambda_i$ are the
220: eigenvalues of the Jacobian matrix evaluated at the steady state.
221: 
222: 
223: 
224: 
225: \paragraph{Experiment modeling}
226: Typical two-state experiments such as differential microarrays are
227: modeled as steady-state shifts. We suppose that under a change of
228: the control parameters in the experiment, the system goes from one
229: non-degenerated stable steady state to another one.
230: The output of the two-state experiment can be expressed in terms of
231: concentration variations for a subset of products, between the two
232: states. We suppose that the signs of these variations were proven to
233: be statistically significant.
234: 
235: 
236: \paragraph{Interaction graph}
237: The only knowledge we require about the function $F$ concerns the
238: signs of the derivatives $\DP{F_i}{X_j}$.  These are interpreted as
239: the action of the product $j$ on the product $i$. It is an
240: activation if the sign is $\plus$, an inhibition if the sign is
241: $\moins$. A null value means no action.
242: 
243: An interaction graph $G(V,E)$ is  derived from the Jacobian matrix
244: of $F$:
245: \begin{itemize}
246: \item with nodes $V = \{1,\dots,n\}$ corresponding to products
247: \item and (oriented) edges $E = \{ (j,i) | \DP{F_i}{X_j} \neq 0
248:   \}$. Edges are labeled by $s(j,i) = \sgn(\DP{F_i}{X_j})$.
249: \end{itemize}
250: 
251: The set of predecessors of a node $i$ in $G$ is denoted $\pred(i)$. The 
252: interaction graph is in practice built from information gathered
253: in the literature. As a consequence, in some places it may be
254: incomplete (some interactions may be missing), in others it may be
255: redundant (some interactions may appear several times as direct and
256: indirect interactions). It is important that neither
257: incompleteness nor redundancy introduces inconsistencies;
258: this issue will be addressed in section \ref{sec:qua_exp}.
259: 
260: 
261: 
262: \paragraph{Negative diagonal in the Jacobian matrix}
263: For any product $i$, we exclude the possibility of vanishing
264: diagonal elements of the Jacobian $\DP{F_i}{X_i}$. This can be
265: justified by taking into account  degradation and dilution (cell
266: growth) processes that can be represented  as negative self-loops in
267: the interaction graph, that is for all $i$, $(i,i) \in E$ and
268: $s(i,i) = \moins$.
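
As a purely illustrative example (a hypothetical two-species system, not
taken from the model studied later), consider a transcript $X_1$ produced at
a constant rate $k > 0$ and a protein $X_2$ translated from it at rate
$a > 0$, both subject to linear degradation with rates $d_1, d_2 > 0$:
\[
F_1(X) = k - d_1 X_1, \qquad F_2(X) = a X_1 - d_2 X_2, \qquad
\left( \DP{F_i}{X_j} \right)_{i,j} =
\begin{pmatrix} -d_1 & 0 \\ a & -d_2 \end{pmatrix}.
\]
The interaction graph then contains the edge $(1,2)$ with
$s(1,2) = \plus$, the self-loops $(1,1)$ and $(2,2)$ with sign $\moins$,
and no edge $(2,1)$. The eigenvalues of the Jacobian are $-d_1$ and $-d_2$,
so the unique steady state $X_{eq} = (k/d_1,\ a k/(d_1 d_2))$ is
non-degenerated and asymptotically stable.
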
269: 
270: 
271: 
272: 
273: \paragraph{Discussion} 
274:  In our mathematical modeling we suppose that the system
275: starts and ends in non-degenerated stable steady states. Of course
276: this is not always the case, for several reasons: the time needed to
277: reach a steady state may be too long, or the system may end up in a limit cycle and
278: oscillate instead of reaching a steady state.
279: %(TODO citer quelques
280: %exemples P53,NFKB, circadian clocks). 
281: All these possibilities should
282: be considered with caution.  Actually this hypothesis 
283: might be difficult to check from the two states only. Complementary
284: strategies such as time series analysis 
285: %(TODO citer Hoffmann et Uri Alon)
286: could be employed in order to assess the possibility of limit cycle
287: oscillations.
288: 
289: Positive self-regulation is also  possible but introduces
290: a supplementary complication.
291: In this case
292: for certain values of the  concentrations
293: degradation exactly compensates the positive self-regulation and
294: the diagonal elements of the Jacobian vanish (this is
295: a consequence of the intermediate value theorem).
296: We can avoid dealing with this situation by
297: considering that the positive self-regulation does not act directly
298: and that it involves intermediate species.
299: This is a realistic assumption because a  molecule never really acts directly
300: on itself (transcripts can be auto-regulated but only via protein
301: products).
302: Thus, all nodes can keep their negative self-loops and all diagonal
303: elements of the Jacobian can be considered to be non-vanishing.
304: Although the positive regulation may imply vanishing higher order
305: minors of the Jacobian, this will not affect our local qualitative
306: equations.
307: 
308: 
309: 
310: 
311: 
312: \subsection{Quantitative variation of one variable}
313: We focus 
314: here on the variation of the concentration of a single chemical
315: species represented by a component $X_i$ of the vector $X$. Since we
316: have adopted a {\em static} point of view, we are only interested in
317: the variation of $X_i$ between two non-degenerated stable steady
318: states  $X_{eq}^1$ and $X_{eq}^2$ independently of the trajectory of
319: the dynamical system between the two states.
320: 
321: Let us denote by $\hat{X}_i$ the vector of dimension $n_i$ obtained
322: by keeping from $X$ all coordinates $j$ that are predecessors of $i$
323: in the interaction graph. Then, under some additional assumptions
324: described and discussed in \cite{Radulescu05}, we have the following
325: result: 
326: 
327: \begin{Theorem}
328: \label{math:var}
329: The variation of the concentration
330: of species $i$ between two non-degenerated steady states $X_{eq}^1$
331: and $X_{eq}^2$ is given by
332: \begin{equation}
333:   X_{eq_i}^1 - X_{eq_i}^2 = \int_S -\left( \DP{F_i}{X_i} \right)^{-1}
334:   \sum_{k \in \pred(i)} \DP{F_i}{X_k} d X_k 
335: \label{math:eq:var}
336: \end{equation}
337: where $S$ is the segment linking
338: $\hat{X}^1_{eq_i}$ to $\hat{X}^2_{eq_i}$.
339: \end{Theorem}
340: 
341: A full proof is given in \cite{Radulescu05}. The above formula is a
342: quantitative relation between the variation of concentrations and the
343: derivatives $\DP{F_i}{X_j}$. Our next step is to introduce a
344: qualitative abstraction of this relation.
345: 
346: 
347: 
348: %% Any influence on $\hat{i}$ comes
349: %% exclusively from the same set of predecessors, for all concentration
350: %% vector $X$  in a domain $D$ including the two steady states.
351: %% Therefore, in $D$ the function  $F_i$ depends on $X_i, \hat{X}_i$
352: %% only. The fundamental hypothesis $\DP{F_i}{X_i} <0$ implies that the
353: %% function $X_i \rightarrow F_i(X_i, \hat{X}_i)$ is strictly
354: %% decreasing. Consequently, for each $\hat{X}_i$ the equation
355: %% $F_i(X_i, \hat{X}_i)=0$  defining steady states has at most one
356: %% solution. The set $F_i(X_i, \hat{X}_i)=0$  represents a $n_i$
357: %% dimensional
358: %%  manifold  $M_i \subset D$, including the two states. Using a
359: %% global implicit function theorem, one shows that the
360: %% set $F_i(X_i, \hat{X}_i)=0$ can be represented as the graph of a
361: %% function $X_i = \Phi_i (\hat{X}_i )$ in $D$ (see \cite{eccb} for
362: %% detailed hypotheses and proof).
363: 
364: 
365: %% The differential of $\Phi_i$ is given by the implicit function
366: %% theorem:
367: 
368: %% \begin{equation}
369: %% \label{math:diffphi}
370: %%  d \Phi_i = -\left( \DP{F_i}{X_i} \right)^{-1} \left( \sum_{k \in
371: %% \pred(i)} \DP{F_i}{X_k} d X_k \right)
372: %% \end{equation}
373: 
374: %%  In order to obtain the
375: %% variations of $X_i$ between the two states, one can simply integrate
376: %% the differential of $\Phi_i$ along any smooth path included in
377: %% $M_i$.
378: 
379: 
380: 
381: %Concentrations are always positive and  bounded. We can assume that
382: %they evolve in a closed  bounded hypercube and that  $\DP{F_i}{X_i}
383: %<K<0$ on this domain.
384: 
385: 
386: 
387: 
388: 
389: 
390: 
391: %For small perturbations of the parameters, the variation of
392: %concentration of a node $i$ can be approximated by linearizing
393: %Eq. \ref{math:steadystate}:
394: 
395: %\begin{equation}
396: %\sum_{j\in V} \DP{F_i}{X_j} \delta X_j +  \sum_{k} \DP{F_i}{P_k}
397: %\delta P_k   = 0. 
398: %\end{equation}
399: 
400: %Additionaly, we assume that parameters do
401: %not directly influence all variables. Mathematically we assume
402: %$\DP{F_i}{P_k} =0$, except for input nodes. In this case the sum 
403: %$\sum_{k} \DP{F_i}{P_k}\delta P_k$ represents the effect of the
404: %experimental settings on the system. 
405: %Using the above assumption that $\DP{F_i}{X_i} < 0$ we obtain:  
406: %\begin{equation}
407: %\label{math:quant}
408: % \delta X_i = -\left( \DP{F_i}{X_i} \right)^{-1} \sum_{k \in
409: %\pred(i)} \DP{F_i}{X_k} \delta X_k
410: %\end{equation}
411: 
412: %As mentioned earlier, quantitative information on the Jacobian, 
413: %or on variations $\delta X_k$ is generally not available. 
414: %Nevertheless, the sign of these values is known, most often, and can
415: %be used for reasoning.
416: 
417: 
418: 
419: \subsection{Qualitative equations}
420: We propose here to study Eq. \ref{math:eq:var} in sign algebra.
421: By sign algebra, we mean the set $\{\plus,\moins, \indet \}$, where
422: $\indet$ represents an undetermined sign. This set is equipped with the
423: natural commutative operations:
424: \[
425: \begin{array}[c]{llllll}
426: \plus+\moins = \indet & \plus+\plus = \plus &\moins+\moins = \moins &
427: \plus\times\moins = \moins & \plus\times\plus = \plus & \moins\times\moins = \plus \\
428: 
429: \indet+\moins = \indet & \indet+\plus = \indet & \indet + \indet = \indet  &
430: \indet \times \moins = \indet & \indet \times \plus = \indet & \indet
431: \times \indet = \indet \\
432: \end{array}
433: \]
434: 
435: Equality in sign algebra $\approx$ is defined as follows:
436: \[
437: \begin{array}{c|c|c|c}
438: \approx & \plus & \moins & \indet \\
439: \hline
440: \plus & T & F & T \\
441: \hline
442: \moins & F & T & T \\
443: \hline
444: \indet & T & T & T \\
445: \end{array}
446: \]
447: 
448: 
449: Importantly, qualitative equality is not an equivalence relation,
450: since it is not
451: transitive. This implies that computations in qualitative algebra must
452: be carried out with care. At least two major properties should be
453: emphasized: 
454: \begin{itemize}
455: \item if a term of a sum is indeterminate ($\indet$) then the whole
456:   sum is indeterminate.
457: \item if one side of a qualitative equality is indeterminate, then
458:   the equality is satisfied whatever the value of the other side.
459: \end{itemize}
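
For instance, the failure of transitivity can be read directly from the
equality table:
\[
\plus \approx \indet \quad \text{and} \quad \indet \approx \moins,
\quad \text{but} \quad \plus \not\approx \moins .
\]
Similarly, since $\plus + \moins = \indet$, the equation
$x \approx \plus + \moins$ is satisfied by both $x = \plus$ and
$x = \moins$, and therefore places no constraint on $x$.
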
460: 
461: A \emph{qualitative system} is a set of algebraic equations with
462: variables in $\{\plus,\moins, \indet \}$. A \emph{solution} of this
463: system is a valuation of the unknowns which satisfies each equation, 
464: and such that no variable is instantiated to \indet. This
465: last requirement is important since otherwise any system would have
466: trivial solutions (like all variables to \indet). 
467: 
468: 
469: %% Eq \ref{math:diffphi} is established for small changes in the
470: %% parameters. However, Eq \ref{math:var} shows that the sign of $X_i$
471: %% variation can be deduced from the sign of the variations of the other
472: %% variables between two equilibrium points:
473: 
474: 
475: \begin{Theorem} 
476: Under the assumptions and notations of Theorem \ref{math:var}, if the
477: sign of
478: $\DP{F_i}{X_j}$ is constant, then the following relation holds in sign
479: algebra: 
480: \begin{equation}
481:  s(\Delta X_i) \approx
482: \sum_{k \in \pred(i)}  s(k,i) s(\Delta X_k) \label{math:qual}
483: \end{equation}
484: where $s(\Delta X_k)$ denotes the sign of $X_{eq_k}^1 - X_{eq_k}^2$.
485: \end{Theorem}
486: 
487: By writing Eq. \ref{math:qual} for all nodes in the graph, we
488: obtain a system of equations on signs of variations, later referred to as the
489: \emph{qualitative system} associated with the interaction graph $G$. This will
490: be used extensively in the next sections.
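
The passage from Eq. \ref{math:eq:var} to Eq. \ref{math:qual} can be
sketched informally as follows. This is only an outline, under two
reading assumptions of ours: the sum ranges over the predecessors
$k \neq i$ (as in the toy system of Section \ref{sec:we}), and the segment
$S$ is oriented from state 2 to state 1 and parametrized by $t \in [0,1]$,
so that $dX_k = \Delta X_k \, dt$. Then
\[
\Delta X_i = \sum_{k \in \pred(i)} c_k \, \Delta X_k ,
\qquad
c_k = \int_0^1 -\left( \DP{F_i}{X_i} \right)^{-1} \DP{F_i}{X_k} \, dt .
\]
Since $\DP{F_i}{X_i} < 0$, the factor $-\left( \DP{F_i}{X_i} \right)^{-1}$
is positive, and since $\DP{F_i}{X_k}$ has the constant sign $s(k,i)$, so
does $c_k$. Each term $c_k \Delta X_k$ with $\Delta X_k \neq 0$ therefore
has sign $s(k,i)\, s(\Delta X_k)$. If all terms share the same sign, the
sum has that sign; otherwise the right-hand side of Eq. \ref{math:qual}
evaluates to $\indet$ and the qualitative equality holds trivially.
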
491: 
492: 
493: 
494: \subsection{Link between qualitative and quantitative}
495: \label{sec:link_qua_quant}
496: 
497: The qualitative system obtained from Eq.\ref{math:qual} is a consequence of
498: the quantitative relations that result from Theorem \ref{math:var}.  
499: So the sign function maps a quantitative
500: variation between two equilibrium points onto a
501: qualitative solution of Eq.\ref{math:qual}.
502: The converse is not true in general. For a given solution $S$ of the
503: qualitative system, there might be no equilibrium change  $\Delta X$
504: in the differential quantitative model, s.t. each real-valued
505: component of $\Delta X$ has the sign given by $S$.
506: 
507: However, some 
508: components of the solution vectors are uniquely determined by the
509: qualitative system. They take the same sign
510: value in every solution vector. For such so-called hard components, 
511: the sign of any quantitative solution (if it exists) is
512: completely determined by the qualitative system.
513: 
514: We will use the previous  properties to check the coherence between
515: models and experimental data. By experimental data we mean the sign of
516: the observed variation in concentration for some nodes. In particular,
517: if the qualitative system associated to an interaction graph $G$ has
518: no solution given some experimental observations, then no function $F$
519: satisfying the sign conditions on the derivatives can describe the 
520: observed equilibrium shift, meaning that either 
521: the model is wrong or some data are corrupted. In the next
522: section, we introduce a simplified model related to lipid
523: metabolism, and illustrate the above described formalism.
524: 
525: 
526: \section{Toy example: regulation of the synthesis of fatty acids}
527: \label{sec:we}
528: 
529: 
530: In order to illustrate our approach, we use a toy example describing a
531: simplified model of genetic 
532: regulation of fatty acid synthesis in liver. The corresponding
533: interaction graph is shown in Fig. \ref{igraph2}.
534: 
535: 
536: Two pathways for the production of fatty acids coexist in liver. Saturated
537: and mono-unsaturated fatty acids are produced from citrate through
538: a metabolic pathway composed of four enzymes, namely ACL (ATP
539: citrate lyase), ACC (acetyl-Coenzyme A carboxylase), FAS (fatty
540: acid synthase) and SCD1 (Stearoyl-CoA desaturase 1).
541: Polyunsaturated fatty acids (PUFA) such as arachidonic acid  and
542: docosahexaenoic acid are synthesized from essential fatty acids
543: provided by nutrition;  D5D (Delta-5 Desaturase) and D6D (Delta-6
544: Desaturase) catalyze the key steps of the synthesis of PUFA.
545: 
546: PUFA play pivotal roles in many biological functions; among other things, they
547:  regulate the expression of genes that affect lipid,
548:  carbohydrate, and protein metabolism.
549: The effects of PUFA are mediated either directly through their
550: specific binding to various nuclear receptors (PPAR$\alpha$ --
551: peroxisome proliferator activated receptors, LXR$\alpha$ --
552: Liver-X-Receptor $\alpha$, HNF-4$\alpha$) leading to changes in the
553: trans-activating activity of these transcription factors; or
554: indirectly as the result of changes in the abundance of regulatory
555: transcription factors (SREBP-1c -- sterol regulatory element
556: binding-protein--, ChREBP, etc.) \cite{Jump}.
557: 
558: 
559: \begin{figure*}[hbt]
560: \begin{center}
561: \includegraphics[width=12cm]{graphe_exemple3.eps}
562: \caption{Interaction graph for the toy model.
563: Self-regulation loops on nodes are omitted for sake of clarity.
564: Observed variations are depicted next to each vertex, when available.}
565: \label{igraph2}
566: \end{center}
567: \end{figure*}
568: 
569: \paragraph{Variables in the model}
570: 
571: We consider in our model the nuclear receptors PPAR$\alpha$, LXR$\alpha$ and
572: SREBP-1c (denoted by PPAR, LXR and SREBP respectively in the model), as
573: they are synthesized from the corresponding genes, together with the
574: trans-activating active forms of these transcription factors, that is,
575: LXR-a (denoting a complex LXR$\alpha$:RXR$\alpha$), PPAR-a (denoting a
576: complex PPAR$\alpha$:RXR$\alpha$) and SREBP-a (denoting the cleaved
577: form of SREBP-1c). We also consider SCAP (SREBP cleavage activating
578: protein), a key enzyme involved in the cleavage of SREBP-1c, which
579: interacts with another family of proteins called INSIG (illustrating the
580: complexity of the molecular mechanism).
581: 
582: We also include in the model ``final'' products, that is, the enzymes ACL,
583:  ACC, FAS, SCD1 (involved in fatty acid synthesis from citrate),
584:  D5D, D6D (involved in PUFA synthesis) as well as PUFA themselves.
585: 
586: \paragraph{Interactions in the model}
587: Relations between the variables are the following. SREBP-a is an
588: activator of the transcription of ACL, ACC, FAS, 
589: SCD1, D5D and D6D \cite{Nara,Jump}.
590: LXR-a is a direct activator of the transcription of SREBP and FAS,
591: it also indirectly activates  ACL, ACC and SCD1
592: \cite{StefGus04}. Notice that
593: these indirect actions are kept in the model because we do not know
594: whether they are only SREBP-mediated.
595: 
596: PUFA activate the formation of PPAR-a from PPAR, and inhibit the formation
597: of LXR-a from LXR as well as the formation of  SREBP-a (by
598: inducing the degradation of mRNA and inhibiting the cleavage)
599: \cite{Jump}. SCAP represents the activators of the formation
600: of SREBP-a from SREBP, and is inhibited by PUFA.
601: 
602: PPAR directly activates the production of  SCD1, D5D, D6D
603: \cite{Miller,Tang,Takashi}. The dual regulation of 
604: SCD1, D5D and D6D by SREBP and PPAR is
605: paradoxical because SREBP transactivates genes for fatty acid
606: synthesis in liver, while PPAR induces enzymes for fatty acid
607: oxidation. Hence, the induction of the D5D and D6D genes by PPAR
608: appears to be
609: a compensatory response to the increased PUFA demand caused by
610: the induction of fatty acid oxidation.
611: 
612: 
613: \paragraph{Fasting-refeeding protocols}
614: Fasting-refeeding
615: protocols provide favorable conditions for studying
616: lipogenesis regulation; we suppose that,
617: during an experiment, animals (such as rodents or chickens) are kept
618: in a fasted state for several hours. Then, hepatic mRNA levels of LXR,
619: SREBP, PPAR, ACL, FAS, ACC and SCD1
620: are quantified by DNA microarray analysis. Biochemical measurements
621: also provide the variation of PUFA.
622: 
623:  A compilation of recent
624: literature on lipogenesis regulation provides hypothetical results of
625: such  protocols: SREBP, ACL, ACC, FAS and SCD1 decline
626: in liver during the fasted state \cite{Liang2002}. This is expected
627: because fasting results in an inhibition of fatty acid synthesis and an
628: activation of the fatty acid oxidation. For the same reason, PPAR is
629: increased in order to trigger oxidation. However, Tobin et
630: al.~\cite{Tobin2000} showed that fasting rats for 24~h increased
631: hepatic LXR mRNA, although LXR positively regulates fatty acid
632: synthesis in its activated form. Finally, PUFA levels can be
633: considered to be increased
634: in liver following starvation because of the substantial lipolysis
635: in adipose tissue, as shown by Lee et al.\ in mice after 72~h of
636: fasting \cite{Lee}.
637: 
638: 
639: 
640: 
641: \paragraph{Qualitative system derived from the graph}
642: As explained in the previous section, we derive a qualitative system
643: from the interaction graph shown in Fig. \ref{igraph2}. For ease of
644: presentation, we denote by \verb|A| the sign of variation for species
645: A. 
646: 
647: 
648: {\small
649: \begin{minipage}{.7\linewidth}
650: \begin{System}
651: \begin{verbatim}
652: (1)  PPAR-a  = PPAR + PUFA
653: (2)  LXR-a   = -PUFA + LXR
654: (3)  SREBP   = LXR-a
655: (4)  SREBP-a = SREBP + SCAP -PUFA
656: (5)  ACL     = LXR-a + SREBP-a - PUFA
657: (6)  ACC     = LXR-a + SREBP-a - PUFA
658: (7)  FAS     = LXR-a + SREBP-a - PUFA
659: (8)  SCD1    = LXR-a  + SREBP-a - PUFA + PPAR-a
660: (9)  SCAP    = -PUFA
661: (10) D5D     = PPAR-a + SREBP-a - PUFA
662: (11) D6D     = PPAR-a + SREBP-a - PUFA
663: \end{verbatim}
664: \label{system2}
665: \end{System}
666: \end{minipage}
667: \begin{minipage}{.3\linewidth}
668: \begin{Observations}
669: \begin{verbatim}
670:   PPAR    = +
671:   PUFA    = +
672:   LXR     = +
673:   SREBP   = - 
674:   ACL     = -
675:   ACC     = - 
676:   FAS     = - 
677:   SCD1    = -
678: \end{verbatim}
679: \end{Observations}
680: \end{minipage}
681: }
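
As a hand check of the formalism (a sketch only, not the resolution
method proposed in this paper), one can substitute the observations into
the system above:
\begin{itemize}
\item equation (3) reads $\moins \approx \text{LXR-a}$, hence
  LXR-a $= \moins$; equation (2) gives no information, since its
  right-hand side is $\moins + \plus = \indet$;
\item equation (9) gives SCAP $= \moins$, and equation (4) then reads
  $\text{SREBP-a} \approx \moins + \moins + \moins$, hence
  SREBP-a $= \moins$;
\item equation (1) reads $\text{PPAR-a} \approx \plus + \plus$, hence
  PPAR-a $= \plus$;
\item equations (5) to (7) have right-hand side
  $\moins + \moins + \moins = \moins$, in agreement with the observed
  decrease of ACL, ACC and FAS;
\item equation (8) has right-hand side
  $\moins + \moins + \moins + \plus = \indet$ and is therefore satisfied;
\item equations (10) and (11) have right-hand side
  $\plus + \moins + \moins = \indet$, so D5D and D6D remain
  unconstrained.
\end{itemize}
The observations are thus compatible with the model: LXR-a, SCAP, SREBP-a
and PPAR-a take the same sign in every solution, whereas D5D and D6D may
take either sign.
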
682: 
683: 
684: In the next section, we propose an efficient representation for such
685: qualitative systems. 
686: 
687: 
688: \section{Analysis of qualitative equations: a new approach}
689: \label{sec:eff_qua}
690: 
691: \subsection{Resolution of qualitative systems}
692: \label{sec:res_qua}
693: 
694: The resolution of (even linear) qualitative systems is an NP-complete
695: problem (see for instance \cite{Trave,Dormoy88}). One can show this by
696: reducing the satisfiability problem for a finite set of clauses to the
697: resolution of a qualitative system
698: in polynomial time. 
699: 
700: Let us consider a collection $C=\{c_1, \ldots , c_n \}$ of clauses on
701: a finite set $V$ of variables. Let $\{ \plus ,\moins ,\indet \}$ be the sign
702: algebra. In order to reduce the satisfiability problem to the
703: resolution of a qualitative system, let us code $true$ as $\plus$ and
704: $false$ as $\moins$. If $c$ is a clause, let us denote by $\bar{c}$ the
705: encoding of $c$ as a sign algebra formula. The following encoding scheme
706: provides a polynomial procedure to code a clause into a qualitative
707: formula:
708: $$
709: \begin{array}{ccc}
710: \mbox{clause}& & \mbox{sign algebra}\\
711: \hline
712: a \in V & \rightarrow & \bar{a}\\
713: c_1 \vee c_2  & \rightarrow & \bar{c_1} + \bar{c_2}\\
714: \lnot c & \rightarrow & - \bar{c}
715: \end{array}
716: $$
717: 
718: The satisfiability problem for the set of clauses $C$ is then reduced
719: to finding a solution of the qualitative system:
720: \[
721: \{ \bar{c_i} \approx \plus \ /\ i=1, \ldots , n \}
722: \]
723: So an NP-complete problem can be reduced to the resolution of
724: a qualitative system in polynomial time (with respect to the size of
725: the problem). This shows that solving qualitative systems is NP-hard; since a candidate
726: valuation can be checked in polynomial time, the problem is NP-complete.
727: For example, the only pair of values $(\bar{a},\bar{b})$ that is
728: not a solution of $-\bar{a} + \bar{b}\ \approx\ \plus$ is
729: $(\plus,\moins)$. This corresponds to the only pair $(true, false)$ that does not
730: satisfy $\lnot a \vee b$.
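
To illustrate the reduction on a slightly larger instance (an example of
ours, not taken from the cited references), consider the two clauses
$a \vee b$ and $\lnot a \vee b$. The associated qualitative system is
\[
\bar{a} + \bar{b} \approx \plus, \qquad -\bar{a} + \bar{b} \approx \plus .
\]
With $\bar{b} = \moins$, the first equation forces $\bar{a} = \plus$
while the second forces $\bar{a} = \moins$, so there is no solution. With
$\bar{b} = \plus$, both equations hold for either value of $\bar{a}$. The
solutions $(\bar{a},\bar{b}) \in \{(\plus,\plus),(\moins,\plus)\}$ are
exactly the encodings of the satisfying assignments, namely those in
which $b$ is true.
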
731: 
732: Several heuristics have been proposed for the resolution of qualitative
733: systems. For linear systems, a set of rules has been
734: designed \cite{Dormoy88}. This set is complete: it finds every
735: solution. It is also sound: every solution found by applying these
736: rules is correct. The rules are based on an adaptation of Gaussian
737: elimination. However, only heuristics
738: exist for choosing the equation and the rule to apply to it. In case
739: of a dead end, when no further rule applies, it is necessary to backtrack
740: to the last decision made. As a result, programs implementing
741: qualitative resolution are not very
742: efficient in general and only problems of small size can be solved
743: in reasonable time. For that reason we propose an alternative way to
744: solve qualitative systems (linear or not).
745: 
746: 
747: \subsection{Qualitative equation coding}
748: \label{sec:eq_coding}
749: 
750: Our method is based on a coding of qualitative equations as
751: algebraic equations over Galois fields ${\mathbb Z}/p{\mathbb Z}$
752: where $p$ is a prime number greater than 2. The elements of these 
753: fields are the classes
754: modulo $p$ of the integers. If $\bar{x}$ denotes the class of the
755: integer $x$ modulo $p$, a sum and a product are defined on ${\mathbb
756: Z}/p{\mathbb Z}$ as follows:
757: $$ 
758: \bar{x} + \bar{y}  =  \overline{x+y} \qquad \bar{x} \times
759: \bar{y}  =  \overline{x \times y}
760: $$
761: 
762: Galois fields have two basic properties which we use extensively:
763: \begin{itemize}
\item every function $f:({\mathbb Z}/p{\mathbb Z})^n \rightarrow
      {\mathbb Z}/p{\mathbb Z}$ of $n$
  arguments in ${\mathbb Z}/p{\mathbb Z}$ is a polynomial function;
\item if $\oplus$ denotes the operation $f \oplus g = f^{(p-1)} +
      g^{(p-1)}$, then every system of equations $p_1(X)=0, \ldots ,
      p_k(X)=0$ has the same solutions as the single equation
  $(p_1\oplus p_2 \oplus \ldots \oplus p_k) (X) = 0$.
771: \end{itemize}
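
A short justification of the second property can be sketched as
follows, reading $f^{(p-1)}$ as the $(p-1)$-th power of $f$ (this is our
reading of the notation). By Fermat's little theorem, $x^{p-1} = 1$ for
every non-zero $x$ in ${\mathbb Z}/p{\mathbb Z}$, so $f^{(p-1)}$ only
takes the values $0$ and $1$. For two polynomial functions $f$, $g$ and
a point $X$ we therefore get
\[
(f \oplus g)(X) =
\begin{cases}
0 & \text{if } f(X) = 0 \text{ and } g(X) = 0\\
1 & \text{if exactly one of } f(X),\ g(X) \text{ is } 0\\
2 & \text{if } f(X) \neq 0 \text{ and } g(X) \neq 0
\end{cases}
\]
Since $p > 2$, the value $2$ is different from $0$ in
${\mathbb Z}/p{\mathbb Z}$, hence $f \oplus g$ vanishes exactly where
both $f$ and $g$ vanish. Applying $\oplus$ repeatedly, from left to
right, extends the argument to $k$ equations.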
772: 
773: 
774: 
775: 
776: 
The following tables specify how the sign algebra
$\{\plus,\moins,\indet\}$ and its operations are coded in the Galois field with three
elements ${\mathbb Z}/3{\mathbb Z}$.
780: $$
781: \begin{array}{ccc}
782: \mbox{sign algebra}& & \mathbb{Z}/3{\mathbb Z}\\
783: \hline
784: \plus  & \rightarrow &  1\\
785: \moins & \rightarrow & -1\\
786: \indet & \rightarrow &  0\\
787: \end{array}
788: \qquad \qquad \qquad
789: \begin{array}{ccc}
790: \mbox{sign algebra}& & \mathbb{Z}/3{\mathbb Z}\\
791: \hline
792: e_1 + e_2 & \rightarrow & - \overline{e_1}.\overline{e_2}.(\overline{e_1} + \overline{e_2})\\
793: e_1 \times e_2 & \rightarrow & \overline{e_1}.\overline{e_2}\\
794: e_1 \approx e_2 & \rightarrow & \overline{e_1}.\overline{e_2}.(\overline{e_1} - \overline{e_2})
795: \end{array}
796: $$
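
The coding of the qualitative sum can be checked case by case. Writing
$s(x,y) = -xy(x+y)$ (the name $s$ is introduced here only for this
check), we get in ${\mathbb Z}/3{\mathbb Z}$:
\[
s(1,1) = -2 = 1, \qquad
s(-1,-1) = 2 = -1, \qquad
s(1,-1) = s(-1,1) = 0, \qquad
s(0,y) = s(x,0) = 0
\]
These are the codes of $\plus + \plus = \plus$,
$\moins + \moins = \moins$, $\plus + \moins = \indet$ and
$\indet + e = \indet$ respectively.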
797: 
798: Finally a qualitative system $\{ e_1,\dots,e_n \}$ is coded as the
799: polynomial $\overline{e_1} \oplus \dots \oplus \overline{e_n}$.
800: A similar coding for the qualitative
801: algebra $\{\plus,\moins,\zero,\indet\}$ uses the Galois field 
802: ${\mathbb Z}/5{\mathbb Z}$ and will not be presented here.
803: 
With this coding, a qualitative system has a solution if and only
if the corresponding polynomial has a root with no null
component. Roots with a null component are excluded since $\indet$ values are
not allowed in solutions of qualitative systems. In general we will have to add
the polynomial equations $X_i^2 = 1$, one for each variable, to ensure this.
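
As an illustration, the equation $-a + b \approx +$ considered above
can be coded step by step. Writing $A$ and $B$ for the codes of $a$ and
$b$ (names chosen here for convenience), the tables give
\[
\overline{-a+b} = -(-A)B(-A+B) = AB(B-A), \qquad
\overline{-a+b \approx +} = AB(B-A)\,\bigl(AB(B-A)-1\bigr)
\]
For $A,B \in \{1,-1\}$, the quantity $AB(B-A)$ equals $0$ when $A = B$,
$1$ when $(A,B) = (-1,1)$ and $-1$ when $(A,B) = (1,-1)$. The coded
polynomial vanishes in the first two cases and takes the value
$(-1)(-2) = 2 \neq 0$ in the last one, so its only non-root without
null component is $(A,B) = (1,-1)$, as expected.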
809: 
810: 
811: \subsection{An efficient representation of polynomial functions}
812: \label{sec:rep_func}
813: 
Recall that our purpose is to efficiently solve an NP-complete
problem. The coding of a qualitative system as a
polynomial equation is obviously polynomial in the size of the system
(number of variables plus number of equations), so solving a
polynomial system of equations over a Galois field is itself an NP-complete
problem; it is more or less the SAT problem. Unless $P = NP$, there is
therefore no hope of finding a representation of polynomial
functions that allows polynomial systems of equations to be solved in
polynomial time.
822: 
Nevertheless, there exists a representation of polynomial functions on
Galois fields which gives, in practice, good performance for
polynomials with hundreds of variables. This kind of representation
was first used for logical functions, which may be considered as
polynomial functions over the field ${\mathbb Z}/2{\mathbb Z}$. This
representation is known as BDDs (Binary Decision Diagrams) and is
widely used in the verification of logic circuits \cite{BDD} and in model checkers
such as NuSMV \cite{SMV}.
831: 
We present here this representation for the field ${\mathbb
Z}/3{\mathbb Z}$. Other Galois fields
can be treated in the same way. The starting point is a generalization of the Shannon
decomposition for logical functions:
\[
p(X_1,X) = (1-X_1^2)p_{[X_1=0]}(X)+X_1(-X_1-X_1^2)p_{[X_1=1]}(X) +
X_1(X_1^2-X_1)p_{[X_1=2]}(X) 
\]
where $p$ is a polynomial function of $n$ variables and $X$ denotes the
remaining variables $X_2,\ldots,X_n$. This
decomposition leads to a tree representation of the polynomial
function: the variable $X_1$ is the root and has three children. Each
of these is obtained by instantiating $X_1$ to 0, 1 or 2 (that is, $-1$) in
$p(X_1,X)$. This representation is exponential ($3^n$ leaves) as each
non-constant node has 3 children. It also depends on a chosen order on
the variables.
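
The three coefficients of such a decomposition are the indicator
functions of $X_1 = 0$, $X_1 = 1$ and $X_1 = 2$. One way to obtain them
(a standard interpolation argument, sketched here as a check) is to note
that $1 - (X_1 - a)^2$ equals $1$ when $X_1 = a$ and $0$ otherwise,
since the square of a non-zero element of ${\mathbb Z}/3{\mathbb Z}$ is
$1$. Expanding modulo 3 for $a = 0, 1, 2$ gives
\[
1 - X_1^2, \qquad
1 - (X_1-1)^2 = -X_1 - X_1^2, \qquad
1 - (X_1-2)^2 = X_1 - X_1^2
\]
For instance, at $X_1 = 2$ the three expressions evaluate to
$1 - 4 = 0$, $-2 - 4 = 0$ and $2 - 4 = 1$ modulo 3. Since $X_1^3 = X_1$
for every element of ${\mathbb Z}/3{\mathbb Z}$, each indicator can be
written in several equivalent ways as a polynomial.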
847: 
Then a key observation (see \cite{BDD}) is that several
subtrees are identical: they have the same variable as root variable
and isomorphic children. If we decide to represent each type
of tree only once, then the tree representation is transformed into a directed
acyclic graph. With this representation there is no more redundancy
among subtrees. The result may be a dramatic decrease in the size of
the representation of a polynomial function.
855: 
856: \begin{figure*}[hbt]
857: \begin{center}
858: \includegraphics[width=12cm]{tdd.eps}
\caption{From tree representation to directed acyclic graph for $X^2
(Y+1)$. The tree has 13 nodes while the DAG representing the same
function has 5 nodes.}\label{itree-dag}
862: \end{center}
863: \end{figure*}
864: 
A property of the Shannon-like decomposition is that many
operations on polynomial functions are recursive with respect to this
decomposition. More precisely let
\[
p^i(X_1,X) = (1-X_1^2)p^i_0(X)+X_1(-X_1-X_1^2)p^i_1(X) +
X_1(X_1^2-X_1)p^i_2(X) 
\]
$i=1,2$ be two polynomial functions with $p^i_{\alpha}(X) = p^i_{[X_1 =
\alpha]}(X)\ ,\ \alpha = 0,1,2$. Then for any binary operation $\Delta$
defined pointwise on polynomial functions (such as the sum, the product or $\oplus$),
\[
p^1 \Delta p^2 = (1-X_1^2) (p^1_0 \Delta p^2_0 ) + X_1(-X_1-X_1^2)
(p^1_1 \Delta p^2_1 )+  X_1(X_1^2-X_1) ( p^1_2 \Delta p^2_2 )
\] 
Applied naively, this kind of recursive formula leads to an exponential
complexity for any computation. Again, it is possible to take advantage
of the redundancy by using a cache to remember the result of each operation. This
technique is known as
memoisation in computer algebra. A 40\% cache hit rate is commonly observed. 
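
For instance, take $p^1 = X_1$ and $p^2 = X_1^2$ (two functions picked
here only for illustration) and let $\Delta$ be the sum. Their
components are
\[
(p^1_0,p^1_1,p^1_2) = (0,1,2), \qquad
(p^2_0,p^2_1,p^2_2) = (0,1,1), \qquad
(p^1_0 + p^2_0,\ p^1_1 + p^2_1,\ p^1_2 + p^2_2) = (0,2,0)
\]
so $p^1 + p^2$ is twice the indicator function of $X_1 = 1$, that is
$2(-X_1 - X_1^2) = X_1 + X_1^2$ modulo 3, as expected.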
884: 
More complex operations on polynomial functions are also implemented
with a recursive scheme and memoisation. Let us just mention
quantifier elimination, which is one of the most useful for our
purpose. 
889: 
This representation of polynomial functions on Galois fields
also has several drawbacks:
\begin{itemize}
\item the memory size heavily depends on the order of variables. The
      libraries implementing such computations always provide reordering
      algorithms. 
    \item for each order, there exist polynomial functions whose
          representation is exponential in memory size.
\end{itemize}
899: 
Nevertheless, in practice, this representation has proved to be very
efficient for polynomial functions with several hundred
variables. The computations performed on our toy model and on another,
real-size one
used a program named SIGALI, which is devoted to the representation of
polynomial functions on ${\mathbb Z}/3{\mathbb Z}$. Several algorithms were
added to this program in order to answer questions of biological
interest. 
908: 
909: \section{Qualitative models and experimental data}
910: \label{sec:qua_exp}
In this section, we show how to compute some properties of a
qualitative system, and possibly gain some insight into the biological
model it represents. The algorithms we derive rely heavily on the
representation introduced above. Hence, not only can they deal
efficiently in practice with computationally hard problems, but they
are also expressed in a rather simple and generic fashion.
917: 
918: Let $M$ be a qualitative model represented by its associated
919: interaction graph $G(V,E)$. Recall that $V$ is the set of
920: variables. Let $V_O$ be the set of observed variables, and 
921: $o_i \in \{ \plus,\moins \}$ for $i \in V_O$ the experimental
922: observations. As explained in the previous section, the 
923: qualitative system derived from $M$ can be coded as a polynomial
924: function $P_M(X_1,\ldots,X_n)$.  Roots of $P_M$ correspond to
925: solutions of the qualitative system. 
926: 
927: \subsection{Satisfiability of the qualitative system}
928: \label{sec:data_agree}
929: 
A property of the coding described above is that the system has no
solution iff $P_M$ is equal to the constant polynomial
$1$. At the other extreme, if $P_M=0$, the qualitative equations do not
constrain the variables at all.
934:  
935: Now if some observations $o_i$ for $i \in V_O$ are available, checking
936: their consistency with the model $M$ boils down to instantiating 
937: $X_i = o_i$  in $P_M(X_1,\ldots,X_n)$, for all $i \in V_O$, and
938: testing whether the resulting polynomial is different from 1.
939: 
We computed the polynomial $P_L$ associated with our toy example (see section
\ref{sec:we}) and it has roots. Recall that this does not guarantee the
existence of some (quantitative) differential model conforming to the
interaction graph depicted in Fig. \ref{igraph2}. Satisfiability of
the qualitative system is only a necessary condition for the model to
be correct. 
946: 
947: 
948: %% As explained in section \ref{sec:we}, our toy example
949: %% focus on a fasting experiment. From the literature, we derive the
950: %% following hypothetical:
951: 
952: %% \begin{Observations}
953: %% \begin{verbatim}
954: %% ACL   = -1                      FAS  = -1
955: %% ACC   = -1                      SCD1 = -1
956: %% SREBP = -1                      PPAR =  1
957: %% LXR   =  1                      PUFA =  1
958: %% \end{verbatim}
959: %% \end{Observations}
960: 
961: The polynomial obtained by instantiating observations into $P_L$
962: is different from 1, meaning that our model does not contradict
963: generally observed variations during fasting.
964: 
965: Large size models might advantageously be reduced
966: using standard graph techniques.
967: First we look for connected components in the interaction graph. A
968: graph with several
969: connected components represents a coherent qualitative model iff 
970: each component is coherent. 
971: Second, a node without successor except itself appears only
972: in its associated equation. If this node is not observed, its
973: associated qualitative equation adds no constraint on the other
974: nodes. So, at least for satisfiability checking, this node can be
975: suppressed and its qualitative equation removed from the system. This
976: procedure is applied iteratively, until no node can be deleted.
977: The resulting graph leads to a new qualitative system which is 
978: satisfiable iff the initial system is satisfiable. 
979: 
980: 
981: \subsection{Correcting data or model}
982: 
983: 
984: If the qualitative system, given some experimental observations, is
985: found to have no solution, it is of interest to propose some
986: correction of the data and/or the model. By correction, we mean
987: inverting the sign of an observed variable or the sign of an edge of
988: the interaction graph. In
989: the general case, there are several possibilities to make the system
990: satisfiable, and we need some criterion to choose among them. We
991: applied a parsimony principle: a correction of the data should imply a
992: minimal number of sign inversions.
993: 
In the following, we show how to compute all minimal corrections of the
data. Given $(o_i)_{i \in V_O}$ a vector of experimental observations
which is not compatible with the model, we compute all vectors
$(o'_i)_{i \in V_O}$ which are compatible with the model and
such that the Hamming distance between $o$ and $o'$ is minimal. By
Hamming distance, we mean the number of positions at which $o$ and
$o'$ differ. The set of such vectors $o'$ might be very large; but again, by 
encoding it as the set of roots of a polynomial function, we obtain a 
compact representation.
1003: 
1004: 
This procedure can be extended in a straightforward manner to
corrections of edge signs in the interaction graph. This is done by
considering these signs as variables of the model. For ease of
presentation, we only detail data correction. 
1009: 
1010: 
1011: 
1012: \begin{algorithm}
1013: {
1014:   \small
1015:   \dontprintsemicolon
1016:   \KwIn{\\
1017:     \qquad $P$, a polynomial function on variables $V$ \\
1018:     \qquad $i \in V$ 
1019:   }
1020:   \KwOut{\\
1021:     \qquad $C$, a polynomial function encoding all minimal corrections\\
1022:     \qquad $d$, minimal number of corrections
1023:   }
1024:   \BlankLine
1025:   \eIf{$P$ is constant}{
1026:     \eIf{$P$ = 0}
1027:    {\KwResult{$C=0$, $d=0$}}
1028:    {\KwResult{$C=1$, $d= \infty$}}
1029:   }{
1030:     let $P_0$, $P_1$, $P_2$ be the Shannon decomposition of $P$ with respect to
1031:     variable $X_i$,\\
1032:     and $(C_j,d_j)$ the result obtained by recursively applying the
1033:     algorithm on $P_j$ and $i+1$ for $j \in \{0,1,2\}$
1034:     \BlankLine
1035:     let $d'_j = 
1036:     \left\{
1037:     \begin{array}{ll}
1038:       d_j + 1 & \mbox{if } i \in V_O \mbox{ and } o_i \neq j\\
1039:       d_j     & \mbox{otherwise}
1040:     \end{array}
1041:     \right.$
1042:      and $C'_j = 
1043:     \left\{
1044:     \begin{array}{ll}
1045:       (X_i - j) \oplus C_j & \mbox{if } i \in V_O \\
1046:       C_j     & \mbox{otherwise}
1047:     \end{array}
1048:     \right.$\\
1049:      \BlankLine
1050:     
1051:      \KwResult{$d = \min\ d'_j$, $C = \displaystyle\prod_{j,\ d'_j = d} C'_j$}
1052:   }
1053: }
1054: \caption{Algorithm for experimental data correction.}
1055: \label{algo:hamming}
1056: \end{algorithm}
1057: 
1058: Let us illustrate this algorithm on our toy example: during fasting
1059: experiments, synthesis of fatty acids tends to be inhibited, while
1060: oxidation, which produces ATP, is activated. In particular ACC, ACL,
FAS and SCD1 are involved in the same pathway to produce saturated and
monounsaturated fatty acids. As expected, they are known to decline
1063: together at fasting. Suppose we introduce some wrong observation, say
1064: for instance an increase of ACL, while keeping all other
1065: observations given above. The polynomial obtained from $P_L$ including
1066: these new observations is equal to 1, and hence has no solution.
1067: Applying algorithm \ref{algo:hamming}, we recover this error. Now if
1068: we wrongly change two values, say ACL and ACC to 1, the algorithm
1069: proposes a different correction, namely to change the observed value
1070: of SREBP to 1, which is more parsimonious.
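
On a much smaller instance, the recursion of algorithm
\ref{algo:hamming} can be followed by hand. Assume, purely for
illustration, a single observed variable $X_1$ with observation
$o_1 = 1$, and the polynomial $P = X_1 + 1$ over
${\mathbb Z}/3{\mathbb Z}$, whose only root is $X_1 = 2$ (the code of
$\moins$). The Shannon decomposition gives the constants $P_0 = 1$,
$P_1 = 2$ and $P_2 = 0$, hence $(C_0,d_0) = (C_1,d_1) = (1,\infty)$ and
$(C_2,d_2) = (0,0)$. Since $X_1$ is observed and $o_1 \neq 2$,
\[
d'_0 = d'_1 = \infty, \qquad d'_2 = 0 + 1 = 1, \qquad
C'_2 = (X_1 - 2) \oplus 0 = (X_1 - 2)^2
\]
The algorithm thus returns $d = 1$ and $C = (X_1 - 2)^2$, whose unique
root $X_1 = 2$ states that the observation has to be inverted, at the
cost of a single correction.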
1071: 
1072: \subsection{Experiment design}
1073: \label{sec:exp_des}
1074: 
1075: It is often the case that not all variables in the system under study
1076: can be observed. Biochemical measurements of metabolites can be costly
1077: and/or time consuming. By experiment design, we mean here the choice
1078: of the variables to observe so that an experiment might be
1079: informative. 
1080: 
1081: Let $P_M(X_O,X_U)$ be the polynomial function coding for the
1082: qualitative system $M$. $X_O$ (resp. $X_U$) denotes the state vector
1083: of observed (resp. unobserved) variables. The polynomial function
representing the admissible values of the observed variables is
obtained by elimination of the quantifier in 
$\exists X_U \  P_M(X_O,X_U) = 0$. Let $P_M^O(X_O)$ denote the resulting
polynomial function.
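
For a single eliminated variable $x$, one possible way to carry out
this elimination over ${\mathbb Z}/3{\mathbb Z}$ is to take the product
of the three components of the Shannon-like decomposition, since in a
field a product vanishes iff one of its factors does (this is a
standard argument; we do not claim that it is how the elimination is
implemented in practice):
\[
\exists x\ \ p(x,Y) = 0
\quad \Longleftrightarrow \quad
p_{[x=0]}(Y)\ p_{[x=1]}(Y)\ p_{[x=2]}(Y) = 0
\]
As a hypothetical example, take $p(X_O,X_U) = X_O X_U - 1$. Its
components with respect to $X_U$ are $-1$, $X_O - 1$ and $-X_O - 1$,
whose product is $X_O^2 - 1$. The admissible values of $X_O$ are thus
$1$ and $-1$, which is consistent with the fact that $X_O X_U = 1$ can
be solved for $X_U$ exactly when $X_O \neq 0$.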
1088: 
For some choice of observed variables, it might well be that $P_M^O$
is null, which basically means that the experiment is totally useless.
Note that no improvement can be obtained by taking a subset of
$X_O$. The solution is either to add new observed variables or to
choose a completely different set of observed variables.
1094: 
In order to assess the relevance of a given experiment (namely of a
given observed subset), we suggest computing the following ratio:
1097: number of consistent valuations for observed variables versus the
1098: total number
1099: of valuations of observed variables. A very stringent experiment
1100: has a low ratio. An experiment having a ratio value of one is useless.
1101: 
Again this computation is carried out in a recursive fashion. Let $P$
be a polynomial function representing the
set of admissible observed values. Let $Rat(P)$ be the proportion of
solutions of $P(X)=0$ in the
space $({\mathbb Z}/3{\mathbb Z})^n $, where $n$ is the number of
variables $X$. If $P$ is constant then $Rat(P)=1$
(resp. $Rat(P)=0$) if $P=0$ (resp. $P \neq 0$). Otherwise, let $P_0$,
$P_1$, $P_2$ be a Shannon-like decomposition of $P(X)$
with respect to some variable of $P$. Then it is easy to prove:
1111: \[
1112: Rat(P)\ =\ (Rat(P_0)+Rat(P_1)+Rat(P_2))/3
1113: \]
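
As a check on a small case, take $P(X,Y) = X^2(Y+1)$, the function of
Fig.~\ref{itree-dag}. Decomposing with respect to $X$ gives $P_0 = 0$
and $P_1 = P_2 = Y+1$. The function $Y+1$ takes the values $1$, $2$ and
$0$ at $Y = 0, 1, 2$, so $Rat(Y+1) = (0+0+1)/3 = 1/3$ and
\[
Rat(P)\ =\ \left(1 + \frac{1}{3} + \frac{1}{3}\right)/3\ =\ \frac{5}{9}
\]
This agrees with a direct count: the roots are the three points with
$X = 0$ together with $(1,2)$ and $(2,2)$, that is 5 of the 9 points of
$({\mathbb Z}/3{\mathbb Z})^2$.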
1114: 
1115: The relevance of this approach was assessed on our toy example: for
1116: each subset $O$ of variables in the model, containing at most four
variables, we computed $Rat(P_L^O)$. As expected, the lowest ratios
1118: (i.e. the most stringent experiments) were achieved observing four
1119: variables: either \{SCAP, PUFA, PPAR-a, PPAR\}, or
1120: \{SREBP, SCAP, PUFA, LXR-a\}, or \{SREBP, PPAR-a, PPAR, LXR-a\}.
1121: 
Interestingly, the procedure captures what might be thought of as
1123: control variables, like PUFA/SCAP, SREBP/LXR-a and PPAR/PPAR-a.
1124: The first two pairs control the activation of fatty acids synthesis;
1125: the third one controls fatty acid oxidation.
1126: 
1127: Indeed one can go even further: if we isolate some kind of control
1128: variables, we are naturally interested in knowing how they constrain
other variables. Achieving this amounts to computing the set of
variables whose value is constant over all solutions of the system (the
so-called hard components). Recall that
1132: these hard components of qualitative solutions are also important with
1133: respect to the hypothetical differential model which is abstracted in
1134: the qualitative one. Indeed, all solutions of the quantitative equation for
1135: equilibrium change have the same sign pattern on the hard components. 
1136: Algorithm \ref{algo:hardcomponents} describes a recursive procedure
1137: which finds the set of hard components, together with their value. 
1138: 
1139: \begin{algorithm}
1140: {
1141:   \small
1142:   \dontprintsemicolon
1143:   \KwIn{$P$, a polynomial function on variables $V$}
1144:   \KwOut{\\
1145:     \qquad the set $W \subset V \times \{0,1,2\}$ of hard components,
1146:     together with their values\\
1147:     \qquad a boolean $b$ which is true if $P$ has at least one root
1148:   }
1149:   \BlankLine
1150:   \eIf{$P$ is constant}{
1151:     \eIf{$P$ = 0}
1152:    {\Return{$(\emptyset,\mbox{true})$}}
1153:    {\Return{$(\emptyset,\mbox{false})$}}
1154:   }{
1155:     let $P_0$, $P_1$, $P_2$ be the Shannon decomposition of $P$ with respect to
1156:     variable $X_i$,\\
1157:     and $(W_j,b_j)$ the result obtained by recursively applying the
1158:     algorithm on $P_j$ for $j \in \{0,1,2\}$
1159:     \BlankLine
1160:     let $W = \{ (v,v') | v \in V,\ v' \in \{0,1,2\},\ \forall j\ b_j \Rightarrow (v,v') \in W_j \}$\\
1161:     \If{there exists a unique $j_0$ s.t. $b_{j_0}$ is true}
1162:        {add $(i,j_0)$ to $W$}
1163:     \BlankLine
1164:     \Return{$(W,b_0 \vee b_1 \vee b_2)$}
1165:   }
1166: }
1167: \caption{Determination of hard components}
1168: \label{algo:hardcomponents}
1169: \end{algorithm}
1170: 
Let us set some of the previously found control variables of the toy example
to given values, say PUFA to 1 and LXR to -1. Applying
algorithm \ref{algo:hardcomponents} to the corresponding polynomial yields the
following hard components: 
1175: \begin{verbatim}
1176: ACL     = -1             FAS   = -1
1177: ACC     = -1             LXR-a = -1
1178: SCAP    = -1             SREBP = -1
1179: SREBP-a = -1             PPAR  = -1
1180: PPAR-a  = -1
1181: \end{verbatim}
which, as expected, corresponds to the inhibition of fatty acid
synthesis.
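
The recursion of algorithm \ref{algo:hardcomponents} can also be
followed by hand on a small polynomial, chosen here only for
illustration: $P = X^2(Y+1) \oplus (X^2 - 1)$ over
${\mathbb Z}/3{\mathbb Z}$, that is $P = (X^2(Y+1))^2 + (X^2-1)^2$,
whose roots are the points with $X \neq 0$ and $Y = 2$. Decomposing
with respect to $X$ gives $P_0 = 1$, a non-zero constant, and
$P_1 = P_2 = (Y+1)^2$. The function $(Y+1)^2$ takes the values $1$, $1$
and $0$ at $Y = 0, 1, 2$, so only the branch $Y = 2$ has a root and the
recursive calls return
\[
(W_0,b_0) = (\emptyset,\mbox{false}), \qquad
(W_1,b_1) = (W_2,b_2) = (\{(Y,2)\},\mbox{true})
\]
At the top level, $W = W_1 \cap W_2 = \{(Y,2)\}$, and $X$ is not added
since two branches have roots. The result is
$(\{(Y,2)\},\mbox{true})$: $Y$ is a hard component with value $2$ (the
code of $\moins$), while $X$ is not.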
1184: 
1185: 
1186: 
1187: 
1188: \subsection{Real size system}
1189:  We have used our new technique to check the consistency of a
1190: database of molecular interactions involved in the genetic regulation of fatty
1191: acid synthesis.
1192: In the database, interactions were classified as behavioral or biochemical. 
1193: \begin{itemize}
1194: \item a behavioral interaction describes the effects of a variation of
1195:       a product concentration. It is either direct or indirect (unknown
1196:       mechanism).
1197:     \item a biochemical interaction may be a gene transcription, a
1198:           reaction catalyzed by an enzyme ... Such molecular
1199:           interactions can be found in existing databases. They need
1200:           a behavioral interpretation.
1201: \end{itemize}
1202:  All the behavioral interactions were manually extracted from a 
1203: selection of scientific papers. Biochemical interactions were
1204: extracted from  public databases available on the Web 
1205: (Bind~\cite{Bind}, IntAct~\cite{IntAct}, Amaze~\cite{Amaze},
1206: KEGG~\cite{Kegg} or TransPath~\cite{TransPath}).
1207: A biochemical interaction may be linked to a behavioral interpretation
1208: in the database.
1209: 
1210: The database is used to generate the interaction graph. While
1211: behavioral interactions directly correspond to edges in the graph, 
1212: biochemical interactions are given a simplified
1213: interpretation. Roughly, any increase of a reaction input induces an
1214: increase of the outputs.
1215: 
The interaction graph which is built from the database contains more than 600
vertices and more than 1400 edges. Even so, the
resulting graph is clearly not a comprehensive model of the genetic
regulation of fatty acid synthesis in liver. Nevertheless, our aim is to see
how far this model can account for experimental observations, and
to propose some corrections when it cannot.
1222: 
1223: We used our technique to check the coherence of the whole model. After
1224: reducing the graph with standard graph techniques as described in
1225: section \ref{sec:data_agree}, we found that the model was incoherent. The reduced
1226: graph has about 150 nodes. We
1227: developed a heuristic to isolate minimal incoherent sub-systems. It
1228: turned out that all the contradictions we detected resulted from arguable
1229: interpretations of the literature.
1230: 
1231:   
1232: \section{Conclusion}
1233: \label{sec:conclusion}
1234: 
1235: In this paper we proposed a qualitative approach for the analysis
1236: of large biological systems. We rely on a framework more thoroughly
1237: described in \cite{Biosystems}, which is meant to model the comparison
1238: between two experimental conditions as a steady state shift. This
1239: approach fits well with
state-of-the-art biological measurement techniques, which provide
rather noisy data for a large number of targets. It is also 
1242: well suited to the use of biological knowledge, which is most of the time 
1243: descriptive and qualitative. 
1244: 
This qualitative approach is all the more attractive as we can rely
on new analysis methods for qualitative systems. This new technique is
also introduced in this paper and is original in
qualitative modeling. It relies on a representation of qualitative
constraints by decision diagrams. Not only does this have a major impact on
the scalability of qualitative reasoning, but it also makes it possible to
derive many algorithms in a quite generic fashion.
1252: 
We plan to validate our approach on pathways which are published for
yeast and \emph{E.~coli}. Not only are these pathways of significant size, but
microarray data for these species are also publicly available. Concerning
the scalability of the methods, qualitative systems with up to 200
variables are handled within a few minutes.
1258: 
1259: 
1260: On the theoretical side, we study applications of our algebraic
1261: techniques to network reconstruction, as proposed in \cite{Wagner04}.
1262: The problem is to infer direct actions between products, based on
1263: large scale perturbation data, in order to obtain the most
1264: parsimonious interaction graph. Our approach could lead to a
1265: reformulation of this problem in terms of polynomial operations. 
1266: Indeed, finding a minimal regulation network from a minimal polynomial
1267: representation has already been described in
1268: \cite{Laubenbacher04}, though it was applied to a rather different
1269: type of network. A similar approach tailored to the framework
1270: described in this paper could eventually lead to original and
1271: practical algorithms for network reconstruction.
1272: 
1273: \paragraph{Acknowledgment} This research was supported by ACI IMPBio, a French Ministry for Research program on interdisciplinarity. 
1274:  
1275: 
1276: \bibliographystyle{plain}
1277: \bibliography{eccs}
1278: \end{document}
1279: