1: \documentclass[11pt,a4paper]{article}
2: \usepackage{amsfonts,amssymb,latexsym, amsmath,epsfig,theorem}
3: \usepackage[latin1]{inputenc}
4: \usepackage{fullpage}
5: \usepackage{algorithm2e}
6: %\usepackage{multicol}
7:
8: % \usepackage{times}
9: \usepackage{mathptmx}
10: \usepackage{textcomp}
11:
12: \newcommand{\rem}{\bf \em}
13: \newcommand{\vect}[1]{{\bf #1}}
14: \newcommand{\ord}[1]{\ensuremath{\mathcal{O}\!\left(#1 \right)} }
15: \newcommand{\DP}[2]{ \ensuremath{ \frac{\partial #1 }{\partial #2 } } }
16: \newcommand{\D}[2]{ \ensuremath{ \frac{d #1 }{d #2 } }}
17: \newcommand{\correction}[1]{ {\bf #1}}
18: \newcommand{\comment}[1]{ \begin{center} \fbox{\begin{minipage}[h]{.9\linewidth} #1 \end{minipage}} \end{center}}
19:
20: \newcommand{\noun}[1]{\textsc{#1}}
21: %% Bold symbol macro for standard LaTeX users
22: \providecommand\boldsymbol[1]{\mbox{\boldmath $#1$}}
23:
24: \newcommand\BS[1]{\mbox{\fontseries{b}\selectfont #1}}
25: \newcommand\plus{\BS{+}}
26: \newcommand\moins{\BS{\textminus}}
27: \newcommand\plusmoins{\BS{\textpm}}
28: \newcommand\indet{\BS{?}}
29: \newcommand\zero{\BS{0}}
30:
31: \newcommand\DEF{\stackrel{\text{\tiny def}}{=}}
32: \newcommand\actif[1]{\ensuremath{\overline{\text{#1}}}}
33:
34:
35: \theoremstyle{break}\newtheorem{Theorem}{Theorem}[section]
36: \theoremstyle{break}\newtheorem{Prop}[Theorem]{Property}
37: \theoremstyle{break}\newtheorem{Lemma}[Theorem]{Lemma}
38: \theoremstyle{break}\newtheorem{Def}[Theorem]{Definition}
39:
40: \theoremstyle{break}\newtheorem{System}{System}
41: \theoremstyle{break}\newtheorem{Data}{Data}
42: \theoremstyle{break}\newtheorem{Observations}{Observations}
43:
44: \DeclareMathOperator{\pred}{pred}
45: \DeclareMathOperator{\sgn}{sgn}
46:
47: % ************ Brouillon *************
48: %\setlength\overfullrule{5pt}
49: %\newcommand\XXX[1]{[XXX #1]}
50:
51: %\pagestyle{fancy}
52: %\lhead{}
53: %\chead{}
54: %\rhead{}
55: %\renewcommand\headrulewidth{0pt}
56: %\renewcommand\footrulewidth{0pt}
57: %\addtolength\footskip{\baselineskip}
58: %\addtolength\footskip{\baselineskip}
59: %\lfoot{Soumission à \textsc{Jobim}}
60: %\cfoot{\today}
61: %\rfoot{\thepage}
62:
63: \begin{document}
64:
65: \title{Complex Qualitative Models in Biology: a new approach}
66:
67:
68: \author{ P. \noun{Veber}$\ ^1$
69: \and M. \noun{Le Borgne}$\ ^1$ \and A. \noun{Siegel}$\ ^1$
70: \and S. \noun{Lagarrigue}$\ ^3$ \and
71: O. \noun{Radulescu}$\ ^2$ }
72:
73: \date{}
74: %\affiliation{$^{*}$ ENSA Rennes\\
75: %$^{\dag}$ UMR de génétique animale, INRA, Rennes\\
76: %$^{\ddag}$ Projet Symbiose, IRISA, Rennes\\
77: %$^{\mathsection}$ IRMAR UMR-CNRS 6625, Rennes\\
78: %$^{**}$ INSERM U620, Rennes}
79:
80:
81: \maketitle
82:
83:
84:
85: {\footnotesize $^1$ Projet Symbiose. Institut de Recherche en
86: Informatique et Syst\`emes Al\'eatoires, IRISA-CNRS 6074-Université de
87: Rennes 1, Campus de Beaulieu, 35042 Rennes Cedex, France}
88:
89: {\footnotesize $^2$ Institut de Recherche
90: Math\'ematique de Rennes, UMR-CNRS 6625, Université de Rennes 1, Campus de Beaulieu, 35042 Rennes Cedex, France}
91:
92: {\footnotesize $^3$ UMR Génétique animale, Agrocampus Rennes-INRA, 65 rue de Saint-Brieuc, CS 84215 Rennes, France}
93:
94: \smallskip
95: \paragraph{Abstract.}
96: %\abstract{
97: {\em
98: We advocate the use of qualitative models in the analysis of large biological
99: systems. We show how qualitative
100: models are linked to theoretical differential models and practical
101: graphical models of biological networks. A new technique for analyzing qualitative models is
102: introduced, which is based on an efficient representation of
103: qualitative systems. As shown through several applications, this
104: representation is a relevant tool for the understanding and testing of
105: large and complex biological networks.
106: }
107:
108:
109:
110: \section{Introduction}
111: \label{sec:intro}
112:
Understanding the behavior of a biological system from the interplay
of its molecular components is a particularly difficult task. A
model-based approach provides a framework to express hypotheses
about a system and to derive predictions from them, which can then be compared
with experimental observations. Traditional approaches (see
\cite{DJ02} for
an interesting review) include ordinary differential equations and
stochastic processes. While they are powerful tools for acquiring
fine-grained knowledge of the system at hand, these frameworks need
accurate experimental data on chemical reaction kinetics, which are
rarely available. Furthermore, they are computationally
demanding, and their practical use is restricted to a limited number of
variables.
126:
In response to these issues, many approaches have been proposed that
abstract away from the quantitative details of the system. Among others, let us
stress the work done on gene regulation dynamics \cite{DJGHP04+}, hybrid
systems \cite{GT04} and discrete event systems \cite{CRCDF04+,CRRT04}.
The goal of such
qualitative frameworks is to enable system-level analysis of a
biological phenomenon. This is a relevant answer to recent
technical breakthroughs in experimental biology:
\begin{itemize}
\item microarrays, mass spectrometry and protein chips now make it possible to
  measure thousands of variables simultaneously,
\item the measurements obtained are rather noisy, and may not be
  quantitatively reliable.
\end{itemize}
141:
Microarrays, for instance, are used to compare the activity of genes
between two experimental settings. A microarray experiment gives a
differential measure: it delivers information on the relative activity of each
gene represented on the array. Despite many attempts to
quantify the output of microarrays, the essential output of the
technique is qualitative; it says, for example, that a gene
G is more active in situation A than in situation B.
150:
In this paper, we use a framework developed in \cite{Biosystems} for
the comparison of two experimental conditions, in order to derive
qualitative constraints on the possible variations of the
variables. Our main contribution is the use of an efficient
representation for the set of solutions of a qualitative system.
This representation makes it possible to solve systems with
hundreds of variables. Moreover, it opens the way to finer
analysis of qualitative systems. This new approach is
illustrated by solving three important problems:
\begin{itemize}
\item checking the consistency of a qualitative system with qualitative
  experimental data,
\item minimally correcting corrupted data that are inconsistent with a
  model,
\item helping in the design of experiments.
\end{itemize}
167:
Our main focus here is to show how to use large qualitative models and
qualitative interpretations of experimental data. In this respect our
work can be seen as an extension of what was proposed in \cite{GRRLH03+},
where the authors analyze pangenomic gene
expression arrays in \emph{E.~coli} using simple qualitative rules.

In Section~\ref{sec:maths} we establish links between differential,
graphical and qualitative models.
176:
177: \section{Mathematical modeling}
\label{sec:maths} In this section we show how qualitative models can
be linked to more traditional differential models. Differential
models are central to the theory of metabolic control
\cite{meta-control,meta-control2}. They have also been
applied to various aspects of gene network dynamics.
%(cycles (TODO citer
%tyson, goldbeter), switches (TODO citer hasty),etc.).
The purpose of this section is to lay down a set of qualitative
equations describing steady-state shifts of differential models.
For the sake of completeness, we rederive in a simpler case results
that have been established in greater generality in
\cite{Biosystems,Radulescu05}.
190:
191: \subsection{Modeling assumptions}
192:
193: Let us consider a network of interacting cellular constituents,
194: numbered from 1 to $n$. These constituents may be proteins, RNA
195: transcripts or metabolites for instance. The state vector $X$
196: denotes the concentration of each constituent.
197:
198:
199: \paragraph{Differential dynamics}
200:
201: $X$ is assumed to evolve according to the following differential
202: equation:
203:
204: $$ \D{X}{t}=F(X) $$
205:
206: \noindent where $F$ is an (unknown) nonlinear, differentiable function. A
207: steady state $X_{eq}$ of the system is a solution of the algebraic
208: equation:
209:
210: $$ F(X_{eq}) = 0.$$
211:
212:
A steady state is asymptotically stable if it attracts all nearby
trajectories. A steady state is non-degenerate if the Jacobian determinant
evaluated at that steady state does not vanish.
According to the Grobman--Hartman theorem, a sufficient condition for a
steady state to be non-degenerate and asymptotically stable is
$\mathrm{Re}(\lambda_i) < -C$ for some $C>0$ and all $i=1,\ldots,n$, where the $\lambda_i$ are the
eigenvalues of the Jacobian matrix evaluated at the steady state.
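
As a minimal illustration (a hypothetical one-variable model, not taken
from the networks studied here), consider a single species produced at a
constant rate $k>0$ and degraded at rate $d>0$:
$$ \D{X}{t} = F(X) = k - d\,X . $$
The unique steady state is $X_{eq} = k/d$. The Jacobian reduces to the
scalar $F'(X_{eq}) = -d$, whose only eigenvalue is $\lambda_1 = -d$. It
is nonzero and satisfies $\mathrm{Re}(\lambda_1) < -C$ with, for
instance, $C = d/2$, so this steady state is non-degenerate and
asymptotically stable.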
221:
222:
223:
224:
225: \paragraph{Experiment modeling}
Typical two-state experiments such as differential microarrays are
modeled as steady-state shifts. We suppose that under a change of
the control parameters in the experiment, the system goes from one
non-degenerate stable steady state to another one.
The output of the two-state experiment can be expressed in terms of
concentration variations for a subset of products, between the two
states. We suppose that the signs of these variations have been shown to
be statistically significant.
234:
235:
236: \paragraph{Interaction graph}
237: The only knowledge we require about the function $F$ concerns the
238: signs of the derivatives $\DP{F_i}{X_j}$. These are interpreted as
239: the action of the product $j$ on the product $i$. It is an
240: activation if the sign is $\plus$, an inhibition if the sign is
241: $\moins$. A null value means no action.
242:
243: An interaction graph $G(V,E)$ is derived from the Jacobian matrix
244: of $F$:
245: \begin{itemize}
246: \item with nodes $V = \{1,\dots,n\}$ corresponding to products
247: \item and (oriented) edges $E = \{ (j,i) | \DP{F_i}{X_j} \neq 0
248: \}$. Edges are labeled by $s(j,i) = \sgn(\DP{F_i}{X_j})$.
249: \end{itemize}
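
For instance, consider the following two-species system, chosen here
only for illustration, with positive constants $a$, $b$, $d_1$, $d_2$:
$$ F_1(X) = \frac{a}{1+X_2} - d_1 X_1, \qquad F_2(X) = b\,X_1 - d_2 X_2 . $$
Its Jacobian matrix and the corresponding sign pattern (for
non-negative concentrations) are
$$
\left( \DP{F_i}{X_j} \right)_{i,j} =
\left(\begin{array}{cc}
-d_1 & -\dfrac{a}{(1+X_2)^2} \\
b & -d_2
\end{array}\right),
\qquad
\left(\begin{array}{cc}
\moins & \moins \\
\plus & \moins
\end{array}\right).
$$
The interaction graph therefore has the edge $(2,1)$ labeled
$s(2,1) = \moins$ (species 2 inhibits species 1), the edge $(1,2)$
labeled $s(1,2) = \plus$ (species 1 activates species 2), and one
negative self-loop on each node coming from the degradation terms.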
250:
The set of predecessors of a node $i$ in $G$ is denoted $\pred(i)$. In practice, the
interaction graph is built from information gathered
in the literature. Consequently, in some places it may be
incomplete (some interactions may be missing), and in others it may be
redundant (some interactions may appear several times, as direct and
indirect interactions). It is important that neither
incompleteness nor redundancy introduces inconsistencies;
this will be addressed in section \ref{sec:qua_exp}.
259:
260:
261:
262: \paragraph{Negative diagonal in the Jacobian matrix}
263: For any product $i$, we exclude the possibility of vanishing
264: diagonal elements of the Jacobian $\DP{F_i}{X_i}$. This can be
265: justified by taking into account degradation and dilution (cell
266: growth) processes that can be represented as negative self-loops in
267: the interaction graph, that is for all $i$, $(i,i) \in E$ and
268: $s(i,i) = \moins$.
269:
270:
271:
272:
\paragraph{Discussion}
In our mathematical modeling we suppose that the system
starts and ends in non-degenerate stable steady states. Of course
this is not always the case, for several reasons: the time needed to
reach a steady state may be too long, or the system may end up in a limit cycle and
oscillate instead of reaching a steady state.
279: %(TODO citer quelques
280: %exemples P53,NFKB, circadian clocks).
281: All these possibilities should
282: be considered with caution. Actually this hypothesis
283: might be difficult to check from the two states only. Complementary
284: strategies such as time series analysis
285: %(TODO citer Hoffmann et Uri Alon)
286: could be employed in order to assess the possibility of limit cycle
287: oscillations.
288:
289: Positive self-regulation is also possible but introduces
290: a supplementary complication.
291: In this case
292: for certain values of the concentrations
293: degradation exactly compensates the positive self-regulation and
294: the diagonal elements of the Jacobian vanish (this is
295: a consequence of the intermediate value theorem).
296: We can avoid dealing with this situation by
297: considering that the positive self-regulation does not act directly
298: and that it involves intermediate species.
299: This is a realistic assumption because a molecule never really acts directly
300: on itself (transcripts can be auto-regulated but only via protein
301: products).
302: Thus, all nodes can keep their negative self-loops and all diagonal
303: elements of the Jacobian can be considered to be non-vanishing.
304: Although the positive regulation may imply vanishing higher order
305: minors of the Jacobian, this will not affect our local qualitative
306: equations.
307:
308:
309:
310:
311:
312: \subsection{Quantitative variation of one variable}
We focus
here on the variation of the concentration of a single chemical
species represented by a component $X_i$ of the vector $X$. Since we
have adopted a {\em static} point of view, we are only interested in
the variation of $X_i$ between two non-degenerate stable steady
states $X_{eq}^1$ and $X_{eq}^2$, independently of the trajectory of
the dynamical system between the two states.
320:
321: Let us denote by $\hat{X}_i$ the vector of dimension $n_i$ obtained
322: by keeping from $X$ all coordinates $j$ that are predecessors of $i$
323: in the interaction graph. Then, under some additional assumptions
324: described and discussed in \cite{Radulescu05}, we have the following
325: result:
326:
327: \begin{Theorem}
328: \label{math:var}
The variation of the concentration
of species $i$ between two non-degenerate steady states $X_{eq}^1$
and $X_{eq}^2$ is given by
332: \begin{equation}
333: X_{eq_i}^1 - X_{eq_i}^2 = \int_S -\left( \DP{F_i}{X_i} \right)^{-1}
334: \sum_{k \in \pred(i)} \DP{F_i}{X_k} d X_k
335: \label{math:eq:var}
336: \end{equation}
337: where $S$ is the segment linking
338: $\hat{X}^1_{eq_i}$ to $\hat{X}^2_{eq_i}$.
339: \end{Theorem}
340:
A full proof is given in \cite{Radulescu05}. The above formula is a
quantitative relation between the variation of concentrations and the
derivatives $\DP{F_i}{X_j}$. Our next step is to introduce a
qualitative abstraction of this relation.
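
To see the formula at work in the simplest setting, assume (for
illustration only) that species $i$ has a single predecessor $k$ and
that $F_i(X) = a\,X_k - d\,X_i$ with constants $a, d > 0$. Then
$\DP{F_i}{X_i} = -d$ and $\DP{F_i}{X_k} = a$, so the integrand of
Eq. \ref{math:eq:var} is the constant $a/d$. Reading $S$ as oriented so
that $\int_S dX_k = X_{eq_k}^1 - X_{eq_k}^2$ (an assumption on the
orientation made here to match the left-hand side), we get
$$ X_{eq_i}^1 - X_{eq_i}^2 = \frac{a}{d} \left( X_{eq_k}^1 - X_{eq_k}^2 \right), $$
which agrees with the direct computation from the steady state
condition $X_{eq_i} = (a/d)\,X_{eq_k}$. In particular, the variation of
$X_i$ has the sign of the variation of its activator $X_k$.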
345:
346:
347:
348: %% Any influence on $\hat{i}$ comes
349: %% exclusively from the same set of predecessors, for all concentration
350: %% vector $X$ in a domain $D$ including the two steady states.
351: %% Therefore, in $D$ the function $F_i$ depends on $X_i, \hat{X}_i$
352: %% only. The fundamental hypothesis $\DP{F_i}{X_i} <0$ implies that the
353: %% function $X_i \rightarrow F_i(X_i, \hat{X}_i)$ is strictly
354: %% decreasing. Consequently, for each $\hat{X}_i$ the equation
355: %% $F_i(X_i, \hat{X}_i)=0$ defining steady states has at most one
356: %% solution. The set $F_i(X_i, \hat{X}_i)=0$ represents a $n_i$
357: %% dimensional
358: %% manifold $M_i \subset D$, including the two states. Using a
359: %% global implicit function theorem, one shows that the
360: %% set $F_i(X_i, \hat{X}_i)=0$ can be represented as the graph of a
361: %% function $X_i = \Phi_i (\hat{X}_i )$ in $D$ (see \cite{eccb} for
362: %% detailed hypotheses and proof).
363:
364:
365: %% The differential of $\Phi_i$ is given by the implicit function
366: %% theorem:
367:
368: %% \begin{equation}
369: %% \label{math:diffphi}
370: %% d \Phi_i = -\left( \DP{F_i}{X_i} \right)^{-1} \left( \sum_{k \in
371: %% \pred(i)} \DP{F_i}{X_k} d X_k \right)
372: %% \end{equation}
373:
374: %% In order to obtain the
375: %% variations of $X_i$ between the two states, one can simply integrate
376: %% the differential of $\Phi_i$ along any smooth path included in
377: %% $M_i$.
378:
379:
380:
381: %Concentrations are always positive and bounded. We can assume that
382: %they evolve in a closed bounded hypercube and that $\DP{F_i}{X_i}
383: %<K<0$ on this domain.
384:
385:
386:
387:
388:
389:
390:
391: %For small perturbations of the parameters, the variation of
392: %concentration of a node $i$ can be approximated by linearizing
393: %Eq. \ref{math:steadystate}:
394:
395: %\begin{equation}
396: %\sum_{j\in V} \DP{F_i}{X_j} \delta X_j + \sum_{k} \DP{F_i}{P_k}
397: %\delta P_k = 0.
398: %\end{equation}
399:
400: %Additionaly, we assume that parameters do
401: %not directly influence all variables. Mathematically we assume
402: %$\DP{F_i}{P_k} =0$, except for input nodes. In this case the sum
403: %$\sum_{k} \DP{F_i}{P_k}\delta P_k$ represents the effect of the
404: %experimental settings on the system.
405: %Using the above assumption that $\DP{F_i}{X_i} < 0$ we obtain:
406: %\begin{equation}
407: %\label{math:quant}
408: % \delta X_i = -\left( \DP{F_i}{X_i} \right)^{-1} \sum_{k \in
409: %\pred(i)} \DP{F_i}{X_k} \delta X_k
410: %\end{equation}
411:
412: %As mentioned earlier, quantitative information on the Jacobian,
413: %or on variations $\delta X_k$ is generally not available.
414: %Nevertheless, the sign of these values is known, most often, and can
415: %be used for reasoning.
416:
417:
418:
419: \subsection{Qualitative equations}
420: We propose here to study Eq. \ref{math:eq:var} in sign algebra.
421: By sign algebra, we mean the set $\{\plus,\moins, \indet \}$, where
422: $\indet$ represents undetermined sign. This set is provided with the
423: natural commutative operations:
424: \[
425: \begin{array}[c]{llllll}
426: \plus+\moins = \indet & \plus+\plus = \plus &\moins+\moins = \moins &
427: \plus\times\moins = \moins & \plus\times\plus = \plus & \moins\times\moins = \plus \\
428:
429: \indet+\moins = \indet & \indet+\plus = \indet & \indet + \indet = \indet &
430: \indet \times \moins = \indet & \indet \times \plus = \indet & \indet
431: \times \indet = \indet \\
432: \end{array}
433: \]
434:
435: Equality in sign algebra $\approx$ is defined as follows:
436: \[
437: \begin{array}{c|c|c|c}
438: \approx & \plus & \moins & \indet \\
439: \hline
440: \plus & T & F & T \\
441: \hline
442: \moins & F & T & T \\
443: \hline
444: \indet & T & T & T \\
445: \end{array}
446: \]
447:
448:
Importantly, qualitative equality is not an equivalence relation,
since it is not
transitive. This implies that computations in sign algebra must
be carried out with care. At least two major properties should be
emphasized:
\begin{itemize}
\item if a term of a sum is indeterminate ($\indet$) then the whole
  sum is indeterminate.
\item if one side of a qualitative equality is indeterminate, then
  the equality is satisfied whatever the value of the other side is.
\end{itemize}
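
As a short check of non-transitivity, take $x = \plus$, $y = \indet$
and $z = \moins$. From the table above,
$$ x \approx y \quad \mbox{and} \quad y \approx z \quad \mbox{both hold, but} \quad x \approx z \quad \mbox{fails,} $$
so $\approx$ is indeed not transitive. Similarly, the equation
$x \approx \plus + \moins$ is satisfied by both $x = \plus$ and
$x = \moins$, because its right-hand side evaluates to $\indet$: it
puts no constraint on $x$.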
460:
461: A \emph{qualitative system} is a set of algebraic equations with
462: variables in $\{\plus,\moins, \indet \}$. A \emph{solution} of this
463: system is a valuation of the unknowns which satisfies each equation,
464: and such that no variable is instantiated to \indet. This
465: last requirement is important since otherwise any system would have
466: trivial solutions (like all variables to \indet).
467:
468:
469: %% Eq \ref{math:diffphi} is established for small changes in the
470: %% parameters. However, Eq \ref{math:var} shows that the sign of $X_i$
471: %% variation can be deduced from the sign of the variations of the other
472: %% variables between two equilibrium points:
473:
474:
475: \begin{Theorem}
476: Under the assumptions and notations of Theorem \ref{math:var}, if the
477: sign of
478: $\DP{F_i}{X_j}$ is constant, then the following relation holds in sign
479: algebra:
480: \begin{equation}
    s(\Delta X_i) \approx
     \sum_{k \in \pred(i)} s(k,i) s(\Delta X_k) \label{math:qual}
483: \end{equation}
484: where $s(\Delta X_k)$ denotes the sign of $X_{eq_k}^1 - X_{eq_k}^2$.
485: \end{Theorem}
486:
By writing Eq. \ref{math:qual} for all nodes in the graph, we
obtain a system of equations on signs of variations, later referred to as
the \emph{qualitative system} associated with the interaction graph $G$. This will
be used extensively in the next sections.
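
For example, suppose (hypothetically) that node $3$ has two
predecessors, an activator $1$ and an inhibitor $2$, so that
$s(1,3) = \plus$ and $s(2,3) = \moins$. Eq. \ref{math:qual} then reads
$$ s(\Delta X_3) \approx s(\Delta X_1) + \moins \times s(\Delta X_2). $$
If $s(\Delta X_1) = \plus$ and $s(\Delta X_2) = \moins$, the right-hand
side is $\plus + \plus = \plus$ and the equation forces
$s(\Delta X_3) = \plus$. If instead
$s(\Delta X_1) = s(\Delta X_2) = \plus$, the right-hand side is
$\plus + \moins = \indet$ and both signs are allowed for
$s(\Delta X_3)$.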
491:
492:
493:
494: \subsection{Link between qualitative and quantitative}
495: \label{sec:link_qua_quant}
496:
The qualitative system obtained from Eq.~\ref{math:qual} is a consequence of
the quantitative relations that result from Theorem \ref{math:var}.
So the sign function maps a quantitative
variation between two equilibrium points onto a
qualitative solution of Eq.~\ref{math:qual}.
The converse is not true in general. For a given solution $S$ of the
qualitative system, there might be no equilibrium change $\Delta X$
in the differential quantitative model such that each real-valued
component of $\Delta X$ has the sign given by $S$.
506:
507: However, some
508: components of the solution vectors are uniquely determined by the
509: qualitative system. They take the same sign
510: value in every solution vector. For such so-called hard components,
511: the sign of any quantitative solution (if it exists) is
512: completely determined by the qualitative system.
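
As a small illustration with hypothetical variables $a$, $b$, $c$,
consider the qualitative system
$$ a \approx b + c, \qquad c \approx \moins \times b. $$
It has four solutions $(a,b,c)$: $(\plus,\plus,\moins)$,
$(\moins,\plus,\moins)$, $(\plus,\moins,\plus)$ and
$(\moins,\moins,\plus)$. No component is hard, since each variable
takes both signs. If the observation $b = \plus$ is added, the
solutions are $(\plus,\plus,\moins)$ and $(\moins,\plus,\moins)$: $b$
and $c$ are now hard components, while $a$ is not.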
513:
We will use the previous properties to check the coherence between
models and experimental data. By experimental data we mean the sign of
the observed variation in concentration for some nodes. In particular,
if the qualitative system associated with an interaction graph $G$ has
no solution given some experimental observations, then no function $F$
satisfying the sign conditions on the derivatives can describe the
observed equilibrium shift, meaning that either
the model is wrong or some data are corrupted. In the next
section, we introduce a simplified model related to lipid
metabolism, and illustrate the formalism described above.
524:
525:
526: \section{Toy example: regulation of the synthesis of fatty acids}
527: \label{sec:we}
528:
529:
530: In order to illustrate our approach, we use a toy example describing a
531: simplified model of genetic
532: regulation of fatty acid synthesis in liver. The corresponding
533: interaction graph is shown in Fig. \ref{igraph2}.
534:
535:
Two pathways for the production of fatty acids coexist in liver. Saturated
and mono-unsaturated fatty acids are produced from citrate thanks
to a metabolic pathway composed of four enzymes, namely ACL (ATP
citrate lyase), ACC (acetyl-Coenzyme A carboxylase), FAS (fatty
acid synthase) and SCD1 (Stearoyl-CoA desaturase 1).
Polyunsaturated fatty acids (PUFA) such as arachidonic acid and
docosahexaenoic acid are synthesized from essential fatty acids
provided by nutrition; D5D (Delta-5 Desaturase) and D6D (Delta-6
Desaturase) catalyze the key steps of the synthesis of PUFA.
545:
PUFA play pivotal roles in many biological functions; among them, they
regulate the expression of genes that impact lipid,
carbohydrate, and protein metabolism.
The effects of PUFA are mediated either directly, through their
specific binding to various nuclear receptors (PPAR$\alpha$ --
peroxisome proliferator activated receptor $\alpha$, LXR$\alpha$ --
Liver-X-Receptor $\alpha$, HNF-4$\alpha$) leading to changes in the
trans-activating activity of these transcription factors, or
indirectly, as the result of changes in the abundance of regulatory
transcription factors (SREBP-1c -- sterol regulatory element
binding protein 1c, ChREBP, etc.) \cite{Jump}.
557:
558:
559: \begin{figure*}[hbt]
560: \begin{center}
561: \includegraphics[width=12cm]{graphe_exemple3.eps}
562: \caption{Interaction graph for the toy model.
563: Self-regulation loops on nodes are omitted for sake of clarity.
564: Observed variations are depicted next to each vertex, when available.}
565: \label{igraph2}
566: \end{center}
567: \end{figure*}
568:
569: \paragraph{Variables in the model}
570:
We consider in our model nuclear receptors PPAR$\alpha$, LXR$\alpha$,
SREBP-1c (denoted by PPAR, LXR, SREBP respectively in the model), as
they are synthesized from the corresponding genes, together with the
trans-activating active forms of these transcription factors, that is,
LXR-a (denoting a complex LXR$\alpha$:RXR$\alpha$), PPAR-a (denoting a
complex PPAR$\alpha$:RXR$\alpha$) and SREBP-a (denoting the cleaved
form of SREBP-1c). We also consider SCAP (SREBP cleavage activating
protein), a key enzyme involved in the cleavage of SREBP-1c, that
interacts with another family of proteins called INSIG (which shows the
complexity of the molecular mechanism).
581:
We also include in the model ``final'' products, that is, enzymes ACL,
ACC, FAS, SCD1 (involved in fatty acid synthesis from citrate),
D5D, D6D (involved in PUFA synthesis) as well as PUFA themselves.
585:
586: \paragraph{Interactions in the model}
Relations between the variables are the following. SREBP-a is an
activator of the transcription of ACL, ACC, FAS,
SCD1, D5D and D6D \cite{Nara,Jump}.
LXR-a is a direct activator of the transcription of SREBP and FAS;
it also indirectly activates ACL, ACC and SCD1
\cite{StefGus04}. Notice that
these indirect actions are kept in the model because we do not know
whether they are only SREBP-mediated.
595:
596: PUFA activates the formation of PPAR-a from PPAR, and inhibits the formation
597: of LXR-a from LXR as well as the formation of SREBP-a (by
598: inducing the degradation of mRNA and inhibiting the cleavage)
599: \cite{Jump}. SCAP represents the activators of the formation
600: of SREBP-a from SREBP, and is inhibited by PUFA.
601:
PPAR directly activates the production of SCD1, D5D, D6D
\cite{Miller,Tang,Takashi}. The dual regulation of
SCD1, D5D and D6D by SREBP and PPAR is
paradoxical because SREBP transactivates genes for fatty acid
synthesis in liver, while PPAR induces enzymes for fatty acid
oxidation. Hence, the induction of the D5D and D6D genes by PPAR
appears to be
a compensatory response to the increased PUFA demand caused by
induction of fatty acid oxidation.
611:
612:
613: \paragraph{Fasting-refeeding protocols}
The fasting-refeeding
protocols represent a favorable condition for studying
lipogenesis regulation; we suppose that
during an experiment, animals (such as rodents or chickens) are kept
in a fasted state for several hours. Then, hepatic mRNA levels of LXR,
SREBP, PPAR, ACL, FAS, ACC and SCD1
are quantified by DNA microarray analysis. Biochemical measurements
also provide the variation of PUFA.
622:
A compilation of recent
literature on lipogenesis regulation provides hypothetical results of
such protocols: SREBP, ACL, ACC, FAS and SCD1 decline
in liver during the fasted state \cite{Liang2002}. This is expected
because fasting results in an inhibition of fatty acid synthesis and an
activation of fatty acid oxidation. For the same reason, PPAR is
increased in order to trigger oxidation. However, Tobin et al.\
\cite{Tobin2000} showed that fasting rats for 24h increased the
hepatic LXR mRNA, although LXR positively regulates fatty acid
synthesis in its activated form. Finally, PUFA levels can be
considered to be increased
in liver following starvation because of the important lipolysis
from adipose tissue, as shown by Lee et al.\ in mice after 72h
fasting \cite{Lee}.
637:
638:
639:
640:
641: \paragraph{Qualitative system derived from the graph}
642: As explained in the previous section, we derive a qualitative system
643: from the interaction graph shown in Fig. \ref{igraph2}. For ease of
644: presentation, we denote by \verb|A| the sign of variation for species
645: A.
646:
647:
648: {\small
649: \begin{minipage}{.7\linewidth}
650: \begin{System}
651: \begin{verbatim}
652: (1) PPAR-a = PPAR + PUFA
653: (2) LXR-a = -PUFA + LXR
654: (3) SREBP = LXR-a
655: (4) SREBP-a = SREBP + SCAP -PUFA
656: (5) ACL = LXR-a + SREBP-a - PUFA
657: (6) ACC = LXR-a + SREBP-a - PUFA
658: (7) FAS = LXR-a + SREBP-a - PUFA
659: (8) SCD1 = LXR-a + SREBP-a - PUFA + PPAR-a
660: (9) SCAP = -PUFA
661: (10) D5D = PPAR-a + SREBP-a - PUFA
662: (11) D6D = PPAR-a + SREBP-a - PUFA
663: \end{verbatim}
664: \label{system2}
665: \end{System}
666: \end{minipage}
667: \begin{minipage}{.3\linewidth}
668: \begin{Observations}
669: \begin{verbatim}
670: PPAR = +
671: PUFA = +
672: LXR = +
673: SREBP = -
674: ACL = -
675: ACC = -
676: FAS = -
677: SCD1 = -
678: \end{verbatim}
679: \end{Observations}
680: \end{minipage}
681: }
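
To illustrate how the observations constrain the remaining variables,
the equations can be propagated by hand (a sketch of the reasoning
only, using the sign algebra rules of Section \ref{sec:maths}):
\begin{itemize}
\item Equation (1) gives \verb|PPAR-a| $\approx \plus + \plus = \plus$,
  hence \verb|PPAR-a| $= \plus$.
\item Equation (2) gives \verb|LXR-a| $\approx \moins + \plus = \indet$,
  which is no constraint, but equation (3) with \verb|SREBP| $= \moins$
  forces \verb|LXR-a| $= \moins$.
\item Equation (9) gives \verb|SCAP| $= \moins$, and equation (4) then
  gives \verb|SREBP-a| $\approx \moins + \moins + \moins = \moins$,
  hence \verb|SREBP-a| $= \moins$.
\item Equations (5) to (7) have right-hand side
  $\moins + \moins + \moins = \moins$, in agreement with the observed
  decrease of \verb|ACL|, \verb|ACC| and \verb|FAS|.
\item Equation (8) has right-hand side
  $\moins + \moins + \moins + \plus = \indet$, so it is compatible with
  the observation \verb|SCD1| $= \moins$.
\item Equations (10) and (11) have right-hand side
  $\plus + \moins + \moins = \indet$, so \verb|D5D| and \verb|D6D| are
  left undetermined.
\end{itemize}
The system is thus consistent with the observations, and it fixes the
signs of the four unobserved variables \verb|PPAR-a|, \verb|LXR-a|,
\verb|SREBP-a| and \verb|SCAP|.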
682:
683:
684: In the next section, we propose an efficient representation for such
685: qualitative systems.
686:
687:
688: \section{Analysis of qualitative equations: a new approach}
689: \label{sec:eff_qua}
690:
691: \subsection{Resolution of qualitative systems}
692: \label{sec:res_qua}
693:
The resolution of (even linear) qualitative systems is an NP-complete
problem (see for instance \cite{Trave,Dormoy88}). One can show this by
reducing, in polynomial time, the satisfiability problem for a finite set of clauses to the
resolution of a qualitative system.
699:
Let us consider a collection $C=\{c_1, \ldots , c_n \}$ of clauses on
a finite set $V$ of variables, and let $\{ \plus ,\moins ,\indet \}$ be the sign
algebra. In order to reduce the satisfiability problem to the
resolution of a qualitative system, let us code $true$ into $+$ and
$false$ into $-$. If $c$ is a clause, let us denote by $\bar{c}$ the
encoding of $c$ as a sign algebra formula. The following encoding scheme
provides a polynomial procedure to code a clause into a qualitative
formula:
708: $$
709: \begin{array}{ccc}
710: \mbox{clause}& & \mbox{sign algebra}\\
711: \hline
712: a \in V & \rightarrow & \bar{a}\\
713: c_1 \vee c_2 & \rightarrow & \bar{c_1} + \bar{c_2}\\
714: \lnot c & \rightarrow & - \bar{c}
715: \end{array}
716: $$
717:
718: The satisfiability problem for the set of clauses $C$ is then reduced
719: to finding a solution of the qualitative system:
720: \[
721: \{ \bar{c_i} \approx \plus \ /\ i=1, \ldots , n \}
722: \]
So an NP-complete problem can be reduced to the resolution of
a qualitative system in polynomial time (with respect to the size of
the problem). This shows that solving qualitative systems is
NP-hard; since a candidate valuation can be checked against every
equation in polynomial time, the problem is in fact NP-complete.
For example, the only pair of values $(\bar{a},\bar{b})$ that is
not a solution of $-\bar{a} + \bar{b}\ \approx\ +$ is
$(+,-)$. This corresponds to the only pair $(true, false)$ that does not
satisfy $\lnot a \vee b$.
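
As a slightly larger illustration, take the hypothetical clause set
$C = \{ a \vee \lnot b,\ b \vee c \}$ on $V = \{a,b,c\}$. The encoding
yields the qualitative system
$$ \bar{a} + (- \bar{b}) \approx \plus, \qquad \bar{b} + \bar{c} \approx \plus . $$
The assignment $a = true$, $b = true$, $c = false$ is coded as
$(\bar{a},\bar{b},\bar{c}) = (\plus,\plus,\moins)$; both left-hand
sides evaluate to $\plus + \moins = \indet$, which is $\approx \plus$,
so the system is satisfied, as are both clauses. Conversely,
$(\bar{a},\bar{b},\bar{c}) = (\moins,\plus,\plus)$ makes the first
left-hand side equal to $\moins + \moins = \moins$, which is not
$\approx \plus$: this matches the fact that $a = false$, $b = true$
falsifies $a \vee \lnot b$.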
731:
Several heuristics have been proposed for the resolution of qualitative
systems. For linear systems, a set of rules has been
designed \cite{Dormoy88}. This set is complete: it makes it possible to find every
solution. It is also sound: every solution found by applying these
rules is correct. The rules are based on an adaptation of Gaussian
elimination. However, only heuristics
exist for choosing the equation and the rule to apply to it. In case
of a dead end, when no rule applies any more, it is necessary to backtrack
to the last decision made. As a result, programs implementing
qualitative resolution are not very
efficient in general, and only problems of small size can be solved
in reasonable time. For that reason we propose an alternative way to
solve qualitative systems (linear or not).
745:
746:
747: \subsection{Qualitative equation coding}
748: \label{sec:eq_coding}
749:
750: Our method is based on a coding of qualitative equations as
751: algebraic equations over Galois fields ${\mathbb Z}/p{\mathbb Z}$
752: where $p$ is a prime number greater than 2. The elements of these
753: fields are the classes
754: modulo $p$ of the integers. If $\bar{x}$ denotes the class of the
755: integer $x$ modulo $p$, a sum and a product are defined on ${\mathbb
756: Z}/p{\mathbb Z}$ as follows:
757: $$
758: \bar{x} + \bar{y} = \overline{x+y} \qquad \bar{x} \times
759: \bar{y} = \overline{x \times y}
760: $$
761:
Galois fields have two basic properties which we use extensively:
\begin{itemize}
\item Every function $f:({\mathbb Z}/p{\mathbb Z})^n \rightarrow
  {\mathbb Z}/p{\mathbb Z}$ with $n$
  arguments in ${\mathbb Z}/p{\mathbb Z}$ is a polynomial function.
\item If $\oplus$ denotes the operation $f \oplus g = f^{(p-1)} +
  g^{(p-1)}$, then every equation system $p_1(X)=0, \ldots ,
  p_k(X)=0$ has the same solutions as the single equation $(p_1\oplus p_2 \oplus
  \ldots \oplus p_k) (X) = 0$.
\end{itemize}
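
The second property can be verified by brute force on small cases.
The Python sketch below works on values rather than on polynomials,
which is sufficient because the property is pointwise; the function
names \texttt{oplus}, \texttt{combine} and \texttt{same\_zeros} are
hypothetical.
\begin{verbatim}
from itertools import product

def oplus(f, g, p):
    """f (+) g = f^(p-1) + g^(p-1), computed on values modulo p."""
    return (pow(f, p - 1, p) + pow(g, p - 1, p)) % p

def combine(values, p):
    """Fold a non-empty tuple of values with (+), left to right."""
    acc = values[0]
    for v in values[1:]:
        acc = oplus(acc, v, p)
    return acc

def same_zeros(p, k):
    """True if p_1 (+) ... (+) p_k is zero exactly when all p_i are."""
    return all((combine(vals, p) == 0) == all(v == 0 for v in vals)
               for vals in product(range(p), repeat=k))

checks = [same_zeros(3, 2), same_zeros(3, 3), same_zeros(5, 3)]
\end{verbatim}
The check relies on Fermat's little theorem: $x^{p-1}$ is 1 for
$x \neq 0$ and 0 otherwise, so a sum of two such terms vanishes only
if both terms do, provided $p > 2$.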
772:
773:
774:
775:
776:
The following tables specify how the sign algebra
$\{\plus,\moins,\indet\}$ is coded in the Galois field with three
elements ${\mathbb Z}/3{\mathbb Z}$.
780: $$
781: \begin{array}{ccc}
782: \mbox{sign algebra}& & \mathbb{Z}/3{\mathbb Z}\\
783: \hline
784: \plus & \rightarrow & 1\\
785: \moins & \rightarrow & -1\\
786: \indet & \rightarrow & 0\\
787: \end{array}
788: \qquad \qquad \qquad
789: \begin{array}{ccc}
790: \mbox{sign algebra}& & \mathbb{Z}/3{\mathbb Z}\\
791: \hline
792: e_1 + e_2 & \rightarrow & - \overline{e_1}.\overline{e_2}.(\overline{e_1} + \overline{e_2})\\
793: e_1 \times e_2 & \rightarrow & \overline{e_1}.\overline{e_2}\\
794: e_1 \approx e_2 & \rightarrow & \overline{e_1}.\overline{e_2}.(\overline{e_1} - \overline{e_2})
795: \end{array}
796: $$
797:
798: Finally a qualitative system $\{ e_1,\dots,e_n \}$ is coded as the
799: polynomial $\overline{e_1} \oplus \dots \oplus \overline{e_n}$.
800: A similar coding for the qualitative
801: algebra $\{\plus,\moins,\zero,\indet\}$ uses the Galois field
802: ${\mathbb Z}/5{\mathbb Z}$ and will not be presented here.
803:
With this coding, a qualitative system has a solution if and only
if the corresponding polynomial has a root without null
component. Roots with a null component are excluded since $\indet$ values are
not allowed in solutions of qualitative systems. In general we will have to add
polynomial equations $X^2 = 1$ to ensure this.
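
The coding tables can be checked against the sign algebra by
enumerating all pairs of signs. The following Python sketch does so;
the names \texttt{CODE}, \texttt{s\_add}, \texttt{s\_mul},
\texttt{s\_eq} and the \texttt{z3\_} functions are hypothetical, and
$-1$ is represented by the integer 2.
\begin{verbatim}
from itertools import product

CODE = {"+": 1, "-": 2, "?": 0}      # -1 is 2 modulo 3

def s_add(a, b):
    """Qualitative sum."""
    return a if a == b and a != "?" else "?"

def s_mul(a, b):
    """Qualitative product."""
    if "?" in (a, b):
        return "?"
    return "+" if a == b else "-"

def s_eq(a, b):
    """Qualitative equality."""
    return a == b or "?" in (a, b)

def z3_add(x, y):
    return (-x * y * (x + y)) % 3

def z3_mul(x, y):
    return (x * y) % 3

def z3_eq(x, y):
    """Vanishes exactly when the coded signs are qualitatively equal."""
    return (x * y * (x - y)) % 3

pairs = list(product("+-?", repeat=2))
add_ok = all(CODE[s_add(a, b)] == z3_add(CODE[a], CODE[b])
             for a, b in pairs)
mul_ok = all(CODE[s_mul(a, b)] == z3_mul(CODE[a], CODE[b])
             for a, b in pairs)
eq_ok = all(s_eq(a, b) == (z3_eq(CODE[a], CODE[b]) == 0)
            for a, b in pairs)
\end{verbatim}
All three flags evaluate to true, which confirms that the polynomial
expressions reproduce the sum, product and equality of the sign
algebra on the nine pairs of signs.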
809:
810:
811: \subsection{An efficient representation of polynomial functions}
812: \label{sec:rep_func}
813:
Recall that our purpose is to efficiently solve an NP-complete
problem. Unless P = NP, there is no hope of finding a representation of polynomial
functions that allows polynomial systems of equations to be solved in
polynomial time. Indeed, the coding of a qualitative system as a
polynomial equation is obviously polynomial in the size of the system
(number of variables plus number of equations). So finding a
solution of a polynomial system of equations is itself an NP-complete
problem. It is more or less the SAT problem.
822:
Nevertheless, there exists a representation of polynomial functions on
Galois fields which performs well in practice for
polynomials with hundreds of variables. This kind of representation
was first used for logical functions, which may be considered as
polynomial functions over the field ${\mathbb Z}/2{\mathbb Z}$. This
representation is known as BDD (Binary Decision Diagrams) and is
widely used in the verification of logical circuits \cite{BDD} and in model checkers
such as nu-SMV \cite{SMV}.
831:
We present here this representation for the field ${\mathbb
  Z}/3{\mathbb Z}$. Other Galois fields
can be treated in the same way. The starting point is a generalization of Shannon
decomposition for logical functions:
\[
p(X_1,X) = (1-X_1^2)p_{[X_1=0]}(X)+X_1(-X_1-X_1^2)p_{[X_1=1]}(X) +
X_1(X_1^2-X_1)p_{[X_1=2]}(X)
\]
where $p$ is a polynomial function with $n$ variables. This
decomposition leads to a tree representation of the polynomial
function: the variable $X_1$ is the root and has three children. Each
of these is obtained by instantiating $X_1$ to 0, 1 or 2 (that is, $-1$) in
$p(X_1,X)$. This representation is exponential ($3^n$ leaves) as each
non-constant node has 3 children. It also depends on the order chosen on
the variables.
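
The decomposition can be checked numerically on a small example. In
the Python sketch below, the weight attached to the value $a$ of $X_1$
is written $1-(X_1-a)^2$, which equals 1 when $X_1 = a$ and 0
otherwise in ${\mathbb Z}/3{\mathbb Z}$; the function names and the
choice of the polynomial $X^2(Y+1)$ are assumptions made for the
illustration.
\begin{verbatim}
from itertools import product

def indicator(x, a):
    """1 if x == a modulo 3, else 0, computed as 1 - (x - a)^2."""
    return (1 - (x - a) ** 2) % 3

def p(x, y):
    """Example polynomial function X^2 (Y + 1) over Z/3Z."""
    return (x * x * (y + 1)) % 3

def shannon(f, x, y):
    """Rebuild f(x, y) from its three cofactors with respect to x."""
    return sum(indicator(x, a) * f(a, y) for a in range(3)) % 3

ok = all(shannon(p, x, y) == p(x, y)
         for x, y in product(range(3), repeat=2))
\end{verbatim}
The flag \texttt{ok} is true: the function is recovered from its three
cofactors at every point of $({\mathbb Z}/3{\mathbb Z})^2$.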
847:
A key observation (see \cite{BDD}) is that several
subtrees are identical: they have the same root variable
and isomorphic children. If we decide to represent each type
of tree only once, then the tree representation is transformed into a directed
acyclic graph (DAG). With this representation there is no more redundancy
among subtrees. The result may be a dramatic decrease in the size of
the representation of a polynomial function.
855:
856: \begin{figure*}[hbt]
857: \begin{center}
858: \includegraphics[width=12cm]{tdd.eps}
\caption{From tree representation to directed acyclic graph for $X^2
  (Y+1)$. The tree has 13 nodes while the DAG representing the same
  function has 5 nodes.}\label{itree-dag}
862: \end{center}
863: \end{figure*}
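
The node counts for $X^2(Y+1)$ can be reproduced with a few lines of
Python. This is a sketch under simplifying assumptions: a diagram is a
nested tuple \texttt{(var, child0, child1, child2)} or an integer
leaf, it is built by exhaustive enumeration (which is exponential and
only suitable for tiny examples), and sharing is obtained from
structural equality of tuples. The function names are hypothetical.
\begin{verbatim}
def build(f, nvars, prefix=()):
    """Reduced diagram of f as nested tuples (var, c0, c1, c2)."""
    if len(prefix) == nvars:
        return f(*prefix) % 3            # leaf: a constant in {0, 1, 2}
    children = tuple(build(f, nvars, prefix + (a,)) for a in range(3))
    if children[0] == children[1] == children[2]:
        return children[0]               # the variable does not matter
    return (len(prefix),) + children

def tree_size(nvars):
    """Number of nodes of the complete ternary tree on nvars variables."""
    return sum(3 ** k for k in range(nvars + 1))

def dag_size(diagram):
    """Number of distinct subdiagrams, i.e. nodes after sharing."""
    seen = set()
    def visit(d):
        if d in seen:
            return
        seen.add(d)
        if isinstance(d, tuple):
            for child in d[1:]:
                visit(child)
    visit(diagram)
    return len(seen)

diagram = build(lambda x, y: x * x * (y + 1), 2)
sizes = (tree_size(2), dag_size(diagram))
\end{verbatim}
With these definitions \texttt{sizes} is \texttt{(13, 5)}: the shared
diagram keeps one node for $X$, one for $Y$ and the three constant
leaves.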
864:
A property of the Shannon-like decomposition is that many
operations on polynomial functions are recursive with respect to this
decomposition. More precisely let
\[
p^i(X_1,X) = (1-X_1^2)p^i_0(X)+X_1(-X_1-X_1^2)p^i_1(X) +
X_1(X_1^2-X_1)p^i_2(X)
\]
$i=1,2$ be two polynomial functions with $p^i_{\alpha}(X) = p^i_{[X_1 =
  \alpha]}(X)\ ,\ \alpha = 0,1,2$. Then for binary operations $\Delta$
on polynomial functions,
\[
p^1 \Delta p^2 = (1-X_1^2) (p^1_0 \Delta p^2_0 ) + X_1(-X_1-X_1^2)
(p^1_1 \Delta p^2_1 )+ X_1(X_1^2-X_1) ( p^1_2 \Delta p^2_2 )
\]
Applied naively, this kind of recursive formula leads to an exponential
complexity for any computation. Again, it is possible to take advantage
of the redundancy by using a cache to remember the result of each operation. This
technique is known as
memoisation in computer algebra. A 40\% cache hit rate is commonly observed.
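
A recursive binary operation with memoisation can be sketched as
follows in Python. The sketch assumes that a diagram is a nested tuple
\texttt{(var, child0, child1, child2)} or an integer leaf, that
variables are ordered by increasing index, and it uses the standard
\texttt{functools.lru\_cache} decorator as the cache; it is not the
implementation used in our experiments, and all function names are
hypothetical.
\begin{verbatim}
from functools import lru_cache

def mk(var, c0, c1, c2):
    """Create a node, skipping it when all children are identical."""
    return c0 if c0 == c1 == c2 else (var, c0, c1, c2)

def top(d):
    """Index of the root variable (infinite for a leaf)."""
    return d[0] if isinstance(d, tuple) else float("inf")

def cofactors(d, var):
    """Children of d with respect to var (d itself if var is absent)."""
    if isinstance(d, tuple) and d[0] == var:
        return d[1:]
    return (d, d, d)

def make_apply(op):
    """Lift a binary operation on Z/3Z to diagrams, with a cache."""
    @lru_cache(maxsize=None)
    def apply(d1, d2):
        if not isinstance(d1, tuple) and not isinstance(d2, tuple):
            return op(d1, d2) % 3
        var = min(top(d1), top(d2))
        pairs = zip(cofactors(d1, var), cofactors(d2, var))
        return mk(var, *(apply(a, b) for a, b in pairs))
    return apply

add = make_apply(lambda a, b: a + b)
mul = make_apply(lambda a, b: a * b)

X = (0, 0, 1, 2)                  # the variable X (index 0)
Y = (1, 0, 1, 2)                  # the variable Y (index 1)
result = mul(mul(X, X), add(Y, 1))   # diagram of X^2 (Y + 1)
hits = mul.cache_info().hits
\end{verbatim}
Even on this tiny example the cache is hit several times, because the
two non-null branches of $X^2$ lead to the same product with $Y+1$.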
884:
More complex operations on polynomial functions are also implemented
with a recursive scheme and memoisation. Let us just mention
quantifier elimination as one of the most useful for our
purpose.
889:
This representation of polynomial functions on Galois fields also has
several drawbacks:
\begin{itemize}
\item the memory size heavily depends on the order of variables. The
  libraries implementing such symbolic computations always provide reordering
  algorithms.
\item for each order, there exist polynomial functions whose
  representation is exponential in memory size.
\end{itemize}
899:
Nevertheless, in practice, this representation has proved to be very
efficient for polynomial functions with several hundred
variables. The computations performed on our toy model and on another,
real-size one
were carried out with a program named SIGALI, which is devoted to the
representation of polynomial functions on
${\mathbb Z}/3{\mathbb Z}$. Several algorithms were
added to this program in order to answer questions of biological
interest.
908:
909: \section{Qualitative models and experimental data}
910: \label{sec:qua_exp}
In this section, we show how to compute some properties of a
qualitative system, and ultimately gain some insight into the biological
model it represents. The algorithms we derive rely heavily on the
representation introduced above. Hence, not only can they deal efficiently in
practice with computationally hard problems, but they
are also expressed in a rather simple and generic fashion.
917:
918: Let $M$ be a qualitative model represented by its associated
919: interaction graph $G(V,E)$. Recall that $V$ is the set of
920: variables. Let $V_O$ be the set of observed variables, and
921: $o_i \in \{ \plus,\moins \}$ for $i \in V_O$ the experimental
922: observations. As explained in the previous section, the
923: qualitative system derived from $M$ can be coded as a polynomial
924: function $P_M(X_1,\ldots,X_n)$. Roots of $P_M$ correspond to
925: solutions of the qualitative system.
926:
927: \subsection{Satisfiability of the qualitative system}
928: \label{sec:data_agree}
929:
A property of the coding described above is that the system has no
solution iff $P_M$ is equal to the constant polynomial
$1$. At the other extreme, if $P_M=0$, the qualitative equations do not
constrain the variables at all.
934:
935: Now if some observations $o_i$ for $i \in V_O$ are available, checking
936: their consistency with the model $M$ boils down to instantiating
937: $X_i = o_i$ in $P_M(X_1,\ldots,X_n)$, for all $i \in V_O$, and
938: testing whether the resulting polynomial is different from 1.
939:
We computed the polynomial $P_L$ associated with our toy example (see section
\ref{sec:we}) and it has roots. Recall that this does not guarantee the
existence of some (quantitative) differential model conforming to the
interaction graph depicted in Fig. \ref{igraph2}. Satisfiability of
the qualitative system is only a necessary condition for the model to
be correct.
946:
947:
948: %% As explained in section \ref{sec:we}, our toy example
949: %% focus on a fasting experiment. From the literature, we derive the
950: %% following hypothetical:
951:
952: %% \begin{Observations}
953: %% \begin{verbatim}
954: %% ACL = -1 FAS = -1
955: %% ACC = -1 SCD1 = -1
956: %% SREBP = -1 PPAR = 1
957: %% LXR = 1 PUFA = 1
958: %% \end{verbatim}
959: %% \end{Observations}
960:
961: The polynomial obtained by instantiating observations into $P_L$
962: is different from 1, meaning that our model does not contradict
963: generally observed variations during fasting.
964:
Large models can advantageously be reduced
using standard graph techniques.
967: First we look for connected components in the interaction graph. A
968: graph with several
969: connected components represents a coherent qualitative model iff
970: each component is coherent.
971: Second, a node without successor except itself appears only
972: in its associated equation. If this node is not observed, its
973: associated qualitative equation adds no constraint on the other
974: nodes. So, at least for satisfiability checking, this node can be
975: suppressed and its qualitative equation removed from the system. This
976: procedure is applied iteratively, until no node can be deleted.
977: The resulting graph leads to a new qualitative system which is
978: satisfiable iff the initial system is satisfiable.
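
The second reduction step can be sketched in Python as follows. The
graph is assumed to be given as a dictionary mapping each node to the
set of its successors; the function name \texttt{reduce\_graph} and
the small example graph are hypothetical.
\begin{verbatim}
def reduce_graph(successors, observed):
    """Iteratively delete unobserved nodes without successor
    other than themselves."""
    graph = {v: set(s) for v, s in successors.items()}
    changed = True
    while changed:
        changed = False
        for v in list(graph):
            if v not in observed and graph[v] <= {v}:
                del graph[v]
                for succ in graph.values():
                    succ.discard(v)      # drop edges pointing to v
                changed = True
    return graph

# Hypothetical graph: A -> B -> C, C -> C, and an isolated node D.
example = {"A": {"B"}, "B": {"C"}, "C": {"C"}, "D": set()}
kept_with_b = reduce_graph(example, observed={"B"})
kept_without = reduce_graph(example, observed=set())
\end{verbatim}
When B is observed, C and D are removed and the chain stops at B; when
nothing is observed, the removal of C makes B removable, then A, and
the whole graph disappears.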
979:
980:
981: \subsection{Correcting data or model}
982:
983:
984: If the qualitative system, given some experimental observations, is
985: found to have no solution, it is of interest to propose some
986: correction of the data and/or the model. By correction, we mean
987: inverting the sign of an observed variable or the sign of an edge of
988: the interaction graph. In
989: the general case, there are several possibilities to make the system
990: satisfiable, and we need some criterion to choose among them. We
991: applied a parsimony principle: a correction of the data should imply a
992: minimal number of sign inversions.
993:
In the following, we show how to compute all minimal corrections for the
data. Given $(o_i)_{i \in V_O}$ a vector of experimental observations
which is not compatible with the model, we compute all
vectors $(o'_i)_{i \in V_O}$ which are compatible with the model and
such that the Hamming distance between $o$ and $o'$ is minimal. By
Hamming distance, we mean the number of positions at which $o$ and
$o'$ differ. The set of such $o'$ vectors might be very large; but again, by
encoding it as the set of roots of a polynomial function, we obtain a
compact representation.
1003:
1004:
This procedure can be extended in a straightforward manner to
corrections of edge signs in the interaction graph. This is done by
considering these signs as variables of the model. For ease of
presentation, we only detail data correction.
1009:
1010:
1011:
1012: \begin{algorithm}
1013: {
1014: \small
1015: \dontprintsemicolon
1016: \KwIn{\\
1017: \qquad $P$, a polynomial function on variables $V$ \\
1018: \qquad $i \in V$
1019: }
1020: \KwOut{\\
1021: \qquad $C$, a polynomial function encoding all minimal corrections\\
1022: \qquad $d$, minimal number of corrections
1023: }
1024: \BlankLine
1025: \eIf{$P$ is constant}{
1026: \eIf{$P$ = 0}
1027: {\KwResult{$C=0$, $d=0$}}
1028: {\KwResult{$C=1$, $d= \infty$}}
1029: }{
1030: let $P_0$, $P_1$, $P_2$ be the Shannon decomposition of $P$ with respect to
1031: variable $X_i$,\\
1032: and $(C_j,d_j)$ the result obtained by recursively applying the
1033: algorithm on $P_j$ and $i+1$ for $j \in \{0,1,2\}$
1034: \BlankLine
1035: let $d'_j =
1036: \left\{
1037: \begin{array}{ll}
1038: d_j + 1 & \mbox{if } i \in V_O \mbox{ and } o_i \neq j\\
1039: d_j & \mbox{otherwise}
1040: \end{array}
1041: \right.$
1042: and $C'_j =
1043: \left\{
1044: \begin{array}{ll}
1045: (X_i - j) \oplus C_j & \mbox{if } i \in V_O \\
1046: C_j & \mbox{otherwise}
1047: \end{array}
1048: \right.$\\
1049: \BlankLine
1050:
1051: \KwResult{$d = \min\ d'_j$, $C = \displaystyle\prod_{j,\ d'_j = d} C'_j$}
1052: }
1053: }
1054: \caption{Algorithm for experimental data correction.}
1055: \label{algo:hamming}
1056: \end{algorithm}
1057:
Let us illustrate this algorithm on our toy example: during fasting
experiments, synthesis of fatty acids tends to be inhibited, while
oxidation, which produces ATP, is activated. In particular ACC, ACL,
FAS and SCD1 are involved in the same pathway to produce saturated and
monounsaturated fatty acids. As expected, they are known to decline
together at fasting. Suppose we introduce some wrong observation, say
for instance an increase of ACL, while keeping all other
observations given above. The polynomial obtained from $P_L$ including
these new observations is equal to 1, and hence has no solution.
Applying Algorithm \ref{algo:hamming}, we recover this error. Now if
we wrongly change two values, say ACL and ACC to 1, the algorithm
proposes a different correction, namely to change the observed value
of SREBP to 1, which is more parsimonious.
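
The recursion of Algorithm \ref{algo:hamming} can be sketched in
Python on a very small system. The sketch makes several assumptions:
the polynomial is given as a plain function evaluated on complete
assignments (so the recursion explores the full ternary tree instead
of a shared diagram), the set of corrections is returned as an
explicit set of vectors rather than as a polynomial, and the system
$A \approx B$, $C \approx -A$ is a hypothetical example which is not
our toy model.
\begin{verbatim}
import math

def P(a, b, c):
    """Hypothetical system A ~ B, C ~ -A without null components.
    Returns 0 on the roots and 1 elsewhere (values modulo 3)."""
    nonzero = all(v % 3 != 0 for v in (a, b, c))
    return 0 if nonzero and a == b and (c + a) % 3 == 0 else 1

def corrections(poly, nvars, obs, prefix=()):
    """obs maps the index of each observed variable to its value.
    Returns (d, C): minimal distance and corrected observations."""
    i = len(prefix)
    if i == nvars:                                  # constant case
        if poly(*prefix) == 0:
            return 0, {()}
        return math.inf, set()
    best, result = math.inf, set()
    for j in range(3):
        d, c = corrections(poly, nvars, obs, prefix + (j,))
        if i in obs:
            d += (obs[i] != j)                      # one more inversion
            c = {(j,) + rest for rest in c}         # record value of X_i
        if d < best:
            best, result = d, set(c)
        elif d == best:
            result |= c
    return best, result

# A and B are observed with opposite signs, which contradicts A ~ B.
d, C = corrections(P, 3, {0: 1, 1: 2})
\end{verbatim}
Here \texttt{d} is 1 and \texttt{C} contains the two vectors
\texttt{(1, 1)} and \texttt{(2, 2)}: inverting either observation
restores consistency, and both corrections are equally parsimonious.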
1071:
1072: \subsection{Experiment design}
1073: \label{sec:exp_des}
1074:
1075: It is often the case that not all variables in the system under study
1076: can be observed. Biochemical measurements of metabolites can be costly
1077: and/or time consuming. By experiment design, we mean here the choice
1078: of the variables to observe so that an experiment might be
1079: informative.
1080:
1081: Let $P_M(X_O,X_U)$ be the polynomial function coding for the
1082: qualitative system $M$. $X_O$ (resp. $X_U$) denotes the state vector
1083: of observed (resp. unobserved) variables. The polynomial function
1084: representing the admissible values of the observed variables is
1085: obtained by elimination of the quantifier in
1086: $\exists X_U \ P_M(X_O,X_U)$. Let $P_M^O(X_O)$ denote the resulting
1087: polynomial function.
1088:
For some choice of observed variables, it might well be that $P_M^O$
is null, which basically means that the experiment is totally useless.
Note that no improvement can be obtained by taking a subset of
$X_O$. The solution is either to add new observed variables or to
choose a completely different set of observed variables.

In order to assess the relevance of a given experiment (namely of a
given observed subset), we suggest computing the following ratio:
the number of consistent valuations of the observed variables divided by the
total number
of valuations of the observed variables. A very stringent experiment
has a low ratio. An experiment having a ratio value of one is useless.
1101:
Again this computation is carried out in a recursive fashion. Let $P$
be a polynomial function representing the
set of admissible observed values. Let $Rat(P)$ be the proportion of
solutions of $P(X)=0$ in the
space $({\mathbb Z}/3{\mathbb Z})^n $, where $n$ is the number of
variables $X$. If $P$ is constant then $Rat(P)=1$
(resp. $Rat(P)=0$) if $P=0$ (resp. $P \neq 0$). Otherwise, let $P_0$,
$P_1$, $P_2$ be a Shannon-like decomposition of $P(X)$
with respect to some variable of $P$. Then it is easy to prove:
1111: \[
1112: Rat(P)\ =\ (Rat(P_0)+Rat(P_1)+Rat(P_2))/3
1113: \]
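
This recursion translates directly into code. The Python sketch below
assumes that a diagram is a nested tuple
\texttt{(var, child0, child1, child2)} or an integer leaf; the
function name \texttt{rat} and the example diagram, which represents
$X^2(Y+1)$, are chosen for the illustration. Exact fractions are used
to avoid rounding.
\begin{verbatim}
from fractions import Fraction

def rat(d):
    """Proportion of points at which the diagram d evaluates to 0."""
    if not isinstance(d, tuple):                # constant polynomial
        return Fraction(1) if d == 0 else Fraction(0)
    return sum(rat(child) for child in d[1:]) / 3

y_plus_1 = (1, 1, 2, 0)                  # Y + 1, null only for Y = 2
diagram = (0, 0, y_plus_1, y_plus_1)     # X^2 (Y + 1)
ratio = rat(diagram)
\end{verbatim}
The result is $5/9$: the polynomial vanishes on the three points with
$X=0$ and on the two points with $X \neq 0$ and $Y=2$. A variable
skipped in a branch of the diagram needs no special treatment, since
its three cofactors are identical and leave the average unchanged.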
1114:
1115: The relevance of this approach was assessed on our toy example: for
1116: each subset $O$ of variables in the model, containing at most four
1117: variables, we computed $Rat(P_L^O)$. Expectedly, the lowest ratios
1118: (i.e. the most stringent experiments) were achieved observing four
1119: variables: either \{SCAP, PUFA, PPAR-a, PPAR\}, or
1120: \{SREBP, SCAP, PUFA, LXR-a\}, or \{SREBP, PPAR-a, PPAR, LXR-a\}.
1121:
Interestingly, the procedure captures what might be thought of as
control variables, like PUFA/SCAP, SREBP/LXR-a and PPAR/PPAR-a.
1124: The first two pairs control the activation of fatty acids synthesis;
1125: the third one controls fatty acid oxidation.
1126:
Indeed one can go even further: if we isolate some kind of control
variables, we are naturally interested in knowing how they constrain
other variables. Achieving this amounts to computing the set of
variables whose value is constant over all solutions of the system (the
so-called hard components). Recall that
these hard components of qualitative solutions are also important with
respect to the hypothetical differential model which is abstracted in
the qualitative one. Indeed, all solutions of the quantitative equation for
equilibrium change have the same sign pattern on the hard components.
Algorithm \ref{algo:hardcomponents} describes a recursive procedure
which finds the set of hard components, together with their values.
1138:
1139: \begin{algorithm}
1140: {
1141: \small
1142: \dontprintsemicolon
1143: \KwIn{$P$, a polynomial function on variables $V$}
1144: \KwOut{\\
1145: \qquad the set $W \subset V \times \{0,1,2\}$ of hard components,
1146: together with their values\\
1147: \qquad a boolean $b$ which is true if $P$ has at least one root
1148: }
1149: \BlankLine
1150: \eIf{$P$ is constant}{
1151: \eIf{$P$ = 0}
1152: {\Return{$(\emptyset,\mbox{true})$}}
1153: {\Return{$(\emptyset,\mbox{false})$}}
1154: }{
1155: let $P_0$, $P_1$, $P_2$ be the Shannon decomposition of $P$ with respect to
1156: variable $X_i$,\\
1157: and $(W_j,b_j)$ the result obtained by recursively applying the
1158: algorithm on $P_j$ for $j \in \{0,1,2\}$
1159: \BlankLine
1160: let $W = \{ (v,v') | v \in V,\ v' \in \{0,1,2\},\ \forall j\ b_j \Rightarrow (v,v') \in W_j \}$\\
1161: \If{there exists a unique $j_0$ s.t. $b_{j_0}$ is true}
1162: {add $(i,j_0)$ to $W$}
1163: \BlankLine
1164: \Return{$(W,b_0 \vee b_1 \vee b_2)$}
1165: }
1166: }
1167: \caption{Determination of hard components}
1168: \label{algo:hardcomponents}
1169: \end{algorithm}
1170:
Let us set some of the previously found control variables of the toy example
to given values, say PUFA to 1 and LXR to -1. Applying
Algorithm \ref{algo:hardcomponents}, we find that the corresponding polynomial has the
following hard components:
1175: \begin{verbatim}
1176: ACL = -1 FAS = -1
1177: ACC = -1 LXR-a = -1
1178: SCAP = -1 SREBP = -1
1179: SREBP-a = -1 PPAR = -1
1180: PPAR-a = -1
1181: \end{verbatim}
1182: which expectedly corresponds to the inhibition of fatty acids
1183: synthesis.
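
The recursion of Algorithm \ref{algo:hardcomponents} can be sketched
in Python as follows. The sketch assumes that a diagram is a nested
tuple \texttt{(var, child0, child1, child2)} or an integer leaf, and
it returns an empty set of hard components when the polynomial has no
root, which is a simplification of the algorithm. The function name
\texttt{hard} and the example diagram are hypothetical and unrelated
to our toy model.
\begin{verbatim}
def hard(d):
    """Return (W, b): W maps a variable index to its forced value,
    b tells whether the diagram d has at least one root."""
    if not isinstance(d, tuple):
        return {}, d == 0
    var, children = d[0], d[1:]
    results = [hard(c) for c in children]
    feasible = [(j, w) for j, (w, b) in enumerate(results) if b]
    if not feasible:
        return {}, False                  # no root below this node
    # Keep the components shared by all branches that have a root.
    common = dict(feasible[0][1])
    for _, w in feasible[1:]:
        common = {v: val for v, val in common.items()
                  if w.get(v) == val}
    if len(feasible) == 1:                # a single admissible value
        common[var] = feasible[0][0]
    return common, True

# Roots: X0 = 1, X1 in {1, 2}, X2 = 2.
x2_node = (2, 1, 1, 0)
x1_node = (1, 1, x2_node, x2_node)
diagram = (0, 1, x1_node, 1)
W, b = hard(diagram)
\end{verbatim}
On this example \texttt{W} is \texttt{\{0: 1, 2: 2\}}: the first and
third variables are hard components, while the second one takes both
non-null values among the roots.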
1184:
1185:
1186:
1187:
1188: \subsection{Real size system}
1189: We have used our new technique to check the consistency of a
1190: database of molecular interactions involved in the genetic regulation of fatty
1191: acid synthesis.
1192: In the database, interactions were classified as behavioral or biochemical.
1193: \begin{itemize}
1194: \item a behavioral interaction describes the effects of a variation of
1195: a product concentration. It is either direct or indirect (unknown
1196: mechanism).
1197: \item a biochemical interaction may be a gene transcription, a
1198: reaction catalyzed by an enzyme ... Such molecular
1199: interactions can be found in existing databases. They need
1200: a behavioral interpretation.
1201: \end{itemize}
1202: All the behavioral interactions were manually extracted from a
1203: selection of scientific papers. Biochemical interactions were
1204: extracted from public databases available on the Web
1205: (Bind~\cite{Bind}, IntAct~\cite{IntAct}, Amaze~\cite{Amaze},
1206: KEGG~\cite{Kegg} or TransPath~\cite{TransPath}).
1207: A biochemical interaction may be linked to a behavioral interpretation
1208: in the database.
1209:
1210: The database is used to generate the interaction graph. While
1211: behavioral interactions directly correspond to edges in the graph,
1212: biochemical interactions are given a simplified
1213: interpretation. Roughly, any increase of a reaction input induces an
1214: increase of the outputs.
1215:
The interaction graph which is built from the database contains more than 600
vertices and more than 1400 edges. Even so, the
resulting graph is clearly not a comprehensive model of the genetic
regulation of fatty acid synthesis in liver. Nevertheless, our aim is to see
how far this model can account for experimental observations, and
to propose some corrections when it cannot.
1222:
1223: We used our technique to check the coherence of the whole model. After
1224: reducing the graph with standard graph techniques as described in
1225: section \ref{sec:data_agree}, we found that the model was incoherent. The reduced
1226: graph has about 150 nodes. We
1227: developed a heuristic to isolate minimal incoherent sub-systems. It
1228: turned out that all the contradictions we detected resulted from arguable
1229: interpretations of the literature.
1230:
1231:
1232: \section{Conclusion}
1233: \label{sec:conclusion}
1234:
1235: In this paper we proposed a qualitative approach for the analysis
1236: of large biological systems. We rely on a framework more thoroughly
1237: described in \cite{Biosystems}, which is meant to model the comparison
1238: between two experimental conditions as a steady state shift. This
1239: approach fits well with
1240: state of the art biological measurement techniques, which provide
1241: rather noisy data for a large amount of targets. It is also
1242: well suited to the use of biological knowledge, which is most of the time
1243: descriptive and qualitative.
1244:
This qualitative approach is all the more attractive as we can rely
on new analysis methods for qualitative systems. This new technique is
also introduced in this paper and is original in
qualitative modeling. It relies on a representation of qualitative
constraints by decision diagrams. Not only does this have a major impact on
the scalability of qualitative reasoning, but it also allows
many algorithms to be derived in a quite generic fashion.

We plan to validate our approach on pathways which are published for
yeast and \emph{E.~coli}. Not only are these pathways of significant size, but
microarray data for these species are publicly available. Concerning
the scalability of the methods, qualitative systems with up to 200
variables are handled within a few minutes.
1258:
1259:
1260: On the theoretical side, we study applications of our algebraic
1261: techniques to network reconstruction, as proposed in \cite{Wagner04}.
1262: The problem is to infer direct actions between products, based on
1263: large scale perturbation data, in order to obtain the most
1264: parsimonious interaction graph. Our approach could lead to a
1265: reformulation of this problem in terms of polynomial operations.
1266: Indeed, finding a minimal regulation network from a minimal polynomial
1267: representation has already been described in
1268: \cite{Laubenbacher04}, though it was applied to a rather different
1269: type of network. A similar approach tailored to the framework
1270: described in this paper could eventually lead to original and
1271: practical algorithms for network reconstruction.
1272:
1273: \paragraph{Acknowledgment} This research was supported by ACI IMPBio, a French Ministry for Research program on interdisciplinarity.
1274:
1275:
1276: \bibliographystyle{plain}
1277: \bibliography{eccs}
1278: \end{document}
1279: