
1: \documentclass[11pt,a4paper]{article}

2: \usepackage{amsfonts,amssymb,latexsym, amsmath,epsfig,theorem}

3: \usepackage[latin1]{inputenc}

4: \usepackage{fullpage}

5: \usepackage{algorithm2e}

6: %\usepackage{multicol}

7:

8:  % \usepackage{times}

9:   \usepackage{mathptmx}

10:   \usepackage{textcomp}

11:

12: \newcommand{\rem}{\bf \em}

13: \newcommand{\vect}[1]{{\bf #1}}

14: \newcommand{\ord}[1]{\ensuremath{\mathcal{O}\!\left(#1 \right)} }

15: \newcommand{\DP}[2]{ \ensuremath{ \frac{\partial #1 }{\partial #2 } } }

16: \newcommand{\D}[2]{ \ensuremath{ \frac{d #1 }{d #2 } }}

17: \newcommand{\correction}[1]{ {\bf #1}}

18: \newcommand{\comment}[1]{ \begin{center} \fbox{\begin{minipage}[h]{.9\linewidth} #1 \end{minipage}} \end{center}}

19:

20: \newcommand{\noun}[1]{\textsc{#1}}

21: %% Bold symbol macro for standard LaTeX users

22: \providecommand\boldsymbol[1]{\mbox{\boldmath $#1$}}

23:

24: \newcommand\BS[1]{\mbox{\fontseries{b}\selectfont #1}}

25: \newcommand\plus{\BS{+}}

26: \newcommand\moins{\BS{\textminus}}

27: \newcommand\plusmoins{\BS{\textpm}}

28: \newcommand\indet{\BS{?}}

29: \newcommand\zero{\BS{0}}

30:

31: \newcommand\DEF{\stackrel{\text{\tiny def}}{=}}

32: \newcommand\actif[1]{\ensuremath{\overline{\text{#1}}}}

33:

34:

35: \theoremstyle{break}\newtheorem{Theorem}{Theorem}[section]

36: \theoremstyle{break}\newtheorem{Prop}[Theorem]{Property}

37: \theoremstyle{break}\newtheorem{Lemma}[Theorem]{Lemma}

38: \theoremstyle{break}\newtheorem{Def}[Theorem]{Definition}

39:

40: \theoremstyle{break}\newtheorem{System}{System}

41: \theoremstyle{break}\newtheorem{Data}{Data}

42: \theoremstyle{break}\newtheorem{Observations}{Observations}

43:

44: \DeclareMathOperator{\pred}{pred}

45: \DeclareMathOperator{\sgn}{sgn}

46:

47: % ************ Brouillon *************

48: %\setlength\overfullrule{5pt}

49: %\newcommand\XXX[1]{[XXX #1]}

50:

51: %\pagestyle{fancy}

52: %\lhead{}

53: %\chead{}

54: %\rhead{}

55: %\renewcommand\headrulewidth{0pt}

56: %\renewcommand\footrulewidth{0pt}

57: %\addtolength\footskip{\baselineskip}

58: %\addtolength\footskip{\baselineskip}

%\lfoot{Soumission \`a \textsc{Jobim}}

60: %\cfoot{\today}

61: %\rfoot{\thepage}

62:

63: \begin{document}

64:

65: \title{Complex Qualitative Models in Biology: a new approach}

66:

67:

68: \author{ P. \noun{Veber}$\ ^1$

69: \and M. \noun{Le Borgne}$\ ^1$ \and A. \noun{Siegel}$\ ^1$

70: \and S. \noun{Lagarrigue}$\ ^3$ \and

71: O. \noun{Radulescu}$\ ^2$ }

72:

73: \date{}

74: %\affiliation{$^{*}$ ENSA Rennes\\

%$^{\dag}$ UMR de g\'en\'etique animale, INRA, Rennes\\

76: %$^{\ddag}$ Projet Symbiose, IRISA, Rennes\\

77: %$^{\mathsection}$ IRMAR UMR-CNRS 6625, Rennes\\

78: %$^{**}$ INSERM U620, Rennes}

79:

80:

81: \maketitle

82:

83:

84:

{\footnotesize $^1$ Projet Symbiose. Institut de Recherche en
Informatique et Syst\`emes Al\'eatoires, IRISA-CNRS 6074-Universit\'e de
Rennes 1, Campus de Beaulieu, 35042 Rennes Cedex, France}

{\footnotesize $^2$ Institut de Recherche
Math\'ematique de Rennes, UMR-CNRS 6625, Universit\'e de Rennes 1, Campus de Beaulieu, 35042 Rennes Cedex, France}

{\footnotesize $^3$ UMR G\'en\'etique animale, Agrocampus Rennes-INRA, 65 rue de Saint-Brieuc, CS 84215 Rennes, France}

93:

94: \smallskip

95: \paragraph{Abstract.}

96: %\abstract{

97: {\em

98: We advocate the use of qualitative models in the analysis of large biological

99: systems. We show how qualitative

100: models are linked to theoretical differential models and practical

101: graphical models of biological networks. A new technique for analyzing qualitative models is

102: introduced, which is based on an efficient representation of

103: qualitative systems. As shown through several applications, this

104: representation is a relevant tool for the understanding and testing of

105: large and complex biological networks.

106: }

107:

108:

109:

110: \section{Introduction}

111: \label{sec:intro}

112:

Understanding the behavior of a biological system from the interplay
of its molecular components is a particularly difficult task. A
model-based approach provides a framework for expressing hypotheses
about a system and deriving predictions that can be compared
with experimental observations. Traditional approaches (see
\cite{DJ02} for
an interesting review) include ordinary differential equations and
stochastic processes. While they are powerful tools for acquiring
fine-grained knowledge of the system at hand, these frameworks need
accurate experimental data on chemical reaction kinetics, which are
rarely available. Furthermore, they are computationally
demanding, and their practical use is restricted to a limited number of
variables.

126:

In response to these issues, many approaches have been proposed that
abstract away the quantitative details of the system. Among others, let us
stress the work done on gene regulation dynamics \cite{DJGHP04+}, hybrid
systems \cite{GT04} and discrete event systems \cite{CRCDF04+,CRRT04}.
The goal of such
qualitative frameworks is to enable system-level analysis of a
biological phenomenon. This appears to be a relevant answer to recent
technical breakthroughs in experimental biology:
\begin{itemize}
\item microarrays, mass spectrometry and protein chips now make it possible to
  measure thousands of variables simultaneously,
\item the measurements obtained are rather noisy, and may not be
  quantitatively reliable.
\end{itemize}

141:

Microarrays, for instance, are used to compare the activity of genes
between two experimental settings. A microarray experiment gives
a differential measurement: it delivers information on the relative
activity of each gene represented on the array. Despite many attempts to
quantify the output of microarrays, the essential output of the
technique is a statement such as: gene
G is more active in situation A than in situation B.

150:

In this paper, we use a framework developed in \cite{Biosystems} for
the comparison of two experimental conditions, in order to derive
qualitative constraints on the possible variations of the
variables. Our main contribution is the use of an efficient
representation for the set of solutions of a qualitative system.
This representation makes it possible to solve systems with
hundreds of variables. Moreover, it opens the way to a finer
analysis of qualitative systems. This new approach is
illustrated by solving three important problems:
\begin{itemize}
\item checking the agreement of a qualitative system with qualitative
  experimental data,
\item minimally correcting corrupted data that conflict with a
  model,
\item helping in the design of experiments.
\end{itemize}

167:

Our main focus here is to show how to use large qualitative models and
qualitative interpretations of experimental data. In this respect our
work could be used as an extension of what was proposed in \cite{GRRLH03+},
where the authors analyze pangenomic gene
expression arrays in \emph{E.~coli} using simple qualitative rules.

In the next section we establish links between differential,
graphical and qualitative models.

176:

177: \section{Mathematical modeling}

\label{sec:maths} In this section we show how qualitative models can
be linked to more traditional differential models. Differential
models are central to the theory of metabolic control
\cite{meta-control,meta-control2}. They have also been
applied to various aspects of gene network dynamics.

183: %(cycles (TODO citer

184: %tyson, goldbeter), switches (TODO citer  hasty),etc.).

The purpose of this section is to lay down a set of qualitative
equations describing steady-state shifts of differential models.
For the sake of completeness, we rederive, in a simpler case, results
that have been established in greater generality in
\cite{Biosystems,Radulescu05}.

190:

191: \subsection{Modeling assumptions}

192:

193: Let us consider a network of interacting cellular constituents,

194: numbered from 1 to $n$. These constituents may be proteins, RNA

195: transcripts or metabolites for instance. The state vector $X$

196: denotes the concentration of each constituent.

197:

198:

199: \paragraph{Differential dynamics}

200:

201: $X$ is assumed to evolve according to the following differential

202: equation:

203:

204: $$ \D{X}{t}=F(X) $$

205:

206: \noindent where $F$ is an (unknown) nonlinear, differentiable function. A

207: steady state $X_{eq}$ of the system is a solution of the algebraic

208: equation:

209:

210: $$ F(X_{eq}) = 0.$$

211:

212:

A steady state is asymptotically stable if it attracts all nearby
trajectories. A steady state is non-degenerated if the determinant of
the Jacobian matrix evaluated at that steady state does not vanish.
According to the Grobman-Hartman theorem, a sufficient condition for a
steady state to be non-degenerated and asymptotically stable is
$\mathrm{Re}(\lambda_i) < -C$ for some $C>0$ and all $i=1,\ldots,n$,
where the $\lambda_i$ are the
eigenvalues of the Jacobian matrix calculated at the steady state.
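
As a minimal illustration of these definitions (an example of our own choosing, not taken from the cited references), consider a single constituent produced at a constant rate $k>0$ and degraded at a rate proportional to its concentration, with a hypothetical constant $\gamma>0$:

$$ \D{X}{t} = F(X) = k - \gamma X, \qquad F(X_{eq}) = 0 \iff X_{eq} = \frac{k}{\gamma}. $$

Here the Jacobian reduces to the scalar $\DP{F}{X} = -\gamma$, so its only eigenvalue is $\lambda_1 = -\gamma$. The condition $\mathrm{Re}(\lambda_1) < -C$ holds for any $0 < C < \gamma$, and the steady state $X_{eq}$ is therefore non-degenerated and asymptotically stable.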

221:

222:

223:

224:

225: \paragraph{Experiment modeling}

Typical two-state experiments such as differential microarrays are
modeled as steady-state shifts. We suppose that under a change of
the control parameters in the experiment, the system goes from one
non-degenerated stable steady state to another one.
The output of the two-state experiment can be expressed in terms of
concentration variations for a subset of products, between the two
states. We suppose that the signs of these variations have been shown to
be statistically significant.

234:

235:

236: \paragraph{Interaction graph}

237: The only knowledge we require about the function $F$ concerns the

238: signs of the derivatives $\DP{F_i}{X_j}$.  These are interpreted as

239: the action of the product $j$ on the product $i$. It is an

240: activation if the sign is $\plus$, an inhibition if the sign is

241: $\moins$. A null value means no action.

242:

243: An interaction graph $G(V,E)$ is  derived from the Jacobian matrix

244: of $F$:

245: \begin{itemize}

246: \item with nodes $V = \{1,\dots,n\}$ corresponding to products

247: \item and (oriented) edges $E = \{ (j,i) | \DP{F_i}{X_j} \neq 0

248:   \}$. Edges are labeled by $s(j,i) = \sgn(\DP{F_i}{X_j})$.

249: \end{itemize}

250:

The set of predecessors of a node $i$ in $G$ is denoted $\pred(i)$. In
practice, the interaction graph is built from information gathered
in the literature. Consequently, in some places it may be
incomplete (some interactions may be missing), and in others it may be
redundant (some interactions may appear several times, as direct and
indirect interactions). It is important that neither
incompleteness nor redundancy introduces inconsistencies;
this issue will be addressed in section \ref{sec:qua_exp}.
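
The construction of the interaction graph from the signs of the Jacobian can be sketched as follows. This is only an illustration: the function names \verb|interaction_graph| and \verb|pred| and the numerical matrix \verb|J| are assumptions made for the example, since in practice only the signs are known and they come from the literature.

```python
def sign(x):
    """Sign of a real number, written as '+', '-' or '0'."""
    return "+" if x > 0 else "-" if x < 0 else "0"

def interaction_graph(jacobian):
    """Return {(j, i): sign} for every non-zero entry dF_i/dX_j.

    Row i of the matrix holds the derivatives of F_i; nodes are numbered from 1.
    """
    edges = {}
    for i, row in enumerate(jacobian, start=1):
        for j, value in enumerate(row, start=1):
            if value != 0:
                edges[(j, i)] = sign(value)
    return edges

def pred(edges, i):
    """Nodes j with an edge (j, i); a self-loop (i, i) is returned like any other edge."""
    return sorted(j for (j, target) in edges if target == i)

# Hypothetical Jacobian for three products (values chosen only for illustration).
J = [[-1.0, 0.0, -0.5],
     [2.0, -1.0, 0.0],
     [0.0, 0.7, -3.0]]
edges = interaction_graph(J)
print(edges)
print(pred(edges, 1))
```

In this hypothetical example, product 3 inhibits product 1, product 1 activates product 2, and product 2 activates product 3.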

259:

260:

261:

262: \paragraph{Negative diagonal in the Jacobian matrix}

263: For any product $i$, we exclude the possibility of vanishing

264: diagonal elements of the Jacobian $\DP{F_i}{X_i}$. This can be

265: justified by taking into account  degradation and dilution (cell

266: growth) processes that can be represented  as negative self-loops in

267: the interaction graph, that is for all $i$, $(i,i) \in E$ and

268: $s(i,i) = \moins$.

269:

270:

271:

272:

273: \paragraph{Discussion}

In our mathematical modeling we suppose that the system
starts and ends in non-degenerated stable steady states. Of course
this is not always the case, for several reasons: the time needed to
reach a steady state may be too long, or the system may end up in a
limit cycle and oscillate instead of reaching a steady state.

279: %(TODO citer quelques

280: %exemples P53,NFKB, circadian clocks).

281: All these possibilities should

282: be considered with caution.  Actually this hypothesis

283: might be difficult to check from the two states only. Complementary

284: strategies such as time series analysis

285: %(TODO citer Hoffmann et Uri Alon)

286: could be employed in order to assess the possibility of limit cycle

287: oscillations.

288:

289: Positive self-regulation is also  possible but introduces

290: a supplementary complication.

291: In this case

292: for certain values of the  concentrations

293: degradation exactly compensates the positive self-regulation and

294: the diagonal elements of the Jacobian vanish (this is

295: a consequence of the intermediate value theorem).

296: We can avoid dealing with this situation by

297: considering that the positive self-regulation does not act directly

298: and that it involves intermediate species.

299: This is a realistic assumption because a  molecule never really acts directly

300: on itself (transcripts can be auto-regulated but only via protein

301: products).

302: Thus, all nodes can keep their negative self-loops and all diagonal

303: elements of the Jacobian can be considered to be non-vanishing.

304: Although the positive regulation may imply vanishing higher order

305: minors of the Jacobian, this will not affect our local qualitative

306: equations.

307:

308:

309:

310:

311:

312: \subsection{Quantitative variation of one variable}

313: We focus

314: here on the variation of the concentration of a single chemical

315: species represented by a component $X_i$ of the vector $X$. Since we

316: have adopted a {\em static} point of view, we are only interested in

317: the variation of $X_i$ between two non-degenerated stable steady

318: states  $X_{eq}^1$ and $X_{eq}^2$ independently of the trajectory of

319: the dynamical system between the two states.

320:

321: Let us denote by $\hat{X}_i$ the vector of dimension $n_i$ obtained

322: by keeping from $X$ all coordinates $j$ that are predecessors of $i$

323: in the interaction graph. Then, under some additional assumptions

324: described and discussed in \cite{Radulescu05}, we have the following

325: result:

326:

327: \begin{Theorem}

328: \label{math:var}

329: The variation of the concentration

330: of species $i$ between two non-degenerated steady states $X_{eq}^1$

331: and $X_{eq}^2$ is given by

332: \begin{equation}

333:   X_{eq_i}^1 - X_{eq_i}^2 = \int_S -\left( \DP{F_i}{X_i} \right)^{-1}

334:   \sum_{k \in \pred(i)} \DP{F_i}{X_k} d X_k

335: \label{math:eq:var}

336: \end{equation}

337: where $S$ is the segment linking

338: $\hat{X}^1_{eq_i}$ to $\hat{X}^2_{eq_i}$.

339: \end{Theorem}

340:

A full proof is given in \cite{Radulescu05}. The above formula is a
quantitative relation between the variation of concentrations and the
derivatives $\DP{F_i}{X_j}$. Our next step is to introduce a
qualitative abstraction of this relation.
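
To see what the formula says in the simplest case, consider a hypothetical linear example (ours, for illustration only) in which species $i$ has a single predecessor $k$, a degradation constant $\gamma_i > 0$ and a coupling constant $a \neq 0$:

$$ F_i(X) = -\gamma_i X_i + a X_k, \qquad \DP{F_i}{X_i} = -\gamma_i, \qquad \DP{F_i}{X_k} = a. $$

The integrand of Eq. \ref{math:eq:var} is then constant along $S$:

$$ -\left( \DP{F_i}{X_i} \right)^{-1} \DP{F_i}{X_k}\, d X_k = \frac{a}{\gamma_i}\, d X_k , $$

so the variation of $X_i$ between the two steady states equals $a/\gamma_i$ times the variation of $X_k$, both variations being taken in the same order. This agrees with solving $F_i = 0$ directly, which gives $X_i = (a/\gamma_i) X_k$ at any steady state. In particular, since $\gamma_i > 0$, the sign of the variation of $X_i$ is the sign of $a$ multiplied by the sign of the variation of $X_k$.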

345:

346:

347:

348: %% Any influence on $\hat{i}$ comes

349: %% exclusively from the same set of predecessors, for all concentration

350: %% vector $X$  in a domain $D$ including the two steady states.

351: %% Therefore, in $D$ the function  $F_i$ depends on $X_i, \hat{X}_i$

352: %% only. The fundamental hypothesis $\DP{F_i}{X_i} <0$ implies that the

353: %% function $X_i \rightarrow F_i(X_i, \hat{X}_i)$ is strictly

354: %% decreasing. Consequently, for each $\hat{X}_i$ the equation

355: %% $F_i(X_i, \hat{X}_i)=0$  defining steady states has at most one

356: %% solution. The set $F_i(X_i, \hat{X}_i)=0$  represents a $n_i$

357: %% dimensional

358: %%  manifold  $M_i \subset D$, including the two states. Using a

359: %% global implicit function theorem, one shows that the

360: %% set $F_i(X_i, \hat{X}_i)=0$ can be represented as the graph of a

361: %% function $X_i = \Phi_i (\hat{X}_i )$ in $D$ (see \cite{eccb} for

362: %% detailed hypotheses and proof).

363:

364:

365: %% The differential of $\Phi_i$ is given by the implicit function

366: %% theorem:

367:

368: %% \begin{equation}

369: %% \label{math:diffphi}

370: %%  d \Phi_i = -\left( \DP{F_i}{X_i} \right)^{-1} \left( \sum_{k \in

371: %% \pred(i)} \DP{F_i}{X_k} d X_k \right)

372: %% \end{equation}

373:

374: %%  In order to obtain the

375: %% variations of $X_i$ between the two states, one can simply integrate

376: %% the differential of $\Phi_i$ along any smooth path included in

377: %% $M_i$.

378:

379:

380:

381: %Concentrations are always positive and  bounded. We can assume that

382: %they evolve in a closed  bounded hypercube and that  $\DP{F_i}{X_i}

383: %<K<0$ on this domain.

384:

385:

386:

387:

388:

389:

390:

391: %For small perturbations of the parameters, the variation of

392: %concentration of a node $i$ can be approximated by linearizing

393: %Eq. \ref{math:steadystate}:

394:

395: %\begin{equation}

396: %\sum_{j\in V} \DP{F_i}{X_j} \delta X_j +  \sum_{k} \DP{F_i}{P_k}

397: %\delta P_k   = 0.

398: %\end{equation}

399:

400: %Additionaly, we assume that parameters do

401: %not directly influence all variables. Mathematically we assume

402: %$\DP{F_i}{P_k} =0$, except for input nodes. In this case the sum

403: %$\sum_{k} \DP{F_i}{P_k}\delta P_k$ represents the effect of the

404: %experimental settings on the system.

405: %Using the above assumption that $\DP{F_i}{X_i} < 0$ we obtain:

406: %\begin{equation}

407: %\label{math:quant}

408: % \delta X_i = -\left( \DP{F_i}{X_i} \right)^{-1} \sum_{k \in

409: %\pred(i)} \DP{F_i}{X_k} \delta X_k

410: %\end{equation}

411:

412: %As mentioned earlier, quantitative information on the Jacobian,

413: %or on variations $\delta X_k$ is generally not available.

414: %Nevertheless, the sign of these values is known, most often, and can

415: %be used for reasoning.

416:

417:

418:

419: \subsection{Qualitative equations}

420: We propose here to study Eq. \ref{math:eq:var} in sign algebra.

421: By sign algebra, we mean the set $\{\plus,\moins, \indet \}$, where

422: $\indet$ represents undetermined sign. This set is provided with the

423: natural  commutative operations:

424: \[

425: \begin{array}[c]{llllll}

426: \plus+\moins = \indet & \plus+\plus = \plus &\moins+\moins = \moins &

427: \plus\times\moins = \moins & \plus\times\plus = \plus & \moins\times\moins = \plus \\

428:

429: \indet+\moins = \indet & \indet+\plus = \indet & \indet + \indet = \indet  &

430: \indet \times \moins = \indet & \indet \times \plus = \indet & \indet

431: \times \indet = \indet \\

432: \end{array}

433: \]

434:

435: Equality in sign algebra $\approx$ is defined as follows:

436: \[

437: \begin{array}{c|c|c|c}

438: \approx & \plus & \moins & \indet \\

439: \hline

440: \plus & T & F & T \\

441: \hline

442: \moins & F & T & T \\

443: \hline

444: \indet & T & T & T \\

445: \end{array}

446: \]

447:

448:

Importantly, qualitative equality is not an equivalence relation,
since it is not
transitive: for instance $\plus \approx \indet$ and $\indet \approx \moins$,
but $\plus \not\approx \moins$. This implies that computations in
qualitative algebra must
be carried out with care. At least two major properties should be
emphasized:
\begin{itemize}
\item if a term of a sum is indeterminate ($\indet$) then the whole
  sum is indeterminate;
\item if one side of a qualitative equality is indeterminate, then
  the equality is satisfied whatever the value of the other side.
\end{itemize}
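
The operation and equality tables above can be written down directly in a few lines of code. The following is a minimal sketch; the encoding of signs as the strings \verb|"+"|, \verb|"-"|, \verb|"?"| and the names \verb|s_add|, \verb|s_mul|, \verb|s_eq| are choices made for this illustration only.

```python
# Signs are encoded as strings; this encoding is an assumption of the sketch.
PLUS, MINUS, INDET = "+", "-", "?"

def s_add(a, b):
    """Qualitative sum: two equal signs are preserved, any other pair gives '?'."""
    return a if a == b else INDET

def s_mul(a, b):
    """Qualitative product: '?' is absorbing, otherwise the usual rule of signs."""
    if INDET in (a, b):
        return INDET
    return PLUS if a == b else MINUS

def s_eq(a, b):
    """Qualitative equality: holds when the signs coincide or one of them is '?'."""
    return a == b or INDET in (a, b)

# Non-transitivity: + is equal to ?, ? is equal to -, but + is not equal to -.
print(s_eq(PLUS, INDET), s_eq(INDET, MINUS), s_eq(PLUS, MINUS))
```

The last line illustrates why chains of qualitative equalities cannot be shortened as one would do with ordinary equalities.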

460:

A \emph{qualitative system} is a set of algebraic equations with
variables in $\{\plus,\moins, \indet \}$. A \emph{solution} of this
system is a valuation of the unknowns which satisfies each equation,
and such that no variable is instantiated to \indet. This
last requirement is important since otherwise any system would have
trivial solutions (such as setting all variables to \indet).

467:

468:

469: %% Eq \ref{math:diffphi} is established for small changes in the

470: %% parameters. However, Eq \ref{math:var} shows that the sign of $X_i$

471: %% variation can be deduced from the sign of the variations of the other

472: %% variables between two equilibrium points:

473:

474:

475: \begin{Theorem}

476: Under the assumptions and notations of Theorem \ref{math:var}, if the

477: sign of

478: $\DP{F_i}{X_j}$ is constant, then the following relation holds in sign

479: algebra:

480: \begin{equation}

 s(\Delta X_i) \approx
\sum_{k \in \pred(i)}  s(k,i) s(\Delta X_k) \label{math:qual}

483: \end{equation}

484: where $s(\Delta X_k)$ denotes the sign of $X_{eq_k}^1 - X_{eq_k}^2$.

485: \end{Theorem}

486:

By writing Eq. \ref{math:qual} for all nodes in the graph, we
obtain a system of equations on signs of variations, later referred to as
the \emph{qualitative system} associated with the interaction graph $G$.
This system will be used extensively in the next sections.
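
As a sketch of how Eq. \ref{math:qual} can be checked on a signed graph, the code below evaluates the right-hand side at each node that has predecessors and reports the nodes whose equation is violated. The graph, the node names and the helper names \verb|rhs| and \verb|violated| are hypothetical; self-loops are left out of the edge list, as in the equations written for the example of the next section.

```python
from functools import reduce

def s_add(a, b):
    return a if a == b else "?"

def s_mul(a, b):
    if "?" in (a, b):
        return "?"
    return "+" if a == b else "-"

def s_eq(a, b):
    return a == b or "?" in (a, b)

def rhs(edges, valuation, i):
    """Qualitative sum, over the predecessors k of i, of s(k, i) * s(Delta X_k)."""
    terms = [s_mul(s, valuation[k]) for (k, target), s in edges.items() if target == i]
    return reduce(s_add, terms)

def violated(edges, valuation):
    """Nodes with at least one predecessor whose qualitative equation fails."""
    targets = sorted({target for (_, target) in edges})
    return [i for i in targets if not s_eq(valuation[i], rhs(edges, valuation, i))]

# Hypothetical graph: A activates C, B inhibits C.
edges = {("A", "C"): "+", ("B", "C"): "-"}
print(violated(edges, {"A": "+", "B": "-", "C": "+"}))
print(violated(edges, {"A": "+", "B": "-", "C": "-"}))
```

When both A and B increase, the two contributions to C have opposite signs, the right-hand side is indeterminate, and either sign for C satisfies the equation.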

491:

492:

493:

494: \subsection{Link between qualitative and quantitative}

495: \label{sec:link_qua_quant}

496:

The qualitative system obtained from Eq. \ref{math:qual} is a consequence of
the quantitative relations that result from Theorem \ref{math:var}.
So the sign function maps a quantitative
variation between two equilibrium points onto a
qualitative solution of Eq. \ref{math:qual}.
The converse is not true in general: for a given solution $S$ of the
qualitative system, there might be no equilibrium change $\Delta X$
in the differential quantitative model such that each real-valued
component of $\Delta X$ has the sign given by $S$.

506:

507: However, some

508: components of the solution vectors are uniquely determined by the

509: qualitative system. They take the same sign

510: value in every solution vector. For such so-called hard components,

511: the sign of any quantitative solution (if it exists) is

512: completely determined by the qualitative system.

513:

We will use the previous properties to check the coherence between
models and experimental data. By experimental data we mean the sign of
the observed variation in concentration for some nodes. In particular,
if the qualitative system associated with an interaction graph $G$ has
no solution given some experimental observations, then no function $F$
satisfying the sign conditions on the derivatives can describe the
observed equilibrium shift, meaning that either
the model is wrong or some data are corrupted. In the next
section, we introduce a simplified model related to lipid
metabolism, and illustrate the formalism described above.

524:

525:

526: \section{Toy example: regulation of the synthesis of fatty acids}

527: \label{sec:we}

528:

529:

530: In order to illustrate our approach, we use a toy example describing a

531: simplified model of genetic

532: regulation of fatty acid synthesis in liver. The corresponding

533: interaction graph is shown in Fig. \ref{igraph2}.

534:

535:

Two pathways for the production of fatty acids coexist in liver. Saturated
and mono-unsaturated fatty acids are produced from citrate through
a metabolic pathway composed of four enzymes, namely ACL (ATP
citrate lyase), ACC (acetyl-Coenzyme A carboxylase), FAS (fatty
acid synthase) and SCD1 (Stearoyl-CoA desaturase 1).
Polyunsaturated fatty acids (PUFA) such as arachidonic acid and
docosahexaenoic acid are synthesized from essential fatty acids
provided by nutrition; D5D (Delta-5 Desaturase) and D6D (Delta-6
Desaturase) catalyze the key steps of the synthesis of PUFA.

545:

PUFA play pivotal roles in many biological functions; among them, they
regulate the expression of genes that impact lipid,
carbohydrate, and protein metabolism.
The effects of PUFA are mediated either directly, through their
specific binding to various nuclear receptors (PPAR$\alpha$ --
peroxisome proliferator activated receptors, LXR$\alpha$ --
Liver-X-Receptor $\alpha$, HNF-4$\alpha$) leading to changes in the
trans-activating activity of these transcription factors, or
indirectly, as the result of changes in the abundance of regulatory
transcription factors (SREBP-1c -- sterol regulatory element
binding protein, ChREBP, etc.) \cite{Jump}.

557:

558:

559: \begin{figure*}[hbt]

560: \begin{center}

561: \includegraphics[width=12cm]{graphe_exemple3.eps}

562: \caption{Interaction graph for the toy model.

Self-regulation loops on nodes are omitted for the sake of clarity.

564: Observed variations are depicted next to each vertex, when available.}

565: \label{igraph2}

566: \end{center}

567: \end{figure*}

568:

569: \paragraph{Variables in the model}

570:

We consider in our model the nuclear receptors PPAR$\alpha$, LXR$\alpha$ and
SREBP-1c (denoted by PPAR, LXR and SREBP respectively in the model), as
they are synthesized from the corresponding genes, and the
trans-activating active forms of these transcription factors, that is,
LXR-a (denoting a complex LXR$\alpha$:RXR$\alpha$), PPAR-a (denoting a
complex PPAR$\alpha$:RXR$\alpha$) and SREBP-a (denoting the cleaved
form of SREBP-1c). We also consider SCAP (SREBP cleavage activating
protein), a key enzyme involved in the cleavage of SREBP-1c, that
interacts with another family of proteins called INSIG (showing the
complexity of the molecular mechanism).

581:

We also include in the model ``final'' products, that is, the enzymes ACL,
ACC, FAS and SCD1 (involved in fatty acid synthesis from citrate),
D5D and D6D (involved in PUFA synthesis), as well as PUFA themselves.

585:

586: \paragraph{Interactions in the model}

Relations between the variables are the following. SREBP-a is an
activator of the transcription of ACL, ACC, FAS,
SCD1, D5D and D6D \cite{Nara,Jump}.
LXR-a is a direct activator of the transcription of SREBP and FAS;
it also indirectly activates ACL, ACC and SCD1
\cite{StefGus04}. Notice that
these indirect actions are kept in the model because we do not know
whether they are only SREBP-mediated.

595:

596: PUFA activates the formation of PPAR-a from PPAR, and inhibits the formation

597: of LXR-a from LXR as well as the formation of  SREBP-a (by

598: inducing the degradation of mRNA and inhibiting the cleavage)

599: \cite{Jump}. SCAP represents the activators of the formation

600: of SREBP-a from SREBP, and is inhibited by PUFA.

601:

PPAR directly activates the production of SCD1, D5D and D6D
\cite{Miller,Tang,Takashi}. The dual regulation of
SCD1, D5D and D6D by SREBP and PPAR is
paradoxical because SREBP transactivates genes for fatty acid
synthesis in liver, while PPAR induces enzymes for fatty acid
oxidation. Hence, the induction of the D5D and D6D genes by PPAR
appears to be
a compensatory response to the increased PUFA demand caused by
the induction of fatty acid oxidation.

611:

612:

613: \paragraph{Fasting-refeeding protocols}

Fasting-refeeding
protocols represent a favorable condition for studying
lipogenesis regulation; we suppose that
during an experiment, animals (such as rodents or chickens) were kept
in a fasted state for several hours. Then, hepatic mRNA levels of LXR,
SREBP, PPAR, ACL, FAS, ACC and SCD1
are quantified by DNA microarray analysis. Biochemical measurements
also provide the variation of PUFA.

622:

A compilation of recent
literature on lipogenesis regulation provides hypothetical results for
such protocols: SREBP, ACL, ACC, FAS and SCD1 decline
in liver during the fasted state \cite{Liang2002}. This is expected
because fasting results in an inhibition of fatty acid synthesis and an
activation of fatty acid oxidation. For the same reason, PPAR is
increased in order to trigger oxidation. However, Tobin et al.\
\cite{Tobin2000} showed that fasting rats for 24h increased
hepatic LXR mRNA, although LXR positively regulates fatty acid
synthesis in its activated form. Finally, PUFA levels can be
considered to be increased
in liver following starvation because of the substantial lipolysis
from adipose tissue, as shown by Lee et al.\ in mice after 72h of
fasting \cite{Lee}.

637:

638:

639:

640:

641: \paragraph{Qualitative system derived from the graph}

642: As explained in the previous section, we derive a qualitative system

643: from the interaction graph shown in Fig. \ref{igraph2}. For ease of

644: presentation, we denote by \verb|A| the sign of variation for species

645: A.

646:

647:

648: {\small

649: \begin{minipage}{.7\linewidth}

650: \begin{System}

651: \begin{verbatim}

652: (1)  PPAR-a  = PPAR + PUFA

653: (2)  LXR-a   = -PUFA + LXR

654: (3)  SREBP   = LXR-a

655: (4)  SREBP-a = SREBP + SCAP -PUFA

656: (5)  ACL     = LXR-a + SREBP-a - PUFA

657: (6)  ACC     = LXR-a + SREBP-a - PUFA

658: (7)  FAS     = LXR-a + SREBP-a - PUFA

659: (8)  SCD1    = LXR-a  + SREBP-a - PUFA + PPAR-a

660: (9)  SCAP    = -PUFA

661: (10) D5D     = PPAR-a + SREBP-a - PUFA

662: (11) D6D     = PPAR-a + SREBP-a - PUFA

663: \end{verbatim}

664: \label{system2}

665: \end{System}

666: \end{minipage}

667: \begin{minipage}{.3\linewidth}

668: \begin{Observations}

669: \begin{verbatim}

670:   PPAR    = +

671:   PUFA    = +

672:   LXR     = +

673:   SREBP   = -

674:   ACL     = -

675:   ACC     = -

676:   FAS     = -

677:   SCD1    = -

678: \end{verbatim}

679: \end{Observations}

680: \end{minipage}

681: }
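
For a system of this size, consistency with the observations can be checked by plain enumeration. The sketch below is only an illustration and not the representation proposed in this paper: it tries both signs for each unobserved variable of System 1, keeps the valuations that satisfy all eleven equations, and lists the variables that take the same sign in every solution. The helper names (\verb|total|, \verb|neg|, \verb|solutions|, \verb|hard_components|) are our own.

```python
from itertools import product

def s_add(a, b):
    return a if a == b else "?"

def neg(a):
    return {"+": "-", "-": "+", "?": "?"}[a]

def s_eq(a, b):
    return a == b or "?" in (a, b)

def total(*terms):
    """Qualitative sum of one or more signs."""
    result = terms[0]
    for term in terms[1:]:
        result = s_add(result, term)
    return result

def lipogenic(v):   # right-hand side shared by equations (5), (6), (7)
    return total(v["LXR-a"], v["SREBP-a"], neg(v["PUFA"]))

def desaturase(v):  # right-hand side shared by equations (10), (11)
    return total(v["PPAR-a"], v["SREBP-a"], neg(v["PUFA"]))

EQUATIONS = {
    "PPAR-a": lambda v: total(v["PPAR"], v["PUFA"]),
    "LXR-a": lambda v: total(neg(v["PUFA"]), v["LXR"]),
    "SREBP": lambda v: v["LXR-a"],
    "SREBP-a": lambda v: total(v["SREBP"], v["SCAP"], neg(v["PUFA"])),
    "ACL": lipogenic, "ACC": lipogenic, "FAS": lipogenic,
    "SCD1": lambda v: total(lipogenic(v), v["PPAR-a"]),
    "SCAP": lambda v: neg(v["PUFA"]),
    "D5D": desaturase, "D6D": desaturase,
}
OBSERVED = {"PPAR": "+", "PUFA": "+", "LXR": "+", "SREBP": "-",
            "ACL": "-", "ACC": "-", "FAS": "-", "SCD1": "-"}
UNKNOWN = ["PPAR-a", "LXR-a", "SREBP-a", "SCAP", "D5D", "D6D"]

def solutions(observed):
    """All assignments of + or - to the unobserved variables satisfying every equation."""
    found = []
    for signs in product("+-", repeat=len(UNKNOWN)):
        v = {**observed, **dict(zip(UNKNOWN, signs))}
        if all(s_eq(v[name], rhs(v)) for name, rhs in EQUATIONS.items()):
            found.append(v)
    return found

def hard_components(sols):
    """Unobserved variables taking the same sign in every solution."""
    if not sols:
        return {}
    return {n: sols[0][n] for n in UNKNOWN if len({s[n] for s in sols}) == 1}

sols = solutions(OBSERVED)
print(len(sols), hard_components(sols))
```

Under this reading of System 1, the observations are compatible with the model: the enumeration fixes the signs of PPAR-a, LXR-a, SREBP-a and SCAP and leaves D5D and D6D free. Such an enumeration grows exponentially with the number of unobserved variables, which is why it is only usable on toy examples.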

682:

683:

684: In the next section, we propose an efficient representation for such

685: qualitative systems.

686:

687:

688: \section{Analysis of qualitative equations: a new approach}

689: \label{sec:eff_qua}

690:

691: \subsection{Resolution of qualitative systems}

692: \label{sec:res_qua}

693:

The resolution of (even linear) qualitative systems is an NP-complete
problem (see for instance \cite{Trave,Dormoy88}). One can show this by
reducing, in polynomial time, the satisfiability problem for a finite
set of clauses to the
resolution of a qualitative system.

699:

Let us consider a collection $C=\{c_1, \ldots , c_n \}$ of clauses on
a finite set $V$ of variables, and the sign
algebra $\{ \plus ,\moins ,\indet \}$. In order to reduce the satisfiability problem to the
resolution of a qualitative system, let us code $true$ into $+$ and
$false$ into $-$. If $c$ is a clause, let us denote by $\bar{c}$ the
encoding of $c$ as a sign algebra formula. The following encoding scheme
provides a polynomial procedure to code a clause into a qualitative
formula:

708: $$

709: \begin{array}{ccc}

710: \mbox{clause}& & \mbox{sign algebra}\\

711: \hline

712: a \in V & \rightarrow & \bar{a}\\

713: c_1 \vee c_2  & \rightarrow & \bar{c_1} + \bar{c_2}\\

714: \lnot c & \rightarrow & - \bar{c}

715: \end{array}

716: $$

717:

718: The satisfiability problem for the set of clauses $C$ is then reduced

719: to finding a solution of the qualitative system:

720: \[

721: \{ \bar{c_i} \approx \plus \ /\ i=1, \ldots , n \}

722: \]

So an NP-complete problem can be reduced to the resolution of
a qualitative system in polynomial time (with respect to the size of
the problem). This shows that solving qualitative systems is an
NP-complete problem.
For example, the only pair of values $(\bar{a},\bar{b})$ that is
not a solution of $-\bar{a} + \bar{b}\ \approx\ +$ is
$(+,-)$. This corresponds to the only pair $(true, false)$ that does not
satisfy $\lnot a \vee b$.

731:
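The example above can be checked by brute force. The following minimal sketch assumes the usual rules of the sign algebra (equal signs add to the same sign, opposite signs add to $\indet$, and $\approx$ holds when both sides agree or one of them is $\indet$); the function names are illustrative only.

```python
from itertools import product

PLUS, MINUS, INDET = "+", "-", "?"

def s_neg(a):
    """Negation in the sign algebra: swaps + and -, leaves ? unchanged."""
    return {PLUS: MINUS, MINUS: PLUS, INDET: INDET}[a]

def s_add(a, b):
    """Qualitative sum: equal signs keep their sign, anything else gives ?."""
    return a if a == b else INDET

def s_approx(a, b):
    """Qualitative equality: holds if the signs agree or either one is ?."""
    return a == b or INDET in (a, b)

# Encoding of the clause (not a) or b, with true -> + and false -> -.
non_solutions = [
    (a, b)
    for a, b in product([PLUS, MINUS], repeat=2)
    if not s_approx(s_add(s_neg(a), b), PLUS)
]
print(non_solutions)  # [('+', '-')]
```

The single rejected pair is $(+,-)$, that is $(true, false)$, as stated in the text.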

Several heuristics have been proposed for the resolution of qualitative
systems. For linear systems, a set of rules has been
designed \cite{Dormoy88}. This set is complete: it allows every
solution to be found. It is also sound: every solution found by applying these
rules is correct. The rules are based on an adaptation of Gaussian
elimination. However, only heuristics
exist for choosing the equation and the rule to apply to it. In case
of a dead end, when no further rule applies, it is necessary to backtrack
to the last decision made. As a result, programs implementing
qualitative resolution are not very
efficient in general, and only small problems can be solved
in reasonable time. For that reason we propose an alternative way to
solve qualitative systems (linear or not).

745:

746:

747: \subsection{Qualitative equation coding}

748: \label{sec:eq_coding}

749:

750: Our method is based on a coding of qualitative equations as

751: algebraic equations over Galois fields ${\mathbb Z}/p{\mathbb Z}$

752: where $p$ is a prime number greater than 2. The elements of these

753: fields are the classes

754: modulo $p$ of the integers. If $\bar{x}$ denotes the class of the

755: integer $x$ modulo $p$, a sum and a product are defined on ${\mathbb

756: Z}/p{\mathbb Z}$ as follows:

757: $$

758: \bar{x} + \bar{y}  =  \overline{x+y} \qquad \bar{x} \times

759: \bar{y}  =  \overline{x \times y}

760: $$

761:

762: Galois fields have two basic properties which we use extensively:

763: \begin{itemize}

\item Every function $f:({\mathbb Z}/p{\mathbb Z})^n \rightarrow
      {\mathbb Z}/p{\mathbb Z}$ of $n$
      arguments in ${\mathbb Z}/p{\mathbb Z}$ is a polynomial function.
\item If $\oplus$ denotes the operation $f \oplus g = f^{(p-1)} +
      g^{(p-1)}$, then every system of equations $p_1(X)=0, \ldots ,
      p_k(X)=0$ has the same solutions as the single equation
      $(p_1\oplus p_2 \oplus \ldots \oplus p_k)(X) = 0$.

771: \end{itemize}
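The second property follows from Fermat's little theorem: $f^{(p-1)}$ equals $1$ when $f \neq 0$ and $0$ otherwise, so the sum of two such terms vanishes only if both $f$ and $g$ vanish, provided $p > 2$. The sketch below checks this exhaustively for a few small primes; the helper names are illustrative.

```python
from itertools import product

def oplus(f, g, p):
    """f (+) g = f^(p-1) + g^(p-1), computed in Z/pZ."""
    return (pow(f, p - 1, p) + pow(g, p - 1, p)) % p

def check(p):
    """True if f (+) g == 0 exactly when f == 0 and g == 0, for all f, g in Z/pZ."""
    return all(
        (oplus(f, g, p) == 0) == (f == 0 and g == 0)
        for f, g in product(range(p), repeat=2)
    )

results = {p: check(p) for p in (3, 5, 7)}
print(results)    # {3: True, 5: True, 7: True}
print(check(2))   # False: in Z/2Z, 1 + 1 = 0, hence the requirement p > 2
```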

772:

773:

774:

775:

776:

The Galois field with three elements ${\mathbb Z}/3{\mathbb Z}$ is used
to code the sign algebra $\{\plus,\moins,\indet\}$. The following tables
specify how signs and operations are mapped onto it.

780: $$

781: \begin{array}{ccc}

782: \mbox{sign algebra}& & \mathbb{Z}/3{\mathbb Z}\\

783: \hline

784: \plus  & \rightarrow &  1\\

785: \moins & \rightarrow & -1\\

786: \indet & \rightarrow &  0\\

787: \end{array}

788: \qquad \qquad \qquad

789: \begin{array}{ccc}

790: \mbox{sign algebra}& & \mathbb{Z}/3{\mathbb Z}\\

791: \hline

792: e_1 + e_2 & \rightarrow & - \overline{e_1}.\overline{e_2}.(\overline{e_1} + \overline{e_2})\\

793: e_1 \times e_2 & \rightarrow & \overline{e_1}.\overline{e_2}\\

794: e_1 \approx e_2 & \rightarrow & \overline{e_1}.\overline{e_2}.(\overline{e_1} - \overline{e_2})

795: \end{array}

796: $$
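The coding of the operations can be verified exhaustively against the sign algebra. The sketch below assumes the usual sign-algebra rules (with $\indet$ absorbing in products) and represents $-1$ by $2$ in ${\mathbb Z}/3{\mathbb Z}$; all names are illustrative.

```python
from itertools import product

# Codes from the table: + -> 1, - -> -1 (written 2 modulo 3), ? -> 0.
CODE = {"+": 1, "-": 2, "?": 0}

def s_add(a, b):
    """Qualitative sum: equal signs keep their sign, anything else gives ?."""
    return a if a == b else "?"

def s_mul(a, b):
    """Qualitative product: ? is absorbing, otherwise the rule of signs."""
    if "?" in (a, b):
        return "?"
    return "+" if a == b else "-"

def s_approx(a, b):
    """Qualitative equality: holds if the signs agree or either one is ?."""
    return a == b or "?" in (a, b)

def c_add(x, y):
    return (-x * y * (x + y)) % 3

def c_mul(x, y):
    return (x * y) % 3

def c_approx(x, y):
    return (x * y * (x - y)) % 3

ok = all(
    c_add(CODE[a], CODE[b]) == CODE[s_add(a, b)]
    and c_mul(CODE[a], CODE[b]) == CODE[s_mul(a, b)]
    and (c_approx(CODE[a], CODE[b]) == 0) == s_approx(a, b)
    for a, b in product(CODE, repeat=2)
)
print(ok)  # True
```

Note that the coding of $\approx$ is a polynomial whose roots are the pairs satisfying the qualitative equality, whereas the codings of $+$ and $\times$ return the code of the resulting sign.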

797:

Finally a qualitative system $\{ e_1,\dots,e_n \}$ is coded as the
polynomial $\overline{e_1} \oplus \dots \oplus \overline{e_n}$.
A similar coding for the qualitative
algebra $\{\plus,\moins,\zero,\indet\}$ uses the Galois field
${\mathbb Z}/5{\mathbb Z}$ and will not be presented here.

With this coding, a qualitative system has a solution if and only
if the corresponding polynomial has a root with no null
component. Roots with a null component are excluded because $\indet$ is
not an admissible value in a solution of a qualitative system. In general we have to add
the polynomial equation $X^2 = 1$ for each variable $X$ to ensure this.

809:

810:

811: \subsection{An efficient representation of polynomial functions}

812: \label{sec:rep_func}

813:

Recall that our purpose is to solve an NP-complete
problem efficiently. There is little hope of finding a representation of polynomial
functions that allows polynomial systems of equations to be solved in
polynomial time: the coding of a qualitative system as a
polynomial equation is obviously polynomial in the size of the system
(number of variables plus number of equations), so finding a
solution of a polynomial system of equations is itself an NP-complete
problem. It is more or less the SAT problem.

822:

Nevertheless, there exists a representation of polynomial functions on
Galois fields which performs well in practice for
polynomials with hundreds of variables. This kind of representation
was first used for logical functions, which may be considered as
polynomial functions over the field ${\mathbb Z}/2{\mathbb Z}$. This
representation is known as BDD (Binary Decision Diagrams) and is
widely used for checking logic circuits \cite{BDD} and in model checkers
such as NuSMV \cite{SMV}.

831:

We present here this representation for the field ${\mathbb
Z}/3{\mathbb Z}$. Other Galois fields
could be treated in the same way. The starting point is a generalization of the Shannon
decomposition for logical functions:
\[
p(X_1,X) = (1-X_1^2)p_{[X_1=0]}(X)+X_1(-X_1-X_1^2)p_{[X_1=1]}(X) +
X_1(X_1^2-X_1)p_{[X_1=2]}(X)
\]
where $p$ is a polynomial function with $n$ variables. The three
coefficients are the indicator functions of $X_1=0$, $X_1=1$ and
$X_1=2$ in ${\mathbb Z}/3{\mathbb Z}$. This
decomposition leads to a tree representation of the polynomial
function: the variable $X_1$ is the root and has three children. Each
of these is obtained by instantiating $X_1$ to 0, 1 or 2 (the code of $-1$) in
$p(X_1,X)$. The size of this representation is exponential ($3^n$) as each
non-constant node has 3 children. It also depends on the order chosen on
the variables.

A key observation (see \cite{BDD}) is that several
subtrees are identical: they have the same variable as root variable
and isomorphic children. If we decide to represent each type
of tree only once, then the tree representation is transformed into a directed
acyclic graph. With this representation there is no more redundancy
among subtrees. The result may be a dramatic decrease in the size of
the representation of a polynomial function.

\begin{figure*}[hbt]
\begin{center}
\includegraphics[width=12cm]{tdd.eps}
\caption{From tree representation to directed acyclic graph for $X^2
(Y+1)$. The tree has 13 nodes while the DAG representing the same
function has 5 nodes.}\label{itree-dag}
\end{center}
\end{figure*}

A property of the Shannon-like decomposition is that many
operations on polynomial functions are recursive with respect to this
decomposition. More precisely, let
\[
p^i(X_1,X) = (1-X_1^2)p^i_0(X)+X_1(-X_1-X_1^2)p^i_1(X) +
X_1(X_1^2-X_1)p^i_2(X)
\]
for $i=1,2$ be two polynomial functions, with $p^i_{\alpha}(X) = p^i_{[X_1 =
\alpha]}(X)\ ,\ \alpha = 0,1,2$. Then for binary operations $\Delta$
on polynomial functions,
\[
p^1 \Delta p^2 = (1-X_1^2) (p^1_0 \Delta p^2_0 ) + X_1(-X_1-X_1^2)
(p^1_1 \Delta p^2_1 )+  X_1(X_1^2-X_1) ( p^1_2 \Delta p^2_2 )
\]
Applied naively, this kind of recursive formula leads to an exponential
complexity for any computation. Again, it is possible to take advantage
of the redundancy by using a cache to remember the result of each operation. This
technique is known as
memoisation in computer algebra. A 40\% cache hit rate is commonly observed.

884:
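The sharing of identical subtrees and the cache of operations can be sketched as follows. This is a minimal illustration and not the implementation used by the authors: the class \texttt{TDD} and its methods are hypothetical, nodes are identified by integers, and the identifiers 0, 1 and 2 are reserved for the constants of ${\mathbb Z}/3{\mathbb Z}$.

```python
class TDD:
    """Ternary decision diagrams over Z/3Z with shared nodes (illustrative sketch)."""

    def __init__(self):
        self.nodes = [None, None, None]  # ids 0, 1, 2 are the constants
        self.unique = {}                 # (var, children) -> id: each subgraph is stored once
        self.cache = {}                  # (op name, id, id) -> id: memoised results

    def node(self, var, kids):
        """Return the id of the node testing `var`; a test with equal children is skipped."""
        if kids[0] == kids[1] == kids[2]:
            return kids[0]
        key = (var, kids)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, i):
        """Diagram of the polynomial X_i."""
        return self.node(i, (0, 1, 2))

    def cofactor(self, f, var, j):
        """f with `var` set to j (f itself if its root tests another variable)."""
        if f > 2 and self.nodes[f][0] == var:
            return self.nodes[f][1][j]
        return f

    def apply(self, name, op, f, g):
        """Combine two diagrams with a binary operation, cofactor by cofactor."""
        if f <= 2 and g <= 2:
            return op(f, g) % 3
        key = (name, f, g)
        if key not in self.cache:
            # Decompose on the smallest variable index found at the two roots.
            v = min(self.nodes[n][0] for n in (f, g) if n > 2)
            kids = tuple(
                self.apply(name, op, self.cofactor(f, v, j), self.cofactor(g, v, j))
                for j in range(3)
            )
            self.cache[key] = self.node(v, kids)
        return self.cache[key]

    def size(self, f):
        """Number of distinct nodes, constants included, reachable from f."""
        seen, stack = set(), [f]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                if n > 2:
                    stack.extend(self.nodes[n][1])
        return len(seen)

    def evaluate(self, f, values):
        """Value of f at a point; values[i] is the value given to X_i."""
        while f > 2:
            var, kids = self.nodes[f]
            f = kids[values[var]]
        return f


t = TDD()
x, y = t.var(0), t.var(1)
mul = lambda a, b: a * b
add = lambda a, b: a + b
p = t.apply("mul", mul, t.apply("mul", mul, x, x), t.apply("add", add, y, 1))
print(t.size(p))  # 5
```

For the polynomial $X^2(Y+1)$ of Figure \ref{itree-dag}, the two branches $X=1$ and $X=2$ lead to the same subgraph, and the second one is obtained from the cache rather than recomputed.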

More complex operations on polynomial functions are also implemented
with a recursive scheme and memoisation. Let us just mention
quantifier elimination, which is among the most useful for our
purpose.

889:

This representation of polynomial functions on Galois fields
also has several drawbacks:
\begin{itemize}
\item The memory size depends heavily on the order of the variables. The
      libraries implementing such computations always provide reordering
      algorithms.
\item For each order, there exist polynomial functions whose
      representation is exponential in memory size.
\end{itemize}

899:

Nevertheless, in practice, this representation has proved to be very
efficient for polynomial functions with several hundred
variables. The computations performed on our toy model and on another,
real-size one
used a program named SIGALI, which is dedicated to the representation of
polynomial functions on
${\mathbb Z}/3{\mathbb Z}$. Several algorithms were
added to this program in order to answer questions of biological
interest.

908:

909: \section{Qualitative models and experimental data}

910: \label{sec:qua_exp}

In this section, we show how to compute some properties of a
qualitative system, and ultimately gain some insight into the biological
model it represents. The algorithms we derive rely heavily on the
representation introduced above. Hence, not only can they deal
efficiently in practice with computationally hard problems, but they
are also expressed in a rather simple and generic fashion.

917:

918: Let $M$ be a qualitative model represented by its associated

919: interaction graph $G(V,E)$. Recall that $V$ is the set of

920: variables. Let $V_O$ be the set of observed variables, and

921: $o_i \in \{ \plus,\moins \}$ for $i \in V_O$ the experimental

922: observations. As explained in the previous section, the

923: qualitative system derived from $M$ can be coded as a polynomial

924: function $P_M(X_1,\ldots,X_n)$.  Roots of $P_M$ correspond to

925: solutions of the qualitative system.

926:

927: \subsection{Satisfiability of the qualitative system}

928: \label{sec:data_agree}

929:

A property of the coding described above is that the system has no
solution iff $P_M$ is equal to the constant polynomial
$1$. Alternatively, if $P_M=0$, the qualitative equations do not
constrain the variables at all.

934:

935: Now if some observations $o_i$ for $i \in V_O$ are available, checking

936: their consistency with the model $M$ boils down to instantiating

937: $X_i = o_i$  in $P_M(X_1,\ldots,X_n)$, for all $i \in V_O$, and

938: testing whether the resulting polynomial is different from 1.

939:

We computed the polynomial $P_L$ associated with our toy example (see section
\ref{sec:we}) and it has roots. Recall that this does not guarantee the
existence of some (quantitative) differential model conforming to the
interaction graph depicted in Fig. \ref{igraph2}. Satisfiability of
the qualitative system is only a necessary condition for the model to
be correct.

946:

947:

948: %% As explained in section \ref{sec:we}, our toy example

949: %% focus on a fasting experiment. From the literature, we derive the

950: %% following hypothetical:

951:

952: %% \begin{Observations}

953: %% \begin{verbatim}

954: %% ACL   = -1                      FAS  = -1

955: %% ACC   = -1                      SCD1 = -1

956: %% SREBP = -1                      PPAR =  1

957: %% LXR   =  1                      PUFA =  1

958: %% \end{verbatim}

959: %% \end{Observations}

960:

961: The polynomial obtained by instantiating observations into $P_L$

962: is different from 1, meaning that our model does not contradict

963: generally observed variations during fasting.

964:

Large models can advantageously be reduced
using standard graph techniques.
First, we look for connected components in the interaction graph. A
graph with several
connected components represents a coherent qualitative model iff
each component is coherent.
Second, a node with no successor other than itself appears only
in its associated equation. If this node is not observed, its
associated qualitative equation adds no constraint on the other
nodes. So, at least for satisfiability checking, this node can be
deleted and its qualitative equation removed from the system. This
procedure is applied iteratively, until no node can be deleted.
The resulting graph leads to a new qualitative system which is
satisfiable iff the initial system is satisfiable.

979:
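The second reduction can be sketched as a simple fixed-point loop. The function \texttt{prune} and the small graph used to exercise it are hypothetical and serve only as an illustration; the interaction graph is assumed to be given as a dictionary mapping each node to the set of its successors.

```python
def prune(successors, observed):
    """Iteratively drop unobserved nodes that have no successor other than themselves.

    successors: dict mapping each node to the set of nodes it acts on.
    observed:   set of nodes for which an experimental observation exists.
    Returns the set of nodes that remain in the reduced graph.
    """
    succ = {v: set(s) for v, s in successors.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(succ):
            if v not in observed and succ[v] <= {v}:
                del succ[v]
                for s in succ.values():
                    s.discard(v)  # v is no longer a successor of any node
                changed = True
    return set(succ)


# Hypothetical graph: A -> B -> C, A -> D, and a self-loop on D.
graph = {"A": {"B", "D"}, "B": {"C"}, "C": set(), "D": {"D"}}
print(sorted(prune(graph, observed={"B"})))  # ['A', 'B']
```

Removing a node may leave one of its predecessors without successor, which is why the loop is repeated until nothing changes.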

980:

981: \subsection{Correcting data or model}

982:

983:

984: If the qualitative system, given some experimental observations, is

985: found to have no solution, it is of interest to propose some

986: correction of the data and/or the model. By correction, we mean

987: inverting the sign of an observed variable or the sign of an edge of

988: the interaction graph. In

989: the general case, there are several possibilities to make the system

satisfiable, and we need some criterion to choose among them. We
applied a parsimony principle: a correction of the data should involve a
minimal number of sign inversions.

993:

In the following, we show how to compute all minimal corrections for the
data. Given a vector $(o_i)_{i \in V_O}$ of experimental observations
which is not compatible with the model, we compute all
vectors $(o'_i)_{i \in V_O}$ which are compatible with the model and
such that the Hamming distance between $o$ and $o'$ is minimal. By
Hamming distance, we mean the number of positions at which $o$ and
$o'$ differ. The set of such vectors $o'$ might be very large; but again, by
encoding it as the set of roots of a polynomial function, we obtain a
compact representation.

1003:

1004:

This procedure can be extended in a straightforward manner to
corrections of edge signs in the interaction graph. This is done by
considering these signs as variables of the model. For ease of
presentation, we only detail data correction.

1009:

1010:

1011:

1012: \begin{algorithm}

1013: {

1014:   \small

1015:   \dontprintsemicolon

1016:   \KwIn{\\

1017:     \qquad $P$, a polynomial function on variables $V$ \\

1018:     \qquad $i \in V$

1019:   }

1020:   \KwOut{\\

1021:     \qquad $C$, a polynomial function encoding all minimal corrections\\

1022:     \qquad $d$, minimal number of corrections

1023:   }

1024:   \BlankLine

1025:   \eIf{$P$ is constant}{

1026:     \eIf{$P$ = 0}

1027:    {\KwResult{$C=0$, $d=0$}}

1028:    {\KwResult{$C=1$, $d= \infty$}}

1029:   }{

1030:     let $P_0$, $P_1$, $P_2$ be the Shannon decomposition of $P$ with respect to

1031:     variable $X_i$,\\

1032:     and $(C_j,d_j)$ the result obtained by recursively applying the

1033:     algorithm on $P_j$ and $i+1$ for $j \in \{0,1,2\}$

1034:     \BlankLine

1035:     let $d'_j =

1036:     \left\{

1037:     \begin{array}{ll}

1038:       d_j + 1 & \mbox{if } i \in V_O \mbox{ and } o_i \neq j\\

1039:       d_j     & \mbox{otherwise}

1040:     \end{array}

1041:     \right.$

1042:      and $C'_j =

1043:     \left\{

1044:     \begin{array}{ll}

1045:       (X_i - j) \oplus C_j & \mbox{if } i \in V_O \\

1046:       C_j     & \mbox{otherwise}

1047:     \end{array}

1048:     \right.$\\

1049:      \BlankLine

1050:

1051:      \KwResult{$d = \min\ d'_j$, $C = \displaystyle\prod_{j,\ d'_j = d} C'_j$}

1052:   }

1053: }

1054: \caption{Algorithm for experimental data correction.}

1055: \label{algo:hamming}

1056: \end{algorithm}
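The recursion of Algorithm \ref{algo:hamming} can be illustrated on a plain nested decomposition. The sketch below departs from the algorithm in one respect, stated here as an assumption: instead of a polynomial $C$, it returns the minimal corrections as an explicit set, which is only practical for small examples. The helpers \texttt{build} and \texttt{corrections} and the three-variable system are hypothetical; the value $-1$ is written 2.

```python
import math

def build(f, n, prefix=()):
    """Nested Shannon-like decomposition of f: (Z/3Z)^n -> Z/3Z.

    A function is either a constant in {0, 1, 2} or a triple of cofactors
    with respect to the next variable in the order X_0, X_1, ...
    """
    if len(prefix) == n:
        return f(*prefix) % 3
    kids = tuple(build(f, n, prefix + (j,)) for j in range(3))
    if isinstance(kids[0], int) and kids[0] == kids[1] == kids[2]:
        return kids[0]  # constant subfunction
    return kids

def corrections(P, obs, i=0):
    """Return (d, C): minimal number of changed observations and the corrected vectors.

    obs maps the index of each observed variable to its observed value.
    Each element of C is a frozenset of (variable, value) pairs.
    """
    if isinstance(P, int):
        return (0, {frozenset()}) if P == 0 else (math.inf, set())
    best, result = math.inf, set()
    for j, Pj in enumerate(P):
        d, C = corrections(Pj, obs, i + 1)
        if i in obs:
            d += int(obs[i] != j)            # one more correction if X_i must change
            C = {c | {(i, j)} for c in C}    # record the value given to X_i
        if d < best:
            best, result = d, set()
        if d == best:
            result |= C
    return best, result


# Hypothetical system with roots (1, 1, 2) and (2, 2, 1): X_0 = X_1 and X_2 = -X_1.
P = build(lambda x0, x1, x2: 0 if (x0 == x1 != 0 and x2 == 3 - x1) else 1, 3)
d, C = corrections(P, obs={0: 1, 1: 2, 2: 1})
print(d, [sorted(c) for c in C])  # 1 [[(0, 2), (1, 2), (2, 1)]]
```

The observation $(1,-1,-1)$ is at distance 2 from the first root and at distance 1 from the second one, so the only minimal correction changes $X_0$.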

1057:

Let us illustrate this algorithm on our toy example: during fasting
experiments, synthesis of fatty acids tends to be inhibited, while
oxidation, which produces ATP, is activated. In particular ACC, ACL,
FAS and SCD1 are involved in the same pathway, which produces saturated and
monounsaturated fatty acids. As expected, they are known to decline
together at fasting. Suppose we introduce some wrong observation, say
for instance an increase of ACL, while keeping all other
observations given above. The polynomial obtained from $P_L$ by including
these new observations is equal to 1, and hence has no root.
Applying Algorithm \ref{algo:hamming}, we recover this error. Now if
we wrongly change two values, say ACL and ACC to 1, the algorithm
proposes a different correction, namely to change the observed value
of SREBP to 1, which is more parsimonious.

1071:

1072: \subsection{Experiment design}

1073: \label{sec:exp_des}

1074:

1075: It is often the case that not all variables in the system under study

1076: can be observed. Biochemical measurements of metabolites can be costly

1077: and/or time consuming. By experiment design, we mean here the choice

1078: of the variables to observe so that an experiment might be

1079: informative.

1080:

1081: Let $P_M(X_O,X_U)$ be the polynomial function coding for the

1082: qualitative system $M$. $X_O$ (resp. $X_U$) denotes the state vector

1083: of observed (resp. unobserved) variables. The polynomial function

1084: representing the admissible values of the observed variables is

1085: obtained by elimination of the quantifier in

1086: $\exists X_U \  P_M(X_O,X_U)$. Let $P_M^O(X_O)$ denote the  resulting

1087: polynomial function.
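As an illustration, and without claiming that this is how the elimination is implemented in the program used here, a single unobserved variable $x$ can be eliminated with the cofactors of the Shannon-like decomposition. Since ${\mathbb Z}/3{\mathbb Z}$ is a field, a product vanishes exactly when one of its factors does, so
\[
\exists x\ \ P(X_O,x)=0 \quad \Longleftrightarrow \quad
P_{[x=0]}(X_O)\cdot P_{[x=1]}(X_O)\cdot P_{[x=2]}(X_O) = 0 .
\]
For instance, with $P(X_1,x) = X_1 - x$, the product is
$X_1 (X_1-1)(X_1-2) = X_1^3 - X_1$, which vanishes for every value of
$X_1$ in ${\mathbb Z}/3{\mathbb Z}$: whatever the value of $X_1$, a suitable $x$ exists.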

1088:

For some choice of observed variables, it might well be that $P_M^O$
is null: every valuation of the observed variables is then admissible,
which basically means that the experiment is totally useless.
Note that no improvement can be obtained by taking a subset of
$X_O$. The solution is either to add new observed variables or to
choose a completely different set of observed variables.

1094:

In order to assess the relevance of a given experiment (namely of a
given observed subset), we suggest computing the following ratio:
the number of consistent valuations of the observed variables divided by the
total number
of valuations of the observed variables. A very stringent experiment
has a low ratio. An experiment with a ratio of one is useless.

1101:

Again this computation is carried out in a recursive fashion. Let $P$
be a polynomial function representing the
set of admissible observed values. Let $Rat(P)$ be the proportion of
solutions of $P(X)=0$ in the
space $({\mathbb Z}/3{\mathbb Z})^n $, where $n$ is the number of
variables $X$. If $P$ is constant then $Rat(P)=1$
(resp. $Rat(P)=0$) if $P=0$ (resp. $P \neq 0$). Otherwise, let $P_0$,
$P_1$, $P_2$ be a Shannon-like decomposition of $P(X)$
with respect to some variable of $P$. Then it is easy to prove:

1111: \[

1112: Rat(P)\ =\ (Rat(P_0)+Rat(P_1)+Rat(P_2))/3

1113: \]
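This recursion translates directly into a memoised function. The sketch below is illustrative only: it works on a plain nested decomposition built by the hypothetical helper \texttt{build}, and uses exact fractions so that the result can be compared with a direct count.

```python
from fractions import Fraction
from functools import lru_cache

def build(f, n, prefix=()):
    """Nested Shannon-like decomposition of f: (Z/3Z)^n -> Z/3Z.

    A function is either a constant in {0, 1, 2} or a triple of cofactors
    with respect to the next variable in the chosen order.
    """
    if len(prefix) == n:
        return f(*prefix) % 3
    kids = tuple(build(f, n, prefix + (j,)) for j in range(3))
    if isinstance(kids[0], int) and kids[0] == kids[1] == kids[2]:
        return kids[0]  # constant subfunction
    return kids

@lru_cache(maxsize=None)
def rat(P):
    """Proportion of points of (Z/3Z)^n at which the decomposed function P vanishes."""
    if isinstance(P, int):
        return Fraction(1 if P == 0 else 0)
    return sum(rat(Pj) for Pj in P) / 3


P = build(lambda x, y: x * x * (y + 1), 2)
print(rat(P))  # 5/9
```

For $X^2(Y+1)$, the roots are the three points with $X=0$ and the two points with $X \neq 0$ and $Y=2$, which gives 5 roots out of 9 points. The cache plays the role of the memoisation mentioned earlier: a shared subfunction is evaluated only once.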

1114:

The relevance of this approach was assessed on our toy example: for
each subset $O$ of variables in the model containing at most four
variables, we computed $Rat(P_L^O)$. As expected, the lowest ratios
(i.e. the most stringent experiments) were achieved by observing four
variables: either \{SCAP, PUFA, PPAR-a, PPAR\}, or
\{SREBP, SCAP, PUFA, LXR-a\}, or \{SREBP, PPAR-a, PPAR, LXR-a\}.

1121:

Interestingly, the procedure captures what might be thought of as
control variables, like PUFA/SCAP, SREBP/LXR-a and PPAR/PPAR-a.
The first two pairs control the activation of fatty acid synthesis;
the third one controls fatty acid oxidation.

1126:

Indeed one can go even further: once some control
variables have been isolated, we are naturally interested in knowing how they constrain
other variables. Achieving this amounts to computing the set of
variables whose value is constant over all solutions of the system (the
so-called hard components). Recall that
these hard components of qualitative solutions are also important with
respect to the hypothetical differential model which is abstracted by
the qualitative one. Indeed, all solutions of the quantitative equation for
equilibrium change have the same sign pattern on the hard components.
Algorithm \ref{algo:hardcomponents} describes a recursive procedure
which finds the set of hard components, together with their values.

1138:

1139: \begin{algorithm}

1140: {

1141:   \small

1142:   \dontprintsemicolon

1143:   \KwIn{$P$, a polynomial function on variables $V$}

1144:   \KwOut{\\

1145:     \qquad the set $W \subset V \times \{0,1,2\}$ of hard components,

1146:     together with their values\\

1147:     \qquad a boolean $b$ which is true if $P$ has at least one root

1148:   }

1149:   \BlankLine

1150:   \eIf{$P$ is constant}{

1151:     \eIf{$P$ = 0}

1152:    {\Return{$(\emptyset,\mbox{true})$}}

1153:    {\Return{$(\emptyset,\mbox{false})$}}

1154:   }{

1155:     let $P_0$, $P_1$, $P_2$ be the Shannon decomposition of $P$ with respect to

1156:     variable $X_i$,\\

1157:     and $(W_j,b_j)$ the result obtained by recursively applying the

1158:     algorithm on $P_j$ for $j \in \{0,1,2\}$

1159:     \BlankLine

1160:     let $W = \{ (v,v') | v \in V,\ v' \in \{0,1,2\},\ \forall j\ b_j \Rightarrow (v,v') \in W_j \}$\\

1161:     \If{there exists a unique $j_0$ s.t. $b_{j_0}$ is true}

1162:        {add $(i,j_0)$ to $W$}

1163:     \BlankLine

1164:     \Return{$(W,b_0 \vee b_1 \vee b_2)$}

1165:   }

1166: }

1167: \caption{Determination of hard components}

1168: \label{algo:hardcomponents}

1169: \end{algorithm}
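Algorithm \ref{algo:hardcomponents} can be sketched on a plain nested decomposition as follows. The helpers \texttt{build} and \texttt{hard\_components} and the three-variable system are hypothetical. One detail is an implementation choice of this sketch: when no branch has a root, an empty set is returned together with \texttt{False}, which is harmless because such a result is ignored by the caller.

```python
def build(f, n, prefix=()):
    """Nested Shannon-like decomposition of f: (Z/3Z)^n -> Z/3Z.

    A function is either a constant in {0, 1, 2} or a triple of cofactors
    with respect to the next variable in the order X_0, X_1, ...
    """
    if len(prefix) == n:
        return f(*prefix) % 3
    kids = tuple(build(f, n, prefix + (j,)) for j in range(3))
    if isinstance(kids[0], int) and kids[0] == kids[1] == kids[2]:
        return kids[0]  # constant subfunction
    return kids

def hard_components(P, i=0):
    """Return (W, b): W maps each hard component to its value, b tells if P has a root."""
    if isinstance(P, int):
        return {}, P == 0
    results = [hard_components(Pj, i + 1) for Pj in P]
    live = [(j, W) for j, (W, b) in enumerate(results) if b]
    if not live:
        return {}, False
    # Keep the (variable, value) pairs shared by every branch that has a root.
    common = set(live[0][1].items())
    for _, W in live[1:]:
        common &= set(W.items())
    W = dict(common)
    if len(live) == 1:
        W[i] = live[0][0]  # X_i takes a single value over all roots
    return W, True


# Hypothetical system with roots (1, 1, 1) and (1, 1, 2): X_0 = 1, X_1 = X_0, X_2 free.
P = build(lambda x0, x1, x2: 0 if (x0 == 1 and x1 == x0 and x2 != 0) else 1, 3)
W, b = hard_components(P)
print(sorted(W.items()), b)  # [(0, 1), (1, 1)] True
```

Here $X_0$ and $X_1$ are hard components with value 1, whereas $X_2$ takes both signs among the roots and is therefore not reported.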

1170:

Let us set some of the control variables previously found for the toy example
to given values, say PUFA to 1 and LXR to -1. Applying
Algorithm \ref{algo:hardcomponents} to the corresponding polynomial yields the
following hard components:

1175: \begin{verbatim}

1176: ACL     = -1             FAS   = -1

1177: ACC     = -1             LXR-a = -1

1178: SCAP    = -1             SREBP = -1

1179: SREBP-a = -1             PPAR  = -1

1180: PPAR-a  = -1

1181: \end{verbatim}

which, as expected, corresponds to the inhibition of fatty acid
synthesis.

1184:

1185:

1186:

1187:

1188: \subsection{Real size system}

1189:  We have used our new technique to check the consistency of a

1190: database of molecular interactions involved in the genetic regulation of fatty

1191: acid synthesis.

1192: In the database, interactions were classified as behavioral or biochemical.

1193: \begin{itemize}

1194: \item a behavioral interaction describes the effects of a variation of

1195:       a product concentration. It is either direct or indirect (unknown

1196:       mechanism).

1197:     \item a biochemical interaction may be a gene transcription, a

1198:           reaction catalyzed by an enzyme ... Such molecular

1199:           interactions can be found in existing databases. They need

1200:           a behavioral interpretation.

1201: \end{itemize}

1202:  All the behavioral interactions were manually extracted from a

1203: selection of scientific papers. Biochemical interactions were

1204: extracted from  public databases available on the Web

1205: (Bind~\cite{Bind}, IntAct~\cite{IntAct}, Amaze~\cite{Amaze},

1206: KEGG~\cite{Kegg} or TransPath~\cite{TransPath}).

1207: A biochemical interaction may be linked to a behavioral interpretation

1208: in the database.

1209:

1210: The database is used to generate the interaction graph. While

1211: behavioral interactions directly correspond to edges in the graph,

1212: biochemical interactions are given a simplified

1213: interpretation. Roughly, any increase of a reaction input induces an

1214: increase of the outputs.

1215:

The interaction graph built from the database contains more than 600
vertices and more than 1400 edges. Even so, the
resulting graph is clearly not a comprehensive model of the genetic
regulation of fatty acid synthesis in liver. Nevertheless, our aim is to see
how far this model can account for experimental observations, and
to propose some corrections when it cannot.

1222:

1223: We used our technique to check the coherence of the whole model. After

1224: reducing the graph with standard graph techniques as described in

1225: section \ref{sec:data_agree}, we found that the model was incoherent. The reduced

1226: graph has about 150 nodes. We

1227: developed a heuristic to isolate minimal incoherent sub-systems. It

1228: turned out that all the contradictions we detected resulted from arguable

1229: interpretations of the literature.

1230:

1231:

1232: \section{Conclusion}

1233: \label{sec:conclusion}

1234:

1235: In this paper we proposed a qualitative approach for the analysis

1236: of large biological systems. We rely on a framework more thoroughly

1237: described in \cite{Biosystems}, which is meant to model the comparison

1238: between two experimental conditions as a steady state shift. This

1239: approach fits well with

1240: state of the art biological measurement techniques, which provide

1241: rather noisy data for a large amount of targets. It is also

1242: well suited to the use of biological knowledge, which is most of the time

1243: descriptive and qualitative.

1244:

This qualitative approach is all the more attractive as we can rely
on new analysis methods for qualitative systems. This new technique is
also introduced in this paper and is original in
qualitative modeling. It relies on a representation of qualitative
constraints by decision diagrams. Not only does this have a major impact on
the scalability of qualitative reasoning, but it also makes it possible to
derive many algorithms in a quite generic fashion.

1252:

We plan to validate our approach on pathways which are published for
yeast and \emph{E.~coli}. Not only are these pathways of significant size, but
microarray data for these species are also publicly available. Concerning
the scalability of the methods, qualitative systems with up to 200
variables are handled within a few minutes.

1258:

1259:

1260: On the theoretical side, we study applications of our algebraic

1261: techniques to network reconstruction, as proposed in \cite{Wagner04}.

1262: The problem is to infer direct actions between products, based on

1263: large scale perturbation data, in order to obtain the most

1264: parsimonious interaction graph. Our approach could lead to a

1265: reformulation of this problem in terms of polynomial operations.

1266: Indeed, finding a minimal regulation network from a minimal polynomial

1267: representation has already been described in

1268: \cite{Laubenbacher04}, though it was applied to a rather different

1269: type of network. A similar approach tailored to the framework

1270: described in this paper could eventually lead to original and

1271: practical algorithms for network reconstruction.

1272:

1273: \paragraph{Acknowledgment} This research was supported by ACI IMPBio, a French Ministry for Research program on interdisciplinarity.

1274:

1275:

1276: \bibliographystyle{plain}

1277: \bibliography{eccs}

1278: \end{document}

1279: