1: \section{Numerical Experiments.} \label{secnum}
2:
3: \subsection{Description of the numerical implementation of the DPLL algorithm.}
4:
5: We have implemented DPLL with the GUC rule, see Figure~\ref{algo} and
6: Section~\ref{heuri}, to have a fast unit
7: propagation and an inexpensive backtracking \cite{Cra}. The program is
8: divided in three parts. The first routine draws the
9: clauses and represents the data in a convenient structure. The second,
10: main routine updates and saves the state of the search, {\em i.e.} the
11: indices and values of assigned variables, to allow an
12: easy backtracking. Then, it checks if a solution is found; if not,
13: a new variable is assigned.
14: The third routine extracts the implication of the choice (propagation).
15: If unit clauses have been generated, the corresponding literals are
16: fixed, or a contradiction is detected.
17:
18: \subsubsection{Data Representation}
19:
20: Three arrays are used to encode the data: the two first arrays are labelled
21: by the clause number $m=1,\ldots, M$ and the number $b=1,\ldots, K$ of
22: the components in the clause (with $K=2,3$).
23: The entries of these arrays, initially drawn at random, are the indices
24: $clausnum\, (m,b)$ and the values $ clausval\,(m,b)$,
25: true or false, of the variables. The indices of the third array are
26: integers $i=1,\ldots ,N$ and $j=1,\ldots , o_i$ where $o_i$ is the number
27: of occurrences of $x_i$ in the clauses (from zero to $M$).
28: The entries of the matrix, $a\,(i,j)$, are the numbers of the corresponding
29: clauses (between 1 and $M$).
30:
31: \subsubsection{Updating of the search state.}
32:
33: If the third routine has found a contradiction, the second routine
34: goes back along the branch, inverts the last assigned variable
35: and calls again the third routine. If not, the descent along the branch
36: is pursued. A Boolean-valued vector points to
37: the assigned variables, while the values are stored
38: in another unidimensional array.
39: For each clause, we check if the variables are already assigned and, if so,
40: if they are in agreement or not with the clauses. When splitting occurs,
41: a new variable is fixed to satisfy a 2-clause, {\em i.e.} a clause with
42: one false literal and two unknown variables, and the third subroutine
43: is called. If there are only 3-clauses, a
44: new variable is fixed to satisfy any 3-clause and
45: the third subroutine is called. The variable
46: chosen and its value are stored in a vector with index the length of the
47: branch, {\em i.e.} the number of nodes it contains, to allow
48: future backtracking. If there are neither 2- nor 3-clauses left, a solution is
49: found.
50:
51: \subsubsection{Consequences of a choice and unit propagation}
52:
53: All clauses containing the last fixed variable
54: are analyzed by taking into account all possibilities: 1. the clause
55: is satisfied; 2. the clause is reduced to a 2- or 1-clause; 3. the
56: clause is violated (contradiction). In the second case, the 1-clause is stored
57: to be analyzed by unit-propagation once all clauses containing
58: the variable have been reviewed.
59:
60: \subsection{Characteristic running times}
61:
62: We have implemented the DPLL search algorithm in Fortran~77;
63: the algorithm runs on a Pentium II PC with a 433M Hz frequency clock.
64: The number of nodes added per minute ranges from
65: 300,000 (typically obtained for $\alpha=3.5$) to
66: 100,000 ($\alpha=10$) since unit propagation is more and more
67: frequent as $\alpha$ increases.
68: The order of magnitude of the computational time needed to solve an instance
69: are listed in Table~\ref{tabtemps} for ratios corresponding to hard instances.
70: These times limit the maximal size $N$ of the instances we have
71: experimentally studied and the number of samples over which we have
72: averaged. Some rare instances
73: may be much harder than the typical times indicated in Table~\ref{tabtemps}.
74: For instance, for $\alpha=3.1$ and $N=500$, instances
75: are usually solved in about 4 minutes but some samples required
76: more than 2 hours of computation. Such a phenomenon will be discussed
77: in Section~\ref{foc}.
78:
79: \subsection{Overview of experiments }
80:
81: \subsubsection{Number of nodes of the search tree}
82:
83: We have first investigated complexity by a direct count of
84: the number of splittings, that is the number of nodes (black
85: points) in Figure~\ref{tree}, for sat (Figure~\ref{tree}A,C) and
86: unsat (Figure~\ref{tree}B) trees.
87:
88: \subsubsection{Histogram of branch length.}
89:
90: We have also experimentally investigated the structure
91: of search trees for unsat instances (Figure~\ref{tree}B).
92: A branch is defined as a path joining the root (first
93: node on the top of the search tree) to a leaf marked
94: with a contradiction $C$ (or a solution $S$ for sat
95: instance) in Figure~\ref{tree}.
96: The length of a branch is the number of nodes it contains.
97: For an unsat instance, the complete search tree depends on
98: the variables chosen at each split, but not on the values they are assigned to.
99: Indeed, to prove that there is no solution, all nodes in the search tree
100: have to carry two outgoing branches,
101: corresponding to the two choices of the attached variables. What
102: choice is made first does not matter.
103: This simple remark will be of crucial importance in the theoretical analysis
104: of Section~\ref{paral}.
105:
106: We have derived the histogram of the branch lengths by counting
107: the number $B(l)$ of branches having length $l \,N$ once the tree
108: is entirely built up. The histogram is very useful to deduce the
109: complexity in the unsat phase, since in a complete tree the
110: total number of branches $B$ is related to the number of nodes $Q$
111: through the identity,
112: \begin{equation}
113: \label{eqnb}
114: B=\sum _{l=\frac 1N , \ldots , 1} B(l) = Q+1 \qquad .
115: \end{equation}
116: that can be inferred from Figure~\ref{tree}B.
117:
118: \subsubsection{Highest backtracking point} \label{high}
119:
120: Another key property we have focused upon is the
121: highest backtracking point in the search tree.
122: In the unsat phase, DPLL backtracks all the nodes of the tree since no
123: solution can be present. The highest backtracking point in the tree
124: simply coincides with the top (root) node. In the sat phase, the
125: situation is more involved. A solution generally requires some
126: backtracking and the highest backtracking node may be defined as the
127: closest node to the origin through which two branches pass, node $G$ on
128: Figure~\ref{tree}B. We experimentally keep trace of the
129: highest backtracking point by measuring the numbers $C_2 (G)$, $C_3(G)$
130: of 2- and 3-clauses, the number of not-yet-assigned variables
131: $N (G)$, and computing the coordinates
132: $p_G=C_3(G)/(C_2(G)+C_3(G)), \alpha_G=(C_2 (G)+C_3 (G))/N(G)$
133: of $G$ in the phase diagram of Figure~\ref{diag}.
134:
135: \subsection{Experimental Results}
136:
137: \subsubsection{Fluctuations of complexity.}
138: \label{foc}
139: The size of the search tree built by DPLL is a random variable, due to
140: the (quenched) randomness of the 3-SAT instance and the choices made
141: by the splitting rule (``thermal noise''). We show on
142: Figure~\ref{histoseuil} the distribution of the logarithms (in base 2, and
143: divided by $N$) of the number of nodes for different values
144: of $N$. The distributions are more and more peaked around their mean
145: values $\omega _N (\alpha )$ as the size $N$ increases. This indicates
146: that the logarithm of the complexity is a self-averaging quantity in
147: the thermodynamic limit. However, fluctuations are dramatically
148: different at low and large ratios. For $\alpha =10$, and more
149: generally in the unsat phase, the distributions are roughly
150: symmetric (Figure~\ref{histoseuil}A). Tails are small and
151: complexity does not fluctuate too much from sample to
152: sample\cite{Kirk2,Easy}. In the vicinity of $\alpha_L$, e.g. $\alpha
153: =3.1$, much bigger fluctuations are present. There are large tails on
154: the right flanks of the distributions on Figure~\ref{histoseuil}B, due
155: to the presence of rare and very hard samples \cite{Easy}. Complexity is not
156: self-averaging. We will come back to this point in
157: section~\ref{secmixed}.
158:
159: \subsubsection{The easy-hard-easy pattern.}
160:
161: We have averaged the logarithm of the number of nodes over
162: 10,000 randomly drawn instances to obtain $\omega _N (\alpha)$.
163: The typical size $Q$ of the search tree is simply $Q=2^{N \omega _N}$.
164: Results are shown in Figure~\ref{tempi}. An easy-hard-easy pattern of
165: complexity appears clearly as the ratio $\alpha$ varies.
166:
167: \begin{itemize}
168: \item At small ratios, complexity increases as $\gamma (\alpha ) N$,
169: that is, only linearly with the size of the instance.
170: DPLL easily finds a solution and the search tree
171: essentially reduces to a single branch shown on Figure~\ref{tree}A.
172: For the GUC heuristic, the linear regime extends up to $\alpha _L
173: \simeq 3.003$ \cite{Fra,Fri}.
174:
175: \item Above threshold, complexity grows exponentially
176: with $N$\cite{Chv}. The logarithm $\omega
177: (\alpha )$, limit of $\omega _N (\alpha)$ as $N \to \infty$, is maximal at
178: criticality...
179:
180: \item ... and decreases at large ratios as $1/\alpha$ \cite{Bea}.
181: The ``easy'' region on the right
182: hand side of Figure~\ref{tempi} is still exponential but with
183: small coefficients $\omega$.
184: \end{itemize}
185:
186: Of particular interest is the intermediate region $\alpha _L < \alpha
187: < \alpha _C$. We shall show that complexity is exponential in this
188: range of ratios, and that the search tree is a mixture of the
189: search trees taking place in the other ranges $\alpha < \alpha _L$
190: and $\alpha > \alpha _C$.
191:
192:
193: Let us mention that, while this paper is devoted to typical-case
194: (happening with probability one) complexity, rigorous results have
195: been obtained that apply to any instance. So far, using a refined
196: version of DPLL, any instance is guaranteed to be solved in less than
197: $1.504^N$ steps, {\em i.e.} $\omega < 0.588$. The reader is referred
198: to reference \cite{Kul} for this worst-case analysis.
199:
200: \subsubsection{Lower sat phase ($\alpha<\alpha_L$)}
201: \label{secnumlin}
202:
203: The complexity data of Figure~\ref{tempi} obtained for different
204: sizes $N=50, 75, 100$ are plotted again on Figure~\ref{constante} after
205: division by $N$. Data collapse on a single curve, proving that
206: complexity is linear in the regime $\alpha < \alpha _L$.
207: In the vicinity of the cross over ratio $\alpha =\alpha_L$
208: finite-size effects became important in this
209: region. We have thus performed extra simulations for larger sizes
210: $N=500, 1000$ in the range $2.5 \le \alpha \le 3.$ that confirm the
211: linear scaling of the complexity.
212:
213: \subsubsection{Unsat Phase ($\alpha>\alpha_C$)}
214:
215: Results for the shape of the search trees are shown in
216: Figure~\ref{histo}. We represent the logarithm $b(l)$, in base 2 and divided
217: by $N$, of the number $B(l)$ of branches as a function of the branch length
218: $l$, averaged over many samples and for different sizes $N$ and ratios
219: $\alpha.$ When $\alpha$ increases
220: at fixed $N$, branches are shorter and shorter and less and less
221: numerous, making complexity decrease (Figure~\ref{tempi}).
222:
223: As $N$ gets large at fixed $\alpha$,
224: the histogram $b(l)$ becomes a smooth function
225: of $l$ and we can replace the discrete sum in (\ref{eqnb}) with
226: a continuous integral on the length,
227: \begin{equation}
228: Q+1=\sum_{l=\frac 1N,\ldots,1} 2^{N\,b(l)}
229: \simeq N\int_{0}^{1} dl \;
230: 2^{N\,b(l)} \qquad . \label{int741}
231: \end{equation}
232: The integral is exponentially dominated by
233: the maximal value $b_{max}$ of $b(l)$. $\omega$, the limit of the logarithm
234: of the complexity divided by $N$, is therefore equal to $b_{max}$.
235: Nicely indeed, the height $b_{max}$ of the histogram does not depend on $N$
236: (within the statistical errors)
237: and gives a straightforward and precise estimate of $\omega$, not
238: affected by finite-size effects. The values of $b_{max}$ as a function
239: of $\alpha$ are listed in the third column of Table~\ref{tabunsat}.
240:
241: The above discussion is also very useful to interpret the data
242: on the size $Q$ of the search trees.
243: From the quadratic correction around the saddle-point,
244: $b(l) \simeq b_{max} - \beta (l-l_{max})^2 /2 $,
245: the expected finite size correction to $\omega _N$ read
246: \begin{equation}
247: \label{nodeq}
248: \omega _N \simeq \omega + \frac 1{2N} \log_2 N +
249: \frac{1}{2N} \log_2 \left[ \frac{2\pi}{\beta \ln 2} \right] + O \left(
250: \frac 1 {N^2} \right) \qquad .
251: \end{equation}
252: We have checked the validity of this equation by fitting
253: $\omega_N - \log_2 N / (2N)$ as a polynomial function of $1/N$.
254: The constant at the origin gives values of $\omega$ in very good
255: agreement with $b_{max}$ (second column in Table~\ref{tabunsat})
256: while the linear term gives access to the curvature $\beta$.
257: We compare in Table~\ref{tabfit} this curvature with the direct
258: measurements of $\beta$ obtained by looking at the vicinity of the
259: top of the histogram. The agreement is fair, proving that equation
260: (\ref{nodeq}) is an accurate way of extrapolating data on $Q$ to the
261: infinite size limit.
262:
263: \subsubsection{Upper sat phase ($\alpha _L <\alpha < \alpha _C$)}
264:
265: To investigate the sat region slightly below threshold $\alpha _L <
266: \alpha < \alpha _C$, we have carried out simulations with a starting
267: ratio $\alpha =3.5$. Results are shown on Figure~\ref{comp3.5}A.
268: As instances are sat with a high probability, no simple
269: identity relates the number of nodes $Q$ to the number of branches $B$,
270: see search tree in Figure~\ref{tree}C and
271: we measure the complexity through $Q$ only. Complexity clearly scales
272: exponentially with the size of the instance
273: and exhibits large fluctuations from sample to sample. The annealed
274: complexity (logarithm of the average complexity), $\omega _3 ^{ann}$,
275: is larger than the typical solving hardness (average of the logarithm
276: of the complexity), $\omega _3 ^{typ}$, see Table~\ref{tabmixed}.
277:
278: To reach a better understanding of the structure of the search tree,
279: we have focused on the highest backtracking point G defined in
280: Figure~\ref{tree}C and Section~\ref{high}. The coordinates $p_G, \alpha _G$
281: of point G, averaged over instances are shown for increasing sizes
282: $N$ on Figure~\ref{pgalphag}. The coordinates of G exhibit strong
283: sample fluctuations which make the large $N$ extrapolation, $p_G
284: =0.78 \pm 0.01$, $\alpha _G = 3.02 \pm 0.02$ rather imprecise.
285:
286: In Section~\ref{secmixed}, we shall show how the solving complexity in the
287: upper sat phase is related to the solving complexity of corresponding
288: 2+p-SAT problems with parameters $p_G, \alpha _G$.
289:
290:
291:
292:
293:
294: