3: \documentclass[letterpaper, 11pt]{article}
4: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5: %TCIDATA{OutputFilter=LATEX.DLL}
6: %TCIDATA{LastRevised=Thursday, October 24, 2002 16:50:28}
7: %TCIDATA{<META NAME="GraphicsSave" CONTENT="32">}
8: 
9: %\usepackage{times}
10: \setlength{\headsep}{0.cm}
11: \renewcommand{\baselinestretch}{1.0}
12: \renewcommand{\arraystretch}{1}
13: \setlength{\oddsidemargin}{0.1cm}
14: \setlength{\evensidemargin}{0.1cm}
15: \setlength{\topmargin}{0.cm}
16: \setlength{\parskip}{0.05cm}
17: \textheight=22.9cm
18: \textwidth=16.5cm
19: 
20: \usepackage{amsmath}
21: \usepackage{amsfonts}
22: %\usepackage{times}
23: 
24: 
25: \begin{document}
26: 
27: 
28: %\begin{center}
29: %{\Large {\bf Many Hard Examples in Exact Phase Transitions\\[0.3cm]
30: %{\normalsize with Application to Generating Hard Satisfiable Instances\footnote{%
31: %{\small This research was partially supported by the National Key Basic Research
32: %Program (973 Program) of China under Grant No. G1999032701.}}}}}\\[0.5cm]
33: %{\large Ke Xu and Wei Li}
34: 
35: \begin{center}
36: {\Large {\bf Many Hard Examples in Exact Phase Transitions with\\[0.3cm]
37: Application to Generating Hard Satisfiable Instances\footnote{%
38: {\small This research was partially supported by the National Key Basic Research
39: Program (973 Program) of China under Grant No. G1999032701 and Special Funds
40: for Authors of National Excellent Doctoral Dissertations of China under Grant No.
200241. A preliminary version of this paper appeared as Technical Report cs.CC/0302001
of CoRR in Feb. 2003.}}}}\\[0.5cm]
43: {\large Ke Xu and Wei Li}
44: 
45: \bigskip
46: %{\setlength{\parskip}{0.cm}
47: National Lab of Software Development Environment
48: 
49: Department of Computer Science
50: 
51: Beihang University, Beijing 100083, China
52: 
53: Email:\{kexu,liwei\}@nlsde.buaa.edu.cn
54: 
55: \end{center}
56: 
57: \begin{quotation}
58: {\noindent {\small {\bf Abstract.} This paper first analyzes the resolution
59: complexity of two random CSP models (i.e. Model RB/RD) for which we can
60: establish the existence of phase transitions and identify the threshold
61: points exactly. By encoding CSPs into CNF formulas, it is proved that
62: almost all instances of Model RB/RD have no tree-like resolution proofs of
63: less than exponential size. Thus, we not only introduce new families of CNF
64: formulas hard for resolution, which is a central task of Proof-Complexity
65: theory, but also propose models with both many hard instances and exact 
66: phase transitions. Then, the implications of such models are addressed.
67: It is shown both theoretically and experimentally that an application of 
68: Model RB/RD might be in the generation of hard satisfiable instances, which is not only 
69: of practical importance but also related to some open problems in cryptography 
such as generating one-way functions. Subsequently, further theoretical support for
the generation method is provided by establishing exponential lower bounds
72: on the complexity of solving random satisfiable and forced satisfiable 
73: instances of RB/RD near the threshold. Finally, conclusions are presented, 
74: as well as a detailed comparison of Model RB/RD with the Hamiltonian cycle problem and 
75: random 3-SAT, which, respectively, exhibit three different kinds of phase transition 
76: behavior in NP-complete problems.}}
77: \end{quotation}
78: 
79: \bigskip
80: 
81: \noindent {\large {\bf 1. Introduction }}
82: 
83: \smallskip
84: 
85: \noindent Over the past ten years, the study of phase transition
86: phenomena has been one of the most exciting areas in computer science and
87: artificial intelligence. Numerous empirical studies suggest that for many
88: NP-complete problems, as a parameter is varied, there is a sharp transition
89: from 1 to 0 at a threshold point with respect to the probability of a random
90: instance being soluble. More interestingly, the hardest instances to solve
are concentrated in the sharp transition region. As is well known, finding ways
92: to generate hard instances for a problem is important both for understanding
93: the complexity of the problem and for providing challenging benchmarks for
94: experimental evaluation of algorithms [12]. So the finding of phase
95: transition phenomena in computer science not only gives a new method to
96: generate hard instances but also provides useful insights into the study of
97: computational complexity from a new perspective.
98: 
99: Although tremendous progress has been made in the study of phase
transitions, there is still a lack of research on the connections
between the threshold phenomena and the generation of hard instances,
especially from a theoretical point of view. For example, some problems can
be used to generate hard instances but the existence of phase transitions in
such problems has not been proved. One such example is the well-studied
random 3-SAT. A theoretical result by Chv\'{a}tal and Szemer\'{e}di [10]
shows that for random 3-SAT, no short proofs exist in general, which means
107: that almost all proofs for this problem require exponential resolution
108: lengths. Experimental results further indicate that instances from the phase
109: transition region of random 3-SAT tend to be particularly hard to solve
110: [25]. Since the early 1990's, considerable efforts have been put into random
111: 3-SAT, but until now, the existence of the phase transition phenomenon in
112: this problem has not been established, although recently, Friedgut [14] made
113: tremendous progress in proving that the width of the phase transition region
114: narrows as the number of variables increases. On the other hand, for some
115: problems with proved phase transitions, it was found either theoretically or
116: experimentally that instances generated by these problems are easy to solve
or easy in general. Such examples include random 2-SAT, the Hamiltonian cycle
problem and random 2+$p$-SAT ($0<p\leq 0.4$). For random 2-SAT, Chv\'{a}tal
and Reed [11] and Goerdt [20] proved that the phase transition occurs
when the ratio of clauses to variables is 1. But 2-SAT is
in P, i.e. it can be solved in polynomial time, which implies that random
2-SAT cannot be used to generate hard instances. For the Hamiltonian cycle
problem, which is NP-complete, Koml\'{o}s and Szemer\'{e}di [22] not only
124: proved the existence of the phase transition in this problem but also gave
125: the exact location of the transition point. However, both theoretical
126: results [9] and experimental results [32] suggest that generally, the
127: instances produced by this problem are not hard to solve. 
128: Different from the above two problems, random
129: 2+$p$-SAT [30] was first proposed as an attempt to interpolate between the
130: polynomial time problem random 2-SAT with $p=0$ and the NP-complete problem
131: random 3-SAT with $p=1.$ It is not hard to see that random 2+$p$-SAT is in
fact NP-complete for $p>0.$ The phase transition behavior in this problem
133: with $0<p\leq 0.4$ was established by Achlioptas et al. and the exact
134: location of the threshold point was also obtained [1]. But it was further
135: shown that random 2+$p$-SAT is essentially similar to random 2-SAT when $%
136: 0<p\leq 0.4$ with the typical computational cost scaling linearly with the
137: number of variables [29].
138: 
139: As mentioned before, from a computational theory point of view, what
140: attracts people most in the study of phase transitions is the finding of
many hard instances in the phase transition region. Hence, from
this point of view, problem models that cannot be used to generate
random hard instances are less interesting to study than random 3-SAT.
However, until now, for the models with many hard instances, e.g. random
3-SAT, the existence of phase transitions has not been established, let alone
the exact location of the threshold points. So, from a theoretical
147: perspective, we still do not have sufficient evidence to support the
148: long-standing observation that there exists a close relation between the
149: generation of many hard instances and the threshold phenomena, although this
150: observation opened the door for, and has greatly advanced the study of phase
151: transitions in the last decade. From the discussion above, an interesting
152: question naturally arises: {\em whether there exist models with both
153: proved phase transitions and many hard instances and, if so, what are the
154: implications of such models.} 
155: 
156: Recently, to overcome the trivial asymptotic insolubility of the previous
157: random CSP models, Xu and Li [33] proposed a new CSP model, i.e. Model RB,
158: which is a revision to the standard Model B. It was proved that the phase
159: transitions from solubility to insolubility do exist for Model RB as the
160: number of variables approaches infinity. Moreover, the threshold points at
161: which the phase transitions occur are also known exactly. Based on previous
162: experiments and by relating the hardness of Model RB to Model B, it has
163: already been shown that Model RB abounds with hard instances in the phase
164: transition region. In this paper, we will first propose a random CSP model,
165: called Model RD, along the same line as for Model RB. Then, by encoding CSPs
166: into CNF formulas, we will prove that almost all instances of Model RB/RD
167: have no tree-like resolution proofs of less than exponential size. This
168: means that Model RB/RD are hard for all popular CSP algorithms because 
169: such algorithms are
170: essentially based on tree-like resolutions [24]. Therefore, we not only
171: introduce new families of CNF formulas hard for resolution, which is a
172: central task of Proof-Complexity theory, but also propose models
173: with both many hard instances and exact phase transitions. 
174: More importantly, it will be shown that an application of RB/RD 
175: might be in the generation of hard satisfiable instances, which is not only 
176: of significance for experimental studies, but also of interest to the theoretical 
177: computer science community.   
178: Finally, exponential lower bounds will be established for random satisfiable
179: and forced satisfiable instances of RB/RD near the threshold.
180: 
181: \bigskip
182: 
183: \noindent {\large {\bf 2. Model RB and Model RD}}
184: 
185: \smallskip
186: 
187: \noindent A {\it Constraint Satisfaction Problem}, or CSP for short, 
188: consists of a set of variables, a set of possible values for 
189: each variable (its domain) and a set
190: of constraints defining the allowed tuples of values for the variables
191: (a well-studied special case of it is SAT). 
192: The CSP is a fundamental problem in Artificial Intelligence, with a distinguished
193: history and many applications, such as in knowledge representation, scheduling
194: and pattern recognition. To compare the efficiency of different CSP algorithms,
195: some standard random CSP models have been widely used experimentally to
196: generate benchmark instances in the past decade. For the most widely used CSP
197: model (i.e. standard Model B), Achlioptas et al. [2] proved that except for a small
198: range of values of the constraint tightness, almost all instances generated
199: are unsatisfiable as the number of variables approaches infinity. This result,
200: as shown in [19], implies that most previous experimental results about random
201: CSPs are asymptotically uninteresting. However, it should be noted that
202: Achlioptas et al.'s result holds under the condition of fixed domain size and
203: so is applicable only when the number of variables is overwhelmingly larger
204: than the domain size. But in fact, it can be observed that the domain size,
205: compared to the number of variables, is not very small in most experimental
206: CSP studies. This, in turn, explains why there is a big gap between Achlioptas
207: et al.'s theoretical result and the experimental findings about the phase
208: transition behavior in random CSPs. Motivated by the observation above, and
209: to overcome the trivial asymptotic insolubility of the previous random
210: CSP models, Xu and Li [33] proposed an alternative CSP model as follows.
211: 
212: {\bf Model RB: }First, we select with repetition $m=rn\ln n$ random
213: constraints. Each random constraint is formed by selecting without
214: repetition $k$ of $n$ variables, where $k\geq 2$ is an integer. Next, for
215: each constraint we uniformly select without repetition $q=p\cdot d^{k}$
216: incompatible tuples of values, i.e., each constraint contains exactly $%
217: (1-p)\cdot d^{k}$ allowed tuples of values, where $d=n^{\alpha }$ is the
218: domain size of each variable and $\alpha >0$ is a constant.{\em \ }
219: 
220: Note that the way of generating random instances for Model RB is almost the
221: same as that for Model B. However, like the N-queens problem and Latin square,
222: the domain size of Model RB is not fixed but polynomial in the number of
223: variables. It is proved that Model RB not only avoids the trivial asymptotic
224: behavior but also has exact phase transitions. More precisely, the
225: following theorems hold for Model RB, where $\Pr (Sat)$ denotes the probability 
226: that a random CSP instance generated by Model RB is satisfiable.
227: 
228: %Xu and Li [23] further proved that the probability that a random CSP
229: %instance generated by Model RB is satisfiable, denoted by $\Pr (Sat),$
230: %exhibits phase transitions at a threshold point known exactly, i.e. the
231: %following theorems hold for Model RB.
232: 
233: {\bf Theorem 1} \ (Xu and Li [33]) Let $r_{cr}=-\frac{\alpha }{\ln (1-p)}$.
234: If $\alpha >\frac{1}{k}$, $0<p<1$ are two constants and $k$, $p$ satisfy the
235: inequality $k\geq \frac{1}{1-p}$, then
236: \begin{eqnarray*}
237: \underset{n\rightarrow \infty }{\lim }\Pr (Sat) &=&1\text{ when }r<r_{cr}, \\
238: \underset{n\rightarrow \infty }{\lim }\Pr (Sat) &=&0\text{ when }r>r_{cr}.
239: \end{eqnarray*}
240: 
241: {\bf Theorem 2} \ (Xu and Li [33]) Let $p_{cr}=1-e^{-\frac{\alpha }{r}}$. If
242: $\alpha >\frac{1}{k}$, $r>0$ are two constants and $k$, $\alpha $ and $r$
243: satisfy the inequality $ke^{-\frac{\alpha }{r}}\geq 1$, then
244: \begin{eqnarray*}
245: \underset{n\rightarrow \infty }{\lim }\Pr (Sat) &=&1\text{ when }p<p_{cr}, \\
246: \underset{n\rightarrow \infty }{\lim }\Pr (Sat) &=&0\text{ when }p>p_{cr}.
247: \end{eqnarray*}
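
\smallskip

\noindent As a quick consistency check (a remark added for illustration, not a statement from [33]), the two theorems describe the same threshold curve in the $(r,p)$ plane. Solving $r=r_{cr}$ for $p$ gives
\[
r=-\frac{\alpha }{\ln (1-p)}\iff \ln (1-p)=-\frac{\alpha }{r}\iff p=1-e^{-\frac{\alpha }{r}}=p_{cr},
\]
and on this curve $1-p=e^{-\frac{\alpha }{r}},$ so the condition $k\geq \frac{1}{1-p}$ of Theorem 1 turns into the condition $ke^{-\frac{\alpha }{r}}\geq 1$ of Theorem 2. For a purely illustrative choice of parameters (assumed here for the example only, not taken from any experiment), let $k=3,$ $\alpha =\frac{1}{2},$ $p=\frac{1}{4}$ and $n=16.$ Then $\alpha >\frac{1}{k}$ and $k\geq \frac{1}{1-p}=\frac{4}{3}$ both hold, and
\[
d=n^{\alpha }=4,\qquad q=p\cdot d^{k}=16,\qquad r_{cr}=\frac{1/2}{\ln (4/3)}\approx 1.74,\qquad r_{cr}\,n\ln n\approx 77,
\]
i.e. the threshold would sit at roughly 77 constraints. Since both theorems are asymptotic statements, such a small value of $n$ should be read only as an indication of how the parameters fit together.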
248: 
249: As shown in [33], many instances generated following Model B in previous
250: experiments can also be viewed as instances of Model RB, and more importantly,
251: the experimental results for these instances agree well with the theoretical
252: predictions for Model RB. Therefore, in this sense, we can say that Model B
253: can still be used experimentally to produce benchmark instances. However, to
254: guarantee an asymptotic phase transition behavior and to generate random hard
255: instances, a natural and convenient way is to vary the values of CSP parameters 
256: under the framework of Model RB. Note that another standard CSP\
257: Model, i.e. Model D, is almost the same as Model B except that for every
258: constraint, each tuple of values is selected to be incompatible with
259: probability $p.$ Similarly, we can make a revision to Model D and then get a
260: new Model as follows.
261: 
262: {\bf Model RD: }First, we select with repetition $m=rn\ln n$ random
263: constraints. Each random constraint is formed by selecting without
264: repetition $k$ of $n$ variables, where $k\geq 2$ is an integer. Next, for
265: each constraint, from $d^{k}$ possible tuples of values, each tuple is
266: selected to be incompatible with probability $p$, where $d=n^{\alpha }$ is
267: the domain size of each variable and $\alpha >0$ is a constant.{\em \ }
268: 
269: Along the same line as in the proof for Model RB [33], we can easily prove
that exact phase transitions also exist for Model RD. More precisely, Theorem
271: 1 and Theorem 2 hold for Model RD too. In fact, it is exactly because the
272: differences between Model RB and Model RD are very small that many
273: properties hold for both of them and the proof techniques are also almost
274: the same. So in this paper, we will discuss both models, denoted by Model
275: RB/RD.
276: 
277: Recently, there has been a growing theoretical interest in random CSPs,
278: especially with respect to their phase transition behaviors [13, 16, 17, 27, 31, 35] 
279: and resolution complexity [18, 26, 28].
280: To discuss the resolution complexity of CSPs, we first need to encode a CSP
281: instance into a CNF formula. In this paper we will adopt the encoding method
282: used in [24]. For convenience, we give the outline of this method here. For
283: each CSP variable $u,$ we introduce $d$ propositional variables, called {\it %
284: domain variables}, to represent assignments of values to $u.$ There are
285: three sets of clauses needed in the encoding, i.e. the {\it domain clauses}
286: asserting that each variable must be assigned a value from its domain, the
287: {\it conflict clauses} excluding assignments violating constraints and
288: clauses asserting that each variable is assigned at most one value from its
289: domain.
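
\smallskip

\noindent As an illustration of the three clause sets, write $x_{u,a}$ for the domain variable stating that the CSP variable $u$ takes the value $a\in \{1,\ldots ,d\}$ (the symbol $x_{u,a}$ is notation assumed here for the example, not taken from [24]). Under this reading, the encoding would consist of clauses of the following shapes:
\begin{eqnarray*}
\text{domain clause for }u &:&x_{u,1}\vee x_{u,2}\vee \cdots \vee x_{u,d}, \\
\text{at-most-one clauses for }u &:&\lnot x_{u,a}\vee \lnot x_{u,b}\quad (1\leq a<b\leq d), \\
\text{conflict clause} &:&\lnot x_{u_{1},a_{1}}\vee \cdots \vee \lnot x_{u_{k},a_{k}},
\end{eqnarray*}
with one conflict clause for each tuple $(a_{1},\ldots ,a_{k})$ that is incompatible in a constraint on the variables $(u_{1},\ldots ,u_{k}).$ In particular, domain clauses have width $d=n^{\alpha },$ conflict clauses have width $k$ and at-most-one clauses have width 2.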
290: 
291: \bigskip
292: 
293: \noindent {\large {\bf 3. Resolution Lower Bounds for Model RB/RD}}
294: 
295: \smallskip
296: 
297: \noindent In this section, we will analyze the resolution complexity of
298: unsatisfiability proofs for Model RB/RD and get the following result.
299: 
300: {\bf Theorem 3 \ }Let $P$ be a random CSP instance generated following Model
RB/RD. Then, almost surely, $P$ has no tree-like resolution refutation of size less
than $2^{\Omega (n)}.$
303: 
304: When we say that a property holds almost surely it means that this property
305: holds with probability tending to 1 as the number of variables approaches
306: infinity.
307: 
308: The core of the proof for Theorem 3 is to show that almost surely there exists a
309: clause with large width in every refutation. The width of a clause $C$,
310: denoted by $w(C),$ is the number of variables appearing in it. The width of
311: a set of clauses is the maximal width of a clause in the set. The width of
deriving a clause $C$ from the formula $F,$ denoted by $w(F\vdash C),$ is
313: defined as the minimum of the widths of all derivations of $C$ from $F.$ So,
314: the width of refutations for $F$\ can be denoted by $w(F\vdash 0).$
315: Ben-Sasson and Wigderson [8] gave the following theorem on size-width 
316: relations and proposed a general strategy for proving width
317: lower bounds for CNF formulas.
318: 
319: {\bf Theorem 4 \ }(Ben-Sasson and Wigderson [8]) Let $F$ be a CNF formula
320: and $S_{T}(F)$ be the minimal size of a tree-like refutation. Then we have
321: \[
322: S_{T}(F)\geq 2^{(w(F\vdash 0)-w(F))}.
323: \]
324: 
325: By extending Ben-Sasson and Wigderson's strategy, Mitchell [26] proved
326: exponential resolution lower bounds for some random CSPs of fixed domain 
327: size. In what follows, to obtain  
328: lower bounds on width for RB/RD, we will basically use the same strategy
329: as in [26], but adapt it to handle random CSPs with growing domains. 
330: First, we prove the following local sparse property for RB/RD.
331: 
332: {\bf Lemma 1 }Let $P$ be a random CSP instance generated by Model RB/RD.
There is a constant $c>0$ such that almost surely every sub-problem of $P$
334: with size $s\leq cn$ has at most $b=\beta s\ln n$ constraints, where $\beta =%
335: \frac{\alpha }{6k\ln \frac{1}{1-p}}.$
336: 
337: {\bf Proof: }As mentioned in [27], this is a standard type of argument in
338: random graph theory. Similarly, we consider the number of sub-problems on $s$
339: variables with $b=\beta s\ln n$ constraints for $0<s\leq cn.$ There are $%
340: \binom{n}{s}$ possible choices for the variables and $\binom{m}{b}$ for the
341: constraints. Given such choices, the probability that all the $b$
342: constraints are in the $s$ variables is not greater than $\left( \frac{s}{n}%
\right) ^{kb}.$ So, the expected number of such sub-problems is at most
344: \begin{eqnarray*}
345: \binom{n}{s}\binom{m}{b}\left( \frac{s}{n}\right) ^{kb} &\leq &\left( \frac{%
346: en}{s}\right) ^{s}\left( \frac{em}{b}\right) ^{b}\left( \frac{s}{n}\right)
347: ^{kb} \\
&=&\left( \frac{en}{s}\right) ^{s}\left( \frac{ern\ln n}{\beta s\ln n}%
\right) ^{\beta s\ln n}\left( \frac{s}{n}\right) ^{k\beta s\ln n} \\
350: &=&\left[ \frac{e^{1+\beta \ln n}r^{\beta \ln n}}{\beta ^{\beta \ln n}}%
351: \left( \frac{s}{n}\right) ^{(k-1)\beta \ln n-1}\right] ^{s}.
352: \end{eqnarray*}
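
\noindent The last equality can be checked by collecting powers of $\frac{s}{n},$ using $b=\beta s\ln n$ and $m=rn\ln n.$ The three factors contribute the exponents $-s,$ $-\beta s\ln n$ and $k\beta s\ln n$ respectively, so that
\[
\left( \frac{s}{n}\right) ^{-s-\beta s\ln n+k\beta s\ln n}=\left[ \left( \frac{s}{n}\right) ^{(k-1)\beta \ln n-1}\right] ^{s},
\]
while the remaining constants give
\[
e^{s}\cdot e^{\beta s\ln n}\,r^{\beta s\ln n}\,\beta ^{-\beta s\ln n}=\left[ \frac{e^{1+\beta \ln n}r^{\beta \ln n}}{\beta ^{\beta \ln n}}\right] ^{s}.
\]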
353: 
354: \noindent For sufficiently large $n,$ there exists a constant $c_{1}>0$ such
355: that
356: 
357: \[
358: \frac{e^{1+\beta \ln n}r^{\beta \ln n}}{\beta ^{\beta \ln n}}<n^{c_{1}}.
359: \]
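
\noindent To see why such a constant exists, one can rewrite each factor as a power of $n$:
\[
\frac{e^{1+\beta \ln n}r^{\beta \ln n}}{\beta ^{\beta \ln n}}=e\cdot n^{\beta }\cdot n^{\beta \ln r}\cdot n^{-\beta \ln \beta }=e\cdot n^{\beta (1+\ln r-\ln \beta )},
\]
so any constant $c_{1}>\max \{0,\beta (1+\ln r-\ln \beta )\}$ works once $n$ is large enough to absorb the factor $e.$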
360: 
361: \noindent Thus we get
362: 
363: \[
364: \binom{n}{s}\binom{m}{b}\left( \frac{s}{n}\right) ^{kb}<\left[
365: n^{c_{1}}\left( \frac{s}{n}\right) ^{(k-1)\beta \ln n-1}\right] ^{s}.
366: \]
367: 
368: \noindent Let $c<\frac{1}{2}\exp\left(-\frac{2+c_{1}}{(k-1)\beta}\right)$ be
369: a positive constant.
370: For $0<s\leq cn,$ it follows from the above inequality that
371: 
372: \[
373: \binom{n}{s}\binom{m}{b}\left( \frac{s}{n}\right) ^{kb}<\left( \frac{1}{n^{2}%
374: }\right) ^{s}\leq \frac{1}{n^{2}}.
375: \]
376: 
377: \noindent Thus the expected number of such sub-problems with $s\leq cn$ is
378: at most
379: 
380: \[
381: \overset{cn}{\underset{s=1}{\sum }}\binom{n}{s}\binom{m}{b}\left( \frac{s}{n}%
382: \right) ^{kb}<\frac{1}{n^{2}}cn=o(1).
383: \]
384: 
385: \noindent This finishes the proof. \hfill $\Box $
386: 
387: \smallskip
388: 
389: The following two definitions will be of use later.
390: 
391: {\bf Definition 1 \ }Consider a variable $u$ and $i$ constraints associated
392: with $u.$ In these $i$ constraints, all the variables except $u$ have
393: already been assigned values from their domains. We call this an $i$-{\it %
394: constraint assignment tuple}, denoted by $T_{i,u}.$
395: 
396: {\bf Definition 2 \ }Given a variable $u$ and an $i$-constraint assignment
397: tuple $T_{i,u}.$ We assign a value $v$ to $u$ from its domain$.$ So, all the
398: variables in the $i$ constraints of $T_{i,u}$ have been assigned values. If
399: at least one constraint in $T_{i,u}$ is violated by these values, then we
400: say that {\it the} {\it value }$v${\it \ of }$u${\it \ is flawed} {\it by} $%
401: T_{i,u}.$ If all the values of $u$ in its domain are flawed by $T_{i,u},$
402: then we say that {\it the variable }$u${\it \ is flawed by} $T_{i,u},$ and $%
403: T_{i,u}$ is called a {\it flawed }$i${\it -constraint assignment tuple}.
404: 
405: \smallskip
406: 
407: {\bf Lemma 2 \ }Let $P$ be a random CSP instance generated by Model RB/RD.
408: Almost surely, there does not exist a flawed $i$-constraint assignment tuple
409: $T_{i,u}$ in $P$ with $i\leq 3k\beta \ln n.$
410: 
411: {\bf Proof: }Now consider an $i$-constraint assignment tuple $T_{i,u}$ with $%
412: i\leq 3k\beta \ln n.$ It is easy to see that the probability that $T_{i,u}$
is flawed increases with the number of constraints $i.$ Recall that in Model RD,
414: for every constraint, each tuple of values is selected to be incompatible
415: with probability $p.$ So, given a value $v$ of $u,$ the probability that $v$
416: is flawed by $T_{i,u}$ is
417: \[
418: 1-(1-p)^{i}.
419: \]
420: 
421: \noindent Thus the probability that all the $d=n^{\alpha }$ values of $u$
422: are flawed by $T_{i,u},$ i.e. the probability of $T_{i,u}$ being flawed is
423: \[
424: \left[ 1-(1-p)^{i}\right] ^{d}.
425: \]
426: 
427: \noindent Note that $\beta =\frac{\alpha }{6k\ln \frac{1}{1-p}}.$ Thus for $%
428: 0<i\leq 3k\beta \ln n,$ we have
429: \begin{eqnarray*}
430: \Pr (T_{i,u}\text{ is flawed})|_{i\leq 3k\beta \ln n} &\leq &\left[
431: 1-(1-p)^{3k\beta \ln n}\right] ^{n^{\alpha }} \\
432: &=&[1-\frac{1}{n^{\frac{\alpha }{2}}}]^{n^{\alpha }}\approx e^{-n^{\frac{%
433: \alpha }{2}}}.
434: \end{eqnarray*}
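
\noindent The equality in the second line follows from the choice of $\beta .$ Since $3k\beta =\frac{\alpha }{2\ln \frac{1}{1-p}},$
\[
(1-p)^{3k\beta \ln n}=\exp \left( -3k\beta \ln n\cdot \ln \frac{1}{1-p}\right) =\exp \left( -\frac{\alpha }{2}\ln n\right) =n^{-\frac{\alpha }{2}},
\]
and the final approximation can be read as the upper bound $(1-x)^{N}\leq e^{-xN}$ with $x=n^{-\frac{\alpha }{2}}$ and $N=n^{\alpha },$ for which $xN=n^{\frac{\alpha }{2}}.$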
435: 
436: \noindent The above analysis only applies to Model RD. For Model RB, such an
437: analysis is much more complicated, and so we leave it in the appendix.
438: Recall that there are $n$ variables and $m=rn\ln n$ constraints. So the
439: number of possible choices for $i$-constraint assignment tuples is at most
440: \[
441: n\binom{m}{i}d^{(k-1)i}.
442: \]
443: 
444: \noindent For $i\leq 3k\beta \ln n,$ when $n$ is sufficiently large, there
445: exists a constant $c_{2}>0$ such that
446: \begin{eqnarray*}
447: n\binom{m}{i}d^{(k-1)i} &=&n\binom{rn\ln n}{i}n^{(k-1)\alpha i}\leq n\binom{%
448: rn\ln n}{3k\beta \ln n}n^{3(k-1)\alpha k\beta \ln n} \\
449: &\leq &n\left( \frac{ern\ln n}{3k\beta \ln n}\right) ^{3k\beta \ln
450: n}n^{3(k-1)\alpha k\beta \ln n}<e^{c_{2}\ln ^{2}n}.
451: \end{eqnarray*}
452: 
453: \noindent Thus the expected number of flawed $i$-constraint assignment
454: tuples with $i\leq 3k\beta \ln n$ is at most
455: \begin{eqnarray*}
456: \overset{3k\beta \ln n}{\underset{i=1}{\sum }}n\binom{m}{i}d^{(k-1)i}\Pr
457: (T_{i,u}\text{ is flawed}) &<&e^{c_{2}\ln ^{2}n}\overset{3k\beta \ln n}{%
458: \underset{i=1}{\sum }}\Pr (T_{i,u}\text{ is flawed}) \\
459: &=&e^{c_{2}\ln ^{2}n}\cdot O(e^{-n^{\frac{\alpha }{2}}})\cdot 3k\beta \ln n
460: \\
461: &=&o(1).
462: \end{eqnarray*}
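
\noindent Two steps above may deserve a line of justification. The counting factor $n\binom{m}{i}d^{(k-1)i}$ arises from choosing the variable $u$ (at most $n$ ways), the $i$ constraints (at most $\binom{m}{i}$ ways) and the values of the other $k-1$ variables in each of the $i$ constraints (at most $d^{(k-1)i}$ ways). The final estimate holds because a power of $n$ dominates any power of $\ln n$; up to the constant hidden in the $O(\cdot )$ term,
\[
e^{c_{2}\ln ^{2}n}\cdot e^{-n^{\frac{\alpha }{2}}}\cdot 3k\beta \ln n=\exp \left( c_{2}\ln ^{2}n+\ln (3k\beta \ln n)-n^{\frac{\alpha }{2}}\right) \rightarrow 0\quad \text{as }n\rightarrow \infty .
\]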
463: 
464: \noindent This implies that almost surely, there does not exist a variable $%
465: u $ and an $i$-constraint assignment tuple $T_{i,u}$ with $i\leq 3k\beta \ln
466: n$ such that $u$ is flawed by $T_{i,u}.$ This is exactly what we need and so
467: we are done. \hfill $\Box $
468: 
469: \smallskip
470: 
471: {\bf Lemma 3 }Let $P$ be a random CSP instance generated by Model RB/RD.
472: Almost surely, every sub-problem of $P$ with size at most $cn$ is
473: satisfiable.
474: 
475: {\bf Proof: }Here by the size of a problem we mean the number of variables
476: in this problem. We will prove this lemma by contradiction. Assume that we
477: have an unsatisfiable sub-problem of size at most $cn.$ Thus we can get a
478: minimum sized unsatisfiable sub-problem with size $s\leq cn,$ denoted by $%
479: P_{1}.$ From Lemma 1 we know that almost surely $P_{1}$ has at most $\beta
480: s\ln n$ constraints. Thus there exists a variable $u$ in $P_{1}$ with degree
481: at most $k\beta \ln n,$ i.e. the number of constraints in $P_{1}$ associated
482: with $u$ is not greater than $k\beta \ln n.$ Removing $u$ and the
483: constraints associated with $u$ from $P_{1},$ we get a sub-problem $P_{2}.$
484: By minimality of $P_{1},$ we know that $P_{2}$ is satisfiable, and so there
485: exists an assignment satisfying $P_{2}$. Suppose that the variables in $%
486: P_{2} $ have been assigned values by such an assignment. Now consider the
487: variable $u$ and the $i$ constraints associated with $u,$ where $i\leq
k\beta \ln n.$ By Definition 1 this constitutes an $i$-constraint assignment
489: tuple for $u,$ denoted by $T_{i,u}.$ Recall that $P_{1}$ is unsatisfiable.
490: This means that no value of $u$ can satisfy all the $i$ constraints. That is
491: to say, the variable $u$ is flawed by $T_{i,u}.$ Therefore, if a sub-problem
492: of size at most $cn$ is unsatisfiable, then, almost surely, there is a
493: variable $u$ and an $i$-constraint assignment tuple $T_{i,u}$ such that $u$
494: is flawed by $T_{i,u},$ where $i\leq k\beta \ln n.$ This is in contradiction
495: with Lemma 2 and so finishes the proof. \hfill $\Box $
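
\noindent The existence of the low-degree variable $u$ used above is an averaging argument. Writing $\deg (u)$ for the number of constraints of $P_{1}$ containing $u,$ each constraint involves $k$ variables, so the degrees of the $s$ variables of $P_{1}$ sum to at most $k\beta s\ln n,$ and hence
\[
\min_{u\in P_{1}}\deg (u)\leq \frac{1}{s}\sum_{u\in P_{1}}\deg (u)\leq \frac{k\cdot \beta s\ln n}{s}=k\beta \ln n\leq 3k\beta \ln n,
\]
which places $T_{i,u}$ within the range covered by Lemma 2.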
496: 
497: \smallskip
498: 
Now we will prove that there almost surely exists a complex clause in the
500: refutation proofs of Model RB/RD. The complexity of a clause was defined in
501: [26] by Mitchell, i.e. for any refutation $\pi ,$ the complexity of a clause
502: $C$ in $\pi ,$ denoted by $\mu (C),$ is the size of the smallest sub-problem
503: $\Pi $ such that $C$ can be derived by resolution from $\phi (\Pi ).$ Along
504: the same line as in the proof of [26], we have the following lemma.
505: 
506: {\bf Lemma 4 }Let $P$ be a random CSP instance generated by Model RB/RD.
507: Almost surely, every refutation $\pi $ of $\phi (P)$ has a clause $C$ of
508: complexity $\frac{cn}{2}\leq \mu (C)\leq cn.$
509: 
510: {\bf Proof: }For this proof, please refer to [26]. \hfill  $\Box $
511: 
512: \smallskip
513: 
514: {\bf Lemma 5. }Let $C$ be a clause of complexity $\frac{cn}{2}\leq \mu
515: (C)\leq cn.$ Then, almost surely, $C$ has at least $\frac{c}{6}n$ literals,
516: i.e. $w(C)\geq \frac{c}{6}n$.
517: 
518: {\bf Proof: }We will prove this by contradiction. For a CSP instance $P,$
519: its CNF encoding is denoted by $\phi (P).$ Let $C$ be a clause of complexity
520: $\frac{cn}{2}\leq \mu (C)\leq cn$ and $P_{1}$ be the smallest problem such
521: that $\phi (P_{1})\models C.$ Hence, the size of $P_{1}$ is at least $\frac{c%
522: }{2}n$ and at most $cn$. By Lemma 1, there are at most $\beta cn\ln n$
523: constraints in $P_{1}.$ So, there are at most $\frac{c}{3}n$ variables with
524: degree greater than $3k\beta \ln n.$ Then, there are at least $\frac{c}{2}n-%
525: \frac{c}{3}n=\frac{c}{6}n$ variables in $P_{1}$ with degree at most $3k\beta
526: \ln n.$ We will prove that for these variables, almost surely, there does
527: not exist a variable such that no domain variable of it appears in $C.$ Now
528: assume that we have a variable $u$ in $P_{1}$ with degree $i\leq 3k\beta \ln
529: n$ and no domain variable of it appears in $C.$ Removing $u$ and the
530: constraints associated with it from $P_{1},$ we get a sub-problem $P_{2}.$
531: By minimality of $P_{1},$ we know that $\phi (P_{2})\not\models C.$ So we
532: can find an assignment satisfying $P_{2}$ but not satisfying $C.$ Suppose
533: that the propositional variables in $P_{2}$ and $C$ have been assigned
534: values by such an assignment. Now consider the variable $u$ and the
constraints associated with it. By Definition 1, this constitutes an $i$%
536: -constraint assignment tuple for $u,$ denoted by $T_{i,u}.$ By assumption,
537: no domain variable of $u$ appears in $C.$ So, assigning any value to $u$ \
538: will not affect the truth value of $C.$ Recall that $\phi (P_{1})\models C$
539: and $C$ is false under the current assignment. Therefore, no value of $u$
540: can satisfy $\phi (P_{1}),$ i.e. setting any value to $u$ will violate at
541: least one constraint associated with it. It follows that $u$ is flawed by $%
542: T_{i,u},$ i.e. there exists a flawed $i$-constraint assignment tuple with $%
543: i\leq 3k\beta \ln n.$ This is in contradiction with Lemma 2 and so we are
544: done. \hfill $\Box $
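
\noindent The bound of $\frac{c}{3}n$ on the number of high-degree variables in the proof above is a counting step. Writing $h$ for the number of variables of $P_{1}$ with degree greater than $3k\beta \ln n$ (the symbol $h$ is introduced only for this remark) and $\deg (u)$ for the number of constraints of $P_{1}$ containing $u,$ the total degree is at most $k$ times the number of constraints, so
\[
h\cdot 3k\beta \ln n\leq \sum_{u\in P_{1}}\deg (u)\leq k\cdot \beta cn\ln n\quad \Longrightarrow \quad h\leq \frac{c}{3}n.
\]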
545: 
546: Combining Lemma 4 and Lemma 5, we have that, for a random CSP
547: instance $P$
548: generated by Model RB/RD, almost surely, $w(\phi (P)\vdash 0)\geq \frac{c}{6}%
549: n.$ Now, by use of Theorem 4, we finish the proof. One point worth
550: mentioning is that when $\alpha \geq 1,$ the initial width of clauses is
551: greater than or equal to the number of variables. In such a case, to make
552: Theorem 4 applicable, we only need to introduce some new variables and
553: reduce the widths of domain clauses, which has no effect on our results.
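
\noindent Spelled out for $\alpha <1$ (a sketch of the last step, assuming the encoding outlined in Section 2): the widest initial clauses are the domain clauses, of width $d=n^{\alpha },$ so $w(\phi (P))=\max \{n^{\alpha },k\}=o(n),$ and Theorem 4 yields, almost surely,
\[
S_{T}(\phi (P))\geq 2^{\,w(\phi (P)\vdash 0)-w(\phi (P))}\geq 2^{\frac{c}{6}n-\max \{n^{\alpha },k\}}=2^{\Omega (n)}.
\]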
554: 
555: \bigskip
556: 
557: \noindent {\large {\bf 4. Generating Hard Satisfiable Instances}} 
558: 
559: \noindent As mentioned before, the finding of phase transitions in NP-complete
560: problems provides a good method for generating random hard instances which
are very useful in the evaluation of algorithms. In recent years,
remarkable progress in Artificial Intelligence has been made in the development of
563: incomplete algorithms for various kinds of problems. To evaluate the
564: efficiency of such incomplete algorithms, we need a source to generate 
565: only hard satisfiable instances [3]. However, since the probability of being
566: satisfiable is about 0.5 at the threshold point where the hardest instances
567: are concentrated, the generator based on phase transitions will usually
568: produce a mixture of satisfiable and unsatisfiable instances. So, it is
569: interesting to study how the phase transition phenomenon can be used to
570: generate hard satisfiable instances. Besides practical importance, more
571: interestingly, the problem of generating random hard satisfiable instances
572: is related to some open problems in cryptography, e.g. computing a one-way
573: function, generating pseudo-random numbers and private key cryptography
574: [12, 21, 23].
575: 
576: In fact, for constraint satisfaction and Boolean satisfiability problems,
577: there is a natural strategy to generate instances that are guaranteed to
578: have at least one satisfying assignment. The strategy is as follows [3]: first
579: generate a random truth assignment $t,$ and then generate a certain number
580: of random constraints or clauses one by one to form a random instance, 
581: where any clause or constraint violating $t$ will
582: be rejected. The above strategy is very simple and can be easily
583: implemented. But unfortunately, this strategy was proved to be unsuitable
584: for random 3-SAT because it in fact produces a biased sampling of
585: instances with many satisfying assignments (clustered around $t$), and experiments also
586: show that these instances are much easier to solve than random satisfiable instances [3]. 
587: In the following, for convenience, we will call the satisfiable 
588: instances generated using the strategy as forced satisfiable instances.
589: 
Now let us look further into why the strategy fails for
random 3-SAT. 
592: As defined in [33, 34], an {\it assignment pair} $<t_{1},t_{2}>$ is an ordered pair
593: of two assignments $t_{1}$ and $t_{2}.$ We say that $<t_{1},t_{2}>$ satisfies a CSP
594: if and only if both  $t_{1}$ and $t_{2}$ satisfy this CSP. Suppose that the
number of variables is $n$ and the domain size is $d.$ Then there are in total
596: $d^{n}$ possible assignments, denoted by $t_{1},t_{2},\cdots,t_{d^{n}},$ and
597: $d^{2n}$ possible assignment pairs. Let $t_{i}$ be a forced satisfying
598: assignment. Then the expected number of solutions for forced satisfiable
599: instances satisfying $t_{i},$ denoted by $E_{f}[N]$, is%
600: \[
601: E_{f}[N]=\frac{\overset{d^{n}}{\underset{j=1}{%
602: {\displaystyle\sum}
603: }}\Pr[<t_{i},t_{j}>]}{\Pr[<t_{i},t_{i}>]},
604: \]
605: 
606: \noindent where $\Pr[<t_{i},t_{j}>]$ denotes the probability that $<t_{i},t_{j}>$
607: satisfies a random instance. Note that $E_{f}[N]$ should be independent of the
608: choice of the forced satisfying assignment $t_{i}.$ So we have%
609: \[
610: E_{f}[N]=\frac{\underset{1\leq i,j\leq d^{n}}{%
611: {\displaystyle\sum}
612: }\Pr[<t_{i},t_{j}>]}{d^{n}\Pr[<t_{i},t_{i}>]}=\frac{E[N^{2}]}{E[N]}.
613: \]
614: \noindent where $E[N^{2}]$ and $E[N]$ are, respectively, the
615: second moment and the first moment of the number of solutions for instances
616: generated randomly. 
617: %It is straightforward to derive, from the results on ordered 
618: %pairs of assignments for random $k$-SAT in [34], that the expected number of
619: %solutions for random forced satisfiable instances is
620: %equal to $E(N^{2})/E(N),$ where $E(N^{2})$ and $E(N)$ are, respectively, the
621: %second moment and the first moment of the number of solutions for instances generated randomly. 
622: For random 3-SAT, it follows from the result on satisfying 
623: assignment pairs in [34] that 
624: asymptotically,  $E[N^{2}]$ is exponentially greater than $E^{2}[N]$. 
625: This conclusion can also be found in [4].
626: Thus, the expected number of solutions for forced satisfiable instances 
627: is exponentially larger than that for random satisfiable 
628: instances, which gives  
629: a good theoretical explanation of why, for random 3-SAT,
630: the strategy is highly biased towards generating instances with many solutions.
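
To make the identity $E_{f}[N]=E[N^{2}]/E[N]$ used above explicit, the following short derivation may help. It relies only on linearity of expectation and on the symmetry among assignments; the indicator notation $I_{i}$ is introduced here purely for illustration. Let $I_{i}$ be the indicator of the event that $t_{i}$ satisfies a random instance, so that $N=\sum_{i}I_{i}.$ Then
\begin{align*}
E[N] &  =\sum_{i=1}^{d^{n}}\Pr[I_{i}=1]=d^{n}\Pr[<t_{i},t_{i}>],\\
E[N^{2}] &  =\sum_{1\leq i,j\leq d^{n}}E[I_{i}I_{j}]=\sum_{1\leq i,j\leq d^{n}}\Pr[<t_{i},t_{j}>],
\end{align*}
where the first line uses the fact that every single assignment satisfies a random instance with the same probability. Dividing the second line by the first gives the ratio above. Equivalently, $E_{f}[N]/E[N]=E[N^{2}]/E^{2}[N],$ so $E_{f}[N]$ is asymptotically equal to $E[N]$ exactly when the ratio $E[N^{2}]/E^{2}[N]$ tends to 1.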
631: 
632: We now consider the problem of generating satisfiable instances for Model
633: RB/RD using the same strategy. Recall that when we established the exact
634: phase transitions for RB/RD [33], it was proved that $E[N^{2}]/E^{2}[N]$ is
635: asymptotically equal to 1 below the threshold, where almost all
636: instances are satisfiable, i.e. $E[N^{2}]/E^{2}[N]\approx 1$ for $r<r_{cr}$ 
637: or $p<p_{cr}$. So, we have that for
638: RB/RD, the expected number of solutions for forced satisfiable instances
639: below the threshold is asymptotically equal to that for random satisfiable 
640: instances, i.e. $E_{f}[N]=E[N^{2}]/E[N]\approx E[N]$. In other words, the strategy
641: has almost no effect on the number of solutions for RB/RD and thus 
642: will not lead to a biased sampling of instances with many solutions. 
643: 
644: In addition to the analysis above, we can also study the influence of the 
645: strategy on the distribution of solutions with respect to the 
646: forced satisfying assignment. 
647: %we know, from $E(N^{2})\approx E^{2}(N),$ that for randomly 
648: %generated instances of RB/RD, the distribution of the number of solutions is 
649: %quite uniform with concentration around $E(N).$  Note that the truth assignment 
650: %$t$ is generated randomly and the strategy will in fact generate all the possible 
651: %satisfiable instances with $t$ as the satisfying assignment. 
652: Based on the definition of {\it similarity number} in [33], we first define a 
653: distance on the assignments as $d^{f}(t_1,t_2)=1-S^f(\langle t_1,t_2\rangle)/n,$
654: where $t_1,t_2$ are two assignments, $n$ is the total number of variables and
655:  $S^f(\langle t_1,t_2\rangle)$ is  equal to the number of 
656: variables at which the two assignments take the identical values. It is easy
657: to see that $0\leq d^{f}(t_1,t_2)\leq 1.$ 
658: Let $E_{f}[X]$ and $E[X]$ respectively denote, for forced satisfiable instances 
659: and random satisfiable instances, the expected number of solutions with a 
660: fixed distance $d_{t}$ from the forced satisfying assignment.
661: By an analysis similar to that in [33] (pp.96-97), we have
662: \begin{align*}
663: E_{f}[X] &  =\binom{n}{nd_{t}}\left(  n^{\alpha}-1\right)  ^{nd_{t}}\frac
664: {\Pr[<t_{1},t_{2}>]}{\Pr[<t_{1},t_{1}>]}\text{ \ \ where }d^{f}(t_{1},t_{2}%
665: )=d_{t}\\
666: &  =\binom{n}{nd_{t}}\left(  n^{\alpha}-1\right)  ^{nd_{t}}\left[
667: \frac{\binom{n-nd_{t}}{k}}{\binom{n}{k}}+(1-p)\left(  1-\frac{\binom{n-nd_{t}%
668: }{k}}{\binom{n}{k}}\right)  \right]  ^{rn\ln n}\\
669: &  = \exp\left[  n\ln n\left(  r\ln\left(
670: 1-p+p(1-d_{t})^{k}\right)  +\alpha d_{t}\right)+O(n)  \right]  .
671: \end{align*}
672: Indeed, it can be shown, from the results in [33] (pp.97-98),  
673: that $E_{f}[X],$ for $r<r_{cr}$ or $p<p_{cr},$
674: will be asymptotically maximized when $d_{t}$ takes the largest possible
675: value, i.e. $d_{t}=1.$
676: For random satisfiable instances of RB/RD, we have
677: \begin{align*}
678: E[X]  & =\binom{n}{nd_{t}}\left(  n^{\alpha}-1\right)  ^{nd_{t}}\left(
679: 1-p\right)  ^{rn\ln n}\\
680: & =  \exp\left[  n\ln n\left(  r\ln(1-p)+\alpha
681: d_{t}\right)+O(n)  \right]  .
682: \end{align*}
683: It is straightforward to see that the same pattern holds
684: for this case, i.e. $E[X]$ will be asymptotically maximized when $d_{t}=1.$
685: So, intuitively speaking, for RB/RD, given an assignment $t,$ for both forced
686: satisfiable instances satisfying $t$ and random satisfiable instances, 
most solutions lie far from $t.$
This further indicates that the strategy has little effect on the distribution 
of solutions for RB/RD, and so it will not be biased towards generating 
690: instances with many solutions around the forced satisfying assignment. 
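

As a quick sanity check on the claim that the exponent of $E_{f}[X]$ is largest at $d_{t}=1,$ one can compare its values at the two endpoints. The shorthand $h(d_{t})$ is introduced here only for illustration, and we assume that the threshold of Theorem 1 is $r_{cr}=-\alpha/\ln(1-p);$ this is merely an endpoint comparison and not a replacement for the full analysis in [33]. Writing
\[
h(d_{t})=r\ln\left(  1-p+p(1-d_{t})^{k}\right)  +\alpha d_{t},
\]
we have
\[
h(0)=r\ln1=0,\qquad h(1)=r\ln(1-p)+\alpha=\alpha\left(  1-\frac{r}{r_{cr}}\right)  ,
\]
so $h(1)>h(0)$ whenever $r<r_{cr}.$ For $E[X],$ the exponent $r\ln(1-p)+\alpha d_{t}$ is linear and increasing in $d_{t},$ and it coincides with $h(1)$ at $d_{t}=1.$
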
691: For random 3-SAT, similarly, we have%
692: \begin{align*}
693: E_{f}[X]  & =\binom{n}{nd_{t}}\left[  \frac{\binom{n-nd_{t}}{3}}{\binom
694: {n}{3}}+\frac{6}{7}\left(  1-\frac{\binom{n-nd_{t}}{3}}{\binom{n}{3}}\right)
695: \right]  ^{rn}\\
696: & =f_{1}(n)\exp\left[  n\left(  -d_{t}\ln d_{t}-(1-d_{t})\ln(1-d_{t}%
697: )+r\ln\frac{6+(1-d_{t})^{3}}{7}\right)  \right]  ,
698: \end{align*}
699: 
700: \noindent and%
701: \begin{align*}
702: E[X]  & =\binom{n}{nd_{t}}\left(  \frac{7}{8}\right)  ^{rn}\\
703: & =f_{2}(n)\exp\left[  n\left(  -d_{t}\ln d_{t}-(1-d_{t})\ln(1-d_{t}%
704: )+r\ln\frac{7}{8}\right)  \right]  ,
705: \end{align*}
706: where $f_{1}(n)$ and $f_{2}(n)$ are two polynomial functions.
707: It follows from the results in [34] that 
708: as $r$ (the ratio of clauses to variables) approaches 4.25, 
709: $E_{f}[X]$ and $E[X]$ will be asymptotically maximized 
710: when $d_{t}\approx 0.24$ and $d_{t}=0.5$ respectively. This means, 
711: in contrast to RB/RD, that compared with random 
712: satisfiable instances, most solutions of forced 
satisfiable instances lie much closer to the 
714: forced satisfying assignment when $r$ is near the threshold.
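
The location of these two maxima can be checked by hand. The sketch below uses our own shorthand $\varphi_{f}$ and $\varphi$ for the two exponents, treats $d_{t}$ as a continuous variable, and reports rounded numerical values. With
\[
\varphi_{f}(d_{t})=-d_{t}\ln d_{t}-(1-d_{t})\ln(1-d_{t})+r\ln\frac{6+(1-d_{t})^{3}}{7},\qquad
\varphi(d_{t})=-d_{t}\ln d_{t}-(1-d_{t})\ln(1-d_{t})+r\ln\frac{7}{8},
\]
differentiation gives
\[
\varphi_{f}^{\prime}(d_{t})=\ln\frac{1-d_{t}}{d_{t}}-\frac{3r(1-d_{t})^{2}}{6+(1-d_{t})^{3}},\qquad
\varphi^{\prime}(d_{t})=\ln\frac{1-d_{t}}{d_{t}}.
\]
Hence $\varphi^{\prime}$ vanishes only at $d_{t}=0.5,$ for every $r.$ For $r=4.25,$ we get $\varphi_{f}^{\prime}(0.24)\approx1.153-1.144>0$ and $\varphi_{f}^{\prime}(0.25)\approx1.099-1.117<0,$ so $\varphi_{f}$ has a stationary point between $0.24$ and $0.25,$ in agreement with the value $d_{t}\approx0.24$ quoted above.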
715: 
716: Note that the number and the distribution of solutions are the two most
717: important factors determining the cost of solving satisfiable instances. 
718: So, we can expect, from the above analysis, that for RB/RD,
719: the hardness of solving forced satisfiable instances should be similar
720: to that of solving random satisfiable instances. 
721: More interestingly, it therefore seems that we can, based on the hardness
722: of RB/RD, propose  
723: a new method to generate hard satisfiable instances, i.e. generating
724: forced satisfiable instances of RB/RD with a large number of variables
725: near the threshold
726: identified exactly by Theorem 1 or Theorem 2.
727: Experimental results have further confirmed this idea\footnote{\small 
728: We thank Dr. Christophe Lecoutre and Liu Yang very much for performing the experiments.}. 
729: It is shown, in one experiment for RB with $k=2, n=30, d=15$ and $m=250,$ 
730: that the mean time of solving forced satisfiable instances near the
731: threshold is only slightly smaller (11 percent) than that of 
732: solving random satisfiable instances with the same 
733: parameters\footnote{\small As specified by the conditions of Theorem 2, 
734: to make exact phase transitions hold, the values of $\alpha$ and $r$ 
735: should not be small. So, we should choose dense CSPs with a large domain.}.  
736: More importantly, experiments for RB also indicate that the hardness of 
737: solving forced satisfiable 
738: instances grows exponentially with the number of variables\footnote
739: {\small According to the
740: definitions of RB/RD and Theorems 1 and 2, the parameters $\alpha,$ $r$ and $p$ 
741: should be fixed when $n$ increases. The values of the threshold points can also
742: be obtained from these two theorems.}
743: near the threshold, 
744: and we can, in fact, generate forced satisfiable instances appearing to be
745: very hard to solve (for both complete and incomplete algorithms) even when the 
746: number of variables is only moderately 
747: large (e.g. $k=2, n=59, \alpha=0.8$ and $r=0.8/\ln\frac{4}{3}$ with 
748: constraint tightness 
749: $p=p_{cr}=0.25$ computed by Theorem 2, or equivalently expressed
750: as $k=2, n=59, d=26$ and $m=669$ with the same tightness\footnote
751: {\small If non-integer values occur in the computation 
752: of $d$ and $m$ from $n,$ $\alpha$ and $r,$ then we round them to the 
753: nearest integers.})\footnote 
754: {\small Benchmarks of Model RB (in both SAT and CSP format) are available at
755: www.nlsde.buaa.edu.cn/\symbol{126}kexu/ benchmarks/benchmarks.htm.}.
756: %More interestingly, we have successfully generated some forced 
757: %satisfiable instances which appear to be very hard to solve, i.e. these instances can not be
758: %solved by state-of-the-art CSP algorithms in a reasonable time (e.g. 1 day).
759: Although there have been some other ways to generate hard satisfiable 
760: instances empirically, e.g. the quasigroup method [3], we think that 
761: the simple and natural method presented in this paper, 
762: based on models (i.e. Model RB/RD) with exact phase transitions and many 
763: hard instances, should be well worth further investigation.  
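
For readers who wish to reproduce such parameter settings, the conversion from $(n,\alpha,r)$ to $(d,m,p_{cr})$ in the example above can be checked as follows. We assume here the definitions $d=n^{\alpha}$ and $m=rn\ln n$ of Model RB, and take the threshold of Theorem 2 to be $p_{cr}=1-e^{-\alpha/r};$ numerical values are rounded.
\begin{align*}
d &  =59^{0.8}=\exp(0.8\ln59)\approx\exp(3.262)\approx26.1,\\
m &  =\frac{0.8}{\ln\frac{4}{3}}\cdot59\ln59\approx2.781\times59\times4.078\approx669,\\
p_{cr} &  =1-\exp\left(  -\frac{\alpha}{r}\right)  =1-\exp\left(  -\ln\frac{4}{3}\right)  =1-\frac{3}{4}=0.25.
\end{align*}
Rounding to the nearest integers gives $d=26$ and $m=669,$ as stated.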
764: 
765: \bigskip
766: \smallskip
767: \noindent {\large {\bf 5. Exponential Lower Bounds for Satisfiable Instances of Model RB/RD}} 
768: 
769: \noindent For random CSP instances of RB/RD, we know from Theorems 1 and 2 that almost
770: surely, they are satisfiable below the threshold and unsatisfiable above the
771: threshold. For satisfiable instances, there are no resolution proofs, or, if
772: any, the resolution proofs are of infinite length. Therefore, the exponential
773: resolution lower bounds, established in Theorem 4, are of interest only for
774: instances above the threshold. Also, in many other cases, exponential lower
775: bounds have been shown only for unsatisfiable instances, and it seems quite
difficult to derive such lower bounds for satisfiable instances. A recent advance
in this direction, made by Achlioptas et al.\ [5], is that 
778: exponential lower bounds have been established for certain natural
779: DPLL algorithms on some provably satisfiable instances of random $k$-SAT for $k\geq 4.$
780: In this section, we will analyze the complexity of solving RB/RD below the threshold
781: and obtain the following results.
782: 
783: {\bf Theorem 5} \ Given a random CSP instance of RB/RD with $r_{cr}-\epsilon
784: _{r}<r\leq r_{cr}$ or $p_{cr}-\epsilon_{p}<p\leq p_{cr}$, where $\epsilon_{r}%
785: =-\frac{\alpha}{\ln(1-p)}+\frac{\alpha(1-\frac{c}{24})}{\ln\left(  1-p\left(
1-\frac{c^{k}}{12^{k}}\right)  \right)  }$ and $\epsilon_{p}=1-\exp\left(  -\frac{\alpha}{r}\right)  -\left[
1-\exp\left(  -\frac{\alpha}{r}(1-\frac{c}{24})\right)  \right]  \frac{12^{k}%
}{12^{k}-c^{k}}$ are two positive constants, we uniformly select
789: without repetition $\frac{c}{12}n$ variables, and assign each of these
variables a value from its domain at random. If such values do not violate any constraint, 
791: then, almost surely, the residual
792: formula is unsatisfiable and has no tree-like resolution proofs of less than
793: exponential size.
794: 
795: {\bf Proof:} Let $E[X]$ denote the expected number of assignments satisfying the
796: residual formula. By assumption, the partial assignment to
797: the $\frac{c}{12}n$ variables does not violate any constraint. Then%
798: \[
799: E[X]= d^{n-\frac{c}{12}n}\left[  1-p\left(  1-\frac{c^{k}}{12^{k}}\right)
800: \right]  ^{rn\ln n}.
801: \]
802: 
803: \noindent For $r_{cr}-\epsilon_{r}<r\leq r_{cr},$ we have%
804: \begin{align*}
805: E[X] &  \leq n^{\alpha n(1-\frac{c}{12})}\left[  1-p\left(  1-\frac{c^{k}%
806: }{12^{k}}\right)  \right]  ^{(r_{cr}-\epsilon_{r})n\ln n}\\
&  =\exp\left[  \left(  \alpha\left(  1-\frac{c}{12}\right)  +(r_{cr}-\epsilon_{r})\ln\left(  1-p\left(  1-\frac{c^{k}%
}{12^{k}}\right)  \right)  \right)  n\ln n\right]  \\
809: &  =\exp\left(  -\frac{\alpha c}{24}n\ln n\right)  =o(1).
810: \end{align*}
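
\noindent The final value $\exp\left(  -\frac{\alpha c}{24}n\ln n\right)  $ can be verified directly from the first line of the display. Taking $r_{cr}=-\alpha/\ln(1-p)$ (which we assume to be the threshold of Theorem 1), the definition of $\epsilon_{r}$ gives
\[
r_{cr}-\epsilon_{r}=-\frac{\alpha(1-\frac{c}{24})}{\ln\left(  1-p\left(  1-\frac{c^{k}}{12^{k}}\right)  \right)  },
\]
so that
\[
n^{\alpha n(1-\frac{c}{12})}\left[  1-p\left(  1-\frac{c^{k}}{12^{k}}\right)  \right]  ^{(r_{cr}-\epsilon_{r})n\ln n}
=\exp\left[  \left(  \alpha\left(  1-\frac{c}{12}\right)  -\alpha\left(  1-\frac{c}{24}\right)  \right)  n\ln n\right]
=\exp\left(  -\frac{\alpha c}{24}n\ln n\right)  .
\]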
811: 
812: \noindent By Markov's inequality, we know that the residual formula will be almost
813: surely unsatisfiable. For the phase transition with respect to $p,$ the proof
814: can be done similarly. 
815: Now we prove that for the residual formula, any sub-problem of size at most
816: $cn$ is almost surely satisfiable. Based on the
817: proofs of Lemmas 2 and 3, we only need to show that for any sub-problem with
818: size $1\leq s\leq$ $cn$ containing unassigned variables, there almost surely
819: exists an unassigned variable with degree at most $3k\beta\ln n.$ Thus, it is
820: sufficient to prove that for any sub-problem with size $1+\frac{c}{12}n\leq
821: s\leq$ $cn$ $+\frac{c}{12}n$ containing the $\frac{c}{12}n$ assigned
822: variables, there almost surely exists an unassigned variable with degree at
823: most $3k\beta\ln n.$ For such a sub-problem, the probability that an
824: unassigned variable has a degree at least $3k\beta\ln n$ is not greater than%
825: \[
826: \binom{rn\ln n}{b}\binom{kb}{b}\left(  \frac{1}{n}\right)  ^{b}\left(
827: \frac{s}{n}\right)  ^{kb-b}\text{ \ where }b=3k\beta\ln n.
828: \]
829: 
830: 
\noindent Then, the probability that all the unassigned variables have degrees at least
832: $3k\beta\ln n$ is not greater than%
833: \[
834: \left[  \binom{rn\ln n}{b}\binom{kb}{b}\left(  \frac{1}{n}\right)  ^{b}\left(
835: \frac{s}{n}\right)  ^{kb-b}\right]  ^{s-\frac{c}{12}n}.
836: \]
837: 
838: 
839: \noindent There are $\binom{n-\frac{c}{12}n}{s-\frac{c}{12}n}$ possible choices for such
840: sub-problems$.$ So the expected number of such sub-problems with size
841: $1+\frac{c}{12}n\leq s\leq$ $cn$ $+\frac{c}{12}n$ is at most %
842: 
843: \begin{align*}
844: & \underset{s=1+\frac{c}{12}n}{\overset{cn+\frac{c}{12}n}{%
845: %TCIMACRO{\dsum }%
846: %BeginExpansion
847: {\displaystyle\sum}
848: %EndExpansion
849: }}\binom{n-\frac{c}{12}n}{s-\frac{c}{12}n}\left[  \binom{rn\ln n}{b}\binom
850: {kb}{b}\left(  \frac{1}{n}\right)  ^{b}\left(  \frac{s}{n}\right)
851: ^{kb-b}\right]  ^{s-\frac{c}{12}n}\text{ where }b=3k\beta\ln n\\
852: & \leq\underset{s=1+\frac{c}{12}n}{\overset{cn+\frac{c}{12}n}{%
853: %TCIMACRO{\dsum }%
854: %BeginExpansion
855: {\displaystyle\sum}
856: %EndExpansion
857: }}\left(  \frac{e(n-\frac{c}{12}n)}{s-\frac{c}{12}n}\right)  ^{s-\frac{c}%
858: {12}n}\left[  \left(  \frac{rn\ln n}{b}\right)  ^{b}\left(  \frac{ekb}%
859: {b}\right)  ^{b}\left(  \frac{1}{n}\right)  ^{b}\left(  \frac{s}{n}\right)
860: ^{kb-b}\right]  ^{s-\frac{c}{12}n}\\
861: & \leq\underset{s=1+\frac{c}{12}n}{\overset{cn+\frac{c}{12}n}{%
862: %TCIMACRO{\dsum }%
863: %BeginExpansion
864: {\displaystyle\sum}
865: %EndExpansion
866: }}\left[  en\left(  \frac{re}{3\beta}\right)  ^{3k\beta\ln n}\left(  \frac
867: {s}{n}\right)  ^{3k(k-1)\beta\ln n}\right]  ^{s-\frac{c}{12}n}.
868: \end{align*}
869: 
870: 
871: \noindent In the proof of Lemma 1, we define $e\left(  \frac{re}{\beta
872: }\right)  ^{\beta\ln n}<n^{c_{1}}$ and $c<\frac{1}{2}\exp\left(
873: -\frac{2+c_{1}}{(k-1)\beta}\right)  .$ Substituting them into the above
874: inequality, we get%
875: 
876: \begin{align*}
877: & \underset{s=1+\frac{c}{12}n}{\overset{cn+\frac{c}{12}n}{%
878: %TCIMACRO{\dsum }%
879: %BeginExpansion
880: {\displaystyle\sum}
881: %EndExpansion
882: }}\left[  en\left(  \frac{re}{3\beta}\right)  ^{3k\beta\ln n}\left(  \frac
883: {s}{n}\right)  ^{3k(k-1)\beta\ln n}\right]  ^{s-\frac{c}{12}n}\text{ where
884: }1+\frac{c}{12}n\leq s\leq cn+\frac{c}{12}n\text{\  }\\
885: & \leq\underset{s=1+\frac{c}{12}n}{\overset{cn+\frac{c}{12}n}{%
886: %TCIMACRO{\dsum }%
887: %BeginExpansion
888: {\displaystyle\sum}
889: %EndExpansion
890: }}\left[  en\frac{n^{3kc_{1}}}{e^{3k}}\frac{1}{3^{3k\beta\ln n}}%
891: n^{-3kc_{1}-6k}\right]  \\
892: & =\underset{s=1+\frac{c}{12}n}{\overset{cn+\frac{c}{12}n}{%
893: %TCIMACRO{\dsum }%
894: %BeginExpansion
895: {\displaystyle\sum}
896: %EndExpansion
897: }}O\left(  \frac{1}{n^{2}}\right)  =o(1),
898: \end{align*}
899: as required. Now for the residual formula, Lemmas 3 and 4 follow immediately.
900: Recall that in Lemma 5, we prove that there are at
901: least $\frac{c}{6}n$ variables in $P_{1}$ with degree at most $3k\beta\ln n.$
902: For the residual formula where $\frac{c}{12}n$ variables have been assigned
903: values, there are at least $\frac{c}{12}n$ variables in $P_{1}$ with degree at
904: most $3k\beta\ln n$. Similarly, we can prove that almost surely, there is a
905: clause with at least $\frac{c}{12}n$ literals for the residual formula. By
906: Theorem 4, we finish the proof. Note that the constant $c$ can be
907: chosen to monotonically decrease with $r$ or $p.$ Here we can, therefore,
908: take the value of $c$ as that for $r=r_{cr}$ or $p=p_{cr}$ and try to
909: make it as small as possible (in order to guarantee that $\epsilon_{r}$ and $\epsilon_{p}$
910: are two positive constants). \hfill $\Box $
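
The substitution step in the above proof can be unpacked as follows; this is only a sketch of the two estimates involved. From $e\left(  \frac{re}{\beta}\right)  ^{\beta\ln n}<n^{c_{1}}$ we get
\[
\left(  \frac{re}{3\beta}\right)  ^{3k\beta\ln n}=\frac{1}{3^{3k\beta\ln n}}\left[  \left(  \frac{re}{\beta}\right)  ^{\beta\ln n}\right]  ^{3k}<\frac{n^{3kc_{1}}}{e^{3k}}\frac{1}{3^{3k\beta\ln n}}.
\]
Moreover, $s\leq cn+\frac{c}{12}n=\frac{13}{12}cn$ together with $c<\frac{1}{2}\exp\left(  -\frac{2+c_{1}}{(k-1)\beta}\right)  $ gives $\frac{s}{n}<\exp\left(  -\frac{2+c_{1}}{(k-1)\beta}\right)  ,$ and hence
\[
\left(  \frac{s}{n}\right)  ^{3k(k-1)\beta\ln n}<\exp\left(  -3k(2+c_{1})\ln n\right)  =n^{-3kc_{1}-6k}.
\]
Since the resulting bracket is less than 1 for large $n$ and $s-\frac{c}{12}n\geq1,$ the outer exponent can be dropped. Each term is then $O(n^{1-6k}),$ which is in particular $O\left(  \frac{1}{n^{2}}\right)  ,$ and summing over the at most $cn$ values of $s$ still leaves a quantity that is $o(1).$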
911: 
912: Generally speaking, different search algorithms use different strategies to
913: search for solutions. Rather than focusing on some specific algorithms, we relate
914: the hardness of solving satisfiable instances to that of solving unsatisfiable
915: sub-problems, because if it takes a long time to solve the sub-problems
916: generated in the search process, then the original problem can not be solved
917: quickly [24]. Theorem 5 indicates that for satisfiable instances of RB/RD below
918: and close to the threshold, if a resolution-based algorithm can not detect any 
919: contradiction
920: in the early stage of a search branch, then the algorithm will, very likely, 
921: generate a large-sized unsatisfiable sub-problem. As a result, it will, then,
922: almost surely take exponential time to explore large subtrees to prove the
923: unsatisfiability of the sub-problem. 
924: Indeed, there are exponentially many large-sized unsatisfiable sub-problems. 
925: More precisely, it can be computed
926: that the total number of residual formulas with $\frac{c}{12}n$ assigned
927: variables and without violating any constraint is at least%
928: \begin{align*}
929: \binom{n}{\frac{c}{12}n}d^{\frac{c}{12}n}\left(  1-(\frac{c}{12})^{k}p\right)
930: ^{r_{cr}n\ln n}  & \geq\binom{n}{\frac{c}{12}n}\exp\left[  \frac{\alpha cn\ln
n}{12}\left(  1+\frac{p}{12\ln(1-p)}\right)  \right]  \\
932: & =\exp\left(  \Omega(n\ln n)\right)  .
933: \end{align*}
934: So, intuitively speaking, when solving
935: satisfiable instances of RB/RD near the threshold, backtrack-style algorithms
936: will very easily fall into pitfalls with no solutions, and then, worse still,
take a long time to escape from these pitfalls. To the best of our knowledge, this is
938: the first result on the complexity of solving satisfiable instances near the proved
939: threshold, which can help us to gain a better understanding of the extreme
940: hardness of instances in the phase transition region.
941: 
942: For random forced satisfiable instances near the proved threshold, similarly, we have
943: the following result.
944: 
945: {\bf Theorem 6} \ Given a random forced satisfiable instance of RB/RD with
946: $r_{cr}-\epsilon_{r}<r\leq r_{cr}$ or $p_{cr}-\epsilon_{p}<p\leq p_{cr}$, where $\epsilon_{r}%
947: =-\frac{\alpha}{\ln(1-p)}+\frac{\alpha(1-\frac{c}{24})}{\ln\left(  1-p\left(
1-\frac{c^{k}}{12^{k}}\right)  \right)  }$ and $\epsilon_{p}=1-\exp\left(  -\frac{\alpha}{r}\right)  -\left[
1-\exp\left(  -\frac{\alpha}{r}(1-\frac{c}{24})\right)  \right]  \frac{12^{k}%
}{12^{k}-c^{k}}$ are two positive constants, we
951: uniformly select without repetition $\frac{c}{12}n$ variables, and assign each
of these variables a value from its domain at random. If such values do not violate 
953: any constraint, then, almost surely, the
954: residual formula is unsatisfiable and has no tree-like resolution proofs of
955: less than exponential size.
956: 
957: {\bf Proof:} Due to limited space, we only give the proof for the case of the phase
958: transition with respect to $r$ in Model RD with $\frac{1}{k}<\alpha<1. $ The
959: other cases can be handled similarly. Assume that we have two assignments
960: $t_{1}$ and $t_{2}$ and the similarity number [33] between $t_{1}$ and $t_{2}$
961: is $S^{f}(<t_{1},t_{2}>)=S.$ Let $P$ be a random instance of Model RD. Based
962: on the analysis in [33] (p.96), the probability that both $t_{1}$ and $t_{2}$ satisfy
963: $P$ is%
964: \[
965: \Pr[t_{1}\text{ and }t_{2}\text{ satisfy }P]=\left[  (1-p)\frac{\binom{S}{k}%
966: }{\binom{n}{k}}+(1-p)^{2}\left(  1-\frac{\binom{S}{k}}{\binom{n}{k}}\right)
967: \right]  ^{rn\ln n}.
968: \]
969: 
970: \noindent Now we suppose that $t_{0}$ is a random forced satisfying assignment and $t$
971: is an assignment with $S^{f}(<t_{0},t>)=S.$ Let $P_{sat}$ be a random forced
972: satisfiable formula of Model RD with $t_{0}$ as the forced satisfying
973: assignment. Then the probability that $t$ satisfies $P_{sat}$ is%
974: \begin{align*}
975: \Pr[t\text{ satisfies }P_{sat}]  & =\frac{\Pr[t_{0}\text{ and }t\text{ satisfy
976: }P]}{\Pr[t_{0}\text{ satisfy }P]}\\
977: & =\left[  1-p+p\left(  \left(  \frac{S}{n}\right)  ^{k}+\frac{g\left(
978: \frac{S}{n}\right)  }{n}\right)  +O\left(  \frac{1}{n^{2}}\right)  \right]
979: ^{rn\ln n}.
980: \end{align*}
981: 
982: \noindent where $g(s)=\frac{k(k-1)}{2}(s^{k}-s^{k-1}).$ Now, for the random forced
983: satisfiable formula $P_{sat},$ we uniformly select without repetition
984: $\frac{c}{12}n$ variables and then assign each of these variables a value from
985: its domain at random. By the standard Chernoff bound, it is easy to show
986: that the similarity number between the forced satisfying assignment $t_{0}$
987: and the random partial assignment to the $\frac{c}{12}n$ variables is almost
surely less than $\frac{c}{6}n^{1-\alpha}.$ For the residual formula, there are
in total $d^{n-\frac{c}{12}n}$ possible assignments. Let $t^{\prime}$ be an
990: assignment to the $n-\frac{c}{12}n$ variables of the residual formula with
991: $S^{f}(<t_{0},t^{\prime}>)=S^{\prime}.$ By assumption, the partial assignment to
992: the $\frac{c}{12}n$ variables does not violate any constraint. 
993: Thus, almost surely, the probability
994: that $t^{\prime}$ satisfies the residual formula is at most%
995: \[
996: \left[  1-p\left(  1-\frac{c^{k}}{12^{k}}\right)  \left(  1-\left(  \frac
997: {c}{6n^{\alpha}}+\frac{S^{\prime}}{n}\right)  ^{k}O(1)-\frac{g\left(  \frac
998: {c}{6n^{\alpha}}+\frac{S^{\prime}}{n}\right)  }{n}O(1)\right)  \right]  ^{rn\ln
999: n}.
1000: \]
1001: 
1002: \noindent Let $E[X]$ be the expected number of assignments satisfying the residual
1003: formula. Similar to the asymptotic analysis in [33] (p.99), for
1004: $r_{cr}-\epsilon_{r}<r\leq r_{cr},$ we have%
1005: \begin{align*}
1006: E[X] &  \leq\overset{n-\frac{c}{12}n}{\underset{S^{\prime}=0}%
1007: {{\displaystyle\sum}}}\binom{n-\frac{c}{12}n}{S^{\prime}}\left(  n^{\alpha
1008: }-1\right)  ^{n-\frac{c}{12}n-S^{\prime}}\\
1009: &  \cdot\left[  1-p\left(  1-\frac{c^{k}}{12^{k}}\right)  \left(  1-\left(
1010: \frac{c}{6n^{\alpha}}+\frac{S^{\prime}}{n}\right)  ^{k}O(1)-\frac{g\left(
1011: \frac{c}{6n^{\alpha}}+\frac{S^{\prime}}{n}\right)  }{n}O(1)\right)  \right]
1012: ^{rn\ln n}\\
1013: &  \approx n^{\alpha n(1-\frac{c}{12})}\left[  1-p\left(  1-\frac{c^{k}%
1014: }{12^{k}}\right)  \right]  ^{rn\ln n}\underset{S^{\prime}=0}%
1015: {{\displaystyle\sum}}\binom{n-\frac{c}{12}n}{S^{\prime}}\left(  \frac
1016: {1}{n^{\alpha}}\right)  ^{S^{\prime}}\left(  1-\frac{1}{n^{\alpha}}\right)
1017: ^{n-S^{\prime}}\text{ \ for }\frac{1}{k}<\alpha<1\\
1018: &  \approx n^{\alpha n(1-\frac{c}{12})}\left[  1-p\left(  1-\frac{c^{k}%
1019: }{12^{k}}\right)  \right]  ^{rn\ln n}.
1020: \end{align*}
1021: 
1022: \noindent Note that the forced satisfying assignment has no effect on the
1023: structure of constraint graphs.
1024: The rest of the proof is identical to that in Theorem 5 and so we are done. \hfill $\Box $
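
For completeness, the passage from the pair probability to $\Pr[t$ satisfies $P_{sat}]$ in the proof above can be written out; the shorthand $\sigma$ is ours, and the expansion below is meant for a fixed ratio $s=S/n>0.$ With $\sigma=\binom{S}{k}/\binom{n}{k}$ and $\Pr[t_{0}$ satisfies $P]=(1-p)^{rn\ln n},$ the per-constraint ratio is
\[
\frac{(1-p)\sigma+(1-p)^{2}(1-\sigma)}{1-p}=\sigma+(1-p)(1-\sigma)=1-p+p\sigma,
\]
and expanding the ratio of binomial coefficients gives
\[
\sigma=\prod_{i=0}^{k-1}\frac{S-i}{n-i}=s^{k}+\frac{k(k-1)}{2}\cdot\frac{s^{k}-s^{k-1}}{n}+O\left(  \frac{1}{n^{2}}\right)  ,
\]
which is the source of the term $g\left(  \frac{S}{n}\right)  /n$ with $g(s)=\frac{k(k-1)}{2}(s^{k}-s^{k-1}).$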
1025: 
The above theorem, as far as we know, is the first complexity result for
resolution-based algorithms on forced satisfiable instances, which further 
provides, from another aspect,
1029: strong theoretical support for the method of generating hard satisfiable
1030: instances proposed in the last section.
1031: 
1032: 
1033: \bigskip
1034: 
1035: \noindent {\large {\bf 6. Conclusions}}
1036: 
1037: \smallskip
1038: 
1039: \noindent In this paper, by encoding CSPs into CNF formulas, we proved
1040: exponential lower bounds for tree-like resolution proofs of two random CSP
1041: models with exact phase transitions, i.e. Model RB/RD. This result suggests
1042: that we not only introduce new families of CNF formulas hard for resolution,
1043: which is a central task of Proof-Complexity theory, but also propose models 
1044: with both many hard instances and exact phase transitions. More interestingly, 
1045: it is shown both theoretically and experimentally that an application of RB/RD 
1046: might be in the generation of hard satisfiable instances, which is further
supported by the exponential lower bounds established in Section 5.
1048: 
1049: As mentioned before, there are some other NP-complete problems with proved
1050: exact phase transitions, e.g. Hamiltonian cycle problem and random 2+$p$-SAT
1051: ($0<p\leq 0.4$). However, it has been shown either experimentally or
1052: theoretically that the instances produced by these problems are generally
easy to solve. So one would naturally ask what the main difference between
these ``easy'' NP-complete problems and RB/RD is. It seems that these ``easy''
NP-complete problems with exact phase transitions usually have some 
1056: kind of local property which can be used to design polynomial time algorithms 
1057: working with high probability, and the exact phase transitions are, in fact, 
1058: obtained by probabilistic analysis of such algorithms.
1059: So, it appears that if a problem has exact phase transitions obtained
1060: by algorithm analysis, then it also means that the problem is  
1061: not hard to solve. For RB/RD, the situation is, however, completely different. 
1062: More specifically, the exact phase transitions of RB/RD are 
1063: obtained, not by analysis of algorithms, but by use of the 
1064: first and the second moment methods which say nothing about the local 
1065: property of the problem and are, therefore, unlikely to be useful for designing
1066: more efficient algorithms. 
Thus, it seems that RB/RD, unlike the ``easy'' NP-complete problems, 
1068: can indeed provide a reliable source 
1069: to generate random benchmark instances, as many and as hard as we need. 
1070: 
1071: 
1072: Note that more recently, Frieze and Wormald [15] studied random $k$-SAT for moderately
1073: growing $k,$ i.e. $k=k(n)$ satisfies $k-\log _{2}n\rightarrow \infty$ 
1074: where $n$ is the number of variables. 
1075: For this model, they established similarly, by use of the first and the second 
1076: moment methods, that there exists a satisfiability threshold at
1077: which the number of clauses is $m=2^{k}n\ln 2$. 
1078: %They proved that for this model, a random instance is satisfiable (unsatisfiable)
1079: %with probability tending to 1 as the number of variables $n\rightarrow\infty$ 
1080: %if the number of clauses $m\leq (1-\epsilon)m_0$ 
1081: %($m\geq (1-\epsilon)m_0$) where $m_0=2^{k}n\ln2$ and $\epsilon=\epsilon(n)>0$
1082: %satisfying $\epsilon n\rightarrow\infty.$
From Beame et al.'s earlier work on the complexity of unsatisfiability 
1084: proofs for random $k$-SAT formulas [6, 7], we know that the 
1085: size of resolution refutations for this
1086: model is exponential with high probability. So, the variant of
1087: random $k$-SAT studied by Frieze and Wormald is also a model with both proved
1088: phase transitions and many hard instances. 
1089: %But unlike the phase transitions
1090: %of random $k$-SAT with fixed $k$ such as random 3-SAT, the critical value of 
1091: %the ratio of clauses to variables for this variant model is not a
1092: %constant but grows with the number of variables. 
1093: 
1094: To gain a better understanding of Model RB/RD, we now 
1095: make a comparison of them with the well-studied
1096: %Now, we can also make a comparison between Model RB/RD and the well-studied 
1097: random 3-SAT of similar proof complexity. 
First, we think that the exact phase transitions are one advantage 
of RB/RD, since they 
help us to locate the hardest instances more 
precisely and conveniently when carrying out 
large-scale computational experiments. On the theoretical side, RB/RD appear
to be intrinsically much easier to analyze mathematically
than random 3-SAT, for example in the derivation of thresholds. 
In our view, 
this mathematical tractability is another advantage of RB/RD, making 
it possible to obtain some interesting results which do not hold or cannot 
be easily obtained for random 3-SAT, as shown for forced satisfiable
instances.
1111: 
1112: In summary, the Hamiltonian cycle problem, random 3-SAT and Model RB/RD,
1113: respectively, exhibit three different kinds of phase transition behavior in
1114: NP-complete problems. Compared with the former two that have been
1115: extensively explored in the past decade, the third one (i.e. the phase
1116: transition behavior with both exact thresholds and many hard instances),
1117: due to various reasons, has not received much attention so far. 
From this point of view, the main contribution of this paper lies not 
in the mathematical techniques used, nor in the concrete models studied 
(although such models are useful for CSP research in their own right), but  
in pointing out an interesting behavior for study.  
1122: Finally, we hope that more investigations, either experimental or
1123: theoretical, will be carried out on this behavior, and we also believe that
1124: such studies will lead to deep insights and new discoveries in this active
1125: area of research (i.e. on phase transitions and computational complexity). 
1126: 
1127: \bigskip
1128: 
1129: \noindent {\large {\bf References}}
1130: {\small
1131: %\smallskip
1132: 
1133: \begin{enumerate}
1134: \item D. Achlioptas, L. Kirousis, E. Kranakis and D. Krizanc, Rigorous
1135: results for random (2+$p$)-SAT, In: {\it Proceedings of RALCOM-97}, pp.1-10.
1136: 
\item D. Achlioptas, L.M. Kirousis, E. Kranakis, D. Krizanc, M.S.O. Molloy, and Y.C. Stamatiou, 
1138: Random Constraint
1139: Satisfaction: A More Accurate Picture, In: {\it Proc. Third International Conference on Principles and
1140: Practice of Constraint Programming} (CP 97), LNCS 1330, pp.107-120, 1997. 
1141: 
1142: \item D. Achlioptas, C. Gomes, H. Kautz, and B. Selman, Generating Satisfiable 
Problem Instances, In: {\it Proceedings of AAAI-00}, pp.256-261.
1144: 
1145: \item D. Achlioptas and C. Moore. The Asymptotic Order of the Random $k$-SAT Threshold.
1146: In {\it Proc. FOCS 2002}, pp.779-788.
1147: 
1148: \item D. Achlioptas, P. Beame and M. Molloy. Exponential Bounds for DPLL below the 
1149: Satisfiability Threshold. In: {\it Proc. SODA 2004}, to appear.
1150: 
1151: \item P. Beame, R. Karp, T. Pitassi, and M. Saks. On the complexity of 
unsatisfiability proofs for random $k$-CNF formulas. In: {\it Proceedings of STOC-98}, pp.561-571.
1153: 
1154: \item P. Beame, R. Karp, T. Pitassi, and M. Saks. The efficiency of resolution 
1155: and Davis-Putnam procedures. {\it SIAM Journal on Computing}, 31(4):1048-1075, 2002.
1156: 
1157: \item E. Ben-Sasson and A. Wigderson. Short proofs are narrow - resolution
1158: made simple. {\it Journal of the ACM}, 48(2):149-169, 2001.
1159: 
1160: \item B. Bollob\'{a}s, T.I. Fenner and A.M. Frieze. An algorithm for finding
1161: Hamilton paths and cycles in random graphs. {\it Combinatorica}
1162: 7(4):327-341, 1987.
1163: 
1164: \item V. Chv\'{a}tal and E. Szemer\'{e}di. Many hard examples for
resolution. {\it Journal of the ACM}, 35(4) (1988) 759-768.
1166: 
\item V. Chv\'{a}tal and B. Reed. Mick gets some (the odds are on his side).
1168: In: {\it Proceedings of the 33rd IEEE Symp. on Foundations of Computer
1169: Science}, pages 620-627, 1992.
1170: 
1171: \item S. Cook and D. Mitchell. Finding Hard Instances of the Satisfiability
1172: Problem: A Survey, In: {\it Satisfiability Problem: Theory and Applications}%
1173: . Du, Gu and Pardalos (Eds). DIMACS Series in Discrete Mathematics and
1174: Theoretical Computer Science, Volume 35, 1997.
1175: 
1176: \item O. Dubois and J. Mandler. The 3-XORSAT threshold. In: {\it Proc. FOCS 2002}.
1177: 
1178: \item E. Friedgut, Sharp thresholds of graph properties, and the k-sat
1179: problem. With an appendix by Jean Bourgain. {\it Journal of the American
1180: Mathematical Society} 12 (1999) 1017-1054.
1181: 
1182: \item  A.M. Frieze and N.C. Wormald. Random $k$-SAT: A tight threshold for moderately 
1183: growing $k,$ In: {\it Proceedings of the Fifth International Symposium on Theory 
1184: and Applications of Satisfiability Testing}, pp.1-6, 2002.
1185: 
1186: \item A. Flaxman. A sharp threshold for a random constraint satisfaction problem, preprint.
1187: 
1188: \item A. Frieze and M. Molloy. The satisfiability threshold for randomly generated 
1189: binary constraint satisfaction problems. In: {\it Proceedings of RANDOM-03}, 2003.
1190: 
1191: \item Y. Gao and J. Culberson. Resolution Complexity of Random Constraint Satisfaction 
1192: Problems: Another Half of the Story. In: {\it Proc. of LICS-03, Workshop on Typical Case 
1193: Complexity and Phase Transitions}, Ottawa, Canada, June, 2003.
1194: 
1195: \item I.P. Gent, E. MacIntyre, P. Prosser, B.M. Smith and T. Walsh, Random Constraint 
1196: Satisfaction: flaws and structures. {\it Journal of Constraints} 6(4), 345-372, 2001.
1197: 
1198: \item A. Goerdt. A threshold for unsatisfiability. In: {\it 17th
1199: International Symposium of Mathematical Foundations of Computer Science},
1200: Springer LNCS 629 (1992), pp.264-275.
1201: 
1202: \item R. Impagliazzo, L. Levin, and M. Luby. Pseudo-random number generation from 
1203: one-way functions. In: {\it Proceedings of STOC-89}, pp.12-24. 
1204: 
1205: \item M. Koml\'{o}s and E. Szemer\'{e}di. Limit distribution for the
1206: existence of a Hamilton cycle in a random graph. {\it Discrete Mathematics},
1207: 43, pp.55-63, 1983.
1208: 
1209: \item M. Luby. Pseudorandomness and Cryptographic Applications. Princeton 
1210: University Press, 1996. 
1211: 
1212: \item D. Mitchell: Hard Problems for CSP Algorithms. In: {\it Proceedings of 15th
1213: National Conf. on Artificial Intelligence} (AAAI-98), pp.398-405, 1998.
1214: 
1215: \item D. Mitchell, B. Selman, and H. Levesque. Hard and easy distributions
1216: of sat problems. In: {\it Proceedings of 10th National Conf. on Artificial
1217: Intelligence} (AAAI-92), pp.459-465, 1992.
1218: 
1219: \item D. Mitchell. Resolution Complexity of Random Constraints, In: {\it %
1220: Proceedings of CP 2002}, LNCS 2470, pp.295-309. 
1221: 
1222: \item M. Molloy. Models for Random Constraint Satisfaction Problems,
1223: submitted. Conference version in {\it Proceedings of STOC 2002}.
1224: 
1225: \item M. Molloy and M. Salavatipour. The resolution complexity of random 
1226: constraint satisfaction problems. In: {\it Proc. FOCS-03}, 2003.
1227: 
1228: \item R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman and L. Troyansky.
1229: Determining computational complexity from characteristic phase transitions.
1230: {\it Nature}, 400(8):133-137, 1999.
1231: 
1232: \item R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman and L. Troyansky,
1233: Phase transition and search Cost in the 2+$p$-SAT problem, In: {\it 4th
1234: Workshop on Physics and Computation}, Boston University 22-24 November 1996,
1235: (PhysComp96).
1236: 
1237: \item B.M. Smith. Constructing an Asymptotic Phase Transition in Random Binary 
1238: Constraint Satisfaction Problems. {\it Theoretical Computer Science}, vol. 265, 
1239: pp. 265-283 (Special Issue on NP-Hardness and Phase Transitions), 2001.
1240: 
1241: \item B. Vandegriend and J. Culberson. The $G_{n,m}$ phase transition is not
1242: hard for the Hamiltonian Cycle problem. {\it Journal of Artificial
1243: Intelligence Research}, 9:219-245, 1998.
1244: 
1245: \item K. Xu and W. Li. Exact Phase Transitions in Random Constraint
1246: Satisfaction Problems. {\it Journal of Artificial Intelligence Research},
1247: 12:93-103, 2000. 
1248: 
1249: \item K. Xu. A Study on the Phase Transitions of SAT and CSP (in Chinese). 
1250: Ph.D. Thesis, Beihang University, 2000.
1251: 
1252: \item K. Xu and W. Li. On the Average Similarity Degree between Solutions of Random 
1253: $k$-SAT and Random CSPs. {\it Discrete Applied Mathematics}, to appear. 
1254: 
1255: \medskip
1256: \end{enumerate}
1257: }
1258: 
1259: \noindent {\large {\bf Appendix}}
1260: 
1261: \smallskip
1262: 
Now we consider the proof of Lemma 2 for Model RB. Consider a variable $u$ and
an $i$-constraint assignment tuple $T_{i,u}.$ It is easy to see that the
probability that $u$ is flawed by $T_{i,u}$ increases with the number of
constraints $i.$ Thus we have
1267: 
1268: \[
1269: \Pr (T_{i,u}\text{ is flawed})|_{i\leq 3k\beta \ln n}\leq \Pr (T_{i,u}\text{
1270: is flawed})|_{i=3k\beta \ln n}.
1271: \]
1272: 
1273: \noindent For the variable $u,$ there are $d=n^{\alpha }$ values in its
1274: domain, denoted by $v_{1},v_{2},\cdots ,v_{d}.$ Let $\Pr (A_{j})$ denote the
1275: probability that $v_{j}$ is not flawed by $T_{i,u}.$ Thus the probability
1276: that at least one value is not flawed by $T_{i,u},$ i.e. the probability
1277: that the variable $u$ is not flawed by $T_{i,u}$ is
1278: \begin{eqnarray*}
\Pr (A_{1}\cup A_{2}\cup \cdots \cup A_{d}) &=&\underset{1\leq p\leq d}{\sum
}\Pr (A_{p})-\underset{1\leq p<q\leq d}{\sum }\Pr (A_{p}A_{q}) \\
&&+\cdots +(-1)^{d-1}\Pr (A_{1}A_{2}\cdots A_{d}).
1282: \end{eqnarray*}
1283: 
1284: \noindent Then
1285: \begin{eqnarray*}
1286: \Pr (T_{i,u}\text{ is flawed}) &=&1-\Pr (A_{1}\cup A_{2}\cup \cdots \cup
1287: A_{d}) \\
1288: &=&1+\underset{j=1}{\overset{d}{\sum }}(-1)^{j}\binom{d}{j}\Pr
1289: (A_{1}A_{2}\cdots A_{j}).
1290: \end{eqnarray*}
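\noindent The second equality relies on the symmetry among the values. As a
sketch, assuming (as in Model RB) that the incompatible tuples are selected
uniformly, the probability $\Pr (A_{p_{1}}A_{p_{2}}\cdots A_{p_{j}})$ depends
only on $j$ and not on the particular indices, and there are $\binom{d}{j}$
ways to choose the indices, so that
\[
\underset{1\leq p_{1}<\cdots <p_{j}\leq d}{\sum }\Pr (A_{p_{1}}A_{p_{2}}\cdots
A_{p_{j}})=\binom{d}{j}\Pr (A_{1}A_{2}\cdots A_{j}).
\]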
1291: 
1292: \noindent Recall that in Model RB, for each constraint, we uniformly select
1293: without repetition $pd^{k}$ incompatible tuples of values and each
1294: constraint is generated independently. So we have
1295: \begin{eqnarray*}
1296: \Pr (A_{1}A_{2}\cdots A_{j}) &=&\left[ \frac{\binom{d^{k}-j}{pd^{k}}}{\binom{%
1297: d^{k}}{pd^{k}}}\right] ^{i} \\
1298: &=&\left[ \frac{(d^{k}-pd^{k})(d^{k}-pd^{k}-1)\cdots (d^{k}-pd^{k}-j+1)}{%
1299: d^{k}(d^{k}-1)\cdots (d^{k}-j+1)}\right] ^{i}.
1300: \end{eqnarray*}
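\noindent As a sanity check of this formula (a worked special case, not needed
for the proof), take $j=1.$ A single constraint leaves the tuple corresponding
to $v_{1}$ compatible exactly when that tuple is not among the $pd^{k}$
selected ones, which happens with probability $(d^{k}-pd^{k})/d^{k}=1-p.$ By
the independence of the $i$ constraints,
\[
\Pr (A_{1})=(1-p)^{i},
\]
in agreement with the general expression at $j=1.$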
1301: 
1302: \noindent Note that $j\leq d=n^{\alpha }$ and $k\geq 2.$ Now consider the
1303: case of $i=3k\beta \ln n,$ where $\beta =\frac{\alpha }{6k\ln \frac{1}{1-p}}%
1304: . $ By asymptotic analysis, we have
1305: \begin{eqnarray*}
1306: &&\Pr (A_{1}A_{2}\cdots A_{j})|_{i=3k\beta \ln n} \\
1307: &=&[(1-p)(\frac{1-p-\frac{1}{n^{k\alpha }}}{1-\frac{1}{n^{k\alpha }}})(\frac{%
1308: 1-p-\frac{2}{n^{k\alpha }}}{1-\frac{2}{n^{k\alpha }}})\cdots (\frac{1-p-%
1309: \frac{j-1}{n^{k\alpha }}}{1-\frac{j-1}{n^{k\alpha }}})]^{3k\beta \ln n} \\
1310: &=&[(1-p)^{3k\beta \ln n}]^{j}[1-\frac{p}{1-p}\frac{(j-1)j}{2n^{k\alpha }}+O(%
1311: \frac{j^{4}}{n^{2k\alpha }})]^{3k\beta \ln n} \\
1312: &=&(n^{-\frac{\alpha }{2}})^{j}[1-\frac{p}{1-p}\frac{(j-1)j}{2n^{k\alpha }}%
1313: +O(\frac{j^{4}}{n^{2k\alpha }})]^{3k\beta \ln n}.
1314: \end{eqnarray*}
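\noindent The last two steps can be unpacked as follows. This is only a
sketch, writing $N=n^{k\alpha }$ and $c=\frac{p}{1-p}$ for brevity (our
shorthand) and assuming $j^{2}=o(N)$ so that the expansion is valid. For $%
0\leq t\leq j-1,$
\[
\frac{1-p-\frac{t}{N}}{1-\frac{t}{N}}=(1-p)\left[ 1-c\frac{t}{N}+O(\frac{t^{2}%
}{N^{2}})\right] ,
\]
and multiplying over $t=0,1,\cdots ,j-1,$ with $\sum_{t=0}^{j-1}t=\frac{(j-1)j%
}{2},$ gives the factor $(1-p)^{j}[1-c\frac{(j-1)j}{2N}+O(\frac{j^{4}}{N^{2}}%
)].$ Moreover, by the choice of $\beta ,$
\[
(1-p)^{3k\beta \ln n}=e^{-3k\beta \ln \frac{1}{1-p}\ln n}=e^{-\frac{\alpha }{2}%
\ln n}=n^{-\frac{\alpha }{2}}.
\]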
1315: 
1316: \noindent Let $H(j)=[1-\frac{p}{1-p}\frac{(j-1)j}{2n^{k\alpha }}+O(\frac{%
1317: j^{4}}{n^{2k\alpha }})]^{3k\beta \ln n}.$ Then we get
1318: \begin{eqnarray*}
1319: \Pr (T_{i,u}\text{ is flawed})|_{i=3k\beta \ln n} &=&1+\underset{j=1}{%
1320: \overset{n^{\alpha }}{\sum }}(-1)^{j}\binom{n^{\alpha }}{j}\Pr
1321: (A_{1}A_{2}\cdots A_{j})|_{i=3k\beta \ln n} \\
1322: &=&1+\underset{j=1}{\overset{n^{\alpha }}{\sum }}(-1)^{j}\binom{n^{\alpha }}{%
1323: j}(n^{-\frac{\alpha }{2}})^{j}H(j).
1324: \end{eqnarray*}
1325: 
1326: \noindent For $0\leq j\leq n^{\frac{4}{5}\alpha },$ we can easily show that $%
1327: H(j)=1+o(1).$ Therefore,
1328: \begin{eqnarray*}
1329: &&\Pr (T_{i,u}\text{ is flawed})|_{i=3k\beta \ln n} \\
1330: &\approx &1+\underset{j=1}{\overset{n^{\alpha }}{\sum }}(-1)^{j}\binom{%
1331: n^{\alpha }}{j}(n^{-\frac{\alpha }{2}})^{j}+\underset{j=n^{\frac{4}{5}\alpha
1332: }}{\overset{n^{\alpha }}{\sum }}(-1)^{j}\binom{n^{\alpha }}{j}(n^{-\frac{%
1333: \alpha }{2}})^{j}(H(j)-1) \\
1334: &=&(1-\frac{1}{n^{\frac{\alpha }{2}}})^{n^{\alpha }}+\underset{j=n^{\frac{4}{%
1335: 5}\alpha }}{\overset{n^{\alpha }}{\sum }}(-1)^{j}\binom{n^{\alpha }}{j}(n^{-%
1336: \frac{\alpha }{2}})^{j}(H(j)-1) \\
1337: &\approx &e^{-n^{\frac{\alpha }{2}}}+\underset{j=n^{\frac{4}{5}\alpha }}{%
1338: \overset{n^{\alpha }}{\sum }}(-1)^{j}\binom{n^{\alpha }}{j}(n^{-\frac{\alpha
1339: }{2}})^{j}(H(j)-1).
1340: \end{eqnarray*}
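\noindent Two of the steps above may deserve a brief justification, sketched
here. First, for $j\leq n^{\frac{4}{5}\alpha }$ and $k\geq 2$ we have $\frac{%
j^{2}}{n^{k\alpha }}\leq n^{(\frac{8}{5}-k)\alpha }\leq n^{-\frac{2}{5}\alpha
},$ so
\[
H(j)=[1-O(n^{-\frac{2}{5}\alpha })]^{3k\beta \ln n}=1-O(n^{-\frac{2}{5}\alpha
}\ln n)=1+o(1).
\]
Second, the closed form $(1-\frac{1}{n^{\frac{\alpha }{2}}})^{n^{\alpha }}$
comes from the binomial theorem with $x=n^{-\frac{\alpha }{2}}$:
\[
1+\underset{j=1}{\overset{n^{\alpha }}{\sum }}(-1)^{j}\binom{n^{\alpha }}{j}%
x^{j}=(1-x)^{n^{\alpha }},\qquad (1-n^{-\frac{\alpha }{2}})^{n^{\alpha
}}=e^{n^{\alpha }\ln (1-n^{-\frac{\alpha }{2}})}=e^{-n^{\frac{\alpha }{2}}-%
\frac{1}{2}+o(1)}.
\]
Hence the final approximation holds up to a constant factor, which does not
affect the order of the bound.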
1341: 
1342: \noindent It is easy to verify that
1343: 
1344: \[
1345: \binom{n^{\alpha }}{j}(n^{-\frac{\alpha }{2}})^{j}\leq (\frac{en^{\alpha }}{j%
1346: })^{j}(n^{-\frac{\alpha }{2}})^{j}=e^{j-j\ln j+\frac{\alpha }{2}j\ln n}.
1347: \]
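\noindent A short justification of this inequality, for completeness: since $%
\binom{N}{j}\leq \frac{N^{j}}{j!}$ and $j!\geq (\frac{j}{e})^{j}$ (which
follows from $e^{j}=\sum_{m\geq 0}\frac{j^{m}}{m!}\geq \frac{j^{j}}{j!}$), we
have, with $N=n^{\alpha },$
\[
\binom{n^{\alpha }}{j}\leq (\frac{en^{\alpha }}{j})^{j},\qquad (\frac{%
en^{\alpha }}{j})^{j}(n^{-\frac{\alpha }{2}})^{j}=(\frac{en^{\frac{\alpha }{2}}%
}{j})^{j}=e^{j-j\ln j+\frac{\alpha }{2}j\ln n}.
\]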
1348: 
1349: \noindent Let $B(j)=j-j\ln j+\frac{\alpha }{2}j\ln n.$ Differentiating $B(j)$
1350: with respect to $j,$ we obtain
1351: 
1352: \[
1353: B^{\prime }(j)=\frac{\alpha }{2}\ln n-\ln j<0\text{ when }j\geq n^{\frac{4}{5%
1354: }\alpha }.
1355: \]
1356: 
1357: \noindent So for $n^{\frac{4}{5}\alpha }\leq j\leq n^{\alpha },$ we have
1358: 
1359: \[
1360: \binom{n^{\alpha }}{j}(n^{-\frac{\alpha }{2}})^{j}\leq e^{B(n^{\frac{4}{5}%
1361: \alpha })}=(\frac{e}{n^{\frac{3}{10}\alpha }})^{n^{\frac{4}{5}\alpha
1362: }}=o(e^{-n^{\frac{4}{5}\alpha }}).
1363: \]
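\noindent The value at the left endpoint can be computed directly. As a check,
with $j_{0}=n^{\frac{4}{5}\alpha }$ (our notation),
\[
B(j_{0})=j_{0}(1-\frac{4}{5}\alpha \ln n+\frac{\alpha }{2}\ln n)=n^{\frac{4}{5}%
\alpha }(1-\frac{3}{10}\alpha \ln n),
\]
and $e^{B(j_{0})}/e^{-j_{0}}=(e^{2}n^{-\frac{3}{10}\alpha })^{n^{\frac{4}{5}%
\alpha }}\rightarrow 0$ as $n\rightarrow \infty ,$ which gives the $o(e^{-n^{%
\frac{4}{5}\alpha }})$ bound.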
1364: 
1365: \noindent Note that $H(j)=O(n^{c_{2}})$ for $n^{\frac{4}{5}\alpha }\leq
1366: j\leq n^{\alpha },$ where $c_{2}>0$ is a constant. Hence,
1367: \begin{eqnarray*}
1368: |\underset{j=n^{\frac{4}{5}\alpha }}{\overset{n^{\alpha }}{\sum }}(-1)^{j}%
1369: \binom{n^{\alpha }}{j}(n^{-\frac{\alpha }{2}})^{j}(H(j)-1)| &\leq &\underset{%
1370: j=n^{\frac{4}{5}\alpha }}{\overset{n^{\alpha }}{\sum }}\binom{n^{\alpha }}{j}%
1371: (n^{-\frac{\alpha }{2}})^{j}|H(j)-1| \\
1372: &=&O(n^{\alpha })O(n^{c_{2}})o(e^{-n^{\frac{4}{5}\alpha }})=o(e^{-n^{\frac{%
1373: \alpha }{2}}}).
1374: \end{eqnarray*}
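\noindent For the last equality, one way to see it is to compare the two
sides directly:
\[
\frac{n^{\alpha +c_{2}}e^{-n^{\frac{4}{5}\alpha }}}{e^{-n^{\frac{\alpha }{2}}}}%
=\exp \left( (\alpha +c_{2})\ln n-n^{\frac{4}{5}\alpha }+n^{\frac{\alpha }{2}%
}\right) \rightarrow 0\text{ as }n\rightarrow \infty ,
\]
since $n^{\frac{4}{5}\alpha }$ dominates both $n^{\frac{\alpha }{2}}$ and $\ln
n.$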
1375: 
1376: \noindent Thus we get
1377: 
1378: \[
1379: \Pr (T_{i,u}\text{ is flawed})|_{i\leq 3k\beta \ln n}\leq \Pr (T_{i,u}\text{
1380: is flawed})|_{i=3k\beta \ln n}\approx e^{-n^{\frac{\alpha }{2}}}.
1381: \]
1382: 
1383: \noindent The remaining part of the proof is identical to that of Lemma 2
1384: for Model RD, and so we are done.
1385: 
1386: \end{document}
1387: