\documentclass[letterpaper, 11pt]{article}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%TCIDATA{OutputFilter=LATEX.DLL}
%TCIDATA{LastRevised=Thursday, October 24, 2002 16:50:28}
%TCIDATA{<META NAME="GraphicsSave" CONTENT="32">}

%\usepackage{times}
\setlength{\headsep}{0.cm}
\renewcommand{\baselinestretch}{1.0}
\renewcommand{\arraystretch}{1}
\setlength{\oddsidemargin}{0.1cm}
\setlength{\evensidemargin}{0.1cm}
\setlength{\topmargin}{0.cm}
\setlength{\parskip}{0.05cm}
\textheight=22.9cm
\textwidth=16.5cm

\usepackage{amsmath}
\usepackage{amsfonts}
%\usepackage{times}


\begin{document}

%\begin{center}
%{\Large {\bf Many Hard Examples in Exact Phase Transitions\\[0.3cm]
%{\normalsize with Application to Generating Hard Satisfiable Instances\footnote{%
%{\small This research was partially supported by the National Key Basic Research
%Program (973 Program) of China under Grant No. G1999032701.}}}}}\\[0.5cm]
%{\large Ke Xu and Wei Li}

\begin{center}
{\Large {\bf Many Hard Examples in Exact Phase Transitions with\\[0.3cm]
Application to Generating Hard Satisfiable Instances\footnote{%
{\small This research was partially supported by the National Key Basic Research
Program (973 Program) of China under Grant No. G1999032701 and Special Funds
for Authors of National Excellent Doctoral Dissertations of China under Grant No.
200241. A preliminary version of this paper appeared as Technical Report cs.CC/0302001
of CoRR in Feb. 2003. }}}}\\[0.5cm]
{\large Ke Xu and Wei Li}

\bigskip
%{\setlength{\parskip}{0.cm}
National Lab of Software Development Environment

Department of Computer Science

Beihang University, Beijing 100083, China

Email:\{kexu,liwei\}@nlsde.buaa.edu.cn

\end{center}

\begin{quotation}
{\noindent {\small {\bf Abstract.} This paper first analyzes the resolution
complexity of two random CSP models (i.e. Model RB/RD) for which we can
establish the existence of phase transitions and identify the threshold
points exactly. By encoding CSPs into CNF formulas, it is proved that
almost all instances of Model RB/RD have no tree-like resolution proofs of
less than exponential size. Thus, we not only introduce new families of CNF
formulas hard for resolution, which is a central task of proof complexity
theory, but also propose models with both many hard instances and exact
phase transitions. Then, the implications of such models are addressed.
It is shown both theoretically and experimentally that an application of
Model RB/RD might be in the generation of hard satisfiable instances, which is not only
of practical importance but also related to some open problems in cryptography
such as generating one-way functions. Subsequently, further theoretical support for
the generation method is provided by establishing exponential lower bounds
on the complexity of solving random satisfiable and forced satisfiable
instances of RB/RD near the threshold. Finally, conclusions are presented,
as well as a detailed comparison of Model RB/RD with the Hamiltonian cycle problem and
random 3-SAT, which together exhibit three different kinds of phase transition
behavior in NP-complete problems.}}
\end{quotation}

\bigskip

\noindent {\large {\bf 1. Introduction }}

\smallskip

\noindent Over the past ten years, the study of phase transition
phenomena has been one of the most exciting areas in computer science and
artificial intelligence. Numerous empirical studies suggest that for many
NP-complete problems, as a parameter is varied, the probability that a random
instance is soluble drops sharply from 1 to 0 at a threshold point. More
interestingly, the hardest instances to solve are concentrated in this sharp
transition region. As is well known, finding ways
to generate hard instances for a problem is important both for understanding
the complexity of the problem and for providing challenging benchmarks for
experimental evaluation of algorithms [12]. So the discovery of phase
transition phenomena in computer science not only gives a new method to
generate hard instances but also provides useful insights into the study of
computational complexity from a new perspective.

99: Although tremendous progress has been made in the study of phase

100: transitions, there is still some lack of research about the connections

101: between the threshold phenomena and the generation of hard instances,

102: especially from a theoretical point of view. For example, some problems can

103: be used to generate hard instances but the existence of phase transitions in

104: such problems has not been proved. One such an example is the well-studied

105: random 3-SAT. A theoretical result by Chv\'{a}tal and Szemer\'{e}di [10]

106: shows that for random 3-SAT, no short proofs exists in general, which means

107: that almost all proofs for this problem require exponential resolution

108: lengths. Experimental results further indicate that instances from the phase

109: transition region of random 3-SAT tend to be particularly hard to solve

110: [25]. Since the early 1990's, considerable efforts have been put into random

111: 3-SAT, but until now, the existence of the phase transition phenomenon in

112: this problem has not been established, although recently, Friedgut [14] made

113: tremendous progress in proving that the width of the phase transition region

114: narrows as the number of variables increases. On the other hand, for some

115: problems with proved phase transitions, it was found either theoretically or

116: experimentally that instances generated by these problems are easy to solve

117: or easy in general. Such examples include random 2-SAT, Hamiltonian cycle

118: problem and random 2+$p$-SAT ($0<p\leq 0.4$). For random 2-SAT, Chv\'{a}tal

119: and Reed [11] and Goerdt [20] proved that the phase transition phenomenon will

120: occur when the ratio of clauses to variables is 1. But we know that 2-SAT is

121: in P class which can be solved in polynomial time, implying that random

122: 2-SAT can not be used to generate hard instances. For the Hamiltonian cycle

123: problem which is NP-compete, Koml\'{o}s and Szemer\'{e}di [22] not only

124: proved the existence of the phase transition in this problem but also gave

125: the exact location of the transition point. However, both theoretical

126: results [9] and experimental results [32] suggest that generally, the

127: instances produced by this problem are not hard to solve.

128: Different from the above two problems, random

129: 2+$p$-SAT [30] was first proposed as an attempt to interpolate between the

130: polynomial time problem random 2-SAT with $p=0$ and the NP-complete problem

131: random 3-SAT with $p=1.$ It is not hard to see that random 2+$p$-SAT is in

132: fact NP-compelte for $p>0.$ The phase transition behavior in this problem

133: with $0<p\leq 0.4$ was established by Achlioptas et al. and the exact

134: location of the threshold point was also obtained [1]. But it was further

135: shown that random 2+$p$-SAT is essentially similar to random 2-SAT when $%

136: 0<p\leq 0.4$ with the typical computational cost scaling linearly with the

137: number of variables [29].

As mentioned before, from a computational theory point of view, what
attracts people most in the study of phase transitions is the finding of
many hard instances in the phase transition region. Hence, from
this point of view, problem models that cannot be used to generate
random hard instances are less interesting to study than random 3-SAT.
However, until now, for the models with many hard instances, e.g. random
3-SAT, neither the existence of phase transitions nor the exact locations
of the threshold points have been established. So, from a theoretical
perspective, we still do not have sufficient evidence to support the
long-standing observation that there exists a close relation between the
generation of many hard instances and the threshold phenomena, although this
observation has opened the door to, and greatly advanced, the study of phase
transitions in the last decade. From the discussion above, an interesting
question naturally arises: {\em whether there exist models with both
proved phase transitions and many hard instances and, if so, what the
implications of such models are.}

Recently, to overcome the trivial asymptotic insolubility of the previous
random CSP models, Xu and Li [33] proposed a new CSP model, namely Model RB,
which is a revision of the standard Model B. It was proved that
phase transitions from solubility to insolubility do exist for Model RB as the
number of variables approaches infinity. Moreover, the threshold points at
which the phase transitions occur are known exactly. Based on previous
experiments and by relating the hardness of Model RB to Model B, it has
already been shown that Model RB abounds with hard instances in the phase
transition region. In this paper, we will first propose a random CSP model,
called Model RD, along the same lines as Model RB. Then, by encoding CSPs
into CNF formulas, we will prove that almost all instances of Model RB/RD
have no tree-like resolution proofs of less than exponential size. This
means that Model RB/RD instances are hard for the popular CSP algorithms,
because the unsatisfiability proofs produced by such algorithms are
essentially tree-like resolution proofs [24]. Therefore, we not only
introduce new families of CNF formulas hard for resolution, which is a
central task of proof complexity theory, but also propose models
with both many hard instances and exact phase transitions.
More importantly, it will be shown that an application of RB/RD
might be in the generation of hard satisfiable instances, which is not only
of significance for experimental studies, but also of interest to the theoretical
computer science community.
Finally, exponential lower bounds will be established for random satisfiable
and forced satisfiable instances of RB/RD near the threshold.

\bigskip

\noindent {\large {\bf 2. Model RB and Model RD}}

\smallskip

\noindent A {\it Constraint Satisfaction Problem}, or CSP for short,
consists of a set of variables, a set of possible values for
each variable (its domain) and a set
of constraints defining the allowed tuples of values for the variables
(SAT is a well-studied special case).
The CSP is a fundamental problem in Artificial Intelligence, with a distinguished
history and many applications, such as in knowledge representation, scheduling
and pattern recognition. To compare the efficiency of different CSP algorithms,
some standard random CSP models have been widely used experimentally to
generate benchmark instances in the past decade. For the most widely used CSP
model (i.e. standard Model B), Achlioptas et al. [2] proved that except for a small
range of values of the constraint tightness, almost all generated instances
are unsatisfiable as the number of variables approaches infinity. This result,
as shown in [19], implies that most previous experimental results about random
CSPs are asymptotically uninteresting. However, it should be noted that
Achlioptas et al.'s result holds under the condition of fixed domain size and
so is applicable only when the number of variables is overwhelmingly larger
than the domain size. In fact, in most experimental CSP studies the domain
size is not very small compared to the number of variables. This, in turn,
explains why there is a big gap between Achlioptas
et al.'s theoretical result and the experimental findings about the phase
transition behavior in random CSPs. Motivated by the observation above, and
to overcome the trivial asymptotic insolubility of the previous random
CSP models, Xu and Li [33] proposed an alternative CSP model as follows.

{\bf Model RB: }First, we select with repetition $m=rn\ln n$ random
constraints. Each random constraint is formed by selecting without
repetition $k$ of $n$ variables, where $k\geq 2$ is an integer. Next, for
each constraint we uniformly select without repetition $q=p\cdot d^{k}$
incompatible tuples of values, i.e., each constraint contains exactly
$(1-p)\cdot d^{k}$ allowed tuples of values, where $d=n^{\alpha }$ is the
domain size of each variable and $\alpha >0$ is a constant.
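\smallskip

To make the sampling procedure concrete, the following Python sketch
generates a Model RB instance. It is only an illustration under our own
assumptions: the function name \texttt{generate\_model\_rb} and the
representation of a constraint as a pair (scope, set of incompatible value
tuples) are hypothetical, and since $n^{\alpha }$, $rn\ln n$ and $p\cdot d^{k}$
need not be integers, the sketch simply rounds them.

\begin{verbatim}
```python
import math
import random
from itertools import product

def generate_model_rb(n, k, alpha, r, p, seed=None):
    """Sample a Model RB instance (illustrative sketch).

    Returns (d, constraints); each constraint is a pair (scope, incompatible)
    where scope is a tuple of k distinct variables in 0..n-1 and incompatible
    is the set of forbidden k-tuples of values in 0..d-1.
    """
    rng = random.Random(seed)
    d = round(n ** alpha)              # domain size d = n^alpha (rounded)
    m = round(r * n * math.log(n))     # number of constraints m = r n ln n (rounded)
    q = round(p * d ** k)              # forbidden tuples per constraint q = p d^k (rounded)
    all_tuples = list(product(range(d), repeat=k))
    constraints = []
    for _ in range(m):                 # constraints drawn independently, i.e. with repetition
        scope = tuple(rng.sample(range(n), k))          # k distinct variables
        incompatible = set(rng.sample(all_tuples, q))   # q distinct forbidden tuples
        constraints.append((scope, incompatible))
    return d, constraints
```
\end{verbatim}

For example, \texttt{generate\_model\_rb(20, 2, 0.8, 0.5, 0.25, seed=1)}
yields $d=11$ and $m=30$ binary constraints, each forbidding $30$ of the
$121$ value pairs.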

Note that the way of generating random instances for Model RB is almost the
same as that for Model B. However, as in the N-queens problem and Latin squares,
the domain size of Model RB is not fixed but polynomial in the number of
variables. It has been proved that Model RB not only avoids the trivial asymptotic
behavior but also has exact phase transitions. More precisely, the
following theorems hold for Model RB, where $\Pr (Sat)$ denotes the probability
that a random CSP instance generated by Model RB is satisfiable.

%Xu and Li [23] further proved that the probability that a random CSP
%instance generated by Model RB is satisfiable, denoted by $\Pr (Sat),$
%exhibits phase transitions at a threshold point known exactly, i.e. the
%following theorems hold for Model RB.

{\bf Theorem 1} \ (Xu and Li [33]) Let $r_{cr}=-\frac{\alpha }{\ln (1-p)}$.
If $\alpha >\frac{1}{k}$ and $0<p<1$ are two constants and $k$, $p$ satisfy the
inequality $k\geq \frac{1}{1-p}$, then
\begin{eqnarray*}
\underset{n\rightarrow \infty }{\lim }\Pr (Sat) &=&1\text{ when }r<r_{cr}, \\
\underset{n\rightarrow \infty }{\lim }\Pr (Sat) &=&0\text{ when }r>r_{cr}.
\end{eqnarray*}

{\bf Theorem 2} \ (Xu and Li [33]) Let $p_{cr}=1-e^{-\frac{\alpha }{r}}$. If
$\alpha >\frac{1}{k}$ and $r>0$ are two constants and $k$, $\alpha $ and $r$
satisfy the inequality $ke^{-\frac{\alpha }{r}}\geq 1$, then
\begin{eqnarray*}
\underset{n\rightarrow \infty }{\lim }\Pr (Sat) &=&1\text{ when }p<p_{cr}, \\
\underset{n\rightarrow \infty }{\lim }\Pr (Sat) &=&0\text{ when }p>p_{cr}.
\end{eqnarray*}
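\smallskip

The threshold formulas of Theorems 1 and 2 are easy to evaluate. The
following Python sketch (the function names are our own, hypothetical
choices) computes $r_{cr}$ and $p_{cr}$, first checking the hypotheses of the
corresponding theorem.

\begin{verbatim}
```python
import math

def r_critical(alpha, p, k):
    """Threshold r_cr = -alpha / ln(1 - p) of Theorem 1."""
    # Hypotheses of Theorem 1: alpha > 1/k, 0 < p < 1, k >= 1/(1 - p).
    if not (alpha > 1 / k and 0 < p < 1 and k >= 1 / (1 - p)):
        raise ValueError("hypotheses of Theorem 1 are not satisfied")
    return -alpha / math.log(1 - p)

def p_critical(alpha, r, k):
    """Threshold p_cr = 1 - exp(-alpha / r) of Theorem 2."""
    # Hypotheses of Theorem 2: alpha > 1/k, r > 0, k * exp(-alpha / r) >= 1.
    if not (alpha > 1 / k and r > 0 and k * math.exp(-alpha / r) >= 1):
        raise ValueError("hypotheses of Theorem 2 are not satisfied")
    return 1 - math.exp(-alpha / r)
```
\end{verbatim}

For example, with $\alpha =0.8$, $p=0.25$ and $k=2$ the hypotheses of
Theorem 1 hold ($\alpha >\frac{1}{2}$ and $k\geq \frac{4}{3}$), and
$r_{cr}=0.8/\ln \frac{4}{3}\approx 2.78$. Passing this value of $r$ to
\texttt{p\_critical} returns $0.25$ again, since both theorems describe the
same critical curve $r=-\alpha /\ln (1-p)$.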

As shown in [33], many instances generated following Model B in previous
experiments can also be viewed as instances of Model RB, and more importantly,
the experimental results for these instances agree well with the theoretical
predictions for Model RB. Therefore, in this sense, Model B
can still be used experimentally to produce benchmark instances. However, to
guarantee an asymptotic phase transition behavior and to generate random hard
instances, a natural and convenient way is to vary the values of CSP parameters
within the framework of Model RB. Note that another standard CSP
model, i.e. Model D, is almost the same as Model B except that for every
constraint, each tuple of values is selected to be incompatible independently
with probability $p$. Similarly, we can revise Model D to obtain a
new model as follows.

{\bf Model RD: }First, we select with repetition $m=rn\ln n$ random
constraints. Each random constraint is formed by selecting without
repetition $k$ of $n$ variables, where $k\geq 2$ is an integer. Next, for
each constraint, each of the $d^{k}$ possible tuples of values is
selected to be incompatible independently with probability $p$, where $d=n^{\alpha }$ is
the domain size of each variable and $\alpha >0$ is a constant.
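\smallskip

The only difference from Model RB lies in how the incompatible tuples of a
constraint are chosen. A minimal Python sketch of this step follows; the
function name and the choice to represent the incompatible tuples of one
constraint as a Python set of value tuples are our own assumptions.

\begin{verbatim}
```python
import random
from itertools import product

def rd_incompatible_tuples(d, k, p, rng):
    """Model RD: each of the d**k value tuples of one constraint is made
    incompatible independently with probability p (values are 0..d-1)."""
    return {t for t in product(range(d), repeat=k) if rng.random() < p}
```
\end{verbatim}

In contrast with Model RB, where every constraint forbids exactly
$q=p\cdot d^{k}$ tuples, the number of forbidden tuples here is binomially
distributed with mean $p\cdot d^{k}$.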

Along the same lines as the proof for Model RB [33], we can easily prove
that exact phase transitions also exist for Model RD. More precisely, Theorem
1 and Theorem 2 hold for Model RD too. In fact, precisely because the
differences between Model RB and Model RD are very small, many
properties hold for both of them and the proof techniques are also almost
the same. So in this paper, we will discuss both models, denoted by Model
RB/RD.

Recently, there has been a growing theoretical interest in random CSPs,
especially with respect to their phase transition behaviors [13, 16, 17, 27, 31, 35]
and resolution complexity [18, 26, 28].
To discuss the resolution complexity of CSPs, we first need to encode a CSP
instance into a CNF formula. In this paper we will adopt the encoding method
used in [24]. For convenience, we outline this method here. For
each CSP variable $u$, we introduce $d$ propositional variables, called {\it
domain variables}, to represent assignments of values to $u$. The encoding
uses three sets of clauses: the {\it domain clauses}
asserting that each variable must be assigned a value from its domain, the
{\it conflict clauses} excluding assignments violating constraints, and
clauses asserting that each variable is assigned at most one value from its
domain.
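\smallskip

The three sets of clauses can be written down directly. The Python sketch
below is our own illustration, not code from [24]: the function name is
hypothetical, the domain variable ``$u$ takes value $a$'' is numbered
$ud+a+1$ (a DIMACS-style convention we assume here), a clause is a list of
nonzero integers, and each constraint is a pair (scope, set of incompatible
value tuples).

\begin{verbatim}
```python
from itertools import combinations

def encode_csp(n, d, constraints, at_most_one=True):
    """CNF encoding of a CSP with n variables and domain {0, ..., d-1}.

    Returns a list of clauses; literal +x means domain variable x is true,
    -x means it is false.
    """
    def lit(u, a):
        # Domain variable "CSP variable u takes value a".
        return u * d + a + 1

    clauses = []
    # Domain clauses: every CSP variable takes at least one value.
    for u in range(n):
        clauses.append([lit(u, a) for a in range(d)])
    # Conflict clauses: rule out each incompatible tuple of each constraint.
    for scope, incompatible in constraints:
        for values in incompatible:
            clauses.append([-lit(u, a) for u, a in zip(scope, values)])
    # At-most-one clauses: no CSP variable takes two different values.
    if at_most_one:
        for u in range(n):
            for a, b in combinations(range(d), 2):
                clauses.append([-lit(u, a), -lit(u, b)])
    return clauses
```
\end{verbatim}

For a single binary constraint on two variables with $d=2$ that forbids the
pair $(0,0)$, the sketch returns the domain clauses $\{1,2\}$ and $\{3,4\}$,
the conflict clause $\{-1,-3\}$ and the at-most-one clauses $\{-1,-2\}$ and
$\{-3,-4\}$. Whenever $d\geq k$, the widest clauses are the domain clauses,
of width $d$.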

\bigskip

\noindent {\large {\bf 3. Resolution Lower Bounds for Model RB/RD}}

\smallskip

\noindent In this section, we will analyze the resolution complexity of
unsatisfiability proofs for Model RB/RD and obtain the following result.

{\bf Theorem 3 \ }Let $P$ be a random CSP instance generated following Model
RB/RD. Then, almost surely, the CNF encoding of $P$ has no tree-like
resolution refutation of length less than $2^{\Omega (n)}$.

When we say that a property holds almost surely, we mean that this property
holds with probability tending to 1 as the number of variables approaches
infinity.

The core of the proof of Theorem 3 is to show that almost surely there exists a
clause with large width in every refutation. The width of a clause $C$,
denoted by $w(C)$, is the number of variables appearing in it. The width of
a set of clauses is the maximal width of a clause in the set. The width of
deriving a clause $C$ from the formula $F$, denoted by $w(F\vdash C)$, is
defined as the minimum of the widths of all derivations of $C$ from $F$. So,
the width of refuting $F$ can be denoted by $w(F\vdash 0)$.
Ben-Sasson and Wigderson [8] gave the following theorem on size-width
relations and proposed a general strategy for proving width
lower bounds for CNF formulas.

{\bf Theorem 4 \ }(Ben-Sasson and Wigderson [8]) Let $F$ be a CNF formula
and $S_{T}(F)$ be the minimal size of a tree-like refutation of $F$. Then we have
\[
S_{T}(F)\geq 2^{(w(F\vdash 0)-w(F))}.
\]
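\smallskip

As a purely numerical illustration of Theorem 4 (the numbers are
hypothetical and not taken from any formula in this paper): if every clause
of $F$ has at most $3$ literals, i.e. $w(F)=3$, and every refutation of $F$
contains a clause with at least $23$ literals, i.e. $w(F\vdash 0)\geq 23$, then
\[
S_{T}(F)\geq 2^{23-3}=2^{20}=1048576.
\]
In particular, a width lower bound that grows linearly in $n$, together with
an initial width that grows sublinearly, yields a tree-like size lower bound
of the form $2^{\Omega (n)}$.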

By extending Ben-Sasson and Wigderson's strategy, Mitchell [26] proved
exponential resolution lower bounds for some random CSPs of fixed domain
size. In what follows, to obtain
lower bounds on width for RB/RD, we will basically use the same strategy
as in [26], but adapt it to handle random CSPs with growing domains.
First, we prove the following local sparseness property for RB/RD.

{\bf Lemma 1 }Let $P$ be a random CSP instance generated by Model RB/RD.
There is a constant $c>0$ such that almost surely every sub-problem of $P$
with size $s\leq cn$ has at most $b=\beta s\ln n$ constraints, where
$\beta =\frac{\alpha }{6k\ln \frac{1}{1-p}}$.

{\bf Proof: }As mentioned in [27], this is a standard type of argument in
random graph theory. We consider the number of sub-problems on $s$
variables with $b=\beta s\ln n$ constraints for $0<s\leq cn$. There are
$\binom{n}{s}$ possible choices for the variables and $\binom{m}{b}$ for the
constraints. Given such choices, the probability that all the $b$
constraints lie within the $s$ variables is not greater than
$\left( \frac{s}{n}\right) ^{kb}$. So, the expected number of such
sub-problems is at most
\begin{eqnarray*}
\binom{n}{s}\binom{m}{b}\left( \frac{s}{n}\right) ^{kb} &\leq &\left( \frac{en}{s}
\right) ^{s}\left( \frac{em}{b}\right) ^{b}\left( \frac{s}{n}\right)
^{kb} \\
&=&\left( \frac{en}{s}\right) ^{s}\left( \frac{ern\ln n}{\beta s\ln n}
\right) ^{\beta s\ln n}\left( \frac{s}{n}\right) ^{k\beta s\ln n} \\
&=&\left[ \frac{e^{1+\beta \ln n}r^{\beta \ln n}}{\beta ^{\beta \ln n}}
\left( \frac{s}{n}\right) ^{(k-1)\beta \ln n-1}\right] ^{s}.
\end{eqnarray*}

\noindent For sufficiently large $n$, there exists a constant $c_{1}>0$ such
that

\[
\frac{e^{1+\beta \ln n}r^{\beta \ln n}}{\beta ^{\beta \ln n}}<n^{c_{1}}.
\]

\noindent Thus we get

\[
\binom{n}{s}\binom{m}{b}\left( \frac{s}{n}\right) ^{kb}<\left[
n^{c_{1}}\left( \frac{s}{n}\right) ^{(k-1)\beta \ln n-1}\right] ^{s}.
\]

\noindent Let $c<\frac{1}{2}\exp\left(-\frac{2+c_{1}}{(k-1)\beta}\right)$ be
a positive constant.
For $0<s\leq cn$ and sufficiently large $n$, it follows from the above
inequality that

\[
\binom{n}{s}\binom{m}{b}\left( \frac{s}{n}\right) ^{kb}<\left( \frac{1}{n^{2}}
\right) ^{s}\leq \frac{1}{n^{2}}.
\]

\noindent Thus the expected number of such sub-problems with $s\leq cn$ is
at most

\[
\overset{cn}{\underset{s=1}{\sum }}\binom{n}{s}\binom{m}{b}\left( \frac{s}{n}
\right) ^{kb}<\frac{1}{n^{2}}cn=o(1),
\]

\noindent and so, by Markov's inequality, almost surely no such sub-problem
exists. This finishes the proof. \hfill $\Box $
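\smallskip

The constant $c_{1}$ used in the proof of Lemma 1 can be made explicit.
Since $e^{\beta \ln n}=n^{\beta }$, $r^{\beta \ln n}=n^{\beta \ln r}$ and
$\beta ^{\beta \ln n}=n^{\beta \ln \beta }$, we have
\[
\frac{e^{1+\beta \ln n}r^{\beta \ln n}}{\beta ^{\beta \ln n}}=e\cdot
n^{\beta (1+\ln r-\ln \beta )},
\]
so for sufficiently large $n$ any constant
$c_{1}>\max \{0,\beta (1+\ln r-\ln \beta )\}$ satisfies the required inequality.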

\smallskip

The following two definitions will be used later.

{\bf Definition 1 \ }Consider a variable $u$ and $i$ constraints associated
with $u$, in which all the variables except $u$ have
already been assigned values from their domains. We call this an $i$-{\it
constraint assignment tuple}, denoted by $T_{i,u}$.

{\bf Definition 2 \ }Let $u$ be a variable and $T_{i,u}$ an $i$-constraint
assignment tuple. Assign a value $v$ to $u$ from its domain, so that all the
variables in the $i$ constraints of $T_{i,u}$ have been assigned values. If
at least one constraint in $T_{i,u}$ is violated by these values, then we
say that {\it the} {\it value }$v${\it \ of }$u${\it \ is flawed} {\it by}
$T_{i,u}$. If all the values of $u$ in its domain are flawed by $T_{i,u}$,
then we say that {\it the variable }$u${\it \ is flawed by} $T_{i,u}$, and
$T_{i,u}$ is called a {\it flawed }$i${\it -constraint assignment tuple}.
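\smallskip

Definitions 1 and 2 translate directly into a check. The sketch below is our
own illustration with hypothetical function names; it represents the $i$
constraints of $T_{i,u}$ as (scope, set of incompatible value tuples) pairs
and the values already assigned to the other variables as a Python dictionary.

\begin{verbatim}
```python
def value_is_flawed(u, v, constraints, assignment):
    """Definition 2: True if giving u the value v violates at least one
    of the constraints of T_{i,u}.

    constraints: list of (scope, incompatible) pairs, each scope containing u;
    assignment: dict mapping every other variable in these scopes to its value.
    """
    for scope, incompatible in constraints:
        values = tuple(v if x == u else assignment[x] for x in scope)
        if values in incompatible:
            return True
    return False

def variable_is_flawed(u, d, constraints, assignment):
    """Definition 2: True if every value 0..d-1 of u is flawed by T_{i,u}."""
    return all(value_is_flawed(u, v, constraints, assignment) for v in range(d))
```
\end{verbatim}

For instance, with $d=2$, two constraints on the scopes $(0,1)$ and $(2,0)$
that each forbid the tuple $(0,1)$, and the other variables fixed to the
values $1$ (variable $1$) and $0$ (variable $2$), both values of variable $0$
are flawed, so this is a flawed $2$-constraint assignment tuple.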

\smallskip

{\bf Lemma 2 \ }Let $P$ be a random CSP instance generated by Model RB/RD.
Almost surely, there does not exist a flawed $i$-constraint assignment tuple
$T_{i,u}$ in $P$ with $i\leq 3k\beta \ln n$.

{\bf Proof: }Consider an $i$-constraint assignment tuple $T_{i,u}$ with
$i\leq 3k\beta \ln n$. It is easy to see that the probability that $T_{i,u}$
is flawed increases with the number of constraints $i$. Recall that in Model RD,
for every constraint, each tuple of values is selected to be incompatible
independently with probability $p$. So, given a value $v$ of $u$, the
probability that $v$ is flawed by $T_{i,u}$ is
\[
1-(1-p)^{i}.
\]

\noindent Thus the probability that all the $d=n^{\alpha }$ values of $u$
are flawed by $T_{i,u}$, i.e. the probability of $T_{i,u}$ being flawed, is
\[
\left[ 1-(1-p)^{i}\right] ^{d}.
\]

\noindent Note that $\beta =\frac{\alpha }{6k\ln \frac{1}{1-p}}$, so that
$(1-p)^{3k\beta \ln n}=e^{-\frac{\alpha }{2}\ln n}=n^{-\frac{\alpha }{2}}$.
Thus for $0<i\leq 3k\beta \ln n$, using $1-x\leq e^{-x}$, we have
\begin{eqnarray*}
\Pr (T_{i,u}\text{ is flawed})|_{i\leq 3k\beta \ln n} &\leq &\left[
1-(1-p)^{3k\beta \ln n}\right] ^{n^{\alpha }} \\
&=&\left[ 1-\frac{1}{n^{\frac{\alpha }{2}}}\right] ^{n^{\alpha }}\leq
e^{-n^{\frac{\alpha }{2}}}.
\end{eqnarray*}

\noindent The above analysis only applies to Model RD. For Model RB, such an
analysis is much more complicated, and so we defer it to the appendix.
Recall that there are $n$ variables and $m=rn\ln n$ constraints. So the
number of possible choices for $i$-constraint assignment tuples is at most
\[
n\binom{m}{i}d^{(k-1)i}.
\]

\noindent For $i\leq 3k\beta \ln n$, when $n$ is sufficiently large, there
exists a constant $c_{2}>0$ such that
\begin{eqnarray*}
n\binom{m}{i}d^{(k-1)i} &=&n\binom{rn\ln n}{i}n^{(k-1)\alpha i}\leq
n\binom{rn\ln n}{3k\beta \ln n}n^{3(k-1)\alpha k\beta \ln n} \\
&\leq &n\left( \frac{ern\ln n}{3k\beta \ln n}\right) ^{3k\beta \ln
n}n^{3(k-1)\alpha k\beta \ln n}<e^{c_{2}\ln ^{2}n}.
\end{eqnarray*}

\noindent Thus the expected number of flawed $i$-constraint assignment
tuples with $i\leq 3k\beta \ln n$ is at most
\begin{eqnarray*}
\overset{3k\beta \ln n}{\underset{i=1}{\sum }}n\binom{m}{i}d^{(k-1)i}\Pr
(T_{i,u}\text{ is flawed}) &<&e^{c_{2}\ln ^{2}n}\overset{3k\beta \ln
n}{\underset{i=1}{\sum }}\Pr (T_{i,u}\text{ is flawed}) \\
&\leq &e^{c_{2}\ln ^{2}n}\cdot e^{-n^{\frac{\alpha }{2}}}\cdot 3k\beta \ln n
\\
&=&o(1).
\end{eqnarray*}

\noindent This implies that almost surely, there does not exist a variable
$u$ and an $i$-constraint assignment tuple $T_{i,u}$ with $i\leq 3k\beta \ln
n$ such that $u$ is flawed by $T_{i,u}$. This is exactly what we need and so
we are done. \hfill $\Box $
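\smallskip

The choice of $\beta $ is what makes the probability that a fixed value is
flawed at $i=3k\beta \ln n$ equal to exactly $1-n^{-\alpha /2}$. The
following Python sketch (hypothetical function name; the parameter values in
the example are chosen only for illustration) checks this identity
numerically for Model RD and compares $\left[ 1-n^{-\alpha /2}\right]
^{n^{\alpha }}$ with the bound $e^{-n^{\alpha /2}}$.

\begin{verbatim}
```python
import math

def lemma2_quantities(n, alpha, p, k):
    """Evaluate the quantities used in the proof of Lemma 2 (Model RD)."""
    beta = alpha / (6 * k * math.log(1 / (1 - p)))
    i_max = 3 * k * beta * math.log(n)       # largest i covered by Lemma 2
    per_value = 1 - (1 - p) ** i_max         # Pr(a fixed value v is flawed)
    d = n ** alpha                           # domain size
    flawed = per_value ** d                  # Pr(T_{i,u} is flawed) at i = i_max
    return beta, per_value, flawed
```
\end{verbatim}

With $n=10^{6}$, $\alpha =0.8$, $p=0.25$ and $k=2$, the bound
$e^{-n^{\alpha /2}}$ is already below $10^{-109}$.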

\smallskip

{\bf Lemma 3 }Let $P$ be a random CSP instance generated by Model RB/RD.
Almost surely, every sub-problem of $P$ with size at most $cn$ is
satisfiable.

{\bf Proof: }Here by the size of a problem we mean the number of variables
in this problem. We will prove this lemma by contradiction. Assume that we
have an unsatisfiable sub-problem of size at most $cn$. Thus we can get a
minimum-sized unsatisfiable sub-problem with size $s\leq cn$, denoted by
$P_{1}$. From Lemma 1 we know that almost surely $P_{1}$ has at most
$\beta s\ln n$ constraints. Since each constraint involves $k$ variables, the
degrees of the $s$ variables of $P_{1}$ sum to at most $k\beta s\ln n$, so
there exists a variable $u$ in $P_{1}$ with degree
at most $k\beta \ln n$, i.e. the number of constraints in $P_{1}$ associated
with $u$ is not greater than $k\beta \ln n$. Removing $u$ and the
constraints associated with $u$ from $P_{1}$, we get a sub-problem $P_{2}$.
By minimality of $P_{1}$, we know that $P_{2}$ is satisfiable, and so there
exists an assignment satisfying $P_{2}$. Suppose that the variables in
$P_{2}$ have been assigned values by such an assignment. Now consider the
variable $u$ and the $i$ constraints associated with $u$, where
$i\leq k\beta \ln n$. By Definition 1 this constitutes an $i$-constraint assignment
tuple for $u$, denoted by $T_{i,u}$. Recall that $P_{1}$ is unsatisfiable.
This means that no value of $u$ can satisfy all the $i$ constraints. That is
to say, the variable $u$ is flawed by $T_{i,u}$. Therefore, if a sub-problem
of size at most $cn$ is unsatisfiable, then, almost surely, there is a
variable $u$ and an $i$-constraint assignment tuple $T_{i,u}$ such that $u$
is flawed by $T_{i,u}$, where $i\leq k\beta \ln n$. This contradicts
Lemma 2 and so finishes the proof. \hfill $\Box $

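\smallskip

Now we will prove that almost surely every refutation of Model RB/RD
contains a complex clause. The complexity of a clause was defined by
Mitchell [26]: for any refutation $\pi $, the complexity of a clause
$C$ in $\pi $, denoted by $\mu (C)$, is the size of the smallest sub-problem
$\Pi $ such that $C$ can be derived by resolution from $\phi (\Pi )$, where
$\phi (\Pi )$ denotes the CNF encoding of $\Pi $. Along
the same lines as the proof in [26], we have the following lemma.

{\bf Lemma 4 }Let $P$ be a random CSP instance generated by Model RB/RD.
Almost surely, every refutation $\pi $ of $\phi (P)$ has a clause $C$ of
complexity $\frac{cn}{2}\leq \mu (C)\leq cn$.

{\bf Proof: }In outline: every input clause of $\phi (P)$ has complexity at
most $k$; by Lemma 3 the empty clause has complexity greater than $cn$; and
if $C$ is the resolvent of $C_{1}$ and $C_{2}$, then
$\mu (C)\leq \mu (C_{1})+\mu (C_{2})$. Starting from the empty clause and
repeatedly moving to a parent whose complexity is at least half that of the
current clause, one must therefore meet a clause $C$ with
$\frac{cn}{2}\leq \mu (C)\leq cn$ before reaching an input clause. For the
details, please refer to [26]. \hfill  $\Box $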

\smallskip

{\bf Lemma 5. }Let $C$ be a clause of complexity $\frac{cn}{2}\leq \mu
(C)\leq cn$. Then, almost surely, $C$ has at least $\frac{c}{6}n$ literals,
i.e. $w(C)\geq \frac{c}{6}n$.

{\bf Proof: }We will prove this by contradiction. For a CSP instance $P$,
its CNF encoding is denoted by $\phi (P)$. Let $C$ be a clause of complexity
$\frac{cn}{2}\leq \mu (C)\leq cn$ and $P_{1}$ be a smallest sub-problem such
that $\phi (P_{1})\models C$. Hence, the size of $P_{1}$ is at least
$\frac{c}{2}n$ and at most $cn$. By Lemma 1, there are at most $\beta cn\ln n$
constraints in $P_{1}$, so the degrees of its variables sum to at most
$k\beta cn\ln n$. Hence at most $\frac{c}{3}n$ variables have
degree greater than $3k\beta \ln n$, and there are at least
$\frac{c}{2}n-\frac{c}{3}n=\frac{c}{6}n$ variables in $P_{1}$ with degree at
most $3k\beta \ln n$. We will prove that, almost surely, each of these
variables has at least one domain variable appearing in $C$. Now
assume that we have a variable $u$ in $P_{1}$ with degree $i\leq 3k\beta \ln
n$ and no domain variable of it appears in $C$. Removing $u$ and the
constraints associated with it from $P_{1}$, we get a sub-problem $P_{2}$.
By minimality of $P_{1}$, we know that $\phi (P_{2})\not\models C$. So we
can find an assignment satisfying $\phi (P_{2})$ but not satisfying $C$. Suppose
that the propositional variables in $P_{2}$ and $C$ have been assigned
values by such an assignment. Now consider the variable $u$ and the
constraints associated with it. By Definition 1, this constitutes an
$i$-constraint assignment tuple for $u$, denoted by $T_{i,u}$. By assumption,
no domain variable of $u$ appears in $C$. So, assigning any value to $u$
will not affect the truth value of $C$. Recall that $\phi (P_{1})\models C$
and $C$ is false under the current assignment. Therefore, no value of $u$
can satisfy $\phi (P_{1})$, i.e. setting any value to $u$ will violate at
least one constraint associated with it. It follows that $u$ is flawed by
$T_{i,u}$, i.e. there exists a flawed $i$-constraint assignment tuple with
$i\leq 3k\beta \ln n$, which contradicts Lemma 2. Since distinct CSP variables
have distinct domain variables, $C$ contains at least $\frac{c}{6}n$ literals,
and so we are done. \hfill $\Box $

Combining Lemma 4 and Lemma 5, we have that, for a random CSP
instance $P$
generated by Model RB/RD, almost surely, $w(\phi (P)\vdash 0)\geq \frac{c}{6}n$.
Theorem 4 now finishes the proof: the domain clauses have width
$d=n^{\alpha }$ and all other clauses have width at most $k$, so for
$\alpha <1$ and sufficiently large $n$ we have $w(\phi (P))=n^{\alpha }=o(n)$,
and hence $S_{T}(\phi (P))\geq 2^{\frac{c}{6}n-n^{\alpha }}=2^{\Omega (n)}$.
One point worth mentioning is that when $\alpha \geq 1$, the initial width of
clauses is greater than or equal to the number of variables. In such a case, to make
Theorem 4 applicable, we only need to introduce some new variables and
reduce the widths of domain clauses, which has no effect on our results.

554:

555: \bigskip

556:

557: \noindent {\large {\bf 4. Generating Hard Satisfiable Instances}}

558:

\noindent As mentioned before, the discovery of phase transitions in
NP-complete problems provides a good method for generating random hard
instances, which are very useful in the evaluation of algorithms. In recent
years, a remarkable development in Artificial Intelligence has been
incomplete algorithms for various kinds of problems. Since an incomplete
algorithm cannot prove unsatisfiability, evaluating such algorithms requires
a source that generates only hard satisfiable instances [3]. However, since
the probability of being satisfiable is about 0.5 at the threshold, where the
hardest instances are concentrated, a generator based on phase transitions
will usually produce a mixture of satisfiable and unsatisfiable instances.
It is therefore interesting to study how the phase transition phenomenon can
be used to generate hard satisfiable instances. Besides its practical
importance, the problem of generating random hard satisfiable instances is
also related to some open problems in cryptography, e.g. computing a one-way
function, generating pseudo-random numbers and private key cryptography
[12, 21, 23].

575:

In fact, for constraint satisfaction and Boolean satisfiability problems,
there is a natural strategy to generate instances that are guaranteed to
have at least one satisfying assignment. The strategy is as follows [3]:
first generate a random truth assignment $t,$ and then generate random
constraints or clauses one by one to form a random instance, rejecting any
constraint or clause that is violated by $t,$ until the instance contains
the required number of them. This strategy is very simple and easy to
implement. Unfortunately, it has been shown to be unsuitable for random
3-SAT because it produces a biased sample of instances with many satisfying
assignments (clustered around $t$), and experiments also show that these
instances are much easier to solve than random satisfiable instances [3].
In what follows, for convenience, we call the satisfiable instances
generated by this strategy {\it forced satisfiable instances}.
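
To make the strategy concrete for Model RB/RD, here is a small worked
calculation, assuming that each candidate constraint is generated exactly as
in the definitions of the two models (its incompatible tuples form a random
set of size $pd^{k}$ in RB, or each tuple is incompatible independently with
probability $p$ in RD). A candidate constraint is rejected exactly when the
tuple that $t$ assigns to its $k$ variables is incompatible, which happens
with probability $p.$ Since candidates are generated independently, the
accepted constraints are independent random constraints conditioned on being
satisfied by $t,$ so a forced satisfiable instance $P$ with $m=rn\ln n$
constraints is distributed as a random instance conditioned on $t$ being a
solution:
\[
\Pr\nolimits_{f}[P]=\frac{\Pr[P]\cdot\mathbf{1}[t\models P]}{\Pr[t\models P]},
\qquad\Pr[t\models P]=(1-p)^{rn\ln n},
\qquad E[\text{number of candidates}]=\frac{rn\ln n}{1-p}.
\]
For example, with $m=669$ and $p=0.25,$ about $669/0.75=892$ candidate
constraints are expected to be generated. The same reasoning applies to
random 3-SAT, where a candidate clause is rejected with probability $1/8.$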

589:

Now let us look further into why the strategy fails for random 3-SAT.
As defined in [33, 34], an {\it assignment pair} $\langle t_{1},t_{2}\rangle$
is an ordered pair of two assignments $t_{1}$ and $t_{2}.$ We say that
$\langle t_{1},t_{2}\rangle$ satisfies a CSP if and only if both $t_{1}$ and
$t_{2}$ satisfy this CSP. Suppose that the number of variables is $n$ and the
domain size is $d.$ Then there are $d^{n}$ possible assignments in total,
denoted by $t_{1},t_{2},\ldots,t_{d^{n}},$ and $d^{2n}$ possible assignment
pairs. Let $t_{i}$ be the forced satisfying assignment. Since a forced
satisfiable instance is distributed as a random instance conditioned on
$t_{i}$ being a solution, an assignment $t_{j}$ satisfies it with probability
$\Pr[\langle t_{i},t_{j}\rangle]/\Pr[\langle t_{i},t_{i}\rangle].$ Summing over
$j,$ the expected number of solutions of forced satisfiable instances
satisfying $t_{i},$ denoted by $E_{f}[N],$ is%
\[
E_{f}[N]=\frac{\sum_{j=1}^{d^{n}}\Pr[\langle t_{i},t_{j}\rangle]}{\Pr[\langle t_{i},t_{i}\rangle]},
\]
\noindent where $\Pr[\langle t_{i},t_{j}\rangle]$ denotes the probability that
$\langle t_{i},t_{j}\rangle$ satisfies a random instance. Note that $E_{f}[N]$
is independent of the choice of the forced satisfying assignment $t_{i},$ and
that $\Pr[\langle t_{i},t_{i}\rangle],$ the probability that $t_{i}$ satisfies a
random instance, is the same for all $i.$ Averaging over $i,$ we have%
\[
E_{f}[N]=\frac{\sum_{1\leq i,j\leq d^{n}}\Pr[\langle t_{i},t_{j}\rangle]}{d^{n}\Pr[\langle t_{i},t_{i}\rangle]}=\frac{E[N^{2}]}{E[N]},
\]
\noindent where $E[N^{2}]$ and $E[N]$ are, respectively, the second and the
first moments of the number of solutions for randomly generated instances.

617: %It is straightforward to derive, from the results on ordered

618: %pairs of assignments for random $k$-SAT in [34], that the expected number of

619: %solutions for random forced satisfiable instances is

620: %equal to $E(N^{2})/E(N),$ where $E(N^{2})$ and $E(N)$ are, respectively, the

621: %second moment and the first moment of the number of solutions for instances generated randomly.

For random 3-SAT, it follows from the result on satisfying assignment pairs
in [34] that, asymptotically, $E[N^{2}]$ is exponentially greater than
$E^{2}[N];$ this conclusion can also be found in [4]. Thus, the expected
number of solutions of forced satisfiable instances is exponentially larger
than that of random satisfiable instances, which gives a good theoretical
explanation of why, for random 3-SAT, the strategy is highly biased towards
generating instances with many solutions.

631:

We now consider generating satisfiable instances of Model RB/RD using the
same strategy. Recall that when we established the exact phase transitions
of RB/RD [33], it was proved that $E[N^{2}]/E^{2}[N]$ is asymptotically equal
to 1 below the threshold, where almost all instances are satisfiable, i.e.
$E[N^{2}]/E^{2}[N]\approx 1$ for $r<r_{cr}$ or $p<p_{cr}.$ Hence, for RB/RD,
the expected number of solutions of forced satisfiable instances below the
threshold is asymptotically equal to that of random satisfiable instances,
i.e. $E_{f}[N]=E[N^{2}]/E[N]\approx E[N].$ In other words, the strategy has
almost no effect on the number of solutions for RB/RD and thus will not lead
to a biased sample of instances with many solutions.
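
Putting the two cases side by side (a restatement of the results cited
above, obtained by dividing $E_{f}[N]=E[N^{2}]/E[N]$ by $E[N]$), the bias
introduced by the strategy is measured exactly by the ratio that the second
moment method controls:
\[
\frac{E_{f}[N]}{E[N]}=\frac{E[N^{2}]}{E^{2}[N]}=
\begin{cases}
1+o(1) & \text{for RB/RD with }r<r_{cr}\text{ or }p<p_{cr},\\
e^{\Omega(n)} & \text{for random 3-SAT.}
\end{cases}
\]
Thus the success of the second moment method for RB/RD and the absence of
bias in forced sampling are two readings of the same estimate.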

643:

644: In addition to the analysis above, we can also study the influence of the

645: strategy on the distribution of solutions with respect to the

646: forced satisfying assignment.

647: %we know, from $E(N^{2})\approx E^{2}(N),$ that for randomly

648: %generated instances of RB/RD, the distribution of the number of solutions is

649: %quite uniform with concentration around $E(N).$  Note that the truth assignment

650: %$t$ is generated randomly and the strategy will in fact generate all the possible

651: %satisfiable instances with $t$ as the satisfying assignment.

Based on the definition of the {\it similarity number} in [33], we first
define a distance between assignments as
$d^{f}(t_{1},t_{2})=1-S^{f}(\langle t_{1},t_{2}\rangle)/n,$ where $t_{1},t_{2}$
are two assignments, $n$ is the total number of variables and
$S^{f}(\langle t_{1},t_{2}\rangle)$ is the number of variables at which the two
assignments take identical values. Clearly, $0\leq d^{f}(t_{1},t_{2})\leq 1.$
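
As a toy illustration of the definition (the assignments below are
hypothetical and chosen only for illustration), take $n=4$ and domain
$\{0,1,2\};$ the assignments $t_{1}=(0,1,2,0)$ and $t_{2}=(0,2,2,1)$ agree
exactly on the first and third variables, so
\[
S^{f}(\langle t_{1},t_{2}\rangle)=2,\qquad d^{f}(t_{1},t_{2})=1-\frac{2}{4}=\frac{1}{2}.
\]
In other words, $d^{f}$ is the normalized Hamming distance: $d^{f}=0$ for
identical assignments and $d^{f}=1$ for assignments that disagree on every
variable.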

Let $E_{f}[X]$ and $E[X]$ denote, respectively for forced satisfiable
instances and random satisfiable instances, the expected number of solutions
at a fixed distance $d_{t}$ from the forced satisfying assignment. By an
analysis similar to that in [33] (pp.~96--97), we have
\begin{align*}
E_{f}[X] & =\binom{n}{nd_{t}}\left(n^{\alpha}-1\right)^{nd_{t}}\frac{\Pr[\langle t_{1},t_{2}\rangle]}{\Pr[\langle t_{1},t_{1}\rangle]}\quad\text{where }d^{f}(t_{1},t_{2})=d_{t}\\
& =\binom{n}{nd_{t}}\left(n^{\alpha}-1\right)^{nd_{t}}\left[\frac{\binom{n-nd_{t}}{k}}{\binom{n}{k}}+(1-p)\left(1-\frac{\binom{n-nd_{t}}{k}}{\binom{n}{k}}\right)\right]^{rn\ln n}\\
& =\exp\left[n\ln n\left(r\ln\left(1-p+p(1-d_{t})^{k}\right)+\alpha d_{t}\right)+O(n)\right].
\end{align*}
Indeed, it can be shown from the results in [33] (pp.~97--98) that, for
$r<r_{cr}$ or $p<p_{cr},$ $E_{f}[X]$ is asymptotically maximized when $d_{t}$
takes its largest possible value, i.e. $d_{t}=1.$

For random satisfiable instances of RB/RD, we have
\begin{align*}
E[X] & =\binom{n}{nd_{t}}\left(n^{\alpha}-1\right)^{nd_{t}}\left(1-p\right)^{rn\ln n}\\
& =\exp\left[n\ln n\left(r\ln(1-p)+\alpha d_{t}\right)+O(n)\right].
\end{align*}
Since the coefficient of $d_{t}$ in this exponent is $\alpha>0,$ the same
pattern holds in this case, i.e. $E[X]$ is also asymptotically maximized when
$d_{t}=1.$
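
It may help to evaluate both exponents at the common maximizer $d_{t}=1$
(a direct substitution into the two formulas above, using
$r_{cr}=-\alpha/\ln(1-p)$ and $p_{cr}=1-e^{-\alpha/r}$). Since
$(1-d_{t})^{k}=0$ there, the two leading terms coincide:
\begin{align*}
\frac{\ln E_{f}[X]}{n\ln n}\bigg|_{d_{t}=1} & =\alpha+r\ln(1-p)+o(1),\qquad
\frac{\ln E[X]}{n\ln n}\bigg|_{d_{t}=1}=\alpha+r\ln(1-p)+o(1),\\
\alpha+r\ln(1-p) & =\ln(1-p)\,(r-r_{cr}),
\end{align*}
which is positive for $r<r_{cr},$ and equivalently for $p<p_{cr}$ (since
$\alpha+r\ln(1-p)>0$ if and only if $1-p>e^{-\alpha/r}$). So, to leading
order, forced and random satisfiable instances have the same number of
solutions at the distance where most solutions lie.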

So, intuitively speaking, for RB/RD, given an assignment $t,$ most solutions
of both forced satisfiable instances satisfying $t$ and random satisfiable
instances lie far from $t.$ This further indicates that the strategy has
little effect on the distribution of solutions for RB/RD, so it will not be
biased towards generating instances with many solutions around the forced
satisfying assignment.

691: For random 3-SAT, similarly, we have%

692: \begin{align*}

693: E_{f}[X]  & =\binom{n}{nd_{t}}\left[  \frac{\binom{n-nd_{t}}{3}}{\binom

694: {n}{3}}+\frac{6}{7}\left(  1-\frac{\binom{n-nd_{t}}{3}}{\binom{n}{3}}\right)

695: \right]  ^{rn}\\

696: & =f_{1}(n)\exp\left[  n\left(  -d_{t}\ln d_{t}-(1-d_{t})\ln(1-d_{t}%

697: )+r\ln\frac{6+(1-d_{t})^{3}}{7}\right)  \right]  ,

698: \end{align*}

699:

700: \noindent and%

701: \begin{align*}

702: E[X]  & =\binom{n}{nd_{t}}\left(  \frac{7}{8}\right)  ^{rn}\\

703: & =f_{2}(n)\exp\left[  n\left(  -d_{t}\ln d_{t}-(1-d_{t})\ln(1-d_{t}%

704: )+r\ln\frac{7}{8}\right)  \right]  ,

705: \end{align*}

\noindent where $f_{1}(n)$ and $f_{2}(n)$ are polynomially bounded factors.
It follows from the results in [34] that, as $r$ (the ratio of clauses to
variables) approaches 4.25, $E_{f}[X]$ and $E[X]$ are asymptotically
maximized when $d_{t}\approx 0.24$ and $d_{t}=0.5,$ respectively. This means
that, in contrast to RB/RD, most solutions of forced satisfiable instances
lie much closer to the forced satisfying assignment than those of random
satisfiable instances when $r$ is near the threshold.
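
The two maximizers can be checked directly from the exponents above (a
routine calculation added here for convenience, not taken from [34]).
Setting the derivative of the exponent of $E[X]$ to zero gives
$\ln\frac{1-d_{t}}{d_{t}}=0,$ i.e. $d_{t}=0.5,$ for every $r.$ For
$E_{f}[X],$ the stationarity condition is
\[
\ln\frac{1-d_{t}}{d_{t}}=\frac{3r(1-d_{t})^{2}}{6+(1-d_{t})^{3}},
\]
and at $r=4.25$ and $d_{t}=0.24$ the left- and right-hand sides are
approximately $1.153$ and $1.144,$ respectively, consistent with a maximizer
near $0.24.$ The right-hand side is strictly positive, and this is what pulls
the maximizer of $E_{f}[X]$ below $0.5,$ i.e. towards the forced satisfying
assignment.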

715:

Note that the number and the distribution of solutions are the two most
important factors determining the cost of solving satisfiable instances.
So, from the above analysis, we can expect that, for RB/RD, solving forced
satisfiable instances should be about as hard as solving random satisfiable
instances. More interestingly, this suggests a new method, based on the
hardness of RB/RD, for generating hard satisfiable instances: generate forced
satisfiable instances of RB/RD with a large number of variables near the
threshold identified exactly by Theorem 1 or Theorem 2.

Experimental results have further confirmed this idea\footnote{\small
We are very grateful to Dr. Christophe Lecoutre and Liu Yang for performing
the experiments.}. In one experiment for RB with $k=2,$ $n=30,$ $d=15$ and
$m=250,$ the mean time of solving forced satisfiable instances near the
threshold is only slightly smaller (by 11 percent) than that of solving
random satisfiable instances with the same parameters\footnote{\small As
specified by the conditions of Theorem 2, to make exact phase transitions
hold, the values of $\alpha$ and $r$ should not be small. So, we should choose
dense CSPs with a large domain.}. More importantly, experiments for RB also
indicate that the hardness of solving forced satisfiable instances near the
threshold grows exponentially with the number of variables\footnote{\small
According to the definitions of RB/RD and Theorems 1 and 2, the parameters
$\alpha,$ $r$ and $p$ should be fixed as $n$ increases. The values of the
threshold points can also be obtained from these two theorems.}, and we can,
in fact, generate forced satisfiable instances that appear to be very hard to
solve (for both complete and incomplete algorithms) even when the number of
variables is only moderately large (e.g. $k=2,$ $n=59,$ $\alpha=0.8$ and
$r=0.8/\ln\frac{4}{3}$ with constraint tightness $p=p_{cr}=0.25$ computed by
Theorem 2, or equivalently $k=2,$ $n=59,$ $d=26$ and $m=669$ with the same
tightness\footnote{\small If non-integer values occur in the computation of
$d$ and $m$ from $n,$ $\alpha$ and $r,$ then we round them to the nearest
integers.})\footnote{\small Benchmarks of Model RB (in both SAT and CSP
format) are available at
www.nlsde.buaa.edu.cn/\symbol{126}kexu/ benchmarks/benchmarks.htm.}.

756: %More interestingly, we have successfully generated some forced

757: %satisfiable instances which appear to be very hard to solve, i.e. these instances can not be

758: %solved by state-of-the-art CSP algorithms in a reasonable time (e.g. 1 day).

Although there are other ways to generate hard satisfiable instances
empirically, e.g. the quasigroup method [3], we think that the simple and
natural method presented in this paper, based on models (i.e. Model RB/RD)
with exact phase transitions and many hard instances, is well worth further
investigation.
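
The equivalent parameters in the example above can be reproduced by a short
calculation, using $d=n^{\alpha},$ $m=rn\ln n$ and $p_{cr}=1-e^{-\alpha/r}$
from the definitions and Theorem 2, with rounding as stated in the footnote:
\begin{align*}
d & =59^{0.8}=e^{0.8\ln 59}\approx e^{3.262}\approx 26.10\;\rightarrow\;26,\\
m & =\frac{0.8}{\ln(4/3)}\cdot 59\ln 59\approx 2.7808\times 59\times 4.0775\approx 669.0\;\rightarrow\;669,\\
p_{cr} & =1-e^{-\alpha/r}=1-e^{-\ln(4/3)}=1-\frac{3}{4}=0.25.
\end{align*}
Choosing $r=\alpha/\ln\frac{4}{3}$ is thus a convenient way to place the
threshold tightness at exactly $p_{cr}=1/4.$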

764:

765: \bigskip

766: \smallskip

767: \noindent {\large {\bf 5. Exponential Lower Bounds for Satisfiable Instances of Model RB/RD}}

768:

\noindent For random CSP instances of RB/RD, we know from Theorems 1 and 2
that, almost surely, they are satisfiable below the threshold and
unsatisfiable above it. A satisfiable formula has no resolution refutation
(by convention, its refutation size is infinite). Therefore, the exponential
resolution lower bounds established in Theorem 4 are of interest only for
instances above the threshold. Also, in many other cases, exponential lower
bounds have been shown only for unsatisfiable instances, and it seems quite
difficult to derive such lower bounds for satisfiable instances. Recent
progress in this direction was made by Achlioptas et al. [5], who
established exponential lower bounds for certain natural DPLL algorithms on
some provably satisfiable instances of random $k$-SAT for $k\geq 4.$ In this
section, we analyze the complexity of solving RB/RD below the threshold and
obtain the following results.

782:

{\bf Theorem 5} \ Given a random CSP instance of RB/RD with
$r_{cr}-\epsilon_{r}<r\leq r_{cr}$ or $p_{cr}-\epsilon_{p}<p\leq p_{cr},$ where
$\epsilon_{r}=-\frac{\alpha}{\ln(1-p)}+\frac{\alpha(1-\frac{c}{24})}{\ln\left(1-p\left(1-\frac{c^{k}}{12^{k}}\right)\right)}$
and
$\epsilon_{p}=1-\exp\left(-\frac{\alpha}{r}\right)-\left[1-\exp\left(-\frac{\alpha}{r}\left(1-\frac{c}{24}\right)\right)\right]\frac{12^{k}}{12^{k}-c^{k}}$
are two positive constants, we select, uniformly at random and without
repetition, $\frac{c}{12}n$ variables, and assign each of these variables a
value chosen at random from its domain. If these values do not violate any
constraint, then, almost surely, the residual formula is unsatisfiable and
has no tree-like resolution proof of less than exponential size.

794:

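{\bf Proof:} Let $E[X]$ denote the expected number of assignments satisfying
the residual formula. By assumption, the partial assignment to the
$\frac{c}{12}n$ variables does not violate any constraint. A constraint has
all of its $k$ variables among the assigned ones with probability about
$(\frac{c}{12})^{k},$ in which case it is satisfied by assumption; any other
constraint is violated by a random assignment with probability $p.$ Then%
\[
E[X]=d^{n-\frac{c}{12}n}\left[1-p\left(1-\frac{c^{k}}{12^{k}}\right)\right]^{rn\ln n}.
\]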

801: \]

802:

\noindent For $r_{cr}-\epsilon_{r}<r\leq r_{cr},$ since $d=n^{\alpha}$ and, by
the definition of $\epsilon_{r},$
$(r_{cr}-\epsilon_{r})\ln\left(1-p\left(1-\frac{c^{k}}{12^{k}}\right)\right)=-\alpha\left(1-\frac{c}{24}\right),$
we have%
\begin{align*}
E[X] & \leq n^{\alpha n(1-\frac{c}{12})}\left[1-p\left(1-\frac{c^{k}}{12^{k}}\right)\right]^{(r_{cr}-\epsilon_{r})n\ln n}\\
& =\exp\left[\left(\alpha\left(1-\frac{c}{12}\right)-\alpha\left(1-\frac{c}{24}\right)\right)n\ln n\right]\\
& =\exp\left(-\frac{\alpha c}{24}n\ln n\right)=o(1).
\end{align*}
\noindent Since, by Markov's inequality, the probability that the residual
formula is satisfiable is at most $E[X],$ the residual formula is almost
surely unsatisfiable. For the phase transition with respect to $p,$ the proof
can be done similarly.

Now we prove that, for the residual formula, any sub-problem of size at most
$cn$ is almost surely satisfiable. Based on the proofs of Lemmas 2 and 3, we
only need to show that for any sub-problem with size $1\leq s\leq cn$
containing unassigned variables, there almost surely exists an unassigned
variable with degree at most $3k\beta\ln n.$ Thus, it is sufficient to prove
that for any sub-problem with size
$1+\frac{c}{12}n\leq s\leq cn+\frac{c}{12}n$ containing the $\frac{c}{12}n$
assigned variables, there almost surely exists an unassigned variable with
degree at most $3k\beta\ln n.$ For such a sub-problem, the probability that a
given unassigned variable has degree at least $3k\beta\ln n$ is at most%
\[
\binom{rn\ln n}{b}\binom{kb}{b}\left(\frac{1}{n}\right)^{b}\left(\frac{s}{n}\right)^{kb-b}\quad\text{where }b=3k\beta\ln n.
\]

829:

830:

\noindent Then, the probability that all the unassigned variables have
degree at least $3k\beta\ln n$ is at most%
\[
\left[\binom{rn\ln n}{b}\binom{kb}{b}\left(\frac{1}{n}\right)^{b}\left(\frac{s}{n}\right)^{kb-b}\right]^{s-\frac{c}{12}n}.
\]
\noindent There are $\binom{n-\frac{c}{12}n}{s-\frac{c}{12}n}$ possible choices
for such sub-problems. So the expected number of sub-problems with size
$1+\frac{c}{12}n\leq s\leq cn+\frac{c}{12}n$ in which all the unassigned
variables have degree at least $3k\beta\ln n$ is at most%

842:

\begin{align*}
& \sum_{s=1+\frac{c}{12}n}^{cn+\frac{c}{12}n}\binom{n-\frac{c}{12}n}{s-\frac{c}{12}n}\left[\binom{rn\ln n}{b}\binom{kb}{b}\left(\frac{1}{n}\right)^{b}\left(\frac{s}{n}\right)^{kb-b}\right]^{s-\frac{c}{12}n}\quad\text{where }b=3k\beta\ln n\\
& \leq\sum_{s=1+\frac{c}{12}n}^{cn+\frac{c}{12}n}\left(\frac{e(n-\frac{c}{12}n)}{s-\frac{c}{12}n}\right)^{s-\frac{c}{12}n}\left[\left(\frac{rn\ln n}{b}\right)^{b}\left(\frac{ekb}{b}\right)^{b}\left(\frac{1}{n}\right)^{b}\left(\frac{s}{n}\right)^{kb-b}\right]^{s-\frac{c}{12}n}\\
& \leq\sum_{s=1+\frac{c}{12}n}^{cn+\frac{c}{12}n}\left[en\left(\frac{re}{3\beta}\right)^{3k\beta\ln n}\left(\frac{s}{n}\right)^{3k(k-1)\beta\ln n}\right]^{s-\frac{c}{12}n}.
\end{align*}

869:

870:

\noindent In the proof of Lemma 1, we required that
$e\left(\frac{re}{\beta}\right)^{\beta\ln n}<n^{c_{1}}$ and
$c<\frac{1}{2}\exp\left(-\frac{2+c_{1}}{(k-1)\beta}\right).$ The first
condition gives
$\left(\frac{re}{3\beta}\right)^{3k\beta\ln n}<\frac{n^{3kc_{1}}}{e^{3k}}\cdot\frac{1}{3^{3k\beta\ln n}};$
since $s\leq cn+\frac{c}{12}n<2cn,$ the second gives
$\left(\frac{s}{n}\right)^{3k(k-1)\beta\ln n}<n^{-3kc_{1}-6k}.$ Each bracket is
then less than 1 and its exponent $s-\frac{c}{12}n$ is at least 1.
Substituting these bounds into the above inequality, we get%
\begin{align*}
& \sum_{s=1+\frac{c}{12}n}^{cn+\frac{c}{12}n}\left[en\left(\frac{re}{3\beta}\right)^{3k\beta\ln n}\left(\frac{s}{n}\right)^{3k(k-1)\beta\ln n}\right]^{s-\frac{c}{12}n}\quad\text{where }1+\frac{c}{12}n\leq s\leq cn+\frac{c}{12}n\\
& \leq\sum_{s=1+\frac{c}{12}n}^{cn+\frac{c}{12}n}\left[en\frac{n^{3kc_{1}}}{e^{3k}}\frac{1}{3^{3k\beta\ln n}}n^{-3kc_{1}-6k}\right]\\
& =\sum_{s=1+\frac{c}{12}n}^{cn+\frac{c}{12}n}O\left(\frac{1}{n^{2}}\right)=o(1),
\end{align*}

as required. Now, for the residual formula, Lemmas 3 and 4 follow
immediately. Recall that in Lemma 5 we proved that there are at least
$\frac{c}{6}n$ variables in $P_{1}$ with degree at most $3k\beta\ln n.$ For the
residual formula, where $\frac{c}{12}n$ variables have been assigned values,
at most $\frac{c}{12}n$ of these variables can be among the assigned ones, so
at least $\frac{c}{12}n$ unassigned variables in $P_{1}$ have degree at most
$3k\beta\ln n.$ Similarly, we can prove that, almost surely, every resolution
refutation of the residual formula contains a clause with at least
$\frac{c}{12}n$ literals. By Theorem 4, this completes the proof. Note that
the constant $c$ can be chosen to decrease monotonically with $r$ or $p.$ We
can therefore take the value of $c$ for $r=r_{cr}$ or $p=p_{cr}$ and make it
as small as possible (in order to guarantee that $\epsilon_{r}$ and
$\epsilon_{p}$ are positive constants). \hfill $\Box $
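
To illustrate why a small $c$ makes $\epsilon_{r}$ positive, one can expand
$\epsilon_{r}$ for small $c$ (a routine first-order calculation, assuming
$k\geq 2$). Write $L=-\ln(1-p)>0$ and $\delta=p\frac{c^{k}}{12^{k}},$ and use
$-\ln(1-p+\delta)=L-\frac{\delta}{1-p}+O(\delta^{2});$ then
\[
\epsilon_{r}=\frac{\alpha}{L}\left[\frac{c}{24}-\frac{p\,c^{k}}{12^{k}(1-p)L}\right]+O\left(c^{k+1}\right),
\]
so the linear term $\frac{\alpha c}{24L}$ dominates and $\epsilon_{r}>0$ for
all sufficiently small $c>0.$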

911:

Generally speaking, different search algorithms use different strategies to
search for solutions. Rather than focusing on specific algorithms, we relate
the hardness of solving satisfiable instances to that of solving
unsatisfiable sub-problems: if it takes a long time to solve the
sub-problems generated in the search process, then the original problem
cannot be solved quickly [24]. Theorem 5 indicates that for satisfiable
instances of RB/RD below and close to the threshold, if a resolution-based
algorithm cannot detect any contradiction in the early stage of a search
branch, then the algorithm is very likely to generate a large unsatisfiable
sub-problem, and it will then almost surely take exponential time to explore
large subtrees to prove the unsatisfiability of this sub-problem.

Indeed, there are exponentially many large unsatisfiable sub-problems. More
precisely, it can be computed that the total number of residual formulas
with $\frac{c}{12}n$ assigned variables that do not violate any constraint is
at least%
\begin{align*}
\binom{n}{\frac{c}{12}n}d^{\frac{c}{12}n}\left(1-\left(\frac{c}{12}\right)^{k}p\right)^{r_{cr}n\ln n} & \geq\binom{n}{\frac{c}{12}n}\exp\left[\frac{\alpha cn\ln n}{12}\left(1+\left(\frac{c}{12}\right)^{k-1}\frac{p}{(1-p)\ln(1-p)}\right)\right]\\
& =\exp\left(\Omega(n\ln n)\right),
\end{align*}
where the inequality uses $d=n^{\alpha},$ $r_{cr}=-\frac{\alpha}{\ln(1-p)}$ and
$\ln(1-x)\geq-\frac{x}{1-x}\geq-\frac{x}{1-p}$ for $0\leq x\leq p,$ and the last
step holds once $c$ is small enough that $\left(\frac{c}{12}\right)^{k-1}<1-p$
(recall that $-\ln(1-p)\geq p$), so that the factor in parentheses is a
positive constant. So, intuitively speaking, when solving satisfiable
instances of RB/RD near the threshold, backtracking algorithms will very
easily fall into pitfalls with no solutions and then, worse still, take a
long time to escape from them. To the best of our knowledge, this is the
first result on the complexity of solving satisfiable instances near a
proved threshold, and it can help us gain a better understanding of the
extreme hardness of instances in the phase transition region.

941:

For random forced satisfiable instances near the proved threshold, we
similarly have the following result.

944:

{\bf Theorem 6} \ Given a random forced satisfiable instance of RB/RD with
$r_{cr}-\epsilon_{r}<r\leq r_{cr}$ or $p_{cr}-\epsilon_{p}<p\leq p_{cr},$ where
$\epsilon_{r}=-\frac{\alpha}{\ln(1-p)}+\frac{\alpha(1-\frac{c}{24})}{\ln\left(1-p\left(1-\frac{c^{k}}{12^{k}}\right)\right)}$
and
$\epsilon_{p}=1-\exp\left(-\frac{\alpha}{r}\right)-\left[1-\exp\left(-\frac{\alpha}{r}\left(1-\frac{c}{24}\right)\right)\right]\frac{12^{k}}{12^{k}-c^{k}}$
are two positive constants, we select, uniformly at random and without
repetition, $\frac{c}{12}n$ variables, and assign each of these variables a
value chosen at random from its domain. If these values do not violate any
constraint, then, almost surely, the residual formula is unsatisfiable and
has no tree-like resolution proof of less than exponential size.

956:

{\bf Proof:} Due to limited space, we only give the proof for the phase
transition with respect to $r$ in Model RD with $\frac{1}{k}<\alpha<1.$ The
other cases can be handled similarly. Assume that we have two assignments
$t_{1}$ and $t_{2}$ whose similarity number [33] is
$S^{f}(\langle t_{1},t_{2}\rangle)=S.$ Let $P$ be a random instance of Model RD.
Based on the analysis in [33] (p.~96), the probability that both $t_{1}$ and
$t_{2}$ satisfy $P$ is%
\[
\Pr[t_{1}\text{ and }t_{2}\text{ satisfy }P]=\left[(1-p)\frac{\binom{S}{k}}{\binom{n}{k}}+(1-p)^{2}\left(1-\frac{\binom{S}{k}}{\binom{n}{k}}\right)\right]^{rn\ln n}.
\]

969:

\noindent Now suppose that $t_{0}$ is a random forced satisfying assignment
and $t$ is an assignment with $S^{f}(\langle t_{0},t\rangle)=S.$ Let $P_{sat}$ be
a random forced satisfiable formula of Model RD with $t_{0}$ as the forced
satisfying assignment. Then the probability that $t$ satisfies $P_{sat}$ is%
\begin{align*}
\Pr[t\text{ satisfies }P_{sat}] & =\frac{\Pr[t_{0}\text{ and }t\text{ satisfy }P]}{\Pr[t_{0}\text{ satisfies }P]}\\
& =\left[1-p+p\left(\left(\frac{S}{n}\right)^{k}+\frac{g\left(\frac{S}{n}\right)}{n}\right)+O\left(\frac{1}{n^{2}}\right)\right]^{rn\ln n}.
\end{align*}
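\noindent The second equality above rests on two routine steps, written out
here as a check for fixed $k$ and fixed $s=S/n\in(0,1]$ (the symbol $s$ is
introduced only for this calculation). Dividing by
$\Pr[t_{0}\text{ satisfies }P]=(1-p)^{rn\ln n}$ and expanding the ratio of
binomial coefficients gives
\begin{align*}
\frac{\Pr[t_{0}\text{ and }t\text{ satisfy }P]}{\Pr[t_{0}\text{ satisfies }P]} & =\left[1-p+p\,\frac{\binom{S}{k}}{\binom{n}{k}}\right]^{rn\ln n},\\
\frac{\binom{S}{k}}{\binom{n}{k}} & =\prod_{i=0}^{k-1}\frac{S-i}{n-i}=s^{k}\prod_{i=0}^{k-1}\frac{1-i/S}{1-i/n}=s^{k}\left(1-\frac{k(k-1)}{2}\left(\frac{1}{S}-\frac{1}{n}\right)\right)+O\left(\frac{1}{n^{2}}\right)\\
& =s^{k}+\frac{g(s)}{n}+O\left(\frac{1}{n^{2}}\right),
\end{align*}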

981:

\noindent where $g(s)=\frac{k(k-1)}{2}(s^{k}-s^{k-1}).$ Now, for the random
forced satisfiable formula $P_{sat},$ we select, uniformly at random and
without repetition, $\frac{c}{12}n$ variables and then assign each of these
variables a value chosen at random from its domain. By the standard Chernoff
bound, it is easy to show that the similarity number between the forced
satisfying assignment $t_{0}$ and the random partial assignment to the
$\frac{c}{12}n$ variables is almost surely less than
$\frac{c}{6}n^{1-\alpha}.$ For the residual formula, there are
$d^{n-\frac{c}{12}n}$ possible assignments in total. Let $t^{\prime}$ be an
assignment to the $n-\frac{c}{12}n$ variables of the residual formula with
$S^{f}(\langle t_{0},t^{\prime}\rangle)=S^{\prime}.$ By assumption, the partial
assignment to the $\frac{c}{12}n$ variables does not violate any constraint.
Thus, almost surely, the probability that $t^{\prime}$ satisfies the residual
formula is at most%

995: \[

996: \left[  1-p\left(  1-\frac{c^{k}}{12^{k}}\right)  \left(  1-\left(  \frac

997: {c}{6n^{\alpha}}+\frac{S^{\prime}}{n}\right)  ^{k}O(1)-\frac{g\left(  \frac

998: {c}{6n^{\alpha}}+\frac{S^{\prime}}{n}\right)  }{n}O(1)\right)  \right]  ^{rn\ln

999: n}.

1000: \]
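
\noindent The Chernoff step used above can be spelled out as follows (the
standard multiplicative bound, stated here as a routine check). Each of the
$\frac{c}{12}n$ selected variables receives the same value as in $t_{0}$ with
probability $1/d=n^{-\alpha},$ independently, so the number $Y$ of agreements
has mean $\mu=\frac{c}{12}n^{1-\alpha},$ which tends to infinity because
$\alpha<1.$ Taking $\delta=1$ in $\Pr[Y\geq(1+\delta)\mu]\leq e^{-\delta^{2}\mu/3}$
gives
\[
\Pr\left[Y\geq\frac{c}{6}n^{1-\alpha}\right]\leq\exp\left(-\frac{c}{36}n^{1-\alpha}\right)=o(1).
\]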

1001:

\noindent Let $E[X]$ be the expected number of assignments satisfying the
residual formula. Similarly to the asymptotic analysis in [33] (p.~99), for
$r_{cr}-\epsilon_{r}<r\leq r_{cr},$ we have%
\begin{align*}
E[X] & \leq\sum_{S^{\prime}=0}^{n-\frac{c}{12}n}\binom{n-\frac{c}{12}n}{S^{\prime}}\left(n^{\alpha}-1\right)^{n-\frac{c}{12}n-S^{\prime}}\\
& \quad\cdot\left[1-p\left(1-\frac{c^{k}}{12^{k}}\right)\left(1-\left(\frac{c}{6n^{\alpha}}+\frac{S^{\prime}}{n}\right)^{k}O(1)-\frac{g\left(\frac{c}{6n^{\alpha}}+\frac{S^{\prime}}{n}\right)}{n}O(1)\right)\right]^{rn\ln n}\\
& \approx n^{\alpha n(1-\frac{c}{12})}\left[1-p\left(1-\frac{c^{k}}{12^{k}}\right)\right]^{rn\ln n}\sum_{S^{\prime}=0}^{n-\frac{c}{12}n}\binom{n-\frac{c}{12}n}{S^{\prime}}\left(\frac{1}{n^{\alpha}}\right)^{S^{\prime}}\left(1-\frac{1}{n^{\alpha}}\right)^{n-\frac{c}{12}n-S^{\prime}}\quad\text{for }\frac{1}{k}<\alpha<1\\
& =n^{\alpha n(1-\frac{c}{12})}\left[1-p\left(1-\frac{c^{k}}{12^{k}}\right)\right]^{rn\ln n}.
\end{align*}
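
\noindent The passage from the first to the last line above uses only the
binomial theorem. Writing $N=n-\frac{c}{12}n$ for the number of unassigned
variables (a shorthand introduced only here),
\[
\left(n^{\alpha}-1\right)^{N-S^{\prime}}=n^{\alpha N}\left(\frac{1}{n^{\alpha}}\right)^{S^{\prime}}\left(1-\frac{1}{n^{\alpha}}\right)^{N-S^{\prime}},\qquad
\sum_{S^{\prime}=0}^{N}\binom{N}{S^{\prime}}\left(\frac{1}{n^{\alpha}}\right)^{S^{\prime}}\left(1-\frac{1}{n^{\alpha}}\right)^{N-S^{\prime}}=1,
\]
so the sum is the total probability of a binomial distribution with
parameters $N$ and $n^{-\alpha}.$ Its mean $Nn^{-\alpha}=O(n^{1-\alpha})=o(n)$
suggests why, for $\alpha<1,$ the typical value of $S^{\prime}/n$ is small and
the $S^{\prime}$-dependent terms inside the bracket can be neglected.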

1021:

\noindent Note that the forced satisfying assignment has no effect on the
structure of the constraint graph. The rest of the proof is identical to that
of Theorem 5, and so we are done. \hfill $\Box $

1025:

As far as we know, the above theorem is the first complexity result for
resolution-based algorithms on forced satisfiable instances, and it provides,
from another angle, strong theoretical support for the method of generating
hard satisfiable instances proposed in the last section.

1031:

1032:

1033: \bigskip

1034:

1035: \noindent {\large {\bf 6. Conclusions}}

1036:

1037: \smallskip

1038:

\noindent In this paper, by encoding CSPs into CNF formulas, we proved
exponential lower bounds for tree-like resolution proofs of two random CSP
models with exact phase transitions, i.e. Model RB/RD. Thus, we not only
introduce new families of CNF formulas that are hard for resolution, which is
a central task of proof complexity theory, but also propose models with both
many hard instances and exact phase transitions. More interestingly, it is
shown both theoretically and experimentally that RB/RD may be applied to
generating hard satisfiable instances, which is further supported by the
exponential lower bounds established in Section 5.

1048:

As mentioned before, some other NP-complete problems also have proved exact
phase transitions, e.g. the Hamiltonian cycle problem and random 2+$p$-SAT
($0<p\leq 0.4$). However, it has been shown, either experimentally or
theoretically, that the instances produced by these problems are generally
easy to solve. So one would naturally ask what the main difference between
these ``easy'' NP-complete problems and RB/RD is. It seems that these ``easy''
NP-complete problems with exact phase transitions usually have some kind of
local property that can be used to design polynomial-time algorithms working
with high probability, and their exact phase transitions are, in fact,
obtained by probabilistic analysis of such algorithms. So it appears that if
the exact phase transitions of a problem are obtained by algorithm analysis,
then the problem is also not hard to solve. For RB/RD, however, the situation
is completely different. More specifically, the exact phase transitions of
RB/RD are obtained not by analysis of algorithms but by the first and second
moment methods, which say nothing about the local properties of the problem
and are, therefore, unlikely to be useful for designing more efficient
algorithms. Thus, it seems that RB/RD, unlike the ``easy'' NP-complete
problems, can indeed provide a reliable source of random benchmark
instances, as many and as hard as we need.

1070:

1071:

Note that, more recently, Frieze and Wormald [15] studied random $k$-SAT for
moderately growing $k,$ i.e. $k=k(n)$ with $k-\log_{2}n\rightarrow\infty,$
where $n$ is the number of variables. For this model, they similarly
established, by the first and second moment methods, that there is a
satisfiability threshold at which the number of clauses is $m=2^{k}n\ln 2.$

1078: %They proved that for this model, a random instance is satisfiable (unsatisfiable)

1079: %with probability tending to 1 as the number of variables $n\rightarrow\infty$

1080: %if the number of clauses $m\leq (1-\epsilon)m_0$

1081: %($m\geq (1-\epsilon)m_0$) where $m_0=2^{k}n\ln2$ and $\epsilon=\epsilon(n)>0$

1082: %satisfying $\epsilon n\rightarrow\infty.$

From the earlier work of Beame et al. on the complexity of unsatisfiability
proofs for random $k$-SAT formulas [6, 7], we know that the size of
resolution refutations for this model is exponential with high probability.
So the variant of random $k$-SAT studied by Frieze and Wormald is also a
model with both proved phase transitions and many hard instances.

1089: %But unlike the phase transitions

1090: %of random $k$-SAT with fixed $k$ such as random 3-SAT, the critical value of

1091: %the ratio of clauses to variables for this variant model is not a

1092: %constant but grows with the number of variables.
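
For orientation, the location $m=2^{k}n\ln 2$ is where the first-moment bound starts to work. Assuming, as in the usual random $k$-SAT model, that each clause is on $k$ distinct variables with independently and uniformly chosen signs, a fixed assignment satisfies a random clause with probability $1-2^{-k}$. Hence, using $1-x\leq e^{-x}$, the expected number $X$ of satisfying assignments satisfies, for any fixed $\epsilon >0$,
\[
\mathbf{E}[X]=2^{n}\bigl(1-2^{-k}\bigr)^{m}\leq \exp \bigl(n\ln 2-m2^{-k}\bigr)\leq e^{-\epsilon n\ln 2}
\quad \text{when }m\geq (1+\epsilon )2^{k}n\ln 2,
\]
so by Markov's inequality such instances are unsatisfiable with probability tending to 1. The matching lower bound, the harder direction, is where the second moment method comes in.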

1093:

1094: To gain a better understanding of Model RB/RD, we now

1095: compare them with the well-studied

1096: %Now, we can also make a comparison between Model RB/RD and the well-studied

1097: random 3-SAT, which has similar proof complexity.

1098: First, we believe that the exact phase transitions are one advantage

1099: of RB/RD, since they

1100: help us locate the hardest instances more

1101: precisely and conveniently when carrying out

1102: large-scale computational experiments. On the theoretical side, it seems

1103: that RB/RD are intrinsically much easier

1104: to analyze mathematically

1105: than random 3-SAT, for example in deriving thresholds.

1106: In our view,

1107: such mathematical tractability is another advantage of RB/RD, making

1108: it possible to obtain interesting results which do not hold, or cannot

1109: easily be obtained, for random 3-SAT, as the results on forced satisfiable

1110: instances show.

1111:

1112: In summary, the Hamiltonian cycle problem, random 3-SAT and Model RB/RD

1113: exhibit, respectively, three different kinds of phase transition behavior in

1114: NP-complete problems. Compared with the first two, which have been

1115: extensively explored over the past decade, the third (i.e. the phase

1116: transition behavior with both exact thresholds and many hard instances)

1117: has, for various reasons, received little attention so far.

1118: From this point of view, the main contribution of this paper lies not

1119: in the mathematical techniques used, nor in the concrete models studied

1120: (although such models are useful for CSP research in their own right), but

1121: in pointing out an interesting behavior worth studying.

1122: Finally, we hope that further investigations, experimental or

1123: theoretical, will be carried out on this behavior, and we believe that

1124: such studies will lead to deep insights and new discoveries in this active

1125: area of research on phase transitions and computational complexity.

1126:

1127: \bigskip

1128:

1129: \noindent {\large {\bf References}}

1130: {\small

1131: %\smallskip

1132:

1133: \begin{enumerate}

1134: \item D. Achlioptas, L.M. Kirousis, E. Kranakis and D. Krizanc. Rigorous

1135: results for random (2+$p$)-SAT. In: {\it Proceedings of RALCOM-97}, pp.1-10.

1136:

1137: \item D. Achlioptas, L.M. Kirousis, E. Kranakis, D. Krizanc, M.S.O. Molloy and Y.C. Stamatiou.

1138: Random Constraint

1139: Satisfaction: A More Accurate Picture. In: {\it Proc. Third International Conference on Principles and

1140: Practice of Constraint Programming} (CP 97), LNCS 1330, pp.107-120, 1997.

1141:

1142: \item D. Achlioptas, C. Gomes, H. Kautz and B. Selman. Generating Satisfiable

1143: Problem Instances. In: {\it Proceedings of AAAI-00}, pp.256-261, 2000.

1144:

1145: \item D. Achlioptas and C. Moore. The Asymptotic Order of the Random $k$-SAT Threshold.

1146: In: {\it Proc. FOCS 2002}, pp.779-788.

1147:

1148: \item D. Achlioptas, P. Beame and M. Molloy. Exponential Bounds for DPLL below the

1149: Satisfiability Threshold. In: {\it Proc. SODA 2004}, to appear.

1150:

1151: \item P. Beame, R. Karp, T. Pitassi, and M. Saks. On the complexity of

1152: unsatisfiability proofs for random $k$-CNF formulas. In: {\it Proceedings of STOC-98}, pp.561-571.

1153:

1154: \item P. Beame, R. Karp, T. Pitassi, and M. Saks. The efficiency of resolution

1155: and Davis-Putnam procedures. {\it SIAM Journal on Computing}, 31(4):1048-1075, 2002.

1156:

1157: \item E. Ben-Sasson and A. Wigderson. Short proofs are narrow - resolution

1158: made simple. {\it Journal of the ACM}, 48(2):149-169, 2001.

1159:

1160: \item B. Bollob\'{a}s, T.I. Fenner and A.M. Frieze. An algorithm for finding

1161: Hamilton paths and cycles in random graphs. {\it Combinatorica}

1162: 7(4):327-341, 1987.

1163:

1164: \item V. Chv\'{a}tal and E. Szemer\'{e}di. Many hard examples for

1165: resolution. {\it Journal of the ACM}, 35(4):759-768, 1988.

1166:

1167: \item V. Chv\'{a}tal and B. Reed. Mick gets some (the odds are on his side).

1168: In: {\it Proceedings of the 33rd IEEE Symp. on Foundations of Computer

1169: Science}, pp.620-627, 1992.

1170:

1171: \item S. Cook and D. Mitchell. Finding Hard Instances of the Satisfiability

1172: Problem: A Survey. In: Du, Gu and Pardalos (Eds.), {\it Satisfiability Problem: Theory and Applications},

1173: DIMACS Series in Discrete Mathematics and

1174: Theoretical Computer Science, Volume 35, 1997.

1175:

1176: \item O. Dubois and J. Mandler. The 3-XORSAT threshold. In: {\it Proc. FOCS 2002}.

1177:

1178: \item E. Friedgut. Sharp thresholds of graph properties, and the $k$-sat

1179: problem. With an appendix by Jean Bourgain. {\it Journal of the American

1180: Mathematical Society}, 12:1017-1054, 1999.

1181:

1182: \item A.M. Frieze and N.C. Wormald. Random $k$-SAT: A tight threshold for moderately

1183: growing $k$. In: {\it Proceedings of the Fifth International Symposium on Theory

1184: and Applications of Satisfiability Testing}, pp.1-6, 2002.

1185:

1186: \item A. Flaxman. A sharp threshold for a random constraint satisfaction problem, preprint.

1187:

1188: \item A. Frieze and M. Molloy. The satisfiability threshold for randomly generated

1189: binary constraint satisfaction problems. In: {\it Proceedings of RANDOM-03}, 2003.

1190:

1191: \item Y. Gao and J. Culberson. Resolution Complexity of Random Constraint Satisfaction

1192: Problems: Another Half of the Story. In: {\it Proc. of LICS-03, Workshop on Typical Case

1193: Complexity and Phase Transitions}, Ottawa, Canada, June, 2003.

1194:

1195: \item I.P. Gent, E. MacIntyre, P. Prosser, B.M. Smith and T. Walsh. Random Constraint

1196: Satisfaction: Flaws and Structures. {\it Constraints}, 6(4):345-372, 2001.

1197:

1198: \item A. Goerdt. A threshold for unsatisfiability. In: {\it 17th

1199: International Symposium on Mathematical Foundations of Computer Science},

1200: Springer LNCS 629, pp.264-275, 1992.

1201:

1202: \item R. Impagliazzo, L. Levin, and M. Luby. Pseudo-random number generation from

1203: one-way functions. In: {\it Proceedings of STOC-89}, pp.12-24.

1204:

1205: \item J. Koml\'{o}s and E. Szemer\'{e}di. Limit distribution for the

1206: existence of a Hamilton cycle in a random graph. {\it Discrete Mathematics},

1207: 43:55-63, 1983.

1208:

1209: \item M. Luby. Pseudorandomness and Cryptographic Applications. Princeton

1210: University Press, 1996.

1211:

1212: \item D. Mitchell. Hard Problems for CSP Algorithms. In: {\it Proceedings of 15th

1213: National Conf. on Artificial Intelligence} (AAAI-98), pp.398-405, 1998.

1214:

1215: \item D. Mitchell, B. Selman, and H. Levesque. Hard and easy distributions

1216: of sat problems. In: {\it Proceedings of 10th National Conf. on Artificial

1217: Intelligence} (AAAI-92), pp.459-465, 1992.

1218:

1219: \item D. Mitchell. Resolution Complexity of Random Constraints. In: {\it Proceedings of CP 2002},

1220: LNCS 2470, pp.295-309.

1221:

1222: \item M. Molloy. Models for Random Constraint Satisfaction Problems,

1223: submitted. Conference version in {\it Proceedings of STOC 2002}.

1224:

1225: \item M. Molloy and M. Salavatipour. The resolution complexity of random

1226: constraint satisfaction problems. In: {\it Proc. FOCS-03}, 2003.

1227:

1228: \item R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman and L. Troyansky.

1229: Determining computational complexity from characteristic phase transitions.

1230: {\it Nature}, 400:133-137, 1999.

1231:

1232: \item R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman and L. Troyansky.

1233: Phase transition and search cost in the 2+$p$-SAT problem. In: {\it 4th

1234: Workshop on Physics and Computation}, Boston University 22-24 November 1996,

1235: (PhysComp96).

1236:

1237: \item B.M. Smith. Constructing an Asymptotic Phase Transition in Random Binary

1238: Constraint Satisfaction Problems. {\it Theoretical Computer Science}, vol. 265,

1239: pp. 265-283 (Special Issue on NP-Hardness and Phase Transitions), 2001.

1240:

1241: \item B. Vandegriend and J. Culberson. The $G_{n,m}$ phase transition is not

1242: hard for the Hamiltonian Cycle problem. {\it Journal of Artificial

1243: Intelligence Research}, 9:219-245, 1998.

1244:

1245: \item K. Xu and W. Li. Exact Phase Transitions in Random Constraint

1246: Satisfaction Problems. {\it Journal of Artificial Intelligence Research},

1247: 12:93-103, 2000.

1248:

1249: \item K. Xu. A Study on the Phase Transitions of SAT and CSP (in Chinese).

1250: Ph.D. Thesis, Beihang University, 2000.

1251:

1252: \item K. Xu and W. Li. On the Average Similarity Degree between Solutions of Random

1253: $k$-SAT and Random CSPs. {\it Discrete Applied Mathematics}, to appear.

1254:

1255: \medskip

1256: \end{enumerate}

1257: }

1258:

1259: \noindent {\large {\bf Appendix}}

1260:

1261: \smallskip

1262:

1263: We now prove Lemma 2 for Model RB. Fix a variable $u$ and an

1264: $i$-constraint assignment tuple $T_{i,u}$. Since each additional constraint can only flaw

1265: more values of $u$, the probability that $u$ is flawed by $T_{i,u}$ does not decrease as the number of

1266: constraints $i$ grows. Thus we have

1267:

1268: \[

1269: \Pr (T_{i,u}\text{ is flawed})|_{i\leq 3k\beta \ln n}

1270: \leq \Pr (T_{i,u}\text{ is flawed})|_{i=3k\beta \ln n}.

1271: \]

1272:

1273: \noindent The variable $u$ has $d=n^{\alpha }$ values in its

1274: domain, denoted by $v_{1},v_{2},\ldots ,v_{d}$. Let $A_{j}$ denote the

1275: event that $v_{j}$ is not flawed by $T_{i,u}$. By inclusion--exclusion, the probability

1276: that at least one value is not flawed by $T_{i,u}$, i.e. the probability

1277: that the variable $u$ is not flawed by $T_{i,u}$, is

1278: \begin{eqnarray*}

1279: \Pr (A_{1}\cup A_{2}\cup \cdots \cup A_{d}) &=&\sum_{1\leq s\leq d}\Pr (A_{s})

1280: -\sum_{1\leq s<t\leq d}\Pr (A_{s}A_{t}) \\

1281: &&+\cdots +(-1)^{d-1}\Pr (A_{1}A_{2}\cdots A_{d}).

1282: \end{eqnarray*}

1283:

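1284: \noindent Since the $d$ values play symmetric roles, $\Pr (A_{l_{1}}A_{l_{2}}\cdots A_{l_{j}})=\Pr (A_{1}A_{2}\cdots A_{j})$ for any $j$ distinct indices $l_{1},\ldots ,l_{j}$, so grouping the $\binom{d}{j}$ terms of order $j$ gives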

1285: \begin{eqnarray*}

1286: \Pr (T_{i,u}\text{ is flawed})

1287: &=&1-\Pr (A_{1}\cup A_{2}\cup \cdots \cup A_{d}) \\

1288: &=&1+\sum_{j=1}^{d}(-1)^{j}\binom{d}{j}

1289: \Pr (A_{1}A_{2}\cdots A_{j}).

1290: \end{eqnarray*}

1291:

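1292: \noindent Recall that in Model RB, for each constraint, we select uniformly at random

1293: without repetition $pd^{k}$ incompatible tuples of values, and each

1294: constraint is generated independently. The event $A_{1}A_{2}\cdots A_{j}$ occurs exactly when, in each of the $i$ constraints, none of the $j$ tuples corresponding to $v_{1},\ldots ,v_{j}$ is selected as incompatible. So we have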

1295: \begin{eqnarray*}

1296: \Pr (A_{1}A_{2}\cdots A_{j})

1297: &=&\left[ \frac{\binom{d^{k}-j}{pd^{k}}}{\binom{d^{k}}{pd^{k}}}\right] ^{i} \\

1298: &=&\left[ \frac{(d^{k}-pd^{k})(d^{k}-pd^{k}-1)\cdots (d^{k}-pd^{k}-j+1)}

1299: {d^{k}(d^{k}-1)\cdots (d^{k}-j+1)}\right] ^{i}.

1300: \end{eqnarray*}
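
\noindent As a check, for $j=1$ this reduces to the probability that a single value is not flawed:
\[
\Pr (A_{1})=\left[ \frac{\binom{d^{k}-1}{pd^{k}}}{\binom{d^{k}}{pd^{k}}}\right] ^{i}=\left[ \frac{d^{k}-pd^{k}}{d^{k}}\right] ^{i}=(1-p)^{i},
\]
i.e. each of the $i$ independent constraints leaves a given value unflawed with probability $1-p$.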

1301:

1302: \noindent Note that $j\leq d=n^{\alpha }$ and $k\geq 2$. Now consider the

1303: case $i=3k\beta \ln n$, where $\beta =\frac{\alpha }{6k\ln \frac{1}{1-p}}$.

1304: By asymptotic analysis, we have

1305: \begin{eqnarray*}

1306: &&\Pr (A_{1}A_{2}\cdots A_{j})|_{i=3k\beta \ln n} \\

1307: &=&\Bigl[(1-p)\Bigl(\frac{1-p-\frac{1}{n^{k\alpha }}}{1-\frac{1}{n^{k\alpha }}}\Bigr)

1308: \Bigl(\frac{1-p-\frac{2}{n^{k\alpha }}}{1-\frac{2}{n^{k\alpha }}}\Bigr)\cdots

1309: \Bigl(\frac{1-p-\frac{j-1}{n^{k\alpha }}}{1-\frac{j-1}{n^{k\alpha }}}\Bigr)\Bigr]^{3k\beta \ln n} \\

1310: &=&\bigl[(1-p)^{3k\beta \ln n}\bigr]^{j}\Bigl[1-\frac{p}{1-p}\frac{(j-1)j}{2n^{k\alpha }}

1311: +O\Bigl(\frac{j^{4}}{n^{2k\alpha }}\Bigr)\Bigr]^{3k\beta \ln n} \\

1312: &=&\bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}\Bigl[1-\frac{p}{1-p}\frac{(j-1)j}{2n^{k\alpha }}

1313: +O\Bigl(\frac{j^{4}}{n^{2k\alpha }}\Bigr)\Bigr]^{3k\beta \ln n}.

1314: \end{eqnarray*}
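
\noindent The two steps in this display can be checked directly. Writing $N=d^{k}=n^{k\alpha }$, the factor with index $t$ ($0\leq t\leq j-1$) expands as
\[
\frac{1-p-\frac{t}{N}}{1-\frac{t}{N}}=(1-p)\Bigl[1-\frac{p}{1-p}\frac{t}{N}+O\Bigl(\frac{t^{2}}{N^{2}}\Bigr)\Bigr],
\]
and multiplying these $j$ factors, with $\sum_{t=0}^{j-1}t=\frac{(j-1)j}{2}$, gives the bracket. For the prefactor, the choice of $\beta$ gives $3k\beta \ln \frac{1}{1-p}=\frac{\alpha }{2}$, so
\[
(1-p)^{3k\beta \ln n}=\exp \Bigl(-3k\beta \ln \frac{1}{1-p}\cdot \ln n\Bigr)=\exp \Bigl(-\frac{\alpha }{2}\ln n\Bigr)=n^{-\frac{\alpha }{2}}.
\]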

1315:

1316: \noindent Let $H(j)=\bigl[1-\frac{p}{1-p}\frac{(j-1)j}{2n^{k\alpha }}+O(\frac{j^{4}}{n^{2k\alpha }})\bigr]^{3k\beta \ln n}$.

1317: Then we get

1318: \begin{eqnarray*}

1319: \Pr (T_{i,u}\text{ is flawed})|_{i=3k\beta \ln n}

1320: &=&1+\sum_{j=1}^{n^{\alpha }}(-1)^{j}\binom{n^{\alpha }}{j}

1321: \Pr (A_{1}A_{2}\cdots A_{j})|_{i=3k\beta \ln n} \\

1322: &=&1+\sum_{j=1}^{n^{\alpha }}(-1)^{j}\binom{n^{\alpha }}{j}

1323: \bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}H(j).

1324: \end{eqnarray*}

1325:

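1326: \noindent For $0\leq j\leq n^{\frac{4}{5}\alpha }$ we can easily show that $H(j)=1+o(1)$: since $k\geq 2$, we have $\frac{j^{2}}{n^{k\alpha }}\leq n^{-\frac{2}{5}\alpha }$, so $\ln H(j)=O(n^{-\frac{2}{5}\alpha }\ln n)=o(1)$.

1327: Therefore,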

1328: \begin{eqnarray*}

1329: &&\Pr (T_{i,u}\text{ is flawed})|_{i=3k\beta \ln n} \\

1330: &\approx &1+\sum_{j=1}^{n^{\alpha }}(-1)^{j}\binom{n^{\alpha }}{j}

1331: \bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}

1332: +\sum_{j=n^{\frac{4}{5}\alpha }}^{n^{\alpha }}(-1)^{j}\binom{n^{\alpha }}{j}

1333: \bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}(H(j)-1) \\

1334: &=&\Bigl(1-\frac{1}{n^{\frac{\alpha }{2}}}\Bigr)^{n^{\alpha }}

1335: +\sum_{j=n^{\frac{4}{5}\alpha }}^{n^{\alpha }}(-1)^{j}\binom{n^{\alpha }}{j}

1336: \bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}(H(j)-1) \\

1337: &\approx &e^{-n^{\frac{\alpha }{2}}}

1338: +\sum_{j=n^{\frac{4}{5}\alpha }}^{n^{\alpha }}(-1)^{j}\binom{n^{\alpha }}{j}

1339: \bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}(H(j)-1).

1340: \end{eqnarray*}
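
\noindent The main term can also be bounded exactly: since $1-x\leq e^{-x}$,
\[
\Bigl(1-\frac{1}{n^{\frac{\alpha }{2}}}\Bigr)^{n^{\alpha }}\leq e^{-n^{\frac{\alpha }{2}}}.
\]
More precisely, with $a=n^{\frac{\alpha }{2}}$ we have $a^{2}\ln (1-\frac{1}{a})=-a-\frac{1}{2}+O(\frac{1}{a})$, so the main term equals $e^{-\frac{1}{2}+o(1)}e^{-n^{\frac{\alpha }{2}}}$; the second $\approx$ above thus holds up to a constant factor, while the exact inequality suffices for an upper bound.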

1341:

1342: \noindent It is easy to verify that

1343:

1344: \[

1345: \binom{n^{\alpha }}{j}\bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}\leq \Bigl(\frac{en^{\alpha }}{j}\Bigr)^{j}\bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}

1346: =e^{j-j\ln j+\frac{\alpha }{2}j\ln n}.

1347: \]
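
\noindent Here we used $\binom{d}{j}\leq (\frac{ed}{j})^{j}$ with $d=n^{\alpha }$; taking logarithms,
\[
j\ln \frac{en^{\alpha }}{j}-\frac{\alpha }{2}j\ln n=j+\alpha j\ln n-j\ln j-\frac{\alpha }{2}j\ln n=j-j\ln j+\frac{\alpha }{2}j\ln n.
\]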

1348:

1349: \noindent Let $B(j)=j-j\ln j+\frac{\alpha }{2}j\ln n.$ Differentiating $B(j)$

1350: with respect to $j,$ we obtain

1351:

1352: \[

1353: B^{\prime }(j)=1-(\ln j+1)+\frac{\alpha }{2}\ln n=\frac{\alpha }{2}\ln n-\ln j<0\quad \text{when }j>n^{\frac{\alpha }{2}},

1354: \text{ in particular when }j\geq n^{\frac{4}{5}\alpha }.

1355: \]

1356:

1357: \noindent Hence $B$ is decreasing for $j\geq n^{\frac{4}{5}\alpha }$, and so for $n^{\frac{4}{5}\alpha }\leq j\leq n^{\alpha }$ we have

1358:

1359: \[

1360: \binom{n^{\alpha }}{j}\bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}\leq e^{B(n^{\frac{4}{5}\alpha })}

1361: =\Bigl(\frac{e}{n^{\frac{3}{10}\alpha }}\Bigr)^{n^{\frac{4}{5}\alpha }}

1362: =o\bigl(e^{-n^{\frac{4}{5}\alpha }}\bigr).

1363: \]
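
\noindent Explicitly, with $j_{0}=n^{\frac{4}{5}\alpha }$,
\[
B(j_{0})=j_{0}\Bigl(1-\frac{4\alpha }{5}\ln n+\frac{\alpha }{2}\ln n\Bigr)=j_{0}\Bigl(1-\frac{3\alpha }{10}\ln n\Bigr),
\qquad
\frac{e^{B(j_{0})}}{e^{-j_{0}}}=\Bigl(\frac{e^{2}}{n^{\frac{3}{10}\alpha }}\Bigr)^{j_{0}}\rightarrow 0,
\]
which is the $o(e^{-n^{\frac{4}{5}\alpha }})$ estimate.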

1364:

1365: \noindent Note that $H(j)=O(n^{c_{2}})$ for $n^{\frac{4}{5}\alpha }\leq j\leq n^{\alpha }$,

1366: where $c_{2}>0$ is a constant. Hence,

1367: \begin{eqnarray*}

1368: \Bigl|\sum_{j=n^{\frac{4}{5}\alpha }}^{n^{\alpha }}(-1)^{j}\binom{n^{\alpha }}{j}

1369: \bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}(H(j)-1)\Bigr|

1370: &\leq &\sum_{j=n^{\frac{4}{5}\alpha }}^{n^{\alpha }}\binom{n^{\alpha }}{j}

1371: \bigl(n^{-\frac{\alpha }{2}}\bigr)^{j}|H(j)-1| \\

1372: &=&O(n^{\alpha })\,O(n^{c_{2}})\,o\bigl(e^{-n^{\frac{4}{5}\alpha }}\bigr)

1373: =o\bigl(e^{-n^{\frac{\alpha }{2}}}\bigr).

1374: \end{eqnarray*}
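
\noindent The last equality compares exponents: since $\frac{4}{5}\alpha >\frac{\alpha }{2}$,
\[
\frac{n^{\alpha }n^{c_{2}}e^{-n^{\frac{4}{5}\alpha }}}{e^{-n^{\frac{\alpha }{2}}}}=\exp \Bigl(-n^{\frac{4}{5}\alpha }+n^{\frac{\alpha }{2}}+(\alpha +c_{2})\ln n\Bigr)\rightarrow 0.
\]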

1375:

1376: \noindent Thus we get

1377:

1378: \[

1379: \Pr (T_{i,u}\text{ is flawed})|_{i\leq 3k\beta \ln n}\leq \Pr (T_{i,u}\text{ is flawed})|_{i=3k\beta \ln n}

1380: \approx e^{-n^{\frac{\alpha }{2}}}.

1381: \]

1382:

1383: \noindent The remaining part of the proof is identical to that of Lemma 2

1384: for Model RD, and so we are done.

1385:

1386: \end{document}

1387: