0108:cs0108016/paper.tex

1: \documentclass[10pt]{article}

2: \usepackage{xspace}

3: \usepackage{amssymb}

4: \usepackage{subfigure}

5: \usepackage{graphicx}

6:

7: \title{Verifying Sequential Consistency on Shared-Memory Multiprocessors by Model Checking}

8: \author{

9: Shaz Qadeer \\

10: Compaq Systems Research Center \\

11: 130 Lytton Ave \\

12: Palo Alto, CA 94301 \\

13: {\tt shaz.qadeer@compaq.com}

14: }

15: \date{}

16:

17: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

18:

19: \begin{document}

20: \maketitle

21:

22: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

23: \newcommand{\fact}[1]{\ensuremath{{#1}!}}

24: %\newcommand{\msystemt}{\ensuremath{\overline{\msystem}}}

25: %\newcommand{\msystemut}{\ensuremath{\overline{\msystemu}}}

26: \newcommand{\lang}[1]{\ensuremath{\overline{#1}}}

27: \newcommand{\cons}{\mathit{Constrain}}

28: \newcommand{\checker}{\mathit{Check}}

29: \newcommand{\upto}{\mathit{upto}}

30: \newcommand{\trace}[1]{\ensuremath{\overline{#1}}}

31: \newcommand{\store}{{\it st}}

32: \newcommand{\zug}[1]{\mbox {\ensuremath{\langle #1 \rangle }}}

33: \newcommand{\set}[1]{\ensuremath{\{#1\}}}

34: \newcommand{\ordera}{\ensuremath{<}}

35: \newcommand{\orderb}{\ensuremath{\prec}}

36: \newcommand{\minimum}[1]{\mbox {\ensuremath{min({#1})}} }

37: \newcommand{\maximum}[1]{\mbox {\ensuremath{max({#1})}} }

38: \newcommand{\msystem}{S}

39: %\newcommand{\msystemord}{\ensuremath{S^o}}

40: %\newcommand{\msystemu}{\ensuremath{S^u}}

41: %\newcommand{\mmodel}{M}

42: \newcommand{\op}{{\it op}}

43: \newcommand{\proc}{{\it proc}}

44: \newcommand{\loc}{{\it loc}}

45: \newcommand{\data}{{\it data}}

46: \newcommand{\proj}{\ensuremath{\rho}}

47: %\newcommand{\natzero}{{\mathbb N}}

48: %\newcommand{\nat}{\natzero^+}

49: \newcommand{\nat}{{\mathbb N}}

50: \newcommand{\natzero}{{\mathbb W}}

51: \newcommand{\dataset}{\nat}

52: \newcommand{\memevent}{{\it E}}

53: \newcommand{\intevent}{{\it F}}

54: \newcommand{\rdevent}{\ensuremath{\memevent^r}}

55: \newcommand{\wrtevent}{\ensuremath{\memevent^w}}

56: \newcommand{\event}{\ensuremath{E^a}}

57: \newcommand{\rd}{{\it R}}

58: \newcommand{\wrt}{{\it W}}

59: \newcommand{\mb}{{\it MB}}

60: \newcommand{\witness}{\ensuremath{\Omega}}

61: \newcommand{\procorder}[3]{\ensuremath{{#1}({#2},{#3})}}

62: \newcommand{\locorder}[3]{\ensuremath{{#1}({#2},{#3})}}

63: \newcommand{\expand}[1]{\ensuremath{{#1}^e}}

64: \newcommand{\domain}{{\it dom}}

65: \newcommand{\dreduce}{\ensuremath{\Gamma}}

66: \newcommand{\order}{{\it order}}

67: \newcommand{\compose}{\ensuremath{\circ}}

68: \newcommand{\err}{{\it err}}

69: %\newcommand{\msc}{\ensuremath{\mmodel^{sc}}}

70: \newcommand{\msc}{\ensuremath{M}}

71: \newcommand{\malpha}{\ensuremath{\mmodel^{\alpha}}}

72: \newcommand{\StatesSC}{\ensuremath{\Sigma^{sc}}}

73: \newcommand{\state}[2]{\ensuremath{[\![#1]\!](#2)}}

74:

75: \newcommand{\LET}{{\it let}\xspace}

76: \newcommand{\IN}{{\it in}\xspace}

77: \newcommand{\IF}{{\it if}\xspace}

78: \newcommand{\OF}{{\it of}\xspace}

79: \newcommand{\ELSE}{{\it else}\xspace}

80: \newcommand{\THEN}{{\it then}\xspace}

81: \newcommand{\FORALL}{{\it forall}\xspace}

82: \newcommand{\typedef}{{\it typedef}\xspace}

83: \newcommand{\Msg}{{\it Msg}\xspace}

84: \newcommand{\Entry}{{\it CacheEntry}\xspace}

85: \newcommand{\Queue}{{\it Queue}\xspace}

86: \newcommand{\record}{{\it record}\xspace}

87: \newcommand{\ARRAY}{{\it array}\xspace}

88: \newcommand{\Proc}{{\it Proc}\xspace}

89: \newcommand{\Addr}{{\it Addr}\xspace}

90: \newcommand{\Data}{{\it Data}\xspace}

91: \newcommand{\Index}{{\it Index}\xspace}

92: \newcommand{\INVAL}{{\it INVAL}\xspace}

93: \newcommand{\SHDACK}{{\it ACKS}\xspace}

94: \newcommand{\EXCACK}{{\it ACKX}\xspace}

95: \newcommand{\INV}{{\it INV}\xspace}

96: \newcommand{\SHD}{{\it SHD}\xspace}

97: \newcommand{\EXC}{{\it EXC}\xspace}

98: \newcommand{\head}{{\it head}\xspace}

99: \newcommand{\tail}{{\it tail}\xspace}

100: \newcommand{\cache}{{\it cache}\xspace}

101: \newcommand{\inQ}{{\it inQ}\xspace}

102: \newcommand{\owner}{{\it owner}\xspace}

103: \newcommand{\RD}{{\it RD}\xspace}

104: \newcommand{\WR}{{\it WR}\xspace}

105: \newcommand{\EXCRSP}{{\it ACKX}\xspace}

106: \newcommand{\SHDRSP}{{\it ACKS}\xspace}

107: \newcommand{\UPDATE}{{\it UPD}\xspace}

108: \newcommand{\full}{{\it full}\xspace}

109: \newcommand{\EMPTY}{{\it isEmpty}\xspace}

110: \newcommand{\APPEND}{{\it append}\xspace}

111: \newcommand{\HEAD}{{\it head}\xspace}

112: \newcommand{\TAIL}{{\it tail}\xspace}

113: \newcommand{\msg}{{\it msg}\xspace}

114:

115: \newtheorem{definition}{Definition}

116: \newtheorem{theorem}{Theorem}[section]

117: \newtheorem{lemma}[theorem]{Lemma}

118: \newtheorem{claim}[theorem]{Claim}

119: \newtheorem{corollary}[theorem]{Corollary}

120: \newtheorem{proposition}[theorem]{Proposition}

121: \newtheorem{assumption}{Assumption}

122: \newtheorem{requirement}{Requirement}

123: \def\EndProof{\hfill \quad \vrule width 1ex height 1ex depth 0pt  }

124: \newenvironment{proof}{\noindent{\bf Proof}:}{\EndProof}

125: \newenvironment{example}{{\bf Example}.}{\EndProof}

126:

127: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

128:

129: \begin{abstract}

130:

131: The memory model of a shared-memory multiprocessor is a contract

132: between the designer and programmer of the multiprocessor.

133: The sequential consistency memory model specifies a total order

134: among the memory (read and write) events performed at each processor.

135: A trace of a memory system satisfies sequential consistency if there exists a

136: total order of all memory

137: events in the trace that is both consistent with the total order at each processor

138: and has the property that every read event to a location

139: returns the value of the last write to that location.

140:

141: Descriptions of shared-memory systems are typically parameterized by the

142: number of processors, the number of memory locations, and the number

143: of data values.

144: %Sequential consistency is a well-known memory model.

145: It has been shown that even for finite

146: parameter values, verifying sequential consistency on general shared-memory

147: systems is undecidable.

148: We observe that, in practice, shared-memory systems satisfy the properties of

149: causality and data independence.

150: Causality is the property that values of read events flow from values

151: of write events.

152: Data independence is the property that all traces can be generated by

153: renaming data values from traces where the written values are distinct

154: from each other.

155: If a causal and data independent system also has the property that

156: the logical order of write events to each location

157: is identical to their temporal order, then sequential consistency can be

158: verified algorithmically.

159: Specifically, we present a model checking algorithm to verify sequential consistency

160: on such systems

161: for a finite number of processors and memory locations and an arbitrary

162: number of data values.

163: \end{abstract}

164:

165: \section{Introduction}

166: Shared-memory multiprocessors are very complex computer systems.

167: Multithreaded programs running on shared-memory

168: multiprocessors use an abstract view of the shared memory that is specified

169: by a memory model.

170: Examples of memory models for multiprocessors include

171: sequential consistency \cite{Lamport79}, partial store ordering

172: \cite{sparc-manual}, and the Alpha memory model \cite{alpha-manual}.

173: The implementation of the memory model, achieved by a protocol running

174: either in hardware or software, is one of the most complex aspects of

175: multiprocessor design.

176: These protocols are commonly referred to as cache-coherence protocols.

177: Since parallel programs running on such systems rely on the memory

178: model for their correctness, it is important to implement the

179: protocols correctly.

180: However, since efficiency is important for the commercial viability

181: of these systems,

182: the protocols are heavily optimized, making them prone to design errors.

183: Formal verification of cache-coherence protocols can detect these

184: errors effectively.

185:

186: Descriptions of cache-coherence protocols are typically parameterized

187: by the number of processors, the number of memory locations, and the

188: number of data values.

189: Verifying parameterized systems for arbitrary values of

190: these parameters is undecidable for nontrivial systems.

191: Interactive theorem proving is one approach to parameterized

192: verification.

193: This approach is not automated and is typically expensive in terms of

194: the required human effort.

195: % , although automatic methods (not

196: % guaranteed to terminate) have received attention in the

197: % research community in the past few years.

198: Another approach is to model check a parameterized system for small

199: values of the parameters.

200: This is a good {\em debugging\/} technique that can find a number of

201: errors prior to the more time-consuming effort of verification for

202: arbitrary parameter values.

203: In this paper, we present an automatic method based on model checking to

204: verify that a cache-coherence protocol with fixed parameter values

205: is correct with respect to the sequential consistency memory model.

206:

207: The sequential consistency memory model \cite{Lamport79} specifies a

208: {\em total order\/} among

209: the memory events (reads and writes) performed locally at each

210: processor.

211: This total order at a processor is the order in which memory events

212: occur at that processor.

213: A trace of a memory system satisfies sequential consistency

214: if there exists a total order of all memory events that

215: is both consistent with the local total order at each processor,

216: and has the property that every read to a location returns the latest

217: (according to the total order) value written to that location.

218: Surprisingly, verifying sequential consistency, even for fixed

219: parameter values, is undecidable \cite{AMP96b}.

220: Intuitively, this is because the witness total order

221: could be quite different from the global temporal order of events

222: for some systems.

223: An event might need to be logically ordered after an event that

224: occurs much later in a run.

225: Hence any algorithm needs to keep track of a potentially unbounded

226: history of a run.

227:

228: In this paper, we consider the problem of verifying that a

229: shared-memory system $\msystem(n,m,v)$ with $n$ processors, $m$

230: locations and $v$ data values is sequentially consistent.

231: We present a method that can check sequential consistency for any

232: fixed $n$ and $m$ and for arbitrary $v$.

233: The correctness of our method depends on two assumptions

234: ---causality and data independence.

235: The property of {\em causality\/} arises from the observation that

236: protocols do not conjure up data values; data is injected into the

237: system by the initial values stored in the memory and by the writes

238: performed by the processors.

239: Therefore every read operation $r$ to location $l$ is associated

240: with either the initial value of $l$ or some

241: write operation $w$ to $l$ that wrote the value read by $r$.

242: The property of {\em data independence\/} arises from the observation that

243: protocols do not examine data values; they just forward the

244: data from one component of the system (cache or memory) to another.

245: Since protocol behavior is not affected by the data values, we can

246: restrict our attention, without loss of generality, to unambiguous

247: runs in which the

248: written data values to a location are distinct from each other and from

249: the initial value.

250: We have observed that these two assumptions are true of shared-memory

251: systems that occur in practice \cite{LLG90,KOH94,BDH99,BGM00}.

252:

253: For a causal and unambiguous run, we can

254: deduce the association between a read and the associated write just

255: by looking at their data values.

256: This leads to a vast simplification in the task of specifying the

257: witness total order for sequential consistency.

258: It suffices to specify for each location, a total order on the writes

259: to that location.

260: By virtue of the association of write events and read events, the

261: total order on the write events can be extended to a partial order

262: on all memory events (both reads and writes) to that location.

263: If a read event~$r$ reads the value written by the write event~$w$,

264: the partial order puts~$r$ {\em after\/}~$w$ and all write events

265: preceding~$w$, and {\em before\/} all write events succeeding~$w$.

266: As described before, sequential consistency specifies

267: a total order on the memory events for each processor.

268: Thus, there are $n$ total orders, one for each processor,

269: and $m$ partial orders, one for each location, imposed on the graph of

270: memory events of a run.

271: A necessary and sufficient condition for the run to be

272: sequentially consistent is that this graph is acyclic.

273: % In other words, there exists a witness total order if and only if

274: % the graph imposed on the set of all memory events by these orders is

275: % acyclic.

276: We further show that existence of a cycle in this graph implies the existence of a

277: {\em nice\/} cycle in which no two processor edges (imposed by the memory model)

278: are for the same processor

279: and no two location edges (imposed by the write order) are for the same location.

280: This implies that a nice cycle can have at most $2 \times \minimum{\{n,m\}}$ edges;

281: we call a nice cycle with $2 \times k$ edges a $k$-nice cycle.

282: %that is more structured and hence easier to detect.

283: Further if the memory system is symmetric with respect to processor

284: and location ids, then processor and location edges occur in a certain

285: canonical order in the nice cycle.

286: These two observations drastically reduce the number of cycles for any search.

287: % The number of nice cycles is proportional to the product of the number

288: % of permutations

289: % of the processors and the number of permutations of memory locations.

290: % We use the symmetric nature of the memory system and the memory model

291: % with respect to

292: % processor and location indices to reduce the set of possible nice

293: % cycles to a single

294: % {\em canonical\/} nice cycle.

295:

296: We finally argue that a number of causal and data independent shared-memory systems

297: occurring in practice also have the property that the witness write order at each

298: location is simply the temporal order of the write events.

299: In other words, a write event $w$ is ordered before $w'$ if $w$ occurs

300: before $w'$.

301: We call this a simple write order, and it is in fact

302: the correct witness for a number of shared-memory systems.

303: For cache-based shared-memory systems, the intuitive explanation is

304: that at any time there is at most one cache with

305: write privilege to a location.

306: The write privilege moves from one cache to another with time.

307: Hence, the logical timestamps \cite{Lamport78} of the writes to a

308: location order them exactly according to their global temporal order.

309: We show that the proof that a simple write order is a correct witness for a

310: memory system can be performed by model checking

311: \cite{ClarkeEmerson81, QueilleSifakis81}.

312: Specifically, the proof for the memory system $\msystem(n,m,v)$ for

313: fixed $n$ and $m$ and arbitrary $v$ is broken into

314: $\minimum{\{n,m\}}$ model checking lemmas, where the $k$-th lemma checks for the

315: existence of canonical $k$-nice cycles.

316:

317: % The results in this paper depend on a set of assumptions on the memory

318: % model and the shared-memory system.

319: % We state the assumptions as they are needed for a result.

320: % Hence they are distributed over the various sections of the paper.

321: % In Section~\ref{sec:discussion}, we discuss our assumptions and the

322: % derived results.

323:

324: The rest of the paper is organized as follows.

325: Sections~\ref{sec:msystem} and~\ref{sec:causal-data-ind} formalize

326: shared-memory systems and our assumptions of causality and data

327: independence about them.

328: Section~\ref{sec:mmodel} defines the sequential consistency memory

329: model.

330: Section~\ref{sec:witness} defines the notions of a witness and

331: a constraint graph for an unambiguous and causal run.

332: Section~\ref{sec:cycle} and~~\ref{sec:symmetry} show that it is

333: sufficient to search for canonical nice cycles in the constraint graph.

334: Section~\ref{sec:model-check} shows how to use model checking to

335: detect canonical nice cycles in the constraint graphs of the runs of a

336: memory system.

337: Finally, we discuss related work in Section~\ref{sec:rel-work} and

338: conclude in Section~\ref{sec:conclusions}.

339:

340: \section{Shared-memory systems}

341: \label{sec:msystem}

342:

343: \begin{figure}

344: \begin{tabbing}

345: $\zug{\SHDRSP,i,j}~~$ \= xx \= xx \= xx \= xx \= xx \= xx \= xx \= \kill

346: $\typedef\ \Msg\ \{m : \{\SHDACK, \EXCACK\}, a : \nat_m, d : \natzero_v\} \cup \{m : \{\INVAL\}, a : \nat_m\};$ \\

347: $\typedef\ \Entry\ \{d : \natzero_v, s : \{\INV, \SHD, \EXC\}\};$ \\

348: %$\typedef\ \Queue\ \{q : \ARRAY\ \Index\ \OF\ \Msg, h : \Index, t : \Index\};$ \\

349: $\cache: \ARRAY\ \nat_n\ \OF\ \ARRAY\ \nat_m\ \OF\ \Entry;$ \\

350: $\inQ: \ARRAY\ \nat_n\ \OF\ \Queue(\Msg);$ \\

351: $\owner: \ARRAY\ \nat_m\ \OF\ \natzero_n;$ \\

352: \\

353: Initial predicate\\

354: $\forall i \in \nat_n, j \in \nat_m: (cache[i][j] = \zug{0,\SHD} \wedge

355: \inQ[i].\EMPTY \wedge owner[j] \neq 0)$ \\

356: Events\\

357: $\zug{\rd,i,j,k}$ \> $\cache[i][j].s \neq \INV \wedge \cache[i][j].d = k

358: \rightarrow$ \\

359: $\zug{\wrt,i,j,k}$ \> $\cache[i][j].s = \EXC \rightarrow$ \\

360: \>\> $\cache[i][j].d := k$ \\

361: $\zug{\EXCRSP,i,j}$ \> $\cache[i][j].s \neq \EXC \wedge \owner[j] \neq 0 \rightarrow$ \\

362: %                  \> $\forall p \in \nat_n: (p \neq \owner[j] \wedge \cache[p][j].s \neq \INV

363: %                                          \Rightarrow \neg\inQ[p].\full) \rightarrow$ \\

364: \>\>		   $\IF\ owner[j] \neq i\ \THEN\ \cache[\owner[j]][j].s := \INV;$ \\

365: \>\>	           $\owner[j] := 0;$ \\

366: \>\>		   for each $(p \in \nat_n)$ \\

367: \>\>\>	           $\IF\ (p = i)\ \THEN$ \\

368: \>\>\>\>              $\inQ[p] := \APPEND(\inQ[p],\zug{\EXCACK, j, \cache[\owner[j]][j].d})$ \\

369: \>\>\>		   $\ELSE\ \IF\ (p \neq \owner[j] \wedge

370: \cache[p][j].s \neq \INV)\ \THEN$ \\

371: \>\>\>\>	 	   $\inQ[p] := \APPEND(\inQ[p],\zug{\INVAL, j})$ \\

372: $\zug{\SHDRSP,i,j}$ \> $\cache[i][j].s = INV \wedge \owner[j] \neq 0 \rightarrow$ \\

373: \>\>			$\cache[\owner[j]][j].s := \SHD;$ \\

374: \>\>                    $owner[j] := 0;$ \\

375: \>\>			$\inQ[i] := \APPEND(inQ[i],\zug{\SHDACK, j,

376: \cache[\owner[j]][j].d});$ \\

377: $\zug{\UPDATE,i}$   \> $\neg \EMPTY(\inQ[i]) \rightarrow$ \\

378: \>\>			$\LET\ msg = \HEAD(\inQ[i])\ \IN$ \\

379: \>\>\>   	 	  $\IF\ (\msg.m = \INVAL)\ \THEN$ \\

380: \>\>\>\>                       $\cache[i][\msg.a].s := \INV$ \\

381: \>\>\>			  $\ELSE\ \IF\ (\msg.m = \SHDACK)\ \THEN\ \{$ \\

382: \>\>\>\>                       $\cache[i][\msg.a] := \zug{\SHD,\msg.d};$ \\

383: \>\>\>\>                       $\owner[\msg.a] := i$ \\

384: \>\>\>			  $\}\ \ELSE\ \{$ \\

385: \>\>\>\>                       $\cache[i][\msg.a] := \zug{\EXC,\msg.d};$ \\

386: \>\>\>\>                       $\owner[\msg.a] := i$ \\

387: \>\>\>                    $\}$ \\

388: \>\>                   $\inQ[i] := \TAIL(inQ[i])$

389: \end{tabbing}

390: \caption{Example of memory system}

391: \label{fig:example}

392: \end{figure}

393:

394: Let $\nat$ denote the set of positive integers and $\natzero$

395: denote the set of non-negative integers.

396: For any $n \geq 1$, let $\nat_n$ denote the set of positive integers

397: up to $n$ and $\natzero_n$ denote the set of non-negative integers up

398: to $n$.

399:

400: A memory system is parameterized by the number of processors, the

401: number of memory locations, and the number of data values.

402: The value $0$ is used to model the initial value of all memory

403: locations.

404: In a memory system with $n$ processors, $m$ memory locations, and $v$

405: data values, read and write events denoted by $\rd$ and $\wrt$ can

406: occur at any processor in $\nat_n$, to any

407: location in $\nat_m$, and have any data value in $\natzero_v$ (the

408: data values in $\nat_v$ together with the initial value $0$).

409: Formally, we define the following sets of events parameterized by the

410: number of processors $n$, the number of locations $m$, and the number

411: of data values $v$, where $n,m,v \geq 1$.

412: \begin{enumerate}

413: \item

414: $\rdevent(n,m,v) = \{\rd\} \times \nat_n \times \nat_m \times

415: \natzero_v$ is the set of {\em read events\/}.

416: \item

417: $\wrtevent(n,m,v) = \{\wrt\} \times \nat_n \times \nat_m \times

418: \natzero_v$ is the set of {\em write events\/}.

419: \item

420: $\memevent(n,m,v) = \rdevent(n,m,v) \cup \wrtevent(n,m,v)$ is the

421: set of {\em memory events\/}.

422: \item

423: $\event(n,m,v) \supseteq \memevent(n,m,v)$ is the set of

424: {\em all events\/}.

425: \item

426: $\event(n,m,v) \setminus \memevent(n,m,v)$ is the set of

427: {\em internal events\/}.

428: \end{enumerate}

429: The set of all finite sequences of events in $\event(n,m,v)$ is

430: denoted by $\event(n,m,v)^*$.

431: A {\em memory system\/} $\msystem(n,m,v)$ is a regular subset of

432: $\event(n,m,v)^*$.

433: A sequence $\sigma \in \msystem(n,m,v)$ is said to be a {\em run\/}.

434: We denote by $\msystem(n,m)$ the union $\bigcup_{v \geq 1} \msystem(n,m,v)$.

435:

436: Consider any $\sigma \in \event(n,m,v)^*$.

437: We denote the length of $\sigma$ by $|\sigma|$ and write $\sigma(i)$ for

438: the $i$-th element of the sequence.

439: The set of indices of the memory events in $\sigma$ is denoted by

440: $\domain(\sigma) = \{1 \leq k \leq |\sigma|~|~\sigma(k) \in

441: \memevent(n,m,v)\}$.

442: For every memory event $e = \zug{a,b,c,d} \in \memevent(n,m,v)$, we define

443: $\op(e) = a$, $\proc(e) = b$, $\loc(e) = c$, and $\data(e) = d$.

444: The set of memory events by processor $i$

445: for all~$1 \leq i \leq n$ is denoted by

446: $P(\sigma,i) = \{k \in \domain(\sigma)~ |~ \proc(\sigma(k)) = i\}$.

447: The set of memory events to location $i$

448: for all~$1 \leq i \leq m$ is denoted by

449: $L(\sigma,i) = \{k \in \domain(\sigma)~ |~ \loc(\sigma(k)) = i\}$.

450: For all~$1 \leq i \leq m$,

451: the set of write events to location $i$ is denoted by

452: $L^w(\sigma,i) = \{k \in L(\sigma,i)~ |~ \op(\sigma(k)) = \wrt\}$, and

453: the set of read events to location $i$ is denoted by

454: $L^r(\sigma,i) = \{k \in L(\sigma,i)~ |~ \op(\sigma(k)) = \rd\}$.

455:

456: The subsequence obtained by projecting $\sigma$ onto $\domain(\sigma)$ is

457: denoted by $\trace{\sigma}$.

458: If $\sigma \in \msystem(n,m,v)$, the

459: sequence $\trace{\sigma}$ is a {\em trace\/} of $\msystem(n,m,v)$.

460: Similarly, if $\sigma \in \msystem(n,m)$, the sequence

461: $\trace{\sigma}$ is a {\em trace\/} of $\msystem(n,m)$.

462: % We denote by $\msystemt(n,m,v)$ the set of traces of $\msystem(n,m,v)$, and

463: % by $\msystemt(n,m)$ the union $\bigcup_{v \geq 1} \msystemt(n,m,v)$.

464:

465: \begin{example}

466: Consider the memory system in Figure~\ref{fig:example}.

467: It is a highly simplified model of the protocol used to maintain cache

468: coherence within a single node in the Piranha chip multiprocessor system

469: \cite{BGM00}.

470: The system has three variables ---$\cache$, $\inQ$ and $\owner$---

471: and five events ---the memory events $\{\rd,\wrt\}$ and the internal

472: events $\{\EXCRSP,\SHDRSP,\UPDATE\}$.

473: The variables $inQ$ and $\owner$ need some explanation.

474: For each processor $i$, there is an input queue $\inQ[i]$ where

475: incoming messages are put.

476: The type of $\inQ[i]$ is $\Queue$.

477: The operations $\EMPTY$, $\HEAD$ and $\TAIL$ are defined on $\Queue$,

478: and the operation $\APPEND$ is defined on $\Queue \times \Msg$.

479: They have the obvious meanings and their definitions have been omitted

480: in the figure.

481: For each memory location $j$, either $\owner[j] = 0$ or $\owner[j]$

482: contains the index of a processor.

483: Each event is associated with a guarded command.

484: The memory events $\rd$ and $\wrt$ are parameterized by

485: three parameters ---processor $i$, location $j$ and data value $k$.

486: The internal events $\EXCRSP$ and $\SHDRSP$ are parameterized by

487: two parameters ---processor $i$ and location $j$.

488: The internal event $\UPDATE$ is parameterized by processor $i$.

489: A {\em state\/} is a valuation to the variables.

490: An {\em initial\/} state is a state that satisfies the initial predicate.

491: An event is {\em enabled\/} in a state

492: if the guard of its guarded command is true in the state.

493: The variables are initialized to an initial state and updated by

494: nondeterministically choosing an

495: enabled event and executing the guarded command corresponding to it.

496: A run of the system is any finite sequence of events that can be executed

497: starting from some initial state.

498:

499: A processor $i$ can perform a read to location $j$ if

500: $\cache[i][j].s \in \{\SHD,\EXC\}$, otherwise it requests $\owner[j]$

501: for shared access to location $j$.

502: The processor $\owner[j]$ is the last one to have received shared or

503: exclusive access to location $j$.

504: The request by $i$ has been abstracted away but the response of $\owner[j]$

505: is modeled by the action $\SHDRSP[i][j]$, which sends a $\SHDACK$ message

506: containing the data in location $j$ to $i$ and temporarily sets $\owner[j]$ to $0$.

507: Similarly, processor $i$ can perform a write to location $j$ if

508: $\cache[i][j].s = \EXC$, otherwise it requests $\owner[j]$

509: for exclusive access to location $j$.

510: The processor $\owner[j]$ responds by sending a $\EXCACK$ message to

511: $i$ and $\INVAL$ messages to all other processors that have a valid

512: copy of location $j$.

513: $owner[j]$ is set to $i$ when processor $i$ reads the $\SHDACK$ or

514: $\EXCACK$ message from $\inQ[i]$ in the event $\UPDATE[i]$.

515: Note that new requests for $j$ are blocked while $\owner[j] = 0$.

516: A processor $i$ that receives an $\INVAL$ message for location $j$ sets

517: $\cache[i][j].s$ to $\INV$.

518: \end{example}

519:

520: \section{Causality and data independence}

521: \label{sec:causal-data-ind}

522: In this section, we will state our main assumptions on memory

523: systems ---causality and data independence.

524:

525: We assume that the runs of memory systems are {\em causal}.

526: That is, every read event to location $m$ ``reads'' either the initial value of

527: $m$ or the value ``written'' to $m$ by some write event.

528: We believe that this assumption is reasonable because

529: memory systems do not conjure up data values; they just move around

530: data values that were introduced by initial values or write events.

531: % It follows that any value that is read must have been either an

532: % initial value or ``written'' by a write event.

533: % Moreover, note that a non-causal memory system trivially violates

534: % any memory model including sequential consistency.

535: We state the causality assumption formally as follows.

536:

537: \begin{assumption}[Causality]

538: \label{ass:causality}

539: For all $n,m,v \geq 1$,

540: for all traces $\tau$ of $\msystem(n,m,v)$,

541: and for all locations~$1 \leq i \leq m$,

542: if $x \in L^r(\tau,i)$, then either $\data(\tau(x)) = 0$ or

543: there is $y \in L^w(\tau,i)$ such that

544: $\data(\tau(x)) = \data(\tau(y))$.

545: \end{assumption}

546:

547: The memory system in Figure~\ref{fig:example} is causal.

548: Only the write event $\wrt$ introduces a fresh data value in

549: the system by updating the cache;

550: the internal events $\SHDACK$, $\EXCACK$ and $\UPDATE$ move data

551: around and the read event $\rd$ reads the data present in the cache.

552: Therefore, the data value of a read operation must either be the

553: initial value $0$ or the value

554: introduced by a write event, thus satisfying Assumption~\ref{ass:causality}.

555:

556: Memory systems occurring in practice also have the property of

557: {\em data independence\/}, that is, control decisions are oblivious to the

558: data values.

559: A cache line carries along with the actual program data a

560: few state bits for recording whether it is

561: in shared, exclusive or invalid mode.

562: Typically, actions do not depend on the value of the data in the cache

563: line.

564: This can be observed, for example, in the memory system shown in

565: Figure~\ref{fig:example}.

566: Note that there are no predicates involving the data fields of the

567: cache lines and the messages in any of the internal events of the system.

568: In such systems, renaming the data values of a run results in yet

569: another run of the system.

570: Moreover, every run can be obtained by data value renaming from some

571: run in which the initial value and values of write events to any

572: location $i$ are all distinct from each other.

573: In order to define data independence formally,

574: we define below the notion of an unambiguous run and the notion

575: of data value renaming.

576:

577: Formally, a run $\sigma$ of $\msystem(n,m,v)$ is {\em unambiguous\/} if for

578: all~$1 \leq i \leq m$ and $x \in L^w(\sigma,i)$, we have

579: (1)~$\data(\sigma(x)) \neq 0$, and

580: (2)~$\data(\sigma(x)) \neq \data(\sigma(y))$

581:     for all $y \in L^w(\sigma,i)\setminus\{x\}$.

582: In an unambiguous run, every write event to a location $i$ has a value

583: distinct from the initial value of $i$ and the value of every other

584: write to $i$.

585: The trace $\trace{\sigma}$ corresponding to an unambiguous run $\sigma$

586: is called an {\em unambiguous trace}.

587: If a run is both unambiguous and causal, each read event to location

588: $i$ with data value $0$ reads the initial value of $i$, and each read

589: event with a nonzero data value reads the value written by the unique

590: write event with a matching data value.

591: Thus, a read event can be paired with its source write event

592: just by comparing data values.

593:

594: % Let $\msystemu(n,m,v) = \{\sigma \in

595: % \msystem(n,m,v)~ |~ \sigma~\mathrm{is}~unambiguous\}$ be the subset of

596: % unambiguous runs in $\msystem(n,m,v)$.

597: % We denote by $\msystemu(n,m)$ the union $\bigcup_{v \geq 1}

598: % \msystemu(n,m,v)$.

599: % We denote by $\msystemut(n,m,v)$ the set of traces of $\msystemu(n,m,v)$, and by

600: % $\msystemut(n,m)$ the union $\bigcup_{v \geq 1} \msystemut(n,m,v)$.

601:

602: A function $\lambda: \nat_m \times \natzero \rightarrow \natzero$ is

603: called a {\em renaming function\/} if $\lambda(j,0) = 0$

604: for all $1 \leq j \leq m$.

605: Intuitively, the function $\lambda$ provides for each memory location $c$ and

606: data value $d$ the renamed data value $\lambda(c,d)$.

607: Since $0$ models the fixed initial value of all locations, the function $\lambda$

608: does not rename the value $0$.

609: Let $\lambda^d$ be a function on $\memevent(n,m,v)$ such that

610: for all $e=\zug{a,b,c,d} \in \memevent(n,m)$, we have

611: $\lambda^d(e) = \zug{a,b,c,\lambda(c,d)}$.

612: The function $\lambda^d$ is extended to sequences in

613: $\memevent(n,m,v)^*$ in the natural way.

614:

615: We state the data independence assumption formally as follows.

616: \begin{assumption}[Data independence]

617: \label{ass:data-independence}

618: For all $n,m,v \geq 1$ and sequences $\tau \in \memevent(n,m,v)^*$,

619: we have that $\tau$ is a trace of $\msystem(n,m,v)$ iff

620: there is an unambiguous trace $\tau'$ of $\msystem(n,m)$ and a renaming function

621: $\lambda: \nat_m \times \natzero \rightarrow \natzero_v$ such that

622: $\tau = \lambda^d(\tau')$.

623: \end{assumption}

624:

625: Assumptions~\ref{ass:causality} and~\ref{ass:data-independence}

626: are motivated by the data handling in typical cache-coherence

627: protocols.

628: We can have these assumptions be true on protocol descriptions

629: by imposing restrictions on the operations allowed on variables

630: that contain data values \cite{Nalumasu99}.

631: For example, one restriction can be that no data variable appears in the

632: guard expression of an internal event or in the control expression of a

633: conditonal.

634:

635: \section{Sequential consistency}

636: \label{sec:mmodel}

637: Suppose $\msystem(n,m,v)$ is a memory system for some $n,m,v \geq 1$.

638: The sequential consistency memory model \cite{Lamport79}

639: is a correctness requirement on the runs of $\msystem(n,m,v)$.

640: % Sequential consistency totally orders the memory events at a processor

641: % and requires that the memory events of the run can be rearranged

642: % while respecting the total orders so that every read event returns the

643: % value of the latest write event according to the rearrangement.

644: In this section, we define sequential consistency formally.

645:

646: We first define the simpler notion of a sequence being serial.

647: For all $\tau \in \memevent(n,m,v)^*$ and

648: $1 \leq i \leq |\tau|$, let $\upto(\tau,i)$ be the set

649: $\{1 \leq k \leq i ~|~ \op(\tau(k)) = \wrt \wedge \loc(\tau(k)) =

650: \loc(\tau(i))\}$.

651: In other words, the set $\upto(\tau,i)$ is the set of write events in

652: $\tau$ to location $\loc(\tau(i))$ occurring not later than $i$.

653: A sequence $\tau \in \memevent(n,m,v)^*$ is

654: {\em serial\/} if for all $1 \leq u \leq |\tau|$, we have

655: \[\begin{array}{ll}

656: \data(\tau(u)) = 0, & \mathrm{if}~\upto(\tau,u) = \emptyset \\

657: \data(\tau(u)) = \data(\tau(\maximum{\upto(\tau,u)})), & \mathrm{if}~\upto(\tau,u)

658: \neq \emptyset.

659: \end{array}\]

660: Thus, a sequence is serial if every read to a location $i$ returns

661: the value of the latest write to $i$ if one exists, and the initial

662: value $0$ otherwise.\footnote{The decision to

663: model the initial values of all locations by the value $0$ is

664: implicit in our definition of a serial sequence.}

665:

666: The {\em sequential consistency\/} memory model $\msc$ is a function

667: that maps every sequence of memory events~$\tau \in

668: \memevent(n,m,v)^*$ and processor~$1 \leq i

669: \leq n$ to a total order $\msc(\tau,i)$ on $P(\tau,i)$

670: defined as follows:

671: for all $u,v \in P(\tau,i)$, we have

672: $\zug{u,v} \in \procorder{\msc}{\tau}{i}$ iff $u < v$.

673: A sequence $\tau$ is {\em sequentially consistent\/}

674: if there is a permutation $f$ on

675: $\nat_{|\tau|}$ such that the following conditions are satisfied.

676: \begin{enumerate}

677: \item[C1]

678: For all $1 \leq u,v \leq |\tau|$ and~$1 \leq i \leq n$,

679: if $\zug{u,v} \in \procorder{\msc}{\tau}{i}$

680: then $f(u) < f(v)$.

681: \item[C2]

682: The sequence $\tau' = \tau_{f^{-1}(1)} \tau_{f^{-1}(2)} \ldots \tau_{f^{-1}(|\tau|)}$

683: is serial.

684: \end{enumerate}

685: Intuitively, the sequence $\tau'$ is a permutation of the sequence $\tau$

686: such that the event at index $u$ in $\tau$ is moved to index $f(u)$

687: in $\tau'$.

688: According to C1, this permutation must respect the total order

689: $\msc(\tau,i)$ for all $1 \leq i \leq n$.

690: According to C2, the permuted sequence must be serial.

691: A run $\sigma \in \msystem(n,m,v)$ is sequentially

692: consistent if $\trace{\sigma}$ satisfies $\msc$.

693: The memory system $\msystem(n,m,v)$ is sequentially consistent

694: iff every run of $\msystem(n,m,v)$ is sequentially consistent.

695:

696: The memory system in Figure~\ref{fig:example} is supposed to be

697: sequentially consistent.

698: Here is an example of a sequentially consistent run $\sigma$ of that memory

699: system, the corresponding trace $\tau$ of $\sigma$, and the sequence

700: $\tau'$ obtained by permuting $\tau$.

701: \begin{center}

702: $\sigma$ = \parbox{1in}{$\zug{\EXCACK,1,1}$\\

703:                    $\zug{\UPDATE,1}$\\

704:                    $\zug{\wrt,1,1,1}$\\

705:                    $\zug{\rd,2,1,0}$\\

706:                    $\zug{\UPDATE,2}$\\

707:                    $\zug{\SHDACK,2,1}$\\

708:                    $\zug{\UPDATE,2}$\\

709:                    $\zug{\rd,2,1,1}$}

710: $\tau = \trace{\sigma}$ = \parbox{1in}{$\zug{\wrt,1,1,1}$\\

711:                             $\zug{\rd,2,1,0}$\\

712:                             $\zug{\rd,2,1,1}$}

713: $\tau'$ = \parbox{1in}{$\zug{\rd,2,1,0}$\\

714:                       $\zug{\wrt,1,1,1}$\\

715:                       $\zug{\rd,2,1,1}$}

716: \end{center}

717: Sequential consistency orders the event $\tau(2)$ before

718: the event $\tau(3)$ at processor 2.

719: Let $f$ be the permutation on $\nat_3$ defined by

720: $f(1) = 2$, $f(2) = 1$, and $f(3) = 3$.

721: The sequence $\tau'$ is the permutation of $\tau$ under $f$.

722: It is easy to check that both conditions C1 and C2 mentioned above are

723: satisfied.

724:

725: In order to prove that a run of a memory system is sequentially consistent, one needs to

726: provide a reordering of the memory events of the run.

727: This reordering should be serial and should respect the total orders imposed by

728: sequential consistency at each processor.

729: Since the memory systems we consider in this paper are data

730: independent, we only need to show sequential consistency for the

731: unambiguous runs of the memory system.

732: This reduction is stated formally in the following theorem.

733: \begin{theorem}

734: \label{thm:red-to-unamb}

735: For all $n,m \geq 1$,

736: every trace of $\msystem(n,m)$ is sequentially consistent iff

737: every unambiguous trace of $\msystem(n,m)$ is sequentially consistent.

738: \end{theorem}

739: \begin{proof}

740: The $\Rightarrow$ case is trivial.

741:

742: ($\Leftarrow$)

743: Let $\tau$ be a trace of $\msystem(n,m,v)$ for some $v \geq 1$.

744: From Assumption~\ref{ass:data-independence} there is an unambiguous trace

745: $\tau'$ of $\msystem(n,m)$ and a renaming function

746: $\lambda:\nat_m \times \natzero \rightarrow \natzero_v$

747: such that $\tau = \lambda^d(\tau')$.

748: Since $\tau'$ is sequentially consistent, we know that conditions C1 and C2

749: are satisfied by $\tau'$.

750: It is not difficult to see that both conditions C1 and C2 are

751: satisfied by $\lambda^d(\tau')$ as well.

752: Therefore $\tau$ is sequentially consistent.

753: \end{proof}

754:

755: \section{Witness}

756: \label{sec:witness}

757: Theorem~\ref{thm:red-to-unamb} allows us to prove that a memory system

758: $\msystem(n,m,v)$ is sequentially consistent by proving that all

759: unambiguous runs in $\msystem(n,m)$ is sequentially consistent.

760: In this section, we reduce the problem of checking

761: sequential consistency on an unambiguous run to the problem of

762: detecting a cycle in a constraint graph.

763: % Proving sequential consistency for unambiguous runs is conceptually

764: % simpler that proving it for general runs.

765: % For an unambiguous run, one only needs to provide a linearization of

766: % the write events to each location.

767: % Each read event in the run with a nonzero value can be associated with the

768: % unique write event that wrote that value.

769:

770: Consider a memory system $\msystem(n,m,v)$ for some fixed $n,m,v \geq 1$.

771: A {\em witness\/} $\witness$ for $\msystem(n,m,v)$ maps every

772: trace $\tau$ of $\msystem(n,m,v)$ and location $1 \leq i \leq m$ to a total

773: order $\locorder{\witness}{\tau}{i}$

774: on the set of writes $L^w(\tau,i)$ to location $i$.

775: If the trace~$\tau$ is unambiguous, the total order $\locorder{\witness}{\tau}{i}$

776: on the write events to location~$i$ can be extended to a partial

777: order $\locorder{\expand{\witness}}{\tau}{i}$ on

778: all memory events (including read events) to location~$i$.

779: If a read event~$r$ reads the value written by the write event~$w$,

780: the partial order puts~$r$ {\em after\/} $w$ and all write events

781: preceding $w$, and {\em before\/} all write events succeeding~$w$.

782: Formally, for every unambiguous trace $\tau$ of $\msystem(n,m,v)$,

783: location~$1 \leq i \leq m$, and $x,y \in L(\tau,i)$, we have that

784: $\zug{x,y} \in \locorder{\expand{\witness}}{\tau}{i}$ iff

785: %iff $u \neq v$ and

786: one of the following conditions holds.

787: % \begin{enumerate}

788: % \item

789: % $\locorder{\witness}{\tau}{i} \subseteq

790: % \locorder{\expand{\witness}}{\tau}{i}$.

791: % \item

792: % For all $u \in L^r(\tau,i)$ and $v \in L^w(\tau,i)$ if

793: % $\data(\tau(u)) = 0$ then

794: % $\zug{u,v} \in \locorder{\expand{\witness}}{\tau}{i}$.

795: % \item

796: % For all $u,v \in L(\tau,i)$,

797: % if $\op(\tau(u)) = \wrt$, $\op(\tau(v))

798: % = \rd$, and $\data(\tau(u)) = \data(\tau(v))$ then

799: % $\zug{u,v} \in \locorder{\expand{\witness}}{\tau}{i}$.

800: % \item

801: % For all $u,u',v,v' \in L(\tau,i)$,

802: % if $\data(\tau(u)) = \data(\tau(v))$,

803: % $\data(\tau(u')) = \data(\tau(v'))$,

804: % $\data(\tau(u)) \neq \data(\tau(u'))$, and

805: % $\zug{u,u'} \in \locorder{\expand{\witness}}{\tau}{i}$,

806: % then $\zug{v, v'} \in \locorder{\expand{\witness}}{\tau}{i}$.

807: % \end{enumerate}

808: \begin{enumerate}

809: \item

810: $\data(\tau(x)) = \data(\tau(y))$, $\op(\tau(x)) = \wrt$, and

811: $\op(\tau(y)) = \rd$.

812: \item

813: $\data(\tau(x)) = 0$ and $\data(\tau(y)) \neq 0$.

814: \item

815: $\exists a,b \in L^w(\tau,i)$ such that

816:     $\zug{a,b} \in \locorder{\witness}{\tau}{i}$,

817:     $\data(\tau(a)) = \data(\tau(x))$, and

818:     $\data(\tau(b)) = \data(\tau(y))$.

819: \end{enumerate}

820:

821: We now show that the relation $\locorder{\expand{\witness}}{\tau}{i}$

822: is a partial order.

823: First, we need the following lemma about $\locorder{\expand{\witness}}{\tau}{i}$.

824: %The lemma will be used in Section~\ref{sec:cycle}.

825: \begin{lemma}

826: \label{lemma:almost-ord}

827: For all unambiguous traces $\tau$ of $\msystem(n,m,v)$,

828: locations~$1 \leq i \leq m$

829: and $r,s,t \in L(\tau,i)$, if $\zug{r,s} \in

830: \locorder{\expand{\witness}}{\tau}{i}$,

831: then either $\zug{r,t} \in \locorder{\expand{\witness}}{\tau}{i}$ or

832: $\zug{t,s} \in \locorder{\expand{\witness}}{\tau}{i}$.

833: \end{lemma}

834: \begin{proof}

835: Since $\zug{r,s} \in \locorder{\expand{\witness}}{\tau}{i}$,

836: either $\data(\tau(s)) \neq 0$ or there is a $x \in L^w(\tau,i)$ such

837: that $\data(\tau(s)) = \data(\tau(x))$.

838: Since $\tau$ is an unambiguous trace, we have that $\data(\tau(x))

839: \neq 0$.

840: Therefore, we get that $\data(\tau(s)) \neq 0$ in both cases.

841: If $\data(\tau(t)) = 0$ we immediately get that

842: $\zug{t,s} \in \locorder{\expand{\witness}}{\tau}{i}$.

843: So suppose $\data(\tau(t)) \neq 0$.

844: Since $\tau$ is unambiguous, there is $y \in L^w(\tau,i)$ such that

845: $\data(\tau(t)) = \data(\tau(y))$.

846: We have three cases from the definition of

847: $\zug{r,s} \in \locorder{\expand{\witness}}{\tau}{i}$.

848: \begin{enumerate}

849: \item

850: $\data(\tau(r)) = \data(\tau(s))$, $\op(\tau(r)) = \wrt$, and

851: and $\op(\tau(s)) = \rd$.

852: Since $\witness$ is a total order on $L^w(\tau,i)$, either

853: $\zug{r,y} \in \locorder{\witness}{\tau}{i}$ or

854: $\zug{y,r} \in \locorder{\witness}{\tau}{i}$.

855: In the first case, we have

856: $\zug{r,t} \in \locorder{\expand{\witness}}{\tau}{i}$.

857: In the second case, we have

858: $\zug{t,s} \in \locorder{\expand{\witness}}{\tau}{i}$.

859: \item

860: $\data(\tau(r)) = 0$ and $\data(\tau(s)) \neq 0$.

861: We get that $\zug{r,t} \in \locorder{\expand{\witness}}{\tau}{i}$.

862: \item

863: $\exists a,b \in L^w(\tau,i)$ such that

864:     $\zug{a,b} \in \locorder{\witness}{\tau}{i}$,

865:     $\data(\tau(a)) = \data(\tau(r))$, and

866:     $\data(\tau(b)) = \data(\tau(s))$.

867: Since $\witness$ is a total order on $L^w(\tau,i)$, either

868: $\zug{a,y} \in \locorder{\witness}{\tau}{i}$ or

869: $\zug{y,a} \in \locorder{\witness}{\tau}{i}$.

870: In the first case, we have

871: $\zug{r,t} \in \locorder{\expand{\witness}}{\tau}{i}$.

872: In the second case, we have by transitivity $\zug{y,b} \in

873: \locorder{\witness}{\tau}{i}$ and therefore

874: $\zug{t,s} \in \locorder{\expand{\witness}}{\tau}{i}$.

875: \end{enumerate}

876: \end{proof}

877:

878: % We also show that $\locorder{\expand{\witness}}{\tau}{i}$ is a partial

879: % order for all locations $1 \leq i \leq m$.

880: \begin{lemma}

881: \label{lemma:partial-order}

882: For all unambiguous traces $\tau$ of $\msystem(n,m,v)$ and locations~$1 \leq i

883: \leq m$, we have that $\locorder{\expand{\witness}}{\tau}{i}$ is a

884: partial order.

885: \end{lemma}

886: \begin{proof}

887: We show that $\locorder{\expand{\witness}}{\tau}{i}$ is

888: irreflexive.

889: In other words, for all $1 \leq x \leq |\tau|$, we have that

890: $\zug{x,x} \not \in \locorder{\expand{\witness}}{\tau}{i}$.

891: This is an easy proof by contradiction by assuming

892: $\zug{x,x} \in \locorder{\expand{\witness}}{\tau}{i}$ and

893: performing a case analysis over the three resulting

894: conditions.

895:

896: We show that $\locorder{\expand{\witness}}{\tau}{i}$ is

897: anti-symmetric.

898: In other words, for all $1 \leq x < y \leq |\tau|$, if

899: $\zug{x,y} \in \locorder{\expand{\witness}}{\tau}{i}$ then

900: $\zug{y,x} \not \in \locorder{\expand{\witness}}{\tau}{i}$.

901: We do a proof by contradiction.

902: Suppose both $\zug{x,y} \in \locorder{\expand{\witness}}{\tau}{i}$

903: and $\zug{y,x} \in \locorder{\expand{\witness}}{\tau}{i}$.

904: We reason as in the proof of Lemma~\ref{lemma:almost-ord} to obtain

905: $\data(\tau(x)) \neq 0$ and $\data(\tau(y)) \neq 0$.

906: Therefore there are $a,b \in L^w(\tau,i)$ such that $\data(\tau(a)) =

907: \data(\tau(x))$ and $\data(\tau(b)) = \data(\tau(y))$.

908: We perform the following case analysis.

909: \begin{enumerate}

910: \item

911: $a = b$.

912: Either $\op(x) = \rd$ and $\op(y) = \rd$,

913: or $\op(x) = \wrt$ and $\op(y) = \rd$,

914: or $\op(x) = \rd$ and $\op(y) = \wrt$.

915: In the first case $\zug{x,y} \not \in \locorder{\expand{\witness}}{\tau}{i}$ and

916: $\zug{y,x} \not \in \locorder{\expand{\witness}}{\tau}{i}$.

917: In the second case $\zug{y,x} \not \in \locorder{\expand{\witness}}{\tau}{i}$.

918: In the third case $\zug{x,y} \not \in \locorder{\expand{\witness}}{\tau}{i}$.

919: \item

920: $\zug{a,b} \in \locorder{\witness}{\tau}{i}$.

921: We have $\data(\tau(x)) \neq \data(\tau(y))$ since $\tau$ is

922: unambiguous.

923: Since $\locorder{\witness}{\tau}{i}$ is a total order, we have

924: $\zug{b,a} \not \in \locorder{\witness}{\tau}{i}$.

925: Therefore $\zug{y,x} \not \in \locorder{\expand{\witness}}{\tau}{i}$.

926: \item

927: $\zug{b,a} \in \locorder{\witness}{\tau}{i}$.

928: This case is symmetric to Case 2.

929: % We have $\data(\tau(x)) \neq \data(\tau(y))$ since $\tau$ is

930: % unambiguous.

931: % Since $\locorder{\witness}{\tau}{i}$ is a total order, we have

932: % $\zug{a,b} \not \in \locorder{\witness}{\tau}{i}$.

933: % Therefore $\zug{x,y} \not \in \locorder{\expand{\witness}}{\tau}{i}$.

934: \end{enumerate}

935:

936: Finally, we show that $\locorder{\expand{\witness}}{\tau}{i}$ is

937: transitive.

938: Suppose $\zug{x,y} \in \locorder{\expand{\witness}}{\tau}{i}$ and

939: $\zug{y,z} \in \locorder{\expand{\witness}}{\tau}{i}$.

940: From Lemma~\ref{lemma:almost-ord}, either

941: $\zug{x,z} \in \locorder{\expand{\witness}}{\tau}{i}$ or

942: $\zug{z,y} \in \locorder{\expand{\witness}}{\tau}{i}$.

943: We have shown $\locorder{\expand{\witness}}{\tau}{i}$ to be

944: anti-symmetric.

945: Therefore $\zug{x,z} \in \locorder{\expand{\witness}}{\tau}{i}$.

946: \end{proof}

947:

948: \subsection{Constraint graph}

949: Suppose $\tau$ is an unambiguous trace of $\msystem(n,m,v)$.

950: We have that $\procorder{\msc}{\tau}{i}$ is a total order on

951: $P(\tau,i)$ for all~$1 \leq i \leq n$ from the definition of

952: sequential consistency.

953: We also have that $\locorder{\expand{\witness}}{\tau}{j}$ is a

954: partial order on $L(\tau,j)$ for all $1 \leq j \leq m$ from

955: Lemma~\ref{lemma:partial-order}.

956: The union of the $n$ total orders $\procorder{\msc}{\tau}{i}$ and

957: $m$ partial orders $\locorder{\expand{\witness}}{\tau}{j}$ imposes a

958: graph on $\domain(\tau)$.

959: The acyclicity of this graph is a necessary and sufficient condition

960: for the trace $\tau$ to satisfy sequential consistency.

961: We define a function $G$ that for every

962: witness $\witness$ returns a function $G(\witness)$.

963: The function $G(\witness)$

964: maps every unambiguous trace $\tau$ of $\msystem(n,m,v)$ to the graph

965: $\zug{\domain(\tau),

966: \bigcup_{1 \leq i \leq n} \procorder{\msc}{\tau}{i} \cup

967: \bigcup_{1 \leq j \leq m} \locorder{\expand{\witness}}{\tau}{j}}$.

968: The work of Gibbons and Korach \cite{GibbonsKorach97} defines a constraint

969: graph on the memory events of a run that is similar to $G(\witness)(\tau)$.

970:

971: \begin{theorem}

972: \label{thm:witness-sat}

973: For all $n,m,v \geq 1$, every unambiguous trace of

974: $\msystem(n,m,v)$ is sequentially consistent iff

975: there is a witness $\witness$ such that the graph $G(\witness)(\tau)$

976: is acyclic for every unambiguous trace $\tau$ of $\msystem(n,m,v)$.

977: \end{theorem}

978: % \begin{theorem}

979: % \label{thm:witness-sat}

980: % For all $n,m \geq 1$, we have that

981: % every trace of $\msystem(n,m)$ satisfies sequential consistency iff

982: % there is a witness $\witness$ such that

983: % for every unambiguous trace $\tau$ of  $\msystemu(n,m)$

984: % the graph $G(\witness)(\tau)$ is acyclic.

985: % \end{theorem}

986: \begin{proof}

987: ($\Rightarrow$)

988: Suppose $\tau$ is an unambiguous trace of $\msystem(n,m,v)$.

989: Then $\tau$ satisfies sequential consistency.

990: There is a permutation $f$ on $\nat_{|\tau|}$ such that conditions C1

991: and C2 are satisfied.

992: For all $1 \leq i \leq m$, define $\witness(\tau,i)$ to be the total

993: order on $L^w(\tau,i)$ such that for all $x,y \in L^w(\tau,i)$,

994: we have $\zug{x,y} \in \witness(\tau,i)$ iff $f(x) < f(y)$.

995: We show that the permutation $f$ is a linearization of the vertices in

996: $G(\witness)(\tau)$ that preserves all the edges.

997: In other words, if

998: $\zug{x,y} \in \procorder{\msc}{\tau}{i}$ for some $1 \leq i \leq

999: n$ or $\zug{x,y} \in \locorder{\expand{\witness}}{\tau}{j}$ for some

1000: $1 \leq j \leq m$, then $f(x) < f(y)$.

1001: If $\zug{x,y} \in \procorder{\msc}{\tau}{i}$ then we have from C1

1002: that $f(x) < f(y)$.

1003: We show below that if

1004: $\zug{x,y} \in \locorder{\expand{\witness}}{\tau}{j}$ then

1005: $f(x) < f(y)$.

1006:

1007: Let $\tau' = \tau_{f^{-1}(1)} \tau_{f^{-1}(2)} \ldots \tau_{f^{-1}(|\tau|)}$.

1008: For all $1 \leq u \leq |\tau|$ we have that $\tau(u) = \tau'(f(u))$.

1009: We first show for all $a \in L^w(\tau,j)$ and $x \in L(\tau,j)$, if

1010: $\data(\tau(a)) = \data(\tau(x))$ then $f(a) \leq f(x)$.

1011: Since $\tau$ is unambiguous, we have that

1012: $\data(\tau(a)) = \data(\tau(x)) \neq 0$.

1013: Therefore $\data(\tau'(f(a))) = \data(\tau'(f(x))) \neq 0$.

1014: We have that either $\op(\tau'(f(x))) = \rd$ or $x = a$.

1015: In the first case $f(a) \in \upto(\tau',f(x))$ which implies

1016: that $f(a) < f(x)$, and in the second case $f(a) = f(x)$.

1017: Therefore $f(a) \leq f(x)$.

1018:

1019: If $\zug{x,y} \in \locorder{\expand{\witness}}{\tau}{j}$ then we have

1020: three cases.

1021: In each case, we show that $f(x) < f(y)$.

1022: \begin{enumerate}

1023: \item

1024: $\data(\tau(x)) = \data(\tau(y))$, $\op(\tau(x)) = \wrt$, and

1025: $\op(\tau(y)) = \rd$.

1026: Since $\tau$ is unambiguous $\data(\tau(x)) = \data(\tau(y)) \neq 0$.

1027: We get that $\data(\tau'(f(y))) \neq 0$ which means that

1028: $\upto(\tau',f(y)) \neq \emptyset$ and $f(x) \in \upto(\tau',f(y))$.

1029: Therefore $f(x) < f(y)$.

1030: \item

1031: $\data(\tau(x)) = 0$ and $\data(\tau(y)) \neq 0$.

1032: Since $x \neq y$ we have $f(x) \neq f(y)$.

1033: Suppose $f(y) < f(x)$.

1034: Since $\data(\tau(y)) \neq 0$ there is $b \in L^w(\tau,j)$ such that

1035: $\data(\tau(b)) = \data(\tau(y))$.

1036: Therefore we have that $f(b) \leq f(y) < f(x)$.

1037: Therefore the set $\upto(\tau',f(x)) \neq \emptyset$.

1038: Since $\tau'$ is unambiguous and $\data(\tau'(f(x))) = 0$

1039: we have a contradiction.

1040: \item

1041: $\exists a,b \in L^w(\tau,j)$ such that

1042:     $\zug{a,b} \in \locorder{\witness}{\tau}{j}$,

1043:     $\data(\tau(a)) = \data(\tau(x))$, and

1044:     $\data(\tau(b)) = \data(\tau(y))$.

1045: We show $f(x) < f(y)$ by contradiction.

1046: Suppose $f(x) = f(y)$.

1047: Then $x = y$ and $\data(\tau(a)) = \data(\tau(b))$.

1048: Since $\tau$ is unambiguous we get $a = b$ which contradicts

1049: $\zug{a,b} \in \locorder{\witness}{\tau}{j}$.

1050: Suppose $f(y) < f(x)$.

1051: % Since $\tau$ is unambiguous, we have that

1052: % $\data(\tau(a)) = \data(\tau(x)) \neq 0$.

1053: % Therefore $\data(\tau'(f^{-1}(a))) = \data(\tau'(f^{-1}(x))) \neq 0$.

1054: % We have that either $\op(\tau'(f^{-1}(x))) = \rd$ or $x = a$.

1055: % In the first case $f^{-1}(a) \in \upto(\tau',f^{-1}(x))$ which implies

1056: % that $f^{-1}(a) < f^{-1}(x)$, and in the second case

1057: % $f^{-1}(a) = f^{-1}(x)$.

1058: We have that $f(a) \leq f(x)$ and

1059: % Similarly, we have that

1060: % $\data(\tau(b)) = \data(\tau(y)) \neq 0$,

1061: % $\data(\tau'(f^{-1}(b))) = \data(\tau'(f^{-1}(y))) \neq 0$, and

1062: $f(b) \leq f(y)$.

1063: Since $\zug{a,b} \in \locorder{\witness}{\tau}{j}$, we have $f(a)

1064: < f(b)$ from the definition of $\witness$.

1065: Thus we have $f(a) < f(b) \leq f(y) < f(x)$

1066: Therefore $f(a) \neq \maximum{\upto(\tau',f(x))}$.

1067: Since $\tau'$ is unambiguous and $\data(\tau'(f(a))) =

1068: \data(\tau'(f(x)))$ we have a contradiction.

1069: \end{enumerate}

1070:

1071: ($\Leftarrow$)

1072: Suppose $\tau$ is a trace of $\msystem(n,m,v)$.

1073: Then there is a witness $\witness$ such that

1074: $G(\witness)(\tau)$ is acyclic.

1075: Let $f$ be a linearization of the vertices in

1076: $G(\witness)(\tau)$ that respects all edges.

1077: Then C1 is  satisfied.

1078: Let $\tau'$ denote

1079: $\tau_{f^{-1}(1)} \tau_{f^{-1}(2)} \ldots \tau_{f^{-1}(|\tau|)}$.

1080: Then we have that $\tau'(x) = \tau(f^{-1}(x))$ for all $1 \leq x \leq |\tau'|$.

1081: For any $1 \leq x \leq |\tau'|$, suppose $\loc(\tau'(x)) = j$.

1082: There are two cases.

1083: \begin{enumerate}

1084: \item

1085: $\data(\tau'(x)) = 0$.

1086: We show that $\upto(\tau',x) = \emptyset$.

1087: Consider any vertex $1 \leq y \leq |\tau'|$ such that

1088: $\op(\tau'(y)) = \wrt$ and $\loc(\tau'(y)) = j$.

1089: Then $\tau(f^{-1}(x)) = 0$ and $\tau(f^{-1}(y)) \neq 0$.

1090: Therefore $\zug{f^{-1}(x),f^{-1}(y)} \in \locorder{\expand{\witness}}{\tau}{j}$ and

1091: $\zug{f^{-1}(x),f^{-1}(y)}$ is an edge in $G(\witness)(\tau)$.

1092: Therefore $f(f^{-1}(x)) < f(f^{-1}(y))$ or $x < y$.

1093: Thus we have that $\upto(\tau',x) = \emptyset$.

1094: \item

1095: $\data(\tau'(x)) \neq 0$.

1096: We show that $\upto(\tau',x) \neq \emptyset$ and

1097: if $y = \maximum{\upto(\tau',x)}$ then $\data(\tau'(x)) = \data(\tau'(y))$.

1098: From Assumption~\ref{ass:causality}, there is $a \in

1099: L^w(\tau',j)$ such that $\data(\tau'(a)) = \data(\tau'(x))$

1100: and since $\tau'$ is unambiguous this write is unique.

1101: Therefore $\data(\tau(f^{-1}(a))) = \data(\tau(f^{-1}(x)))$.

1102: Either $f^{-1}(a) = f^{-1}(x)$ or $\op(\tau(f^{-1}(x))) = \rd$ in which case

1103: $\zug{f^{-1}(a),f^{-1}(x)} \in \locorder{\expand{\witness}}{\tau}{j}$.

1104: In both cases, we have $a \leq x$ and therefore $\upto(\tau',x) \neq \emptyset$.

1105: Consider any vertex $1 \leq b \leq |\tau'|$ such that

1106: $\op(\tau'(b)) = \wrt$, $\loc(\tau'(b)) = j$, and $a < b$.

1107: Then $\zug{f^{-1}(a),f^{-1}(b)} \in \locorder{\witness}{\tau}{j}$ and

1108: $\zug{f^{-1}(x),f^{-1}(b)} \in \locorder{\expand{\witness}}{\tau}{j}$.

1109: Therefore $x < b$.

1110: We thus get $a = \maximum{\upto(\tau',x)}$.

1111: \end{enumerate}

1112: \end{proof}

1113:

1114: Theorems~\ref{thm:red-to-unamb} and~\ref{thm:witness-sat} can be combined easily

1115: to yield the following theorem.

1116: \begin{corollary}

1117: \label{thm:unamb-witness-sat}

1118: For all $n,m \geq 1$, every trace of

1119: $\msystem(n,m)$ is sequentially consistent iff

1120: there is a witness $\witness$ such that the graph $G(\witness)(\tau)$

1121: is acyclic for every unambiguous trace $\tau$ of $\msystem(n,m)$.

1122: \end{corollary}

1123:

1124: \subsection{Simple witness}

1125: \label{sec:simple-witness}

1126: Corollary~\ref{thm:unamb-witness-sat} suggests that in order to prove

1127: that the memory system $\msystem(n,m,v)$ is sequentially consistent,

1128: we produce a witness $\witness$ and show for every unambiguous

1129: trace $\tau$ of $\msystem(n,m)$ that the graph $G(\witness)(\tau)$ is

1130: acyclic.

1131: But the construction of the witness is still left to the verifier.

1132: In this section, we argue that a simple witness, which orders the

1133: write events to a location exactly in the order in which they occur,

1134: suffices for a number of memory systems occurring in practice.

1135: Formally, a witness $\witness$ is {\em simple\/} if for all traces $\tau$ of

1136: $\msystem(n,m,v)$ and locations $1 \leq i \leq m$, we have

1137: $\zug{x,y} \in \locorder{\witness}{\tau}{i}$ iff $x < y$

1138: for all $x,y \in L^w(\tau,i)$.

1139:

1140: Consider the memory system of Figure~\ref{fig:example}.

1141: We argue informally that a simple witness is a good witness

1142: for this memory system.

1143: Permission to perform writes flows from one cache to another by means

1144: of the $\EXCACK$ message.

1145: Note that for each location $j$, the variable $\owner[j]$ is set to

1146: $0$ (which is not the id of any processor) when an $\EXCACK$ message

1147: is generated.

1148: When the $\EXCACK$ message is received at the destination (by the

1149: $\UPDATE$ event), the destination moves to $\EXC$ state and sets

1150: $\owner[j]$ to the destination id.

1151: A new $\EXCACK$ message is generated only when $\owner[j] \neq 0$.

1152: Thus, the memory system has the property that

1153: each memory location can be held in $\EXC$ state by at most one cache.

1154: Moreover, writes to the location $j$ can happen only when the cache has

1155: the location in $\EXC$ state.

1156: Therefore, at most one cache can be performing writes to a memory

1157: location.

1158: This indicates that the logical order of the write events is the same

1159: as their temporal order.

1160: In other words, a simple witness is the correct witness for

1161: demonstrating that a run is sequentially consistent.

1162:

1163: In general, for any memory system in which at any time at most one

1164: processor can perform write events to a location, a simple witness is

1165: very likely to be the correct witness.

1166: Most memory systems occurring in practice \cite{LLG90,KOH94,BDH99,BGM00} have

1167: this property.

1168: In Section~\ref{sec:model-check}, we describe a model checking

1169: algorithm to verify the correctness of a memory system with respect to

1170: a simple witness.

1171: If a simple witness is indeed the desired witness and the memory

1172: system is designed correctly, then our algorithm will be able to verify

1173: its correctness.

1174: Otherwise, it will produce an error trace suggesting to the verifier

1175: that either there is an error in the memory system or the simple

1176: witness is not the correct witness.

1177: Thus our method for checking sequential consistency is clearly sound.

1178: We have argued that it is also complete on most shared-memory systems that occur in

1179: practice.

1180:

1181: \section{Nice cycle reduction}

1182: \label{sec:cycle}

1183: For some $n,m,v \geq 1$, let $\msystem(n,m,v)$ be a memory system

1184: and $\witness$ a witness for it.

1185: Let $\tau$ be an unambiguous trace of $\msystem(n,m,v)$.

1186: Corollary~\ref{thm:unamb-witness-sat} tells us that the absence of cycles in

1187: the graphs $G(\witness)(\tau)$ generated by the unambiguous traces of

1188: $\msystem(n,m)$ is a necessary and sufficient condition for every trace of

1189: $\msystem(n,m)$ to be sequentially consistent.

1190: In this section, we show that it suffices to detect a special class of

1191: cycles called nice cycles.

1192: In Section~\ref{sec:model-check}, we will show that detection of nice

1193: cycles can be performed by model checking.

1194:

1195: We fix some $k \geq 1$ and

1196: use the symbol $\oplus$ to denote addition over

1197: the additive group with elements $\nat_k$ and identity element $k$.

1198: A {\em $k$-nice\/} cycle in $G(\witness)(\tau)$ is a sequence

1199: $u_1,v_1,\ldots,u_k,v_k$ of distinct vertices in

1200: $\nat_{|\tau|}$ such that the following conditions are true.

1201: \begin{enumerate}

1202: \item

1203: For all $1 \leq x \leq k$, we have $\zug{u_x,v_x} \in

1204: \procorder{\msc}{\tau}{i}$

1205: for some $1 \leq i \leq n$ and $\zug{v_x,u_{x\oplus1}} \in

1206: \locorder{\expand{\witness}}{\tau}{j}$

1207: for some $1 \leq j \leq m$.

1208: \item

1209: For all $1 \leq x < y \leq k$ and for all $1 \leq i,j \leq n$,

1210: if $\zug{u_x,v_x} \in \procorder{\msc}{\tau}{i}$ and

1211: $\zug{u_y,v_y} \in \procorder{\msc}{\tau}{j}$ then $i \neq j$.

1212: \item

1213: For all $1 \leq x < y \leq k$ and for all $1 \leq i,j \leq m$,

1214: if $\zug{v_x,u_{x\oplus1}} \in \locorder{\expand{\witness}}{\tau}{i}$ and

1215: $\zug{v_y,u_{y\oplus1}} \in \locorder{\expand{\witness}}{\tau}{j}$ then $i \neq j$.

1216: \end{enumerate}

1217: In a $k$-nice cycle, no two edges belong to the relation

1218: $\procorder{\msc}{\tau}{i}$ for any processor $i$.

1219: Similarly, no two edges belong to the relation

1220: $\locorder{\expand{\witness}}{\tau}{j}$ for any location $j$.

1221: The above definition also implies that if a cycle is $k$-nice then

1222: $k \leq \minimum{\{n,m\}}$.

1223: % The following lemma allows us to restrict our search to

1224: % $k$-nice cycles for some $1 \leq k \leq \minimum{\{n,m\}}$

1225: % while determining whether $G(\witness)(\tau)$ is acyclic.

1226:

1227: \begin{theorem}

1228: \label{thm:cycle-nice}

1229: If the graph $G(\witness)(\tau)$ has a cycle, then it has a

1230: $k$-nice cycle for some $k$ such that $1 \leq k \leq \minimum{\{n,m\}}$.

1231: \end{theorem}

1232: \begin{proof}

1233: Suppose $G(\witness)(\tau)$ has no $k$-nice cycles

1234: but does have a cycle.

1235: Consider the shortest such cycle $u_1,\ldots,u_l$ where

1236: $l \geq 1$.

1237: For this proof, we denote by $\oplus$ addition over

1238: the additive group with elements $\nat_l$ and identity element $l$.

1239: Then for all $1 \leq x \leq l$ either

1240: $\zug{u_x,u_{x\oplus1}} \in \procorder{\msc}{\tau}{i}$ for some $1

1241: \leq i \leq n$ or

1242: $\zug{u_x,u_{x\oplus1}} \in \locorder{\expand{\witness}}{\tau}{i}$

1243: for some $1 \leq i \leq m$.

1244:

1245: Since the cycle $u_1,\ldots,u_l$ is not $k$-nice for any $k$, there are

1246: $1 \leq a < b \leq l$ such that either

1247: (1)~$\zug{u_a,u_{a\oplus1}} \in \procorder{\msc}{\tau}{i}$

1248: and $\zug{u_b,u_{b\oplus1}} \in \procorder{\msc}{\tau}{i}$

1249: for some $1 \leq i \leq n$, or

1250: (2)~$\zug{u_a,u_{a\oplus1}} \in \locorder{\expand{\witness}}{\tau}{i}$

1251: and $\zug{u_b,u_{b\oplus1}} \in \locorder{\expand{\witness}}{\tau}{i}$

1252: for some $1 \leq i \leq m$.

1253:

1254: Case (1).

1255: We have from the definition of $\msc$ that $u_a < u_{a\oplus1}$ and

1256: $u_b < u_{b\oplus1}$.

1257: Either $u_a < u_b$ or $u_b < u_a$.

1258: If $u_a < u_b$ then $u_a < u_{b\oplus1}$ or

1259: $\zug{u_a,u_{b\oplus1}} \in \procorder{\msc}{\tau}{i}$.

1260: If $u_b < u_a$ then $u_b < u_{a\oplus1}$ or

1261: $\zug{u_b,u_{a\oplus1}} \in \procorder{\msc}{\tau}{i}$.

1262: In both cases, we have a contradiction since the cycle can be

1263: made shorter.

1264:

1265: Case (2).

1266: From Lemma~\ref{lemma:almost-ord}, either

1267: $\zug{u_a,u_b} \in \locorder{\expand{\witness}}{\tau}{i}$ or

1268: $\zug{u_b,u_{a\oplus1}} \in \locorder{\expand{\witness}}{\tau}{i}$.

1269: In both cases, we have a contradiction since the cycle can be made

1270: shorter.

1271: \end{proof}

1272:

1273:

1274: \section{Symmetry reduction}

1275: \label{sec:symmetry}

1276: Suppose $\msystem(n,m,v)$ is a memory system for some $n,m,v \geq 1$.

1277: In this section, we use symmetry arguments to further reduce the class

1278: of cycles that need to be detected in constraint graphs.

1279: Each $k$-nice cycle has $2 \times k$ edges with one edge each for $k$

1280: different processors and $k$ different locations.

1281: These edges can potentially occur in any order

1282: yielding a set of isomorphic cycles.

1283: But if the memory system $\msystem(n,m,v)$

1284: is symmetric with respect to processor and memory location ids,

1285: presence of any one of

1286: the isomorphic nice cycles implies the existence of a nice

1287: cycle in which the edges are arranged in a canonical order.

1288: Thus, it suffices to search for a cycle with edges in a

1289: canonical order.

1290:

1291: % Hence, we make Assumptions~\ref{ass:msystem-proc-sym}

1292: % and~\ref{ass:msystem-loc-sym} about $\msystem$,

1293: % % Assumptions~\ref{ass:mmodel-proc-sym}

1294: % % and~\ref{ass:mmodel-loc-sym} about $\mmodel$,

1295: % and Assumptions~\ref{ass:witness-proc-sym} and~\ref{ass:witness-loc-sym}

1296: % about $\witness$.

1297: % For any $k \geq 1$, we say that a function

1298: % $\lambda:\nat\rightarrow\nat$ is a permutation on $\nat_k$

1299: % if $\lambda$ restricted to $\nat_k$ maps one-to-one to $\nat_k$.

1300: % For any permutation $\lambda$ we define functions $\lambda^p$ and

1301: % $\lambda^l$ in the following way.

1302: % For all $e=\zug{a,b,c,d} \in \memevent(n,m,v)$, we define

1303: % $\lambda^p(e) = \zug{a,\lambda(b),c,d}$ and

1304: % $\lambda^l(e) = \zug{a,b,\lambda(c),d}$.

1305: % In other words, the function $\lambda^p$ permutes the processor ids

1306: % and the function $\lambda^l$ permutes the location ids

1307: % of memory events.

1308: % The functions $\lambda^p$ and $\lambda^l$ are extended to sequences in

1309: % $\memevent(n,m,v)^*$ in the natural way.

1310:

1311: We discuss processor symmetry in Section~\ref{sec:proc-sym} and

1312: location symmetry in Section~\ref{sec:loc-sym}.

1313: We combine processor and location symmetry to demonstrate the

1314: reduction from nice cycles to canonical nice cycles in

1315: Section~\ref{sec:comb}.

1316:

1317: \subsection{Processor symmetry}

1318: \label{sec:proc-sym}

1319: For any permutation $\lambda$ on $\nat_n$, the function

1320: $\lambda^p$ on $\memevent(n,m,v)$ permutes the processor ids of events

1321: according to $\lambda$.

1322: Formally, for all $e=\zug{a,b,c,d} \in \memevent(n,m,v)$, we define

1323: $\lambda^p(e) = \zug{a,\lambda(b),c,d}$.

1324: The function $\lambda^p$ is extended to sequences in

1325: $\memevent(n,m,v)^*$ in the natural way.

1326:

1327: \begin{assumption}[Processor symmetry]

1328: \label{ass:msystem-proc-sym}

1329: For every permutation $\lambda$ on $\nat_n$ and for all traces $\tau$

1330: of the memory system $\msystem(n,m,v)$, we have that $\lambda^p(\tau)$

1331: is a trace of $\msystem(n,m,v)$.

1332: \end{assumption}

1333:

1334: We argue informally that the memory system in Figure~\ref{fig:example}

1335: satisfies Assumption~\ref{ass:msystem-proc-sym}.

1336: The operations performed by the various parameterized actions on

1337: the state variables that store processor ids are symmetric.

1338: Suppose $s$ is a state of the system.

1339: We denote by $\lambda^p(s)$ the state obtained by permuting

1340: the values of variables that store processors ids according to $\lambda$.

1341: Then, for example, if the action $\UPDATE(i)$ in some state $s$ yields state

1342: $t$, then the action $\UPDATE(\lambda(i))$ in state $\lambda^p(s)$

1343: yields the state $\lambda^p(t)$.

1344: Thus, from any run $\sigma$ we can construct another run

1345: $\lambda^p(\sigma)$.

1346: If a shared-memory system is described with symmetric types, such as

1347: scalarsets \cite{IpDill96}, used to model variables containing

1348: processor ids, then it has the property of processor symmetry by

1349: construction.

1350: % The variables of a memory system containing processor ids can be

1351: % modeled with symmetric types, such as scalarsets \cite{IpDill96}.

1352: % Such systems have the property of processor symmetry by construction.

1353:

1354: The following lemma states that the sequential consistency memory

1355: model is symmetric with respect to processor ids.

1356: It states that two events in a trace $\tau$ ordered by sequential

1357: consistency remain ordered under any permutation of processor ids.

1358: \begin{lemma}

1359: \label{lemma:mmodel-proc-sym}

1360: Suppose $\lambda$ is a permutation on $\nat_n$.

1361: Suppose $\tau$ and $\tau'$ are traces of $\msystem(n,m,v)$ such that

1362: $\tau' = \lambda^p(\tau)$.

1363: Then for all $1 \leq x,y \leq |\tau|$, and for all $1 \leq i \leq n$,

1364: we have that $\zug{x,y} \in \procorder{\msc}{\tau}{i}$ iff

1365: $\zug{x,y} \in \procorder{\msc}{\tau'}{\lambda(i)}$.

1366: \end{lemma}

1367: \begin{proof}

1368: For all $1 \leq x,y \leq |\tau|$ and for all $1 \leq i \leq n$, we have that

1369: \[\begin{array}{ll}

1370: & \zug{x,y} \in \procorder{\msc}{\tau}{i} \\

1371: \Leftrightarrow

1372: & \proc(\tau(x)) = \proc(\tau(y)) = i~\mathrm{and}~x < y \\

1373: \Leftrightarrow

1374: & \proc(\tau'(x)) = \proc(\tau'(y)) = \lambda(i)~\mathrm{and}~x < y \\

1375: \Leftrightarrow

1376: & \zug{x,y} \in \procorder{\msc}{\tau'}{\lambda(i)}.

1377: \end{array}\]

1378: \end{proof}

1379:

1380: The following lemma states that the partial order $\expand{\witness}$

1381: obtained from a simple witness $\witness$ is

1382: symmetric with respect to processor ids.

1383: It states that two events to location $i$

1384: ordered by $\locorder{\expand{\witness}}{\tau}{i}$

1385: in a trace $\tau$ remain

1386: ordered under any permutation of processor ids.

1387: \begin{lemma}

1388: \label{lemma:expand-witness-proc-sym}

1389: Suppose $\witness$ is a simple witness for the memory system

1390: $\msystem(n,m,v)$ and $\lambda$ is a permutation on $\nat_n$.

1391: Suppose $\tau$ and $\tau'$ are unambiguous traces of $\msystem(n,m,v)$

1392: such that $\tau' = \lambda^p(\tau)$.

1393: Then for all $1 \leq x,y \leq |\tau|$ and for all $1 \leq i \leq m$,

1394: we have that $\zug{x,y} \in \locorder{\expand{\witness}}{\tau}{i}$ iff

1395: $\zug{x,y} \in \locorder{\expand{\witness}}{\tau'}{i}$.

1396: \end{lemma}

1397: \begin{proof}

1398: We have $\zug{x,y} \in \locorder{\witness}{\tau}{i}$ iff $x < y$ iff

1399: $\zug{x,y} \in \locorder{\witness}{\tau'}{i}$.

1400: From the definition of $\locorder{\expand{\witness}}{\tau}{i}$ we have the

1401: following three cases.

1402: \begin{enumerate}

1403: \item

1404: $\data(\tau(x)) = \data(\tau(y))$, $\op(\tau(x)) = \wrt$,

1405: $\op(\tau(y)) = \rd$ iff

1406: $\data(\tau'(x)) = \data(\tau'(y))$, $\op(\tau'(x)) = \wrt$,

1407: $\op(\tau'(y)) = \rd$.

1408: \item

1409: $\data(\tau(x)) = 0$ and $\data(\tau(y)) \neq 0$ iff

1410: $\data(\tau'(x)) = 0$ and $\data(\tau'(y)) \neq 0$.

1411: \item

1412: $\exists a,b \in L^w(\tau,i)$ such that

1413:     $a < b$,

1414:     $\data(\tau(a)) = \data(\tau(x))$,

1415:     $\data(\tau(b)) = \data(\tau(y))$ iff

1416: $\exists a,b \in L^w(\tau',i)$ such that

1417:     $a < b$,

1418:     $\data(\tau'(a)) = \data(\tau'(x))$,

1419:     $\data(\tau'(b)) = \data(\tau'(y))$.

1420: \end{enumerate}

1421: \end{proof}

1422:

1423: \subsection{Location symmetry}

1424: \label{sec:loc-sym}

1425: For any permutation $\lambda$ on $\nat_m$, the function

1426: $\lambda^l$ on $\memevent(n,m,v)$ permutes the location ids of events

1427: according to $\lambda$.

1428: Formally, for all $e=\zug{a,b,c,d} \in \memevent(n,m,v)$, we define

1429: $\lambda^l(e) = \zug{a,b,\lambda(c),d}$.

1430: The function $\lambda^l$ is extended to sequences in

1431: $\memevent(n,m,v)^*$ in the natural way.

1432:

1433: \begin{assumption}[Location symmetry]

1434: \label{ass:msystem-loc-sym}

1435: For every permutation $\lambda$ on $\nat_m$ and for all traces

1436: $\tau$ of the memory system $\msystem(n,m,v)$,

1437: we have that $\lambda^l(\tau)$ is a trace of $\msystem(n,m,v)$.

1438: \end{assumption}

1439:

1440: We can argue informally that the memory system in

1441: Figure~\ref{fig:example} satisfies

1442: Assumption~\ref{ass:msystem-loc-sym} also.

1443: The operations performed by the various parameterized actions on

1444: the state variables that store location ids are symmetric.

1445: Suppose $s$ is a state of the system.

1446: We denote by $\lambda^l(s)$ the state obtained by permuting

1447: the values of variables that store location ids according to $\lambda$.

1448: Then, for example, if the action $\UPDATE(i)$ in some state $s$ yields state

1449: $t$, then the action $\UPDATE(\lambda(i))$ in state $\lambda^l(s)$

1450: yields the state $\lambda^l(t)$.

1451: If scalarsets are used for modeling variables containing location ids,

1452: the shared-memory system will have the property of location symmetry

1453: by construction.

1454:

1455: The following lemma states that the sequential consistency memory

1456: model is symmetric with respect to location ids.

1457: It states that two events in a trace $\tau$ ordered by sequential

1458: consistency remain ordered under any permutation of location ids.

1459:

1460: \begin{lemma}

1461: \label{lemma:mmodel-loc-sym}

1462: Suppose $\lambda$ is a permutation on $\nat_m$.

1463: Suppose $\tau$ and $\tau'$ are traces of $\msystem(n,m,v)$ such that

1464: $\tau' = \lambda^l(\tau)$.

1465: Then for all $1 \leq x,y \leq |\tau|$, and for all $1 \leq i \leq n$,

1466: we have that $\zug{x,y} \in \procorder{\msc}{\tau}{i}$ iff

1467: $\zug{x,y} \in \procorder{\msc}{\tau'}{i}$.

1468: \end{lemma}

1469: \begin{proof}

1470: For all $1 \leq x,y \leq |\tau|$ and for all $1 \leq i \leq m$, we have that

1471: \[\begin{array}{ll}

1472: & \zug{x,y} \in \procorder{\msc}{\tau}{i} \\

1473: \Leftrightarrow

1474: & \proc(\tau(x)) = \proc(\tau(y)) = i~\mathrm{and}~x < y \\

1475: \Leftrightarrow

1476: & \proc(\tau'(x)) = \proc(\tau'(y)) = i~\mathrm{and}~x < y \\

1477: \Leftrightarrow

1478: & \zug{x,y} \in \procorder{\msc}{\tau'}{i}.

1479: \end{array}\]

1480: \end{proof}

1481:

1482: The following lemma states that the partial order $\expand{\witness}$

1483: obtained from a simple witness $\witness$ is

1484: symmetric with respect to location ids.

1485: It states that two events to location $i$

1486: ordered by $\locorder{\expand{\witness}}{\tau}{i}$

1487: in a trace $\tau$ remain

1488: ordered under any permutation of location ids.

1489:

1490: \begin{lemma}

1491: \label{lemma:expand-witness-loc-sym}

1492: Suppose $\witness$ is a simple witness for the memory system

1493: $\msystem(n,m,v)$ and $\lambda$ is a permutation on $\nat_m$.

1494: Suppose $\tau$ and $\tau'$ are unambiguous traces of $\msystem(n,m,v)$ such that

1495: $\tau' = \lambda^l(\tau)$.

1496: Then for all $1 \leq x,y \leq |\tau|$ and for all $1 \leq i \leq m$,

1497: we have that $\zug{x,y} \in \locorder{\expand{\witness}}{\tau}{i}$ iff

1498: $\zug{x,y} \in \locorder{\expand{\witness}}{\tau'}{\lambda(i)}$.

1499: \end{lemma}

1500: \begin{proof}

1501: We have $\zug{x,y} \in \locorder{\witness}{\tau}{i}$ iff $x < y$ iff

1502: $\zug{x,y} \in \locorder{\witness}{\tau'}{\lambda(i)}$.

1503: From the definition of $\locorder{\expand{\witness}}{\tau}{i}$ we have the

1504: following three cases.

1505: \begin{enumerate}

1506: \item

1507: $\data(\tau(x)) = \data(\tau(y))$, $\op(\tau(x)) = \wrt$,

1508: $\op(\tau(y)) = \rd$ iff

1509: $\data(\tau'(x)) = \data(\tau'(y))$, $\op(\tau'(x)) = \wrt$,

1510: $\op(\tau'(y)) = \rd$.

1511: \item

1512: $\data(\tau(x)) = 0$ and $\data(\tau(y)) \neq 0$ iff

1513: $\data(\tau'(x)) = 0$ and $\data(\tau'(y)) \neq 0$.

1514: \item

1515: $\exists a,b \in L^w(\tau,i)$ where %such that

1516:     $a < b$,

1517:     $\data(\tau(a)) = \data(\tau(x))$, and

1518:     $\data(\tau(b)) = \data(\tau(y))$ iff

1519: $\exists a,b \in L^w(\tau',\lambda(i))$ where %such that

1520:     $a < b$,

1521:     $\data(\tau'(a)) = \data(\tau'(x))$, and

1522:     $\data(\tau'(b)) = \data(\tau'(y))$.

1523: \end{enumerate}

1524: \end{proof}

1525:

1526: \subsection{Combining processor and location symmetry}

1527: \label{sec:comb}

1528: We fix some $k \geq 1$ and

1529: use the symbol $\oplus$ to denote addition over

1530: the additive group with elements $\nat_k$ and identity element $k$.

1531: A $k$-nice cycle $u_1,v_1,\ldots,u_k,v_k$

1532: is {\em canonical\/} if

1533: $\zug{u_x,v_x} \in \procorder{\msc}{\tau}{x}$ and

1534: $\zug{v_x,u_{x\oplus1}}~\in~\locorder{\expand{\witness}}{\tau}{x\oplus1}$

1535: for all $1 \leq x \leq k$.

1536: In other words, the processor edges in a canonical nice cycle are

1537: arranged in increasing order of processor ids.

1538: Similarly, the location edges are arranged in increasing order of

1539: location ids.

1540: The following theorem claims that if the constraint graph of a run has

1541: a nice cycle then there is some run with a canonical nice cycle as well.

1542: % The number of $k$-nice cycles isomorphic to each other is equal to

1543: % $\fact{(k-1)} \times \fact{k}$.

1544: % varies as the product of the number of permutations

1545: % of the processors and the number of permutations of memory locations.

1546:

1547: \begin{theorem}

1548: \label{thm:cycle-canonical}

1549: Suppose $\witness$ is a simple witness for the memory system

1550: $\msystem(n,m,v)$.

1551: Let $\tau$ be an unambiguous trace of $\msystem(n,m,v)$.

1552: If the graph $G(\witness)(\tau)$ has a $k$-nice cycle, then

1553: there is an unambiguous trace $\tau''$ of $\msystem(n,m,v)$ such that

1554: $G(\witness)(\tau'')$ has a canonical $k$-nice cycle.

1555: \end{theorem}

1556: \begin{proof}

1557: Let $u_1,v_1,\ldots,u_k,v_k$ be a

1558: $k$-nice cycle in $G(\witness)(\tau)$.

1559: Let $1 \leq i_1,\ldots,i_k \leq n$ and $1 \leq j_1,\ldots,j_k

1560: \leq m$ be such that $\zug{u_x,v_x} \in \procorder{\msc}{\tau}{i_x}$ and

1561: $\zug{v_x,u_{x\oplus1}} \in \locorder{\expand{\witness}}{\tau}{j_{x\oplus1}}$

1562: for all $1 \leq x \leq k$.

1563: Let $\alpha$ be a permutation on $\nat_n$ that

1564: maps $i_x$ to $x$ for all $1 \leq x \leq k$.

1565: Then from Assumption~\ref{ass:msystem-proc-sym} there is a trace

1566: $\tau'$ of $\msystem(n,m,v)$ such that $\tau' = \alpha^p(\tau)$.

1567: Let $\beta$ be a permutation on $\nat_m$ that maps $j_x$ to $x$

1568: for all $1 \leq x \leq k$.

1569: Then from Assumption~\ref{ass:msystem-loc-sym} there is a trace $\tau''$

1570: of $\msystem(n,m,v)$ such that $\tau'' = \beta^l(\tau')$.

1571: For all $1 \leq x \leq k$, we have that

1572: \[\begin{array}{lll}

1573: & \zug{u_x,v_x} \in \procorder{\msc}{\tau}{i_x} & \\

1574: \Leftrightarrow

1575: & \zug{u_x,v_x} \in \procorder{\msc}{\tau'}{\alpha(i_x)} =

1576: \procorder{\msc}{\tau'}{x}

1577: & \mathrm{from~Lemma~\ref{lemma:mmodel-proc-sym}} \\

1578: \Leftrightarrow

1579: & \zug{u_x,v_x} \in \procorder{\msc}{\tau''}{x}

1580: & \mathrm{from~Lemma~\ref{lemma:mmodel-loc-sym}}.

1581: \end{array}\]

1582: For all $1 \leq x \leq k$, we also have that

1583: \[\begin{array}{lll}

1584: & \zug{v_x,u_{x\oplus1}} \in \locorder{\expand{\witness}}{\tau}{j_{x\oplus1}} & \\

1585: \Leftrightarrow

1586: & \zug{v_x,u_{x\oplus1}} \in \locorder{\expand{\witness}}{\tau'}{j_{x\oplus1}}

1587: & \mathrm{from~Lemma~\ref{lemma:expand-witness-proc-sym}} \\

1588: \Leftrightarrow

1589: & \zug{v_x,u_{x\oplus1}} \in

1590: \locorder{\expand{\witness}}{\tau''}{\beta(j_{x\oplus1})}

1591: = \locorder{\expand{\witness}}{\tau''}{x\oplus1}

1592: & \mathrm{from~Lemma~\ref{lemma:expand-witness-loc-sym}}.

1593: \end{array}\]

1594: Therefore $u_1,v_1,\ldots,u_k,v_k$ is a canonical

1595: $k$-nice cycle in $G(\witness)(\tau'')$.

1596: \end{proof}

1597:

1598: Finally, Corollary~\ref{thm:unamb-witness-sat} and

1599: Theorems~\ref{thm:cycle-nice} and~\ref{thm:cycle-canonical} yield the following

1600: theorem.

1601: \begin{corollary}

1602: \label{thm:nice-canonical-cycle-sat}

1603: Suppose there is a simple

1604: witness $\witness$ such that for all unambiguous traces $\tau$ of

1605: $\msystem(n,m)$

1606: the graph $G(\witness)(\tau)$ does not have a canonical

1607: $k$-nice cycle for all $1 \leq k \leq \minimum{\{n,m\}}$.

1608: Then every trace of $\msystem(n,m)$ is sequentially consistent.

1609: \end{corollary}

1610:

1611: % Corollary~\ref{thm:nice-canonical-cycle-sat} presents a sufficient

1612: % condition for checking that every run in $\msystem(n,m,v)$ satisfies

1613: % sequential consistency.

1614: % Suppose we prove that for a simple witness $\witness$ and for every

1615: % unambiguous trace $\tau$ of $\msystem(n,m)$, there is no canonical $k$-nice

1616: % cycle in $G(\witness)(\tau)$ for any $1 \leq k \leq

1617: % \minimum{\{n,m\}}$.

1618: % Then every run in $\msystem(n,m)$ satisfies sequential consistency.

1619: % In particular, every run in $\msystem(n,m,v)$ satisfies sequential

1620: % consistency.

1621: % In the next section, we present a model checking algorithm for performing

1622: % this check.

1623:

1624: \section{Model checking memory systems}

1625: \label{sec:model-check}

1626:

1627: \begin{figure}

1628: \subfigure{

1629: \parbox{2in}{

1630: {\scriptsize

1631: \begin{tabbing}

1632: \hspace{0.2cm} \= xx \= xx \= xx \= xx \= xx \= xx \= \kill

1633: Automaton $\cons_k(j)$ for $1 \leq j \leq k$ \\

1634: States $\{a,b\}$ \\

1635: Initial state $a$ \\

1636: Accepting states $\{a,b\}$ \\

1637: Alphabet $\memevent(n,m,2)$ \\

1638: Transitions \\

1639: $[]~\neg (\op(e) = \wrt \wedge \loc(e) = j)$ \\

1640: \>\> $\rightarrow s' = s$ \\

1641: $[]~s = a \wedge \op(e) = \wrt \wedge \loc(e) = j \wedge \data(e) = 0$ \\

1642: \>\> $\rightarrow s' = a$ \\

1643: $[]~s = a \wedge \op(e) = \wrt \wedge \loc(e) = j \wedge \data(e) = 1$ \\

1644: \>\> $\rightarrow s' = b$ \\

1645: $[]~s = b \wedge \op(e) = \wrt \wedge \loc(e) = j \wedge \data(e) = 2$ \\

1646: \>\> $\rightarrow s' = b$

1647: \end{tabbing}

1648: }

1649: }

1650: }\quad

1651: \subfigure{

1652: \parbox{2in}{

1653: {\scriptsize

1654: \begin{tabbing}

1655: \hspace{0.2cm} \= xx \= xx \= xx \= xx \= xx \= xx \= \kill

1656: Automaton $\cons_k(j)$ for $k < j \leq m$ \\

1657: States $\{a\}$ \\

1658: Initial state $a$ \\

1659: Accepting states $\{a\}$ \\

1660: Alphabet $\memevent(n,m,2)$ \\

1661: Transitions \\

1662: $[]~\neg (\op(e) = \wrt \wedge \loc(e) = j) \vee \data(e) = 0$ \\

1663: \>\> $\rightarrow s' = s$

1664: \end{tabbing}

1665: }

1666: }

1667: }

1668: \caption{Automaton $\cons_k(j)$}

1669: \label{fig:cons}

1670: \end{figure}

1671:

1672: \begin{figure}

1673: {\scriptsize

1674: \begin{tabbing}

1675: \hspace{0.2cm} \= xx \= xx \= xx \= xx \= xx \= xx \= \kill

1676: Automaton $\checker_k(i)$ \\

1677: States $\{a,b,\err\}$ \\

1678: Initial state $a$ \\

1679: Accepting states $\{\err\}$ \\

1680: Alphabet $\memevent(n,m,2)$ \\

1681: Transitions \\

1682: $[]~s = a \wedge \proc(e) = i \wedge \loc(e) = i \wedge \data(e) \in \{1,2\}$ \\

1683: \>\> $ \rightarrow s' = b$ \\

1684: $[]~s = b \wedge \proc(e) = i \wedge \loc(e) = i\oplus1 \wedge

1685: (\data(e) = 0 \vee (\op(e) = \wrt \wedge \data(e) = 1))$ \\

1686: \>\> $\rightarrow s' = \err$ \\

1687: $[]~\mathit{otherwise}$ \\

1688: \>\> $\rightarrow s' = s$

1689: \end{tabbing}

1690: }

1691: \caption{Automaton $\checker_k(i)$}

1692: \label{fig:automata-sc}

1693: \end{figure}

1694:

1695: Suppose $\msystem(n,m,v)$ is a memory system for some $n,m,v \geq 1$.

1696: Let $\witness$ be a simple witness for $\msystem(n,m)$.

1697: In this section, we present a model checking algorithm that,

1698: given a $k$ such that $1 \leq k \leq \minimum{\{n,m\}}$,

1699: determines whether there is a trace $\tau$ in $\msystem(n,m)$ such

1700: that the graph $G(\witness)(\tau)$ has a canonical $k$-nice cycle.

1701: Corollary~\ref{thm:nice-canonical-cycle-sat} then allows us to verify

1702: sequential consistency on $\msystem(n,m,v)$ by

1703: $\minimum{\{n,m\}}$ such model checking lemmas.

1704: We fix some $k$ such that $1 \leq k \leq \minimum{\{n,m\}}$.

1705: We use the symbol $\oplus$ to denote addition over

1706: the additive group with elements $\nat_k$ and identity element $k$.

1707: The model checking algorithm makes use of $m$ automata named $\cons_k(j)$

1708: for $1 \leq j \leq m$, and $k$ automata named $\checker_k(i)$ for

1709: $1 \leq i \leq k$.

1710: We define these automata formally below.

1711:

1712: % Let us fix some $k$ such that $1 \leq k \leq \minimum{\{n,m\}}$.

1713: % We construct $m$ automata, one for each location, where the automaton

1714: % for location $j$ is denoted by $\cons_k(j)$.

1715: % The automaton $\cons_k(j)$ when composed with $\msystem(n,m,v)$

1716: % restricts the sequence of values written by write

1717: % events to location $j$.

1718: % We also construct $k$ automata, one for each processor from $1$ to

1719: % $k$, where the automaton for processor $i$ is denoted by

1720: % $\checker_k(i)$.

1721: % The automaton $\checker_k(i)$ when composed with $\msystem(n,m,v)$

1722: % monitors the sequence of events at processor $i$ and accepts runs with a certain

1723: % sequence of two events.

1724:

1725: For all memory locations $1 \leq j \leq m$, let $\cons_k(j)$ be

1726: the regular set of sequences in $\memevent(n,m,2)$ represented

1727: by the automaton in Figure~\ref{fig:cons}.

1728: The automaton $\cons_k(j)$, when composed with $\msystem(n,m,v)$,

1729: constrains the write events to location $j$.

1730: If $1 \leq j \leq k$ then $\cons_k(j)$ accepts traces where the first

1731: few ($0$ or more) write events have data value $0$ followed by exactly one write

1732: with data value $1$ followed by ($0$ or more) write events with data value $2$.

1733: If $k < j \leq m$ then $\cons_k(j)$ accepts traces where all writes to

1734: location $j$ have data value $0$.

1735:

1736: For all $1 \leq i \leq k$, let $\checker_k(i)$ be the regular

1737: set of sequences in $\memevent(n,m,2)$ represented by the automaton in

1738: Figure~\ref{fig:automata-sc}.

1739: The automaton $\checker_k(i)$ accepts a trace

1740: $\tau$ if there are events $x$ and $y$ at processor $i$, with $x$

1741: occurring before $y$, such that $x$ is an event to location $i$ with

1742: data value $1$ or $2$ and $y$ is an event to location $i \oplus 1$

1743: with data value $0$ or $1$.

1744: Moreover, the event $y$ is required to be a write event if its data

1745: value is $1$.

1746:

1747: % $1 \leq x < y \leq |\tau|$ such that $x,y \in

1748: % P(\tau,i)$, $x \in L(\tau,i)$, $y \in L(\tau,i \oplus 1)$,

1749: % $\data(\tau(x)) \in \{1,2\}$, and

1750: % either $\data(\tau(y)) = 0$ or

1751: % $\op(\tau(y)) = \wrt$ and $\data(\tau(y)) = 1$.

1752: % In other words, the automaton $\checker_k(i)$ observes only the events

1753: % at processor $i$ and accepts a trace if there is an event $x$ to

1754: % location $i$ with data value $1$ or $2$, followed by an event $y$ to

1755: % location $i \oplus 1$ with data value $0$ or $1$.

1756: % The event $y$ is required to be a write event if its data value is $1$.

1757:

1758: \begin{figure}

1759: \begin{center}

1760: \input{cycle.pstex_t}

1761: \end{center}

1762: \caption{Canonical $k$-nice cycle}

1763: \label{fig:cycle}

1764: \end{figure}

1765:

1766: In order to check for canonical $k$-nice cycles, we compose the memory system

1767: $\msystem(n,m,2)$ with $\cons_k(j)$ for all $1 \leq j \leq m$ and with

1768: $\checker_k(i)$ for all $1 \leq i \leq k$ and use a model checker to determine if

1769: the resulting automaton has a run.

1770:

1771: Any accepting run of the composed system has $2 \times k$ events which

1772: can be arranged as shown in Figure~\ref{fig:cycle} to yield a

1773: canonical $k$-nice cycle.

1774: Each processor $i$ for $1 \leq i \leq k$ and

1775: each location $j$ for $1 \leq j \leq k$ supplies $2$ events.

1776: Each event is marked by a $4$-tuple denoting the possible values for

1777: that event.

1778: For example, the $4$-tuple $\langle \{\rd,\wrt\},1,1,\{1,2\} \rangle$

1779: denotes a read event or a write event by processor $1$ to location $1$ with

1780: data value $1$ or $2$.

1781: The edge labeled by $\msc(\tau,i)$ is due

1782: to the total order imposed by sequential consistency on the events at processor $i$.

1783: The edge labeled by $\expand{\witness}(\tau,j)$ is due to the partial

1784: order imposed by the simple witness on the events to location $j$.

1785: For example, consider the edge labeled $\expand{\witness}(\tau,2)$

1786: with the source event labeled by

1787: $\langle \{\rd,\wrt\},1,2,0 \rangle \vee \langle \wrt,1,2,1 \rangle$

1788: and the sink event labeled by

1789: $\langle \{\rd,\wrt\},2,2,\{1,2\} \rangle$.

1790: In any run of the composed system, the write events to location $2$

1791: with value $0$ occur before the write event with value $1$ which

1792: occurs before the write events with value $2$.

1793: Since $\witness$ is a simple witness,

1794: the partial order $\expand{\witness}(\tau,2)$

1795: orders all events labeled with $0$ before

1796: all events labeled with $1$ or $2$.

1797: Hence any event denoted by $\langle \{\rd,\wrt\},1,2,0 \rangle$ is

1798: ordered before any event denoted by $\langle \{\rd,\wrt\},2,2,\{1,2\} \rangle$.

1799: Moreover, the unique write event to location $2$ with data value $1$

1800: is ordered before any other events with value $1$ or $2$.

1801: Hence the event $\langle \wrt,1,2,1 \rangle$ is ordered before

1802: any event denoted by $\langle \{\rd,\wrt\},2,2,\{1,2\} \rangle$.

1803:

1804: We have given an intuitive argument above that a canonical $k$-nice

1805: cycle can be constructed from any run in the composed system.

1806: The following theorem proves that it is necessary and sufficient to

1807: check that the composed system has a run.

1808:

1809: \begin{theorem}

1810: \label{thm:reduction}

1811: % Suppose $\msystem(n,m,v)$ is a memory system for some

1812: % $n,m,v \geq 1$, and $\witness$ is a simple witness for it.

1813: There is a canonical

1814: $k$-nice cycle in $G(\witness)(\tau)$ for some unambiguous trace $\tau$

1815: of $\msystem(n,m)$

1816: iff there is a trace $\tau'$ of $\msystem(n,m,2)$ such that

1817: $\tau' \in \cons_k(j)$ for all $1 \leq j \leq m$ and

1818: $\tau' \in \checker_k(i)$ for all $1 \leq i \leq k$.

1819: \end{theorem}

1820: \begin{proof}

1821: $(\Rightarrow)$

1822: Suppose there is a canonical $k$-nice cycle

1823: $u_1,v_1,\ldots,u_{k},v_{k}$ in the graph

1824: $G(\witness)(\tau)$ for some unambiguous trace $\tau$ of $\msystem(n,m)$.

1825: Then $\zug{u_x,v_x} \in \procorder{\msc}{\tau}{x}$ and

1826: $\zug{v_x, u_{x\oplus1}} \in \locorder{\expand{\witness}}{\tau}{x\oplus1}$

1827: for all $1 \leq x \leq k$.

1828: From the definition of $\locorder{\expand{\witness}}{\tau}{x}$, we

1829: have that $\data(\tau(x)) \neq 0$ for all $1 \leq x \leq k$.

1830: Therefore, for all $1 \leq x \leq k$, there is a unique write event

1831: $w_x$ such that $\data(\tau(w_x)) = \data(\tau(u_x))$.

1832:

1833: For all $1 \leq j \leq m$, let $V_j$ be the set of data values written

1834: by the write events to location $j$ in $\tau$, and let

1835: $f_j:V_j \rightarrow \nat_{|\tau|}$ be the function such that $f_j(v)$

1836: is the index of the unique write event to location $j$ with data value $v$.

1837: We define a renaming function $\lambda: \nat_m \times \natzero \rightarrow

1838: \natzero_2$ as follows.

1839: For all $k < j \leq m$ and $v \in \natzero$,

1840: we have $\lambda(j,x) = 0$.

1841: For all $1 \leq j \leq k$ and $v \in \natzero$, we split the

1842: definition into two cases.

1843: For $v \in V_j$, we have

1844: \[

1845: \begin{array}{lll}

1846: \lambda(j,v) = & 0, & \mathrm{if}~f_j(v) < w_j \\

1847: & 1, & \mathrm{if}~f_j(v) = w_j \\

1848: & 2, & \mathrm{if}~f_j(v) > w_j.

1849: \end{array}

1850: \]

1851: For $v \not \in V_j$, we have

1852: \[

1853: \begin{array}{lll}

1854: \lambda(j,v) = & 0, & \mathrm{if}~v = 0 \\

1855: & 2, & \mathrm{if}~v \neq 0.

1856: \end{array}

1857: \]

1858: % For all locations $1 \leq j \leq m$, define $f_j:\nat \rightarrow

1859: % \nat$ to be the function given by $f_j(x) = u$ if $x =

1860: % \data(\tau(u))$ for some $u \in L^w(\tau,j)$ and $f_j(x) = x$

1861: % otherwise.

1862: % In other words, the value of $f_j$ at $x$ is the index of the unique write

1863: % event in $\tau$ that wrote $x$ if one exists, and $x$ otherwise.

1864: % We define a function $\lambda: \nat_m \times \nat \rightarrow

1865: % \natzero_2$ as follows.

1866: % For all $k < j \leq m$ and $x \in \nat$, we have $\lambda(j,x) = 0$.

1867: % For all $1 \leq j \leq k$ and $x \in \nat$, we have

1868: % \[

1869: % \begin{array}{lll}

1870: % \lambda(j,x) = & 0, & \mathrm{if}~f_j(x) < f_j(\data(\tau(u_j))) \\

1871: % & 1, & \mathrm{if}~f_j(x) = f_j(\data(\tau(u_j))) \\

1872: % & 2, & \mathrm{if}~f_j(x) > f_j(\data(\tau(u_j))).

1873: % \end{array}

1874: % \]

1875: From Assumption~\ref{ass:data-independence},

1876: there is a trace $\tau'$ of $\msystem(n,m,2)$ such that

1877: $\tau' = \lambda^d(\tau)$.

1878: In $\tau'$, for every location $j$ such that $1 \leq j \leq k$ every

1879: write event before $w_j$ has the data value 0, the write event at $w_j$

1880: has the data value 1, and the write events after $w_j$ have the data

1881: value $2$.

1882: Moreover, for every location $j$ such that $k < j \leq m$ every write

1883: event has the data value $0$.

1884: Therefore $\tau' \in \cons_k(i)$ for all $1 \leq i \leq k$.

1885:

1886: We show that $\tau' \in \checker_k(i)$ for all $1 \leq i \leq k$.

1887: Since $\zug{u_i,v_i} \in \procorder{\msc}{\tau}{i}$,

1888: we have that $u_i < v_i$ for all $1 \leq i \leq k$.

1889: We already have that $\data(\tau'(u_i)) = \data(\tau'(w_i)) = 1$ for all $1 \leq i \leq k$.

1890: Therefore all we need to show is that for all $1 \leq i \leq k$ we

1891: have $\data(\tau'(v_i)) = 0$ or $\op(\tau'(v_i)) = \wrt$ and

1892: $\data(\tau'(v_i)) = 1$.

1893: Since $\zug{v_i, u_{i\oplus1}} \in

1894: \locorder{\expand{\witness}}{\tau}{i\oplus1}$,

1895: one of the following conditions hold.

1896: \begin{enumerate}

1897: \item

1898: $\data(\tau(v_i)) = \data(\tau(u_{i\oplus1}))$, $\op(\tau(v_i)) = \wrt$,

1899: and $\op(\tau(u_{i\oplus1})) = \rd$.

1900: We have that $\op(\tau'(v_i)) = \op(\tau(v_i)) = \wrt$.

1901: Since $\data(\tau(v_i)) = \data(\tau(u_{i\oplus1}))$

1902: we have $\data(\tau'(v_i)) = \data(\tau'(u_{i\oplus1})) = 1$.

1903: Thus, we get $\op(\tau'(v_i)) = \wrt$ and

1904: $\data(\tau'(v_i)) = 1$.

1905: % Since $\data(\tau'(u_{i\oplus1})) = 1$, we get from the

1906: % definition of $\lambda$ that $\lambda(i\oplus1,\data(\tau(u_{i\oplus1}))) = 1$.

1907: % Since $\data(\tau(v_i)) = \data(\tau(u_{i\oplus1}))$,

1908: % we have that $\lambda(i\oplus1,\data(\tau(v_i))) = 1$.

1909: % Therefore $\data(\tau'(v_i)) = 1$.

1910: % Moreover, since $\op(\tau(v_i)) = \wrt$, we get that $\op(\tau'(v_i)) = \wrt$.

1911: \item

1912: $\data(\tau(v_i)) = 0$ and $\data(\tau(u_{i\oplus1})) \neq 0$.

1913: From the definition of $\lambda$, we get that $\data(\tau'(v_i)) = 0$.

1914: \item

1915: $\exists a \in L^w(\tau,i\oplus1)$ such that

1916:     $\zug{a,w_{i \oplus 1}} \in \locorder{\witness}{\tau}{i\oplus1}$ and

1917:     $\data(\tau(a)) = \data(\tau(v_i))$.

1918: %    $\data(\tau(b)) = \data(\tau(u_{i\oplus1}))$.

1919: % From the definition of $f_{i\oplus1}$, we get that

1920: % $f_{i\oplus1}(\data(\tau(v_i))) = a$ and

1921: % $f_{i\oplus1}(\data(\tau(u_{i\oplus1}))) = b$.

1922: Since $\zug{a,b} \in \locorder{\witness}{\tau}{i\oplus1}$ and

1923: $\witness$ is a simple witness we get $a < b$.

1924: Therefore $\lambda(i \oplus 1, \data(\tau(a))) = 0$.

1925: %$f_{i\oplus1}(\data(\tau(v_i))) < f_{i\oplus1}(\data(\tau(u_{i\oplus1})))$.

1926: Thus $\lambda(i\oplus1,\data(\tau(v_i))) = 0$ and

1927: $\data(\tau'(v_i)) = 0$.

1928: \end{enumerate}

1929: Thus, in all cases we have that either $\data(\tau'(v_i)) = 0$ or

1930: $\op(\tau'(v_i)) = \wrt$ and $\data(\tau'(v_i)) = 1$.

1931: Therefore $\tau' \in \checker_k(i)$.

1932:

1933: $(\Leftarrow)$

1934: Suppose there is a trace $\tau'$ of $\msystem(n,m,2)$ such that

1935: $\tau' \in \cons_k(j)$ for all $1 \leq j \leq m$ and

1936: $\tau' \in \checker_k(i)$ for all $1 \leq i \leq k$.

1937: %Let $\tau' = \trace{\sigma'}$.

1938: For all $1 \leq i \leq k$, let $1 \leq u_i < v_i \leq |\tau'|$ be such

1939: that the automaton $\checker_k(i)$ enters state $b$ for the first time

1940: on observing $\tau'(u_i)$ and enters state $\err$ for the first time on

1941: observing $\tau'(v_i)$.

1942: Therefore we have $\proc(\tau'(u_i)) = i$, $\loc(\tau'(u_i)) = i$, and

1943: $\data(\tau'(u_i)) \in \{1,2\}$.

1944: We also have $\proc(\tau'(v_i)) = i$, $\loc(\tau'(v_i)) = i\oplus1$, and

1945: either $\data(\tau'(v_i)) = 0$ or $\op(\tau'(v_i)) = \wrt$ and

1946: $\data(\tau'(v_i)) = 1$.

1947: From Assumption~\ref{ass:data-independence}, there is an unambiguous

1948: trace $\tau$ of $\msystem(n,m)$ and a renaming function $\lambda:\nat_m \times

1949: \natzero \rightarrow \natzero_2$ such that $\tau' = \lambda^d(\tau)$.

1950: %Let $\sigma$ be the run corresponding to $\tau$.

1951: We will show that $u_1,v_1,\ldots,u_k,v_k$ is a canonical $k$-nice

1952: cycle in $G(\witness)(\tau)$.

1953: Since $\proc(\tau(u_i)) = \proc(\tau(v_i)) = i$ and $u_i < v_i$, we

1954: have $\zug{u_i,v_i} \in \procorder{\msc}{\tau}{i}$ for all $1 \leq

1955: i \leq k$.

1956: We show that

1957: $\zug{v_i, u_{i\oplus1}} \in

1958: \locorder{\expand{\witness}}{\tau}{i\oplus1}$

1959: for all $1 \leq i \leq k$.

1960: First $\loc(\tau(v_i)) = \loc(\tau(u_{i\oplus1})) = i\oplus1$.

1961: For all $u,v \in L^w(\tau,i)$, if $\lambda(i,\data(\tau(u)) <

1962: \lambda(i,\data(\tau(v))$ then $u < v$ from the property of

1963: $\cons_k(i)$.

1964: Since $\data(\tau'(u_{i\oplus1})) \in \{1,2\}$,

1965: we have from the property of a renaming function that

1966: $\data(\tau(u_{i\oplus1})) \neq 0$.

1967: There are two cases on $\tau'(v_i)$.

1968: \begin{enumerate}

1969: \item

1970: $\data(\tau'(v_i)) = 0$.

1971: There are two subcases: $\data(\tau(v_i)) = 0$ or

1972: $\lambda(i\oplus1,\data(\tau(v_i))) = 0$.

1973: In the first subcase, since $\data(\tau(u_{i\oplus1})) \neq 0$, we have

1974: $\zug{v_i, u_{i\oplus1}} \in \locorder{\expand{\witness}}{\tau}{i\oplus1}$.

1975: In the second subcase, there are $a,b \in L^w(\tau,i\oplus1)$ such that

1976: $\data(\tau(a)) = \data(\tau(v_i))$ and $\data(\tau(b)) =

1977: \data(\tau(u_{i\oplus1}))$.

1978: Since $\data(\tau'(a)) = 0$ and $\data(\tau'(b)) \in \{1,2\}$, we get from the

1979: definition of $\cons_k(i\oplus1)$ that $a < b$ or

1980: $\zug{a,b} \in \locorder{\witness}{\tau}{i\oplus1}$.

1981: Therefore $\zug{v_i, u_{i\oplus1}} \in

1982: \locorder{\expand{\witness}}{\tau}{i\oplus1}$.

1983: \item

1984: $\op(\tau'(v_i)) = \wrt$ and $\data(\tau'(v_i)) = 1$.

1985: We have that $\op(\tau(v_i)) = \wrt$.

1986: There is an event $b \in L^w(\tau,i\oplus1)$ such that

1987: $\data(\tau(b)) = \data(\tau(u_{i\oplus1}))$.

1988: There are two subcases:

1989: $\data(\tau'(u_{i\oplus1})) = 1$ or

1990: $\data(\tau'(u_{i\oplus1})) = 2$.

1991: In the first subcase, we have $v_i=b$ since $\cons_k(i\oplus1)$ accepts

1992: traces with a single write event labeled with 1.

1993: Therefore $\data(\tau(v_i)) = \data(\tau(u_{i\oplus1}))$,

1994: $\op(\tau(v_i)) = \wrt$ and $\op(\tau(u_{i\oplus1})) = \rd$, and we

1995: get $\zug{v_i, u_{i\oplus1}} \in

1996: \locorder{\expand{\witness}}{\tau}{i\oplus1}$.

1997: In the second subcase, since

1998: $\data(\tau'(a)) = 1$ and $\data(\tau'(b)) = 2$, we get from the

1999: definition of $\cons_k(i\oplus1)$ that $a < b$ or

2000: $\zug{a,b} \in \locorder{\witness}{\tau}{i\oplus1}$.

2001: Therefore $\zug{v_i, u_{i\oplus1}} \in

2002: \locorder{\expand{\witness}}{\tau}{i\oplus1}$.

2003: \end{enumerate}

2004: Therefore $u_1,v_1,\ldots,u_k,v_k$ is a canonical $k$-nice

2005: cycle in $G(\witness)(\tau)$.

2006: \end{proof}

2007:

2008: % Theorem~\ref{thm:reduction} suggests a model checking algorithm for

2009: % checking if for all $v \geq 1$ the memory system $\msystem(n,m,v)$ is

2010: % sequentially consistent with respect to a simple witness $\witness$.

2011: % % We have to check for every $1 \leq k \leq \minimum{\{n,m\}}$ that

2012: % % there is no canonical $k$-nice cycle in $G(\witness)(\tau)$ for

2013: % % every trace $\tau$ of $\msystem(n,m)$.

2014: % For each $1 \leq k \leq \minimum{\{n,m\}}$, we can do a proof

2015: % by a model checker to show that there is no canonical $k$-nice cycle

2016: % in $G(\witness)(\tau)$ for every trace $\tau$ of $\msystem(n,m)$.

2017: % This is done in the following way: compose the memory system

2018: % $\msystem(n,m,2)$ with $\cons_k(j)$ for all $1 \leq j \leq m$ and with

2019: % $\checker_k(i)$ for all $1 \leq i \leq k$ and check if

2020: % the resulting system has a run.  The existence of a run implies the

2021: % existence of a canonical $k$-nice cycle.

2022:

2023: \begin{example}

2024: We now give an example to illustrate the method described in this section.

2025: Although the memory system in Figure~\ref{fig:example} is sequentially

2026: consistent, an earlier version had an error.

2027: The assignment $\owner[j] := 0$ was missing in the guarded command of

2028: the action $\zug{\SHDRSP,i,j}$.

2029: We modeled the system in TLA+ \cite{Lamport94} and model checked the system

2030: configuration with two processors and two locations using the model

2031: checker TLC \cite{YML99}.

2032: The error manifests itself while checking for the existence of a

2033: canonical $2$-nice cycle.

2034: The erroneous behavior is when the system starts in the initial

2035: state with all cache lines in $\SHD$ state and $\owner[1] = \owner[2]

2036: = 1$, and then executes the following sequence of 12 events:\\

2037: $1.~\zug{\EXCACK,2,2}$\\

2038: $2.~\zug{\UPDATE,2}$\\

2039: $3.~\zug{\SHDACK,1,2}$\\

2040: $4.~\zug{\EXCACK,2,2}$\\

2041: $5.~\zug{\EXCACK,1,1}$\\

2042: $6.~\zug{\UPDATE,1}$\\

2043: $7.~\zug{\UPDATE,1}$\\

2044: $8.~\zug{\wrt,1,1,1}$\\

2045: $9.~\zug{\rd,1,2,0}$\\

2046: $10.~\zug{\UPDATE,2}$\\

2047: $11.~\zug{\wrt,2,2,1}$\\

2048: $12.~\zug{\rd,2,1,0}$\\

2049: After event~2, $\owner[2] = 2$, $\cache[1][2].s = \INV$, and

2050: $\cache[2][2].s = \EXC$.

2051: Now processor~$1$ gets a shared ack message $\zug{\SHDACK,1,2}$ for

2052: location~$2$.

2053: Note that in the erroneous previous version of the example, this event

2054: does not set $\owner[2]$ to~$0$.

2055: Consequently $\owner[2] = 2$ and $\cache[2][2].s = \SHD$ after event~$3$.

2056: An exclusive ack to processor~$2$ for location~$2$ is therefore

2057: allowed to happen at event~$4$.

2058: Since the shared ack message to processor~$1$ in event~$3$ is still

2059: sitting in $\inQ[1]$, $\cache[1][2].s$ is still $\INV$.

2060: Therefore event~$4$ does not generate an $\INVAL$ message to

2061: processor~$1$ for location~$2$.

2062: At event~$5$, processor~$1$ gets an exclusive ack message for

2063: location~$1$.

2064: This event also inserts an $\INVAL$ message on location~$1$ in $\inQ[2]$

2065: behind the $\EXCRSP$ message on location~$2$.

2066: After the $\UPDATE$ events to processor~$1$ in events~$6$ and~$7$, we

2067: have $\cache[1][1].s = \EXC$ and $\cache[1][2].s = \SHD$.

2068: Processor~$1$ writes~$1$ to location~$1$ and reads~$0$ from

2069: location~$2$ in the next two events, thereby sending automaton

2070: $\checker_2(1)$ to the state $\err$.

2071: Processor~$2$ now processes the $\EXCRSP$ message to location~$2$ in

2072: the $\UPDATE$ event~$10$.

2073: Note that processor~$2$ does not process the $\INVAL$ message to location~$1$

2074: sitting in $\inQ[2]$.

2075: At this point, we have $\cache[2][1].s = \SHD$ and $\cache[2][2].s = \EXC$.

2076: Processor~$2$ writes~$1$ to location~$2$ and reads~$0$ from

2077: location~$1$ in the next two events, thereby sending automaton

2078: $\checker_2(2)$ to the state $\err$.

2079: Since there has been only one write event of data value~$1$ to each

2080: location, the run is accepted by $\cons_2(1)$ and $\cons_2(2)$ also.

2081: \end{example}

2082:

2083: Note that while checking for canonical $k$-nice cycles $\cons_k(j)$

2084: has $2$ states for all $1 \leq j \leq k$ and $1$ state for $k < j \leq

2085: m$.

2086: Also $\checker_k(i)$ has $3$ states for all $1 \leq i \leq k$.

2087: Therefore, by composing $\cons_k(j)$ and $\checker_k(i)$ with the

2088: memory system $\msystem(n,m,2)$ we increase the state of the system by a factor of at

2089: most $2^k \times 3^k$.

2090: Actually, for all locations $k < j \leq m$ we are restricting write

2091: events to have only the data value $1$.

2092: Therefore, in practice we might reduce the set of reachable states.

2093:

2094: \section{Related work}

2095: \label{sec:rel-work}

2096: Descriptions of shared-memory systems are parameterized by

2097: the number of processors, the number of memory locations, and the

2098: number of data values.

2099: The specification for such a system can

2100: be either an invariant or a shared-memory model.

2101: These specifications can be verified

2102: for some fixed values of the parameters or for

2103: arbitrary values of the parameters.

2104: The contribution of this paper is to provide a completely

2105: automatic method based on model checking to verify the sequential

2106: consistency memory model for fixed parameter values.

2107: We now describe the related work on verification of shared-memory

2108: systems along the two axes mentioned above.

2109:

2110: A number of papers have looked at invariant verification.

2111: Model checking has been used for fixed parameter values

2112: \cite{McMillanSchwalbe91,CGH93,EirikssonMcMillan95,IpDill96}, while

2113: mechanical theorem proving \cite{LoewensteinDill92,ParkDill96} has

2114: been used for arbitrary parameter values.

2115: Methods combining automatic abstraction with model checking

2116: \cite{PongDubois95,Delzanno00} have been used to verify snoopy

2117: cache-coherence protocols for arbitrary parameter values.

2118: McMillan \cite{McMillan01} has used a combination of

2119: theorem proving and model checking to verify the directory-based FLASH

2120: cache-coherence protocol \cite{KOH94} for arbitrary parameter values.

2121: A limitation of all these approaches is that they do not explicate the

2122: formal connection between the verified invariants and shared-memory

2123: model for the protocol.

2124:

2125: There are some papers that have looked at verification of shared-memory models.

2126: Systematic manual proof methods \cite{LLOR99,PSCH98} and theorem

2127: proving \cite{Arons01} have been used to verify sequential

2128: consistency for arbitrary parameter values.

2129: These approaches require a significant amount of effort on the part of

2130: the verifier.

2131: Our method is completely automatic and is

2132: a good debugging technique which can be applied before

2133: using these methods.

2134: The approach of Henzinger {\em et al.} \cite{HQR99b}

2135: and Condon and Hu \cite{CondonHu01} requires a manually constructed

2136: finite state machine called the serializer.

2137: The serializer generates the witness total order for each run of the

2138: protocol.

2139: By model checking the system composed of the protocol and the serializer,

2140: it can be easily checked that the witness total order for every run is a

2141: trace of serial memory.

2142: This idea is a particular instance

2143: of the more general ``convenient computations''

2144: approach of Katz and Peled \cite{KatzPeled92}.

2145: In general, the manual construction of the serializer can be tedious and

2146: infeasible in the case when unbounded storage is required.

2147: Our work is an improvement since the witness total order is deduced

2148: automatically from the simple write order.

2149: Moreover, the amount of state we add to the cache-coherence protocol

2150: in order to perform the model checking is significantly less than that

2151: added by the serializer approach.

2152: The ``test model checking'' approach of Nalumasu {\em et al.}

2153: \cite{NGMG98} can check a variety of memory models and is automatic.

2154: Their tests are sound but incomplete for sequential consistency.

2155: On the other hand, our

2156: method offers sound and complete verification for a large class of

2157: cache-coherence protocols.

2158:

2159: Recently Glusman and Katz \cite{GlusmanKatz01} have shown that, in

2160: general, interpreting sequential consistency over finite traces is not

2161: equivalent to interpreting it over infinite traces.

2162: They have proposed conditions on shared-memory systems under which the

2163: two are equivalent.

2164: Their work is orthogonal to ours and a combination of the two will

2165: allow verification of sequential consistency over infinite traces for

2166: finite parameter values.

2167:

2168: \section{Conclusions}

2169: \label{sec:conclusions}

2170: We now put the results of this paper in perspective.

2171: Assumption~\ref{ass:causality} about causality

2172: and Assumption~\ref{ass:data-independence} about data independence

2173: are critical to our result that reduces the problem of verifying

2174: sequential consistency to model checking.

2175: Assumption~\ref{ass:msystem-proc-sym} about processor symmetry

2176: and Assumption~\ref{ass:msystem-loc-sym} about location symmetry

2177: are used to reduce the number of model checking lemmas to

2178: $\minimum{\{n,m\}}$ rather than exponential in $n$ and $m$.

2179:

2180: In this paper, the read and write events have been modeled as atomic events.

2181: In most real machines, each read or write event is broken into two

2182: separate events ---a request from the processor to the cache, and a

2183: response from the cache to the processor.

2184: Any memory model including sequential consistency naturally specifies

2185: a partial order on the requests.

2186: If the memory system services processor requests in order then the order of

2187: requests is the same as the order of responses.

2188: In this case, the method described in this paper can be used

2189: by identifying the atomic read and write events with the responses.

2190: The case when the memory system services requests

2191: out of order is not handled by this paper.

2192:

2193: The model checking algorithm described in the paper is sound and complete with

2194: respect to a simple witness for the memory system.

2195: In some protocols, for example the {\em lazy caching\/} protocol \cite{ABM89},

2196: the correct witness is not simple.

2197: But the basic method described in the paper where data values of

2198: writes are constrained by automata can still be used if

2199: ordering decisions about writes can be made before the written values

2200: are read.

2201: The lazy caching protocol has this property and extending the methods

2202: described in the paper to handle it is part of our future work.

2203: We would also like to extend our work to handle other memory

2204: models.

2205:

2206: \bibliographystyle{alpha}

2207: \bibliography{/udir/qadeer/texinputs/diss}

2208:

2209: \end{document}

2210:

2211: \newcommand{\comment}[1]{}

2212: \comment{

2213: \begin{figure}

2214: \subfigure{

2215: \parbox{2in}{

2216: {\scriptsize

2217: \begin{tabbing}

2218: \hspace{0.2cm} \= xx \= xx \= xx \= xx \= xx \= xx \= \kill

2219: Automaton $\cons_k(j)$ for $1 \leq j \leq k$ \\

2220: States $\{a,b\}$ \\

2221: Initial state $a$ \\

2222: Accepting states $\{a,b\}$ \\

2223: Alphabet $\memevent(n,m,2)$ \\

2224: Transitions \\

2225: $[]~\neg (\op(e) = \wrt \wedge \loc(e) = j)

2226: \rightarrow$ \\

2227: \>\> $s' = s$ \\

2228: $[]~s = a \wedge \op(e) = \wrt \wedge \loc(e) = j

2229: \wedge \data(e) = 0 \rightarrow$ \\

2230: \>\> $s' = a$ \\

2231: $[]~s = a \wedge \op(e) = \wrt \wedge \loc(e) = j

2232: \wedge \data(e) = 1 \rightarrow$ \\

2233: \>\> $s' = b$ \\

2234: $[]~s = b \wedge \op(e) = \wrt \wedge \loc(e) = j

2235: \wedge \data(e) = 2 \rightarrow$ \\

2236: \>\> $s' = b$

2237: \end{tabbing}

2238: }

2239: }

2240: }\quad\quad

2241: \subfigure{

2242: \parbox{2in}{

2243: {\scriptsize

2244: \begin{tabbing}

2245: \hspace{0.2cm} \= xx \= xx \= xx \= xx \= xx \= xx \= \kill

2246: Automaton $\cons_k(j)$ for $k < j \leq m$ \\

2247: States $\{a\}$ \\

2248: Initial state $a$ \\

2249: Accepting states $\{a\}$ \\

2250: Alphabet $\memevent(n,m,2)$ \\

2251: Transitions \\

2252: $[]~\neg (\op(e) = \wrt \wedge \loc(e) = j)

2253: \vee \data(e) = 0 \rightarrow$ \\

2254: \>\> $s' = s$

2255: \end{tabbing}

2256: }

2257: }

2258: }

2259: \caption{Automaton $\cons_k(j)$}

2260: \label{fig:cons}

2261: \end{figure}

2262: }

2263: