0312:cs0312012/v3A.tex

1: \documentstyle[epsfig,11pt]{article}

2: \setlength\textwidth{6.5in}

3: \setlength\oddsidemargin{0in}

4: \setlength\evensidemargin{0in}

5: \setlength\topmargin{0in}

6: \setlength\headheight{0in}

7: \setlength\headsep{0in}

8: \setlength\textheight{8.5in}

9: \newcommand{\spin}{{\sc Spin}}

10: \newcommand{\promela}{{\sc Promela}}

11: \newcommand{\otter}{{\sc Otter}}

12: \newcommand{\imp}{{\longrightarrow}}

13: \begin{document}

14: \pagenumbering{roman}

15: \thispagestyle{empty}

16: \setcounter{page}{1}

17: \begin{center}

18:

19: \vspace{.2in}

20: \rule{1.5in}{.01in}\\ [1ex]

21: ANL/MCS-TM-261 \\

22: \rule{1.5in}{.01in}

23:

24: \vspace{1.5in}

25: {\Large\bf Methods to Model-Check Parallel Systems Software}\\[2ex]

26: by \\ [2ex]

27:

28: {\Large\it Olga Shumsky Matlin, William McCune, and Ewing Lusk}\\

29:         {\tt{\{matlin,mccune,lusk\}@mcs.anl.gov}}

30: %

31:

32: \vspace{1.5in}

33: Mathematics and Computer Science Division

34:

35: \bigskip

36:

37: Technical Memorandum No. 261

38:

39: \vspace{1in}

40: April 2003

41: \end{center}

42:

43: \vfill

44: \noindent

45: This work was supported by the Mathematical, Information, and Computational

46: Sciences Division subprogram of the Office of Advanced Scientific Computing

47: Research, Office of Science, U.S. Department of Energy, under Contract

48: W-31-109-ENG-38.

49: %

50:

51: %

52: %

53: %

54: %

55: \newpage

56: \noindent

57: Argonne National Laboratory, with facilities in the states of Illinois

58: and Idaho, is owned by the United States Government and operated by The

59: University of Chicago under the provisions of a contract with the

60: Department of Energy.

61:

62: \vspace{2in}

63:

64: \begin{center}

65: {\bf DISCLAIMER}

66: \end{center}

67:

68: \noindent

69: This report was prepared as an account of work sponsored by an agency

70: of the United States Government.  Neither the United States Government

71: nor any agency thereof, nor The University of Chicago, nor any of

72: their employees or officers, makes any warranty, express or implied,

73: or assumes any legal liability or responsibility for the accuracy,

74: completeness, or usefulness of any information, apparatus, product, or

75: process disclosed, or represents that its use would not infringe

76: privately-owned rights.  Reference herein to any specific commercial

77: product, process, or service by trade name, trademark, manufacturer,

78: or otherwise, does not necessarily constitute or imply its

79: endorsement, recommendation, or favoring by the United States

80: Government or any agency thereof.  The views and opinions of document

81: authors expressed herein do not necessarily state or reflect those of

82: the United States Government or any agency thereof.

83: \vspace{1in}

84: \begin{center}

85: Available electronically at http://www.doe.gov/bridge

86:

87: Available for a processing fee to U.S. Dept.\ of

88: Energy and its contractors, in paper, from:

89:

90: U.S. Department of Energy\\

91: Office of Scientific and Technical Information\\

92: P.O. Box 62\\

93: Oak Ridge, TN 37831-0062\\

94: phone: (865) 576-8401\\

95: fax: (865) 576-5728\\

96: email: reports@adonis.osti.gov

97: \end{center}

98: \newpage

99:

100: \tableofcontents

101:

102: \newpage

103:

104: \setcounter{page}{0}

105: \pagenumbering{arabic}

106: \begin{center}

107: {\Large\bf Methods to Model-Check Parallel Systems Software}\\ [1ex]

108: by \\ [1ex]

109: Olga Shumsky Matlin, William McCune, and Ewing Lusk

110: \end{center}

111:

112: \addcontentsline{toc}{section}{Abstract}

113: \abstract{We report on an effort to develop methodologies for formal

114:   verification of parts of the Multi-Purpose Daemon (MPD) parallel

115:   process management system.  MPD is a distributed collection of

116:   communicating processes.  While the individual components of the

117:   collection execute simple algorithms, their interaction leads to

118:   unexpected errors that are difficult to uncover by conventional

119:   means.  Two verification approaches are discussed here: the standard

120:   model checking approach using the software model checker \spin\ and

121:   the nonstandard use of a general-purpose first-order

122:   resolution-style theorem prover \otter\ to conduct the traditional

123:   state space exploration.  We compare modeling methodology and

124:   analyze performance and scalability of the two methods with respect

125:   to verification of MPD.}

126:

127: \section{Introduction}

128: \label{sec:introduction}

129: Reasoning about parallel programs is surprisingly difficult.  Even

130: small parallel programs are difficult to write correctly, and an

131: incorrect parallel program is equally difficult to debug, as we

132: experienced while writing the Multi-Purpose Daemon (MPD), a process

133: manager for parallel programs

134: \cite{bgl00:mpd:pvmmpi00,butler-lusk-gropp:mpd-parcomp}.  Despite

135: MPD's small size and apparent simplicity, errors have impeded progress

136: toward code in which we have complete confidence.  Such a situation

137: motivates us to explore program verification techniques.

138:

139: MPD is itself a parallel program.  Its function is to start the

140: processes of a parallel job in a scalable way, manage input and

141: output, handle faults, provide services to the application, and

142: terminate jobs cleanly.  MPD is the sort of process manager needed to

143: run applications that use the standard

144: MPI~\cite{mpi-forum:journal,mpi-forum:mpi2-journal} library for

145: parallelism, although it is not MPI specific.  MPD is distributed as

146: part of the portable and publicly available

147: MPICH~\cite{gropp-lusk:mpich-www,gropp-lusk-doss-skjellum:mpich}

148: implementation of MPI.

149:

150: Our first attempt to use formal verification to ensure correctness of

151: MPD algorithms \cite{mpd-acl2} employed the ACL2 \cite{acl2:book}

152: theorem prover, which allows one to both simulate and verify a model

153: within a single environment. Components of the MPD system,

154: as well as the elements of the Unix socket library on which MPD is

155: based, were formalized in a subset of Lisp.  The formalization was based on

156: a so-called oracle \cite{acl2:oracle}, which allows analysis of a

157: parallel system in a sequential environment.  The oracle specifies an

158: execution interleaving of concurrent processes and is randomly

159: generated for simulations.  Verification is conducted with respect to

160: an arbitrary oracle (i.e., an arbitrary execution interleaving);

161: thus, a property proved in such a way holds for all possible

162: executions of a collection of concurrent processes.  In this approach

163: parsing simulation results, formulating desired properties of models

164: of MPD algorithms and reasoning about such models proved difficult,

165: leading us to abandon this traditional theorem-proving method of

166: verification and to try instead model-checking techniques.  Two such

167: techniques are described here.

168:

169: The first technique employed the model checker

170: \spin~\cite{spin:book,spin:article}, which

171: supports design and verification of asynchronous distributed

172: communicating systems.  Models of such systems are recorded in the

173: special high-level verification language \promela, which can also be

174: used to state some correctness properties of the models.  Additional

175: correctness properties are specified by using linear temporal logic.

176: The verification engine of \spin\ is based on on-the-fly reachability

177: analysis with several optimizations and heuristics, such as

178: partial-order reduction and bitstate hashing.  The system also

179: includes a simulation environment and a graphical user interface.

180: \spin\ has been used in various settings (see~\cite{spin-web-page} for

181: all of the proceedings of the \spin\ workshops).  Because the dynamic

182: nature of MPD is easily expressible in the \spin/\promela\ framework,

183: the tool is a natural choice for verification of our system.

184: However, our early experiences with \spin\ \cite{spin-mpd} suggest

185: that the most natural formalization of certain MPD algorithms in

186: \promela\ leads to performance and scalability challenges.  While we

187: have since addressed some of these challenges, as described in

188: Section~\ref{sec:spin}, they led us to explore other ways to formalize

189: and model check MPD algorithms.

190:

191: Specifically, because of considerable in-house theorem-proving

192: expertise, we were led to explore whether a theorem prover can be

193: successfully adapted for our purposes.  We have used the

194: general-purpose first-order resolution and paramodulation theorem

195: prover \otter\ \cite{McCune94a}. While the tool is widely used, its

196: primary application is in proof search, mainly in mathematics and

197: logic.  The input language is that of first-order logic.  Otter does,

198: however, contain extensions for evaluable terms, which are essential

199: to our unusual use of the theorem prover for state-space exploration.

200:

201: The remainder of the paper is structured as follows.  In

202: Section~\ref{sec:mpd} we describe MPD in more detail and briefly

203: present the algorithms, along with their correctness properties, for

204: which the verification approaches are evaluated.  In

205: Sections~\ref{sec:spin} and \ref{sec:otter} we outline formalizations

206: of the MPD algorithms in \promela\ and the input language of \otter. In

207: Section~\ref{sec:results} we compare and analyze concrete results of

208: specific verification experiments.  We conclude with a summary in

209: Section~\ref{sec:summary}.

210:

211: \section{The Multi-Purpose Daemon}

212: The MPD system comprises several types of processes.  The {\em

213:  daemons\/} are persistent (may run for weeks or months at a time,

214: starting many jobs) and are connected in a ring.  {\em Manager\/}

215: processes, started by the daemons to control the application

216: processes ({\em clients\/}) of a single parallel job, provide most of

217: the MPD features and are also connected in a ring.  A separate set of

218: managers supports an individual process environment for each user

219: process.  A {\em console\/} process is an interface between a user

220: and the daemon ring.  A representative topology of the MPD system is

221: shown in Figure~\ref{fig:mpds-all}.

222: \label{sec:mpd}

223: \begin{figure*}%

224:     \centerline{ \epsfxsize=4.75in \epsfbox{newmpds1.eps}}

225:     \caption{Daemons with console process, managers, and clients}

226:     \label{fig:mpds-all}

227: \end{figure*}

228: The vertical solid lines represent

229: connections based on pipes; the remaining solid lines represent

230: connections based on Unix sockets.  The dashed and dotted lines are

231: potential or special-purpose connections.

232:

233: Each of the daemon, manager, and console process types has essentially

234: the same pattern of behavior.  This feature important because it

235: allows us to model these processes in a consistent manner.  After

236: initialization, the process enters an infinite, mostly

237: idle, loop, implemented by the Unix socket function \texttt{select}.

238: When a message arrives on one of its sockets, the process calls the

239: appropriate message handler routine and reenters the idle {\tt

240:  select} state.   The handler does a small amount of processing,

241: creating new sockets or sending messages on existing ones.  The

242: logic of the distributed algorithms executed by the system as a whole

243: is contained primarily in the handlers, and this is where the

244: difficult bugs appear.

245:

246: In the following we compare the two verification approaches on two

247: algorithms: a daemon-level ring establishment algorithm and a

248: manager-level barrier algorithm.  We present an outline of each

249: algorithm and highlights of the modeling approach.

250:

251: \subsection{Ring Establishment Algorithm}

252: Establishment and maintenance of a ring of daemons are central to the

253: operation of MPD.   Informally, daemon ring creation proceeds as

254: follows.  The initial daemon establishes a listening port to which

255: subsequent connections are made.  The daemon connects to its own

256: listening port, creating a ring of one daemon.  The listening port of

257: the first daemon and the name of the host processor are queried from

258: the console.  The desired number of daemons is then initiated and

259: directed  to enter the ring by connecting to the first daemon.

260: Figure~\ref{fig:insert}

261: \begin{figure*}%

262:     \centerline{ \epsfxsize=4.75in \epsfbox{create1.eps} }

263:     \caption{Ring insertion}

264:     \label{fig:insert}

265: \end{figure*}

266: shows the result of inserting a new daemon into an existing ring.

267: After the insertion is completed, the old connection between

268: daemons on the right and left of the new daemon is disconnected (shown

269: in the figure by the dashed line).  Note that in the special case of

270: insertion into a ring of one daemon, the daemon plays both the left

271: and the right roles.

272:

273: The algorithm may be viewed as consisting of two parts, each of which

274: may potentially contain errors.  One part deals with establishment of

275: listening ports and sockets to enable bidirectional communication

276: between processes.  The second part concerns the passage of messages

277: over the established communication links and their handling upon

278: receipt.  Correct socket establishment depends to a large degree on

279: the correct implementation of the Unix socket library.   The \promela\

280: and \otter\ models described below do not check correctness of this

281: part of the algorithm but rather concentrate on the correctness of

282: passing messages between processors, correctness of message handling,

283: and correctness and consistency of the global system state.  Under

284: this partitioning of the algorithm, and assuming that each processor

285: records the identity of its right and left neighbors, the correctness

286: property, stated as a postcondition, can be formulated as follows.

287:

288: The algorithm is correct for $i$ daemons when, upon termination of the

289: algorithm, the ring has $i$ distinct connected components and when, for

290: any processors $i$ and $j$ in the ring, if the right neighbor of $i$

291: is $j$, then the left neighbor of $j$ is $i$.

292:

293: \subsection{Barrier Algorithm}

294:

295: Parallel programs frequently rely on a \textit{barrier} mechanism to

296: ensure that all processes of the job reach a certain point (complete

297: initialization, for example) before any are allowed to proceed

298: further.  Parallel jobs, that is, programs running on the clients, rely

299: on the manager processes to implement the barrier service.  The

300: algorithm proceeds as follows.  A manager process is designated as

301: the leader of the algorithm and is given a rank of 0.   When the

302: leader reads a request from its client to provide the barrier service,

303: it sends a message \texttt{barrier\_in} to its right-hand side neighbor

304: in the ring.  When a non-leader manager receives the

305: \texttt{barrier\_in} message, its behavior is determined by whether its

306: client has already requested the barrier service.  If the client has

307: done so, the manager forwards the message to the right-hand side

308: manager.  Otherwise, it holds the \texttt{barrier\_in} message until

309: the request from the client arrives.  While the \texttt{barrier\_in}

310: message is held, a bit variable \texttt{holding\_barrier\_in} is set.

311: Once the \texttt{barrier\_in} message traverses the entire manager ring

312: and arrives back in the leader, meaning that each client has reached

313: the barrier and notified its manager, the leader sends a

314: \texttt{barrier\_out} message around the ring.  When a manager receives

315: the \texttt{barrier\_out} message, it notifies its client to proceed

316: past the barrier.  The leader can be either the first or the last

317: manager to allow its client to proceed.

318:

319: The barrier algorithm, as the ring establishment algorithm, can be

320: viewed as consisting of two parts: socket handling and message

321: handling.  Here, however, the socket handling portion of the algorithm

322: is largely unimportant to the verification effort, since the

323: communication paths between processes are completely established

324: before the algorithm begins execution.  The socket portion may become

325: important in future verification efforts that concentrate on

326: \textit{interaction} of MPD algorithms and examine the barrier

327: algorithm in conjunction with an algorithm that manages --- that is,

328: establishes, breaks down, or restores after a fault --- connections

329: between processes.

330:

331: Two correctness properties, a postcondition and an invariant, are

332: verified for the barrier algorithm.  The postcondition property is that

333: all clients have been released from the barrier.  The invariant

334: demands that no client be released  until every client

335: has reached the barrier, that is, every client has requested the

336: barrier service from its manager.

337:

338: \section{Formalization and Verification of MPD with \spin}

339: \label{sec:spin}

340:

341: To formalize an MPD algorithm in \promela, the modeling language of

342: \spin, one has to make three related decisions: what variables

343: need be defined globally and locally to record the necessary

344: information during the execution of the algorithm, how to model

345: the communication network, and whether and how to model the Unix

346: socket functions.  When considered together, the three modeling

347: decisions determine how abstract the resulting model will be.

348:

349: The original approach \cite{spin-mpd} to model checking MPD algorithms

350: with \spin\ produced models that were too literal, meaning that the

351: \promela\ modeling language was used in a way that closely resembled

352: C.  The motivation for such an approach was to enable automated

353: extraction of executable C code from verified \promela\ models.  As a

354: consequence of such a modeling methodology it was possible to

355: formalize within a single model both the socket-handling and the

356: message-handling portions of the MPD algorithms.  Unfortunately,

357: another consequence of the approach was poor verification performance.

358: We were able to verify the ring establishment algorithm on only a few (

359: less than five) daemons, far short of the desired goal to verify

360: models with ten to twenty processors (see Section~\ref{sec:size-note}

361: for discussion of why verifying models of this size would be

362: interesting).  During verification attempts on

363: larger models, the number of states in the search space was so large

364: that \spin\ ran out of nearly 1 GB of available memory.   The current

365: formalization approach produces more abstract models, which enables

366: verification to complete within the constraints of available memory on

367: larger models, but at the expense of not considering

368: correctness of the socket-handling portion of the algorithms.

369:

370: Components of the MPD system map naturally to predefined {\sc Promela}

371: entities: a

372: {\tt proctype} is defined for each different MPD process type;

373: individual daemon, manager, console, and client processes correspond to

374: active instances of the corresponding {\tt proctype}s; sockets map to

375: channels; and messages that are read and written over the sockets

376: correspond to messages traveling on the channels.  The functionality of

377: the Unix {\tt select} \cite{stevens-unp1} is implemented by the

378: nondeterministic {\tt do} construct: when a message arrives in the

379: input queue of a process instance, the appropriate guard of the {\tt

380:   do} construct triggers its handling; in the absence of new messages

381: in the input queue, the instance is blocked.

382:

383: The distinguishing characteristic of the \promela\ model of the

384: barrier algorithm is that the clients are not represented by separate

385: process instances. To do otherwise would waste of precious memory. In

386: \spin, each process instance requires a certain amount of space in the

387: description of the entire system state, otherwise known as the state

388: vector.  The functionality of a client is to send a request message to

389: its manager and then wait for a reply.  Both these actions can be

390: adequately modeled by two binary variables, \verb2client_barrier_in2

391: and \verb2client_barrier_out2, per manager process.  We note that the

392: algorithm depends on the fact that a manager records whether it has

393: already received a barrier request from its client, so one of these

394: variables is present in the model regardless of how clients are

395: formalized.  Omission of clients reduces the complexity of the model

396: and the size of the state vector in another way: communication links

397: between managers and clients are not necessary.  Connections between

398: managers are the only communication links that have to be modeled.  To

399: this end we define a global vector of $N$ channels, where $N$ is the

400: number of managers in the model.  Each manager process has local

401: pointers, \texttt{left} and \texttt{right}, to the appropriate

402: channels in the global array.  To manage the bit variables, we rely on

403: an implementation of bit arrays by Ruys \cite{spin:low-fat}, which

404: defines the \texttt{IS\_0} and \texttt{SET\_1} macros.  The

405: formalization ensures that the handler of a barrier request message

406: from the client is executed only once.  The use of the {\tt do}

407: construct ensures that all messages that arrive at the manager have

408: equal precedence and can be handled in any order. We show a partial

409: \promela\ model of the barrier algorithm:

410: {\small

411: \begin{verbatim}

412: end_select:

413: do

414: :: IS_0(client_barrier_in,_pid) ->

415:    SET_1(client_barrier_in,_pid);

416:    /* remainder of client request handling */

417: :: left?barrier_in ->

418:    /* barrier_in handling */

419: :: left?barrier_out ->

420:    /* remainder barrier_out handling */

421:    SET_1(client_barrier_out,_pid);

422:    if

423:    :: _pid == 0 -> skip

424:    :: else -> right!barrier_out

425:    fi;

426: od

427: \end{verbatim}}

428: %

429: %

430: %

431:

432: The socket-based communication network presented the most challenges

433: in the formalization of the ring establishment algorithm.  Unlike the

434: static topology of the communication links in the barrier algorithm,

435: the topology of connections in the ring algorithm is dynamic.  As

436: daemons enter the ring, new connections are established, and the old

437: ones are broken down, as illustrated in Figure~\ref{fig:insert}. In

438: addition, because all execution interleavings are exhaustively

439: considered during the model search, the network has to be set up so

440: that any two processes can communicate with each other. Numerous

441: different formalizations of the communication network were tried

442: in our effort to verify a model of meaningful size.  In particular, we

443: tried the bus and matrix of channels formalizations, as suggested

444: in \cite{phd:ruys}.

445:

446: Our experiments showed that, with respect to verification, the best

447: approach to modeling the communication network of the ring

448: establishment algorithm depends on a vector of global channels.  Each

449: daemon process owns a channel, which serves as an input queue for any

450: messages addressed to that process.  In order to further reduce

451: complexity, as suggested by \spin's documentation, each daemon retains

452: exclusive rights, by using the  \texttt{xr} designation, to read from

453: its owned channel.  In this approach, illustrated in

454: Figure~\ref{fig:channels},

455: \begin{figure}%

456:     \centerline{ \epsfxsize=1.6in \epsfbox{channels.eps} }

457: \caption{Model for message sending and receiving}

458: \label{fig:channels}

459: \end{figure}

460: a single input queue is used by a node to receive messages from all

461: nodes, but different queues are used to send messages addressed to

462: different nodes.

463: We find it interesting that this approach is explicitly advised

464: against in the \spin\ help file: to reduce complexity, the

465: documentation suggests one ``avoid sharing channels between multiple

466: receivers or multiple senders.'' This approach nonetheless succeeds in

467: our model because, while the network is set up to allow sharing

468: channels between multiple senders, most communications take place

469: between distinct pairs of daemons, so channel sharing very rarely occurs

470: during exploration of a single execution path. Thus, the complexity

471: of the model is not adversely affected.

472:

473: \section{Formalization and Verification of MPD with \otter}

474: \label{sec:otter}

475:

476: \otter\ \cite{ McCune94a} is an automated theorem-proving system for

477: first-order logic with equality.  Most successful applications of

478: \otter\ have been in abstract algebra and logic, and its strength is

479: the ability to quickly explore large search spaces.  It can

480: efficiently manage large sets of formulas (hundreds of thousands),

481: which allows it to automatically prove many difficult theorems.

482:

483: \otter's basic operations are (1) to apply various inference rules

484: to formulas, (2) to apply rewrite rules to inferred formulas, and (3)

485: to determine whether newly inferred formulas are already in the

486: database of formulas.  These operations can be applied to

487: state space searches as well as to the heuristic searches

488: used in traditional automated theorem proving.

489:

490: Program verification is a nonstandard application of \otter.

491: Fortunately, \otter's language, data structures, and operations allow

492: reasonably intuitive and efficient implementations of model checking

493: by state space search.  One of the languages \otter\ accepts is a

494: sequent language with which we can write rules, assertions, and goals

495: as follows.

496:

497: \[

498: \begin{array}{l}

499: \mbox{\it hypothesis}_1, \cdots, \mbox{\it hypothesis}_n \imp conclusion.\\

500: \imp \mbox{\it assertion}.\\

501: \mbox{\it goal} \imp.

502: \end{array}

503: \]

504:

505: \noindent

506: Given an input consisting of statements of this type, \otter\

507: can apply the rules to the assertions, generating new assertions,

508: and so on, until it derives one of the goals or until reaches a

509: fixed point (i.e., runs out of things to do).  All of the assertions

510: (initial and derived) are stored in the database.

511:

512: A feature of \otter\ that allows it to conduct an efficient

513: state-space search is the ability to evaluate hypotheses by rewriting,

514: as well as to match the hypotheses with assertions.  For example, the

515: hypotheses $X == 3$, where $X$ is instantiated by a preceding hypothesis,

516: is evaluated to a Boolean value.  The rewrite mechanism includes a

517: simple but general equational programming language so that the evaluable

518: hypotheses can be arbitrary function calls.  In addition,

519: the conclusion of the rule can have function calls to transform

520: the result.

521:

522: One can use this mechanism to implement a state space search in

523: a straightforward way.  The initial states are initial assertions

524: (usually there is just one), say,

525: \[

526: \imp \mbox{\it State($s_0$)},

527: \]

528: where $s_0$ is a data structure representing the initial state.

529: The rules take states to successor states,

530: \[

531: \mbox{\it State(S)}, \mbox{\it P(S)} \imp \mbox{\it State(f(S))},

532: \]

533: where {\it P(S)} are evaluable hypotheses specifying the states to

534: which the rule applies, and {\it f(S)} generates the successor state.

535:

536: When applying this method to model-checking concurrent computations,

537: one frequently needs the hypothesis ``Let $N$ be an arbitrary process.''

538: To implement this capability, we include initial assertions giving the

539: set of process IDs, say, for a three-processor simulation,

540: \[

541: \begin{array}{l}

542: \imp  \mbox{\it PID(0)}.\\

543: \imp  \mbox{\it PID(1)}.\\

544: \imp  \mbox{\it PID(2)}.

545: \end{array}

546: \]

547: Then, the rules can be given as

548: \[

549: \mbox{\it State(S)}, \mbox{\it PID(N)}, \mbox{\it P(S,N)} \imp \mbox{\it State(f(S,N))}.

550: \]

551:

552: In its basic form, the \otter\ search mechanism is designed to be

553: complete; that is, if a conjecture is a theorem, then \otter\ will

554: eventually find a proof.  (In the practice of traditional automated

555: theorem proving, restrictions are frequently imposed by the user,

556: resulting in incomplete search procedures.)  In the context of

557: state space search, this completeness means that (given enough

558: time and memory) all states that can be reached from the initial

559: states will eventually be derived and stored by \otter.

560:

561: To determine whether an illegal state can be reached, one can include

562: a rule such as

563: \[

564: \mbox{\it State(S)}, \mbox{\it P(S)} \imp \mbox{\it Bad\_State(S)}

565: \]

566: and the goal

567: \[

568: Bad\_State(S) \imp,

569: \]

570: with the effect that if any illegal state is derived, \otter\

571: will report it and print the path of states from an initial state to

572: the illegal one.  From \otter's point of view, it has simply found and

573: printed a proof of a goal.  From the point of view of verification,

574: the path of states represents an erroneous execution scenario.

575: The \otter\ proof of the reachability of an illegal state can be examined

576: to determine which state transition rule leads to the error. If \otter\

577: reaches a fixed point without deriving an illegal state, the terminal

578: states or the entire state space can be examined and processed by the

579: user or by another system.

580:

581: We show a

582: %

583: partial three-element \otter\ model\ of the barrier algorithm:

584: %

585: %

586: {\small

587: \begin{verbatim}

588: %

589: -> PID(0).

590: -> PID(1).

591: -> PID(2).

592:

593: %

594: -> State(PS(0,0,0,[]),PS(0,0,0,[]),PS(0,0,0,[])).

595:

596: %

597: State(S), PID(X), X == 0, $TRUE(barrier_out_arrived(S,X)) ->

598: State(assign_client_return(receive_message(S,X),X,1)).

599: \end{verbatim}

600: \newpage

601: \begin{verbatim}

602: State(S), PID(X), X != 0, $TRUE(barrier_out_arrived(S,X)) ->

603: State(assign_client_return(send_message

604:                            (receive_message(S,X),next(X),barrier_out),X,1)).

605:

606: %

607: -> barrier_out_arrived([],I) = false.

608: -> barrier_out_arrived([PS(CLI_IN,CLI_OUT,MSG,Q)|T],I) =

609:    $IF(I == 0,

610:        first(Q) == barrier _out,

611:        barrier_out_arrived(T,I-1)).

612:

613: -> assign_client_return([],I,V) = [].

614: -> assign_client_return([PS(CLI_IN,CLI_OUT,MSG,Q)|T],I,V) =

615:    $IF(I == 0,

616:        [PS(CLI_IN,V,MSG,Q)|T],

617:        [PS(CLI_IN,CLI_OUT,MSG,Q)|assign_client_return(T,I-1,V)]).

618:

619: -> dequeue([]) = [].

620: -> dequeue([H|T]) = T.

621:

622: -> receive_message([],I) = [].

623: -> receive_message([PS(CLI_IN,CLI_OUT,MSG,Q)|T],I) =

624:    $IF(I == 0,

625:        [PS(CLI_IN,CLI_OUT,MSG,dequeue(Q))|T],

626:        [PS(CLI_IN,CLI_OUT,MSG,Q)|receive_message(T,I-1)]).

627: \end{verbatim}}

628:

629: %

630: %

631: %

632:

633:

634: The model is

635: based on the same simplification as the \promela\ model of the

636: algorithm: clients are not explicitly represented.  A valid system

637: state consists of three process states.  Each process state {\tt PS}

638: is a 4-tuple, whose every element implicitly corresponds to either one

639: of the variables used by the algorithm or the input queue.  In this

640: model the process state template is

641:

642: \begin{verbatim}

643:       PS[client_barrier_in, client_barrier_out,

644:          holding_barrier_in, input_queue].

645: \end{verbatim}

646:

647: In the initial state of each process, every variable has a zero value,

648: and the input queue is empty. The two transition rules correspond to

649: the handling operations for the \verb2barrier_out2 message, as also

650: shown in

651: the \promela\ model above.   The rules are applicable when the client

652: barrier requests have been received by all processes, that is, the

653: \verb2client_barrier_in2 bit has been set by every process, any

654: \verb2holding_barrier_in2 bits that might have been set are again

655: reset.  In this formalization, the manager with 0 rank is the last to

656: release its client, so the state of the system to which the first of

657: the two state transition rules applies is

658:

659: \begin{verbatim}

660:              (PS[1,0,0,(barrier_out)],

661:               PS[1,1,0,[]],

662:               PS[1,1,0,[]]),

663: \end{verbatim}

664:

665: \noindent

666: and the result of the application of the rule, including evaluation of

667: the functions

668: \verb2assign_cli-2 \verb2ent_return2 to assign the value of the

669: \verb2client_barrier_out2 bit and \verb2receive_message2 to remove a

670: message from the input queue, is the overall system state

671:

672: \begin{verbatim}

673:                  (PS[1,1,0,[]],

674:                   PS[1,1,0,[]],

675:                   PS[1,1,0,[]]).

676: \end{verbatim}

677:

678: Formalization of MPD algorithms in the sequent language of \otter\ is

679: similar to formalizations in the input language of the Mur$\phi$

680: model checker, especially as presented in the simplified form in

681: \cite{murphi:format}.  Similarities are most evident in the definition

682: of state transition rules.

683:

684: \section{Results and Comparisons}

685: \label{sec:results}

686:

687: At this point in the project the goal is to discover whether existing

688: tools and methodologies can be used for verification of MPD

689: algorithms.  We are also interested in developing formalization

690: techniques that simultaneously produce models abstract enough to allow

691: us to verify models of meaningful size and detailed enough to enable

692: automated model construction and/or code generation.  We apply

693: verification approaches to MPD algorithms that have been developed

694: some time ago, and are well understood and well debugged.  That

695: is, bug hunting is currently only a secondary  goal. (We have

696: nonetheless found a minor error in the ring insertion algorithm using

697: the literal formalization of our earlier approach to using \spin.)

698: Rather, we are interested in developing an arsenal of verification

699: techniques to use for the analysis of the recent and future MPD

700: algorithms.  Therefore, criteria for evaluation of verification

701: methods include the ease of modeling, correlation of the model to the

702: design and/or implementation of the MPD algorithm, and verification

703: performance.

704:

705: \subsection{Comparison of Formalizations}

706:

707: The \spin/\promela\ approach is the more natural one for the MPD

708: application, but the \otter\ approach does offer some, albeit possibly

709: subjective, advantages.

710:

711: \spin\ and its input language \promela\ are specifically developed for

712: verification of concurrent communicating processes.  Therefore, the

713: language and the tool include special built-in constructs and

714: algorithms for message handling, communication path definition, and

715: variable declaration and manipulation.  As a result, there is a

716: natural mapping between MPD components and \promela\ entities.  By

717: contrast, \otter, as a general-purpose tool, has to be programmed from

718: scratch to handle these common operations of MPD algorithms.

719: Although, once constructed, this auxiliary portion of the \otter\

720: model can be reused,

721: the \spin\ models are much more concise, as demonstrated

722: by the model extracts above.

723:

724: From our point of view, the ability to  model an algorithm as a set of

725: state transition rules is the main advantage of the \otter\ approach.

726: The set of rules, which is the main component of the \otter\ model,

727: can be easily extracted from a flow chart representation of the MPD

728: algorithms or from other notations typically used in the early stages

729: of algorithm design.  Construction of \promela\ models looks and feels

730: much more like constructing another implementation of the algorithm,

731: only using a language much less powerful than C or Python, which are

732: used for actual implementations.  Furthermore, all intermediate states

733: produced by the \otter\ search are transparent to the user and can be

734: examined.  Such examination is not possible in \spin\ and may be a

735: limitation when trying to understand a particularly complicated error

736: trace.

737:

738: Correctness properties of \otter\ models of MPD algorithms have to be

739: formulated within the confines of first-order logic.  \spin, on the

740: other hand, allows one to record correctness properties in linear

741: temporal logic.  Although we have not yet encountered a situation where

742: limitations of first order logic prevented us from stating the desired

743: correctness property, such a situation is conceivable and may prove a

744: disqualifying drawback of the \otter\ approach.

745:

746: \subsection{Performance Comparison}

747:

748: All verification runs were conducted on a 933 MHz Pentium~III

749: processor with 970 MB of usable RAM.  We used default \spin\

750: settings for all verification attempts, except when we increased the

751: memory limit to allow the search to complete. In cases where

752: verification did not complete with default parameters within physical

753: memory limits, verification with compression (using \spin's

754: \texttt{-DCOLLAPSE} compile-time directive) was performed.  The input

755: files for the \otter\ experiments included settings to optimize the

756: state space exploration.  Some default flags that are usually needed

757: in standard \otter\ experiments but irrelevant for our application

758: were turned off.

759:

760: Table~\ref{tbl:stats} shows performance statistics of applying \spin\

761: \begin{table*}[t]

762: \caption{Verification statistics}

763: \vspace{.2in}

764: \centerline{

765: \begin{tabular}{lc|c|c|c|c|c}

766:                &        & Model & Time & Memory & States & States \\

767: Problem        & Method & Size  & (s)  & (MB)   & Stored & Matched \\\hline

768: Unordered ring & Otter & 7  & 3979.08 & 391 & 7.83654e+05 & 1.908330e+06  \\

769:                & Spin  & 7  & 30.32  & 375 & 2.27008e+06 &1.90833e+06 \\

770: \hline

771: Ordered ring   & Otter & 14 & 15676 & 561 & 1.028351e+06 & 6.318042e+06 \\

772:                & Spin  & 14 & 263.68  & 734 & 6.50332e+06 & 8.66146e+06 \\

773: \hline

774: Barrier        & Otter & 19 & 27082 & 512 & 1.048593e+06 & 8.912898e+06\\

775:                & Spin  & 21 & 2045 & 746 & 8.38865e+06 & 8.38861e+07

776: \end{tabular}}

777: \vspace{0.1in}

778:     \label{tbl:stats}

779: \end{table*}

780: and \otter\ to three problems.  Two problems are variants of the ring

781: insertion algorithm.  In both versions the first daemon establishes a

782: singleton ring.  In the unordered version the subsequent daemons may

783: enter the ring in any order, resulting in many possible final

784: topologies with respect to the relative position of the processors in

785: the ring, and hence a much greater state search space and poorer

786: verification performance.  In the ordered version, the order in which

787: the processes enter the ring is fixed, resulting in a single possible

788: final topology.  In essence, the difference between the two versions

789: is that in the unordered case the daemons are numbered {\it before}

790: they begin to enter the ring, whereas in the ordered case they are

791: numbered {\it after} they have done so.  As a result, in the ordered

792: case, the processor that enters the ring first is always numbered with

793: one, the processor that is second, with two, and so on.  Because the

794: algorithm is independent of the identities of the processors, the

795: ordered version is less complex.

796:

797: Table 1 shows statistics for the largest model sizes on which a

798: particular verification approach succeeded.  On complex algorithms,

799: such as ring insertion, \otter\ matches \spin\ with respect to the

800: largest verifiable model.  In terms of speed and memory usage per

801: examined state, \otter\ is far behind \spin.  This result is not

802: surprising because \spin\ is a special-purpose tool, specifically

803: designed for applications like ours, while our use of \otter\ is

804: unusual in this case.  In fact, the performance of \otter\ far

805: exceeded our expectations.

806:

807: \otter\ could not verify models of as many processors as \spin\ could

808: for thr barrier algorithm.   The explanation for such a difference in

809: performance lies in the way states are represented in each method.

810: In \spin, the three local variables that contribute to individual

811: process states are bits, and the communications channels are

812: essentially arrays of bits, which allows \spin\ to store state vectors

813: very efficiently.  In addition, in the \spin\ verification run on a

814: model of twenty-one processors, state vectors were compressed,

815: resulting in further improved performance. In \otter, the

816: variables and the input queue are terms, which are not stored as

817: efficiently as bits.  Thus, \otter\ requires much more memory to store

818: an individual system state.  As a result, \otter\ is able to examine a

819: much smaller state spaces.

820:

821: The \otter\ search engine is not optimized for the kind of search that

822: takes place in this application.  It is therefore not surprising that

823: verification with \otter\ is several times slower than with

824: \spin. But, since memory, not time, is the main limitation in this

825: application specifically and in model checking in general, the speed

826: of verification is only of minor concern.

827:

828: \subsection{A Note on Model Sizes}

829: \label{sec:size-note}

830: Scalability was one of the design goals of MPD.  The daemon is

831: intended to run on hundreds, and eventually on thousands, of

832: processors.  Unexpected limitations of the underlying operating

833: systems aside, it is assumed that the same algorithms that execute

834: successfully in a daemon of just a few processors will execute

835: smoothly in a

836: much larger system.  Given the current state of the model-checking

837: technology, it is impossible to formally verify the algorithms of MPD

838: on very large models sizes, even if it were desirable to do so.

839: Luckily, the empirical evidence obtained while debugging the MPD code

840: using the traditional means suggests that the errors, even the most

841: difficult and obscure ones, exhibit themselves in daemons of just a few

842: processors.  It also appears to be an accepted view of the

843: model-checking community that to verify systems with unbounded

844: potential number of elements, it is sufficient to verify a limited( with

845: respect to the number of elements) model of the system

846: \cite{inca-compare,dill-protocol-hardware-92,jackson-counter-detector}.

847:

848: Our goal is to devise a verification approach and a modeling

849: methodology that allows us to verify complex MPD algorithms and

850: interactions of these algorithms on models of ten to twenty daemons.

851: For example, MPD contains an algorithm that merges two daemons that

852: are running parallel jobs.  We want to verify the algorithm on a

853: meaningful model consisting of two MPD structures, as shown

854: in Figure~\ref{fig:mpds-all}, each having a console process and

855: three of each of daemon, manager, and client processes.

856:

857: \section{Summary}

858: \label{sec:summary}

859:

860: We described here two approaches to verification of the algorithms of

861: the parallel process management system called MPD.  One approach is

862: based on the software model checker \spin, the other on the

863: general-purpose first-order theorem prover \otter.  Both approaches

864: are based on model checking, and the use of \otter\ in the model

865: checking-capacity is unusual.  The aim was to model algorithms of MPD

866: in both the \spin\ and the \otter\ approaches so as to enable

867: verification of the largest possible model.

868:

869: The two approaches were compared with respect to the ease of

870: formalization and verification performance characteristics.  Overall,

871: \spin\ is more efficient in terms of absolute time and memory

872: requirements and relative time and memory requirements per stored

873: system state.  In terms of the size of models that each approach

874: allows us to verify, both tools perform roughly the same, with \spin\

875: occasionally demonstrating better performance.  Neither approach allows

876: us to verify complicated algorithms on models of about twenty daemons,

877: which is our goal.  In terms of formalization methodology, the two

878: approaches are too different to compare, and both exhibit

879: advantages and disadvantages.

880:

881: The main goal of this technical note is to document the current

882: approach to modeling MPD in \promela, and to describe how \otter\ can

883: be used to simulate model-checking style search.  We have been

884: applying model checking techniques to MPD algorithms that have been

885: under development and testing for some time.  Consequently, it should

886: come as no surprise that no errors have been discovered.  Even though

887: the current modeling methodologies limit verification to models of

888: only a few entities, applying them to new MPD algorithms could still

889: be beneficial.

890:

891: For further details the reader is referred to the

892: complete models of the presented algorithms.  This information is

893: available at \texttt{http://www.mcs.anl.gov/\~{ }matlin/spin-mpd}.

894:

895: %

896:

897: \addcontentsline{toc}{section}{References}

898: \bibliographystyle{plain}

899: \bibliography{master}

900:

901: %

902:

903: %

904: \end{document}

905: