1: 
2: \documentclass{aadebug}
3: 
4: % Uncomment the following line for the CoRR proceedings. The first number
5: % will be e-mailed to you after the workshop, the second is the page number
6: % of the first page of your article. Please look this number up in the
7: % proceedings.
8: \corr{0309027}{237}
9: 
10: \newtheorem{Def}{Definition}
11: 
12: \begin{document}
13: 
14: \runningheads{Schaubschl\"ager, Kranzlm\"uller, Volkert}
15: {Event-based Program Analysis with DeWiz}
16: 
17: \title{Event-based Program Analysis with DeWiz}
18: 
19: \author{
20: Christian~Schaubschl\"ager\addressnum{1},
21: Dieter~Kranzlm\"uller\addressnum{1},
22: Jens~Volkert\addressnum{1}
23: }
24: 
25: \address{1}{
26: GUP,
27: Joh. Kepler University Linz,
28: Altenbergerstr. 69,
29: A-4040 Linz,
30: Austria/Europe
31: schaubschlaeger@gup.uni-linz.ac.at
32: }
33: 
34: 
35: % This information will show up in `Document Properties' in Acrobat Reader
36: \pdfinfo{
37: /Title (Event-based Program Analysis with DeWiz)
38: /Author (Christian Schaubschl\"ager et al.)
39: }
40: 
41: \begin{abstract}
42: Due to the increased complexity of parallel and distributed programs, debugging
43: them is considered to be the most difficult and time-consuming part of the
44: software lifecycle. Tool support is hence a crucial necessity to hide complexity
45: from the user. However, most existing tools seem inadequate as soon as the
46: program under consideration exploits more than a few processors over a long
47: execution time. This problem is addressed by the novel debugging tool DeWiz
48: (Debugging Wizard), whose focus lies on scalability. DeWiz has a modular,
49: scalable architecture, and uses the event graph model as a representation of the
50: investigated program. DeWiz provides a set of modules, which can be combined to
51: generate, analyze, and visualize event graph data. Within this processing
52: pipeline the toolset tries to extract useful information, which is presented to
53: the user at an arbitrary level of abstraction. Additionally, DeWiz is a
54: framework, which can be used to easily implement arbitrary
55: user-defined modules.
56: \end{abstract}
57: 
58: \keywords{Program analysis; Debugging; Parallel computing; Distributed computing}
59: 
60: \section{Introduction}
61: 
62: It is well known that performance analysis and program debugging are
63: two of the most time-consuming and complex parts of the software lifecycle. This
64: is especially true for parallel or distributed programs, since parallelism and
65: (inter-process) communication introduce new obstacles which are unknown in
66: sequential programs, and increase the complexity of the software development
67: process.
68: 
69: During the past years many program analysis and debugging tools have
70: been developed, using different approaches to hide the complexity of the analyzed
71: or debugged program from the user. Due to the (at least) two-dimensional nature
72: of the analysis data, namely time and space (in terms of processes), some kind of graphical
73: representation has turned out to be the most useful way to present the analysis
74: data to the user. Several approaches to graphical representation have been
75: proposed; most of them visualize a given program execution as a two-dimensional
76: space-time diagram. There is a broad range of tools in this field, for example
77: Vampir \cite{Nage96} and Paradyn \cite{Mill94}, just to list two. Some tools use
78: three-dimensional environments like a CAVE to visualize a program execution,
79: for example as a Time Tunnel as described in \cite{Reed95}.
80: 
81: A characteristic of parallel programs, which is becoming
82: increasingly important for tool developers, is scalability. With multiprocessor
83: machines and clusters deploying hundreds or thousands of processors, and grid
84: infrastructures combining large numbers of distributed resources, scalability of
85: program analysis tools seems a basic necessity.
86: An important factor that limits the scalability of tools is the sheer amount of
87: analysis data. It is therefore essential for any analysis tool to keep
88: the amount of data presented to the user at a manageable size. This can be achieved in
89: two ways. The first addresses the data collection phase, i.e. it reduces the
90: actual amount of collected data. This approach is utilized in Paradyn, where
91: the amount of collected data is reduced through dynamic
92: instrumentation \cite{HoMi94}. The underlying idea is to extract only those data
93: items that are actually needed for program analysis. This reduces the
94: total amount of analysis data and thus permits the investigation of even large-scale
95: programs.
96: 
97: On the other hand, even with data reduction applied in the collection
98: phase, the amount of trace data can grow to an enormous size for large-scale
99: programs that utilize a large number of processors over a long execution time,
100: which may exceed days, weeks, or even months. This makes it necessary to
101: focus on scalability during the data analysis phase as well. Obviously, trace
102: data must be analyzed in a reasonable time and the results must be presented
103: to the user in a meaningful way. Abstraction and graphical representation are the
104: two most important concepts to achieve scalability. An example of such an
105: abstraction mechanism can be found in EDL, the Event Definition Language
106: introduced by Bates and Wileden \cite{BaWi83}. EDL uses two essential mechanisms for event
107: abstraction: filtering and clustering. With filtering, all but a designated
108: subset of events can be deleted from the original event stream. Clustering means
109: that one or more primitive events are gathered together into a higher-level
110: event. EDL has led to the high-level debugging approach EBBA, Event Based
111: Behavioural Abstraction \cite{Bat95}, and the program behaviour models of
112: FORMAN \cite{Aug98}. Both
113: models follow the idea that the behaviour observed in parallel programs
114: may reveal useful patterns, which can be evaluated during program analysis.
115: Another, more recent approach to program monitoring is EARL, which was
116: proposed by Wolf and Mohr \cite{WoMo98}. EARL stands for Event
117: Analysis and Recognition Language and allows the construction of target-independent
118: monitoring and analysis tools by writing scripts in the EARL language.
119: 
120: In this
121: paper we describe the scalable and modular debugging tool DeWiz (Debugging
122: Wizard), which uses the event graph model to represent a program's execution.
123: Data analysis and presentation are done by independent modules, which try to
124: automatically extract useful information. In Section 2 the architecture of DeWiz
125: is discussed, while in Section 3 we give some examples that show how DeWiz can be
126: used for program analysis. Finally, an outlook on future work concludes the paper.
127: 
128: \section{Tool Architecture}
129: The approach of DeWiz stems from our work on the Monitoring and Debugging
130: environment MAD \cite{KrGr97}. MAD is a collection of software tools for debugging message
131: passing programs based on the MPI standard \cite{Mpi95}. At the core of this toolset are
132: the monitoring tool NOPE and the visualization tool ATEMPT. Although originally
133: developed for message passing programs, the toolset, especially the
134: monitor NOPE, has recently been extended so that shared memory codes can also be traced.
135: The motivation for this extension was that some of today's architectures are
136: best utilized by using a hybrid MPI/OpenMP programming style \cite{Rab02}.
137: 
138: In the following we will describe the architecture, the theoretical model, as
139: well as some implementation aspects of DeWiz in more detail.
140: 
141: \subsection{Event Graph}
142: 
143: As mentioned above, DeWiz represents program executions recorded with NOPE,
144: or event streams generated by online monitors,
145: as an event graph, which can be defined as follows:
146: 
147: \begin{Def}[Event Graph \cite{Kra00}]
148: An event graph is a directed graph $G=(E,\rightarrow)$, where $E$
149: is the non-empty set of events $e \in E$, while $\rightarrow$ is a relation
150: connecting events, such that $x \rightarrow y$ means that
151: there is an edge from event $x$ to event $y$ in $G$ with the ``tail'' at event
152: $x$ and the ``head'' at event $y$.
153: \end{Def}
154: 
155: The events $e \in E$ of an event graph are the events observed during a program's
156: execution, for example send or receive events in message passing programs,
157: and read or write memory accesses in a shared memory program. In the case of NOPE
158: there is a standard set of events that will be traced, namely (amongst others)
159: all MPI point-to-point communication events. However, it is easily possible
160: to specify additional user-defined events to be recorded with NOPE, which adds
161: great flexibility to the tool.
162: 
163: The relation connecting the events of an event graph is the
164: {\em happened-before relation},
165: which is the transitive, irreflexive closure of the union of the
166: relations $\stackrel{S}{\rightarrow}$ and $\stackrel{C}{\rightarrow}$. It
167: has been defined as follows:
168: 
169: \begin{Def}[Happened-before relation \cite{Lam78}]
170: The happened-before relation $\rightarrow$ is defined as
171: \[
172: \rightarrow \; = \; (\stackrel{S}{\rightarrow} \cup \stackrel{C}{\rightarrow})^{+}
173: \]
174: where $\stackrel{S}{\rightarrow}$ is the sequential order of events
175: relative to a particular responsible object,
176: while $\stackrel{C}{\rightarrow}$ is the concurrent order relation connecting
177: events on arbitrary responsible objects.
178: \end{Def}
179: 
180: In other words, the relation $\stackrel{S}{\rightarrow}$ defines the sequential
181: order of events on a particular process, with the meaning that if two events
182: $e_p^i$ and $e_p^j$ occur on the same process and $e_p^i$ occurs before $e_p^j$
183: then $e_p^i \stackrel{S}{\rightarrow} e_p^j$.
184: The concurrent order relation $\stackrel{C}{\rightarrow}$ describes the order
185: of corresponding events on different processes, which is established by
186: communication and synchronization. If $e_p^i$ is a send event on process $p$
187: and $e_q^j$ is the corresponding receive event on process $q$, then
188: $e_p^i \stackrel{C}{\rightarrow} e_q^j$.
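
To make the two definitions concrete, the following minimal sketch computes the happened-before relation for a small event graph. It is an illustration in Python, not code taken from DeWiz: the function names \texttt{sequential\_edges} and \texttt{happened\_before} and the representation of an event $e_p^i$ as the tuple $(p,i)$ are assumptions made for this example. The relation $\stackrel{C}{\rightarrow}$ is passed in as explicit send/receive pairs, and the transitive closure is obtained by a depth-first search from every event.

```python
from collections import defaultdict


def sequential_edges(events):
    """S-relation: link consecutive events on the same process.

    Consecutive links are sufficient because the closure below
    restores all remaining same-process orderings.
    """
    by_process = defaultdict(list)
    for p, i in events:
        by_process[p].append(i)
    edges = set()
    for p, indices in by_process.items():
        indices.sort()
        for a, b in zip(indices, indices[1:]):
            edges.add(((p, a), (p, b)))
    return edges


def happened_before(events, concurrent_edges):
    """Return all pairs (x, y) with x -> y, i.e. the closure of S and C."""
    successors = defaultdict(set)
    for x, y in sequential_edges(events) | set(concurrent_edges):
        successors[x].add(y)
    closure = set()
    for start in events:
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors[node])
    return closure


# Two processes; the send (0, 2) is matched by the receive (1, 2).
events = [(0, 1), (0, 2), (1, 1), (1, 2), (1, 3)]
hb = happened_before(events, [((0, 2), (1, 2))])
```

In this example $e_0^1 \rightarrow e_1^3$ holds only through transitivity, while $e_0^1$ and $e_1^1$ remain unordered.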
189: 
190: The DeWiz toolset uses the event graph model as its theoretical foundation. The tool
191: itself consists of three main components, the {\em modules}, the
192: {\em protocol}, and a {\em framework}, which are required to construct a DeWiz
193: system for a concrete analysis task.
194: 
195: \subsection{DeWiz System}
196: 
197: A DeWiz system is built by connecting a set of DeWiz modules, which then act
198: as a kind of event-graph processing pipeline, i.e. the DeWiz modules are
199: responsible for the actual work in a DeWiz system. This modular approach
200: has several advantages. It makes the DeWiz system flexible and easily
201: extensible. Users can utilize existing modules or, if needed, implement their
202: own modules, hence adding arbitrary functionality to the system.
203: 
204: Basically we distinguish three kinds of modules:
205: 
206: \begin{itemize}
207:   \item Event graph generation modules
208:   \item Automatic analysis modules
209:   \item Data access modules
210: \end{itemize}
211: 
212: The modules in a DeWiz system communicate with each other using
213: a specialized protocol,
214: the DeWiz protocol. This protocol is based upon TCP/IP, which makes it
215: possible to distribute a DeWiz system across several computers.
216: Due to this approach, the monitoring and analysis tasks themselves can
217: utilize a potentially large number of resources, e.g. by putting
218: the analysis tasks on the grid \cite{Foster}.
219: For example it would be feasible to execute only the
220: monitoring module on the computer where the monitored application
221: is running. The monitoring module would then send the collected
222: events to an analysis module which is executed on some other computer, and
223: so on. Since the analysis or processing of monitored events can in general be
224: very time-consuming, distributing these tasks can speed up
225: the analysis process significantly.
226: 
227: As mentioned above we distinguish three types of modules. These will be
228: described in more detail in the following sections.
229: 
230: \subsubsection{Event Graph Generation Modules}
231: 
232: Event graph generation modules are those that produce the event graph data
233: stream from a given program execution. This can be done in two ways, either online or
234: post-mortem. In the case of online tracing, a DeWiz module connects to a running,
235: instrumented program, collects events which are generated by the online
236: monitor, and forwards these events to the next module in the DeWiz system.
237: Currently DeWiz supports online monitors which correspond to the
238: OMIS Compliant Monitor OCM \cite{WiTr98}. There is also an interface to the
239: OpenMP Pragma and Region Instrumentor OPARI \cite{MoMa01}.
240: 
241: In the case of post-mortem tracing, events are read from tracefiles by a suitable
242: DeWiz module. Currently there is a module for reading tracefiles generated
243: by NOPE.
244: 
245: \subsubsection{Automatic Analysis Modules}
246: 
247: Automatic analysis modules process an event graph stream and try to extract
248: useful information, for example communication patterns, or erroneous
249: behaviour such as communication errors. Detecting the latter is relatively easy,
250: for example by simply comparing the message lengths at a send event
251: and at the corresponding receive event. If the lengths differ, this is an
252: indication of a possible communication error. A more challenging task
253: is to find communication patterns in an event graph. By applying
254: pattern-matching algorithms to the event graph, we try to identify patterns
255: such as loops. Any irregularity found in such a
256: pattern is again a possible indication of
257: an error in the investigated program.
258: 
259: \subsubsection{Data Access Modules}
260: 
261: At the end of the processing pipeline we have data access modules. Their
262: purpose is to display the various analysis results, which were generated by
263: the preceding modules, to the user. Depending on the kind of analysis data
264: a suitable form of visualization will be chosen. In most cases this will be some
265: form of graphical representation, for example a space-time diagram of
266: the event graph.
267: Figure~\ref{comm_failures} shows a visualization of an example message-passing
268: event-graph.
269: The participating processes are displayed on the vertical axis, whereas
270: the horizontal axis represents time. The black arrows represent messages
271: which are sent from one process to another, with the tail of the arrow at
272: the send event on the source process, and the tip of the arrow at
273: the receive event on the destination process.
274: The colored arrows indicate possible communication errors; these will be
275: described in more detail below.
276: 
277: \subsection{The DeWiz Protocol and Framework}
278: 
279: The DeWiz Protocol is used between modules to transport the event graph stream.
280: For this purpose it is necessary to define data structures which represent the
281: observed events. In our case the following two data structures have been defined:
282: 
283: \begin{center}
284:   event: $e_p^i = (p,i,type,data)$\\
285:   \vspace{\baselineskip}
286:   concurrent order relation: $e_p^i \stackrel{C}{\rightarrow} e_q^j = (p,i,q,j)$\\
287: \end{center}
288: 
289: The variables $p$ and $i$ represent the responsible object (e.g. a process) on which
290: the event occurred and its sequential order, respectively. The variable $type$
291: denotes the kind of event, for example a send or a receive
292: operation in a message passing code, or a semaphore lock in a shared memory environment. Currently
293: only message-passing and shared-memory events are supported, but due to its
294: flexibility, the event graph can be used to model any kind of software system.
295: Table~\ref{evt_table} gives a short overview of several possible software
296: systems, their corresponding event types and event data.
297: The $data$ variable can be used to store additional information concerning the event,
298: for example timestamps or the calling parameters of the function call that caused the
299: event.
300: 
301: \begin{table}
302: \begin{center}
303: \begin{tabular}{|p{4cm}|p{3cm}|p{5cm}|}
304: \hline
305:    target system & event type & event data \\[0.6cm]
306: \hline\hline
307:   parallel/distributed message-passing program &
308:   send &
309:   message data, message length, destination, message type, data type, \ldots\\[0.6cm]
310: \hline
311:   multi-threaded shared memory program & lock & semaphore, waiting time, \ldots\\[0.6cm]
312: \hline
313:   database/transaction system & read record & table, location of table, access time, \ldots\\[0.6cm]
314: \hline
315:   file input/output & write & filename, device, buffer size, \ldots\\[0.6cm]
316: \hline
317: \end{tabular}
318: \caption{Example events and event attributes}
319: \label{evt_table}
320: 
321: \end{center}
322: \end{table}
323: 
324: The concurrent order relation connects corresponding events as described above.
325: In DeWiz we use logical vector clocks as described by Fidge \cite{Fidg91} to
326: implement the concurrent order relation.
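
The idea behind vector clocks can be sketched as follows. The sketch is written in Python and is an illustration under stated assumptions rather than the DeWiz implementation: the trace format, the function names \texttt{assign\_vector\_clocks}, \texttt{precedes}, and \texttt{concurrent}, and the use of message identifiers to match sends and receives are all hypothetical. Each process keeps one counter per process; a receive merges the timestamp carried by the message before counting the receive event itself, so that a component-wise comparison of two timestamps decides whether one event happened before the other.

```python
def assign_vector_clocks(num_processes, trace):
    """Assign a vector timestamp to every event of a trace.

    `trace` is a list of (process, kind, message_id) tuples, given in an
    order in which every send appears before its receive. `kind` is one
    of "send", "recv", or "local"; `message_id` is None for local events.
    """
    clocks = [[0] * num_processes for _ in range(num_processes)]
    in_flight = {}  # message_id -> timestamp attached by the sender
    stamps = []
    for process, kind, message_id in trace:
        if kind == "recv":
            sent = in_flight.pop(message_id)
            # Merge the sender's knowledge into the receiver's clock.
            clocks[process] = [max(a, b) for a, b in zip(clocks[process], sent)]
        clocks[process][process] += 1  # count this event
        stamp = tuple(clocks[process])
        if kind == "send":
            in_flight[message_id] = stamp
        stamps.append(stamp)
    return stamps


def precedes(a, b):
    """True if the event stamped `a` happened before the event stamped `b`."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    """True if neither event happened before the other."""
    return not precedes(a, b) and not precedes(b, a)


trace = [
    (0, "send", "m1"),
    (1, "local", None),
    (1, "recv", "m1"),
    (0, "local", None),
]
stamps = assign_vector_clocks(2, trace)
```

Here the send precedes the receive, whereas the final local event on process 0 is concurrent with the receive on process 1.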
327: 
328: With the DeWiz Framework it is possible to implement DeWiz modules for any
329: desired functionality. The Framework is written in the Java
330: programming language and provides a set of API functions which simplify
331: the development of user-defined modules, for example by hiding the
332: DeWiz protocol from the user.
333: 
334: \section{Examples}
335: 
336: \subsection{Overview}
337: 
338: \begin{figure}
339: \centering
340: \includegraphics[scale=0.4]{dewiz.eps}
341: \caption{An Example DeWiz System}
342: \label{dewiz}
343: \end{figure}
344: 
345: In this section we present an example DeWiz system. If the modules for
346: a concrete analysis task are available, the user may start to construct a
347: corresponding DeWiz system. The modules are placed and initialized on arbitrary
348: networked computing nodes. A dedicated module, the DeWiz Sentinel, is used to
349: control a particular DeWiz system. With a controller interface, available
350: modules may be arbitrarily interconnected by identifying corresponding input and
351: output interfaces.
352: An example of the DeWiz controller interface is shown in Figure~\ref{dewiz}. The smaller
353: window in front shows the module table, including all registered modules (by id
354: and name), their available interfaces and status, the implemented features
355: (send, receive, or none), and the ids of corresponding consumer or producer
356: modules. The larger background window of Figure~\ref{dewiz} provides the same information
357: in the form of a module diagram.
358: 
359: To use DeWiz in a particular programming environment, dedicated event graph
360: generation modules have been implemented. As mentioned above, currently there
361: is a trace-reader module for NOPE, as well as an interface to OMIS compliant
362: monitors and an extension to OPARI.
363: 
364: Concerning data access modules, DeWiz provides an interface to the analysis tool
365: ATEMPT (Figure~\ref{comm_failures}), a Java applet to display the event graph stream
366: in arbitrary web
367: browsers (Figure~\ref{iedewiz}), and an SMS notifier for critical failures
368: during program execution (Figure~\ref{handy}).
369: 
370: \begin{figure}
371: \centering
372: \includegraphics[scale=0.5]{iedewiz.eps}
373: \caption{Visualization of an event-graph in a Java applet}
374: \label{iedewiz}
375: \end{figure}
376: 
377: \begin{figure}
378: \centering
379: \includegraphics[scale=0.9]{handy.eps}
380: \caption{DeWiz SMS notifier}
381: \label{handy}
382: \end{figure}
383: 
384: The analysis functionality already implemented in DeWiz is illustrated with the
385: following two examples:
386: 
387: \begin{itemize}
388:   \item Extraction of communication failures
389:   \item Pattern matching and loop detection
390: \end{itemize}
391: 
392: \subsection{Communication Failures}
393: 
394: Communication failures can be detected by pairwise analysis of communication
395: events. One indication of a possible communication failure is a difference
396: between the message lengths at a send event and the corresponding receive event.
397: Though this is not necessarily a communication failure, the default event-graph
398: visualization module of DeWiz highlights such
399: send and receive events, and the user can easily check whether
400: the difference is intended or not. Another, more obvious indication of a communication
401: failure is a pending send or receive event; such events are also
402: highlighted in the event-graph visualization. Isolated events of this kind can originate, for
403: example, from a wrong destination address given at a send event. The consequence
404: would be that the corresponding receive event (in case it is a blocking
405: receive) would wait forever for the message, thus blocking the
406: receiving process indefinitely.
407: In Figure~\ref{comm_failures}
408: an example event-graph with several possible communication errors is shown.
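
Both checks described above can be sketched in a few lines. The following Python fragment is a hypothetical illustration and not the DeWiz analysis module: the record type \texttt{CommEvent}, the function \texttt{check\_communication}, and the assumption that matched send/receive pairs are already known as $(p,i),(q,j)$ tuples are introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommEvent:
    process: int
    index: int
    kind: str    # "send" or "recv"
    length: int  # message length recorded at this event


def check_communication(events, pairs):
    """Return (length_mismatches, isolated_events).

    `pairs` holds ((p, i), (q, j)) tuples linking a send to its receive.
    """
    lookup = {(e.process, e.index): e for e in events}
    mismatches = []
    matched = set()
    for send_id, recv_id in pairs:
        matched.update((send_id, recv_id))
        send, recv = lookup[send_id], lookup[recv_id]
        if send.length != recv.length:
            mismatches.append((send, recv))
    # Events that take part in no pair are pending sends or receives.
    isolated = [e for e in events if (e.process, e.index) not in matched]
    return mismatches, isolated


events = [
    CommEvent(0, 1, "send", 64), CommEvent(1, 1, "recv", 64),
    CommEvent(0, 2, "send", 128), CommEvent(1, 2, "recv", 32),
    CommEvent(1, 3, "recv", 16),  # no matching send in the trace
]
pairs = [((0, 1), (1, 1)), ((0, 2), (1, 2))]
mismatches, isolated = check_communication(events, pairs)
```

A mismatch is only reported as a candidate; as noted above, differing lengths may be intended, so the decision is left to the user.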
409: 
410: \begin{figure}
411: \centering
412: \includegraphics[scale=0.7]{commerr.eps}
413: \caption{Possible communication errors in a message-passing program}
414: \label{comm_failures}
415: \end{figure}
416: 
417: \subsection{Pattern Matching - Loop Detection}
418: 
419: A more complex analysis activity compared to the extraction of communication
420: failures is pattern matching and loop detection. The goal of the corresponding
421: DeWiz modules is to identify repeated process interaction patterns in the event
422: graph. An example event graph is shown in Figure~\ref{sim_ex}. This pattern is called
423: the {\em simple exchange} pattern and can be defined as the event graph
424: 
425: \begin{center}
426: 
427: $EX(i,p,q) = (EX_{ev}(i,p,q),EX_{rel}(i,p,q))$ with\\
428: \vspace{\baselineskip}
429: $EX_{ev}(i,p,q) = \{e_p^i,e_p^{i+1},e_q^i,e_q^{i+1} \}$ and\\
430: \vspace{\baselineskip}
431: $EX_{rel}(i,p,q)=\{(e_p^i \stackrel{S}{\rightarrow} e_p^{i+1}),
432:                  (e_q^i \stackrel{S}{\rightarrow} e_q^{i+1}),
433:                  (e_p^i \stackrel{C}{\rightarrow} e_q^{i+1}),
434:                  (e_q^i \stackrel{C}{\rightarrow} e_p^{i+1}) \}$
435: \end{center}
436: 
437: where events $e_p^i, e_p^{i+1}$ occur on
438: process $p$ and events $e_q^i, e_q^{i+1}$ occur on process $q$ with $p \neq q$.
439: The existence of this simple pattern in an event graph can easily be verified
440: within a DeWiz module. More complex
441: patterns can be specified and provided in a pattern database according to
442: the needs of users and the characteristics of their programs.
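
A check for the simple exchange pattern could look like the following sketch. It is written in Python under several assumptions that are not taken from DeWiz: events are $(p,i)$ tuples, the relation $\stackrel{C}{\rightarrow}$ is given as a collection of event pairs, and the function name \texttt{find\_simple\_exchanges} is hypothetical. The two $\stackrel{S}{\rightarrow}$ edges of the pattern are implied once the events with indices $i$ and $i+1$ exist on both processes, so only the two crossing $\stackrel{C}{\rightarrow}$ edges are tested explicitly.

```python
def find_simple_exchanges(events, concurrent_edges):
    """Return all (i, p, q) with p < q for which EX(i, p, q) occurs."""
    events = set(events)
    edges = set(concurrent_edges)
    matches = []
    for (p, i), (q, j) in edges:
        # Candidate edge e_p^i -C-> e_q^{i+1} between two different processes.
        if p == q or j != i + 1:
            continue
        needed = {(p, i), (p, i + 1), (q, i), (q, i + 1)}
        crossing = ((q, i), (p, i + 1))  # e_q^i -C-> e_p^{i+1}
        # The pattern is symmetric in p and q; p < q reports it once.
        if p < q and needed <= events and crossing in edges:
            matches.append((i, p, q))
    return sorted(matches)


events = [(p, i) for p in (0, 1) for i in (1, 2, 3, 4)]
edges = [
    ((0, 1), (1, 2)), ((1, 1), (0, 2)),  # complete exchange at i = 1
    ((0, 3), (1, 4)),                    # one-way message only at i = 3
]
exchanges = find_simple_exchanges(events, edges)
```

Only the first pair of messages forms a simple exchange; the single message at $i=3$ lacks the crossing edge and is not reported.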
443: 
444: \begin{figure}
445: \centering
446: \includegraphics{sim_ex.eps}
447: \caption{Simple exchange}
448: \label{sim_ex}
449: \end{figure}
450: 
451: The purpose of detecting patterns in an event-graph is two-fold. Firstly,
452: if it is possible to detect repeated iterations of a pattern in an event
453: graph, this knowledge can be used when the event-graph is visualized,
454: e.g. as a space-time diagram. By replacing the possibly complicated patterns
455: with simpler symbols, the complexity of the visual representation of the
456: event-graph can be reduced greatly, which would give the user a better
457: overview of the investigated program.
458: 
459: Secondly, the user could specify a communication pattern which is
460: expected to occur in the investigated program. DeWiz will compare the
461: given pattern with the event-graph and detect deviations, which
462: could originate from an error in the program. Another example is
463: the repeated occurrence of any pattern, possibly within a loop. DeWiz will
464: in a first step detect the pattern, and then check for irregularities in
465: the sequence of this pattern. Figure~\ref{pattern} illustrates such a situation.
466: We see a relatively complex event-graph, which is the trace of an
467: execution of a finite-element message-passing program run on 16 processes.
468: Despite its complexity, one can relatively easily see the iterations of a pattern,
469: as well as a significant irregularity (in the middle of the diagram). Again,
470: this is an indication of a possible communication error.
471: 
472: \begin{figure}
473: \centering
474: \includegraphics[scale=0.6]{pattern.eps}
475: \caption{Event-graph with iterations of a pattern}
476: \label{pattern}
477: \end{figure}
478: 
479: 
480: \section{Conclusion and Future Work}
481: 
482: Performance analysis and debugging of parallel and distributed programs are
483: difficult activities. The problems are compounded if program executions
484: with large numbers of processes need to be investigated. For that reason,
485: scalability of software analysis tools is an important characteristic.
486: 
487: The modular approach of DeWiz provides scalable parallel program analysis by
488: abstracting the program's behavior as an event graph and distributing the
489: analysis activities of this graph across existing resources. With this approach,
490: DeWiz is able to cope with very large amounts of analysis data, while providing
491: capabilities comparable to existing analysis tools.
492: The current implementation of DeWiz represents a first proof of concept.
493: However, before DeWiz can be applied in practice, further evaluation with real-world
494: applications is needed. In addition, some more interfaces to existing analysis
495: tools are required. With the flexible structure of DeWiz and the well-defined
496: protocol, an interface to an already existing analysis tool can easily be
497: established. In this way, the analysis tool benefits from the capabilities of
498: DeWiz and achieves a higher level of scalability.
499: 
500: \bibliography{paper}
501: 
502: \end{document}
503: