\hyphenation{WebLogic}
\begin{thebibliography}{10}\setlength{\itemsep}{-1ex}\small

\bibitem{GAO:patriot}
Patriot missile defense: Software problem led to system failure at {D}hahran,
{S}audi {A}rabia.
\newblock Technical Report GAO/IMTEC-92-26, U.S. General Accounting Office,
1992.

\bibitem{gartner:sustainable}
T.~Adams, R.~Igou, R.~Silliman, A.~M. Neela, and E.~Rocco.
\newblock Sustainable infrastructures: How {IT} services can address the
realities of unplanned downtime.
\newblock {R}esearch {B}rief 97843a, Gartner Research, May 2001.
\newblock Strategy, Trends \& Tactics Series.

\bibitem{metagroup:j2ee}
M.~Barnes.
\newblock {J2EE} application servers: Market overview.
\newblock Technical Report Delta-1173, {META} Group Research, Aug.~13, 2002.
\newblock {A}pplication {D}elivery {S}trategies Series.

\bibitem{nonstop}
J.~F. Bartlett.
\newblock A {NonStop} kernel.
\newblock In {\em Proc. 8th {ACM} {S}ymposium on {O}perating {S}ystems
{P}rinciples}, Pacific Grove, CA, 1981.

\bibitem{bhatti:response}
N.~Bhatti, A.~Bouch, and A.~Kuchinsky.
\newblock Integrating user-perceived quality into {W}eb server design.
\newblock In {\em Proc. 9th International World Wide Web Conference},
Amsterdam, The Netherlands, 2000.

\bibitem{candea:afpi}
G.~Candea, M.~Delgado, M.~Chen, and A.~Fox.
\newblock Automatic failure-path inference: A generic introspection technique
for software systems.
\newblock In {\em Proc. 3rd {IEEE} {W}orkshop on {I}nternet {A}pplications},
San Jose, CA, 2003.

\bibitem{candea:RR}
G.~Candea and A.~Fox.
\newblock Recursive restartability: Turning the reboot sledgehammer into a
scalpel.
\newblock In {\em Proc. 8th {W}orkshop on {H}ot {T}opics in {O}perating
{S}ystems}, Elmau/Oberbayern, Germany, 2001.

\bibitem{candea:crash-only}
G.~Candea and A.~Fox.
\newblock Crash-only software.
\newblock In {\em Proc. 9th {W}orkshop on {H}ot {T}opics in {O}perating
{S}ystems}, Lihue, Hawaii, 2003.

\bibitem{candea:jagr}
G.~Candea, P.~Keyani, E.~Kiciman, S.~Zhang, and A.~Fox.
\newblock {JAGR}: An autonomous self-recovering application server.
\newblock In {\em Proc. 5th International Workshop on Active Middleware
Services}, Seattle, WA, June 2003.

\bibitem{castro:bft}
M.~Castro and B.~Liskov.
\newblock Practical {B}yzantine fault tolerance.
\newblock In {\em Proc. 3rd {USENIX} {S}ymposium on {O}perating {S}ystems
{D}esign and {I}mplementation}, New Orleans, LA, 1999.

\bibitem{cecchet:ejb}
E.~Cecchet, J.~Marguerite, and W.~Zwaenepoel.
\newblock Performance and scalability of {EJB} applications.
\newblock In {\em Proc. 17th Conference on Object-Oriented Programming,
Systems, Languages, and Applications}, Seattle, WA, 2002.

\bibitem{chandra:whither}
S.~Chandra and P.~M. Chen.
\newblock Whither generic recovery from application faults? {A} case study
using open-source software.
\newblock In {\em Proc. {I}nternational {C}onference on {D}ependable {S}ystems
and {N}etworks}, New York, NY, 2000.

\bibitem{chen:nsdi}
M.~Chen, A.~Accardi, E.~Kiciman, D.~Patterson, A.~Fox, and E.~Brewer.
\newblock Path-based macroanalysis for large, distributed systems.
\newblock In {\em Proc. 1st {USENIX} {S}ymposium on {N}etworked {S}ystems
{D}esign and {I}mplementation}, San Francisco, CA, 2004.

\bibitem{chen:pinpoint}
M.~Chen, E.~Kiciman, E.~Fratkin, E.~Brewer, and A.~Fox.
\newblock Pinpoint: Problem determination in large, dynamic, {I}nternet
services.
\newblock In {\em Proc. {I}nternational {C}onference on {D}ependable {S}ystems
and {N}etworks}, Washington, DC, June 2002.

\bibitem{chou:FT}
T.~C. Chou.
\newblock Beyond fault tolerance.
\newblock {\em IEEE Computer}, 30(4):31--36, 1997.

\bibitem{oracle:comm}
H.~Cohen and K.~Jacobs.
\newblock Personal communication.
\newblock Oracle Corporation, 2002.

\bibitem{sun:mvm}
G.~Czajkowski and L.~Dayn\`es.
\newblock Multitasking without compromise: A virtual machine evolution.
\newblock In {\em Proc. 16th Conference on Object-Oriented Programming,
Systems, Languages, and Applications}, Tampa Bay, FL, 2001.

\bibitem{fox:mttr}
A.~Fox and D.~Patterson.
\newblock When does fast recovery trump high reliability?
\newblock In {\em Proc. 2nd Workshop on Evaluating and Architecting System
Dependability}, San Jose, CA, 2002.

\bibitem{GRAY_81}
J.~Gray.
\newblock The transaction concept: Virtues and limitations.
\newblock In {\em Proc. International Conference on Very Large Data Bases},
Cannes, France, 1981.

\bibitem{gray:challenges}
J.~Gray.
\newblock A dozen information-technology research goals.
\newblock {\em ACM SIGMOD Digital Symposium Collection}, 2(2), 2000.
\newblock ACM Turing Award Lecture.

\bibitem{hawblitzel:luna}
C.~Hawblitzel and T.~von Eicken.
\newblock Luna: A flexible {J}ava protection system.
\newblock In {\em Proc. 5th {USENIX} {S}ymposium on {O}perating {S}ystems
{D}esign and {I}mplementation}, Boston, MA, 2002.

\bibitem{ebay:power}
J.~Hu.
\newblock {eBay} blacks out for three hours.
\newblock {\em {CNET} News.com}, Aug. 2003.
\newblock http://news.com/2100-1018\_3-5066669.html.

\bibitem{huang:dstore}
A.~C. Huang and A.~Fox.
\newblock Decoupled storage: State with stateless-like properties.
\newblock Submitted to the 22nd Symposium on Reliable Distributed Systems,
2003.

\bibitem{huang:libft}
Y.~Huang and C.~M.~R. Kintala.
\newblock Software implemented fault tolerance: Technologies and experience.
\newblock In {\em Proc. 23rd International Symposium on Fault-Tolerant
Computing}, Toulouse, France, 1993.

\bibitem{huang:swrejuvenation}
Y.~Huang, C.~M.~R. Kintala, N.~Kolettis, and N.~D. Fulton.
\newblock Software rejuvenation: Analysis, module and applications.
\newblock In {\em Proc. 25th International Symposium on Fault-Tolerant
Computing}, Pasadena, CA, 1995.

\bibitem{jacobs:bea}
D.~Jacobs.
\newblock Distributed computing with {BEA} {WebLogic} server.
\newblock In {\em Proc. Conference on Innovative Data Systems Research},
Asilomar, CA, 2003.

\bibitem{jboss:docs}
JBoss.
\newblock Homepage.
\newblock http://www.jboss.org/docs, 2002.

\bibitem{chameleon}
Z.~T. Kalbarczyk, R.~K. Iyer, S.~Bagchi, and K.~Whisnant.
\newblock Chameleon: {A} software infrastructure for adaptive fault tolerance.
\newblock {\em IEEE Transactions on Parallel and Distributed Systems},
10:560--579, 1999.

\bibitem{lefebvre:cnn}
W.~Le{F}ebvre.
\newblock {CNN}.com: {F}acing a world crisis.
\newblock In {\em 15th {USENIX} Systems Administration Conference}, 2001.
\newblock Invited Talk.

\bibitem{ht:reboot}
H.~Levine.
\newblock Personal communication.
\newblock Ebates.com, 2003.

\bibitem{ling:ssm}
B.~Ling, E.~Kiciman, and A.~Fox.
\newblock Session state: Beyond soft state.
\newblock In {\em Proc. 1st {USENIX} {S}ymposium on {N}etworked {S}ystems
{D}esign and {I}mplementation}, San Francisco, CA, 2004.

\bibitem{washpost:raptor}
V.~Loeb.
\newblock {F/A-22} {R}aptor software soars in hard tests.
\newblock {\em Washington Post}, July~28, 2003.
\newblock Web edition.

\bibitem{lowell:transparency}
D.~E. Lowell, S.~Chandra, and P.~M. Chen.
\newblock Exploring failure transparency and the limits of generic recovery.
\newblock In {\em Proc. 4th {USENIX} {S}ymposium on {O}perating {S}ystems
{D}esign and {I}mplementation}, San Diego, CA, 2000.

\bibitem{marcus:ha}
E.~Marcus and H.~Stern.
\newblock {\em Blueprints for High Availability}.
\newblock John Wiley \& Sons, Inc., New York, NY, 2000.

\bibitem{meyer:performability}
J.~F. Meyer.
\newblock On evaluating the performability of degradable computer systems.
\newblock {\em IEEE Transactions on Computers}, C-29:720--731, Aug. 1980.

\bibitem{miller:thinktimes}
R.~Miller.
\newblock Response time in man-computer conversational transactions.
\newblock In {\em Proc. AFIPS Fall Joint Computer Conference}, volume~33, 1968.

\bibitem{murphy:reliability}
B.~Murphy and T.~Gent.
\newblock Measuring system and software reliability using an automated data
collection process.
\newblock {\em Quality and Reliability Engineering International}, 11:341--353,
1995.

\bibitem{press_performability}
K.~Nagaraja, X.~Li, R.~Bianchini, R.~P. Martin, and T.~D. Nguyen.
\newblock Using fault injection and modeling to evaluate the performability of
cluster-based services.
\newblock In {\em Proc. 3rd {USENIX} Symposium on Internet Technologies and
Systems}, Seattle, WA, March 2003.

\bibitem{pal:comm}
A.~Pal.
\newblock Personal communication.
\newblock Yahoo!, Inc., 2002.

\bibitem{rodrigues:base}
R.~Rodrigues, B.~Liskov, and M.~Castro.
\newblock {BASE}: Using abstraction to improve fault tolerance.
\newblock In {\em Proc. 18th {ACM} {S}ymposium on {O}perating {S}ystems
{P}rinciples}, Banff, Canada, 2001.

\bibitem{sullivan:defects}
M.~Sullivan and R.~Chillarege.
\newblock Software defects and their impact on system availability: {A} study
of field failures in operating systems.
\newblock In {\em Proc. 21st International Symposium on Fault-Tolerant
Computing}, Montr\'eal, Canada, 1991.

\bibitem{sun:j2ee}
Sun Microsystems.
\newblock {J2EE} platform specification.
\newblock http://java.sun.com/j2ee/, 2002.

\bibitem{tullmann:janos}
P.~Tullmann, M.~Hibler, and J.~Lepreau.
\newblock Janos: A {J}ava-oriented {OS} for active networks.
\newblock {\em {IEEE} Journal on Selected Areas in Communications},
19(3):501--510, 2001.

\bibitem{whisnant:ree_sift_armor}
K.~Whisnant, R.~Iyer, P.~Jones, R.~Some, and D.~Rennels.
\newblock Experimental evaluation of the {REE SIFT} environment for spaceborne
applications.
\newblock In {\em Proc. {I}nternational {C}onference on {D}ependable {S}ystems
and {N}etworks}, Washington, DC, 2002.

\bibitem{whitaker:denali}
A.~Whitaker, M.~Shaw, and S.~Gribble.
\newblock Scale and performance in the {D}enali isolation kernel.
\newblock In {\em Proc. 5th {USENIX} {S}ymposium on {O}perating {S}ystems
{D}esign and {I}mplementation}, Boston, MA, 2002.

\bibitem{xie:modeling}
W.~Xie, H.~Sun, Y.~Cao, and K.~Trivedi.
\newblock Modeling of online service availability perceived by {W}eb users.
\newblock Technical report, Center for Advanced Computing and Communication
(CACC), Duke University, 2002.

\end{thebibliography}