cs0402042/jcs.tex
1: \documentclass[11pt]{article}         %
2: \usepackage{times}                               %
3: %
4: %
5: %
6: \usepackage{chicagor}
7: 
8: \input{defn.tex}
9: 
10: %
11: %
12: 
13: \newcommand{\im}{im}
14: \newcommand{\kernel}{ker}
15: %
16: %
17: \renewcommand{\S}{{\cal S}}
18: 
19: \newcommand{\send}{\mbox{\it send}}
20: \newcommand{\recv}{\mbox{\it receive}}
21: 
22: \begin{document}
23: 
24: \title{Anonymity and Information Hiding \\ in Multiagent Systems%
25: \thanks{Authors were supported in part by NSF under grant 
26: CTC-0208535, by ONR under grants  N00014-00-1-03-41 and
27: N00014-01-10-511, by the DoD Multidisciplinary University Research
28: Initiative (MURI) program administered by the ONR under
29: grant N00014-01-1-0795, and by AFOSR under grant F49620-02-1-0101.
30: %
31: Kevin O'Neill was also supported in part by a graduate fellowship from the
32: National Science and Engineering Research Council of Canada.
33: %
34: %
35: %
36: %
37: %
38: A preliminary version of this paper appeared at the 16th IEEE Computer Security
39: Foundations Workshop in Pacific Grove, California.
40: }
41: }
42: 
43: \author{Joseph Y. Halpern \ \ \ \ \ \ \ \ \ \ \ Kevin R. O'Neill \\
44: Department of Computer Science\\
45: Cornell University\\
46: $\{$halpern, oneill$\}$@cs.cornell.edu\\
47: }
48: 
49: 
50: \maketitle
51: \thispagestyle{empty}
52: 
53: \begin{abstract}
54: We provide a framework for reasoning about information-hiding requirements
55: in multiagent systems and for reasoning about anonymity in particular. 
56: Our framework employs the modal logic of knowledge within the context of the
57: runs and systems framework, much in the spirit of our earlier work on
58: secrecy 
59: %
60: %
61: %
62: \cite{HalOn02}. 
63: %
64: We give several definitions of anonymity with respect 
65: to agents, actions, and observers in multiagent systems, and we relate our definitions
66: of anonymity to other definitions of information hiding, such as secrecy. We also
67: give probabilistic definitions of anonymity that are able to quantify an observer's
68: uncertainty about the state of the system. Finally, we relate our definitions of
69: anonymity to other formalizations of anonymity and information hiding, including
70: definitions of anonymity in the process algebra CSP and definitions of information
71: hiding using function views.
72: \end{abstract}
73: 
74: 
75: \section{Introduction}
76: 
77: %
78: %
79: %
80: %
81: %
82: %
83: %
84: %
85: %
86: %
87: %
88: %
89: %
90: 
91: The primary goal of this paper is to provide 
92: a formal framework for reasoning
93: about anonymity in multiagent systems.   
94: The importance of anonymity has increased over the past few years
95: as more communication passes over the Internet. 
96: Web-browsing, message-sending, and file-sharing are all important 
97: examples of activities that computer users would like to 
98: engage in, but may be reluctant to do unless they can
99: receive guarantees that their anonymity will be protected to some
100: reasonable degree. 
101: %
102: %
103: %
104: %
105: Systems are being built that attempt to implement 
106: anonymity
107: %
108: for various kinds of network communication
109: %
110: %
111: %
112: %
113: %
114: %
115: (see, for example,
116: \cite{goel02,hopper03,levine02,reiter98,sherwood02,syverson97}). 
117: It would be helpful to have a formal framework in which to reason 
118: about the level of anonymity that such systems provide. 
119: 
120: We view anonymity as an instance of a more general problem:
121: information hiding.
122: In the theory of computer security, many of the fundamental problems and much of the 
123: research has been concerned with the hiding of information. Cryptography, for instance,
124: is used
125: to hide the contents of a message from untrusted observers as it passes from
126: one party to another. Anonymity requirements are intended to ensure that the identity
127: of the agent who performs some action remains hidden from other
128: observers. Noninterference requirements 
129: essentially say that {\em everything} about classified or high-level 
130: users of a system should be hidden from low-level users.
131: Privacy is a catch-all term that means different things to different
132: people, but 
133: it typically involves hiding personal or private information from others.
134: 
135: Information-hiding properties such as these can be 
136: thought of as providing answers to the following set of questions: 
137: \ul
138: \item What information needs to be hidden?
139: \item Who does it need to be hidden from?
140: \item How well does it need to be hidden?
141: \eul
142: By analyzing security properties with these questions in
143: mind, it often 
144: becomes clear how different properties relate to each other. These
145: questions can 
146: also serve as a test of a definition's usefulness: an information-hiding
147: property 
148: should be able to provide clear answers to these three questions.
149: 
150: %
151: %
152: %
153: %
154: In an earlier paper \cite{HalOn02}, we formalized 
155: %
156: %
157: %
158: %
159: %
160: secrecy 
161: %
162: %
163: in terms of knowledge.
164: %
165: %
166: %
167: %
168: %
169: Our focus was on capturing what it 
170: means for one agent to have {\em total\/} secrecy with respect to
171: another,
172: %
173: in the sense that no information flows from the first agent to the
174: second.
175: Roughly speaking, 
176: %
177: %
178: %
179: a high-level user has total secrecy 
180: if the low-level user never knows anything about the
181: high-level user that he didn't initially know.  Knowledge 
182: provides a
183: natural way to express information-hiding properties---information is
184: hidden from $a$ if $a$ does not know about it.  Not surprisingly, our
185: formalization of anonymity is similar in spirit to our
186: formalization of secrecy.  
187: Our definition of secrecy says that a classified agent maintains 
188: secrecy with respect to an unclassified agent if the unclassified
189: agent doesn't learn any new fact that depends only
190: on the state of the classified agent.
191: That is, if the agent didn't know a classified
192: fact $\phi$ to start with, then the
193: agent doesn't know it at any point in the system.
194: Our definitions of anonymity
195: say that an agent performing an action maintains anonymity with 
196: respect to an observer if the observer never learns certain facts
197: having to do with whether or not the agent performed the action.
198: 
199: %
200: %
201: %
202: %
203: %
204: Obviously, total secrecy and anonymity are different.
205: %
206: %
207: It is possible
208: for $i$ to 
209: %
210: %
211: have
212: complete secrecy while still not having 
213: %
214: %
215: very strong guarantees of
216: anonymity, for example,
217: and it is possible to have anonymity without preserving secrecy.  
218: %
219: %
220: However, thinking carefully about
221: the relationship between
222: secrecy and anonymity suggests new and interesting
223: ways of thinking about anonymity.
224: More generally, formalizing anonymity and information hiding in terms
225: of knowledge is useful for capturing the intuitions that
226: practitioners have.
227: 
228: We are not the first to use knowledge and belief to
229: formalize notions of information hiding.  
230: Glasgow, MacEwen, and Panangaden \citeyear{GMP92} describe 
231: a logic for reasoning about security that includes both 
232: {\em epistemic\/}
233: operators (for reasoning about knowledge) and {\em deontic\/} operators
234: (for reasoning about permission and obligation).
235: They characterize some security policies in terms of 
236: the facts that an agent is
237: permitted to know.  Intuitively, everything that an agent is not
238: permitted to know must remain hidden.
239: Our approach is similar,
240: except that we specify the formulas that an
241: agent is {\em not} allowed to know, rather than the formulas she is permitted
242: to know. 
243: One advantage of accentuating the negative is that we do not need to use
244: deontic operators in our logic.
245: 
246: Epistemic logics have also been used to define information-hiding
247: properties, including noninterference and anonymity.
248: Gray and Syverson \citeyear{gray98} 
249: use an epistemic logic to define probabilistic noninterference, and
250: Syverson and Stubblebine \citeyear{syverson99} use one
251: to formalize definitions of anonymity. 
252: The thrust of our paper is quite different from these.
253: Gray and Syverson focus on one particular definition of information
254: hiding in a probabilistic setting, while Syverson 
255: and Stubblebine
256: %
257: %
258: focus on describing an axiom system that is useful for reasoning
259: about real-world systems, and
260: on how to reason about and compose parts of the system into
261: %
262: %
263: adversaries and honest agents.
264: Our focus, on the other hand, is on giving a
265: semantic characterization of anonymity in a framework that lends itself
266: well to modeling systems.
267: 
268: Shmatikov and Hughes \citeyear{hughes02} position their approach to
269: anonymity (which is discussed in more detail in Section~\ref{sec:functionviews}) as an
270: attempt to provide an interface between logic-based approaches,
271: which they claim are good for specifying the desired properties (like
272: anonymity), and formalisms like CSP, which they claim are good for
273: specifying systems.  We agree with their claim that logic-based approaches
274: are good for specifying properties of systems, but also claim
275: that, with an appropriate semantics for the logic, there is no need to
276: provide such
277: an interface.  While there are many ways of specifying systems,
278: many end up identifying a system with a set of runs or traces, and 
279: can thus be embedded in the runs and systems framework
280: that we use.
281: 
282: Definitions of anonymity using epistemic logic are {\em possibilistic}.
283: Certainly, if $j$ believes that any of 1000 users (including $i$)
284: could have performed the action that $i$ in fact performed, 
285: then $i$ has some degree of anonymity with respect to $j$.  However, if $j$
286: believes that the probability that $i$ performed the action is 
287: %
288: %
289: .99, the possibilistic assurance of anonymity provides little comfort.
290: %
291: %
292: %
293: %
294: %
295: %
296: %
297: %
298: %
299: %
300: %
301: %
302: %
303: %
304: Most previous formalizations of anonymity have not 
305: dealt with probability; they
306: typically conclude with an acknowledgment that it is
307: important to do so, and suggest that their formalism can
308: indeed handle probability.
309: %
310: %
311: %
312: %
313: %
314: %
315: %
316: %
317: %
318: %
319: %
320: %
321: One significant
322: advantage of our formalism is that it
323: is completely straightforward to add probability in a natural way,
324: using known techniques \cite{HT}.
325: %
326: %
327: %
328: %
329: %
330: %
331: %
332: %
333: %
334: %
335: %
336: As we show in Section \ref{sec:probanon},
337: %
338: %
339: this lets us formalize the (somewhat less formal)
340: definitions of probabilistic anonymity given
341: %
342: %
343: %
344: by Reiter and Rubin \citeyear{reiter98}.
345: %
346: %
347: %
348: %
349: %
350: %
351: %
352: %
353: 
354: %
355: In this paper, we are more concerned with defining and specifying
356: anonymity 
357: %
358: %
359: %
360: %
361: %
362: %
363: %
364: properties than with describing systems 
365: for achieving anonymity or with verifying anonymity
366: properties. We want to define what anonymity means by using syntactic 
367: statements that have a well-defined semantics. Our 
368: %
369: work
370: is similar in spirit to
371: previous papers that have given definitions of anonymity and other
372: similar properties, such as the proposal for terminology given by
373: %
374: %
375: Pfitzmann and K\"ohntopp~\citeyear{terminology} and the
376: information-theoretic definitions of anonymity given by
377: Diaz, Seys, Claessens, and Preneel~\citeyear{Diaz02}.
378: 
379: The rest of this paper is organized as follows.  In
380: Section~\ref{sec:review} we briefly review the runs and systems
381: formalism of \cite{FHMV} and describe
382: how it can be used to represent knowledge.
383: In Section~\ref{sec:anonymity}, we show how anonymity can be defined
384: using knowledge, and relate this definition to other notions of 
385: information hiding, particularly 
386: secrecy (as defined in our earlier work).
387: In Section~\ref{sec:probanon}, we extend the
388: possibilistic definition of Section~\ref{sec:anonymity} so that it can
389: capture probabilistic concerns.  
390: As others have observed \cite{hughes02,reiter98,syverson99}, there
391: are a number of 
392: ways to define anonymity. Some definitions provide very strong
393: guarantees of anonymity, while others are easier to verify in practice.
394: Rather than giving an
395: exhaustive list of definitions, we focus on a few representative
396: notions, and 
397: show by example that our logic is expressive enough to capture many
398: other notions of interest.
399: In Section~\ref{sec:related}, we compare our
400: framework to that of three other attempts to formalize anonymity, by
401: Schneider and Sidiropoulos \citeyear{schneider96}, Hughes and Shmatikov
402: \citeyear{hughes02}, and Stubblebine and Syverson \citeyear{syverson99}.
403: We conclude in Section~\ref{sec:conclusion}.
404: 
405: 
406: \section{Multiagent Systems: A Review}\label{sec:review}
407: 
408: In this section, we briefly review the multiagent systems framework; we
409: urge the reader  
410: to consult 
411: %
412: %
413: %
414: %
415: %
416: %
417: %
418: \cite{FHMV} for more details.
419: 
420: A {\em multiagent system\/} consists of $n$ agents, each of which is in some
421: {\em local state\/} at a given point in time. We assume that an agent's
422: local state encapsulates all the information to which the agent has
423: access. In the security setting, the local state of an agent might
424: include initial information regarding keys, the messages she has
425: sent and received, and perhaps the reading of a clock. The 
426: framework makes no assumptions about the precise nature of the local
427: state.
428: 
429: We can view the whole system as being in some {\em global state},
430: a tuple consisting of the local state of each agent and the
431: state of the environment. 
432: Thus, a global state has the form $(s_e, s_1,\ldots, s_n)$,
433: where $s_e$ is the state of the environment and $s_i$ is agent $i$'s
434: state,
435: for $i = 1, \ldots , n$. 
436: 
437: A {\em run\/} is a function from time to global states. Intuitively,
438: a run is a complete description of what happens over time in one
439: possible execution of the system. A {\em point\/} is a pair $(r, m)$
440: consisting of a run $r$ and a time $m$. 
441: We make the standard assumption that time ranges over the natural numbers.
442: At a point $(r, m)$, the system is in some global state $r(m)$. If $r(m)
443: = (s_e, s_1,
444: \ldots , s_n)$, then we take $r_i(m)$ to be $s_i$, agent $i$'s local
445: state at the point $(r, m)$.  
446: %
447: %
448: %
449: %
450: %
451: Note that an agent's local state at  point $(r,m)$ does {\em not}
452: necessarily encode all the agent's previous local states. 
453: In some systems, agents have perfect recall,
454: in the sense that their local state $r_i(m)$ encodes their states
455: at times $0, \ldots, m-1$, but this need not be generally true.
456: %
457: (See \cite[Chapter 4]{FHMV} for a formal definition and discussion of
458: perfect recall.)
459: Formally, a {\em system\/} consists of a set of runs (or executions).   
460: Let $\P(\R)$ denote the points in a system $\R$.  
461: 
462: 
463: The runs and systems framework is compatible with 
464: many other standard approaches for representing and reasoning about systems.
465: For example, the runs might be event traces generated by a CSP process 
466: (see Section~\ref{sec:CSP}), they
467: might be message-passing sequences generated by a security protocol,
468: or they might
469: %
470: %
471: %
472: %
473: %
474: be generated from the strands in a strand space \cite{HalPuc01,FHG99}. 
475: The approach is rich enough to accommodate a variety of system representations.
476: 
477: Another important advantage of the framework is that it it is
478: straightforward to define formally what
479: an agent knows at a point in a system.
480: Given a system $\R$, let $\K_i(r,m)$ be the set of points in $\P(\R)$ 
481: that $i$ thinks are possible at $(r,m)$, i.e.,
482: \[ \K_i(r,m) = \{ (r',m') \in \P(\R) : r'_i(m') = r_i(m) \}. \]
483: Agent $i$ knows a fact
484: $\phi$ at a point $(r,m)$ if $\phi$ is true at all points in $\K_i(r,m)$.
485: To make this intuition precise, we need to be able to assign truth
486: values to basic formulas in a system. We assume that
487: we have a set $\Phi$ of primitive propositions, which we can think of as
488: describing basic facts about the system.  
489: In the context of security protocols,
490: these might be such facts as ``the key is $n$'' or ``agent $A$ sent
491: the message $m$ to $B$''. An interpreted system $\I$ consists of a pair
492: $(\R,\pi)$, where $\R$ is a system and $\pi$ is an 
493: {\em interpretation}, which assigns to each primitive
494: %
495: %
496: proposition
497: in $\Phi$ a truth value at each point.
498: Thus, for every $p\in\Phi$ and 
499: point $(r,m)$ in $\R$, we have $(\pi(r,m))(p)\in \{\bf
500: true, \bf false\}$. 
501: 
502: We can now define what it means for a formula $\phi$ to be true at a point
503: $(r, m)$ in an interpreted system $\I$, written $(\I, r, m)\sat\phi$, by
504: induction on the structure of formulas:
505: \begin{itemize}
506: \item $(\I,r,m) \sat p$ iff $(\pi(r,m))(p) = {\bf true}$
507: \item $(\I,r,m) \sat \neg\phi$ iff $(\I,r,m) \not\sat \phi$
508: \item $(\I,r,m) \sat \phi \wedge \psi$ iff $(\I,r,m) \sat \phi$ and $(\I,r,m) \sat \psi$ 
509: \item $(\I, r, m) \sat K_i\phi ~\mbox{iff}~ (\I , r', m') \sat\phi$ for
510: all $(r',m') \in \K_i(r,m)$
511: \end{itemize}
512: As usual, we write $\I \sat \phi$ if $(\I,r,m) \sat \phi$ for all points
513: $(r,m)$ in $\I$.
514: 
515: 
516: \section{Defining Anonymity Using Knowledge}\label{sec:anonymity}
517: 
518: \subsection{Information-Hiding Definitions}\label{sec:infohide}
519: 
520: Anonymity is one example of an
521: information-hiding requirement. Other information-hiding requirements
522: include noninterference, privacy, confidentiality, secure message-sending,
523: and so on. These requirements are similar, and sometimes they overlap.
524: Noninterference, for example, requires a great deal to be hidden, and
525: typically implies privacy, anonymity, etc., for the classified user
526: whose state is protected by the noninterference requirement.
527: 
528: %
529: %
530: %
531: %
532: %
533: %
534: In an earlier paper \cite{HalOn02}, we looked at 
535: %
536: %
537: requirements of {\em total secrecy}
538: in multiagent systems.
539: %
540: %
541: Total secrecy
542: basically requires that in a 
543: system with ``classified'' and ``unclassified'' users, the unclassified 
544: users should never be able to infer the actions or the local states of
545: the unclassified users. For secrecy, the ``what needs to be hidden'' 
546: component of information-hiding is extremely restrictive: 
547: %
548: total
549: secrecy requires
550: that absolutely everything that a classified user does must be hidden.
551: The ``how well does it need to be hidden'' component depends on the
552: situation. Our definition of secrecy says
553: that for any {\em nontrivial\/}
554: fact $\phi$ (that is, one that is not already valid)
555: that depends only the state of the classified or high-level
556: %
557: %
558: agent, the formula $\neg K_j \phi$ must be valid.
559: (See 
560: %
561: %
562: %
563: our earlier paper
564: for more discussion of this definition.)
565: Semantically, this means that whatever the high-level user does, there
566: exists some run where the low-level user's view  of the system is the 
567: same, but the high-level user did something different.
568: %
569: %
570: Our nonprobabilistic definitions are fairly
571: strong (simply because secrecy requires that so much be hidden).
572: The probabilistic definitions 
573: %
574: we gave
575: require even more: not only 
576: can the agent not learn any new classified
577: fact, but he also cannot learn anything about the probability of any such
578: fact. (In other words, if an agent initially assigns a classified 
579: fact $\phi$ a probability $\alpha$
580: of being true, he always assigns $\phi$ that probability.)
581: It would be perfectly natural,
582: and possibly quite interesting, to
583: consider definitions of secrecy that do not require so much to be hidden  
584: (e.g., by allowing some classified information to be 
585: declassified~\cite{zdancewic01}),
586: or to discuss definitions that do not require such strong secrecy (e.g., by
587: giving definitions that were stronger than the nonprobabilistic definitions
588: we gave, but not quite so strong as the probabilistic definitions). 
589: 
590: 
591: \subsection{Defining Anonymity}\label{sec:dfns}
592: 
593: The basic intuition behind anonymity is that {\em actions} should be 
594: divorced from the {\em agents} who perform them, for some set of {\em observers}.
595: With respect to the basic information-hiding framework outlined above, the
596: information that needs to be hidden is the identity of the agent (or set of
597: agents) who perform a particular action. Who the information needs to be
598: hidden from, i.e., which observers, depends on the situation. The third
599: component of information-hiding requirements---how well information needs to
600: be hidden---will often be the most interesting component
601: of the definitions of anonymity that we present here.
602: 
603: %
604: %
605: %
606: %
607: Throughout the paper, we use the formula $\theta(i,a)$ to represent
608: ``agent $i$ has 
609: performed action $a$, or will perform $a$ in the future''.%
610: \footnote{If we want to consider systems that may crash
611: we may want to consider $\theta'(i,a)$ instead, where
612: $\theta'(i,a)$ represents ``agent $i$ has performed action $a$, or will
613: perform $a$ in the future if the system does not crash''.  Since issues
614: of failure are orthogonal to the anonymity issues that we focus on here,
615: we consider only the simpler definition in this paper.}
616: %
617: %
618: %
619: %
620: %
621: %
622: %
623: %
624: %
625: %
626: For future reference, let $\delta(i,a)$ 
627: represent ``agent $i$ has performed action $a$''. 
628: Note that $\theta(i,a)$ is a fact about the run: if it is true at some
629: point in a run, it is true at all points in a run (since it is true even
630: if $i$ performs $a$ at some point in the future).  On the other hand,
631: %
632: %
633: $\delta(i,a)$ may be false at the start of a run, and then become true
634: at the point where $i$ performs $a$.
635: %
636: %
637: %
638: %
639: %
640: %
641: %
642: %
643: %
644: %
645: %
646: %
647: %
648: %
649: %
650: %
651: %
652: %
653: %
654: %
655: %
656: %
657: %
658: %
659: %
660: %
661: %
662: %
663: %
664: 
665: %
666: 
667: It is not our goal in this 
668: %
669: %
670: paper
671: to provide a ``correct'' definition of 
672: anonymity. We also want to avoid giving an encyclopedia of definitions. 
673: Rather, we give some basic definitions of anonymity 
674: to show how our framework can be used. We base our choice of definitions
675: in part on definitions presented in earlier papers,
676: to make clear how our work relates to previous work, and in part on which
677: definitions of anonymity we expect to be useful in practice.
678: We first give
679: an extremely weak definition, but one that
680: nonetheless illustrates the basic intuition behind any definition of
681: anonymity. 
682: 
683: 
684: \dfn
685: \label{dfn:minimalanon1}
686: Action $a$, performed by agent $i$, is
687: {\em minimally anonymous} with respect to agent $j$ in the interpreted
688: system $\I$, if
689: $\I \models \neg K_j [\theta(i,a)]$.
690: \edfn
691: 
692: %
693: %
694: %
695: %
696: %
697: 
698: This definition makes it
699: clear what is being hidden ($\theta(i,a)$---the fact that
700: %
701: %
702: $i$ performs $a$) and from whom
703: ($j$).  It also describes how well the information is hidden:~it
704: requires that $j$ not be sure that $i$ actually 
705: %
706: %
707: performed, or will perform, the action.
708: Note that this is a weak information-hiding requirement.
709: It might be the case, for example, that agent $j$ is certain 
710: that the action was performed either by $i$, or by at most one or two
711: other agents, thereby making $i$ a ``prime suspect''. It might also
712: be the case that $j$ is able to place a very high probability on 
713: %
714: %
715: $i$ performing
716: the action, even though he isn't absolutely certain of
717: it. (Agent $j$ might know that there is some slight probability that
718: some other agent $i'$ performed the action, for example.)
719: Nonetheless,
720: it should be the case that for any other definition of anonymity we
721: give, if we want 
722: to ensure that $i$'s performing action $a$ is to be kept anonymous
723: as far as observer $j$ is concerned, then 
724: $i$'s action should be at least minimally anonymous with respect to $j$.
725: 
726: %
727: Our definition of $a$ being minimally anonymous with respect to $j$ is
728: equivalent to the apparently weaker requirement $\I \models
729: \theta(i,a) \rimp \neg K_j 
730: [\theta(i,a)]$, which says that if action $a$ is performed by $i$, then
731: $j$ does not not know it.  
732: Clearly if $j$ never knows that $a$ is performed by $i$, then $j$ will
733: never know that $a$ is performed by $i$ if $i$ actually does perform
734: $a$.  To see that the converse holds, it suffices to note that if $i$
735: does not perform $a$, then surely $\neg K_j [\theta(i,a)]$ holds.  
736: %
737: %
738: %
739: %
740: Thus, this definition, like several that will follow,
741: can be viewed as having the form ``if $i$
742: performed $a$, then $j$ does not know some appropriate 
743: %
744: %
745: %
746: %
747: %
748: %
749: %
750: fact''.
751: %
752: 
753: The definition 
754: %
755: of minimal anonymity
756: also makes it clear how anonymity
757: relates to secrecy, 
758: %
759: %
760: %
761: %
762: %
763: %
764: %
765: as defined in our earlier work \cite{HalOn02}.
766: To explain how, we first need to describe how we defined secrecy in
767: terms of knowledge.
768: Given a system $\I$, say that $\phi$ is {\em nontrivial in $\I$\/} if 
769: %
770: %
771: $\I \not\sat \phi$, and that 
772: {\em $\phi$ depends only on the local state of agent
773: $i$ in $\I$\/} if $\I \sat \phi \rimp K_i \phi$. 
774: Intuitively, $\phi$ 
775: is nontrivial in $\I$ if $\phi$ could be
776: false in $\I$, and $\phi$ depends only on $i$'s local state if $i$
777: always knows whether or not $\phi$ is true.  
778: (It is easy to see that 
779: $\phi$ depends only on the local state of $i$ if $(\I,r,m) \sat \phi$
780: and $r_i(m) = r'_i(m')$ implies that $(\I,r',m') \sat \phi$.)
781: According to the definition in \cite{HalOn02}, 
782: agent $i$ {\em maintains total secrecy with respect to another agent $j$
783: in system $\I$\/} if
784: for every 
785: nontrivial fact $\phi$ that depends 
786: %
787: only
788: on the local state of $i$, the formula
789: $\neg K_j \phi$ is valid for the system. 
790: That is, $i$ maintains total secrecy with respect to $j$ if 
791: $j$ does not learn anything new about agent $i$'s state.
792: %
793: %
794: %
795: %
796: %
797: %
798: %
799: %
800: %
801: In general, $\theta(i,a)$ does not depend only on $i$'s local state,
802: because whether $i$ performs $a$ may depend on whether or not $i$ gets a
803: certain message from 
804: %
805: %
806: some other agent $i'$.
807: On the other hand, if whether or not $i$
808: performs $a$ depends only on $i$'s protocol, and the protocol is encoded
809: in $i$'s local state, then $\theta(i,a)$ depends only on $i$'s local
810: state.  
811: %
812: %
813: %
814: If $\theta(i,a)$ does depend only on $i$'s local state and 
815: $j$ did not know all along that $i$ was going to
816: perform action $a$ (i.e., if we assume 
817: that $\theta(i,a)$ is nontrivial), then
818: Definition~\ref{dfn:minimalanon1} is clearly a special case of the
819: definition of secrecy.  
820: %
821: In any case, it is in much the same spirit as the definition of
822: secrecy. 
823: Essentially, anonymity says that
824: %
825: %
826: the fact that agent $i$ has or will perform action $a$ must be hidden
827: from $j$, 
828: while 
829: total secrecy says that all facts that depend on agent $i$ must
830: be 
831: hidden from $j$. 
832: 
833: %
834: Note that this definition of minimal anonymity is different from the one
835: we gave in the conference version of this paper \cite{HalOn03}.
836: There, the definition given used $\delta(i,a)$ rather than
837: $\theta(i,a)$.  We say that $a$ performed by agent $i$ is minimally
838: $\delta$-anonymous if Definition~\ref{dfn:minimalanon1} holds, with
839: $\theta(i,a)$ replaced by $\delta(i,a)$.  
840: It is easy to see that minimal anonymity implies minimal
841: $\delta$-anonymity (since $\delta(i,a)$ implies $\theta(i,a)$), but the
842: converse is not true in general.   For example, suppose that $j$
843: gets a signal if $i$ is going to perform action $a$ (before $i$ actually
844: performs the action), but then never finds out exactly when $i$ performs
845: $a$.  Then minimal anonymity does not hold.  In runs where $i$ performs
846: $a$, agent $j$ knows that $i$ will perform $a$ when he gets the signal.
847: On the other hand, minimal $\delta$-anonymity does hold, because $j$
848: never knows when $i$ performs $a$.  
849: %
850: %
851: %
852: In this situation, minimal anonymity seems to capture our intuitions of
853: what anonymity should mean better than minimal $\delta$-anonymity does.
854: 
855: 
856: The next definition of anonymity we give is much stronger. 
857: It requires that if some agent $i$ performs
858: an action anonymously with respect to another agent $j$, then $j$ must
859: think it possible that the action could have been performed by {\em any}
860: of the agents (except for $j$). 
861: Let $P_j \phi$ be an abbreviation for 
862: $\neg K_j \neg \phi$. The operator $P_j$ is the dual of $K_j$; 
863: intuitively, $P_j \phi$ means ``agent $j$ thinks that $\phi$ is possible''.
864: 
865: \dfn
866: \label{dfn:totalanon1}
867: Action $a$, performed by agent $i$, is {\em totally anonymous} 
868: with respect to $j$ in the interpreted system $\I$ if 
869: \[ 
870: \I \models \theta(i,a) \Rightarrow \bigwedge_{i' \ne j} P_j
871:  [\theta(i',a)].
872: \]
873: \edfn
874: 
875: Definition~\ref{dfn:totalanon1} captures the notion that an action is
876: anonymous if, 
877: as far as the observer in question is concerned, it could have been
878: performed by  
879: anybody in the system. 
880: 
881: %
882: Again, in the conference version of the paper, we defined total
883: anonymity using $\delta(i,a)$ rather than $\theta(i,a)$.  (The same
884: remark holds for all the other definitions of anonymity that we give,
885: although we do not always say 
886: %
887: %
888: so
889: explicitly.)  
890: Let 
891: %
892: %
893: %
894: total $\delta$-anonymity
895: be the 
896: %
897: %
898: anonymity requirement
899: obtained when $\theta(i,a)$ is replaced by
900: $\delta(i,a)$.  It is not hard to show that if agents have perfect
901: recall (which intuitively means that their local state keeps track of
902: all the actions they have performed---see \cite{FHMV} for the formal
903: definition), then total $\delta$-anonymity implies total anonymity. 
904: %
905: %
906: %
907: %
908: This is not true, in general, without perfect
909: recall, because it might be possible for some agent to know that $i$
910: %
911: %
912: %
913: %
914: will perform action $a$---and therefore that no other agent 
915: will---but forget this fact by the time that $i$ actually performs
916: $a$. 
917: %
918: %
919: %
920: Similarly, total anonymity does not imply total $\delta$-anonymity.
921: %
922: %
923: To see why,
924: suppose that the agents are numbered $1, \ldots, n$, and 
925: %
926: that
927: an outside observer knows 
928: %
929: %
930: that if
931: $j$ performs 
932: action $a$, 
933: %
934: then
935: $j$ will perform it at time $j$.  Then total anonymity may
936: %
937: %
938: hold even though total $\delta$-anonymity does not.
939: For example, at time 3,
940: although the 
941: %
942: %
943: observer 
944: %
945: %
946: may consider
947: it possible that agent 4 will perform the
948: action (at time 4), he 
949: %
950: %
951: cannot
952: consider it possible that 4 has
953: %
954: already
955: performed the action,
956: %
957: as required by total $\delta$-anonymity.
958: 
959: 
960: %
961: %
962: %
963: %
964: %
965: %
966: %
967: %
968: 
969: %
970: %
971: %
972: %
973: %
974: %
975: 
976: %
977: %
978: %
979: 
980: 
981: 
982: 
983: %
984: %
985: %
986: %
987: Chaum \citeyear{Chaum88} showed that total anonymity could be 
988: obtained using DC-nets.
989: %
990: %
991: %
992: %
993: %
994: %
995: %
996: %
997: %
998: Recall that in a DC-net, a group of $n$ users use Chaum's dining
999: cryptographer's 
1000: protocol (described in the same paper) to achieve anonymous communication.
1001: %
1002: %
1003: %
1004: If we model a  
1005: DC-net as an interpreted multiagent system $\I$ whose agents consist
1006: exclusively of agents 
1007: %
1008: %
1009: %
1010: participating in a single DC-net, then if an agent $i$ sends a message
1011: using the DC-net protocol, that action is totally anonymous. 
1012: %
1013: %
1014: %
1015: %
1016: %
1017: %
1018: (Chaum proves this,
1019: under the assumption that any message could be generated by any user
1020: in the system.)
1021: %
1022: %
1023: %
1024: %
1025: Note that in the dining cryptographer's example, total anonymity and
1026: $\delta$-total anonymity agree, because who paid is decided 
1027: before the protocol starts. 
1028: 
1029: %
1030: %
1031: %
1032: %
1033: It is easy to show that if an
1034: action is totally anonymous, then it must be minimally anonymous as well,
1035: as long as two simple requirements are satisfied. First, there must be
1036: at least 3 agents in the system. 
1037: (A college student with only one roommate can't leave out her dirty
1038: dishes anonymously,  
1039: but a student with at least two roommates might be able to.)
1040: Second, it must be the case that $a$ 
1041: can be performed only once in a given
1042: run of the system. Otherwise, 
1043: it might be possible for $j$ to think that
1044: any agent $i' \not= i$ could have performed $a$, 
1045: but for $j$ to {\em know}
1046: that agent $i$ did, indeed, perform $a$.
1047: For example, consider a system with three agents besides $j$.  Agent $j$
1048: might know 
1049: that all three of the other agents performed action $a$.  In that case, in
1050: particular, $j$ knows that $i$ performed $a$, so action $a$ performed
1051: by $i$ is not minimally anonymous with respect to $j$, but is totally anonymous.
1052: We anticipate that this assumption will typically be met in practice.  
1053: It is certainly consistent 
1054: with examples of anonymity given in the 
1055: literature. (See, for example,~\cite{Chaum88,schneider96}).
1056: In any case, if it is not met, it is possible to tag 
1057: occurrences of an action (so that we can talk about the $k$th time $a$
1058: is performed). Thus,
1059: we can talk about the $i$th occurrence of an action being
1060: anonymous. Because the $i$th occurrence of an action can only happen
1061: once in any given run, our requirement is satisfied.
1062: 
1063: 
1064: \pro\label{pro:minimally}
1065: Suppose that there are at least three agents in the interpreted system $\I$
1066: and that 
1067: $$\I \models \bigwedge_{i \not= j} \neg [ \theta(i,a) \wedge \theta(j,a) ].$$
1068: If action $a$, performed by agent $i$, is totally anonymous with respect to $j$,
1069: then it is minimally anonymous as well.
1070: \epro
1071: 
1072: %
1073: \prf
1074: Suppose that action $a$ is totally anonymous. Because there are three agents in
1075: the system, there is some agent $i'$ other than $i$ and $j$, and by total 
1076: anonymity, $\I \models \theta(i,a) \Rightarrow P_j[\theta(i',a)] $.
1077: If $(\I,r,m) \models \neg \theta(i,a)$, clearly $(\I,r,m) \models \neg K_j[\theta(i,a)]$.
1078: Otherwise, $(\I,r,m) \models P_j[\theta(i',a)]$ by total anonymity. Thus, there
1079: exists a point $(r',m')$ such that $r'_j(m') = r_j(m)$ and $(\I,r',m') \models \theta(i',a)$.
1080: By our assumption, $(\I,r',m') \models \neg \theta(i,a)$, because $i \not= i'$.
1081: Therefore, $(\I,r,m) \models \neg \K_j[\theta(i,a)]$. It follows that $a$
1082: is minimally anonymous with respect to $j$.
1083: \eprf
1084: 
1085: Definitions~\ref{dfn:minimalanon1} and~\ref{dfn:totalanon1} are
1086: conceptually similar, even 
1087: though the latter definition is much stronger. 
1088: Once again, there is a set of formulas that an
1089: observer is not allowed to know. With the earlier definition, there is only one
1090: formula in this set: $\theta(i,a)$. As long as $j$ doesn't know that $i$ performed
1091: action $a$, this requirement is satisfied. With total anonymity, there
1092: are more formulas that $j$
1093: is not allowed to know: they
1094: take the form $\neg \theta(i',a)$. Before,
1095: we could guarantee only
1096: that $j$ did not know that $i$ did the action; here, for many agents $i'$,
1097: we guarantee that $j$ does not know that $i'$ did {\em not} do the action. The
1098: definition is made slightly more complicated by the implication, which restricts
1099: the conditions under which $j$ is not allowed to know $\neg
1100: \theta(i',a)$. (If $i$ 
1101: didn't actually perform the action, we don't care what $j$ thinks,
1102: since we are concerned only with anonymity with respect to $i$.) 
1103: But the basic idea is the same.
1104: 
1105: Note that total anonymity does {\em not\/} necessarily follow from total
1106: %
1107: %
1108: %
1109: %
1110: secrecy, because  
1111: the formula $\neg \theta(i',a)$, for $i' \ne i$, does not, in general,
1112: %
1113: %
1114: depend only on the local 
1115: state of $i$. It is therefore perfectly consistent
1116: with the definition of total secrecy 
1117: for $j$ to learn this fact,
1118: %
1119: in violation of total anonymity.
1120: (Secrecy, of course, does not follow from anonymity, because secrecy
1121: requires that 
1122: many more facts be hidden than simply whether $i$ performed a given action.)
1123: 
1124: Total anonymity is a very strong requirement.
1125: Often, an action will not be totally anonymous, but
1126: only anonymous up to some set of agents who could have performed the action.
1127: This situation merits a weaker definition of anonymity.
1128: To be more precise, let $I$ be the set of all agents of the system and 
1129: suppose that we have some 
1130: %
1131: %
1132: %
1133: set $I_A \subseteq I$---an ``anonymity set'', using the terminology
1134: %
1135: %
1136: %
1137: %
1138: %
1139: %
1140: %
1141: of Chaum~\citeyear{Chaum88} and 
1142: Pfitzmann and K\"ohntopp~\citeyear{terminology}---%
1143: of agents who can perform some action. We can define anonymity in
1144: terms of this set.
1145: 
1146: \dfn
1147: \label{dfn:anonset1}
1148: Action $a$, performed by agent $i$, is 
1149: {\em anonymous up to $I_A \subseteq I$} with respect to $j$ if 
1150: \[ \I \models \theta(i,a) \Rightarrow \bigwedge_{i' \in I_A} P_j [\theta(i',a)]. \]
1151: \edfn
1152: %
1153: %
1154: %
1155: %
1156: %
1157: %
1158: %
1159: 
1160: %
1161: %
1162: In the anonymous message-passing system Herbivore~\cite{goel02}, users are
1163: organized into {\em cliques} $C_1, \ldots, C_n$, each of which uses the dining cryptographers
1164: protocol \cite{Chaum88} for anonymous message-transmission. If a user wants to send
1165: an anonymous message, she can do so through her clique. 
1166: %
1167: %
1168: Herbivore claims
1169: that any user $i$ is
1170: able to send a message anonymously up to $C_j$, where $i \in C_j$. As the size of
1171: a user's clique varies, so does the strength of the anonymity guarantees provided
1172: by the system.
1173: %
1174: %
1175: %
1176: %
1177: 
1178: In some situations, 
1179: it is not necessary that there be a fixed 
1180: %
1181: anonymity set,
1182: as in Definition~\ref{dfn:anonset1}.  It suffices that, at all times,
1183: there {\em exists} some 
1184: %
1185: %
1186: anonymity set
1187: with at least, say, $k$ agents. This leads to a definition of $k$-anonymity.
1188: 
1189: \dfn
1190: \label{dfn:anonset2}
1191: Action $a$, performed by agent $i$, is 
1192: {\em $k$-anonymous} with respect to $j$ if
1193: \[ \I \models \theta(i,a) \Rightarrow \bigvee_{\{I_A: |I_A| = k\}}
1194: \bigwedge_{i' \in I_A} P_j 
1195: [\theta(i',a)]. \] 
1196: \edfn
1197: 
1198: This definition says that at any point $j$ must think it possible that
1199: %
1200: %
1201: any of at least $k$ agents might perform, or have performed, the action.
1202: %
1203: %
1204: %
1205: Note that the set of $k$ agents might be different in different runs,
1206: making this condition strictly weaker than anonymity up to a particular
1207: set of size $k$.
1208: 
1209: %
1210: %
1211: A number of systems have been proposed that provide $k$-anonymity for
1212: some $k$.
1213: In the anonymous communications network protocol recently proposed by
1214: von Ahn, Bortz, and Hopper \cite{hopper03}, users can 
1215: %
1216: %
1217: %
1218: send 
1219: %
1220: messages with guarantees of $k$-anonymity. In the system $P^5$
1221: (for 
1222: ``Peer-to-Peer Personal Privacy Protocol'') \cite{sherwood02}, 
1223: users join a logical broadcast
1224: tree that provides anonymous communication, and users can choose what
1225: level 
1226: of $k$-anonymity they want, given that $k$-anonymity for a higher value of $k$
1227: makes communication more inefficient.
1228: %
1229: %
1230: %
1231: %
1232: Herbivore \cite{goel02} provides anonymity using 
1233: %
1234: %
1235: cliques of DC-nets.  If
1236: %
1237: the system guarantees that
1238: the cliques all have a size of at least $k$, 
1239: so that regardless of clique
1240: composition, there are at least $k$ users capable of sending any
1241: anonymous message, then Herbivore guarantees $k$-anonymity.
1242: 
1243: 
1244: %
1245: %
1246: %
1247: %
1248: %
1249: \subsection{A More Detailed Example: Dining Cryptographers}\label{sec:diningexample}
1250: 
1251: A well-known example of anonymity in the computer security literature
1252: is Chaum's ``dining cryptographers problem''~\cite{Chaum88}. In the 
1253: original description of this problem, three cryptographers sit down to
1254: dinner and are informed by the host that someone has already paid the bill 
1255: anonymously. The cryptographers decide that the bill was paid either by
1256: one of the three people in their group, or by an outside agency such as
1257: the NSA. They want to find out which of these two situations is the actual
1258: one while preserving the anonymity of the cryptographer who (might have) paid.
1259: Chaum provides a protocol that the cryptographers can use to solve this 
1260: problem. To guarantee that it works, however, it would be nice to check
1261: that anonymity conditions hold. Assuming we have a system that includes
1262: a set of three cryptographer agents $C = \{0, 1, 2\}$, 
1263: as well as an outside 
1264: observer agent $o$, the protocol should guarantee that for each agent $i
1265: \in C$, 
1266: and each agent $j \in C - \{i\}$,
1267: the act of paying is anonymous up to $C - \{j\}$ with respect to $j$.
1268: For an outside observer $o$, 
1269: %
1270: i.e., an agent other than one of three cryptographers,
1271: the protocol should guarantee that for
1272: each agent $i \in C$, the protocol is anonymous up to $C$ with respect to
1273: $o$. This can be made precise using our definition of anonymity up to a set.
1274: 
1275: Because the requirements are symmetric for each of the three cryptographers,
1276: %
1277: %
1278: %
1279: we can 
1280: %
1281: %
1282: describe the anonymity specification compactly by naming
1283: the agents using 
1284: modular arithmetic. We use $\oplus$ to denote addition mod 3. 
1285: %
1286: %
1287: %
1288: %
1289: %
1290: %
1291: Let the interpreted system $(\I = (\R,\pi)$ represent the possible runs
1292: of one instance of the 
1293: dining cryptographers protocol, where the interpretation
1294: $\pi$ interprets formulas of the form 
1295: %
1296: %
1297: $\theta(i,\mbox{``paid''})$
1298: in the obvious way. 
1299: The following knowledge-based requirements comprise the anonymity portion of the protocol's
1300: specification, for each agent $i \in C$:
1301: $$
1302: \begin{array}{ll}
1303: %
1304: %
1305: \I \models \theta(i,\mbox{``paid''}) &\Rightarrow 
1306:   P_{i \oplus 1} \theta(i \oplus 2,\mbox{``paid''}) 
1307:    \wedge~ P_{i \oplus 2} \theta(i \oplus 1,\mbox{``paid''})\\ 
1308:   &\wedge~ P_o \theta(i \oplus 1,\mbox{``paid''}) 
1309:    \wedge~ P_o \theta(i \oplus 2,\mbox{``paid''}). 
1310: \end{array}
1311: $$
1312: %
1313: %
1314: %
1315: %
1316: This means that if a cryptographer paid, then each of the other
1317: cryptographers must think it possible that the third cryptographer
1318: could have paid. In addition, an outside observer must think it 
1319: %
1320: %
1321: %
1322: possible that either of the other two cryptographers could have paid.
1323: 
1324: 
1325: 
1326: 
1327: \section{Probabilistic Variants of Anonymity}\label{sec:probanon}
1328: 
1329: \subsection{Probabilistic Anonymity}
1330: 
1331: All of the definitions presented in Section~\ref{sec:anonymity} were nonprobabilistic.
1332: As we mentioned in the introduction,
1333: this is a serious problem for the ``how well is information hidden'' component of
1334: the definitions. For all the definitions we gave, it was necessary only that 
1335: observers think it {\em possible} that multiple agents could have performed the
1336: anonymous action. However, an event that is possible may nonetheless be 
1337: extremely unlikely. Consider our definition of total anonymity 
1338: (Definition~\ref{dfn:totalanon1}). It states that an action performed by $i$
1339: is totally anonymous if the observer $j$ thinks it could have been performed
1340: by any agent other than $j$. This may seem like a strong requirement, but if there
1341: are, say, $102$ agents, 
1342: and $j$ can determine that $i$ performed action $a$
1343: with probability $0.99$ and that each of the other agents performed action
1344: $a$ with probability $0.0001$, agent $i$ might not be very happy with the
1345: guarantees provided by total anonymity.
1346: Of course, the appropriate notion of anonymity will depend on the
1347: application:
1348: $i$ might be content to know that no agent can {\em prove} that she
1349: performed the anonymous action. In that case, it
1350: might suffice for the action to be only
1351: minimally anonymous. 
1352: However, in many other cases, an agent might want a more quantitative,
1353: probabilistic guarantee that it will be considered reasonably
1354: likely that other agents could have performed the action. 
1355: 
1356: Adding probability to the runs and systems framework is straightforward.
1357: The approach we use goes back to \cite{HT}, and was also used in our
1358: work on secrecy \cite{HalOn02}, so we just briefly review the relevant
1359: details here.
1360: Given a system $\R$, suppose we have a probability measure $\mu$ on the runs
1361: of $\R$. The pair $(\R, \mu)$ is a {\em probabilistic system}. For simplicity,
1362: we assume that every subset of $\R$ is measurable. We are interested
1363: in the probability that an agent assigns to an event at the point $(r,m)$.
1364: For example, we may want to know that at the point $(r,m)$, 
1365: observer $i$ places a probability of $0.6$
1366: on $j$'s having performed some particular action. We want to condition
1367: the probability $\mu$ on $\K_i(r,m)$, the information that $i$ has
1368: at the point $(r,m)$. The problem is that $\K_i(r,m)$ is a set of {\em
1369: points}, while $\mu$ is a probability on {\em runs}.   
1370: This problem is dealt with as follows.
1371: 
1372: Given a set $U$ of points, let $\R(U)$ consist of the runs in $\R$ going through
1373: a point in $U$. That is, 
1374: $$\R(U) = \{r \in \R: (r,m) \in U \mbox{ for some $m$}\}.$$
1375: The idea will be to condition $\mu$ on $\R(\K_i(r,m))$ rather than on 
1376: $\K_i(r,m)$.  To make sure that conditioning is well defined, 
1377: we assume that $\mu(\R(\K_i(r,m))) > 0$ for each agent $i$, run $r$, 
1378: and time $m$. That is, $\mu$ assigns positive probability to the set of
1379: runs in $\R$ compatible with what happens in run $r$ up to time $m$,
1380: as far as agent $i$ is concerned.  
1381: 
1382: With this assumption, we can define a measure $\mu_{r,m,i}$ on the
1383: points in $\K_i(r,m)$ as follows.  If $\S \subseteq \R$, define
1384: $\K_i(r,m)(\S)$ to be the set of points in $\K_i(r,m)$ that lie on runs
1385: in $\S$; that is,
1386: $$\K_i(r,m)(\S) = \{(r',m') \in \K_i(r,m): r' \in \S\}.$$
1387: %
1388: %
1389: %
1390: %
1391: %
1392: %
1393: %
1394: %
1395: %
1396: %
1397: %
1398: %
1399: %
1400: %
1401: %
1402: %
1403: Let $\F_{r,m,i}$, the measurable subsets of $\K_i(r,m)$ (that is, the
1404: sets to which $\mu_{r,m,i}$ assigns a probability), consist of all sets of the
1405: form $\K_i(r,m)(\S)$, where $\S \subseteq \R$.  Then define 
1406: $\mu_{r,m,i}(\K_i(r,m)(\S)) = \mu(\S \mid \R(\K_i(r,m))$.
1407: It is easy to check that $\mu_{r,m,i}$ is a probability measure,
1408: essentially defined by conditioning.
1409: 
1410: %
1411: %
1412: %
1413: %
1414: Define a {\em probabilistic interpreted system\/} $\I$ to be a tuple
1415: $(\R,\mu,\pi)$, where $(\R,\mu)$ is a probabilistic system.
1416: %
1417: In a probabilistic interpreted system, we can give semantics to
1418: syntactic statements of probability. 
1419: Following \cite{FHM}, we will be most interested in formulas of the form
1420: $\Pr_i (\phi ) \leq \alpha$ 
1421: (or similar formulas with $>$, $<$, or $=$
1422: instead of $\leq$). Intuitively, a 
1423: formula such as $\Pr_i (\phi) \leq \alpha$ is true 
1424: at a point $(r,m)$ if, according to $\mu_{r,m,i}$, the probability that
1425: $\phi$ is true is at 
1426: %
1427: %
1428: most
1429: $\alpha$.  More formally,
1430: $  (\I,r,m) \models {\Pr}_i(\phi) \leq \alpha $ if
1431: %
1432: %
1433: %
1434: \[  \mu_{r,m,i}(\{(r',m') \in \K_i(r,m) : (\I,r',m') \models \phi \})
1435: \leq \alpha. \] 
1436: Similarly, we can give semantics to $\Pr_i(\phi) < \alpha$ and
1437: $\Pr(\phi) = \alpha$, as well as conditional formulas such as
1438: $\Pr(\phi \, | \, \psi) \le \alpha$.
1439: Note that although these formulas talk
1440: about probability, they are either true or false at a given state.  
1441: 
1442: %
1443: %
1444: %
1445: %
1446: %
1447: %
1448: %
1449: %
1450: %
1451: %
1452: %
1453: %
1454: %
1455: %
1456: %
1457: %
1458: %
1459: %
1460: 
1461: It is straightforward to define probabilistic notions of anonymity in
1462: probabilistic systems.  
1463: %
1464: %
1465: %
1466: %
1467: %
1468: %
1469: %
1470: %
1471: %
1472: %
1473: We can think of Definition~\ref{dfn:minimalanon1}, for example, as saying
1474: %
1475: %
1476: that $j$'s probability that
1477: $i$ 
1478: %
1479: %
1480: performs
1481: the anonymous action $a$ must be less than 1
1482: (assuming that every nonempty set has positive probability).
1483: This can be generalized
1484: by specifying some $\alpha \leq 1$
1485: and requiring that the probability of $\theta(i,a)$ be less than $\alpha$.
1486: 
1487: \dfn
1488: \label{dfn:alphaanon1}
1489: Action $a$, performed by agent $i$, is {\em $\alpha$-anonymous}
1490: with respect to agent $j$ if 
1491: $\I \models \theta(i,a) \Rightarrow {\Pr}_j [\theta(i,a)] < \alpha$. 
1492: \edfn
1493: 
1494: %
1495: Note that if we replace $\theta(i,a)$ by $\delta(i,a)$ in
1496: Definition~\ref{dfn:alphaanon1}, the resulting notion might not be well
1497: defined.  The problem is 
1498: %
1499: %
1500: %
1501: that the set
1502: %
1503: $$\{(r',m') \in \K_i(r,m): (\I,r',m') \sat \delta(i,a) \}$$ 
1504: %
1505: may not be
1506: measurable; it may not have the form $\K_i(r,m)(\S)$ for some $\S
1507: \subseteq \R$.  The problem does not arise if $\I$ is a {\em
1508: synchronous\/} sytem (in which case $i$ knows that time, and all the
1509: points in $\K_i(r,m)$ are of the form $(r',m)$), but it does arise if
1510: $\I$ is asynchronous.  We avoid this technical problem by working with
1511: $\theta(i,a)$ rather than $\delta(i,a)$.
1512: 
1513: %
1514: 
1515: %
1516: %
1517: Definition~\ref{dfn:alphaanon1}, unlike
1518: Definition~\ref{dfn:minimalanon1},
1519: includes an implication involving $\theta(i,a)$. It is easy
1520: to check that Definition~\ref{dfn:minimalanon1} does not change
1521: when such an implication is added; intuitively, if $\theta(i,a)$
1522: is false then $\neg K_j[\theta(i,a)]$ is trivially true. 
1523: Definition~\ref{dfn:alphaanon1}, however, would change if we removed
1524: the implication,
1525: because it might be possible for $j$ to have a high probability
1526: of $\theta(i,a)$ even though it isn't true. We include the implication
1527: because without it, we place constraints on what $j$ thinks about 
1528: %
1529: %
1530: %
1531: $\theta(i,a)$ even if $i$ has not performed the action $a$ and
1532: will not
1533: perform it in the future. Such a requirement,
1534: %
1535: while interesting,
1536: seems more akin to ``unsuspectibility'' than to anonymity.
1537: 
1538: 
1539: %
1540: %
1541: %
1542: %
1543: %
1544: Two of the notions of probabilistic anonymity considered by 
1545: Reiter and Rubin \citeyear{reiter98} in the context of their Crowds
1546: system can be understood in terms of $\alpha$-anonymity.
1547: Reiter and Rubin 
1548: say that a sender has {\em probable innocence}
1549: if, from an observer's point of view, the sender ``appears no more likely to be the
1550: originator than to not be the originator''. This is simply 0.5-anonymity. 
1551: %
1552: (Under reasonable assumptions, Crowds provides 0.5-anonymity for Web requests.)
1553: Similarly, a sender has {\em possible innocence} if, from an observer's point of
1554: view, ``there is a nontrivial probability that the real sender is someone else''.
1555: This corresponds to minimal anonymity (as defined in Section~\ref{sec:dfns}), or
1556: to $\epsilon$-anonymity for some nontrivial value of $\epsilon$.
1557: %
1558: %
1559: %
1560: 
1561: It might seem at first that Definition~\ref{dfn:alphaanon1} should be the only
1562: definition of anonymity we need: as 
1563: long as $j$'s probability of 
1564: %
1565: %
1566: $i$ performing the action is low
1567: enough, $i$ should have nothing to worry about. 
1568: However, with further thought, it is not hard to see that this is not
1569: the case.  
1570: 
1571: Consider a scenario where there are 1002 agents, and where $\alpha = 0.11$.
1572: Suppose that the probability, according to Alice, that Bob 
1573: %
1574: %
1575: performs
1576: the
1577: action is $.1$, but that her probability that any of the other $1000$ agents 
1578: %
1579: %
1580: performs 
1581: the action is $0.0009$ (for each agent). Alice's probability that Bob 
1582: %
1583: %
1584: performs
1585: the
1586: action is small, but her probability that anyone else 
1587: %
1588: %
1589: performs
1590: %
1591: %
1592: it is
1593: more than three orders of magnitude smaller. 
1594: %
1595: %
1596: %
1597: Bob is obviously the prime suspect.
1598: 
1599: %
1600: %
1601: This concern was addressed by Serjantov and Danezis~\citeyear{Serj02}
1602: in their paper
1603: on information-theoretic definitions of anonymity. They consider
1604: %
1605: %
1606: the probability that each agent in an anonymity set is the sender
1607: of some anonymous message, and use entropy to quantify the amount of
1608: %
1609: %
1610: information that the system is leaking;
1611: %
1612: %
1613: %
1614: %
1615: Diaz et al.~\citeyear{Diaz02} and 
1616: Danezis~\citeyear{danezis:pet2003} use similar techniques.
1617: In this paper we are not
1618: concerned with quantitative measurements of anonymity, but we
1619: %
1620: %
1621: do agree that it is worthwhile to consider stronger
1622: notions of anonymity than the nonprobabilistic definitions, or
1623: even $\alpha$-anonymity, can provide.
1624: %
1625: We hope to examine quantitative definitions in future work.
1626: 
1627: %
1628: %
1629: %
1630: %
1631: %
1632: The next definition strengthens
1633: Definition~\ref{dfn:alphaanon1} in the way that
1634: Definition~\ref{dfn:totalanon1} strengthens Definition~\ref{dfn:minimalanon1}.
1635: It requires that no agent in the 
1636: %
1637: %
1638: anonymity
1639: set be a more
1640: likely suspect than any other.
1641: 
1642: 
1643: \dfn
1644: \label{dfn:probanonset1}
1645: Action $a$, performed by agent $i$, is {\em strongly 
1646: probabilistically anonymous up to 
1647: $I_A$\/}  with respect to 
1648: agent $j$ if for each $i' \in I_A$,
1649: %
1650: %
1651: \[ \I \models \theta(i,a) \Rightarrow {\Pr}_j [\theta(i,a)] = {\Pr}_j [\theta(i',a)]. \]
1652: \edfn
1653: 
1654: Depending on the size of $I_A$, this definition can be extremely
1655: strong. It does not 
1656: state simply that for all agents in $I_A$, the observer must think it is
1657: reasonably likely 
1658: that the agent could have performed the action; it also says that the
1659: observer's probabilities must be the same for each such agent. 
1660: Of course, we could weaken the definition somewhat by not requiring that
1661: all the probabilities be equal, but by instead requiring
1662: that they be approximately equal
1663: (i.e., that their difference be small or that their ratio be close to 1).  
1664: %
1665: %
1666: %
1667: %
1668: %
1669: %
1670: %
1671: Reiter and Rubin \citeyear{reiter98}, for example, say that the sender of a
1672: message is {\em beyond suspicion} if she ``appears no more likely to
1673: be the originator of that message than any other potential sender in the system''.
1674: In our terminology, $i$ is beyond suspicion with respect to $j$ if for
1675: each $i' \in I_A$,
1676: %
1677: %
1678: %
1679: %
1680: \[ \I \models \theta(i,a) \Rightarrow {\Pr}_j [\theta(i,a)] \leq {\Pr}_j [\theta(i',a)]. \]
1681: This is clearly weaker than strong probabilistic anonymity, but still a very
1682: strong requirement,
1683: %
1684: and perhaps more reasonable, too.
1685: %
1686: %
1687: %
1688: %
1689: Our main point is that a wide variety of properties can be expressed
1690: clearly and succinctly in our framework.
1691: %
1692: %
1693: %
1694: 
1695: \subsection{Conditional Anonymity}
1696: 
1697: While we have shown that many useful notions of anonymity---including many definitions
1698: that have already been proposed---can be expressed
1699: in our framework, we claim
1700: that there are some important intuitions that have not yet been captured.
1701: Suppose, for example, that someone
1702: makes a \$5,000,000 donation to Cornell University. It is
1703: clearly not the case 
1704: %
1705: that
1706: everyone is equally likely, or even almost equally likely, to
1707: have made the donation.
1708: Of course, we could take the 
1709: %
1710: %
1711: anonymity
1712: set $I_A$ to
1713: consist of those people who might be in a position to make such a large donation,
1714: and insist that they all be considered equally likely. Unfortunately,
1715: even that is unreasonable: a priori, some of them may already have known
1716: connections to Cornell, and thus 
1717: be considered far more likely to
1718: have made the donation. All that an anonymous donor
1719: can reasonably expect is that nothing an
1720: observer learns from his interactions with the environment (e.g., 
1721: reading the newspapers, noting when the donation was made, etc.)~will
1722: give him more information about the identity of the donor than he
1723: already had.
1724: 
1725: For another example, consider a conference or research journal 
1726: that provides anonymous reviews to researchers who submit their papers for 
1727: publication. 
1728: It is unlikely that the review process provides anything like
1729: $\alpha$-anonymity for a small $\alpha$, or strongly probabilistic
1730: anonymity up to some reasonable set.
1731: %
1732: %
1733: %
1734: When 
1735: %
1736: the preliminary version of
1737: this paper, for example, was accepted by 
1738: %
1739: %
1740: %
1741: the Computer Security Foundations Workshop,
1742: the acceptance notice included three reviews that were, in our terminology,
1743: anonymous up
1744: to the program committee. That is, any one of the reviews we received
1745: could have been written by any of the members of the program committee. However,
1746: by reading some of the reviews, we were able to make fairly good guesses 
1747: as to
1748: which committee members had provided which reviews, based on our knowledge
1749: of the specializations of the various members, and based on the
1750: content of the reviews themselves. 
1751: Moreover, we had a fairly good idea of which committee members
1752: would provide reviews of our paper even before we received the
1753: reviews.
1754: Thus, it seems unreasonable to hope that the review process would
1755: provide strong probabilistic anonymity (up to the program committee),
1756: or even some weaker variant of probabilistic anonymity.
1757: Probabilistic anonymity would require the reviews to convert our prior
1758: beliefs, according to which
1759: some program committee members were more likely than others to be
1760: reviewers 
1761: of our paper, to posterior beliefs 
1762: according to which all program committee members were equally likely! 
1763: This does not seem at all reasonable.  
1764: However, the reviewers might hope that 
1765: that the process did not
1766: give us any more information than we already had.
1767: 
1768: %
1769: %
1770: %
1771: %
1772: %
1773: %
1774: %
1775: In our paper on secrecy \cite{HalOn02},
1776: we tried to capture 
1777: the intuition that, when an unclassified user interacts with a secure
1778: system, she does not learn anything about any classified user 
1779: that she didn't already know.  We did 
1780: this formally by requiring that, for any three 
1781: points $(r,m)$, $(r',m')$, and $(r'',m'')$, 
1782: \begin{equation}\label{secrecy}
1783: \mu_{(r,m,j)}(\K_i(r'',m'')) = \mu_{(r',m',j)}(\K_i(r'',m'')).
1784: \end{equation}
1785: That is, whatever the unclassified user $j$ sees, her probability of
1786: any particular classified state will remain unchanged.
1787: %
1788: %
1789: %
1790: %
1791: %
1792: 
1793: %
1794: %
1795: %
1796: %
1797: %
1798: 
1799: When defining anonymity, we are not concerned with protecting 
1800: all information about some agent $i$, but rather the fact that $i$
1801: %
1802: %
1803: performs
1804: some particular action $a$. 
1805: Given a probabilistic system $\I = (\R,\pi,\mu)$ and a formula $\phi$,
1806: let $e_r(\phi)$ 
1807: consist of the set of runs $r$ such 
1808: that $\phi$ is true at some point in $r$, and let $e_p(\phi)$ be the set
1809: of points where $\phi$ 
1810: is true.  That is
1811: $$\begin{array}{l}
1812: e_r(\phi) = \{ r : \exists m ( (\I,r,m) \models \phi) \}, \\
1813: e_p(\phi) = \{ (r,m) : (\I,r,m) \models \phi \}.
1814: \end{array}
1815: $$
1816: The most obvious analogue to (\ref{secrecy}) is the requirement that,
1817: for all points $(r,m)$ and $(r',m')$, 
1818: $$\mu_{(r,m,j)}(e_p(\theta(i,a))) = \mu_{(r',m',j)}(e_p(\theta(i,a))).$$
1819: This definition says that $j$ never learns anything about the
1820: probability that $i$ 
1821: %
1822: performed 
1823: performs
1824: $a$: 
1825: she always ascribes the same probability to this event.
1826: In the context of our anonymous donation example, 
1827: this would say that the probability (according to $j$) of $i$ donating
1828: \$5,000,000 to Cornell is the same at all times.
1829: 
1830: The problem with this definition is that it does not allow 
1831: $j$ to learn that {\em someone\/} donated \$5,000,000 to Cornell.  That is, before
1832: $j$ learned that someone donated \$5,000,000 to Cornell, $j$ may have
1833: thought it was unlikely that anyone would donate that much money to
1834: Cornell.  We cannot expect that $j$'s probability of $i$ donating
1835: \$5,000,000 would be the same both before and after learning that
1836: someone made a donation. 
1837: We want to give a definition of conditional anonymity that
1838: allows observers to learn that an action has been performed,
1839: but that protects---as much as possible, given the system---the
1840: fact that some particular agent 
1841: %
1842: performed 
1843: performs
1844: the action. If, on
1845: the other hand, the anonymous action has not been performed,
1846: then the observer's probabilities do not matter.
1847: 
1848: Suppose that $i$ wants to perform action $a$, and wants conditional anonymity
1849: with respect to $j$. Let
1850: $\theta(\overline{\jmath},a)$ represent the fact that $a$ has been performed
1851: by some agent other than 
1852: $j$, i.e., $\theta(\overline{\jmath},a) =
1853: \lor_{i' \ne j} \theta(i',a)$.
1854: %
1855: %
1856: The definition of conditional anonymity says that $j$'s prior probability 
1857: of $\theta(i,a)$ given $\theta(\overline{\jmath},a)$ must be the
1858: same as his posterior probability of $\theta(i,a)$ at points where $j$ knows 
1859: $\theta(\overline{\jmath},a)$, i.e., at points where $j$
1860: knows that someone other than $j$ has performed 
1861: %
1862: (or will perform)
1863: $a$.
1864: %
1865: %
1866: %
1867: %
1868: %
1869: %
1870: %
1871: Let $\alpha = \mu(e_r(\theta(i,a)) \mid e_r(\theta(\overline{\jmath},a)))$. This is the prior 
1872: probability that $i$ has performed $a$, given that somebody other than
1873: $j$ has. 
1874: Conditional anonymity says that at any point where 
1875: $j$ knows that someone other than $j$ 
1876: %
1877: %
1878: performs
1879: $a$, 
1880: $j$'s probability of $\theta(i,a)$ must be $\alpha$. 
1881: In other words, $j$ shouldn't be able to learn anything more
1882: about who 
1883: %
1884: %
1885: performs
1886: $a$ (except that 
1887: %
1888: %
1889: somebody does) 
1890: than he know before he began interacting with the system in the first place.
1891: 
1892: \dfn\label{dfn:conditionalanon}
1893: Action $a$, performed by agent $i$, is {\em conditionally anonymous} with
1894: respect to $j$ in the probabilistic system $\I$ if 
1895: $$
1896: %
1897: %
1898: %
1899: \I \models K_j\theta(\overline{\jmath},a) \Rightarrow
1900: {\Pr}_j(\theta(i,a)) = \mu(e_r(\theta(i,a)) \mid 
1901: e_r(\theta(\overline{\jmath},a))). 
1902: %
1903: $$
1904: \edfn
1905: Note that if only one agent ever performs $a$, then
1906: $a$ is trivially conditionally anonymous with respect to $j$, but may
1907: not be minimally anonymous with respect to $j$.  
1908: Thus, conditional anonymity does not necessarily imply minimal anonymity.
1909: 
1910: In Definition~\ref{dfn:conditionalanon}, we implicitly
1911: assumed that agent $j$ was allowed to learn that someone other than $j$
1912: performed action $a$; anonymity is
1913: intended to hide {\em which\/} agent performed $a$, given that somebody did.
1914: More generally, we believe that
1915: we need to consider anonymity
1916: with respect to what an observer
1917: is allowed to learn.  
1918: We might want to specify, for example, that an observer is allowed to know
1919: that a donation was made, and for how much, or to learn the contents of a
1920: conference paper review. The following definition lets us do this formally.
1921: 
1922: 
1923: %
1924: %
1925: %
1926: %
1927: %
1928: \dfn\label{dfn:conditionalanon1}
1929: %
1930: %
1931: Action $a$, performed by agent $i$, is {\em conditionally anonymous} with
1932: respect to $j$ and $\phi$ in the probabilistic system $\I$ if 
1933: $$ \I \models K_j \phi \Rightarrow
1934: {\Pr}_j(\theta(i,a)) = \mu(e_r(\theta(i,a)) \mid e_r(\phi)). $$
1935: \edfn
1936: Definition~\ref{dfn:conditionalanon} is clearly the special case of
1937: Definition~\ref{dfn:conditionalanon1} where $\phi =
1938: \theta(\overline{j},a)$.  
1939: Intuitively, both of these definitions say that once an observer learns some
1940: fact $\phi$ connected to the fact $\theta(i,a)$, we require that she
1941: doesn't learn anything else that might change her probabilities of
1942: $\theta(i,a)$.
1943: 
1944: %
1945: %
1946: %
1947: %
1948: \subsection{Example: Probabilistic Dining Cryptographers}
1949: 
1950: Returning the dining cryptographers problem, suppose that it is well-known
1951: that one of the three cryptographers at the table is much more generous than
1952: the other two, and therefore more likely to pay for dinner. 
1953: %
1954: %
1955: %
1956: %
1957: %
1958: %
1959: %
1960: %
1961: %
1962: %
1963: %
1964: %
1965: Suppose, for example, that the probability 
1966: measure on the set of runs where the generous
1967: cryptographer has paid is 0.8, given that one of the cryptographers paid for
1968: dinner, and that it is 0.1 for each of the other two cryptographers.
1969: Conditional anonymity for each of the three cryptographers with respect to
1970: an outside observer means that when such observer learns that one of the
1971: cryptographers has paid for dinner, his probability that any of the three
1972: cryptographers paid should remain 0.8, 0.1, and 0.1. 
1973: %
1974: %
1975: %
1976: %
1977: If the one of the thrifty cryptographers paid, the generous
1978: cryptographer 
1979: should think that there is a probability of $0.5 = 0.1 / (0.1 + 0.1)$ 
1980: that either of the others paid.
1981: Likewise, if the generous cryptographer paid, each of the others should
1982: think that 
1983: there is a probability of $0.8 / (0.8 + 0.1)$ that the generous
1984: cryptographer paid and a 
1985: probability of $0.1 / (0.8 + 0.1)$ that the other thrifty cryptographer
1986: paid.
1987: %
1988: We can similarly calculate all the other relevant probabilities.
1989: 
1990: More generally, suppose that
1991: we have an intepreted probabilistic system 
1992: %
1993: %
1994: %
1995: %
1996: %
1997: %
1998: %
1999: $(\R,\mu,\pi)$ that represents
2000: instances of the dining cryptographers protocol, where the interpretation $\pi$
2001: once again interprets formulas of the form 
2002: $\theta(i,\mbox{``paid''})$ and $\theta(\overline{\jmath},\mbox{``paid''})$
2003: in the obvious way, and where the formula $\gamma$ is true if one of the cryptographers
2004: paid.  (That is, $\gamma$ is equivalent to $\bigvee_{i \in \{0,1,2\}} \theta(i,\mbox{``paid''})$.)
2005: For any cryptographer $i \in \{ 0, 1, 2 \}$, let $\alpha(i)$ be the prior probability that $i$ paid,
2006: %
2007: %
2008: %
2009: %
2010: given that somebody else did. That is, let
2011: $$\alpha(i) = \mu(e_r(\theta(i,\mbox{``paid''})) \mid e_r(\gamma)).$$
2012: In the more concrete example given above, if $0$ is the generous cryptographer, we would
2013: have $\alpha(0) = 0.8$ and $\alpha(1) = \alpha(2) = 0.1$.
2014: 
2015: For the purposes of conditional probability with respect to an agent $j$, 
2016: we are interested in the probability that some agent $i$ paid, given that somebody
2017: other than $j$ paid. Formally, for $i \not= j$, let
2018: $$\alpha(i,j) = \mu(e_r(\theta(i,\mbox{``paid''})) \mid e_r(\theta(\overline{\jmath},\mbox{``paid''}))).$$
2019: If an observer $o$ is not one of the three cryptographers, than $o$ didn't pay, 
2020: and we have $\alpha(i,o) = \alpha(i)$. Otherwise, if $i,j \in \{0,1,2\}$,
2021:  we can use conditioning to compute $\alpha(i,j)$:
2022: \[ \alpha(i,j) = \frac{\alpha(i)}{\alpha(j \oplus 1) + \alpha(j \oplus 2)}. \]
2023: (Once again, we make our definitions and requirements more compact
2024: by using modular arithmetic, where $\oplus$ denotes addition mod 3.) 
2025: 
2026: %
2027: %
2028: The following formula captures the requirement of conditional
2029: anonymity in the dining cryptographer's protocol, for each cryptographer
2030: $i$, with respect to the other cryptographers and any outside observers.
2031: %
2032: %
2033: %
2034: %
2035: %
2036: 
2037: $$
2038: \begin{array}{ll}
2039: \I \models & \left[ K_{i \oplus 1} \theta(\overline{i \oplus 1},\mbox{``paid''}) \Rightarrow {\Pr}_{i \oplus 1}(\theta(i,\mbox{``paid''})) = \alpha(i,i \oplus 1) \right] \wedge \\
2040: 	& \left[ K_{i \oplus 2} \theta(\overline{i \oplus 2},\mbox{``paid''}) \Rightarrow {\Pr}_{i \oplus 2}(\theta(i,\mbox{``paid''})) = \alpha(i,i \oplus 2) \right] \wedge \\
2041: 	& \left[ K_{o} \theta(\overline{o},\mbox{``paid''}) \Rightarrow
2042: 	{\Pr}_{o}(\theta(i,\mbox{``paid''})) = \alpha(i,o) \right]. 
2043: \end{array}
2044: $$
2045: 
2046: Chaum's original proof that the dining cryptographers protocol provides
2047: anonymity actually proves conditional anonymity in this general setting.
2048: Note that if the probability that one of the cryptographers will pay is 1,
2049: that cryptographer will have conditional anonymity even though he doesn't even
2050: have minimal anonymity.
2051: 
2052: 
2053: \subsection{Other Uses for Probability}
2054: 
2055: In the previous two subsections, we have emphasized how probability can be used
2056: to obtain definitions of anonymity stronger than those presented in 
2057: Section~\ref{sec:anonymity}. However, probabilistic systems can also be used
2058: to define interesting ways of weakening those definitions. Real-world
2059: anonymity systems do not offer absolute guarantees of anonymity such as those
2060: those specified by our definitions. Rather, they guarantee that a user's anonymity
2061: will be protected {\em with high probability}. In a given run, a user's anonymity
2062: might be protected or corrupted. If the probability of the event that a user's
2063: anonymity is corrupted is very small, i.e., the set of runs where her anonymity is
2064: not protected is assigned a very small probability by the measure $\mu$, this
2065: might be enough of a guarantee for the user to interact with the system.
2066: 
2067: Recall that we said that $i$ maintains total anonymity with respect to $j$
2068: if the fact $\phi = \theta(i,a) \Rightarrow \bigwedge_{i' \ne j} P_j [\theta(i',a)]$
2069: is true at every point in the system. Total anonymity is compromised in a run $r$ 
2070: if at some point $(r,m)$, $\neg \phi$ holds. Therefore, the set of runs where
2071: total anonymity is compromised is simply $e_r(\neg \phi)$, using the notation of
2072: the previous section. If $\mu(e_r(\neg \phi))$ is very small, then $i$ maintains
2073: total anonymity with very high probability. This analysis can obviously be extended
2074: to all the other definitions of anonymity given in previous sections.
2075: 
2076: Bounds such as these are useful for analyzing real-world systems. The Crowds 
2077: system \cite{reiter98}, for example, uses randomization when routing communication traffic,
2078: so that anonymity is protected with high probability. The probabilistic guarantees provided
2079: by Crowds were analyzed formally by 
2080: %
2081: %
2082: Shmatikov \citeyear{shmat02}, using a probabilistic model
2083: checker, and he demonstrates how the anonymity guarantees provided by the Crowds system change
2084: as more users (who may be either honest or corrupt) are added to the system. Shmatikov
2085: uses a temporal probabilistic logic to express probabilistic anonymity properties, so these
2086: properties can be expressed in our system framework.  (It is
2087: straightforward to give semantics to temporal operators in systems; see \cite{FHMV}.)
2088: In any case, Shmatikov's analysis of
2089: a real-world anonymity system is a useful example of how the formal methods that we advocate
2090: can be used to specify and verify properties of real-world systems.
2091: 
2092: \section{Related Work}\label{sec:related}
2093: 
2094: \subsection{Knowledge-based Definitions of Anonymity}
2095: As mentioned in the introduction, we are not the first to use knowledge
2096: to handle definitions of security, information hiding, or even anonymity.
2097: Anonymity has been formalized using epistemic logic by Syverson
2098: and 
2099: %
2100: %
2101: Stubblebine~\citeyear{syverson99}.  Like us, they use epistemic logic
2102: to characterize a number of 
2103: information-hiding requirements that involve anonymity.
2104: However, the focus of their work is very different from ours.
2105: They describe a logic for reasoning about anonymity
2106: and a number of axioms for the logic.
2107: An agent's knowledge is based, roughly speaking, on
2108: %
2109: his recent actions and observations, as well as
2110: what follows from his log of system events.
2111: The first five axioms that Syverson and Stubblebine
2112: give are the standard {\bf S5} axioms for knowledge.  There 
2113: are well-known soundness and completeness results relating the
2114: {\bf S5} axiom system to Kripke structure semantics for knowledge
2115: \cite{FHMV}. However, they give many more axioms, and 
2116: they do not attempt to give a semantics for which their axioms are
2117: sound. Our focus, on the other hand, is completely semantic.  We have not tried
2118: to axiomatize anonymity. Rather, we try to give an appropriate semantic
2119: framework in which to consider anonymity.
2120: 
2121: In some ways, 
2122: Syverson and Stubblebine's model is more detailed than the model used here.
2123: Their logic includes
2124: many formulas that represent various actions and facts, including the
2125: sending and 
2126: receiving of messages, details of encryption and keys, and so on. They also 
2127: make more assumptions about the local state of a given agent,
2128: including details about the sequence of actions that the agent has performed
2129: locally, a log of system events that have been recorded, and a set of facts
2130: of which the agent is aware. While these extra details may accurately
2131: reflect the 
2132: nature of agents in real-world systems, 
2133: they are orthogonal to our concerns here.  In any case,
2134: it would be easy to add
2135: such expressiveness to our model as well, simply by including these details in
2136: the local states of the various agents.
2137: 
2138: It is straightforward to relate our definitions to those of Syverson and
2139: Stubblebine. They consider facts of the form 
2140: $\phi(i)$, where $i$ is a principal, i.e., an agent. 
2141: They assume that the fact $\phi(i)$ is a single
2142: formula in which a single agent name occurs. Clearly, 
2143: $\theta(i,a)$ is an example of such a formula. 
2144: In fact, Syverson and Stubblebine assume
2145: that if $\phi(i)$ and $\phi(j)$ are both true, then $i = j$. 
2146: For the $\theta(i,a)$ formulas, this means
2147: that $\theta(i,a)$ and $\theta(i',a)$ cannot be simultaneously 
2148: true: at most one agent can perform an action in a given run, 
2149: exactly as in the setup of Proposition~\ref{pro:minimally}.
2150: 
2151: There is one definition in~\cite{syverson99}
2152: that is especially relevant to our discussion; the other relevant
2153: definitions presented there are similar.
2154: A system is said to satisfy
2155: {\em $(\geq k)$-anonymity} if the following formula is valid for
2156: some observer $o$: 
2157: $$
2158: \begin{array}{c}
2159: \phi(i) \Rightarrow P_o(\phi(i)) \wedge P_o(\phi(i_1)) 
2160: \wedge \cdots \wedge P_o(\phi(i_{k-1})). 
2161: \end{array}
2162: $$
2163: This definition says that if $\phi(i)$ holds, there must be at least $k$ agents, including
2164: $i$, that the observer suspects. (The existential quantification of the agents $i_1, \ldots, i_{n-1}$ is
2165: implicit.)
2166: The definition
2167: is essentially equivalent to our definition of $(k-1)$-anonymity.  
2168: It certainly implies that there are $k-1$ agents other than
2169: $i$ for which $\phi(i')$ might be true. On the other
2170: hand,
2171: if $P_o(\phi(i'))$ is true for $k-1$ agents other than $i$,
2172: then the formula must hold, 
2173: because
2174: $\phi(i) \rimp P_o(\phi(i))$ is valid.  
2175: 
2176: 
2177: \subsection{CSP and Anonymity}\label{sec:CSP}
2178: A great deal of work on the foundations of computer security has used process
2179: algebras such as CCS and CSP \cite{Mil,hoare85} as the basic system 
2180: framework \cite{focardi01,schneider96b}. Process algebras
2181: offer several advantages: they are simple, they can be used for specifying
2182: systems as well as system properties, and model-checkers 
2183: are available that can be used
2184: to verify properties of systems described using their formalisms.
2185: 
2186: Schneider and Sidiropoulos~\citeyear{schneider96} use CSP 
2187: both to characterize one type of anonymity and to describe variants of
2188: the dining cryptographers
2189: problem~\cite{Chaum88}. They then use a model-checker to verify 
2190: that their notion of anonymity holds for those variants of the problem.
2191: To describe their approach, we need to outline
2192: some of the basic notation and semantics of CSP. To save space, we give a
2193: simplified treatment of CSP here. 
2194: %
2195: %
2196: %
2197: (See Hoare~\citeyear{hoare85} for a complete description of CSP.) 
2198: The basic unit of
2199: CSP is the {\em event}. Systems are modeled in terms of the events that
2200: they can perform. Events may be built up several components. For example,
2201: ``donate.\$5''
2202: might represent a ``donate'' event in the amount of \$5. {\em Processes} are the
2203: systems, or components of systems, that are described using CSP. As a
2204: process
2205: unfolds or executes, various events occur. 
2206: For our purposes, we make the simplifying assumption that a process
2207: is determined by the event sequences it is able to engage in.
2208: 
2209: We can associate with every process a set of {\em traces}.  
2210: Intuitively, each trace in the set associated with process $P$
2211: represents one sequence of events that might occur during an execution
2212: of $P$.
2213: Informally, CSP event traces correspond to
2214: finite prefixes of runs, except that they do not explicitly describe the
2215: local states of agents and do not explicitly describe time.
2216: 
2217: Schneider and Sidiropoulos define a notion of anonymity with respect to
2218: a set $A$ of events.  Typically, $A$ consists of 
2219: %
2220: events 
2221: of the form
2222: $i.a$ for a fixed action $a$, where $i$ is an agent in some set that we
2223: denote $I_A$. 
2224: Intuively, anonymity with respect to $A$ means that if any event in $A$
2225: occurs, it could equally well have been any other event in $A$.
2226: In particular,
2227: this means that if an agent in $I_A$ performs $a$, it could
2228: equally well have been any other agent in $I_A$.
2229: Formally, 
2230: given a set $\Sigma$ of possible events and $A \subseteq \Sigma$, 
2231: let $f_A$ be a function on traces
2232: that, given a trace $\tau$, returns a trace $f_A(\tau)$ that is
2233: identical to $\tau$ except that every event in $A$ is replaced by a
2234: fixed event $\alpha \notin \Sigma$.  
2235: A process $P$ is {\em strongly anonymous} on $A$ if  
2236: $f_A^{-1}(f_A(P)) = P$,
2237: where we identify
2238: $P$ with its associated set of traces. This means that all the
2239: events in $A$ are interchangeable; by 
2240: replacing any event in $A$ with any other
2241: we would still get a valid trace of $P$.
2242: 
2243: Schneider and Sidiropoulos give several very simple examples that are
2244: useful for clarifying this definition of anonymity. One is a system
2245: where there are two agents who can provide donations to a charity,
2246: but where only one
2247: of them will actually do so. Agent $0$, if she gives a donation,
2248:  gives \$5, and agent $1$ gives \$10. This is followed by a ``thanks''
2249: from the charity. The events of interest are ``0.gives'' and ``1.gives''
2250: (representing events where $0$ and $1$ make a donation),
2251: ``\$5'' and ``\$10'' (representing the charity's receipt of the donation),
2252: ``thanks'', and ``STOP'' (to signify that the process
2253: has ended). There are two possible traces:
2254: \begin{enumerate}
2255: \item 0.gives $\rightarrow$ \$5 $\rightarrow$ ``thanks'' $\rightarrow$
2256: STOP.
2257: \item 1.gives $\rightarrow$ \$10 $\rightarrow$ ``thanks'' $\rightarrow$
2258: STOP.
2259: \end{enumerate}
2260: The donors require anonymity, and so we require that the CSP process is
2261: strongly anonymous on the set \{0.gives, 1.gives\}. In fact, this
2262: condition is not satisfied by the process, because ``0.gives'' and
2263: ``1.gives'' are not interchangeable. This is because ``0.gives'' must
2264: be followed by ``\$5'', while ``1.gives'' must be followed by ``\$10''.
2265: Intuitively, an agent who observes the traces can determine the
2266: donor by looking at the amount of money donated.
2267: 
2268: We believe that Schneider and Sidiropoulos's definition is best
2269: understood as trying to
2270: capture the intuition that an observer who sees all the events generated
2271: by $P$, except for events in $A$, does not know which event
2272: in $A$ occurred. 
2273: We can make this precise by translating Schneider and Sidiropoulos's
2274: definition into our framework.
2275: The first step is to associate with each process $P$ 
2276: a corresponding set of runs $\R_P$.
2277: We present one reasonable way of doing so here, which suffices for our
2278: purposes. 
2279: In future work, we hope to explore the connection between CSP and the runs and
2280: systems framework in more detail.
2281: 
2282: Recall that a run is an infinite sequence
2283: of global states of the form $(s_e, s_1, \ldots, s_n)$, where each $s_i$
2284: is the local state of
2285: agent $i$, and $s_e$ is the state of the environment.
2286: Therefore, to
2287: specify a set of runs, we need to describe the set of agents, and then
2288: explain how to derive the local states of each agent for each run.
2289: There is an obvious problem here: 
2290: CSP has no analogue of agents and local states.
2291: To get around this, we could simply tag all events
2292: with an agent (as Schneider and Sidiropoulos in fact do for the events
2293: in $A$).  
2294: However, for our current purposes, a much simpler approach will do.
2295: The only agent we care about is a (possibly mythical) observer who
2296: is able to observe every event except the ones in $A$.  Moreover, for
2297: events in $A$, the observer knows that something happened (although not
2298: what).  There may be other agents in the system, but their local states
2299: are irrelevant.  We formalize this as follows.
2300: 
2301: Fix a process $P$ over some set $\Sigma$ of events, and let $A \subseteq \Sigma$.  
2302: Following Schneider and Sidiropoulos, for the purposes of this
2303: discussion, assume that $A$ consists of events of the form $i.a$, where
2304: $i \in I_A$ and $a$ is some specific action.
2305: We say that a system $\R$ is {\em compatible with $P$\/} if there
2306: exists some agent $o$ such that the following two conditions hold:
2307: \begin{itemize}
2308: \item for every run $r \in \R$ and every time $m$, there exists a trace
2309: $\tau \in P$
2310: such that $\tau = r_e(m)$ and $f_A(\tau) = r_o(m)$;
2311: \item for every trace $\tau \in P$, there exists a run $r \in \R$ such
2312: that $r_e( | \tau | ) = \tau$ and $r_o( | \tau | ) = f_A(\tau)$
2313: (where $|\tau|$ is the number of events in $\tau$).
2314: \end{itemize}
2315: Intuitively, $\R$ represents $P$ if (1) for every trace $\tau$ in $P$,
2316: there is a point $(r,m)$ in $\R$ such that, at this point, exactly the
2317: events in $\tau$ have occurred (and are recorded in the environment's
2318: state) and $o$ has observed $f_A(\tau)$, and (2) for every point $(r,m)$
2319: in $\R$, there is a trace $\tau$ in $P$ such that precisely the events
2320: in $r_e(m)$ have happened in $\tau$, and $o$ has observed $f_A(\tau)$ at
2321: $(r,m)$.  
2322: We say that the interpreted system $\I = (\R,\pi)$ is {\em compatible
2323: with $P$\/} if $\R$ is compatible with $P$ and 
2324: if $(\I,r,m) \models \theta(i,a)$ whenever the event 
2325: $i.a$ is in the 
2326: %
2327: %
2328: event sequence $r_e(m')$ for some $m'$.
2329: 
2330: 
2331: We are now able to make a formal connection between our definition of
2332: anonymity and that of Schneider and Sidiropoulos.
2333: As in the setup of Proposition~\ref{pro:minimally}, we assume that an
2334: anonymous action $a$ can be performed only once in a given run.
2335: 
2336: \thm\label{thm:stronganonthm}
2337: If $\I = (\R,\pi)$ is compatible with $P$, then 
2338: $P$ is strongly anonymous on the alphabet $A$ if and only if
2339: for every agent
2340: $i \in I_A$, the action $a$ performed by $i$ is anonymous up to $I_A$ with
2341: respect to $o$ in $\I$.
2342: \ethm
2343: 
2344: %
2345: \prf
2346: Suppose that $P$ is strongly anonymous on the alphabet $A$ and that 
2347: $i \in I_A$.
2348: %
2349: %
2350: %
2351: %
2352: %
2353: Given a point $(r,m)$, suppose that $(\I,r,m) \models \theta(i,a)$, 
2354: so that the event $i.a$ appears in $r_e(n)$ for some $n \geq m$.
2355: %
2356: %
2357: We must show that
2358: $(\I,r,m) \models P_o[\theta(i',a)]$ for every $i' \in I_A$,
2359: %
2360: that is, that $a$ is anonymous up to $I_A$ with respect to $o$.
2361: %
2362: %
2363: %
2364: %
2365: %
2366: For any $i' \in I_A$, this requires showing that there 
2367: exists a point $(r',m')$ such that
2368: $r_o(m) = r'_o(m')$, 
2369: and $r'_o(n')$ includes $i'.a$, for some $n' \geq m'$. Because $\R$ 
2370: is compatible with $P$, there exists $\tau \in P$ such that $\tau =
2371: r_e(n)$ and $i.a$ appears in $\tau$. Let $\tau'$ be the trace identical to $\tau$
2372: except that $i.a$ is replaced by $i'.a$. Because $P$ is strongly anonymous
2373: on $A$, $P = f_A^{-1}(f_A(P))$, and $\tau' \in P$. By compatibility, there
2374: exists a run $r'$ such that $r_e'(n) = \tau'$ and $r_o'(n) = f_A(\tau')$.
2375: By construction, $f_A(\tau) = f_A(\tau')$, so $r_o(n) = r'_o(n)$. Because the
2376: length-$m$ trace prefixes of $f_A(\tau)$ and $f_A(\tau')$ are the same,
2377: it follows that $r_o(m) = r'_o(m)$.
2378: %
2379: %
2380: Because $(\I,r',m) \models \theta(i',a)$, $(\I,r,m) \models P_o[\theta(i',a)]$
2381: as required.
2382: 
2383: Conversely, suppose that for every agent $i \in I_A$, the action $a$
2384: performed by $i$
2385: %
2386: %
2387: is anonymous up to $I_A$ with respect to $o$ in $\I$. We must show that
2388: $P$ is strongly anonymous.  
2389: It is clear that $P \subseteq
2390: f_A^{-1}(f_A(P))$, so we must show only that $P \supseteq 
2391: f_A^{-1}(f_A(P))$.  So suppose that $\tau \in f_A^{-1}(f_A(P))$.  
2392: If no event $i.a$ appears in $\tau$, for any $i \in I_A$, then $\tau \in P$
2393: trivially. Otherwise, some $i.a.$ does appear. Because $\tau \in f_A^{-1}(f_A(P))$,
2394: there exists a trace $\tau' \in P$ that is identical to 
2395: %
2396: %
2397: $\tau$
2398: except that $i'.a$
2399: replaces $i.a$, for some other $i' \in I_A$. Because $\R$ is compatible with $P$,
2400: there exists a run $r' \in R$ such that $r'_o(m) = f_A(\tau')$ and $r'_e(m) = \tau'$
2401: %
2402: %
2403: %
2404: (where $m = |\tau'|$). Clearly $(\I,r',m) \models \theta(i',a)$ so, by
2405: anonymity, 
2406: $(\I,r',m) \models P_o[\theta(i,a)]$, and there exists a run $r$ such that 
2407: $r_o(m) = r'_o(m)$ and $(\I,r,m) \models \theta(i,a)$. Because the action $a$ can
2408: be performed at most once, the trace $r_e(m)$ must be equal to $\tau$. By 
2409: compatibility, $\tau \in P$ as required.
2410: \eprf
2411: 
2412: 
2413: 
2414: 
2415: Up to now, we have assumed that the observer $o$ has access to all the
2416: information in the system except which event in $A$ was performed.  
2417: Schneider and Sidiropoulos extend their definition of strong anonymity to
2418: deal with agents that have somewhat less information.  They capture ``less
2419: information'' using {\em abstraction operators}.
2420: Given a process $P$, there are several
2421: abstraction
2422: operators that can give us a new process. For example the {\em hiding operator},
2423: represented by $\backslash$, hides all events in some set $C$. That is,
2424: the process $P \backslash C$ is the same as $P$ except that all events in
2425: $C$ become internal events of the new process, and are not included in 
2426: the traces associated with $P \backslash C$. 
2427: Another abstraction operator, the renaming
2428: operator,
2429: has already appeared in the definition
2430: of strong anonymity: 
2431: for any set $C$ of events, we can consider the function $f_C$ that maps
2432: events in $C$ to a fixed new event.  The difference between hiding and
2433: renaming is that, if events in $C$ are hidden, the observer is not even
2434: aware they took place.  If events in $C$ are renamed, then the observer
2435: is aware that some event in $C$ took place, but does not know which one.
2436: 
2437: Abstraction operators such as these provide a useful way to model a
2438: process or agent who has a distorted
2439: or limited view of the system. In the context of
2440: anonymity, they allow anonymity to hold with respect to an observer with a
2441: limited view of the system in cases where it would not hold with respect to an
2442: observer who can see everything. In the anonymous donations example,
2443: hiding
2444: the events \$5 and \$10, i.e., the amount of money donated, would make the
2445: new process $P \backslash \{\$5, \$10\}$ strongly anonymous on the set of
2446: donation events. Formally, given an abstraction operator $ABS_C$ on a set
2447: of events $C$, we have
2448: to check the requirement of strong anonymity on the process $ABS_C(P)$
2449: rather than on the process $P$.
2450: 
2451: Abstraction is easily captured in our framework. It amounts simply to
2452: changing the local state of the observer. For example, anonymity of 
2453: the process $P \backslash C$ in our framework corresponds to anonymity of the
2454: action $a$ for every agent in $I_A$ with respect to an observer whose
2455: local state at the point $(r,m)$ is $f_A(r_e(m)) \backslash C$.  We omit the obvious
2456: analogue of Theorem~\ref{thm:stronganonthm} here.
2457: 
2458: A major advantage of the runs and systems framework is that definitions of
2459: high-level properties such as anonymity do not depend on the local states
2460: of the agents in question. If we want to model the fact that an observer has
2461: a limited view of the system, 
2462: we need only modify her local state to reflect this fact. 
2463: While some limited views are naturally captured by CSP abstraction
2464: operators, others may not be.  The definition of anonymity should not
2465: depend on the existence of
2466: an appropriate abstraction operator 
2467: able to capture the limitations of a particular observer.
2468: 
2469: As we have demonstrated, our approach to anonymity is compatible with the
2470: approach
2471: taken in~\cite{schneider96}. Our definitions are stated in terms of
2472: actions, agents, and knowledge, and are thus
2473: very intuitive and flexible. The generality of runs and systems allows us
2474: to have simple definitions that apply to a wide variety of systems and
2475: agents. The low-level CSP
2476: definitions, on the other hand, are more operational than ours, and this
2477: allows easier model-checking and verification. Furthermore, there are
2478: many advantages to using process algebras in general: systems can often
2479: be represented much more succinctly, and so on. This suggests that
2480: both approaches have their advantages.
2481: Because CSP
2482: systems can be represented in the runs and systems framework, however,
2483: it makes perfect sense to define anonymity for CSP processes using
2484: the knowledge-based definitions we have presented here. If our
2485: definitions turn out to be equivalent to more low-level CSP definitions,
2486: this is ideal, because CSP model-checking programs can then be used for
2487: verification. A system designer simply needs to take care that the
2488: runs-based system derived from a CSP process (or set of processes)
2489: represents the local states of the different agents appropriately.
2490: 
2491: \subsection{Anonymity and Function View Semantics}\label{sec:functionviews}
2492: Hughes and Shmatikov \citeyear{hughes02} introduce {\em function views}
2493: %
2494: %
2495: and function-view {\em opaqueness} as a way of
2496: expressing a variety of information-hiding properties in a succinct and uniform way.
2497: Their main insight is that requirements such as anonymity involve restrictions
2498: on relationships between entities such as agents and actions. Because these relationships
2499: can be expressed by functions from one set of entities to another, hiding information
2500: from an observer amounts to limiting an observer's view of the function
2501: in question. For example, anonymity properties are concerned with whether or not
2502: an observer is able to connect actions with the agents who performed them. By
2503: considering the function from the set of actions to the set of agents who performed
2504: those actions, and specifying the degree to which that function must be opaque to 
2505: observers, we can express anonymity using the 
2506: %
2507: %
2508: %
2509: %
2510: function-view approach.
2511: 
2512: To model the uncertainty associated with a given function, Hughes and
2513: Shmatikov define a notion of
2514: {\em function knowledge} to explicitly represent an observer's 
2515: partial knowledge of a function.
2516: Function knowledge focuses on three particular aspects of a function:
2517: its graph, image, and kernel. 
2518: (Recall that the {\em kernel\/} of a function $f$ with domain $X$ is the
2519: equivalence relation {\em ker\/} on $X$ defined by $(x,x') \in
2520: \mbox{{\it ker}}$ iff $f(x) = f(x')$.)
2521: {\em Function knowledge\/} of type $X \rightarrow Y$ is a triple
2522: $N = (F,I,K)$, where $F \subseteq X \times Y$, $I \subseteq Y$, and $K$
2523: is an equivalence relation on $X$.  A triple $(F,I,K)$ is {\em
2524: consistent with $f$\/} if $f \subseteq F$, $I \subseteq \im f$, and
2525: $K \subseteq \kernel f$. Intuitively, a triple $(F,I,K)$ 
2526: that is consistent with $f$ represents
2527: what an agent might know about
2528: the function $f$. Complete knowledge of a function $f$, for example, would be 
2529: represented by the triple $( f, \im f, \kernel f )$.
2530: 
2531: For anonymity, and for information hiding in general,
2532: we are interested not in what an agent
2533: knows, but in what an agent does {\em not\/} know.
2534: This is formalized 
2535: %
2536: %
2537: by Hughes and Shmatikov
2538: in terms of
2539: opaqueness conditions for function knowledge. 
2540: If $N = \langle F, I, K \rangle$ 
2541: is consistent with $f: X \rightarrow Y$, then, for example,
2542: $N$ is {\em $k$-value opaque\/} if $|F(x)| \geq k$  for all $x \in X$.
2543: That is, $N$ is $k$-value opaque if there are $k$ possible candidates
2544: for the value of $f(x)$, for all $x \in X$. Similarly, $N$ is
2545: {\em $Z$-value opaque\/} if $Z \subseteq F(x)$
2546: for all $x \in X$. In other words, for each $x$ in the domain of $f$,
2547: no element of $Z$ can be ruled out as a candidate for $f(x)$. 
2548: Finally, $N$ is {\em absolutely value opaque\/} if that $N$ is $Y$-value opaque.
2549: 
2550: Opaqueness conditions are closely related to the 
2551: nonprobabilistic definitions of anonymity given in Section~\ref{sec:anonymity}.
2552: Consider functions from $X$ to $Y$, where $X$ is a set of actions
2553: and $Y$ is a set of agents, and suppose that some function $f$ is the function
2554: that, given some action, names the agent who performed the action. If we have 
2555: $k$-value opaqueness for some view of $f$ (corresponding to some
2556: observer $o$), this means, essentially, that
2557: each action $a$ in $X$ is $k$-anonymous with respect to $o$.
2558: Similarly, the view is $I_A$-value
2559: opaque if the action is 
2560: anonymous up to $I_A$ for each agent $i \in I_A$. Thus, function
2561: view opaqueness provides a concise way of describing anonymity properties,
2562: and information-hiding properties in general.
2563: 
2564: To make these connections precise, we need to explain how function views
2565: can be embedded within the runs and systems framework. 
2566: Hughes and Shmatikov already show how we can define function
2567: views using
2568: Kripke structures, the standard approach for giving semantics
2569: to knowledge.  A minor modification of their approach works in systems
2570: too.  Assume we are interested in who performs an action $a \in X$, where
2571: $X$, intuitively, is a set of ``anonymous actions''.  Let $Y$ be the set of agents, 
2572: %
2573: including a ``nobody agent'' denoted $N$, and let $f$ be a 
2574: %
2575: %
2576: function from $X$ to $Y$.  Intuitively,
2577: $f(a) = i$ if agent $i$ 
2578: %
2579: %
2580: performs action $a$,
2581: %
2582: %
2583: and $f(a) = N$ if no agent performs action $a$.
2584: The value of the
2585: function $f$ will depend on the point.  Let $f_{r,m}$ be the value of
2586: $f$ at the point $(r,m)$.  Thus, $f_{r,m}(a) = i$ if
2587: %
2588: %
2589: $i$ performs $a$ in run $r$.
2590: \footnote{Note that for $f_{(r,m)}$ to be well-defined, it must
2591: be the case that only one agent can ever perform a single action.
2592: %
2593: %
2594: %
2595: %
2596: }
2597: We can now easily talk about function opaqueness with respect to an
2598: observer $o$. For example, $f$ is $Z$-value opaque
2599: at the point $(r,m)$ with respect to $o$ if, for 
2600: all $z \in Z$, there exists a point $(r',m')$ such that $r'_o(m') = r_o(m)$
2601: and $f_{(r',m')}(x) = z$. In terms of knowledge, $Z$-value opaqueness
2602: says that for any value $x$ in the range of $f$, 
2603: $o$ thinks it possible that
2604: any value $z \in Z$ could be the result of $f(x)$. Indeed, Hughes and
2605: %
2606: %
2607: Shmatikov say that function-view opaqueness, defined in terms of Kripke
2608: structure semantics, is closely related to epistemic logic. The following
2609: proposition makes this precise; it would be easy to state similar propositions
2610: %
2611: %
2612: for other kinds of function-view opaqueness. 
2613: 
2614: \pro\label{pro:zvaluethm}
2615: Let $\I = (\R, \pi)$ be an interpreted system that satisfies
2616: $(\I,r,m) \models f(x) = y$ whenever $f_{(r,m)}(x) = y$.
2617: In system $\I$, $f$ is $Z$-value opaque for observer $o$ at the point $(r,m)$ if
2618: and only if
2619: \[ (\I,r,m) \models \bigwedge_{x \in X} \bigwedge_{z \in Z} P_o[f(x) = z]. \]
2620: \epro
2621: 
2622: \prf
2623: This result follows immediately from the definitions.
2624: \eprf
2625: 
2626: %
2627: %
2628: Stated in terms of knowledge, function-view opaqueness already looks 
2629: a lot like our definitions of anonymity. 
2630: Given $f$ (or, more precisely, the set $\{f_{(r,m)}\}$ of functions) mapping
2631: actions to agents, 
2632: %
2633: %
2634: we can state a theorem connecting anonymity to function-view opaqueness.
2635: There are two minor issues to deal with, though. First, our definitions
2636: of anonymity are stated with respect to a single action $a$, while the 
2637: function $f$ deals with a {\em set} of actions. We can deal with this
2638: by taking the domain of  $f$ to be
2639: the singleton $\{a\}$. Second, our definition
2640: of anonymity up to a set $I_A$ requires the observer to suspect
2641: agents in $I_A$ only
2642: if $i$ actually performs the action $a$.
2643: (Recall this is also true for Syverson and Stubblebine's definitions.)
2644: $I_A$-value opaqueness requires the observer to think many agents
2645: could have performed an action even if nobody has. 
2646: To deal with this, we require opaqueness only
2647: when the action has been performed
2648: %
2649: by one of the agents in $I_A$.
2650: %
2651: \commentout{
2652: Last, opaqueness provides anonymity for {\em all} agents who might
2653: want to perform $a$. To restrict our attention to agents in $I_A$,
2654: we require that $f_{(r,m)}(a) \in I_A$ for all points $(r,m) \in \P(\R)$.
2655: }
2656: \commentout{
2657: If $f$ maps from an action $a$ to the agent who performed
2658: $a$, $f_{(r,m)}(a)$ may be undefined if nobody has actually performed
2659: $a$ by time $m$ in the run $r$. For functions that might possibly
2660: be undefined, we can say that $f$ is $Z$-value opaque if 
2661: for all points $(r,m) \in \P(\R)$, for all $x \in X$ such that 
2662: $f_{(r,m)}(x)$ is well defined, and for 
2663: all $z \in Z$, there exists a point $(r',m')$ such that $r'_o(m') = r_o(m)$
2664: and $f_{(r',m')}(x) = z$.
2665: }
2666: 
2667: \thm\label{thm:functionviews}
2668: Suppose that $(\I,r,m) \models \theta(i,a)$ exactly if $f_{(r,m)}(a) = i$.
2669: Then action $a$ is anonymous up to $I_A$ with respect to $o$
2670: for each agent $i \in I_A$ if and only if at all points $(r,m)$ such
2671: that $f_{(r,m)}(a) \in I_A$,
2672: $f$ is $I_A$-value opaque with respect to $o$.
2673: \ethm
2674: 
2675: %
2676: %
2677: 
2678: \prf
2679: Suppose that $f$ is $I_A$-value opaque, and let $i \in I_A$ be given.
2680: If $(\I,r,m) \models \theta(i,a)$, then $f_{(r,m)}(a) = i$. 
2681: %
2682: %
2683: We must show that, for all $i' \in I_A$, 
2684: %
2685: %
2686: %
2687: %
2688: %
2689: $(\I,r,m) \models
2690: P_o[\theta(i',a)]$. 
2691: Because $f$ is $I_A$-value opaque at $(r,m)$,
2692: there exists a point 
2693: $(r',m')$ such that $r'_o(m') = r_o(m)$ and $f_{(r',m')}(a) = i'$.
2694: Because $(\I,r',m') \models \theta(i',a)$, $(\I,r,m) \models P_o[\theta(i',a)]$.
2695: 
2696: Conversely, suppose that for each agent $i \in I_A$, 
2697: $a$ is anonymous up to $I_A$ with respect to $o$.
2698: Let $(r,m)$ be given such that $f_{(r,m)}(a) \in I_A$,
2699: and let that $i = f_{(r,m)}(a)$. 
2700: It follows that $(\I,r,m) \models \theta(i,a)$.
2701: For any $i' \in I_A$, $(\I,r,m) \models P_o[\theta(i',a)]$, by
2702: anonymity. Thus there exists a point $(r',m')$ such that 
2703: $r'_o(m') = r_o(m)$ and $(\I,r',m') \models \theta(i',a)$.
2704: It follows that $f_{(r',m')}(a) = i'$, and that $f$ is
2705: $I_A$-value opaque.
2706: \eprf
2707: 
2708: 
2709: As with Proposition~\ref{pro:zvaluethm}, it would be easy to state
2710: analogous theorems connecting our other definitions of anonymity,
2711: including minimal anonymity, total anonymity, and $k$-anonymity,
2712: %
2713: %
2714: to other forms of function-view opaqueness. We omit the details here.
2715: 
2716: %
2717: The assumptions needed to prove Theorem~\ref{thm:functionviews} illustrate
2718: two ways in which our approach may seem to be less general than the
2719: %
2720: %
2721: function-view  
2722: approach. First, all our definitions are given with respect to a single
2723: action, rather than with respect to a set of actions. However, it is 
2724: %
2725: %
2726: perfectly reasonable to specify that all actions in some set $\A$ of actions 
2727: be anonymous. Then we could modify Theorem~\ref{thm:functionviews}
2728: %
2729: %
2730: so that the function $f$ is defined on all actions in $\A$. 
2731: (We omit the details.) Second, our definitions of anonymity only
2732: restrict the observer's  
2733: knowledge if somebody actually performs the action. This is simply a different
2734: way of defining anonymity. As mentioned previously,
2735: we are not trying to give a definitive definition of anonymity, and it certainly
2736: seems reasonable that someone might want to define or specify anonymity using the
2737: stronger condition. At any rate, it would be straightforward to modify our 
2738: definitions so that the implications, involving $\theta(i,a)$, are not included.
2739: %
2740: %
2741: 
2742: Hughes and Shmatikov argue that epistemic logic is a useful language 
2743: for expressing anonymity specifications, while CSP is a useful language
2744: %
2745: %
2746: for describing and specifying systems.  We agree with both of
2747: these claims.  They propose function views as a useful interface to
2748: mediate between the two.  We have tried to argue here that no mediation
2749: is necessary, since the multiagent systems framework can also be used for
2750: describing systems. (Indeed, the traces of CSP can essentially be viewed as runs.)  
2751: Nevertheless, we do believe that function views can be the
2752: basis of a useful language for reasoning about some aspects of
2753: information hiding. We can well imagine
2754: adding abbreviations to the language that let us talk directly about
2755: function views.  (We remark that we view these abbreviations as
2756: syntactic sugar, since these are notions that can already be expressed
2757: directly in terms of the knowledge operators we have introduced.)  
2758: 
2759: On the other hand, we believe that function views are not expressive
2760: enough to capture all aspects of information hiding.  One obvious
2761: problem is adding probability.  While it is easy to
2762: add probability to systems, as we have shown, and to 
2763: capture interesting probabilistic notions
2764: of anonymity, it is far from clear how to do this if we take function
2765: views triples as primitive.   
2766: 
2767: To sum up, we would argue that to reason about
2768: knowledge and probability, we need to have possible worlds as the
2769: underlying semantic framework.  Using the multiagent systems approach
2770: gives us possible worlds in a way that makes it particularly easy to
2771: relate them to systems.  Within this semantic framework, function views
2772: may provide a useful syntactic construct with which to reason about
2773: information hiding.
2774: 
2775: 
2776: 
2777: \section{Discussion}\label{sec:conclusion}
2778: 
2779: We have described a framework for reasoning about information hiding
2780: in multiagent systems, and have given general definitions of anonymity
2781: for agents acting in such systems. We have also compared and contrasted 
2782: our definitions to other similar definitions of anonymity. Our knowledge-based
2783: system framework provides a number of advantages:
2784: \ul
2785: \item We are able to state information-hiding properties succinctly
2786: and intuitively, and in terms of the knowledge of the observers or
2787: attackers who interact with the system.
2788: \item Our system has a well-defined semantics that lets us reason about
2789: knowledge in
2790: systems of interest, such as systems specified using process algebras 
2791: or strand spaces.
2792: \item We are able to give straightforward probabilistic definitions of 
2793: anonymity, and of other related information-hiding properties.
2794: \eul
2795: 
2796: %
2797: %
2798: %
2799: There are a number of issues that this paper has not addressed.
2800: We have focused almost exclusively on properties of anonymity, 
2801: %
2802: %
2803: %
2804: %
2805: and have not considered related notions, such as \emph{pseudonymity} and
2806: \emph{unlinkability}~\cite{hughes02,terminology}. 
2807: %
2808: %
2809: There seems to be no intrinsic difficulty capturing these notions in our
2810: framework.  For example, one form of message  unlinkability
2811: specifies that no two messages sent by an
2812: anonymous sender can be ``linked'', in the sense that an observer can
2813: determine that both 
2814: messages were sent by the same sender. More formally, two actions $a$
2815: and $a'$  
2816: %
2817: %
2818: are linked with respect to an observer $o$ if $o$ knows
2819: that there 
2820: exists an agent $i$ who performed both $a$ and $a'$. This definition 
2821: %
2822: %
2823: %
2824: %
2825: %
2826: %
2827: can be directly captured using knowledge.  Its negation says 
2828: that $o$ considers it possible 
2829: that there exist two distinct agents who performed $a$ and $a'$; 
2830: %
2831: this can be viewed as a definition of \emph{minimal unlinkability}.
2832: %
2833: %
2834: %
2835: This minimal requirement can be strengthened, exactly as our definitions of
2836: anonymity were, 
2837: to include larger numbers of distinct agents, probability, and so on. 
2838: %
2839: %
2840: %
2841: Although we have not worked out the details, we believe  that our
2842: approach will be similarly applicable to
2843: other definitions of information hiding.
2844: 
2845: %
2846: %
2847: Another obviously important issue is
2848: %
2849: %
2850: checking whether 
2851: a given system specifies
2852: the knowledge-based properties we have introduced. 
2853: %
2854: %
2855: %
2856: The standard technique for doing this is model checking.
2857: Recent work on the problem of model checking in
2858: the multiagent systems framework suggests that this may be viable.
2859: Van der Meyden~\citeyear{vandermeyden98} discusses 
2860: algorithms and complexity results for model checking a wide range of
2861: epistemic formulas in the runs and systems framework, and van der Meyden and 
2862: Su~\citeyear{vandermeyden02} use these results to verify the 
2863: dining cryptographers protocol~\cite{Chaum88}, using formulas much
2864: like those described in 
2865: %
2866: %
2867: Section~\ref{sec:diningexample}.
2868: Even though model checking of formulas involving knowledge seems to
2869: be intractable for large problems, these results are a promising first
2870: step towards being able to use knowledge for both the specification and
2871: verification of anonymity properties. 
2872: %
2873: %
2874: %
2875: Shmatikov~\citeyear{shmat02}, for example, analyzes the Crowds system
2876: using the 
2877: %
2878: %
2879: %
2880: probabilistic model checker PRISM~\cite{KNP01}.  This is a
2881: particularly good example of how definitions of 
2882: anonymity can be made precise using logic and probability, and how
2883: model-checking 
2884: can generate new insights into the functioning of a deployed protocol.
2885: 
2886: %
2887: Finally, it is important to note that the examples considered in this
2888: paper do 
2889: not reflect the state of the art for computational anonymity. Anonymity
2890: protocols 
2891: based on DC-nets, while theoretically interesting, have not been widely
2892: %
2893: %
2894: %
2895: %
2896: %
2897: %
2898: deployed; in practice, protocols based on mixes and message-rerouting 
2899: are much more common.
2900: We used the dining cryptographer's problem as a running
2901: %
2902: %
2903: example here mainly 
2904: because of its simplicity, but it remains to be seen whether our general
2905: approach will be as 
2906: illuminating for more complicated protocols. There are reasons to
2907: %
2908: %
2909: %
2910: believe that it will be.  Shmatikov's analysis of Crowds shows
2911: that a logic-based approach can be useful  
2912: for analyzing protocols based on message-rerouting. Furthermore, we
2913: believe that formalizing 
2914: anonymity protocols using techniques like ours is worthwhile even if
2915: formal 
2916: verification is impractical or impossible. 
2917: %
2918: %
2919: It forces system designers to 
2920: think carefully about information-hiding requirements, which can often
2921: be tricky,  
2922: %
2923: %
2924: and provides a system-independent framework for
2925: comparing the anonymity  
2926: guarantees provided by different systems. 
2927: %
2928: %
2929: 
2930: We described one way to generate a set of runs from a CSP process $P$, 
2931: basically by recording all the events in the state of the environment
2932: and describing some observer $o$ who is able to observe a subset of
2933: the events. This translation was useful for comparing our abstract
2934: definitions of anonymity to more operational CSP-based definitions.
2935: In future work we hope to further explore the connections between the
2936: runs and systems framework and tools such as CCS, CSP, and the spi 
2937: calculus~\cite{abadi97}. 
2938: %
2939: In particular, we are interested in the 
2940: recent work of Fournet and Abadi~\citeyear{fournet02}, who use the 
2941: %
2942: %
2943: %
2944: %
2945: %
2946: %
2947: applied pi calculus to model {\em private authentication}, according to which
2948: a principal in a network is able to authenticate herself to
2949: %
2950: %
2951: another principal while remaining anonymous to other ``nearby''
2952: principals.
2953: A great deal of work in computer security has formalized
2954: %
2955: %
2956: %
2957: information-hiding properties using these tools. Such work often reasons
2958: about the knowledge of various agents in an informal way, and then
2959: tries to capture knowledge-based security properties
2960: using one of these formalisms.
2961: By describing canonical translations from these formalisms to the runs
2962: and systems framework, we hope to be able to demonstrate formally
2963: how such definitions of security do (or do not) capture notions of knowledge.
2964: 
2965: \paragraph{Acknowledgments:}  We thank Dan Grossman, Dominic Hughes, 
2966: %
2967: %
2968: %
2969: Vitaly Shmatikov, Paul Syverson, Vicky Weissman, 
2970: %
2971: the CSFW reviewers
2972: (who were anonymous 
2973: up to $P_{CSFW}$, the program committee), 
2974: and the JCS referees,
2975: for useful discussions,
2976: %
2977: helpful criticism, and 
2978: %
2979: %
2980: careful copy-editing.
2981: 
2982: %
2983: %
2984: \bibliographystyle{chicago}
2985: %
2986: %
2987: %
2988: %
2989: %
2990: %
2991: %
2992: %
2993: %
2994: \bibliography{joe,z}
2995: 
2996: 
2997: \end{document}
2998: 
2999: 
3000: 
3001: 
3002: 
3003: