0402:cs0402042/jcs.tex

1: \documentclass[11pt]{article}         %

2: \usepackage{times}                               %

3: %

4: %

5: %

6: \usepackage{chicagor}

7:

8: \input{defn.tex}

9:

10: %

11: %

12:

13: \newcommand{\im}{im}

14: \newcommand{\kernel}{ker}

15: %

16: %

17: \renewcommand{\S}{{\cal S}}

18:

19: \newcommand{\send}{\mbox{\it send}}

20: \newcommand{\recv}{\mbox{\it receive}}

21:

22: \begin{document}

23:

24: \title{Anonymity and Information Hiding \\ in Multiagent Systems%

25: \thanks{Authors were supported in part by NSF under grant

26: CTC-0208535, by ONR under grants  N00014-00-1-03-41 and

27: N00014-01-10-511, by the DoD Multidisciplinary University Research

28: Initiative (MURI) program administered by the ONR under

29: grant N00014-01-1-0795, and by AFOSR under grant F49620-02-1-0101.

30: %

31: Kevin O'Neill was also supported in part by a graduate fellowship from the

32: National Science and Engineering Research Council of Canada.

33: %

34: %

35: %

36: %

37: %

38: A preliminary version of this paper appeared at the 16th IEEE Computer Security

39: Foundations Workshop in Pacific Grove, California.

40: }

41: }

42:

43: \author{Joseph Y. Halpern \ \ \ \ \ \ \ \ \ \ \ Kevin R. O'Neill \\

44: Department of Computer Science\\

45: Cornell University\\

46: $\{$halpern, oneill$\}$@cs.cornell.edu\\

47: }

48:

49:

50: \maketitle

51: \thispagestyle{empty}

52:

53: \begin{abstract}

54: We provide a framework for reasoning about information-hiding requirements

55: in multiagent systems and for reasoning about anonymity in particular.

56: Our framework employs the modal logic of knowledge within the context of the

57: runs and systems framework, much in the spirit of our earlier work on

58: secrecy

59: %

60: %

61: %

62: \cite{HalOn02}.

63: %

64: We give several definitions of anonymity with respect

65: to agents, actions, and observers in multiagent systems, and we relate our definitions

66: of anonymity to other definitions of information hiding, such as secrecy. We also

67: give probabilistic definitions of anonymity that are able to quantify an observer's

68: uncertainty about the state of the system. Finally, we relate our definitions of

69: anonymity to other formalizations of anonymity and information hiding, including

70: definitions of anonymity in the process algebra CSP and definitions of information

71: hiding using function views.

72: \end{abstract}

73:

74:

75: \section{Introduction}

76:

77: %

78: %

79: %

80: %

81: %

82: %

83: %

84: %

85: %

86: %

87: %

88: %

89: %

90:

91: The primary goal of this paper is to provide

92: a formal framework for reasoning

93: about anonymity in multiagent systems.

94: The importance of anonymity has increased over the past few years

95: as more communication passes over the Internet.

96: Web-browsing, message-sending, and file-sharing are all important

97: examples of activities that computer users would like to

98: engage in, but may be reluctant to do unless they can

99: receive guarantees that their anonymity will be protected to some

100: reasonable degree.

101: %

102: %

103: %

104: %

105: Systems are being built that attempt to implement

106: anonymity

107: %

108: for various kinds of network communication

109: %

110: %

111: %

112: %

113: %

114: %

115: (see, for example,

116: \cite{goel02,hopper03,levine02,reiter98,sherwood02,syverson97}).

117: It would be helpful to have a formal framework in which to reason

118: about the level of anonymity that such systems provide.

119:

120: We view anonymity as an instance of a more general problem:

121: information hiding.

122: In the theory of computer security, many of the fundamental problems and much of the

123: research has been concerned with the hiding of information. Cryptography, for instance,

124: is used

125: to hide the contents of a message from untrusted observers as it passes from

126: one party to another. Anonymity requirements are intended to ensure that the identity

127: of the agent who performs some action remains hidden from other

128: observers. Noninterference requirements

129: essentially say that {\em everything} about classified or high-level

130: users of a system should be hidden from low-level users.

131: Privacy is a catch-all term that means different things to different

132: people, but

133: it typically involves hiding personal or private information from others.

134:

135: Information-hiding properties such as these can be

136: thought of as providing answers to the following set of questions:

137: \ul

138: \item What information needs to be hidden?

139: \item Who does it need to be hidden from?

140: \item How well does it need to be hidden?

141: \eul

142: By analyzing security properties with these questions in

143: mind, it often

144: becomes clear how different properties relate to each other. These

145: questions can

146: also serve as a test of a definition's usefulness: an information-hiding

147: property

148: should be able to provide clear answers to these three questions.

149:

150: %

151: %

152: %

153: %

154: In an earlier paper \cite{HalOn02}, we formalized

155: %

156: %

157: %

158: %

159: %

160: secrecy

161: %

162: %

163: in terms of knowledge.

164: %

165: %

166: %

167: %

168: %

169: Our focus was on capturing what it

170: means for one agent to have {\em total\/} secrecy with respect to

171: another,

172: %

173: in the sense that no information flows from the first agent to the

174: second.

175: Roughly speaking,

176: %

177: %

178: %

179: a high-level user has total secrecy

180: if the low-level user never knows anything about the

181: high-level user that he didn't initially know.  Knowledge

182: provides a

183: natural way to express information-hiding properties---information is

184: hidden from $a$ if $a$ does not know about it.  Not surprisingly, our

185: formalization of anonymity is similar in spirit to our

186: formalization of secrecy.

187: Our definition of secrecy says that a classified agent maintains

188: secrecy with respect to an unclassified agent if the unclassified

189: agent doesn't learn any new fact that depends only

190: on the state of the classified agent.

191: That is, if the agent didn't know a classified

192: fact $\phi$ to start with, then the

193: agent doesn't know it at any point in the system.

194: Our definitions of anonymity

195: say that an agent performing an action maintains anonymity with

196: respect to an observer if the observer never learns certain facts

197: having to do with whether or not the agent performed the action.

198:

199: %

200: %

201: %

202: %

203: %

204: Obviously, total secrecy and anonymity are different.

205: %

206: %

207: It is possible

208: for $i$ to

209: %

210: %

211: have

212: complete secrecy while still not having

213: %

214: %

215: very strong guarantees of

216: anonymity, for example,

217: and it is possible to have anonymity without preserving secrecy.

218: %

219: %

220: However, thinking carefully about

221: the relationship between

222: secrecy and anonymity suggests new and interesting

223: ways of thinking about anonymity.

224: More generally, formalizing anonymity and information hiding in terms

225: of knowledge is useful for capturing the intuitions that

226: practitioners have.

227:

228: We are not the first to use knowledge and belief to

229: formalize notions of information hiding.

230: Glasgow, MacEwen, and Panangaden \citeyear{GMP92} describe

231: a logic for reasoning about security that includes both

232: {\em epistemic\/}

233: operators (for reasoning about knowledge) and {\em deontic\/} operators

234: (for reasoning about permission and obligation).

235: They characterize some security policies in terms of

236: the facts that an agent is

237: permitted to know.  Intuitively, everything that an agent is not

238: permitted to know must remain hidden.

239: Our approach is similar,

240: except that we specify the formulas that an

241: agent is {\em not} allowed to know, rather than the formulas she is permitted

242: to know.

243: One advantage of accentuating the negative is that we do not need to use

244: deontic operators in our logic.

245:

246: Epistemic logics have also been used to define information-hiding

247: properties, including noninterference and anonymity.

248: Gray and Syverson \citeyear{gray98}

249: use an epistemic logic to define probabilistic noninterference, and

250: Syverson and Stubblebine \citeyear{syverson99} use one

251: to formalize definitions of anonymity.

252: The thrust of our paper is quite different from these.

253: Gray and Syverson focus on one particular definition of information

254: hiding in a probabilistic setting, while Syverson

255: and Stubblebine

256: %

257: %

258: focus on describing an axiom system that is useful for reasoning

259: about real-world systems, and

260: on how to reason about and compose parts of the system into

261: %

262: %

263: adversaries and honest agents.

264: Our focus, on the other hand, is on giving a

265: semantic characterization of anonymity in a framework that lends itself

266: well to modeling systems.

267:

268: Shmatikov and Hughes \citeyear{hughes02} position their approach to

269: anonymity (which is discussed in more detail in Section~\ref{sec:functionviews}) as an

270: attempt to provide an interface between logic-based approaches,

271: which they claim are good for specifying the desired properties (like

272: anonymity), and formalisms like CSP, which they claim are good for

273: specifying systems.  We agree with their claim that logic-based approaches

274: are good for specifying properties of systems, but also claim

275: that, with an appropriate semantics for the logic, there is no need to

276: provide such

277: an interface.  While there are many ways of specifying systems,

278: many end up identifying a system with a set of runs or traces, and

279: can thus be embedded in the runs and systems framework

280: that we use.

281:

282: Definitions of anonymity using epistemic logic are {\em possibilistic}.

283: Certainly, if $j$ believes that any of 1000 users (including $i$)

284: could have performed the action that $i$ in fact performed,

285: then $i$ has some degree of anonymity with respect to $j$.  However, if $j$

286: believes that the probability that $i$ performed the action is

287: %

288: %

289: .99, the possibilistic assurance of anonymity provides little comfort.

290: %

291: %

292: %

293: %

294: %

295: %

296: %

297: %

298: %

299: %

300: %

301: %

302: %

303: %

304: Most previous formalizations of anonymity have not

305: dealt with probability; they

306: typically conclude with an acknowledgment that it is

307: important to do so, and suggest that their formalism can

308: indeed handle probability.

309: %

310: %

311: %

312: %

313: %

314: %

315: %

316: %

317: %

318: %

319: %

320: %

321: One significant

322: advantage of our formalism is that it

323: is completely straightforward to add probability in a natural way,

324: using known techniques \cite{HT}.

325: %

326: %

327: %

328: %

329: %

330: %

331: %

332: %

333: %

334: %

335: %

336: As we show in Section \ref{sec:probanon},

337: %

338: %

339: this lets us formalize the (somewhat less formal)

340: definitions of probabilistic anonymity given

341: %

342: %

343: %

344: by Reiter and Rubin \citeyear{reiter98}.

345: %

346: %

347: %

348: %

349: %

350: %

351: %

352: %

353:

354: %

355: In this paper, we are more concerned with defining and specifying

356: anonymity

357: %

358: %

359: %

360: %

361: %

362: %

363: %

364: properties than with describing systems

365: for achieving anonymity or with verifying anonymity

366: properties. We want to define what anonymity means by using syntactic

367: statements that have a well-defined semantics. Our

368: %

369: work

370: is similar in spirit to

371: previous papers that have given definitions of anonymity and other

372: similar properties, such as the proposal for terminology given by

373: %

374: %

375: Pfitzmann and K\"ohntopp~\citeyear{terminology} and the

376: information-theoretic definitions of anonymity given by

377: Diaz, Seys, Claessens, and Preneel~\citeyear{Diaz02}.

378:

379: The rest of this paper is organized as follows.  In

380: Section~\ref{sec:review} we briefly review the runs and systems

381: formalism of \cite{FHMV} and describe

382: how it can be used to represent knowledge.

383: In Section~\ref{sec:anonymity}, we show how anonymity can be defined

384: using knowledge, and relate this definition to other notions of

385: information hiding, particularly

386: secrecy (as defined in our earlier work).

387: In Section~\ref{sec:probanon}, we extend the

388: possibilistic definition of Section~\ref{sec:anonymity} so that it can

389: capture probabilistic concerns.

390: As others have observed \cite{hughes02,reiter98,syverson99}, there

391: are a number of

392: ways to define anonymity. Some definitions provide very strong

393: guarantees of anonymity, while others are easier to verify in practice.

394: Rather than giving an

395: exhaustive list of definitions, we focus on a few representative

396: notions, and

397: show by example that our logic is expressive enough to capture many

398: other notions of interest.

399: In Section~\ref{sec:related}, we compare our

400: framework to that of three other attempts to formalize anonymity, by

401: Schneider and Sidiropoulos \citeyear{schneider96}, Hughes and Shmatikov

402: \citeyear{hughes02}, and Stubblebine and Syverson \citeyear{syverson99}.

403: We conclude in Section~\ref{sec:conclusion}.

404:

405:

406: \section{Multiagent Systems: A Review}\label{sec:review}

407:

408: In this section, we briefly review the multiagent systems framework; we

409: urge the reader

410: to consult

411: %

412: %

413: %

414: %

415: %

416: %

417: %

418: \cite{FHMV} for more details.

419:

420: A {\em multiagent system\/} consists of $n$ agents, each of which is in some

421: {\em local state\/} at a given point in time. We assume that an agent's

422: local state encapsulates all the information to which the agent has

423: access. In the security setting, the local state of an agent might

424: include initial information regarding keys, the messages she has

425: sent and received, and perhaps the reading of a clock. The

426: framework makes no assumptions about the precise nature of the local

427: state.

428:

429: We can view the whole system as being in some {\em global state},

430: a tuple consisting of the local state of each agent and the

431: state of the environment.

432: Thus, a global state has the form $(s_e, s_1,\ldots, s_n)$,

433: where $s_e$ is the state of the environment and $s_i$ is agent $i$'s

434: state,

435: for $i = 1, \ldots , n$.

436:

437: A {\em run\/} is a function from time to global states. Intuitively,

438: a run is a complete description of what happens over time in one

439: possible execution of the system. A {\em point\/} is a pair $(r, m)$

440: consisting of a run $r$ and a time $m$.

441: We make the standard assumption that time ranges over the natural numbers.

442: At a point $(r, m)$, the system is in some global state $r(m)$. If $r(m)

443: = (s_e, s_1,

444: \ldots , s_n)$, then we take $r_i(m)$ to be $s_i$, agent $i$'s local

445: state at the point $(r, m)$.

446: %

447: %

448: %

449: %

450: %

451: Note that an agent's local state at  point $(r,m)$ does {\em not}

452: necessarily encode all the agent's previous local states.

453: In some systems, agents have perfect recall,

454: in the sense that their local state $r_i(m)$ encodes their states

455: at times $0, \ldots, m-1$, but this need not be generally true.

456: %

457: (See \cite[Chapter 4]{FHMV} for a formal definition and discussion of

458: perfect recall.)

459: Formally, a {\em system\/} consists of a set of runs (or executions).

460: Let $\P(\R)$ denote the points in a system $\R$.

461:

462:

463: The runs and systems framework is compatible with

464: many other standard approaches for representing and reasoning about systems.

465: For example, the runs might be event traces generated by a CSP process

466: (see Section~\ref{sec:CSP}), they

467: might be message-passing sequences generated by a security protocol,

468: or they might

469: %

470: %

471: %

472: %

473: %

474: be generated from the strands in a strand space \cite{HalPuc01,FHG99}.

475: The approach is rich enough to accommodate a variety of system representations.

476:

477: Another important advantage of the framework is that it it is

478: straightforward to define formally what

479: an agent knows at a point in a system.

480: Given a system $\R$, let $\K_i(r,m)$ be the set of points in $\P(\R)$

481: that $i$ thinks are possible at $(r,m)$, i.e.,

482: \[ \K_i(r,m) = \{ (r',m') \in \P(\R) : r'_i(m') = r_i(m) \}. \]

483: Agent $i$ knows a fact

484: $\phi$ at a point $(r,m)$ if $\phi$ is true at all points in $\K_i(r,m)$.

485: To make this intuition precise, we need to be able to assign truth

486: values to basic formulas in a system. We assume that

487: we have a set $\Phi$ of primitive propositions, which we can think of as

488: describing basic facts about the system.

489: In the context of security protocols,

490: these might be such facts as ``the key is $n$'' or ``agent $A$ sent

491: the message $m$ to $B$''. An interpreted system $\I$ consists of a pair

492: $(\R,\pi)$, where $\R$ is a system and $\pi$ is an

493: {\em interpretation}, which assigns to each primitive

494: %

495: %

496: proposition

497: in $\Phi$ a truth value at each point.

498: Thus, for every $p\in\Phi$ and

499: point $(r,m)$ in $\R$, we have $(\pi(r,m))(p)\in \{\bf

500: true, \bf false\}$.

501:

502: We can now define what it means for a formula $\phi$ to be true at a point

503: $(r, m)$ in an interpreted system $\I$, written $(\I, r, m)\sat\phi$, by

504: induction on the structure of formulas:

505: \begin{itemize}

506: \item $(\I,r,m) \sat p$ iff $(\pi(r,m))(p) = {\bf true}$

507: \item $(\I,r,m) \sat \neg\phi$ iff $(\I,r,m) \not\sat \phi$

508: \item $(\I,r,m) \sat \phi \wedge \psi$ iff $(\I,r,m) \sat \phi$ and $(\I,r,m) \sat \psi$

509: \item $(\I, r, m) \sat K_i\phi ~\mbox{iff}~ (\I , r', m') \sat\phi$ for

510: all $(r',m') \in \K_i(r,m)$

511: \end{itemize}

512: As usual, we write $\I \sat \phi$ if $(\I,r,m) \sat \phi$ for all points

513: $(r,m)$ in $\I$.

514:

515:

516: \section{Defining Anonymity Using Knowledge}\label{sec:anonymity}

517:

518: \subsection{Information-Hiding Definitions}\label{sec:infohide}

519:

520: Anonymity is one example of an

521: information-hiding requirement. Other information-hiding requirements

522: include noninterference, privacy, confidentiality, secure message-sending,

523: and so on. These requirements are similar, and sometimes they overlap.

524: Noninterference, for example, requires a great deal to be hidden, and

525: typically implies privacy, anonymity, etc., for the classified user

526: whose state is protected by the noninterference requirement.

527:

528: %

529: %

530: %

531: %

532: %

533: %

534: In an earlier paper \cite{HalOn02}, we looked at

535: %

536: %

537: requirements of {\em total secrecy}

538: in multiagent systems.

539: %

540: %

541: Total secrecy

542: basically requires that in a

543: system with ``classified'' and ``unclassified'' users, the unclassified

544: users should never be able to infer the actions or the local states of

545: the unclassified users. For secrecy, the ``what needs to be hidden''

546: component of information-hiding is extremely restrictive:

547: %

548: total

549: secrecy requires

550: that absolutely everything that a classified user does must be hidden.

551: The ``how well does it need to be hidden'' component depends on the

552: situation. Our definition of secrecy says

553: that for any {\em nontrivial\/}

554: fact $\phi$ (that is, one that is not already valid)

555: that depends only the state of the classified or high-level

556: %

557: %

558: agent, the formula $\neg K_j \phi$ must be valid.

559: (See

560: %

561: %

562: %

563: our earlier paper

564: for more discussion of this definition.)

565: Semantically, this means that whatever the high-level user does, there

566: exists some run where the low-level user's view  of the system is the

567: same, but the high-level user did something different.

568: %

569: %

570: Our nonprobabilistic definitions are fairly

571: strong (simply because secrecy requires that so much be hidden).

572: The probabilistic definitions

573: %

574: we gave

575: require even more: not only

576: can the agent not learn any new classified

577: fact, but he also cannot learn anything about the probability of any such

578: fact. (In other words, if an agent initially assigns a classified

579: fact $\phi$ a probability $\alpha$

580: of being true, he always assigns $\phi$ that probability.)

581: It would be perfectly natural,

582: and possibly quite interesting, to

583: consider definitions of secrecy that do not require so much to be hidden

584: (e.g., by allowing some classified information to be

585: declassified~\cite{zdancewic01}),

586: or to discuss definitions that do not require such strong secrecy (e.g., by

587: giving definitions that were stronger than the nonprobabilistic definitions

588: we gave, but not quite so strong as the probabilistic definitions).

589:

590:

591: \subsection{Defining Anonymity}\label{sec:dfns}

592:

593: The basic intuition behind anonymity is that {\em actions} should be

594: divorced from the {\em agents} who perform them, for some set of {\em observers}.

595: With respect to the basic information-hiding framework outlined above, the

596: information that needs to be hidden is the identity of the agent (or set of

597: agents) who perform a particular action. Who the information needs to be

598: hidden from, i.e., which observers, depends on the situation. The third

599: component of information-hiding requirements---how well information needs to

600: be hidden---will often be the most interesting component

601: of the definitions of anonymity that we present here.

602:

603: %

604: %

605: %

606: %

607: Throughout the paper, we use the formula $\theta(i,a)$ to represent

608: ``agent $i$ has

609: performed action $a$, or will perform $a$ in the future''.%

610: \footnote{If we want to consider systems that may crash

611: we may want to consider $\theta'(i,a)$ instead, where

612: $\theta'(i,a)$ represents ``agent $i$ has performed action $a$, or will

613: perform $a$ in the future if the system does not crash''.  Since issues

614: of failure are orthogonal to the anonymity issues that we focus on here,

615: we consider only the simpler definition in this paper.}

616: %

617: %

618: %

619: %

620: %

621: %

622: %

623: %

624: %

625: %

626: For future reference, let $\delta(i,a)$

627: represent ``agent $i$ has performed action $a$''.

628: Note that $\theta(i,a)$ is a fact about the run: if it is true at some

629: point in a run, it is true at all points in a run (since it is true even

630: if $i$ performs $a$ at some point in the future).  On the other hand,

631: %

632: %

633: $\delta(i,a)$ may be false at the start of a run, and then become true

634: at the point where $i$ performs $a$.

635: %

636: %

637: %

638: %

639: %

640: %

641: %

642: %

643: %

644: %

645: %

646: %

647: %

648: %

649: %

650: %

651: %

652: %

653: %

654: %

655: %

656: %

657: %

658: %

659: %

660: %

661: %

662: %

663: %

664:

665: %

666:

667: It is not our goal in this

668: %

669: %

670: paper

671: to provide a ``correct'' definition of

672: anonymity. We also want to avoid giving an encyclopedia of definitions.

673: Rather, we give some basic definitions of anonymity

674: to show how our framework can be used. We base our choice of definitions

675: in part on definitions presented in earlier papers,

676: to make clear how our work relates to previous work, and in part on which

677: definitions of anonymity we expect to be useful in practice.

678: We first give

679: an extremely weak definition, but one that

680: nonetheless illustrates the basic intuition behind any definition of

681: anonymity.

682:

683:

684: \dfn

685: \label{dfn:minimalanon1}

686: Action $a$, performed by agent $i$, is

687: {\em minimally anonymous} with respect to agent $j$ in the interpreted

688: system $\I$, if

689: $\I \models \neg K_j [\theta(i,a)]$.

690: \edfn

691:

692: %

693: %

694: %

695: %

696: %

697:

698: This definition makes it

699: clear what is being hidden ($\theta(i,a)$---the fact that

700: %

701: %

702: $i$ performs $a$) and from whom

703: ($j$).  It also describes how well the information is hidden:~it

704: requires that $j$ not be sure that $i$ actually

705: %

706: %

707: performed, or will perform, the action.

708: Note that this is a weak information-hiding requirement.

709: It might be the case, for example, that agent $j$ is certain

710: that the action was performed either by $i$, or by at most one or two

711: other agents, thereby making $i$ a ``prime suspect''. It might also

712: be the case that $j$ is able to place a very high probability on

713: %

714: %

715: $i$ performing

716: the action, even though he isn't absolutely certain of

717: it. (Agent $j$ might know that there is some slight probability that

718: some other agent $i'$ performed the action, for example.)

719: Nonetheless,

720: it should be the case that for any other definition of anonymity we

721: give, if we want

722: to ensure that $i$'s performing action $a$ is to be kept anonymous

723: as far as observer $j$ is concerned, then

724: $i$'s action should be at least minimally anonymous with respect to $j$.

725:

726: %

727: Our definition of $a$ being minimally anonymous with respect to $j$ is

728: equivalent to the apparently weaker requirement $\I \models

729: \theta(i,a) \rimp \neg K_j

730: [\theta(i,a)]$, which says that if action $a$ is performed by $i$, then

731: $j$ does not not know it.

732: Clearly if $j$ never knows that $a$ is performed by $i$, then $j$ will

733: never know that $a$ is performed by $i$ if $i$ actually does perform

734: $a$.  To see that the converse holds, it suffices to note that if $i$

735: does not perform $a$, then surely $\neg K_j [\theta(i,a)]$ holds.

736: %

737: %

738: %

739: %

740: Thus, this definition, like several that will follow,

741: can be viewed as having the form ``if $i$

742: performed $a$, then $j$ does not know some appropriate

743: %

744: %

745: %

746: %

747: %

748: %

749: %

750: fact''.

751: %

752:

753: The definition

754: %

755: of minimal anonymity

756: also makes it clear how anonymity

757: relates to secrecy,

758: %

759: %

760: %

761: %

762: %

763: %

764: %

765: as defined in our earlier work \cite{HalOn02}.

766: To explain how, we first need to describe how we defined secrecy in

767: terms of knowledge.

768: Given a system $\I$, say that $\phi$ is {\em nontrivial in $\I$\/} if

769: %

770: %

771: $\I \not\sat \phi$, and that

772: {\em $\phi$ depends only on the local state of agent

773: $i$ in $\I$\/} if $\I \sat \phi \rimp K_i \phi$.

774: Intuitively, $\phi$

775: is nontrivial in $\I$ if $\phi$ could be

776: false in $\I$, and $\phi$ depends only on $i$'s local state if $i$

777: always knows whether or not $\phi$ is true.

778: (It is easy to see that

779: $\phi$ depends only on the local state of $i$ if $(\I,r,m) \sat \phi$

780: and $r_i(m) = r'_i(m')$ implies that $(\I,r',m') \sat \phi$.)

781: According to the definition in \cite{HalOn02},

782: agent $i$ {\em maintains total secrecy with respect to another agent $j$

783: in system $\I$\/} if

784: for every

785: nontrivial fact $\phi$ that depends

786: %

787: only

788: on the local state of $i$, the formula

789: $\neg K_j \phi$ is valid for the system.

790: That is, $i$ maintains total secrecy with respect to $j$ if

791: $j$ does not learn anything new about agent $i$'s state.

792: %

793: %

794: %

795: %

796: %

797: %

798: %

799: %

800: %

801: In general, $\theta(i,a)$ does not depend only on $i$'s local state,

802: because whether $i$ performs $a$ may depend on whether or not $i$ gets a

803: certain message from

804: %

805: %

806: some other agent $i'$.

807: On the other hand, if whether or not $i$

808: performs $a$ depends only on $i$'s protocol, and the protocol is encoded

809: in $i$'s local state, then $\theta(i,a)$ depends only on $i$'s local

810: state.

811: %

812: %

813: %

814: If $\theta(i,a)$ does depend only on $i$'s local state and

815: $j$ did not know all along that $i$ was going to

816: perform action $a$ (i.e., if we assume

817: that $\theta(i,a)$ is nontrivial), then

818: Definition~\ref{dfn:minimalanon1} is clearly a special case of the

819: definition of secrecy.

820: %

821: In any case, it is in much the same spirit as the definition of

822: secrecy.

823: Essentially, anonymity says that

824: %

825: %

826: the fact that agent $i$ has or will perform action $a$ must be hidden

827: from $j$,

828: while

829: total secrecy says that all facts that depend on agent $i$ must

830: be

831: hidden from $j$.

832:

833: %

834: Note that this definition of minimal anonymity is different from the one

835: we gave in the conference version of this paper \cite{HalOn03}.

836: There, the definition given used $\delta(i,a)$ rather than

837: $\theta(i,a)$.  We say that $a$ performed by agent $i$ is minimally

838: $\delta$-anonymous if Definition~\ref{dfn:minimalanon1} holds, with

839: $\theta(i,a)$ replaced by $\delta(i,a)$.

840: It is easy to see that minimal anonymity implies minimal

841: $\delta$-anonymity (since $\delta(i,a)$ implies $\theta(i,a)$), but the

842: converse is not true in general.   For example, suppose that $j$

843: gets a signal if $i$ is going to perform action $a$ (before $i$ actually

844: performs the action), but then never finds out exactly when $i$ performs

845: $a$.  Then minimal anonymity does not hold.  In runs where $i$ performs

846: $a$, agent $j$ knows that $i$ will perform $a$ when he gets the signal.

847: On the other hand, minimal $\delta$-anonymity does hold, because $j$

848: never knows when $i$ performs $a$.

849: %

850: %

851: %

852: In this situation, minimal anonymity seems to capture our intuitions of

853: what anonymity should mean better than minimal $\delta$-anonymity does.

854:

855:

856: The next definition of anonymity we give is much stronger.

857: It requires that if some agent $i$ performs

858: an action anonymously with respect to another agent $j$, then $j$ must

859: think it possible that the action could have been performed by {\em any}

860: of the agents (except for $j$).

861: Let $P_j \phi$ be an abbreviation for

862: $\neg K_j \neg \phi$. The operator $P_j$ is the dual of $K_j$;

863: intuitively, $P_j \phi$ means ``agent $j$ thinks that $\phi$ is possible''.

864:

865: \dfn

866: \label{dfn:totalanon1}

867: Action $a$, performed by agent $i$, is {\em totally anonymous}

868: with respect to $j$ in the interpreted system $\I$ if

869: \[

870: \I \models \theta(i,a) \Rightarrow \bigwedge_{i' \ne j} P_j

871:  [\theta(i',a)].

872: \]

873: \edfn

874:

875: Definition~\ref{dfn:totalanon1} captures the notion that an action is

876: anonymous if,

877: as far as the observer in question is concerned, it could have been

878: performed by

879: anybody in the system.

880:

881: %

882: Again, in the conference version of the paper, we defined total

883: anonymity using $\delta(i,a)$ rather than $\theta(i,a)$.  (The same

884: remark holds for all the other definitions of anonymity that we give,

885: although we do not always say

886: %

887: %

888: so

889: explicitly.)

890: Let

891: %

892: %

893: %

894: total $\delta$-anonymity

895: be the

896: %

897: %

898: anonymity requirement

899: obtained when $\theta(i,a)$ is replaced by

900: $\delta(i,a)$.  It is not hard to show that if agents have perfect

901: recall (which intuitively means that their local state keeps track of

902: all the actions they have performed---see \cite{FHMV} for the formal

903: definition), then total $\delta$-anonymity implies total anonymity.

904: %

905: %

906: %

907: %

908: This is not true, in general, without perfect

909: recall, because it might be possible for some agent to know that $i$

910: %

911: %

912: %

913: %

914: will perform action $a$---and therefore that no other agent

915: will---but forget this fact by the time that $i$ actually performs

916: $a$.

917: %

918: %

919: %

920: Similarly, total anonymity does not imply total $\delta$-anonymity.

921: %

922: %

923: To see why,

924: suppose that the agents are numbered $1, \ldots, n$, and

925: %

926: that

927: an outside observer knows

928: %

929: %

930: that if

931: $j$ performs

932: action $a$,

933: %

934: then

935: $j$ will perform it at time $j$.  Then total anonymity may

936: %

937: %

938: hold even though total $\delta$-anonymity does not.

939: For example, at time 3,

940: although the

941: %

942: %

943: observer

944: %

945: %

946: may consider

947: it possible that agent 4 will perform the

948: action (at time 4), he

949: %

950: %

951: cannot

952: consider it possible that 4 has

953: %

954: already

955: performed the action,

956: %

957: as required by total $\delta$-anonymity.

958:

959:

960: %

961: %

962: %

963: %

964: %

965: %

966: %

967: %

968:

969: %

970: %

971: %

972: %

973: %

974: %

975:

976: %

977: %

978: %

979:

980:

981:

982:

983: %

984: %

985: %

986: %

987: Chaum \citeyear{Chaum88} showed that total anonymity could be

988: obtained using DC-nets.

989: %

990: %

991: %

992: %

993: %

994: %

995: %

996: %

997: %

998: Recall that in a DC-net, a group of $n$ users use Chaum's dining

999: cryptographer's

1000: protocol (described in the same paper) to achieve anonymous communication.

1001: %

1002: %

1003: %

1004: If we model a

1005: DC-net as an interpreted multiagent system $\I$ whose agents consist

1006: exclusively of agents

1007: %

1008: %

1009: %

1010: participating in a single DC-net, then if an agent $i$ sends a message

1011: using the DC-net protocol, that action is totally anonymous.

1012: %

1013: %

1014: %

1015: %

1016: %

1017: %

1018: (Chaum proves this,

1019: under the assumption that any message could be generated by any user

1020: in the system.)

1021: %

1022: %

1023: %

1024: %

1025: Note that in the dining cryptographer's example, total anonymity and

1026: $\delta$-total anonymity agree, because who paid is decided

1027: before the protocol starts.

1028:

1029: %

1030: %

1031: %

1032: %

1033: It is easy to show that if an

1034: action is totally anonymous, then it must be minimally anonymous as well,

1035: as long as two simple requirements are satisfied. First, there must be

1036: at least 3 agents in the system.

1037: (A college student with only one roommate can't leave out her dirty

1038: dishes anonymously,

1039: but a student with at least two roommates might be able to.)

1040: Second, it must be the case that $a$

1041: can be performed only once in a given

1042: run of the system. Otherwise,

1043: it might be possible for $j$ to think that

1044: any agent $i' \not= i$ could have performed $a$,

1045: but for $j$ to {\em know}

1046: that agent $i$ did, indeed, perform $a$.

1047: For example, consider a system with three agents besides $j$.  Agent $j$

1048: might know

1049: that all three of the other agents performed action $a$.  In that case, in

1050: particular, $j$ knows that $i$ performed $a$, so action $a$ performed

1051: by $i$ is not minimally anonymous with respect to $j$, but is totally anonymous.

1052: We anticipate that this assumption will typically be met in practice.

1053: It is certainly consistent

1054: with examples of anonymity given in the

1055: literature. (See, for example,~\cite{Chaum88,schneider96}).

1056: In any case, if it is not met, it is possible to tag

1057: occurrences of an action (so that we can talk about the $k$th time $a$

1058: is performed). Thus,

1059: we can talk about the $i$th occurrence of an action being

1060: anonymous. Because the $i$th occurrence of an action can only happen

1061: once in any given run, our requirement is satisfied.

1062:

1063:

1064: \pro\label{pro:minimally}

1065: Suppose that there are at least three agents in the interpreted system $\I$

1066: and that

1067: $$\I \models \bigwedge_{i \not= j} \neg [ \theta(i,a) \wedge \theta(j,a) ].$$

1068: If action $a$, performed by agent $i$, is totally anonymous with respect to $j$,

1069: then it is minimally anonymous as well.

1070: \epro

1071:

1072: %

1073: \prf

1074: Suppose that action $a$ is totally anonymous. Because there are three agents in

1075: the system, there is some agent $i'$ other than $i$ and $j$, and by total

1076: anonymity, $\I \models \theta(i,a) \Rightarrow P_j[\theta(i',a)] $.

1077: If $(\I,r,m) \models \neg \theta(i,a)$, clearly $(\I,r,m) \models \neg K_j[\theta(i,a)]$.

1078: Otherwise, $(\I,r,m) \models P_j[\theta(i',a)]$ by total anonymity. Thus, there

1079: exists a point $(r',m')$ such that $r'_j(m') = r_j(m)$ and $(\I,r',m') \models \theta(i',a)$.

1080: By our assumption, $(\I,r',m') \models \neg \theta(i,a)$, because $i \not= i'$.

1081: Therefore, $(\I,r,m) \models \neg \K_j[\theta(i,a)]$. It follows that $a$

1082: is minimally anonymous with respect to $j$.

1083: \eprf

1084:

1085: Definitions~\ref{dfn:minimalanon1} and~\ref{dfn:totalanon1} are

1086: conceptually similar, even

1087: though the latter definition is much stronger.

1088: Once again, there is a set of formulas that an

1089: observer is not allowed to know. With the earlier definition, there is only one

1090: formula in this set: $\theta(i,a)$. As long as $j$ doesn't know that $i$ performed

1091: action $a$, this requirement is satisfied. With total anonymity, there

1092: are more formulas that $j$

1093: is not allowed to know: they

1094: take the form $\neg \theta(i',a)$. Before,

1095: we could guarantee only

1096: that $j$ did not know that $i$ did the action; here, for many agents $i'$,

1097: we guarantee that $j$ does not know that $i'$ did {\em not} do the action. The

1098: definition is made slightly more complicated by the implication, which restricts

1099: the conditions under which $j$ is not allowed to know $\neg

1100: \theta(i',a)$. (If $i$

1101: didn't actually perform the action, we don't care what $j$ thinks,

1102: since we are concerned only with anonymity with respect to $i$.)

1103: But the basic idea is the same.

1104:

1105: Note that total anonymity does {\em not\/} necessarily follow from total

1106: %

1107: %

1108: %

1109: %

1110: secrecy, because

1111: the formula $\neg \theta(i',a)$, for $i' \ne i$, does not, in general,

1112: %

1113: %

1114: depend only on the local

1115: state of $i$. It is therefore perfectly consistent

1116: with the definition of total secrecy

1117: for $j$ to learn this fact,

1118: %

1119: in violation of total anonymity.

1120: (Secrecy, of course, does not follow from anonymity, because secrecy

1121: requires that

1122: many more facts be hidden than simply whether $i$ performed a given action.)

1123:

1124: Total anonymity is a very strong requirement.

1125: Often, an action will not be totally anonymous, but

1126: only anonymous up to some set of agents who could have performed the action.

1127: This situation merits a weaker definition of anonymity.

1128: To be more precise, let $I$ be the set of all agents of the system and

1129: suppose that we have some

1130: %

1131: %

1132: %

1133: set $I_A \subseteq I$---an ``anonymity set'', using the terminology

1134: %

1135: %

1136: %

1137: %

1138: %

1139: %

1140: %

1141: of Chaum~\citeyear{Chaum88} and

1142: Pfitzmann and K\"ohntopp~\citeyear{terminology}---%

1143: of agents who can perform some action. We can define anonymity in

1144: terms of this set.

1145:

1146: \dfn

1147: \label{dfn:anonset1}

1148: Action $a$, performed by agent $i$, is

1149: {\em anonymous up to $I_A \subseteq I$} with respect to $j$ if

1150: \[ \I \models \theta(i,a) \Rightarrow \bigwedge_{i' \in I_A} P_j [\theta(i',a)]. \]

1151: \edfn

1152: %

1153: %

1154: %

1155: %

1156: %

1157: %

1158: %

1159:

1160: %

1161: %

1162: In the anonymous message-passing system Herbivore~\cite{goel02}, users are

1163: organized into {\em cliques} $C_1, \ldots, C_n$, each of which uses the dining cryptographers

1164: protocol \cite{Chaum88} for anonymous message-transmission. If a user wants to send

1165: an anonymous message, she can do so through her clique.

1166: %

1167: %

1168: Herbivore claims

1169: that any user $i$ is

1170: able to send a message anonymously up to $C_j$, where $i \in C_j$. As the size of

1171: a user's clique varies, so does the strength of the anonymity guarantees provided

1172: by the system.

1173: %

1174: %

1175: %

1176: %

1177:

1178: In some situations,

1179: it is not necessary that there be a fixed

1180: %

1181: anonymity set,

1182: as in Definition~\ref{dfn:anonset1}.  It suffices that, at all times,

1183: there {\em exists} some

1184: %

1185: %

1186: anonymity set

1187: with at least, say, $k$ agents. This leads to a definition of $k$-anonymity.

1188:

1189: \dfn

1190: \label{dfn:anonset2}

1191: Action $a$, performed by agent $i$, is

1192: {\em $k$-anonymous} with respect to $j$ if

1193: \[ \I \models \theta(i,a) \Rightarrow \bigvee_{\{I_A: |I_A| = k\}}

1194: \bigwedge_{i' \in I_A} P_j

1195: [\theta(i',a)]. \]

1196: \edfn

1197:

1198: This definition says that at any point $j$ must think it possible that

1199: %

1200: %

1201: any of at least $k$ agents might perform, or have performed, the action.

1202: %

1203: %

1204: %

1205: Note that the set of $k$ agents might be different in different runs,

1206: making this condition strictly weaker than anonymity up to a particular

1207: set of size $k$.

1208:

1209: %

1210: %

1211: A number of systems have been proposed that provide $k$-anonymity for

1212: some $k$.

1213: In the anonymous communications network protocol recently proposed by

1214: von Ahn, Bortz, and Hopper \cite{hopper03}, users can

1215: %

1216: %

1217: %

1218: send

1219: %

1220: messages with guarantees of $k$-anonymity. In the system $P^5$

1221: (for

1222: ``Peer-to-Peer Personal Privacy Protocol'') \cite{sherwood02},

1223: users join a logical broadcast

1224: tree that provides anonymous communication, and users can choose what

1225: level

1226: of $k$-anonymity they want, given that $k$-anonymity for a higher value of $k$

1227: makes communication more inefficient.

1228: %

1229: %

1230: %

1231: %

1232: Herbivore \cite{goel02} provides anonymity using

1233: %

1234: %

1235: cliques of DC-nets.  If

1236: %

1237: the system guarantees that

1238: the cliques all have a size of at least $k$,

1239: so that regardless of clique

1240: composition, there are at least $k$ users capable of sending any

1241: anonymous message, then Herbivore guarantees $k$-anonymity.

1242:

1243:

1244: %

1245: %

1246: %

1247: %

1248: %

1249: \subsection{A More Detailed Example: Dining Cryptographers}\label{sec:diningexample}

1250:

1251: A well-known example of anonymity in the computer security literature

1252: is Chaum's ``dining cryptographers problem''~\cite{Chaum88}. In the

1253: original description of this problem, three cryptographers sit down to

1254: dinner and are informed by the host that someone has already paid the bill

1255: anonymously. The cryptographers decide that the bill was paid either by

1256: one of the three people in their group, or by an outside agency such as

1257: the NSA. They want to find out which of these two situations is the actual

1258: one while preserving the anonymity of the cryptographer who (might have) paid.

1259: Chaum provides a protocol that the cryptographers can use to solve this

1260: problem. To guarantee that it works, however, it would be nice to check

1261: that anonymity conditions hold. Assuming we have a system that includes

1262: a set of three cryptographer agents $C = \{0, 1, 2\}$,

1263: as well as an outside

1264: observer agent $o$, the protocol should guarantee that for each agent $i

1265: \in C$,

1266: and each agent $j \in C - \{i\}$,

1267: the act of paying is anonymous up to $C - \{j\}$ with respect to $j$.

1268: For an outside observer $o$,

1269: %

1270: i.e., an agent other than one of three cryptographers,

1271: the protocol should guarantee that for

1272: each agent $i \in C$, the protocol is anonymous up to $C$ with respect to

1273: $o$. This can be made precise using our definition of anonymity up to a set.

1274:

1275: Because the requirements are symmetric for each of the three cryptographers,

1276: %

1277: %

1278: %

1279: we can

1280: %

1281: %

1282: describe the anonymity specification compactly by naming

1283: the agents using

1284: modular arithmetic. We use $\oplus$ to denote addition mod 3.

1285: %

1286: %

1287: %

1288: %

1289: %

1290: %

1291: Let the interpreted system $(\I = (\R,\pi)$ represent the possible runs

1292: of one instance of the

1293: dining cryptographers protocol, where the interpretation

1294: $\pi$ interprets formulas of the form

1295: %

1296: %

1297: $\theta(i,\mbox{``paid''})$

1298: in the obvious way.

1299: The following knowledge-based requirements comprise the anonymity portion of the protocol's

1300: specification, for each agent $i \in C$:

1301: $$

1302: \begin{array}{ll}

1303: %

1304: %

1305: \I \models \theta(i,\mbox{``paid''}) &\Rightarrow

1306:   P_{i \oplus 1} \theta(i \oplus 2,\mbox{``paid''})

1307:    \wedge~ P_{i \oplus 2} \theta(i \oplus 1,\mbox{``paid''})\\

1308:   &\wedge~ P_o \theta(i \oplus 1,\mbox{``paid''})

1309:    \wedge~ P_o \theta(i \oplus 2,\mbox{``paid''}).

1310: \end{array}

1311: $$

1312: %

1313: %

1314: %

1315: %

1316: This means that if a cryptographer paid, then each of the other

1317: cryptographers must think it possible that the third cryptographer

1318: could have paid. In addition, an outside observer must think it

1319: %

1320: %

1321: %

1322: possible that either of the other two cryptographers could have paid.

1323:

1324:

1325:

1326:

1327: \section{Probabilistic Variants of Anonymity}\label{sec:probanon}

1328:

1329: \subsection{Probabilistic Anonymity}

1330:

1331: All of the definitions presented in Section~\ref{sec:anonymity} were nonprobabilistic.

1332: As we mentioned in the introduction,

1333: this is a serious problem for the ``how well is information hidden'' component of

1334: the definitions. For all the definitions we gave, it was necessary only that

1335: observers think it {\em possible} that multiple agents could have performed the

1336: anonymous action. However, an event that is possible may nonetheless be

1337: extremely unlikely. Consider our definition of total anonymity

1338: (Definition~\ref{dfn:totalanon1}). It states that an action performed by $i$

1339: is totally anonymous if the observer $j$ thinks it could have been performed

1340: by any agent other than $j$. This may seem like a strong requirement, but if there

1341: are, say, $102$ agents,

1342: and $j$ can determine that $i$ performed action $a$

1343: with probability $0.99$ and that each of the other agents performed action

1344: $a$ with probability $0.0001$, agent $i$ might not be very happy with the

1345: guarantees provided by total anonymity.

1346: Of course, the appropriate notion of anonymity will depend on the

1347: application:

1348: $i$ might be content to know that no agent can {\em prove} that she

1349: performed the anonymous action. In that case, it

1350: might suffice for the action to be only

1351: minimally anonymous.

1352: However, in many other cases, an agent might want a more quantitative,

1353: probabilistic guarantee that it will be considered reasonably

1354: likely that other agents could have performed the action.

1355:

1356: Adding probability to the runs and systems framework is straightforward.

1357: The approach we use goes back to \cite{HT}, and was also used in our

1358: work on secrecy \cite{HalOn02}, so we just briefly review the relevant

1359: details here.

1360: Given a system $\R$, suppose we have a probability measure $\mu$ on the runs

1361: of $\R$. The pair $(\R, \mu)$ is a {\em probabilistic system}. For simplicity,

1362: we assume that every subset of $\R$ is measurable. We are interested

1363: in the probability that an agent assigns to an event at the point $(r,m)$.

1364: For example, we may want to know that at the point $(r,m)$,

1365: observer $i$ places a probability of $0.6$

1366: on $j$'s having performed some particular action. We want to condition

1367: the probability $\mu$ on $\K_i(r,m)$, the information that $i$ has

1368: at the point $(r,m)$. The problem is that $\K_i(r,m)$ is a set of {\em

1369: points}, while $\mu$ is a probability on {\em runs}.

1370: This problem is dealt with as follows.

1371:

1372: Given a set $U$ of points, let $\R(U)$ consist of the runs in $\R$ going through

1373: a point in $U$. That is,

1374: $$\R(U) = \{r \in \R: (r,m) \in U \mbox{ for some $m$}\}.$$

1375: The idea will be to condition $\mu$ on $\R(\K_i(r,m))$ rather than on

1376: $\K_i(r,m)$.  To make sure that conditioning is well defined,

1377: we assume that $\mu(\R(\K_i(r,m))) > 0$ for each agent $i$, run $r$,

1378: and time $m$. That is, $\mu$ assigns positive probability to the set of

1379: runs in $\R$ compatible with what happens in run $r$ up to time $m$,

1380: as far as agent $i$ is concerned.

1381:

1382: With this assumption, we can define a measure $\mu_{r,m,i}$ on the

1383: points in $\K_i(r,m)$ as follows.  If $\S \subseteq \R$, define

1384: $\K_i(r,m)(\S)$ to be the set of points in $\K_i(r,m)$ that lie on runs

1385: in $\S$; that is,

1386: $$\K_i(r,m)(\S) = \{(r',m') \in \K_i(r,m): r' \in \S\}.$$

1387: %

1388: %

1389: %

1390: %

1391: %

1392: %

1393: %

1394: %

1395: %

1396: %

1397: %

1398: %

1399: %

1400: %

1401: %

1402: %

1403: Let $\F_{r,m,i}$, the measurable subsets of $\K_i(r,m)$ (that is, the

1404: sets to which $\mu_{r,m,i}$ assigns a probability), consist of all sets of the

1405: form $\K_i(r,m)(\S)$, where $\S \subseteq \R$.  Then define

1406: $\mu_{r,m,i}(\K_i(r,m)(\S)) = \mu(\S \mid \R(\K_i(r,m))$.

1407: It is easy to check that $\mu_{r,m,i}$ is a probability measure,

1408: essentially defined by conditioning.

1409:

1410: %

1411: %

1412: %

1413: %

1414: Define a {\em probabilistic interpreted system\/} $\I$ to be a tuple

1415: $(\R,\mu,\pi)$, where $(\R,\mu)$ is a probabilistic system.

1416: %

1417: In a probabilistic interpreted system, we can give semantics to

1418: syntactic statements of probability.

1419: Following \cite{FHM}, we will be most interested in formulas of the form

1420: $\Pr_i (\phi ) \leq \alpha$

1421: (or similar formulas with $>$, $<$, or $=$

1422: instead of $\leq$). Intuitively, a

1423: formula such as $\Pr_i (\phi) \leq \alpha$ is true

1424: at a point $(r,m)$ if, according to $\mu_{r,m,i}$, the probability that

1425: $\phi$ is true is at

1426: %

1427: %

1428: most

1429: $\alpha$.  More formally,

1430: $  (\I,r,m) \models {\Pr}_i(\phi) \leq \alpha $ if

1431: %

1432: %

1433: %

1434: \[  \mu_{r,m,i}(\{(r',m') \in \K_i(r,m) : (\I,r',m') \models \phi \})

1435: \leq \alpha. \]

1436: Similarly, we can give semantics to $\Pr_i(\phi) < \alpha$ and

1437: $\Pr(\phi) = \alpha$, as well as conditional formulas such as

1438: $\Pr(\phi \, | \, \psi) \le \alpha$.

1439: Note that although these formulas talk

1440: about probability, they are either true or false at a given state.

1441:

1442: %

1443: %

1444: %

1445: %

1446: %

1447: %

1448: %

1449: %

1450: %

1451: %

1452: %

1453: %

1454: %

1455: %

1456: %

1457: %

1458: %

1459: %

1460:

1461: It is straightforward to define probabilistic notions of anonymity in

1462: probabilistic systems.

1463: %

1464: %

1465: %

1466: %

1467: %

1468: %

1469: %

1470: %

1471: %

1472: %

1473: We can think of Definition~\ref{dfn:minimalanon1}, for example, as saying

1474: %

1475: %

1476: that $j$'s probability that

1477: $i$

1478: %

1479: %

1480: performs

1481: the anonymous action $a$ must be less than 1

1482: (assuming that every nonempty set has positive probability).

1483: This can be generalized

1484: by specifying some $\alpha \leq 1$

1485: and requiring that the probability of $\theta(i,a)$ be less than $\alpha$.

1486:

1487: \dfn

1488: \label{dfn:alphaanon1}

1489: Action $a$, performed by agent $i$, is {\em $\alpha$-anonymous}

1490: with respect to agent $j$ if

1491: $\I \models \theta(i,a) \Rightarrow {\Pr}_j [\theta(i,a)] < \alpha$.

1492: \edfn

1493:

1494: %

1495: Note that if we replace $\theta(i,a)$ by $\delta(i,a)$ in

1496: Definition~\ref{dfn:alphaanon1}, the resulting notion might not be well

1497: defined.  The problem is

1498: %

1499: %

1500: %

1501: that the set

1502: %

1503: $$\{(r',m') \in \K_i(r,m): (\I,r',m') \sat \delta(i,a) \}$$

1504: %

1505: may not be

1506: measurable; it may not have the form $\K_i(r,m)(\S)$ for some $\S

1507: \subseteq \R$.  The problem does not arise if $\I$ is a {\em

1508: synchronous\/} sytem (in which case $i$ knows that time, and all the

1509: points in $\K_i(r,m)$ are of the form $(r',m)$), but it does arise if

1510: $\I$ is asynchronous.  We avoid this technical problem by working with

1511: $\theta(i,a)$ rather than $\delta(i,a)$.

1512:

1513: %

1514:

1515: %

1516: %

1517: Definition~\ref{dfn:alphaanon1}, unlike

1518: Definition~\ref{dfn:minimalanon1},

1519: includes an implication involving $\theta(i,a)$. It is easy

1520: to check that Definition~\ref{dfn:minimalanon1} does not change

1521: when such an implication is added; intuitively, if $\theta(i,a)$

1522: is false then $\neg K_j[\theta(i,a)]$ is trivially true.

1523: Definition~\ref{dfn:alphaanon1}, however, would change if we removed

1524: the implication,

1525: because it might be possible for $j$ to have a high probability

1526: of $\theta(i,a)$ even though it isn't true. We include the implication

1527: because without it, we place constraints on what $j$ thinks about

1528: %

1529: %

1530: %

1531: $\theta(i,a)$ even if $i$ has not performed the action $a$ and

1532: will not

1533: perform it in the future. Such a requirement,

1534: %

1535: while interesting,

1536: seems more akin to ``unsuspectibility'' than to anonymity.

1537:

1538:

1539: %

1540: %

1541: %

1542: %

1543: %

1544: Two of the notions of probabilistic anonymity considered by

1545: Reiter and Rubin \citeyear{reiter98} in the context of their Crowds

1546: system can be understood in terms of $\alpha$-anonymity.

1547: Reiter and Rubin

1548: say that a sender has {\em probable innocence}

1549: if, from an observer's point of view, the sender ``appears no more likely to be the

1550: originator than to not be the originator''. This is simply 0.5-anonymity.

1551: %

1552: (Under reasonable assumptions, Crowds provides 0.5-anonymity for Web requests.)

1553: Similarly, a sender has {\em possible innocence} if, from an observer's point of

1554: view, ``there is a nontrivial probability that the real sender is someone else''.

1555: This corresponds to minimal anonymity (as defined in Section~\ref{sec:dfns}), or

1556: to $\epsilon$-anonymity for some nontrivial value of $\epsilon$.

1557: %

1558: %

1559: %

1560:

1561: It might seem at first that Definition~\ref{dfn:alphaanon1} should be the only

1562: definition of anonymity we need: as

1563: long as $j$'s probability of

1564: %

1565: %

1566: $i$ performing the action is low

1567: enough, $i$ should have nothing to worry about.

1568: However, with further thought, it is not hard to see that this is not

1569: the case.

1570:

1571: Consider a scenario where there are 1002 agents, and where $\alpha = 0.11$.

1572: Suppose that the probability, according to Alice, that Bob

1573: %

1574: %

1575: performs

1576: the

1577: action is $.1$, but that her probability that any of the other $1000$ agents

1578: %

1579: %

1580: performs

1581: the action is $0.0009$ (for each agent). Alice's probability that Bob

1582: %

1583: %

1584: performs

1585: the

1586: action is small, but her probability that anyone else

1587: %

1588: %

1589: performs

1590: %

1591: %

1592: it is

1593: more than three orders of magnitude smaller.

1594: %

1595: %

1596: %

1597: Bob is obviously the prime suspect.

1598:

1599: %

1600: %

1601: This concern was addressed by Serjantov and Danezis~\citeyear{Serj02}

1602: in their paper

1603: on information-theoretic definitions of anonymity. They consider

1604: %

1605: %

1606: the probability that each agent in an anonymity set is the sender

1607: of some anonymous message, and use entropy to quantify the amount of

1608: %

1609: %

1610: information that the system is leaking;

1611: %

1612: %

1613: %

1614: %

1615: Diaz et al.~\citeyear{Diaz02} and

1616: Danezis~\citeyear{danezis:pet2003} use similar techniques.

1617: In this paper we are not

1618: concerned with quantitative measurements of anonymity, but we

1619: %

1620: %

1621: do agree that it is worthwhile to consider stronger

1622: notions of anonymity than the nonprobabilistic definitions, or

1623: even $\alpha$-anonymity, can provide.

1624: %

1625: We hope to examine quantitative definitions in future work.

1626:

1627: %

1628: %

1629: %

1630: %

1631: %

1632: The next definition strengthens

1633: Definition~\ref{dfn:alphaanon1} in the way that

1634: Definition~\ref{dfn:totalanon1} strengthens Definition~\ref{dfn:minimalanon1}.

1635: It requires that no agent in the

1636: %

1637: %

1638: anonymity

1639: set be a more

1640: likely suspect than any other.

1641:

1642:

1643: \dfn

1644: \label{dfn:probanonset1}

1645: Action $a$, performed by agent $i$, is {\em strongly

1646: probabilistically anonymous up to

1647: $I_A$\/}  with respect to

1648: agent $j$ if for each $i' \in I_A$,

1649: %

1650: %

1651: \[ \I \models \theta(i,a) \Rightarrow {\Pr}_j [\theta(i,a)] = {\Pr}_j [\theta(i',a)]. \]

1652: \edfn

1653:

1654: Depending on the size of $I_A$, this definition can be extremely

1655: strong. It does not

1656: state simply that for all agents in $I_A$, the observer must think it is

1657: reasonably likely

1658: that the agent could have performed the action; it also says that the

1659: observer's probabilities must be the same for each such agent.

1660: Of course, we could weaken the definition somewhat by not requiring that

1661: all the probabilities be equal, but by instead requiring

1662: that they be approximately equal

1663: (i.e., that their difference be small or that their ratio be close to 1).

1664: %

1665: %

1666: %

1667: %

1668: %

1669: %

1670: %

1671: Reiter and Rubin \citeyear{reiter98}, for example, say that the sender of a

1672: message is {\em beyond suspicion} if she ``appears no more likely to

1673: be the originator of that message than any other potential sender in the system''.

1674: In our terminology, $i$ is beyond suspicion with respect to $j$ if for

1675: each $i' \in I_A$,

1676: %

1677: %

1678: %

1679: %

1680: \[ \I \models \theta(i,a) \Rightarrow {\Pr}_j [\theta(i,a)] \leq {\Pr}_j [\theta(i',a)]. \]

1681: This is clearly weaker than strong probabilistic anonymity, but still a very

1682: strong requirement,

1683: %

1684: and perhaps more reasonable, too.

1685: %

1686: %

1687: %

1688: %

1689: Our main point is that a wide variety of properties can be expressed

1690: clearly and succinctly in our framework.

1691: %

1692: %

1693: %

1694:

1695: \subsection{Conditional Anonymity}

1696:

1697: While we have shown that many useful notions of anonymity---including many definitions

1698: that have already been proposed---can be expressed

1699: in our framework, we claim

1700: that there are some important intuitions that have not yet been captured.

1701: Suppose, for example, that someone

1702: makes a \$5,000,000 donation to Cornell University. It is

1703: clearly not the case

1704: %

1705: that

1706: everyone is equally likely, or even almost equally likely, to

1707: have made the donation.

1708: Of course, we could take the

1709: %

1710: %

1711: anonymity

1712: set $I_A$ to

1713: consist of those people who might be in a position to make such a large donation,

1714: and insist that they all be considered equally likely. Unfortunately,

1715: even that is unreasonable: a priori, some of them may already have known

1716: connections to Cornell, and thus

1717: be considered far more likely to

1718: have made the donation. All that an anonymous donor

1719: can reasonably expect is that nothing an

1720: observer learns from his interactions with the environment (e.g.,

1721: reading the newspapers, noting when the donation was made, etc.)~will

1722: give him more information about the identity of the donor than he

1723: already had.

1724:

1725: For another example, consider a conference or research journal

1726: that provides anonymous reviews to researchers who submit their papers for

1727: publication.

1728: It is unlikely that the review process provides anything like

1729: $\alpha$-anonymity for a small $\alpha$, or strongly probabilistic

1730: anonymity up to some reasonable set.

1731: %

1732: %

1733: %

1734: When

1735: %

1736: the preliminary version of

1737: this paper, for example, was accepted by

1738: %

1739: %

1740: %

1741: the Computer Security Foundations Workshop,

1742: the acceptance notice included three reviews that were, in our terminology,

1743: anonymous up

1744: to the program committee. That is, any one of the reviews we received

1745: could have been written by any of the members of the program committee. However,

1746: by reading some of the reviews, we were able to make fairly good guesses

1747: as to

1748: which committee members had provided which reviews, based on our knowledge

1749: of the specializations of the various members, and based on the

1750: content of the reviews themselves.

1751: Moreover, we had a fairly good idea of which committee members

1752: would provide reviews of our paper even before we received the

1753: reviews.

1754: Thus, it seems unreasonable to hope that the review process would

1755: provide strong probabilistic anonymity (up to the program committee),

1756: or even some weaker variant of probabilistic anonymity.

1757: Probabilistic anonymity would require the reviews to convert our prior

1758: beliefs, according to which

1759: some program committee members were more likely than others to be

1760: reviewers

1761: of our paper, to posterior beliefs

1762: according to which all program committee members were equally likely!

1763: This does not seem at all reasonable.

1764: However, the reviewers might hope that

1765: that the process did not

1766: give us any more information than we already had.

1767:

1768: %

1769: %

1770: %

1771: %

1772: %

1773: %

1774: %

1775: In our paper on secrecy \cite{HalOn02},

1776: we tried to capture

1777: the intuition that, when an unclassified user interacts with a secure

1778: system, she does not learn anything about any classified user

1779: that she didn't already know.  We did

1780: this formally by requiring that, for any three

1781: points $(r,m)$, $(r',m')$, and $(r'',m'')$,

1782: \begin{equation}\label{secrecy}

1783: \mu_{(r,m,j)}(\K_i(r'',m'')) = \mu_{(r',m',j)}(\K_i(r'',m'')).

1784: \end{equation}

1785: That is, whatever the unclassified user $j$ sees, her probability of

1786: any particular classified state will remain unchanged.

1787: %

1788: %

1789: %

1790: %

1791: %

1792:

1793: %

1794: %

1795: %

1796: %

1797: %

1798:

1799: When defining anonymity, we are not concerned with protecting

1800: all information about some agent $i$, but rather the fact that $i$

1801: %

1802: %

1803: performs

1804: some particular action $a$.

1805: Given a probabilistic system $\I = (\R,\pi,\mu)$ and a formula $\phi$,

1806: let $e_r(\phi)$

1807: consist of the set of runs $r$ such

1808: that $\phi$ is true at some point in $r$, and let $e_p(\phi)$ be the set

1809: of points where $\phi$

1810: is true.  That is

1811: $$\begin{array}{l}

1812: e_r(\phi) = \{ r : \exists m ( (\I,r,m) \models \phi) \}, \\

1813: e_p(\phi) = \{ (r,m) : (\I,r,m) \models \phi \}.

1814: \end{array}

1815: $$

1816: The most obvious analogue to (\ref{secrecy}) is the requirement that,

1817: for all points $(r,m)$ and $(r',m')$,

1818: $$\mu_{(r,m,j)}(e_p(\theta(i,a))) = \mu_{(r',m',j)}(e_p(\theta(i,a))).$$

1819: This definition says that $j$ never learns anything about the

1820: probability that $i$

1821: %

1822: performed

1823: performs

1824: $a$:

1825: she always ascribes the same probability to this event.

1826: In the context of our anonymous donation example,

1827: this would say that the probability (according to $j$) of $i$ donating

1828: \$5,000,000 to Cornell is the same at all times.

1829:

1830: The problem with this definition is that it does not allow

1831: $j$ to learn that {\em someone\/} donated \$5,000,000 to Cornell.  That is, before

1832: $j$ learned that someone donated \$5,000,000 to Cornell, $j$ may have

1833: thought it was unlikely that anyone would donate that much money to

1834: Cornell.  We cannot expect that $j$'s probability of $i$ donating

1835: \$5,000,000 would be the same both before and after learning that

1836: someone made a donation.

1837: We want to give a definition of conditional anonymity that

1838: allows observers to learn that an action has been performed,

1839: but that protects---as much as possible, given the system---the

1840: fact that some particular agent

1841: %

1842: performed

1843: performs

1844: the action. If, on

1845: the other hand, the anonymous action has not been performed,

1846: then the observer's probabilities do not matter.

1847:

1848: Suppose that $i$ wants to perform action $a$, and wants conditional anonymity

1849: with respect to $j$. Let

1850: $\theta(\overline{\jmath},a)$ represent the fact that $a$ has been performed

1851: by some agent other than

1852: $j$, i.e., $\theta(\overline{\jmath},a) =

1853: \lor_{i' \ne j} \theta(i',a)$.

1854: %

1855: %

1856: The definition of conditional anonymity says that $j$'s prior probability

1857: of $\theta(i,a)$ given $\theta(\overline{\jmath},a)$ must be the

1858: same as his posterior probability of $\theta(i,a)$ at points where $j$ knows

1859: $\theta(\overline{\jmath},a)$, i.e., at points where $j$

1860: knows that someone other than $j$ has performed

1861: %

1862: (or will perform)

1863: $a$.

1864: %

1865: %

1866: %

1867: %

1868: %

1869: %

1870: %

1871: Let $\alpha = \mu(e_r(\theta(i,a)) \mid e_r(\theta(\overline{\jmath},a)))$. This is the prior

1872: probability that $i$ has performed $a$, given that somebody other than

1873: $j$ has.

1874: Conditional anonymity says that at any point where

1875: $j$ knows that someone other than $j$

1876: %

1877: %

1878: performs

1879: $a$,

1880: $j$'s probability of $\theta(i,a)$ must be $\alpha$.

1881: In other words, $j$ shouldn't be able to learn anything more

1882: about who

1883: %

1884: %

1885: performs

1886: $a$ (except that

1887: %

1888: %

1889: somebody does)

1890: than he know before he began interacting with the system in the first place.

1891:

1892: \dfn\label{dfn:conditionalanon}

1893: Action $a$, performed by agent $i$, is {\em conditionally anonymous} with

1894: respect to $j$ in the probabilistic system $\I$ if

1895: $$

1896: %

1897: %

1898: %

1899: \I \models K_j\theta(\overline{\jmath},a) \Rightarrow

1900: {\Pr}_j(\theta(i,a)) = \mu(e_r(\theta(i,a)) \mid

1901: e_r(\theta(\overline{\jmath},a))).

1902: %

1903: $$

1904: \edfn

1905: Note that if only one agent ever performs $a$, then

1906: $a$ is trivially conditionally anonymous with respect to $j$, but may

1907: not be minimally anonymous with respect to $j$.

1908: Thus, conditional anonymity does not necessarily imply minimal anonymity.

1909:

1910: In Definition~\ref{dfn:conditionalanon}, we implicitly

1911: assumed that agent $j$ was allowed to learn that someone other than $j$

1912: performed action $a$; anonymity is

1913: intended to hide {\em which\/} agent performed $a$, given that somebody did.

1914: More generally, we believe that

1915: we need to consider anonymity

1916: with respect to what an observer

1917: is allowed to learn.

1918: We might want to specify, for example, that an observer is allowed to know

1919: that a donation was made, and for how much, or to learn the contents of a

1920: conference paper review. The following definition lets us do this formally.

1921:

1922:

1923: %

1924: %

1925: %

1926: %

1927: %

1928: \dfn\label{dfn:conditionalanon1}

1929: %

1930: %

1931: Action $a$, performed by agent $i$, is {\em conditionally anonymous} with

1932: respect to $j$ and $\phi$ in the probabilistic system $\I$ if

1933: $$ \I \models K_j \phi \Rightarrow

1934: {\Pr}_j(\theta(i,a)) = \mu(e_r(\theta(i,a)) \mid e_r(\phi)). $$

1935: \edfn

1936: Definition~\ref{dfn:conditionalanon} is clearly the special case of

1937: Definition~\ref{dfn:conditionalanon1} where $\phi =

1938: \theta(\overline{j},a)$.

1939: Intuitively, both of these definitions say that once an observer learns some

1940: fact $\phi$ connected to the fact $\theta(i,a)$, we require that she

1941: doesn't learn anything else that might change her probabilities of

1942: $\theta(i,a)$.

1943:

1944: %

1945: %

1946: %

1947: %

1948: \subsection{Example: Probabilistic Dining Cryptographers}

1949:

1950: Returning the dining cryptographers problem, suppose that it is well-known

1951: that one of the three cryptographers at the table is much more generous than

1952: the other two, and therefore more likely to pay for dinner.

1953: %

1954: %

1955: %

1956: %

1957: %

1958: %

1959: %

1960: %

1961: %

1962: %

1963: %

1964: %

1965: Suppose, for example, that the probability

1966: measure on the set of runs where the generous

1967: cryptographer has paid is 0.8, given that one of the cryptographers paid for

1968: dinner, and that it is 0.1 for each of the other two cryptographers.

1969: Conditional anonymity for each of the three cryptographers with respect to

1970: an outside observer means that when such observer learns that one of the

1971: cryptographers has paid for dinner, his probability that any of the three

1972: cryptographers paid should remain 0.8, 0.1, and 0.1.

1973: %

1974: %

1975: %

1976: %

1977: If the one of the thrifty cryptographers paid, the generous

1978: cryptographer

1979: should think that there is a probability of $0.5 = 0.1 / (0.1 + 0.1)$

1980: that either of the others paid.

1981: Likewise, if the generous cryptographer paid, each of the others should

1982: think that

1983: there is a probability of $0.8 / (0.8 + 0.1)$ that the generous

1984: cryptographer paid and a

1985: probability of $0.1 / (0.8 + 0.1)$ that the other thrifty cryptographer

1986: paid.

1987: %

1988: We can similarly calculate all the other relevant probabilities.

1989:

1990: More generally, suppose that

1991: we have an intepreted probabilistic system

1992: %

1993: %

1994: %

1995: %

1996: %

1997: %

1998: %

1999: $(\R,\mu,\pi)$ that represents

2000: instances of the dining cryptographers protocol, where the interpretation $\pi$

2001: once again interprets formulas of the form

2002: $\theta(i,\mbox{``paid''})$ and $\theta(\overline{\jmath},\mbox{``paid''})$

2003: in the obvious way, and where the formula $\gamma$ is true if one of the cryptographers

2004: paid.  (That is, $\gamma$ is equivalent to $\bigvee_{i \in \{0,1,2\}} \theta(i,\mbox{``paid''})$.)

2005: For any cryptographer $i \in \{ 0, 1, 2 \}$, let $\alpha(i)$ be the prior probability that $i$ paid,

2006: %

2007: %

2008: %

2009: %

2010: given that somebody else did. That is, let

2011: $$\alpha(i) = \mu(e_r(\theta(i,\mbox{``paid''})) \mid e_r(\gamma)).$$

2012: In the more concrete example given above, if $0$ is the generous cryptographer, we would

2013: have $\alpha(0) = 0.8$ and $\alpha(1) = \alpha(2) = 0.1$.

2014:

2015: For the purposes of conditional probability with respect to an agent $j$,

2016: we are interested in the probability that some agent $i$ paid, given that somebody

2017: other than $j$ paid. Formally, for $i \not= j$, let

2018: $$\alpha(i,j) = \mu(e_r(\theta(i,\mbox{``paid''})) \mid e_r(\theta(\overline{\jmath},\mbox{``paid''}))).$$

2019: If an observer $o$ is not one of the three cryptographers, than $o$ didn't pay,

2020: and we have $\alpha(i,o) = \alpha(i)$. Otherwise, if $i,j \in \{0,1,2\}$,

2021:  we can use conditioning to compute $\alpha(i,j)$:

2022: \[ \alpha(i,j) = \frac{\alpha(i)}{\alpha(j \oplus 1) + \alpha(j \oplus 2)}. \]

2023: (Once again, we make our definitions and requirements more compact

2024: by using modular arithmetic, where $\oplus$ denotes addition mod 3.)

2025:

2026: %

2027: %

2028: The following formula captures the requirement of conditional

2029: anonymity in the dining cryptographer's protocol, for each cryptographer

2030: $i$, with respect to the other cryptographers and any outside observers.

2031: %

2032: %

2033: %

2034: %

2035: %

2036:

2037: $$

2038: \begin{array}{ll}

2039: \I \models & \left[ K_{i \oplus 1} \theta(\overline{i \oplus 1},\mbox{``paid''}) \Rightarrow {\Pr}_{i \oplus 1}(\theta(i,\mbox{``paid''})) = \alpha(i,i \oplus 1) \right] \wedge \\

2040: 	& \left[ K_{i \oplus 2} \theta(\overline{i \oplus 2},\mbox{``paid''}) \Rightarrow {\Pr}_{i \oplus 2}(\theta(i,\mbox{``paid''})) = \alpha(i,i \oplus 2) \right] \wedge \\

2041: 	& \left[ K_{o} \theta(\overline{o},\mbox{``paid''}) \Rightarrow

2042: 	{\Pr}_{o}(\theta(i,\mbox{``paid''})) = \alpha(i,o) \right].

2043: \end{array}

2044: $$

2045:

2046: Chaum's original proof that the dining cryptographers protocol provides

2047: anonymity actually proves conditional anonymity in this general setting.

2048: Note that if the probability that one of the cryptographers will pay is 1,

2049: that cryptographer will have conditional anonymity even though he doesn't even

2050: have minimal anonymity.

2051:

2052:

2053: \subsection{Other Uses for Probability}

2054:

2055: In the previous two subsections, we have emphasized how probability can be used

2056: to obtain definitions of anonymity stronger than those presented in

2057: Section~\ref{sec:anonymity}. However, probabilistic systems can also be used

2058: to define interesting ways of weakening those definitions. Real-world

2059: anonymity systems do not offer absolute guarantees of anonymity such as those

2060: those specified by our definitions. Rather, they guarantee that a user's anonymity

2061: will be protected {\em with high probability}. In a given run, a user's anonymity

2062: might be protected or corrupted. If the probability of the event that a user's

2063: anonymity is corrupted is very small, i.e., the set of runs where her anonymity is

2064: not protected is assigned a very small probability by the measure $\mu$, this

2065: might be enough of a guarantee for the user to interact with the system.

2066:

2067: Recall that we said that $i$ maintains total anonymity with respect to $j$

2068: if the fact $\phi = \theta(i,a) \Rightarrow \bigwedge_{i' \ne j} P_j [\theta(i',a)]$

2069: is true at every point in the system. Total anonymity is compromised in a run $r$

2070: if at some point $(r,m)$, $\neg \phi$ holds. Therefore, the set of runs where

2071: total anonymity is compromised is simply $e_r(\neg \phi)$, using the notation of

2072: the previous section. If $\mu(e_r(\neg \phi))$ is very small, then $i$ maintains

2073: total anonymity with very high probability. This analysis can obviously be extended

2074: to all the other definitions of anonymity given in previous sections.

2075:

2076: Bounds such as these are useful for analyzing real-world systems. The Crowds

2077: system \cite{reiter98}, for example, uses randomization when routing communication traffic,

2078: so that anonymity is protected with high probability. The probabilistic guarantees provided

2079: by Crowds were analyzed formally by

2080: %

2081: %

2082: Shmatikov \citeyear{shmat02}, using a probabilistic model

2083: checker, and he demonstrates how the anonymity guarantees provided by the Crowds system change

2084: as more users (who may be either honest or corrupt) are added to the system. Shmatikov

2085: uses a temporal probabilistic logic to express probabilistic anonymity properties, so these

2086: properties can be expressed in our system framework.  (It is

2087: straightforward to give semantics to temporal operators in systems; see \cite{FHMV}.)

2088: In any case, Shmatikov's analysis of

2089: a real-world anonymity system is a useful example of how the formal methods that we advocate

2090: can be used to specify and verify properties of real-world systems.

2091:

2092: \section{Related Work}\label{sec:related}

2093:

2094: \subsection{Knowledge-based Definitions of Anonymity}

2095: As mentioned in the introduction, we are not the first to use knowledge

2096: to handle definitions of security, information hiding, or even anonymity.

2097: Anonymity has been formalized using epistemic logic by Syverson

2098: and

2099: %

2100: %

2101: Stubblebine~\citeyear{syverson99}.  Like us, they use epistemic logic

2102: to characterize a number of

2103: information-hiding requirements that involve anonymity.

2104: However, the focus of their work is very different from ours.

2105: They describe a logic for reasoning about anonymity

2106: and a number of axioms for the logic.

2107: An agent's knowledge is based, roughly speaking, on

2108: %

2109: his recent actions and observations, as well as

2110: what follows from his log of system events.

2111: The first five axioms that Syverson and Stubblebine

2112: give are the standard {\bf S5} axioms for knowledge.  There

2113: are well-known soundness and completeness results relating the

2114: {\bf S5} axiom system to Kripke structure semantics for knowledge

2115: \cite{FHMV}. However, they give many more axioms, and

2116: they do not attempt to give a semantics for which their axioms are

2117: sound. Our focus, on the other hand, is completely semantic.  We have not tried

2118: to axiomatize anonymity. Rather, we try to give an appropriate semantic

2119: framework in which to consider anonymity.

2120:

2121: In some ways,

2122: Syverson and Stubblebine's model is more detailed than the model used here.

2123: Their logic includes

2124: many formulas that represent various actions and facts, including the

2125: sending and

2126: receiving of messages, details of encryption and keys, and so on. They also

2127: make more assumptions about the local state of a given agent,

2128: including details about the sequence of actions that the agent has performed

2129: locally, a log of system events that have been recorded, and a set of facts

2130: of which the agent is aware. While these extra details may accurately

2131: reflect the

2132: nature of agents in real-world systems,

2133: they are orthogonal to our concerns here.  In any case,

2134: it would be easy to add

2135: such expressiveness to our model as well, simply by including these details in

2136: the local states of the various agents.

2137:

2138: It is straightforward to relate our definitions to those of Syverson and

2139: Stubblebine. They consider facts of the form

2140: $\phi(i)$, where $i$ is a principal, i.e., an agent.

2141: They assume that the fact $\phi(i)$ is a single

2142: formula in which a single agent name occurs. Clearly,

2143: $\theta(i,a)$ is an example of such a formula.

2144: In fact, Syverson and Stubblebine assume

2145: that if $\phi(i)$ and $\phi(j)$ are both true, then $i = j$.

2146: For the $\theta(i,a)$ formulas, this means

2147: that $\theta(i,a)$ and $\theta(i',a)$ cannot be simultaneously

2148: true: at most one agent can perform an action in a given run,

2149: exactly as in the setup of Proposition~\ref{pro:minimally}.

2150:

2151: There is one definition in~\cite{syverson99}

2152: that is especially relevant to our discussion; the other relevant

2153: definitions presented there are similar.

2154: A system is said to satisfy

2155: {\em $(\geq k)$-anonymity} if the following formula is valid for

2156: some observer $o$:

2157: $$

2158: \begin{array}{c}

2159: \phi(i) \Rightarrow P_o(\phi(i)) \wedge P_o(\phi(i_1))

2160: \wedge \cdots \wedge P_o(\phi(i_{k-1})).

2161: \end{array}

2162: $$

2163: This definition says that if $\phi(i)$ holds, there must be at least $k$ agents, including

2164: $i$, that the observer suspects. (The existential quantification of the agents $i_1, \ldots, i_{n-1}$ is

2165: implicit.)

2166: The definition

2167: is essentially equivalent to our definition of $(k-1)$-anonymity.

2168: It certainly implies that there are $k-1$ agents other than

2169: $i$ for which $\phi(i')$ might be true. On the other

2170: hand,

2171: if $P_o(\phi(i'))$ is true for $k-1$ agents other than $i$,

2172: then the formula must hold,

2173: because

2174: $\phi(i) \rimp P_o(\phi(i))$ is valid.

2175:

2176:

2177: \subsection{CSP and Anonymity}\label{sec:CSP}

2178: A great deal of work on the foundations of computer security has used process

2179: algebras such as CCS and CSP \cite{Mil,hoare85} as the basic system

2180: framework \cite{focardi01,schneider96b}. Process algebras

2181: offer several advantages: they are simple, they can be used for specifying

2182: systems as well as system properties, and model-checkers

2183: are available that can be used

2184: to verify properties of systems described using their formalisms.

2185:

2186: Schneider and Sidiropoulos~\citeyear{schneider96} use CSP

2187: both to characterize one type of anonymity and to describe variants of

2188: the dining cryptographers

2189: problem~\cite{Chaum88}. They then use a model-checker to verify

2190: that their notion of anonymity holds for those variants of the problem.

2191: To describe their approach, we need to outline

2192: some of the basic notation and semantics of CSP. To save space, we give a

2193: simplified treatment of CSP here.

2194: %

2195: %

2196: %

2197: (See Hoare~\citeyear{hoare85} for a complete description of CSP.)

2198: The basic unit of

2199: CSP is the {\em event}. Systems are modeled in terms of the events that

2200: they can perform. Events may be built up several components. For example,

2201: ``donate.\$5''

2202: might represent a ``donate'' event in the amount of \$5. {\em Processes} are the

2203: systems, or components of systems, that are described using CSP. As a

2204: process

2205: unfolds or executes, various events occur.

2206: For our purposes, we make the simplifying assumption that a process

2207: is determined by the event sequences it is able to engage in.

2208:

2209: We can associate with every process a set of {\em traces}.

2210: Intuitively, each trace in the set associated with process $P$

2211: represents one sequence of events that might occur during an execution

2212: of $P$.

2213: Informally, CSP event traces correspond to

2214: finite prefixes of runs, except that they do not explicitly describe the

2215: local states of agents and do not explicitly describe time.

2216:

2217: Schneider and Sidiropoulos define a notion of anonymity with respect to

2218: a set $A$ of events.  Typically, $A$ consists of

2219: %

2220: events

2221: of the form

2222: $i.a$ for a fixed action $a$, where $i$ is an agent in some set that we

2223: denote $I_A$.

2224: Intuively, anonymity with respect to $A$ means that if any event in $A$

2225: occurs, it could equally well have been any other event in $A$.

2226: In particular,

2227: this means that if an agent in $I_A$ performs $a$, it could

2228: equally well have been any other agent in $I_A$.

2229: Formally,

2230: given a set $\Sigma$ of possible events and $A \subseteq \Sigma$,

2231: let $f_A$ be a function on traces

2232: that, given a trace $\tau$, returns a trace $f_A(\tau)$ that is

2233: identical to $\tau$ except that every event in $A$ is replaced by a

2234: fixed event $\alpha \notin \Sigma$.

2235: A process $P$ is {\em strongly anonymous} on $A$ if

2236: $f_A^{-1}(f_A(P)) = P$,

2237: where we identify

2238: $P$ with its associated set of traces. This means that all the

2239: events in $A$ are interchangeable; by

2240: replacing any event in $A$ with any other

2241: we would still get a valid trace of $P$.

2242:

2243: Schneider and Sidiropoulos give several very simple examples that are

2244: useful for clarifying this definition of anonymity. One is a system

2245: where there are two agents who can provide donations to a charity,

2246: but where only one

2247: of them will actually do so. Agent $0$, if she gives a donation,

2248:  gives \$5, and agent $1$ gives \$10. This is followed by a ``thanks''

2249: from the charity. The events of interest are ``0.gives'' and ``1.gives''

2250: (representing events where $0$ and $1$ make a donation),

2251: ``\$5'' and ``\$10'' (representing the charity's receipt of the donation),

2252: ``thanks'', and ``STOP'' (to signify that the process

2253: has ended). There are two possible traces:

2254: \begin{enumerate}

2255: \item 0.gives $\rightarrow$ \$5 $\rightarrow$ ``thanks'' $\rightarrow$

2256: STOP.

2257: \item 1.gives $\rightarrow$ \$10 $\rightarrow$ ``thanks'' $\rightarrow$

2258: STOP.

2259: \end{enumerate}

2260: The donors require anonymity, and so we require that the CSP process is

2261: strongly anonymous on the set \{0.gives, 1.gives\}. In fact, this

2262: condition is not satisfied by the process, because ``0.gives'' and

2263: ``1.gives'' are not interchangeable. This is because ``0.gives'' must

2264: be followed by ``\$5'', while ``1.gives'' must be followed by ``\$10''.

2265: Intuitively, an agent who observes the traces can determine the

2266: donor by looking at the amount of money donated.

2267:

2268: We believe that Schneider and Sidiropoulos's definition is best

2269: understood as trying to

2270: capture the intuition that an observer who sees all the events generated

2271: by $P$, except for events in $A$, does not know which event

2272: in $A$ occurred.

2273: We can make this precise by translating Schneider and Sidiropoulos's

2274: definition into our framework.

2275: The first step is to associate with each process $P$

2276: a corresponding set of runs $\R_P$.

2277: We present one reasonable way of doing so here, which suffices for our

2278: purposes.

2279: In future work, we hope to explore the connection between CSP and the runs and

2280: systems framework in more detail.

2281:

2282: Recall that a run is an infinite sequence

2283: of global states of the form $(s_e, s_1, \ldots, s_n)$, where each $s_i$

2284: is the local state of

2285: agent $i$, and $s_e$ is the state of the environment.

2286: Therefore, to

2287: specify a set of runs, we need to describe the set of agents, and then

2288: explain how to derive the local states of each agent for each run.

2289: There is an obvious problem here:

2290: CSP has no analogue of agents and local states.

2291: To get around this, we could simply tag all events

2292: with an agent (as Schneider and Sidiropoulos in fact do for the events

2293: in $A$).

2294: However, for our current purposes, a much simpler approach will do.

2295: The only agent we care about is a (possibly mythical) observer who

2296: is able to observe every event except the ones in $A$.  Moreover, for

2297: events in $A$, the observer knows that something happened (although not

2298: what).  There may be other agents in the system, but their local states

2299: are irrelevant.  We formalize this as follows.

2300:

2301: Fix a process $P$ over some set $\Sigma$ of events, and let $A \subseteq \Sigma$.

2302: Following Schneider and Sidiropoulos, for the purposes of this

2303: discussion, assume that $A$ consists of events of the form $i.a$, where

2304: $i \in I_A$ and $a$ is some specific action.

2305: We say that a system $\R$ is {\em compatible with $P$\/} if there

2306: exists some agent $o$ such that the following two conditions hold:

2307: \begin{itemize}

2308: \item for every run $r \in \R$ and every time $m$, there exists a trace

2309: $\tau \in P$

2310: such that $\tau = r_e(m)$ and $f_A(\tau) = r_o(m)$;

2311: \item for every trace $\tau \in P$, there exists a run $r \in \R$ such

2312: that $r_e( | \tau | ) = \tau$ and $r_o( | \tau | ) = f_A(\tau)$

2313: (where $|\tau|$ is the number of events in $\tau$).

2314: \end{itemize}

2315: Intuitively, $\R$ represents $P$ if (1) for every trace $\tau$ in $P$,

2316: there is a point $(r,m)$ in $\R$ such that, at this point, exactly the

2317: events in $\tau$ have occurred (and are recorded in the environment's

2318: state) and $o$ has observed $f_A(\tau)$, and (2) for every point $(r,m)$

2319: in $\R$, there is a trace $\tau$ in $P$ such that precisely the events

2320: in $r_e(m)$ have happened in $\tau$, and $o$ has observed $f_A(\tau)$ at

2321: $(r,m)$.

2322: We say that the interpreted system $\I = (\R,\pi)$ is {\em compatible

2323: with $P$\/} if $\R$ is compatible with $P$ and

2324: if $(\I,r,m) \models \theta(i,a)$ whenever the event

2325: $i.a$ is in the

2326: %

2327: %

2328: event sequence $r_e(m')$ for some $m'$.

2329:

2330:

2331: We are now able to make a formal connection between our definition of

2332: anonymity and that of Schneider and Sidiropoulos.

2333: As in the setup of Proposition~\ref{pro:minimally}, we assume that an

2334: anonymous action $a$ can be performed only once in a given run.

2335:

2336: \thm\label{thm:stronganonthm}

2337: If $\I = (\R,\pi)$ is compatible with $P$, then

2338: $P$ is strongly anonymous on the alphabet $A$ if and only if

2339: for every agent

2340: $i \in I_A$, the action $a$ performed by $i$ is anonymous up to $I_A$ with

2341: respect to $o$ in $\I$.

2342: \ethm

2343:

2344: %

2345: \prf

2346: Suppose that $P$ is strongly anonymous on the alphabet $A$ and that

2347: $i \in I_A$.

2348: %

2349: %

2350: %

2351: %

2352: %

2353: Given a point $(r,m)$, suppose that $(\I,r,m) \models \theta(i,a)$,

2354: so that the event $i.a$ appears in $r_e(n)$ for some $n \geq m$.

2355: %

2356: %

2357: We must show that

2358: $(\I,r,m) \models P_o[\theta(i',a)]$ for every $i' \in I_A$,

2359: %

2360: that is, that $a$ is anonymous up to $I_A$ with respect to $o$.

2361: %

2362: %

2363: %

2364: %

2365: %

2366: For any $i' \in I_A$, this requires showing that there

2367: exists a point $(r',m')$ such that

2368: $r_o(m) = r'_o(m')$,

2369: and $r'_o(n')$ includes $i'.a$, for some $n' \geq m'$. Because $\R$

2370: is compatible with $P$, there exists $\tau \in P$ such that $\tau =

2371: r_e(n)$ and $i.a$ appears in $\tau$. Let $\tau'$ be the trace identical to $\tau$

2372: except that $i.a$ is replaced by $i'.a$. Because $P$ is strongly anonymous

2373: on $A$, $P = f_A^{-1}(f_A(P))$, and $\tau' \in P$. By compatibility, there

2374: exists a run $r'$ such that $r_e'(n) = \tau'$ and $r_o'(n) = f_A(\tau')$.

2375: By construction, $f_A(\tau) = f_A(\tau')$, so $r_o(n) = r'_o(n)$. Because the

2376: length-$m$ trace prefixes of $f_A(\tau)$ and $f_A(\tau')$ are the same,

2377: it follows that $r_o(m) = r'_o(m)$.

2378: %

2379: %

2380: Because $(\I,r',m) \models \theta(i',a)$, $(\I,r,m) \models P_o[\theta(i',a)]$

2381: as required.

2382:

2383: Conversely, suppose that for every agent $i \in I_A$, the action $a$

2384: performed by $i$

2385: %

2386: %

2387: is anonymous up to $I_A$ with respect to $o$ in $\I$. We must show that

2388: $P$ is strongly anonymous.

2389: It is clear that $P \subseteq

2390: f_A^{-1}(f_A(P))$, so we must show only that $P \supseteq

2391: f_A^{-1}(f_A(P))$.  So suppose that $\tau \in f_A^{-1}(f_A(P))$.

2392: If no event $i.a$ appears in $\tau$, for any $i \in I_A$, then $\tau \in P$

2393: trivially. Otherwise, some $i.a.$ does appear. Because $\tau \in f_A^{-1}(f_A(P))$,

2394: there exists a trace $\tau' \in P$ that is identical to

2395: %

2396: %

2397: $\tau$

2398: except that $i'.a$

2399: replaces $i.a$, for some other $i' \in I_A$. Because $\R$ is compatible with $P$,

2400: there exists a run $r' \in R$ such that $r'_o(m) = f_A(\tau')$ and $r'_e(m) = \tau'$

2401: %

2402: %

2403: %

2404: (where $m = |\tau'|$). Clearly $(\I,r',m) \models \theta(i',a)$ so, by

2405: anonymity,

2406: $(\I,r',m) \models P_o[\theta(i,a)]$, and there exists a run $r$ such that

2407: $r_o(m) = r'_o(m)$ and $(\I,r,m) \models \theta(i,a)$. Because the action $a$ can

2408: be performed at most once, the trace $r_e(m)$ must be equal to $\tau$. By

2409: compatibility, $\tau \in P$ as required.

2410: \eprf

2411:

2412:

2413:

2414:

2415: Up to now, we have assumed that the observer $o$ has access to all the

2416: information in the system except which event in $A$ was performed.

2417: Schneider and Sidiropoulos extend their definition of strong anonymity to

2418: deal with agents that have somewhat less information.  They capture ``less

2419: information'' using {\em abstraction operators}.

2420: Given a process $P$, there are several

2421: abstraction

2422: operators that can give us a new process. For example the {\em hiding operator},

2423: represented by $\backslash$, hides all events in some set $C$. That is,

2424: the process $P \backslash C$ is the same as $P$ except that all events in

2425: $C$ become internal events of the new process, and are not included in

2426: the traces associated with $P \backslash C$.

2427: Another abstraction operator, the renaming

2428: operator,

2429: has already appeared in the definition

2430: of strong anonymity:

2431: for any set $C$ of events, we can consider the function $f_C$ that maps

2432: events in $C$ to a fixed new event.  The difference between hiding and

2433: renaming is that, if events in $C$ are hidden, the observer is not even

2434: aware they took place.  If events in $C$ are renamed, then the observer

2435: is aware that some event in $C$ took place, but does not know which one.

2436:

2437: Abstraction operators such as these provide a useful way to model a

2438: process or agent who has a distorted

2439: or limited view of the system. In the context of

2440: anonymity, they allow anonymity to hold with respect to an observer with a

2441: limited view of the system in cases where it would not hold with respect to an

2442: observer who can see everything. In the anonymous donations example,

2443: hiding

2444: the events \$5 and \$10, i.e., the amount of money donated, would make the

2445: new process $P \backslash \{\$5, \$10\}$ strongly anonymous on the set of

2446: donation events. Formally, given an abstraction operator $ABS_C$ on a set

2447: of events $C$, we have

2448: to check the requirement of strong anonymity on the process $ABS_C(P)$

2449: rather than on the process $P$.

2450:

2451: Abstraction is easily captured in our framework. It amounts simply to

2452: changing the local state of the observer. For example, anonymity of

2453: the process $P \backslash C$ in our framework corresponds to anonymity of the

2454: action $a$ for every agent in $I_A$ with respect to an observer whose

2455: local state at the point $(r,m)$ is $f_A(r_e(m)) \backslash C$.  We omit the obvious

2456: analogue of Theorem~\ref{thm:stronganonthm} here.

2457:

2458: A major advantage of the runs and systems framework is that definitions of

2459: high-level properties such as anonymity do not depend on the local states

2460: of the agents in question. If we want to model the fact that an observer has

2461: a limited view of the system,

2462: we need only modify her local state to reflect this fact.

2463: While some limited views are naturally captured by CSP abstraction

2464: operators, others may not be.  The definition of anonymity should not

2465: depend on the existence of

2466: an appropriate abstraction operator

2467: able to capture the limitations of a particular observer.

2468:

2469: As we have demonstrated, our approach to anonymity is compatible with the

2470: approach

2471: taken in~\cite{schneider96}. Our definitions are stated in terms of

2472: actions, agents, and knowledge, and are thus

2473: very intuitive and flexible. The generality of runs and systems allows us

2474: to have simple definitions that apply to a wide variety of systems and

2475: agents. The low-level CSP

2476: definitions, on the other hand, are more operational than ours, and this

2477: allows easier model-checking and verification. Furthermore, there are

2478: many advantages to using process algebras in general: systems can often

2479: be represented much more succinctly, and so on. This suggests that

2480: both approaches have their advantages.

2481: Because CSP

2482: systems can be represented in the runs and systems framework, however,

2483: it makes perfect sense to define anonymity for CSP processes using

2484: the knowledge-based definitions we have presented here. If our

2485: definitions turn out to be equivalent to more low-level CSP definitions,

2486: this is ideal, because CSP model-checking programs can then be used for

2487: verification. A system designer simply needs to take care that the

2488: runs-based system derived from a CSP process (or set of processes)

2489: represents the local states of the different agents appropriately.

2490:

2491: \subsection{Anonymity and Function View Semantics}\label{sec:functionviews}

2492: Hughes and Shmatikov \citeyear{hughes02} introduce {\em function views}

2493: %

2494: %

2495: and function-view {\em opaqueness} as a way of

2496: expressing a variety of information-hiding properties in a succinct and uniform way.

2497: Their main insight is that requirements such as anonymity involve restrictions

2498: on relationships between entities such as agents and actions. Because these relationships

2499: can be expressed by functions from one set of entities to another, hiding information

2500: from an observer amounts to limiting an observer's view of the function

2501: in question. For example, anonymity properties are concerned with whether or not

2502: an observer is able to connect actions with the agents who performed them. By

2503: considering the function from the set of actions to the set of agents who performed

2504: those actions, and specifying the degree to which that function must be opaque to

2505: observers, we can express anonymity using the

2506: %

2507: %

2508: %

2509: %

2510: function-view approach.

2511:

2512: To model the uncertainty associated with a given function, Hughes and

2513: Shmatikov define a notion of

2514: {\em function knowledge} to explicitly represent an observer's

2515: partial knowledge of a function.

2516: Function knowledge focuses on three particular aspects of a function:

2517: its graph, image, and kernel.

2518: (Recall that the {\em kernel\/} of a function $f$ with domain $X$ is the

2519: equivalence relation {\em ker\/} on $X$ defined by $(x,x') \in

2520: \mbox{{\it ker}}$ iff $f(x) = f(x')$.)

2521: {\em Function knowledge\/} of type $X \rightarrow Y$ is a triple

2522: $N = (F,I,K)$, where $F \subseteq X \times Y$, $I \subseteq Y$, and $K$

2523: is an equivalence relation on $X$.  A triple $(F,I,K)$ is {\em

2524: consistent with $f$\/} if $f \subseteq F$, $I \subseteq \im f$, and

2525: $K \subseteq \kernel f$. Intuitively, a triple $(F,I,K)$

2526: that is consistent with $f$ represents

2527: what an agent might know about

2528: the function $f$. Complete knowledge of a function $f$, for example, would be

2529: represented by the triple $( f, \im f, \kernel f )$.

2530:

2531: For anonymity, and for information hiding in general,

2532: we are interested not in what an agent

2533: knows, but in what an agent does {\em not\/} know.

2534: This is formalized

2535: %

2536: %

2537: by Hughes and Shmatikov

2538: in terms of

2539: opaqueness conditions for function knowledge.

2540: If $N = \langle F, I, K \rangle$

2541: is consistent with $f: X \rightarrow Y$, then, for example,

2542: $N$ is {\em $k$-value opaque\/} if $|F(x)| \geq k$  for all $x \in X$.

2543: That is, $N$ is $k$-value opaque if there are $k$ possible candidates

2544: for the value of $f(x)$, for all $x \in X$. Similarly, $N$ is

2545: {\em $Z$-value opaque\/} if $Z \subseteq F(x)$

2546: for all $x \in X$. In other words, for each $x$ in the domain of $f$,

2547: no element of $Z$ can be ruled out as a candidate for $f(x)$.

2548: Finally, $N$ is {\em absolutely value opaque\/} if that $N$ is $Y$-value opaque.

2549:

2550: Opaqueness conditions are closely related to the

2551: nonprobabilistic definitions of anonymity given in Section~\ref{sec:anonymity}.

2552: Consider functions from $X$ to $Y$, where $X$ is a set of actions

2553: and $Y$ is a set of agents, and suppose that some function $f$ is the function

2554: that, given some action, names the agent who performed the action. If we have

2555: $k$-value opaqueness for some view of $f$ (corresponding to some

2556: observer $o$), this means, essentially, that

2557: each action $a$ in $X$ is $k$-anonymous with respect to $o$.

2558: Similarly, the view is $I_A$-value

2559: opaque if the action is

2560: anonymous up to $I_A$ for each agent $i \in I_A$. Thus, function

2561: view opaqueness provides a concise way of describing anonymity properties,

2562: and information-hiding properties in general.

2563:

2564: To make these connections precise, we need to explain how function views

2565: can be embedded within the runs and systems framework.

2566: Hughes and Shmatikov already show how we can define function

2567: views using

2568: Kripke structures, the standard approach for giving semantics

2569: to knowledge.  A minor modification of their approach works in systems

2570: too.  Assume we are interested in who performs an action $a \in X$, where

2571: $X$, intuitively, is a set of ``anonymous actions''.  Let $Y$ be the set of agents,

2572: %

2573: including a ``nobody agent'' denoted $N$, and let $f$ be a

2574: %

2575: %

2576: function from $X$ to $Y$.  Intuitively,

2577: $f(a) = i$ if agent $i$

2578: %

2579: %

2580: performs action $a$,

2581: %

2582: %

2583: and $f(a) = N$ if no agent performs action $a$.

2584: The value of the

2585: function $f$ will depend on the point.  Let $f_{r,m}$ be the value of

2586: $f$ at the point $(r,m)$.  Thus, $f_{r,m}(a) = i$ if

2587: %

2588: %

2589: $i$ performs $a$ in run $r$.

2590: \footnote{Note that for $f_{(r,m)}$ to be well-defined, it must

2591: be the case that only one agent can ever perform a single action.

2592: %

2593: %

2594: %

2595: %

2596: }

2597: We can now easily talk about function opaqueness with respect to an

2598: observer $o$. For example, $f$ is $Z$-value opaque

2599: at the point $(r,m)$ with respect to $o$ if, for

2600: all $z \in Z$, there exists a point $(r',m')$ such that $r'_o(m') = r_o(m)$

2601: and $f_{(r',m')}(x) = z$. In terms of knowledge, $Z$-value opaqueness

2602: says that for any value $x$ in the range of $f$,

2603: $o$ thinks it possible that

2604: any value $z \in Z$ could be the result of $f(x)$. Indeed, Hughes and

2605: %

2606: %

2607: Shmatikov say that function-view opaqueness, defined in terms of Kripke

2608: structure semantics, is closely related to epistemic logic. The following

2609: proposition makes this precise; it would be easy to state similar propositions

2610: %

2611: %

2612: for other kinds of function-view opaqueness.

2613:

2614: \pro\label{pro:zvaluethm}

2615: Let $\I = (\R, \pi)$ be an interpreted system that satisfies

2616: $(\I,r,m) \models f(x) = y$ whenever $f_{(r,m)}(x) = y$.

2617: In system $\I$, $f$ is $Z$-value opaque for observer $o$ at the point $(r,m)$ if

2618: and only if

2619: \[ (\I,r,m) \models \bigwedge_{x \in X} \bigwedge_{z \in Z} P_o[f(x) = z]. \]

2620: \epro

2621:

2622: \prf

2623: This result follows immediately from the definitions.

2624: \eprf

2625:

2626: %

2627: %

2628: Stated in terms of knowledge, function-view opaqueness already looks

2629: a lot like our definitions of anonymity.

2630: Given $f$ (or, more precisely, the set $\{f_{(r,m)}\}$ of functions) mapping

2631: actions to agents,

2632: %

2633: %

2634: we can state a theorem connecting anonymity to function-view opaqueness.

2635: There are two minor issues to deal with, though. First, our definitions

2636: of anonymity are stated with respect to a single action $a$, while the

2637: function $f$ deals with a {\em set} of actions. We can deal with this

2638: by taking the domain of  $f$ to be

2639: the singleton $\{a\}$. Second, our definition

2640: of anonymity up to a set $I_A$ requires the observer to suspect

2641: agents in $I_A$ only

2642: if $i$ actually performs the action $a$.

2643: (Recall this is also true for Syverson and Stubblebine's definitions.)

2644: $I_A$-value opaqueness requires the observer to think many agents

2645: could have performed an action even if nobody has.

2646: To deal with this, we require opaqueness only

2647: when the action has been performed

2648: %

2649: by one of the agents in $I_A$.

2650: %

2651: \commentout{

2652: Last, opaqueness provides anonymity for {\em all} agents who might

2653: want to perform $a$. To restrict our attention to agents in $I_A$,

2654: we require that $f_{(r,m)}(a) \in I_A$ for all points $(r,m) \in \P(\R)$.

2655: }

2656: \commentout{

2657: If $f$ maps from an action $a$ to the agent who performed

2658: $a$, $f_{(r,m)}(a)$ may be undefined if nobody has actually performed

2659: $a$ by time $m$ in the run $r$. For functions that might possibly

2660: be undefined, we can say that $f$ is $Z$-value opaque if

2661: for all points $(r,m) \in \P(\R)$, for all $x \in X$ such that

2662: $f_{(r,m)}(x)$ is well defined, and for

2663: all $z \in Z$, there exists a point $(r',m')$ such that $r'_o(m') = r_o(m)$

2664: and $f_{(r',m')}(x) = z$.

2665: }

2666:

2667: \thm\label{thm:functionviews}

2668: Suppose that $(\I,r,m) \models \theta(i,a)$ exactly if $f_{(r,m)}(a) = i$.

2669: Then action $a$ is anonymous up to $I_A$ with respect to $o$

2670: for each agent $i \in I_A$ if and only if at all points $(r,m)$ such

2671: that $f_{(r,m)}(a) \in I_A$,

2672: $f$ is $I_A$-value opaque with respect to $o$.

2673: \ethm

2674:

2675: %

2676: %

2677:

2678: \prf

2679: Suppose that $f$ is $I_A$-value opaque, and let $i \in I_A$ be given.

2680: If $(\I,r,m) \models \theta(i,a)$, then $f_{(r,m)}(a) = i$.

2681: %

2682: %

2683: We must show that, for all $i' \in I_A$,

2684: %

2685: %

2686: %

2687: %

2688: %

2689: $(\I,r,m) \models

2690: P_o[\theta(i',a)]$.

2691: Because $f$ is $I_A$-value opaque at $(r,m)$,

2692: there exists a point

2693: $(r',m')$ such that $r'_o(m') = r_o(m)$ and $f_{(r',m')}(a) = i'$.

2694: Because $(\I,r',m') \models \theta(i',a)$, $(\I,r,m) \models P_o[\theta(i',a)]$.

2695:

2696: Conversely, suppose that for each agent $i \in I_A$,

2697: $a$ is anonymous up to $I_A$ with respect to $o$.

2698: Let $(r,m)$ be given such that $f_{(r,m)}(a) \in I_A$,

2699: and let that $i = f_{(r,m)}(a)$.

2700: It follows that $(\I,r,m) \models \theta(i,a)$.

2701: For any $i' \in I_A$, $(\I,r,m) \models P_o[\theta(i',a)]$, by

2702: anonymity. Thus there exists a point $(r',m')$ such that

2703: $r'_o(m') = r_o(m)$ and $(\I,r',m') \models \theta(i',a)$.

2704: It follows that $f_{(r',m')}(a) = i'$, and that $f$ is

2705: $I_A$-value opaque.

2706: \eprf

2707:

2708:

2709: As with Proposition~\ref{pro:zvaluethm}, it would be easy to state

2710: analogous theorems connecting our other definitions of anonymity,

2711: including minimal anonymity, total anonymity, and $k$-anonymity,

2712: %

2713: %

2714: to other forms of function-view opaqueness. We omit the details here.

2715:

2716: %

2717: The assumptions needed to prove Theorem~\ref{thm:functionviews} illustrate

2718: two ways in which our approach may seem to be less general than the

2719: %

2720: %

2721: function-view

2722: approach. First, all our definitions are given with respect to a single

2723: action, rather than with respect to a set of actions. However, it is

2724: %

2725: %

2726: perfectly reasonable to specify that all actions in some set $\A$ of actions

2727: be anonymous. Then we could modify Theorem~\ref{thm:functionviews}

2728: %

2729: %

2730: so that the function $f$ is defined on all actions in $\A$.

2731: (We omit the details.) Second, our definitions of anonymity only

2732: restrict the observer's

2733: knowledge if somebody actually performs the action. This is simply a different

2734: way of defining anonymity. As mentioned previously,

2735: we are not trying to give a definitive definition of anonymity, and it certainly

2736: seems reasonable that someone might want to define or specify anonymity using the

2737: stronger condition. At any rate, it would be straightforward to modify our

2738: definitions so that the implications, involving $\theta(i,a)$, are not included.

2739: %

2740: %

2741:

2742: Hughes and Shmatikov argue that epistemic logic is a useful language

2743: for expressing anonymity specifications, while CSP is a useful language

2744: %

2745: %

2746: for describing and specifying systems.  We agree with both of

2747: these claims.  They propose function views as a useful interface to

2748: mediate between the two.  We have tried to argue here that no mediation

2749: is necessary, since the multiagent systems framework can also be used for

2750: describing systems. (Indeed, the traces of CSP can essentially be viewed as runs.)

2751: Nevertheless, we do believe that function views can be the

2752: basis of a useful language for reasoning about some aspects of

2753: information hiding. We can well imagine

2754: adding abbreviations to the language that let us talk directly about

2755: function views.  (We remark that we view these abbreviations as

2756: syntactic sugar, since these are notions that can already be expressed

2757: directly in terms of the knowledge operators we have introduced.)

2758:

2759: On the other hand, we believe that function views are not expressive

2760: enough to capture all aspects of information hiding.  One obvious

2761: problem is adding probability.  While it is easy to

2762: add probability to systems, as we have shown, and to

2763: capture interesting probabilistic notions

2764: of anonymity, it is far from clear how to do this if we take function

2765: views triples as primitive.

2766:

2767: To sum up, we would argue that to reason about

2768: knowledge and probability, we need to have possible worlds as the

2769: underlying semantic framework.  Using the multiagent systems approach

2770: gives us possible worlds in a way that makes it particularly easy to

2771: relate them to systems.  Within this semantic framework, function views

2772: may provide a useful syntactic construct with which to reason about

2773: information hiding.

2774:

2775:

2776:

2777: \section{Discussion}\label{sec:conclusion}

2778:

2779: We have described a framework for reasoning about information hiding

2780: in multiagent systems, and have given general definitions of anonymity

2781: for agents acting in such systems. We have also compared and contrasted

2782: our definitions to other similar definitions of anonymity. Our knowledge-based

2783: system framework provides a number of advantages:

2784: \ul

2785: \item We are able to state information-hiding properties succinctly

2786: and intuitively, and in terms of the knowledge of the observers or

2787: attackers who interact with the system.

2788: \item Our system has a well-defined semantics that lets us reason about

2789: knowledge in

2790: systems of interest, such as systems specified using process algebras

2791: or strand spaces.

2792: \item We are able to give straightforward probabilistic definitions of

2793: anonymity, and of other related information-hiding properties.

2794: \eul

2795:

2796: %

2797: %

2798: %

2799: There are a number of issues that this paper has not addressed.

2800: We have focused almost exclusively on properties of anonymity,

2801: %

2802: %

2803: %

2804: %

2805: and have not considered related notions, such as \emph{pseudonymity} and

2806: \emph{unlinkability}~\cite{hughes02,terminology}.

2807: %

2808: %

2809: There seems to be no intrinsic difficulty capturing these notions in our

2810: framework.  For example, one form of message  unlinkability

2811: specifies that no two messages sent by an

2812: anonymous sender can be ``linked'', in the sense that an observer can

2813: determine that both

2814: messages were sent by the same sender. More formally, two actions $a$

2815: and $a'$

2816: %

2817: %

2818: are linked with respect to an observer $o$ if $o$ knows

2819: that there

2820: exists an agent $i$ who performed both $a$ and $a'$. This definition

2821: %

2822: %

2823: %

2824: %

2825: %

2826: %

2827: can be directly captured using knowledge.  Its negation says

2828: that $o$ considers it possible

2829: that there exist two distinct agents who performed $a$ and $a'$;

2830: %

2831: this can be viewed as a definition of \emph{minimal unlinkability}.

2832: %

2833: %

2834: %

2835: This minimal requirement can be strengthened, exactly as our definitions of

2836: anonymity were,

2837: to include larger numbers of distinct agents, probability, and so on.

2838: %

2839: %

2840: %

2841: Although we have not worked out the details, we believe  that our

2842: approach will be similarly applicable to

2843: other definitions of information hiding.

2844:

2845: %

2846: %

2847: Another obviously important issue is

2848: %

2849: %

2850: checking whether

2851: a given system specifies

2852: the knowledge-based properties we have introduced.

2853: %

2854: %

2855: %

2856: The standard technique for doing this is model checking.

2857: Recent work on the problem of model checking in

2858: the multiagent systems framework suggests that this may be viable.

2859: Van der Meyden~\citeyear{vandermeyden98} discusses

2860: algorithms and complexity results for model checking a wide range of

2861: epistemic formulas in the runs and systems framework, and van der Meyden and

2862: Su~\citeyear{vandermeyden02} use these results to verify the

2863: dining cryptographers protocol~\cite{Chaum88}, using formulas much

2864: like those described in

2865: %

2866: %

2867: Section~\ref{sec:diningexample}.

2868: Even though model checking of formulas involving knowledge seems to

2869: be intractable for large problems, these results are a promising first

2870: step towards being able to use knowledge for both the specification and

2871: verification of anonymity properties.

2872: %

2873: %

2874: %

2875: Shmatikov~\citeyear{shmat02}, for example, analyzes the Crowds system

2876: using the

2877: %

2878: %

2879: %

2880: probabilistic model checker PRISM~\cite{KNP01}.  This is a

2881: particularly good example of how definitions of

2882: anonymity can be made precise using logic and probability, and how

2883: model-checking

2884: can generate new insights into the functioning of a deployed protocol.

2885:

2886: %

2887: Finally, it is important to note that the examples considered in this

2888: paper do

2889: not reflect the state of the art for computational anonymity. Anonymity

2890: protocols

2891: based on DC-nets, while theoretically interesting, have not been widely

2892: %

2893: %

2894: %

2895: %

2896: %

2897: %

2898: deployed; in practice, protocols based on mixes and message-rerouting

2899: are much more common.

2900: We used the dining cryptographer's problem as a running

2901: %

2902: %

2903: example here mainly

2904: because of its simplicity, but it remains to be seen whether our general

2905: approach will be as

2906: illuminating for more complicated protocols. There are reasons to

2907: %

2908: %

2909: %

2910: believe that it will be.  Shmatikov's analysis of Crowds shows

2911: that a logic-based approach can be useful

2912: for analyzing protocols based on message-rerouting. Furthermore, we

2913: believe that formalizing

2914: anonymity protocols using techniques like ours is worthwhile even if

2915: formal

2916: verification is impractical or impossible.

2917: %

2918: %

2919: It forces system designers to

2920: think carefully about information-hiding requirements, which can often

2921: be tricky,

2922: %

2923: %

2924: and provides a system-independent framework for

2925: comparing the anonymity

2926: guarantees provided by different systems.

2927: %

2928: %

2929:

2930: We described one way to generate a set of runs from a CSP process $P$,

2931: basically by recording all the events in the state of the environment

2932: and describing some observer $o$ who is able to observe a subset of

2933: the events. This translation was useful for comparing our abstract

2934: definitions of anonymity to more operational CSP-based definitions.

2935: In future work we hope to further explore the connections between the

2936: runs and systems framework and tools such as CCS, CSP, and the spi

2937: calculus~\cite{abadi97}.

2938: %

2939: In particular, we are interested in the

2940: recent work of Fournet and Abadi~\citeyear{fournet02}, who use the

2941: %

2942: %

2943: %

2944: %

2945: %

2946: %

2947: applied pi calculus to model {\em private authentication}, according to which

2948: a principal in a network is able to authenticate herself to

2949: %

2950: %

2951: another principal while remaining anonymous to other ``nearby''

2952: principals.

2953: A great deal of work in computer security has formalized

2954: %

2955: %

2956: %

2957: information-hiding properties using these tools. Such work often reasons

2958: about the knowledge of various agents in an informal way, and then

2959: tries to capture knowledge-based security properties

2960: using one of these formalisms.

2961: By describing canonical translations from these formalisms to the runs

2962: and systems framework, we hope to be able to demonstrate formally

2963: how such definitions of security do (or do not) capture notions of knowledge.

2964:

2965: \paragraph{Acknowledgments:}  We thank Dan Grossman, Dominic Hughes,

2966: %

2967: %

2968: %

2969: Vitaly Shmatikov, Paul Syverson, Vicky Weissman,

2970: %

2971: the CSFW reviewers

2972: (who were anonymous

2973: up to $P_{CSFW}$, the program committee),

2974: and the JCS referees,

2975: for useful discussions,

2976: %

2977: helpful criticism, and

2978: %

2979: %

2980: careful copy-editing.

2981:

2982: %

2983: %

2984: \bibliographystyle{chicago}

2985: %

2986: %

2987: %

2988: %

2989: %

2990: %

2991: %

2992: %

2993: %

2994: \bibliography{joe,z}

2995:

2996:

2997: \end{document}

2998:

2999:

3000:

3001:

3002:

3003: