1: \documentclass[12pt,times, english, a4paper]{article}
2: \usepackage{times}
3: \usepackage{graphicx}
4: \usepackage[latin1]{inputenc}
5: \usepackage{geometry}
6: \usepackage{amsthm}
7: \geometry{verbose,a4paper,tmargin=2.5cm,bmargin=3.5cm,lmargin=2cm,rmargin=2cm,footskip=1.5cm}
8: \usepackage{setspace}
9: \renewcommand{\baselinestretch}{1.4}
10: \renewcommand{\floatpagefraction}{.7}
11: \renewcommand{\qedsymbol}{\textbf{Q.E.D.}}
12: \hyphenpenalty=5000
13: \tolerance=1000
14: \newtheorem{lemma}{Lemma}[section]
15: \newtheorem{proposition}{Proposition}[section]
16: \newtheorem{theorem}{Theorem}[section]
17:
18:
19: \newenvironment{biography}[3][]{%
20: \footnotesize\unitlength 1mm\bigskip\bigskip\bigskip\parskip=0pt\par%
\rule{0pt}{39mm}\vspace{-39mm}\par% guarantees correct page breaking
22: \noindent\setbox0\hbox{\framebox(25,32){
23: #2
24: }}% box containing the frame
25: \ht0=37mm\count10=\ht0\divide\count10 by\baselineskip% calculates lines
26: \global\hangindent29mm\global\hangafter-\count10%
27: \hskip-28.5mm\setbox0\hbox to 28.5mm {\raise-30.5mm\box0\hss}%
28: \dp0=0mm\ht0=0mm\box0\noindent\bf [#3]\rm}{
29: \par\rm\normalsize}
30:
31:
32:
33: \begin{document}
34:
35: \title{\textbf{Statistical Modelling of Information Sharing: Community, Membership and Content}}
36:
37: \author{W.-Y. Ng,\hspace{.3cm}W.K. Lin,\hspace{.3cm}D.M. Chiu\\
38: Department of Information Engineering\\
39: The Chinese University of Hong Kong\\
40: \{wyng,wklin3,dmchiu\}@ie.cuhk.edu.hk}
41: \date{June 29, 2005}
42:
43: %\begin{singlespace}
44: \maketitle
45: %\end{singlespace}
46: %%
47: %%%%%%
48: \begin{abstract}
49: File-sharing systems, like many online and traditional information sharing
50: communities (e.g. newsgroups, BBS, forums, interest clubs), are dynamical
51: systems in nature. As peers get in and out of the system, the information
52: content made available by the prevailing membership varies continually in
53: amount as well as composition, which in turn affects all peers' join/leave
decisions. As a result, the dynamics of membership and information content
are strongly coupled, raising interesting questions about growth, sustenance
and stability.
57:
58: In this paper, we propose to study such communities with a simple statistical
59: model of an \textit{information sharing club}. Carrying their private payloads of
60: information goods as potential supply to the club, peers join or leave on the
61: basis of whether the information they demand is currently available.
62: Information goods are chunked and \textit{typed}, as in a file sharing system where
63: peers contribute different files, or a forum where messages are grouped by
64: topics or threads. Peers' demand and supply are then characterized by
65: statistical distributions over the type domain.
66:
This model reveals interesting critical behaviour with multiple equilibria. A
sharp growth threshold is derived: the club can grow towards a sustainable
equilibrium only if the value of a composite \textit{control parameter} is above the
threshold, and shrinks to emptiness otherwise. The control parameter
comprises the peer population size, the level of their
contributed supply, the club's efficiency in information search, the spread
of supply and demand over the type domain, and the goodness of match
between them.
75:
76: \end{abstract}
77: %%%%%
78: \section{Introduction}
The notion of a peer-to-peer system means different things
to different people. To some, it is a way to use commodity
personal computers to do the job of large and expensive servers
\cite{SETI}, \cite{ll2002}, \cite{hhbb2003}. Others build application layer multicast
systems out of it \cite{kazza}, \cite{bittorrent}. Almost all peer-to-peer
systems, however, have one thing in common: whether such a system comes
together depends on how many peers want to participate.\\
87:
88: The formation of such cooperation without central management
89: has its inherent advantages: it saves the cost of central
management; and, more importantly, it automatically adapts to
the needs (for example, in terms of time and scope) of the peers
who collectively form the club.
93: But what are the forces that attract peers
94: together? What would cause a peer-to-peer system to grow,
95: sustain itself, or fall apart? Are there some fundamental
96: reasons that apply to different peer-to-peer systems?\\
97:
The purpose of a peer-to-peer system is invariably to
share some resources or information. Economists distinguish between two
kinds of shared goods: rivalrous and non-rivalrous
goods. Rivalrous goods diminish when shared. Compute power,
102: storage and communication bandwidth are examples of
103: rivalrous goods. Many information goods, however,
104: are inherently non-rivalrous. In other words, they can be
105: readily replicated many times with little or no cost.\\
106:
107: Motivated by the above questions, we formulate a model for
108: a club where members share non-rivalrous information
109: goods. Conventional resources such as computers and bandwidth
110: are assumed to be abundant. In such a setting, the strength
111: of a club is determined by the amount as well as the
112: \emph{composition} of the information content made available by
the club's prevailing membership, and how well that content fits
the potential members' demands. Based on this simple
model, it is then possible to derive some very basic
conditions for a club to form and sustain itself. The model predicts
a critical population size above which enough peers
find matching interests to form a club. Furthermore,
the model can be used to understand the dynamics of
the content and membership, and whether and
how they lead to an equilibrium.\\
122:
123: Since the model is simple, it is also general enough to be
124: applied to many other information sharing paradigms.
125: Examples include web-based collaborative environments,
126: newsgroups where peers contribute their opinions about different
127: topics, or other forums or communities for information
128: sharing.\\
129:
130: The rest of the paper is organized as follows. Section 2
131: is devoted to modeling the peers in terms of their
132: demand and contribution, which then leads to a model of
133: a club in terms of its content.
Section 3 models peers' decisions to join or leave
a club, and derives the conditions for club
formation and other equilibrium properties.
137: Section 4 illustrates the properties of the model through
138: numerical examples. Section 5 discusses the contribution
139: of this model, the interpretation of various results,
140: as well as the limitations of our model. Finally, we
141: conclude and discuss future directions.
142: %%%%%%
143: \subsection{Related Works}
Many other papers have modelled incentives in
peer-to-peer systems and the resulting club dynamics.
\cite{gj2003} discusses private versus public goods,
and argues that messages shared in web forums are
private goods, suggesting that sharing is
not simply altruistic behaviour.
Several papers focus on how to relieve the
cost or congestion of rivalrous resources, such as
bandwidth and other resources that a peer has to consume.
For example, \cite{kstt2002} suggests that peers may contribute
in order to relieve bandwidth stress when they share,
so that their actions benefit
themselves.
157: \cite{bas2003}, \cite{gbml2001}, \cite{rrsf2003} use
158: game theoretic approaches to model and understand
159: the sharing incentives in peer-to-peer networks.
160: These works also discuss incentive-compatible solutions to
161: peer-to-peer systems.
162: In comparison, our model brings out a new angle that is
163: complementary and somewhat orthogonal to the above works.\\
164:
165: Our work is in part motivated by \cite{fpcs2004} in which
166: a general model is used to explain the vitality of a peer-to-peer network
167: when different types of
168: peers are involved. The type of a peer is characterized by
169: the peer's generosity, which is used as a threshold to
170: determine when a peer would contribute to the club
rather than free-ride. Their model does not explicitly
capture different types of information goods, so
the motivation for sharing remains rather abstract.\\
174:
Our work attempts to explain the motivation of the peers by characterizing
the different types of peers by their contribution of, and demand for,
different types of information goods. A peer's decision to join a club can
then be related to the extent to which the club satisfies the peer's interest
(demand). This sheds new (or at least different) light on what brings peers
together in the first place.
181: %%%%%%
182: \section{The Information Sharing Club (ISC) Model} \label{club_model}
183:
184:
185:
186: The Information Sharing Club (ISC) model has three basic components.
187: %
188: First, a population of $N$ \emph{peers}, denoted by $\mathcal{N}$,
may freely join or leave the club at any time of their own will.
Each peer carries a payload of information goods, which he
shares with other current members only while he is a member of the club.\\
192:
Second, information goods are chunked and \emph{typed}, in the same way that versions of different
files are served in a file sharing system, or messages on various
topics are hosted in a forum. Information chunks of the same type are not
differentiated: an instance of information demand specifies
the chunk type only and is satisfied by \emph{any} chunk of that type, as when a
request for a file is satisfied by any copy of it, or when an information query
returns any piece of information of the specified class (e.g. as
implied by the query criteria).\\
201:
Third, the club maintains a platform on which information chunks shared by
members are stored and searched. A perfect membership system
204: makes sure that only requests by current members are processed. A request
205: may comprise one or more instances of demand, and is successfully
206: served when all instances are satisfied. However, the search may not
207: be perfect and is conducted with efficiency $\rho \in (0, 1]$, defined as the
208: probability that any shared chunk is actually found in time by the platform in response to a request.\\
209:
210: We make probabilistic assumptions about both demand and supply:
211: peer $i$'s demand instances as well as the content of his private payload,
212: in terms of chunk types, are drawn from statistical distributions.
213: %
214: Specifically, we assume
215: peer $i$'s private payload comprises $K_i \ge 0$
216: chunks drawn from distribution $g_i(s),$
217: $s\in\mathcal{S}\stackrel{\triangle}{=}\{1,2,\ldots\}$ where
218: $\mathcal{S}$ is the set of all types.
The type distribution of the combined payload of any group of members (a \emph{membership})
$\mathcal{G}\subset\mathcal{N}$ is then given by
221: $$
222: g_{\mathcal{G}}(s) \;\stackrel{\triangle}{=}\;
223: \frac{\sum_{i\in \mathcal{G}} K_i\;g_i(s)}{\sum_{i\in \mathcal{G}} K_i},\;\;
224: \mathcal{G}\subset\mathcal{N}
225: $$
Without loss of generality, we index the types so that the \emph{aggregate supply function}
$g(s)\stackrel{\triangle}{=} g_{\mathcal{N}}(s)$
is monotonically non-increasing. The type variable $s$ may then
229: be interpreted as a \emph{supply rank (s-rank)}. In other words,
230: $s=1$ and $s=|\mathcal{S}|$ denote the most and least supplied chunk types
231: respectively.
232: %
233: Likewise, we define the \emph{aggregate demand function}
234: $h(s)\stackrel{\triangle}{=}h_{\mathcal{N}}(s)$ where
235: $$
236: h_{\mathcal{G}}(s)\stackrel{\triangle}{=}\;
237: \frac{\sum_{i\in \mathcal{G}} M_i\;h_i(s)}{\sum_{i\in \mathcal{G}} M_i},\;\;
238: \mathcal{G}\subset\mathcal{N}
239: $$
240: as peer $i$ generates demand instances at a rate of $M_i$ chunks per unit
241: time, drawn from distribution $h_i(s)$, $s\in\mathcal{S}$\footnote{Another
242: possible ranking of the types is \emph{popularity rank (p-rank)}, which ranks
243: the types according to the aggregate demand instead. In cases when the p-rank
244: is more natural to work with, such as when supply is being driven by demand
245: and p-ranks are more readily known, we may derive the requisite demand
246: functions in s-rank as
247: $$
248: h_i(s) \stackrel{\triangle}{=} \sum_r\frac{\phi(r, s)}{f(r)}\;f_i(r)
249: $$
where $f_i(r)$ is peer $i$'s demand distribution over the p-rank domain,
$f(r)$ is the corresponding aggregate demand, and
$\phi(r,s)$ is the joint distribution of the two rank measures that captures
how well supply follows demand. (Perfect following would imply
$\phi(r,s)=0\;\forall r\neq s$.)}.\\
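As a minimal sketch of these definitions (Python; the helper names \texttt{mixture} and \texttt{supply\_rank} are hypothetical), the group distributions $g_{\mathcal{G}}(s)$ and $h_{\mathcal{G}}(s)$ are weighted mixtures of the members' distributions, with weights $K_i$ and $M_i$ respectively. Sorting the types by $g(s)$ then yields the s-rank. The payload rows are those of the music example in Table~\ref{table:supply_example}. The unequal payload sizes used for the two-peer group are made up for illustration.

\begin{verbatim}
```python
"""Sketch: group supply as a K_i-weighted mixture (hypothetical names)."""

TYPES = ["Pop", "Classical", "Oldies", "World", "Alternative"]

# Per-peer payload distributions g_i(s), rows of the supply table.
G_PEERS = {
    "Alfred":   [0.4, 0.3, 0.1, 0.1, 0.1],
    "Bob":      [0.4, 0.2, 0.2, 0.15, 0.05],
    "Connie":   [0.3, 0.3, 0.2, 0.1, 0.1],
    "David":    [0.2, 0.3, 0.3, 0.15, 0.05],
    "Eric":     [0.5, 0.05, 0.2, 0.15, 0.1],
    "Florence": [0.1, 0.4, 0.1, 0.1, 0.3],
}


def mixture(dists, weights):
    """Weighted average of distributions over the same type list.

    With weights K_i this is g_G(s); with weights M_i it is h_G(s).
    """
    total = sum(weights)
    return [sum(w * d[s] for d, w in zip(dists, weights)) / total
            for s in range(len(dists[0]))]


def supply_rank(types, g):
    """Type labels ordered by non-increasing supply: s = 1, 2, ..."""
    order = sorted(range(len(types)), key=lambda j: -g[j])
    return [types[j] for j in order]


if __name__ == "__main__":
    # Whole population with identical payload sizes K_i = 1.
    g = mixture(list(G_PEERS.values()), [1] * len(G_PEERS))
    print([round(x, 3) for x in g])
    print(supply_rank(TYPES, g))
    # A two-peer group with hypothetical unequal payloads K = 3 and 1.
    pair = mixture([G_PEERS["Alfred"], G_PEERS["Florence"]], [3, 1])
    print([round(x, 3) for x in pair])
```
\end{verbatim}

With identical $K_i$'s the mixture reduces to the unweighted average, which reproduces (up to rounding) the aggregate row of Table~\ref{table:supply_example}.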
254:
For current club membership $\mathcal{C}$, the expected number of chunks of
type $s$ being shared is $\mu_\mathcal{C}(s) \stackrel{\triangle}{=}
n\; k_\mathcal{C} \; g_{\mathcal{C}}(s)$
258: where $n\stackrel{\triangle}{=}|\mathcal{C}|$ is the membership size and
259: $k_\mathcal{C}\stackrel{\triangle}{=}
260: \sum_{i\in\mathcal{C}} K_i/|\mathcal{C}|$ is the payload size
261: averaged over the current club membership. Conditioning on the membership
262: size, we have
263: $$
264: \mu_n(s) = n\; k \; g(s)
265: $$
266: where $k \stackrel{\triangle}{=} \sum_{i=1}^{N} K_i/N > 0$ is the payload size averaged over all peers.
267: %
We assume further that members' contents are drawn independently, so that
the actual total number of type $s$ chunks being shared is (approximately)
Poisson distributed with mean $\mu_n(s)$. Since each shared chunk is found
independently with probability $\rho$, the number of type $s$ chunks actually
found is Poisson with mean $\rho\,\mu_n(s)$. Consequently, a demand instance for chunk type $s$ fails
with probability $e^{-\mu_n(s)\;\rho} = e^{-n \;k \;g(s)\;\rho},$ and the
average success rate of peer $i$'s demand being satisfied in a club of size
$n$ is therefore
274: \begin{eqnarray}
275: p_i(n)&\stackrel{\triangle}{=} E_{h_i(s)}[1-e^{-n \;k \;g(s)\;\rho}]\label{eqn:download_probability}
276: \end{eqnarray}
where $E[\cdot]$ is the expectation operator. This is consistent with the non-rivalrous assumption, as
it does not depend on the level of demand for chunk type $s$. \\
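The success probability (\ref{eqn:download_probability}) is straightforward to evaluate. The sketch below (Python; the function names are hypothetical) computes $p_i(n)$. For comparison, it also computes the failure probability $(1-\rho\,g(s))^{nk}$ that would hold exactly if all $nk$ shared chunks were drawn independently from $g(s)$, an assumption made only for this illustration. For large $nk$ and small $\rho\,g(s)$ the two agree, which is the Poisson approximation used above.

\begin{verbatim}
```python
import math


def success_prob(h_i, g, n, k, rho):
    """p_i(n) = E_{h_i}[1 - exp(-n k g(s) rho)] (Poisson approximation)."""
    return sum(h * (1.0 - math.exp(-n * k * gs * rho))
               for h, gs in zip(h_i, g))


def success_prob_iid(h_i, g, n, k, rho):
    """Comparison case (assumed for illustration): all n*k shared chunks
    are drawn i.i.d. from g and each is found with probability rho, so a
    type-s demand fails with probability (1 - rho*g(s))**(n*k)."""
    return sum(h * (1.0 - (1.0 - rho * gs) ** (n * k))
               for h, gs in zip(h_i, g))


if __name__ == "__main__":
    g = [0.317, 0.258, 0.18, 0.125, 0.12]   # aggregate supply, s-rank order
    h_alfred = [0.1, 0.4, 0.3, 0.1, 0.1]    # one peer's demand, s-rank order
    for n in (1, 3, 6):
        print(n, round(success_prob(h_alfred, g, n, k=1.0, rho=1.0), 3))
```
\end{verbatim}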
279:
280:
281: \subsection{An example: music information sharing club}
282: Tables \ref{table:supply_example} and \ref{table:demand_example} depict an
283: example of six peers sharing music information of five different types. For
284: simplicity, we assume identical payload sizes (identical $K_i$'s) and demand
285: rates (identical $M_i$'s) so that the aggregate distributions are simple
286: unweighted averages of the peers' distributions. Table
287: \ref{table:example_rank} gives the resulting s-ranks and p-ranks of the five
288: music types. The information may be news and messages about the different
289: music types when the club is a discussion forum in nature, or musical audio
290: files when it is a file sharing platform.
291: \begin{singlespace}
292: \begin{table}[hbtp]
293: \begin{center}
294: \caption{Distributions of peers' private payloads, $g_i(s)$}\label{table:supply_example}
295: \begin{tabular}{|r|p{1.8cm}|p{1.8cm}|p{1.8cm}|p{1.8cm}|p{1.8cm}|}
296: \hline
297: &
298: Pop&
299: Classical&
300: Oldies&
301: World&
302: Alternative
303: \tabularnewline
304: \hline
305: \hline
306: Alfred&
307: $0.4$&
308: $0.3$&
309: $0.1$&
310: $0.1$&
311: $0.1$
312: \tabularnewline
313: \hline
314: Bob&
315: $0.4$&
316: $0.2$&
317: $0.2$&
318: $0.15$&
319: $0.05$
320: \tabularnewline
321: \hline
322: Connie&
323: $0.3$&
324: $0.3$&
325: $0.2$&
326: $0.1$&
327: $0.1$
328: \tabularnewline
329: \hline
330: David&
331: $0.2$&
332: $0.3$&
333: $0.3$&
334: $0.15$&
335: $0.05$
336: \tabularnewline
337: \hline
338: Eric&
339: $0.5$&
340: $0.05$&
341: $0.2$&
342: $0.15$&
343: $0.1$
344: \tabularnewline
345: \hline
346: Florence&
347: $0.1$&
348: $0.4$&
349: $0.1$&
350: $0.1$&
351: $0.3$
352: \tabularnewline
353: \hline
354: \hline
Aggregate supply, $g(s)$&
356: $0.317$&
357: $0.258$&
358: $0.18$&
359: $0.125$&
360: $0.12$
361: \tabularnewline
362: \hline
363: \end{tabular}
364: \end{center}
365: \end{table}
366:
367: \begin{table}[hbtp]
368: \begin{center}
\caption{Distributions of peers' demand, $h_i(s)$}\label{table:demand_example}
370: \begin{tabular}{|r|p{1.8cm}|p{1.8cm}|p{1.8cm}|p{1.8cm}|p{1.8cm}|}
371: \hline
372: &
373: Pop&
374: Classical&
375: Oldies&
376: World&
377: Alternative
378: \tabularnewline
379: \hline
380: \hline
381: Alfred&
382: $0.1$&
383: $0.4$&
384: $0.3$&
385: $0.1$&
386: $0.1$
387: \tabularnewline
388: \hline
389: Bob&
390: $0.05$&
391: $0.5$&
392: $0.1$&
393: $0.3$&
394: $0.05$
395: \tabularnewline
396: \hline
397: Connie&
398: $0.1$&
399: $0.2$&
400: $0.3$&
401: $0.2$&
402: $0.2$
403: \tabularnewline
404: \hline
405: David&
406: $0.1$&
407: $0.4$&
408: $0.3$&
409: $0.15$&
410: $0.05$
411: \tabularnewline
412: \hline
413: Eric&
414: $0.1$&
415: $0.4$&
416: $0.2$&
417: $0.2$&
418: $0.1$
419: \tabularnewline
420: \hline
421: Florence&
422: $0.2$&
423: $0.3$&
424: $0.1$&
425: $0.2$&
426: $0.2$
427: \tabularnewline
428: \hline
429: \hline
Aggregate demand, $h(s)$&
431: $0.108$&
432: $0.367$&
433: $0.217$&
434: $0.192$&
435: $0.117$
436: \tabularnewline
437: \hline
438: \end{tabular}
439: \end{center}
440: \end{table}
441: \end{singlespace}
442: \begin{singlespace}
443: \begin{table}[hbtp]
444: \begin{center}
\caption{Supply rank and popularity rank of the five music types}\label{table:example_rank}
446: \begin{tabular}{|r|p{1.8cm}|p{1.8cm}|p{1.8cm}|p{1.8cm}|p{1.8cm}|}
447: \hline
448: &
449: 1&
450: 2&
451: 3&
452: 4&
453: 5
454: \tabularnewline
455: \hline
456: \hline
457: Supply rank ($s$)&
458: Pop&
459: Classical&
460: Oldies&
461: World&
462: Alternative
463: \tabularnewline
464: \hline
465: Popularity rank ($r$)&
466: Classical&
467: Oldies&
468: World&
469: Alternative&
470: Pop
471: \tabularnewline
472: \hline
473: \end{tabular}
474: \end{center}
475: \end{table}
476: \end{singlespace}
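The p-rank to s-rank conversion given in the footnote of Section~\ref{club_model} can be checked on this example. In the sketch below (Python; the names are hypothetical), the joint distribution $\phi(r,s)$ is assumed to be deterministic: all the mass $f(r)$ of the type at p-rank $r$ sits at the s-rank of the same type, as read off Table~\ref{table:example_rank}. Converting Alfred's demand from p-rank order back to s-rank order then recovers his row of Table~\ref{table:demand_example}.

\begin{verbatim}
```python
S_ORDER = ["Pop", "Classical", "Oldies", "World", "Alternative"]
P_ORDER = ["Classical", "Oldies", "World", "Alternative", "Pop"]

# Aggregate demand f(r) in p-rank order (exact averages of the table).
F_AGG = [x / 6 for x in (2.2, 1.3, 1.15, 0.7, 0.65)]

# Assumed deterministic joint distribution phi[r][s] (0-indexed ranks):
# the mass f(r) sits on the s-rank of the same music type.
PHI = [[F_AGG[r] if S_ORDER[s] == P_ORDER[r] else 0.0
        for s in range(len(S_ORDER))]
       for r in range(len(P_ORDER))]


def to_s_rank(f_i, f, phi):
    """h_i(s) = sum_r phi(r, s) / f(r) * f_i(r)."""
    return [sum(phi[r][s] / f[r] * f_i[r] for r in range(len(f)))
            for s in range(len(phi[0]))]


if __name__ == "__main__":
    f_alfred = [0.4, 0.3, 0.1, 0.1, 0.1]   # Alfred's demand, p-rank order
    print(to_s_rank(f_alfred, F_AGG, PHI))
```
\end{verbatim}

With a diagonal $\phi$ (perfect following) the same function leaves $f_i$ unchanged, since p-rank and s-rank then coincide.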
A peer's success rate depends on the types of goods he demands on the one
hand, viz. $h_i(s)$, and on the aggregate supply $g(s)$ on the other. For
instance, with $n\,k\rho=6$ (e.g. a full club, $n=N=6$, with $k\rho=1$), Alfred's average success rate is
\[
p_{\textrm{\scriptsize{Alfred}}} = 1-\bigl(0.1\,e^{-6\,(0.317)} + 0.4\,e^{-6\,(0.258)} + \cdots + 0.1\,e^{-6\,(0.12)}\bigr) \approx 0.70
\]
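The arithmetic above can be reproduced with a short sketch (Python; the names are hypothetical). It uses the rounded aggregate supply of Table~\ref{table:supply_example} and assumes $n=6$ and $k\rho=1$, the values implied by the exponents in the expression for Alfred. The same function gives the success rate of every peer in the example.

\begin{verbatim}
```python
import math

G = [0.317, 0.258, 0.18, 0.125, 0.12]   # rounded g(s), s-rank order
H_PEERS = {                              # h_i(s), s-rank order
    "Alfred":   [0.1, 0.4, 0.3, 0.1, 0.1],
    "Bob":      [0.05, 0.5, 0.1, 0.3, 0.05],
    "Connie":   [0.1, 0.2, 0.3, 0.2, 0.2],
    "David":    [0.1, 0.4, 0.3, 0.15, 0.05],
    "Eric":     [0.1, 0.4, 0.2, 0.2, 0.1],
    "Florence": [0.2, 0.3, 0.1, 0.2, 0.2],
}


def success_rate(h_i, g, n, k_rho):
    """1 - sum_s h_i(s) exp(-n k rho g(s)), as in the text."""
    return 1.0 - sum(h * math.exp(-n * k_rho * gs)
                     for h, gs in zip(h_i, g))


if __name__ == "__main__":
    for name, h_i in H_PEERS.items():
        print(name, round(success_rate(h_i, G, n=6, k_rho=1.0), 3))
```
\end{verbatim}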
483:
484: %%%%%%
485:
486: \section{Dynamic equilibrium of membership and content}
487:
488: Generally speaking, peers would join the club as members and share their
489: private payloads as long as their requests are sufficiently met. We make two
490: simplifying assumptions here: (1) a peer would join as long as a single
491: current request is met, and leave otherwise; and (2) any request comprises
\mbox{$d \ge 1$} instances of demand. Assuming the instances are satisfied independently, the probability that peer $i$ would
join when membership is $\mathcal{C}$ is then $P_{\mathcal{C},i}
\stackrel{\triangle}{=} {p^d_{\mathcal{C},i}}$ where $p_{\mathcal{C},i}$ is
the probability that an instance of peer $i$'s demand is satisfied when
membership is $\mathcal{C}$. Conditioning on the membership size $n$, the expected joining probability of
peer $i$ is approximated by
\begin{equation}
P_i(n) \stackrel{\triangle}{=} p_i(n)^d
\end{equation}
501:
502: Membership dynamics and content dynamics are closely coupled: as peers join
503: and leave, they alter the total shared content, inducing others to revise
their join/leave decisions. The membership size keeps changing unless the
flows of peers into and out of the club are balanced.
506:
507: Consequently, we may define a \emph{statistical equilibrium membership size}
508: $n_{eq}$ as the solution of the balance condition
509: \begin{eqnarray}
510: (N-n_{eq}) \bar P(n_{eq})& = &n_{eq} (1 - \bar P(n_{eq})) \nonumber\\
511: \Leftrightarrow \hspace{1.0in} \bar P(n_{eq}) &= &\frac{n_{eq}}{N} \label{eqn:dynamic_equilibrium}
512: \end{eqnarray}
513: where $\bar P(n) = \frac{1}{N} \sum_{i=1}^{N}{P_i(n)}$ is the joining
514: probability averaged over all peers and all possible memberships of size $n$.
Note that equation (\ref{eqn:dynamic_equilibrium}) is a fixed-point
equation, reflecting the coupled dynamics of membership and
content.
518: %
519: Further, the stability condition for a fixed point $n_{eq}$ is simply
520: \begin{equation}
521: \left.\frac{\partial \bar P(n)}{\partial n}\right|_{n=n_{eq}} < \frac{1}{N} \label{eqn:start_up_condition}
522: \end{equation}
Note that an empty membership is always a fixed point (since $\bar P(0)=0$), and it is
stable for sufficiently small $N$, in which case autonomous growth from an empty or small membership is very difficult,
if not impossible.
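To see where the stability condition (\ref{eqn:start_up_condition}) comes from, write $\Delta(n)$ (notation introduced here) for the expected net inflow of peers at membership size $n$, and linearize it about a fixed point $n_{eq}$, where $\Delta(n_{eq})=0$:
\begin{eqnarray*}
\Delta(n) &\stackrel{\triangle}{=}& (N-n)\,\bar P(n) - n\,\bigl(1-\bar P(n)\bigr) \;=\; N\,\bar P(n) - n,\\
\Delta(n_{eq}+\epsilon) &\approx& \epsilon\,\left(N\left.\frac{\partial \bar P(n)}{\partial n}\right|_{n=n_{eq}} - 1\right).
\end{eqnarray*}
A small displacement $\epsilon$ thus produces a net flow of the opposite sign, and hence decays, exactly when the bracketed factor is negative, which is condition (\ref{eqn:start_up_condition}).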
526: \mbox{}\\
527: \mbox{}\\
528:
\begin{theorem}[Empty Membership Instability]\label{theorem:instability}
Empty membership is unstable if and only if requests are simple,
viz. $d=1$, and the \emph{composite control parameter} $\pi$ satisfies
\begin{equation}
\pi\;\stackrel{\triangle}{=}\;N \; k \; \rho \sum_s h(s) \; g(s) > 1\;. \label{eqn:critical_condition}
\end{equation}
\end{theorem}
536: \begin{proof}
537:
538: Consider:
\[
\bar P(n) \,= \,\frac{1}{N} \sum_{i=1}^{N}{P_i(n)} \,= \,\frac{1}{N}\sum_{i=1}^{N}p_i(n)^d, \qquad d\ge1\;.
\]
542: Differentiating with respect to $n$:
543: \begin{eqnarray*}
544: \frac{N}{d}\frac{\partial \bar P(n)}{\partial n} &=&
545: \sum_{i=1}^N p_i(n)^{d-1}\frac{\partial p_i(n)}{\partial n}\\
546: \Leftrightarrow\hspace{.5in}
547: \frac{N}{d k \rho}\frac{\partial \bar P(n)}{\partial n} &=&
548: \sum_{i=1}^N p_i(n)^{d-1} E_{h_i(s)}[e^{-n k \rho g(s)}g(s)]
549: \end{eqnarray*}
Since $p_i(0) = 0$, the factor $p_i(n)^{d-1}$ vanishes at $n=0$ for $d > 1$, so that $\left.\partial \bar P(n)/\partial n\,\right|_{n=0} = 0 < 1/N$;
in this case an empty membership is always stable. When $d=1$, evaluating at $n=0$ gives
\begin{eqnarray*}
\left.\frac{N}{k \rho}\frac{\partial \bar P(n)}{\partial n}\right|_{n=0} &=&\sum_{i=1}^N E_{h_i(s)}\,[g(s)] =N \sum_s h(s) g(s)\\
\Leftrightarrow\hspace{.5in} \left.\frac{\partial \bar P(n)}{\partial n}\right|_{n=0} &=& k \rho \sum_s \;h(s)g(s)
\end{eqnarray*}
where the last equality in the first line uses $\sum_i h_i(s) = N\,h(s)$, which holds when the demand rates $M_i$ are identical.
Comparing with the stability condition (\ref{eqn:start_up_condition}) for the empty membership fixed point $n_{eq} = 0$ yields (\ref{eqn:critical_condition}).
557: \end{proof}
558:
559: \vspace{0.3cm}
560: %%%%%%%
561: %%20050301 wyng
562:
563: In our model, we regard empty membership instability as a necessary condition for autonomous growth from an
empty or small club membership. The above theorem implies that favourable
conditions are a large population $N$, large $k$ (contribution from members),
large $\rho$ (search efficiency) and
a large value of $\sum_s h(s)g(s)$, the inner product of $h(s)$ and $g(s)$.
568: Note that
569: $$
570: \sum_s h(s)g(s)\equiv \Vert h \Vert\,\Vert g \Vert \cdot \langle h(s),g(s)\rangle
571: $$
572: where $\Vert h \Vert$ and $\Vert g \Vert$ are the 2-norms of
573: $h(s)$ and $g(s)$ respectively, and \mbox{$\langle h(s),g(s)\rangle$} is their normalized
574: inner product which measures their similarity, or goodness of match.
Other favourable conditions are therefore a good match between
aggregate demand and supply, and \emph{skewness} (i.e. small
spread) of their distributions over the chunk types.
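The decomposition can be evaluated for the music example of Section~\ref{club_model}. The sketch below (Python; the names are hypothetical) uses the exact averages behind the aggregate rows of Tables~\ref{table:supply_example} and \ref{table:demand_example}. It recovers the critical value $k\rho = (N\sum_s h(s)g(s))^{-1} \approx 0.808$ for $N=6$.

\begin{verbatim}
```python
import math

N = 6
H = [x / 6 for x in (0.65, 2.2, 1.3, 1.15, 0.7)]  # h(s), s-rank order
G = [x / 6 for x in (1.9, 1.55, 1.1, 0.75, 0.7)]  # g(s), s-rank order


def inner(a, b):
    """Plain inner product sum_s a(s) b(s)."""
    return sum(x * y for x, y in zip(a, b))


def control_parameter(n_peers, k_rho, h, g):
    """pi = N k rho sum_s h(s) g(s)."""
    return n_peers * k_rho * inner(h, g)


def match_decomposition(h, g):
    """Return (||h||, ||g||, normalized inner product of h and g)."""
    norm_h = math.sqrt(inner(h, h))
    norm_g = math.sqrt(inner(g, g))
    return norm_h, norm_g, inner(h, g) / (norm_h * norm_g)


if __name__ == "__main__":
    norm_h, norm_g, match = match_decomposition(H, G)
    print("||h|| =", round(norm_h, 3), "||g|| =", round(norm_g, 3),
          "match =", round(match, 3))
    print("critical k*rho =", round(1.0 / (N * inner(H, G)), 3))
    for k_rho in (0.5, 1.0, 2.0):
        print(k_rho, round(control_parameter(N, k_rho, H, G), 3))
```
\end{verbatim}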
578: %%%%%%
579: %20050226
580: \subsection{Music information sharing club example with simple requests}
581:
Figure (\ref{figure:club_dynamics}) plots $\bar P(n)$ against the membership size $n$ for the music
information sharing club example, for four values of $k\rho$ in the simple
request case, viz. $d=1$.
585:
586: \begin{figure}[h!]
587: \begin{center}
588: {\scriptsize
589: \begin{picture}(0,0)%
590: \includegraphics{03_plot_example_club}%
591: \end{picture}%
592: \begingroup
593: \setlength{\unitlength}{0.0200bp}%
594: \begin{picture}(16200,16200)(0,0)%
595: \put(1650,1650){\makebox(0,0)[r]{\strut{} 0}}%
596: \put(1650,4450){\makebox(0,0)[r]{\strut{} 0.2}}%
597: \put(1650,7250){\makebox(0,0)[r]{\strut{} 0.4}}%
598: \put(1650,10050){\makebox(0,0)[r]{\strut{} 0.6}}%
599: \put(1650,12850){\makebox(0,0)[r]{\strut{} 0.8}}%
600: \put(1650,15650){\makebox(0,0)[r]{\strut{} 1}}%
601: \put(1925,1100){\makebox(0,0){\strut{} 0}}%
602: \put(4258,1100){\makebox(0,0){\strut{} 1}}%
603: \put(6592,1100){\makebox(0,0){\strut{} 2}}%
604: \put(8925,1100){\makebox(0,0){\strut{} 3}}%
605: \put(11258,1100){\makebox(0,0){\strut{} 4}}%
606: \put(13592,1100){\makebox(0,0){\strut{} 5}}%
607: \put(15925,1100){\makebox(0,0){\strut{} 6}}%
608: \put(550,8650){\rotatebox{90}{\makebox(0,0){\strut{}$\bar P(n)$ or $n/N$}}}%
\put(8925,275){\makebox(0,0){\strut{}Membership size $n$}}%
610: \put(4445,6340){\makebox(0,0)[l]{\strut{}$A(1.9, 0.315)$}}%
611: \put(13592,13060){\makebox(0,0)[l]{\strut{}$B(5.1, 0.845)$}}%
612: \put(6942,11170){\makebox(0,0)[l]{\strut{}$k\;\rho = 2.0$}}%
613: \put(12658,10890){\makebox(0,0)[l]{\strut{}$k\;\rho = 1.0$}}%
614: \put(12658,8650){\makebox(0,0)[l]{\strut{}$k\;\rho = 0.808$}}%
615: \put(13592,6970){\makebox(0,0)[l]{\strut{}$k\;\rho = 0.5$}}%
616: \end{picture}%
617: \endgroup
618: } \caption{The music information sharing club example}
619: \label{figure:club_dynamics}
620: \end{center}
621: \end{figure}
622: %
For $k\rho=2$, the model predicts that an empty club is unstable:
any disturbance, e.g. voluntary sharing or contribution, would trigger
growth. The club would move rapidly towards the fixed point
$n=5.1$, where $\bar P(n)=5.1/6=0.85$, and sustain itself around there.
The peers are active members over $80\%$ of the time on average.
%
For $k\rho=1$, an empty club is again unstable, but the club sustains
itself at a smaller average size of $n=1.9$. With less supply and/or a less
efficient search function, peers are active only around $30\%$ of the time on average.
%
For $k\rho=0.5$, an empty club becomes stable. Leaving members
always outnumber joining peers on average, so that any positive membership
is transient. Peers are almost always inactive.
%
Finally, $k\rho=(N \sum_s h(s) g(s))^{-1}=0.808$ (i.e. $\pi=1$) is the critical case in which
an empty club is marginally stable.\\
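The fixed points quoted above can be located numerically. The sketch below (Python; \texttt{p\_bar} and \texttt{iterate\_membership} are hypothetical names) computes $\bar P(n)$ from the per-peer demand rows of Table~\ref{table:demand_example} with $d=1$. It then iterates $n \leftarrow N\,\bar P(n)$, i.e. the expected next membership if every peer re-decides once per round, which is an assumption made here. Such an iteration settles on a stable fixed point of (\ref{eqn:dynamic_equilibrium}). Started from an exactly empty club it stays empty, since $\bar P(0)=0$.

\begin{verbatim}
```python
import math

N = 6
G = [x / 6 for x in (1.9, 1.55, 1.1, 0.75, 0.7)]  # g(s), s-rank order
H_PEERS = [                                        # h_i(s), s-rank order
    [0.1, 0.4, 0.3, 0.1, 0.1],     # Alfred
    [0.05, 0.5, 0.1, 0.3, 0.05],   # Bob
    [0.1, 0.2, 0.3, 0.2, 0.2],     # Connie
    [0.1, 0.4, 0.3, 0.15, 0.05],   # David
    [0.1, 0.4, 0.2, 0.2, 0.1],     # Eric
    [0.2, 0.3, 0.1, 0.2, 0.2],     # Florence
]


def p_bar(n, k_rho, d=1):
    """Average joining probability: mean over peers of p_i(n)**d."""
    total = 0.0
    for h_i in H_PEERS:
        p_i = sum(h * (1.0 - math.exp(-n * k_rho * g))
                  for h, g in zip(h_i, G))
        total += p_i ** d
    return total / len(H_PEERS)


def iterate_membership(n0, k_rho, d=1, rounds=500):
    """Mean-field rounds n <- N * p_bar(n), started from n0."""
    n = n0
    for _ in range(rounds):
        n = N * p_bar(n, k_rho, d)
    return n


if __name__ == "__main__":
    for k_rho in (2.0, 1.0, 0.808, 0.5):
        print(k_rho, round(iterate_membership(N, k_rho), 2))
```
\end{verbatim}

Starting from a full club ($n_0=N$) approaches the stable fixed point from above. Starting from a tiny positive $n_0$ probes whether the empty club is unstable.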
640:
641:
642: \begin{figure}[h!]
643: \begin{center}
644: {\scriptsize
645: \begin{picture}(0,0)%
646: \includegraphics{03_plot_direction_vector_grid_based}%
647: \end{picture}%
648: \begingroup
649: \setlength{\unitlength}{0.0200bp}%
650: \begin{picture}(16200,16200)(0,0)%
651: \put(1650,1650){\makebox(0,0)[r]{\strut{} 0}}%
652: \put(1650,4450){\makebox(0,0)[r]{\strut{} 0.2}}%
653: \put(1650,7250){\makebox(0,0)[r]{\strut{} 0.4}}%
654: \put(1650,10050){\makebox(0,0)[r]{\strut{} 0.6}}%
655: \put(1650,12850){\makebox(0,0)[r]{\strut{} 0.8}}%
656: \put(1650,15650){\makebox(0,0)[r]{\strut{} 1}}%
657: \put(1925,1100){\makebox(0,0){\strut{} 0}}%
658: \put(4725,1100){\makebox(0,0){\strut{} 0.2}}%
659: \put(7525,1100){\makebox(0,0){\strut{} 0.4}}%
660: \put(10325,1100){\makebox(0,0){\strut{} 0.6}}%
661: \put(13125,1100){\makebox(0,0){\strut{} 0.8}}%
662: \put(15925,1100){\makebox(0,0){\strut{} 1}}%
663: \put(550,8650){\rotatebox{90}{\makebox(0,0){\strut{}$\bar P(n)$ or $n/N$}}}%
664: \put(8925,275){\makebox(0,0){\strut{}Normalized number of peers $n/N$}}%
665: \put(3605,13550){\makebox(0,0)[l]{\strut{}\large{\textsf{Growth Phase}}}}%
666: \put(10045,5850){\makebox(0,0)[l]{\strut{}\large{\textsf{Shrinkage Phase}}}}%
667: \put(6125,6270){\rotatebox{45}{\makebox(0,0)[l]{\strut{}\large{\textsf{Phase Boundary}}}}}%
668: \end{picture}%
669: \endgroup
670:
671: }
672: \caption{Phase diagram of club dynamics with direction field}
673: \label{figure:club_direction_field}
674: \end{center}
675: \end{figure}
676:
It is important to note that the above analysis concerns the average case. The
actual dynamics of a realization of the club membership over time,
$\mathcal{C}(t)\subset \mathcal{N}$, would trace a sample path
$(|\mathcal{C}(t)|,P_{\mathcal{C}(t)}(n))$ that fluctuates around the
corresponding $\bar P(n)$ curve\footnote{The fluctuation, or departure from
the average case, would depend on the extent and rate of mixing, viz., the
stochasticity of the club membership. Generally speaking, a large number of
active peers with strong flows both into and out of the club would stay close
to the average case with less fluctuation. Otherwise a sample path may
actually get stuck in a niche, self-sufficient club that sees neither peers
joining nor members leaving.}. However, the family of $\bar P(n)$ curves for
all $\pi$ values defines a direction field of the average directions of the forces
that act upon any sample path. The average direction is towards growth above
the $n/N$ diagonal, and towards shrinkage below it, as shown in figure
(\ref{figure:club_direction_field}). In other words, the $n/N$ diagonal is a
boundary between two phases of the club dynamics: a growth phase for the club
states above it and a shrinkage phase for those below. This is a powerful way
to visualize the club dynamics, especially when $\pi$ may vary over time in
more complex cases.
696: %20050226
697:
698: \subsection{Critical behaviour and multiple equilibria}
699:
700: %20050226
Note that $p_i(0)=0$ and $p_i(n)$ is bounded and concave increasing in $n$.
When $d=1$, $\bar P(n)$ is bounded and concave increasing in $n$ as well, so that
$N\bar P(n)-n$ is concave and vanishes at $n=0$. Consequently, there is at most one positive fixed point, and it is stable whenever it exists. Theorem
\ref{theorem:instability} establishes a sharp threshold for $\pi$, a
composite \emph{control parameter} of the club as a dynamical system. The
club would stabilize at an empty membership when $\pi < 1$, and at the unique
stable positive fixed point of equation (\ref{eqn:dynamic_equilibrium})
when $\pi > 1$. When $\pi$ varies across the threshold of unity, the club
would undergo a critical change,
moving from one of these two equilibria towards the other.\\
711:
When $d>1$, an empty membership is always stable according to Theorem \ref{theorem:instability}.
Let $N_{crit}>0$ be the peer population at which the line $n/N$ first becomes
tangent to $\bar P(n)$, as in figure (\ref{figure:s_shape}). For $N$ above $N_{crit}$,
at least two positive fixed points exist:
the smaller one is unstable while the larger one is
stable.
The smaller fixed point is a lower threshold, a ``critical mass'' of membership needed for autonomous growth thereafter.
The club is in danger of collapse whenever its membership falls below
this level, even if the fall is only transient to begin with.
721:
722: \begin{figure}[htbp]
723: \begin{center}
724: {\scriptsize
725: \begin{picture}(0,0)%
726: \includegraphics{04_show_s_shape}%
727: \end{picture}%
728: \begingroup
729: \setlength{\unitlength}{0.0200bp}%
730: \begin{picture}(18000,10800)(0,0)%
731: \put(550,5400){\rotatebox{90}{\makebox(0,0){\strut{}$\bar P(n)$ or $n/N$}}}%
732: \put(2759,9377){\makebox(0,0)[l]{\strut{}two fixed points}}%
733: \put(2759,8504){\makebox(0,0)[l]{\strut{}beside $n_{crit}$}}%
734: \put(7477,-321){\makebox(0,0)[l]{\strut{}$\frac{\partial \bar P}{\partial n} > \frac{1}{N}$}}%
735: \put(15100,-321){\makebox(0,0)[l]{\strut{}$\frac{\partial \bar P}{\partial n} < \frac{1}{N}$}}%
736: \put(10043,259){\makebox(0,0)[l]{\strut{}$n_{crit}$}}%
737: \put(5642,1520){\makebox(0,0)[l]{\strut{}$\bar P(n)$}}%
738: \put(12739,259){\makebox(0,0)[l]{\strut{}$N_{crit}$}}%
739: \put(15623,259){\makebox(0,0)[l]{\strut{}$N$}}%
740: \put(985,10250){\makebox(0,0)[l]{\strut{}$1$}}%
741: \put(985,356){\makebox(0,0)[l]{\strut{}$0$}}%
742: \put(7477,-903){\makebox(0,0)[l]{\strut{}unstable}}%
743: \put(15100,-903){\makebox(0,0)[l]{\strut{}stable}}%
744: \end{picture}%
745: \endgroup
746: \vspace{5mm}
747: }\caption{Critical population and bifurcation of fixed points, for $d>1$}
748: \label{figure:s_shape}
749: \end{center}
750: \end{figure}
751:
752:
753: %20050226
754:
755: \begin{proposition}[Critical Population and Bifurcation]\label{prop:d_greater_than_1}
756: $N_{crit}$ is the smallest solution to the simultaneous equations
757: $$
758: \left.\frac{\partial \bar P(n)}{\partial n}\right|_{n_{crit}}
759: = \frac{\bar P(n_{crit})}{n_{crit}} = \frac{1}{N_{crit}}
760: $$
where $n_{crit}$ is a bifurcation point: once $N$ increases above $N_{crit}$,
two fixed points appear on either side of $n_{crit}$ and move away from it.
763: \end{proposition}
764:
This proposition follows from the fact that $\bar P(n)$ is smooth,
increasing and bounded above by $1$ (see figure (\ref{figure:s_shape})). The
membership level $n_{crit}$ is metastable, as it lies exactly on the margin of the
stability condition (\ref{eqn:start_up_condition}). In the special case when
the peers are not differentiated, i.e.\ $h_i(s) = h(s)$, the increase in
$\bar P(n)$ concentrates around an inflection point just before $n_{crit}$.
However, when the $h_i(s)$'s are spread out so that $\sum_s h_i(s) g(s)$
varies widely across peers, $\bar P(n)$ increases more gradually. As a result, the
bifurcation may occur more sharply, with a wider spread between the two
resulting fixed points.
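
The simultaneous equations of Proposition~\ref{prop:d_greater_than_1} can be
recovered directly from the tangency condition. The following is a sketch
under the assumption, suggested by figure (\ref{figure:s_shape}), that the
positive fixed points of (\ref{eqn:dynamic_equilibrium}) are the solutions of
$n = N\bar P(n)$; the auxiliary function $F_N(n) = N\bar P(n) - n$ is our own
notation. At the critical population the line $n/N_{crit}$ just touches
$\bar P(n)$ at $n_{crit}$, so $F_{N_{crit}}$ vanishes there together with its
derivative:
\begin{displaymath}
N_{crit}\,\bar P(n_{crit}) - n_{crit} = 0, \qquad
N_{crit}\left.\frac{\partial \bar P(n)}{\partial n}\right|_{n_{crit}} - 1 = 0,
\end{displaymath}
which rearrange to the equations of the proposition. For $N$ slightly above
$N_{crit}$,
\begin{displaymath}
F_N(n_{crit}) = (N - N_{crit})\,\bar P(n_{crit}) > 0 .
\end{displaymath}
Since $F_N$ is negative just above $n=0$ (the empty membership being stable
for $d>1$) and $F_N(N) = N(\bar P(N) - 1) \le 0$, $F_N$ has a root on each side
of $n_{crit}$. Near the bifurcation, $F_N$ crosses zero upwards at the fixed
point just below $n_{crit}$, i.e.\ $\partial \bar P/\partial n > 1/N$, which is
the unstable case in figure (\ref{figure:s_shape}); it crosses downwards at the
fixed point just above $n_{crit}$, which is the stable case.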
775:
776:
777: %%%%%%
778: \section{A numerical example with truncated Zipfian aggregate \mbox{demand}}
779:
780: Consider a population of $N$ peers with a truncated Zipfian
781: aggregate supply, viz.:
782: \begin{equation}
783: g(s) = c s^{-\beta} \;\;\;\; 1\le s\le s_{max} \label{eqn:example_zipf}
784: \end{equation}
785: where $c=(\sum_{s=1}^{s_{max}} s^{-\beta})^{-1}$.
This rank-frequency distribution is widely observed in Web and
\mbox{peer-to-peer} file popularity measurement studies \cite{gdsglz2003,sgg2002}.
The exponent $\beta$ is often close to, or below, $1$.
The skewness of $g$, as measured by its Euclidean norm, is
790: $$
791: \Vert g \Vert= c\,\sqrt{\sum_{s=1}^{s_{max}}s^{-2\beta}}
792: $$
793: which is determined by two key parameters, viz. the \emph{peakedness}
794: of the Zipfian distribution as governed by the exponent $\beta$,
795: and the \emph{variety} of chunk types as governed by $s_{max}$.\\
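
Two limiting cases, worked out here for illustration, make the roles of the
two parameters concrete. For $\beta = 0$ the distribution is uniform,
$c = 1/s_{max}$ and $\Vert g \Vert = s_{max}^{-1/2}$; as $\beta \to \infty$,
all mass concentrates on rank $1$, so that $c \to 1$ and $\Vert g \Vert \to 1$.
More generally, since $\sum_s g(s) = 1$ and $0 \le g(s) \le 1$, the
Cauchy--Schwarz inequality $1 = \sum_s g(s) \le \sqrt{s_{max}}\,\Vert g \Vert$
together with $\sum_s g(s)^2 \le \sum_s g(s)$ gives
\begin{displaymath}
\frac{1}{\sqrt{s_{max}}} \;\le\; \Vert g \Vert \;\le\; 1 .
\end{displaymath}
A larger variety $s_{max}$ thus lowers the floor on the skewness, while
increasing the peakedness $\beta$ moves $\Vert g \Vert$ from the uniform value
towards the upper bound.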
796:
In general, $g(s)$ may match the aggregate demand $h(s)$ to varying
degrees. Below we analyze two cases: the perfect match case, $h(s)=g(s)$,
and the imperfect match case due to a simple shift between $h(s)$
and $g(s)$. Throughout, we consider simple requests ($d=1$).
801:
802: \subsection{Perfect match case: $h(s)=g(s)$}
803: \begin{figure}[h]
804: \begin{center}
805: {\scriptsize
806: \begin{picture}(0,0)%
807: \includegraphics{06_plot_critical_N_vs_zipf_change_beta}%
808: \end{picture}%
809: \begingroup
810: \setlength{\unitlength}{0.0200bp}%
811: \begin{picture}(18000,10800)(0,0)%
812: \put(2200,1650){\makebox(0,0)[r]{\strut{} 1}}%
813: \put(2200,3800){\makebox(0,0)[r]{\strut{} 10}}%
814: \put(2200,5950){\makebox(0,0)[r]{\strut{} 100}}%
815: \put(2200,8100){\makebox(0,0)[r]{\strut{} 1000}}%
816: \put(2200,10250){\makebox(0,0)[r]{\strut{} 10000}}%
817: \put(2475,1100){\makebox(0,0){\strut{} 0.5}}%
818: \put(3945,1100){\makebox(0,0){\strut{} 0.6}}%
819: \put(5415,1100){\makebox(0,0){\strut{} 0.7}}%
820: \put(6885,1100){\makebox(0,0){\strut{} 0.8}}%
821: \put(8355,1100){\makebox(0,0){\strut{} 0.9}}%
822: \put(9825,1100){\makebox(0,0){\strut{} 1}}%
823: \put(11295,1100){\makebox(0,0){\strut{} 1.1}}%
824: \put(12765,1100){\makebox(0,0){\strut{} 1.2}}%
825: \put(14235,1100){\makebox(0,0){\strut{} 1.3}}%
826: \put(15705,1100){\makebox(0,0){\strut{} 1.4}}%
827: \put(17175,1100){\makebox(0,0){\strut{} 1.5}}%
828: \put(550,5950){\rotatebox{90}{\makebox(0,0){\strut{}$N_{crit}$}}}%
829: \put(9825,275){\makebox(0,0){\strut{}$\beta$}}%
830: \put(2475,526){\makebox(0,0)[l]{\strut{}more spread}}%
831: \put(15705,526){\makebox(0,0)[l]{\strut{}less spread}}%
832: \put(12500,9675){\makebox(0,0)[l]{\strut{}$s_{max} = 3000$}}%
833: \put(12500,9125){\makebox(0,0)[l]{\strut{}$s_{max} = 1000$}}%
834: \put(12500,8575){\makebox(0,0)[l]{\strut{}$s_{max} = 500$}}%
835: \put(12500,8025){\makebox(0,0)[l]{\strut{}$s_{max} = 300$}}%
836: \end{picture}%
837: \endgroup
838: }\caption{$N_{crit}$ vs ($\beta$, $s_{max}$) for perfect match case ($k\,\rho=1$)}
839: \label{figure:Ncrit_vs_zipf_beta}
840: \end{center}
841: \end{figure}
842:
843:
844: \begin{figure}[!h]
845: \begin{center}
846: {\scriptsize
847: \begin{picture}(0,0)%
848: \includegraphics{06_plot_growth_N_vs_zipf_change_beta_log}%
849: \end{picture}%
850: \begingroup
851: \setlength{\unitlength}{0.0200bp}%
852: \begin{picture}(18000,10800)(0,0)%
853: \put(1650,1650){\makebox(0,0)[r]{\strut{} 0}}%
854: \put(1650,2510){\makebox(0,0)[r]{\strut{} 0.1}}%
855: \put(1650,3370){\makebox(0,0)[r]{\strut{} 0.2}}%
856: \put(1650,4230){\makebox(0,0)[r]{\strut{} 0.3}}%
857: \put(1650,5090){\makebox(0,0)[r]{\strut{} 0.4}}%
858: \put(1650,5950){\makebox(0,0)[r]{\strut{} 0.5}}%
859: \put(1650,6810){\makebox(0,0)[r]{\strut{} 0.6}}%
860: \put(1650,7670){\makebox(0,0)[r]{\strut{} 0.7}}%
861: \put(1650,8530){\makebox(0,0)[r]{\strut{} 0.8}}%
862: \put(1650,9390){\makebox(0,0)[r]{\strut{} 0.9}}%
863: \put(1650,10250){\makebox(0,0)[r]{\strut{} 1}}%
864: \put(1925,1100){\makebox(0,0){\strut{} 0.1}}%
865: \put(4467,1100){\makebox(0,0){\strut{} 1}}%
866: \put(7008,1100){\makebox(0,0){\strut{} 10}}%
867: \put(9550,1100){\makebox(0,0){\strut{} 100}}%
868: \put(12092,1100){\makebox(0,0){\strut{} 1000}}%
869: \put(14633,1100){\makebox(0,0){\strut{} 10000}}%
870: \put(17175,1100){\makebox(0,0){\strut{} 100000}}%
871: \put(550,5950){\rotatebox{90}{\makebox(0,0){\strut{}$\bar P(n_{eq})$ or $n_{eq}/N$}}}%
872: \put(9550,275){\makebox(0,0){\strut{}$N$}}%
873: \put(4150,9675){\makebox(0,0)[l]{\strut{}$\beta =0.60$}}%
874: \put(4150,9125){\makebox(0,0)[l]{\strut{}$\beta =0.80$}}%
875: \put(4150,8575){\makebox(0,0)[l]{\strut{}$\beta =1.0$}}%
876: \put(4150,8025){\makebox(0,0)[l]{\strut{}$\beta =1.2$}}%
877: \end{picture}%
878: \endgroup
879:
880: }\caption{Expected equilibrium membership and success rate vs $N\,k\,\rho$ ($s_{max} = 1000$)}
881: \label{figure:zipf_P_vs_Ncrit}
882: \end{center}
883: \end{figure}
According to Theorem \ref{theorem:instability}, the autonomous growth condition is
\begin{eqnarray}
N \;\ge\; N_{crit} &=& \frac{1}{k\,\rho} \;\frac{1}{c^2\sum_s s^{-2\beta}} \label{eqn:numeric_zipf_1}
\end{eqnarray}
888: since $\langle g,h \rangle=1$ and $\Vert g\Vert = \Vert h
889: \Vert=c\,\sqrt{\sum_{s}s^{-2\beta}}$. The dependence of $N_{crit}$ on $\beta$
890: and $s_{max}$ is shown in figure (\ref{figure:Ncrit_vs_zipf_beta}).
Autonomous growth is favoured by a large $\beta$ (peakedness) and a small $s_{max}$
(variety).
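
As a worked instance, take $\beta = 1$, $s_{max} = 1000$ and $k\,\rho = 1$,
the setting used in figure (\ref{figure:Ncrit_vs_zipf_beta}). Then
$c = 1/H_{s_{max}}$, where $H_m = \sum_{s=1}^{m} s^{-1}$ is the $m$-th harmonic
number, and the right-hand side of (\ref{eqn:numeric_zipf_1}) evaluates to
\begin{displaymath}
\frac{H_{1000}^2}{\sum_{s=1}^{1000} s^{-2}}
\;\approx\; \frac{(7.485)^2}{1.644} \;\approx\; 34 ,
\end{displaymath}
using $H_{1000} \approx \ln 1000 + \gamma + 1/2000 \approx 7.485$ (with
$\gamma \approx 0.5772$ Euler's constant) and
$\sum_{s=1}^{1000} s^{-2} \approx \pi^2/6 - 10^{-3} \approx 1.644$. Under these
assumptions, a few tens of peers would suffice for autonomous growth.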
893:
Once the control parameter of the club is above the growth threshold,
its membership settles around an equilibrium size $n_{eq}$, the unique
stable positive fixed point of equation (\ref{eqn:dynamic_equilibrium}). Solving it for
different values of $Nk\rho$ and $\beta$ gives figure
(\ref{figure:zipf_P_vs_Ncrit}). The figure shows the proportion $n_{eq}/N$,
which also measures the performance level of the club, since it equals
$\bar P(n_{eq})$, the average success rate of information search in the club.
901:
902: \subsection{Imperfect match due to simple shift}\label{sec:match}
Supply and demand distributions seldom match perfectly; often one can be
regarded as leading the other. In a \emph{demand
lead} case, peers demand a wider variety of goods than the aggregate supply
distribution offers, while in a \emph{supply lead} case the supplied goods
show more variety. The ``excess'' in demanded types (or supplied types) reflects
the types of goods that the supply (or the demand) cannot follow at a
particular moment. For simplicity of illustration, we assume that all such
excess is concentrated in either the lowest or the highest ranks.\\
911:
In the supply lead case, the supply $g(s)$ is the same as defined in equation
(\ref{eqn:example_zipf}), and the demand distribution is
914: \begin{displaymath}
915: h(s) = \left\{ \begin{array}{ll}
916: 0 & \textrm{if $s \le \delta$}\\
917: c'(s-\delta)^{-\beta} &\textrm{if $\delta < s \le s_{max}$}
918: \end {array}
919: \right.
920: \end{displaymath}
921: for $\delta \ge 0$, and
922: \begin{displaymath}
923: h(s) = \left\{ \begin{array}{ll}
924: c{''}s^{-\beta} & \textrm{if $s \le s_{max} + \delta$}\\
925: 0 &\textrm{if $ s_{max} + \delta < s \le s_{max}$}
926: \end {array}
927: \right.
928: \end{displaymath}
for $\delta <0$, where $c'$ and $c{''}$ are normalizing constants such that $\sum_s
h(s) = 1$. See figure (\ref{figure:delta_meaning}) for an illustrated
example. A positive shift $\delta>0$ means the excess types occupy the
highest ranks (i.e.\ the smallest values of $s$), while a negative shift means
they occupy the lowest ranks. The
demand lead case simply has the expressions of $g(s)$ and $h(s)$
exchanged.\\
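
The normalizing constants follow from re-indexing the sums. For
$0 \le \delta < s_{max}$, substituting $u = s - \delta$ gives
$\sum_{s=\delta+1}^{s_{max}} (s-\delta)^{-\beta} = \sum_{u=1}^{s_{max}-\delta} u^{-\beta}$,
while for $-s_{max} < \delta < 0$ the support of $h$ is simply truncated, so that
\begin{displaymath}
c' = \Bigl(\sum_{u=1}^{s_{max}-\delta} u^{-\beta}\Bigr)^{-1}, \qquad
c{''} = \Bigl(\sum_{s=1}^{s_{max}+\delta} s^{-\beta}\Bigr)^{-1} .
\end{displaymath}
Both constants reduce to $c$ at $\delta = 0$, and in either case $h(s)$ is a
truncated Zipfian distribution with the same exponent $\beta$ over
$s_{max} - |\delta|$ chunk types.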
935:
936:
937:
938: \begin{figure}[hbtp]
939: \begin{center}
940: {\scriptsize
941: \hbox{
942: \begin{picture}(0,0)%
943: \includegraphics{08_plot_dist_shift_positive}%
944: \end{picture}%
945: \begingroup
946: \setlength{\unitlength}{0.0200bp}%
947: \begin{picture}(10800,10800)(0,0)%
948: \put(2475,1650){\makebox(0,0)[r]{\strut{} 1e-005}}%
949: \put(2475,3800){\makebox(0,0)[r]{\strut{} 0.0001}}%
950: \put(2475,5950){\makebox(0,0)[r]{\strut{} 0.001}}%
951: \put(2475,8100){\makebox(0,0)[r]{\strut{} 0.01}}%
952: \put(2475,10250){\makebox(0,0)[r]{\strut{} 0.1}}%
953: \put(2749,1100){\makebox(0,0){\strut{} 0}}%
954: \put(4469,1100){\makebox(0,0){\strut{} 2000}}%
955: \put(6189,1100){\makebox(0,0){\strut{} 4000}}%
956: \put(7910,1100){\makebox(0,0){\strut{} 6000}}%
957: \put(9630,1100){\makebox(0,0){\strut{} 8000}}%
958: \put(11350,1100){\makebox(0,0){\strut{} 10000}}%
959: \put(7050,275){\makebox(0,0){\strut{}$s$ rank}}%
960: \put(3007,6976){\makebox(0,0)[l]{\strut{}\tiny{$\delta = 2000$}}}%
961: \put(4975,9675){\makebox(0,0)[l]{\strut{}$g(s)$}}%
962: \put(4975,9125){\makebox(0,0)[l]{\strut{}$h(s)$}}%
963: \end{picture}%
964: \endgroup
965: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
966:
967:
968: \begin{picture}(0,0)%
969: \includegraphics{08_plot_dist_shift_negative}%
970: \end{picture}%
971: \begingroup
972: \setlength{\unitlength}{0.0200bp}%
973: \begin{picture}(10800,10800)(0,0)%
974: \put(2475,1650){\makebox(0,0)[r]{\strut{} 1e-005}}%
975: \put(2475,3800){\makebox(0,0)[r]{\strut{} 0.0001}}%
976: \put(2475,5950){\makebox(0,0)[r]{\strut{} 0.001}}%
977: \put(2475,8100){\makebox(0,0)[r]{\strut{} 0.01}}%
978: \put(2475,10250){\makebox(0,0)[r]{\strut{} 0.1}}%
979: \put(2749,1100){\makebox(0,0){\strut{} 0}}%
980: \put(4469,1100){\makebox(0,0){\strut{} 2000}}%
981: \put(6189,1100){\makebox(0,0){\strut{} 4000}}%
982: \put(7910,1100){\makebox(0,0){\strut{} 6000}}%
983: \put(9630,1100){\makebox(0,0){\strut{} 8000}}%
984: \put(11350,1100){\makebox(0,0){\strut{} 10000}}%
985: \put(7050,275){\makebox(0,0){\strut{}$s$ rank}}%
986: \put(9673,2820){\makebox(0,0)[l]{\strut{}\tiny{$\delta = -2000$}}}%
987: \put(4975,9675){\makebox(0,0)[l]{\strut{}$g(s)$}}%
988: \put(4975,9125){\makebox(0,0)[l]{\strut{}$h(s)$}}%
989: \end{picture}%
990: \endgroup
991: }
992: \hbox{\hspace{48mm}\hbox{(a)\hspace{74mm}(b)}}
993: }
994:
\caption{A supply lead case example ($s_{max} = 10000$) with (a) positive $\delta$ and (b) negative $\delta$}
996: \label{figure:delta_meaning}
997: \end{center}
998: \end{figure}
999:
1000:
Figure (\ref{figure:Ncrit_vs_delta}) shows that excess in the highest ranks is very costly and would
require a very large increase in $N_{crit}$ for autonomous growth.
However, excess in the lowest ranks actually decreases $N_{crit}$, making autonomous growth easier.
This suggests that concentrating supply on chunk types of the highest ranks would trigger autonomous growth more readily.\\
1005:
In summary, the distinction between supply lead and demand lead cases is immaterial to the autonomous
growth threshold, though it may be important in modelling
supply and demand dynamics. What matters is where the two distributions differ: in the higher or in the lower ranks.
1009: \begin{figure}[hbtp]
1010: \begin{center}
1011: {\scriptsize
1012: \begin{picture}(0,0)%
1013: \includegraphics{08_plot_match}%
1014: \end{picture}%
1015: \begingroup
1016: \setlength{\unitlength}{0.0200bp}%
1017: \begin{picture}(18000,10800)(0,0)%
1018: \put(2750,1650){\makebox(0,0)[r]{\strut{} 10}}%
1019: \put(2750,4517){\makebox(0,0)[r]{\strut{} 100}}%
1020: \put(2750,7383){\makebox(0,0)[r]{\strut{} 1000}}%
1021: \put(2750,10250){\makebox(0,0)[r]{\strut{} 10000}}%
1022: \put(3025,1100){\makebox(0,0){\strut{} 0}}%
1023: \put(4440,1100){\makebox(0,0){\strut{} 0.1}}%
1024: \put(5855,1100){\makebox(0,0){\strut{} 0.2}}%
1025: \put(7270,1100){\makebox(0,0){\strut{} 0.3}}%
1026: \put(8685,1100){\makebox(0,0){\strut{} 0.4}}%
1027: \put(10100,1100){\makebox(0,0){\strut{} 0.5}}%
1028: \put(11515,1100){\makebox(0,0){\strut{} 0.6}}%
1029: \put(12930,1100){\makebox(0,0){\strut{} 0.7}}%
1030: \put(14345,1100){\makebox(0,0){\strut{} 0.8}}%
1031: \put(15760,1100){\makebox(0,0){\strut{} 0.9}}%
1032: \put(17175,1100){\makebox(0,0){\strut{} 1}}%
1033: \put(550,5950){\rotatebox{90}{\makebox(0,0){\strut{}$N_{crit}$}}}%
\put(10375,275){\makebox(0,0){\strut{}$|\delta/s_{max}|$}}%
1035: \put(9393,8524){\makebox(0,0)[l]{\strut{}$\delta \ge 0$}}%
1036: \put(9393,5884){\makebox(0,0)[l]{\strut{}$\delta < 0$}}%
1037: \end{picture}%
1038: \endgroup
1039: }\caption{Effect of mismatch due to simple shift ($s_{max}=1000$, $k\rho =1$, $\beta = 0.6$)}
1040: \label{figure:Ncrit_vs_delta}
1041: \end{center}
1042: \end{figure}
1043:
1044:
1045:
1046: \section{Discussion}
1047:
The simple ISC model displays interesting behaviour resulting from the coupling
of membership and content dynamics. As a dynamical system, it exhibits a
phase transition in a composite control parameter $\pi$. Information sharing
is sustainable with a sizeable membership only when $\pi$ is above the threshold of
$1$. This implies that efforts to increase $\pi$ (e.g. by
improving efficiency and supply, or increasing the peer population) are
worthwhile only if they push $\pi$ above the threshold.\\
1055:
Many researchers have studied the problem of excessive free-riding
\cite{fpcs2004,hp2001,ah2000}, which corresponds in our model to the existence
of many peers with $K_i=0$. Incentive mechanisms have been devised to reward
contribution and/or penalize free-riding. One way to analyze such mechanisms
is to extend the ISC model so that the search efficiency a peer sees
becomes an increasing function of the contribution the peer chooses on a
rational basis. In this case, our ongoing study indicates that an empty
membership may always be stable, even for $d=1$, as long as the club relies
solely on its members for content. An empty club and a rational (frugal) peer
population are then in deadlock. The deadlock can be broken only if either
the club has sufficient initial content to attract peers, or some peers
are generous enough to contribute without expecting extra
benefit (and are therefore not rational in the conventional sense).\\
1069:
However, free-riding is not a problem \emph{per se} under the non-rivalrous
assumption. As free-riders are those with $K_i=0$, we may redefine $N$ to
exclude them so that only contributing peers (those with positive payloads)
are counted. As a result, $N$ is reduced and the average payload size
$k$ is increased, while the club's average content $Nk$ remains the same as
before. All results
in this paper continue to hold, albeit with the notion of a peer redefined.
%
What matters is the contributing peer population: the club grows as
long as contributing peers join, share enough content and attract sufficiently
many other contributors. Free-riders are effectively invisible to the system;
they cause no harm as long as provisioning extra copies carries
no sharing cost.\\
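
The redefinition is simple bookkeeping. For illustration, let $f$ denote the
fraction of the $N$ peers who are free-riders ($f$ is introduced here only for
this example and is not a parameter of the ISC model). Since free-riders have
$K_i = 0$, the total payload $Nk$ is carried entirely by the remaining peers,
so that
\begin{displaymath}
N' = (1-f)\,N, \qquad k' = \frac{Nk}{N'} = \frac{k}{1-f}, \qquad N'k' = Nk .
\end{displaymath}
For instance, a hypothetical population of $N = 1000$ peers with $f = 0.75$
reduces to $N' = 250$ contributing peers with average payload $k' = 4k$, while
the club's average content is unchanged.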
1083:
Incentives would help not by reducing free-riding but by increasing the
contributing peer population, viz. $N$. The distinction may seem pedantic,
as reduced free-riding is often equated with increased contribution. However,
this is not always true. Consider a negative incentive scheme
which merely causes free-riders to demand less, or to leave altogether,
without turning them into contributing peers: the club's content does not
benefit. As incentive schemes are often costly to maintain in
practice, the effort spent on such negative schemes would be better saved for
positive ones that aim to increase $N$ directly. A reasonable principle for
economizing on the use of incentive schemes would be to focus on those peers
who are on the verge of free-riding, and to induce them to contribute.\\
1095:
Indeed, discouraging free-riding may harm the club's well-being in two
possible ways. First, free-riders may develop into
contributors if they stay long enough for the club to become
sufficiently important to them. Second, they may serve as a useful audience
for the members, e.g. in newsgroups, BBS and forums, where wider circulation
of the shared information often benefits \emph{all} members through network effects.
(This would be diametrically \emph{opposite} to the rivalrous assumption, and
could actually be more appropriate than the non-rivalrous assumption if the
benefit more than offsets any sharing costs due to rivalrous consumption of other
club resources.)
1106:
In cases where the non-rivalrous assumption is inappropriate due to
significant sharing costs, e.g. in processing, storage and/or network
bandwidth, penalizing free-riding becomes more necessary to reduce
the load that free-riders place on the system and on the contributing peers.
%
A possible corresponding extension of the ISC model is to incorporate the
natural reduction in availability of information goods as their demand
increases. For instance, the failure rate of demand for chunk type $s$ may
be modelled as an increasing function of its total demand $nh(s)$.
The choice of functional relation between availability and
demand should, however, depend on the nature of the sharing cost.
1118:
Apart from the extensions needed in rivalrous situations,
the ISC model has two intrinsic limitations. First, it captures
only the average-case behaviour of a nonlinear, stochastic dynamical
system. Transients and lock-in, especially when the club is small and
peers react with long delays, may render the average-case view of little use.
Second, actual join/leave
decisions are often more heterogeneous than assumed here: requests may comprise
variable numbers of demand instances, and peers may deliberate over their
decisions and behave differently.
1127:
1128:
1129:
1130:
\section{Conclusion and further work}
We have analyzed information sharing in a very general setting by means of a
statistical model (ISC) in which peers differ in their demand for and supply of
information. As a dynamical system, the model exhibits interesting critical
behaviour with multiple equilibria.
%
A distinctive feature of the ISC model is that information is chunked and
typed, as we believe that modelling the composition of the information
content is crucial in many situations of interest. Consequently, the model
displays a sharp growth threshold that depends on how well the types of
information demanded and supplied by the sharing members match. While rich
in behaviour, the model is simple enough for detailed analysis of its
equilibrium states. In particular, we analyzed a truncated Zipfian
distribution of information types and derived the growth threshold for the
existence of any sustainable equilibrium, as well as the corresponding
membership size and performance level.\\
1147:
Much of the simplicity of the ISC model stems from the non-rivalrous
assumption. Real situations are more complicated in that peers may share both
rivalrous resources and non-rivalrous information at the same time. However,
Benkler \cite{b_sharing_2004} points out that overcapacity is a growing trend
in distributed systems such as the Internet, so much so that even rivalrous
resources are increasingly shared like non-rivalrous goods. On the
other hand, free-riding works in the opposite direction: a prosperous
community may attract so much free-riding that contention
for some shared rivalrous resources begins to arise. The challenge is then
to identify the major sources of the social cost of sharing \cite{v2003} and
to account for them properly. A natural extension of our work would be to study
the interplay between an information sharing community and different types of
host networks. Another extension concerns incentives: how
incentive schemes should be devised in response to different mixes of
rivalrous and non-rivalrous resources.
1163: %The ongoing study we mentioned in passing begins with a
1164: %distribution of peer types according to a measure of their propensity to
1165: %free-ride, in terms of the marginal rate of their private cost-benefit change in contributing to the community.
1166:
1167: \bibliographystyle{ieeetran}
1168: \bibliography{all}
1169:
1170:
1171: \begin{biography}{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{w.ps}}{W.-Y. Ng}
Wai-Yin Ng received his BA in 1985 (specializing in control and
operational research) and PhD in control engineering in 1989, both from
the University of Cambridge, U.K., and is currently an associate professor
of information engineering at The Chinese University of Hong Kong. His
current research focus is complex networks, a young, vibrant science
concerned with connectivity, complexity and emergent phenomena in both
natural and artificial systems.
1179: \end{biography}
1180:
1181: \begin{biography}{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{k.ps}}{W.K. Lin}
Wing Kai Lin received his B.Eng. degree in
2001 and is currently completing his Master's degree in information engineering, both
from The Chinese University of Hong Kong. His research interests include
replication in peer-to-peer systems and economic issues in incentive mechanisms.
1186: \end{biography}
1187: \vspace{0.85cm}
1188: \begin{biography}{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{d.ps}}{D.M. Chiu}
1189: Dah Ming Chiu received his B.Sc. degree from Imperial College London in 1975,
1190: and Ph.D. degree from Harvard University in 1980. He worked for Bell Labs,
1191: Digital Equipment Corporation and Sun Microsystems Laboratories. Currently,
1192: he is a professor in the Department of Information Engineering at the
1193: Chinese University of Hong Kong. He is on the editorial board of the
1194: International Journal of Communication Systems.
1195: \end{biography}
1196:
1197:
1198:
1199:
1200: \end{document}
1201: