1: %% ****** Start of file template.aps ****** %
2: %%
3: %%
4: %% This file is part of the APS files in the REVTeX 4 distribution.
5: %% Version 4.0 of REVTeX, August 2001
6: %%
7: %%
8: %% Copyright (c) 2001 The American Physical Society.
9: %%
10: %% See the REVTeX 4 README file for restrictions and more information.
11: %%
12: %
13: % This is a template for producing manuscripts for use with REVTEX 4.0
14: % Copy this file to another name and then work on that file.
15: % That way, you always have this original template file to use.
16: %
17: % Group addresses by affiliation; use superscriptaddress for long
18: % author lists, or if there are many overlapping affiliations.
19: % For Phys. Rev. appearance, change preprint to twocolumn.
20: % Choose pra, prb, prc, prd, pre, prl, prstab, or rmp for journal
21: % Add 'draft' option to mark overfull boxes with black boxes
22: % Add 'showpacs' option to make PACS codes appear
23: % Add 'showkeys' option to make keywords appear
24: \documentclass[aps,pra,twocolumn,groupedaddress,showkeys]{revtex4}
25: %\documentclass[aps,prl,preprint,superscriptaddress]{revtex4}
26: %\documentclass[aps,rmp,twocolumn,groupedaddress]{revtex4}
27:
28: % You should use BibTeX and apsrev.bst for references
29: % Choosing a journal automatically selects the correct APS
30: % BibTeX style file (bst file), so only uncomment the line
31: % below if necessary.
32: %\bibliographystyle{apsrev}
33: \bibliographystyle{plain}
34:
35: %\usepackage{graphicx}% Include figure files
36: \usepackage{graphics}% Include figure files
37:
38: \begin{document}
39:
40: % Use the \preprint command to place your local institutional report
41: % number in the upper righthand corner of the title page in preprint mode.
42: % Multiple \preprint commands are allowed.
43: % Use the 'preprintnumbers' class option to override journal defaults
44: % to display numbers if necessary
45: %\preprint{}
46:
47: %Title of paper
48: \title{Bipartite Yule Processes in Collections of Journal Papers}
49:
50: \author{Steven A. Morris}
51: \email[]{steven.a.morris@okstate.edu}
52: \homepage[]{http://samorris.ceat.okstate.edu}
53: %\thanks{}
54: %\altaffiliation{}
55: \affiliation{
56: Oklahoma State University\\
57: Electrical and Computer Engineering\\
58: Stillwater, OK 74078, USA }
59:
60: \date{\today}
61:
62: \begin{abstract}
Collections of journal papers, often referred to as ``citation
networks'', can be modeled as a collection of coupled bipartite
65: networks which tend to exhibit linear growth and preferential
66: attachment as papers are added to the collection. Assuming primary
67: nodes in the first partition and secondary nodes in the second
partition, the basic bipartite Yule process assumes that as each
primary node is added to the network, it links to multiple secondary
nodes, and that each new link connects, with probability $\alpha$,
to a newly appearing secondary node. The number of links from a new
72: primary node follows some distribution that is a characteristic of
73: the specific network. Links to existing secondary nodes follow a
74: preferential attachment rule. With modifications to adapt to
75: specific networks, bipartite Yule processes simulate networks that
76: can be validated against actual networks using a wide variety of
77: network metrics. The application of bipartite Yule processes to the
78: simulation of paper-reference networks and paper-author networks is
79: demonstrated and simulation results are shown to mimic networks from
80: actual collections of papers across several network metrics.
81: \end{abstract}
82:
83: % insert suggested PACS numbers in braces on next line
84: \pacs{02.50.Ey, 87.23.Ge, 89.75.Hc}
85: % insert suggested keywords - APS authors don't need to do this
86: \keywords{bipartite networks, citation networks, Yule process,
87: Simon-Yule process, network growth model, preferential attachment }
88:
89: %\maketitle must follow title, authors, abstract, \pacs, and \keywords
90: \maketitle
91:
92: % body of paper here - Use proper section commands
93: % References should be done using the \cite, \ref, and \label commands
94: \section{Collections of papers as coupled bipartite networks}
95:
As shown in Figure~\ref{coupled}, a collection of journal papers
constitutes a series of coupled bipartite networks \cite{morris05}.
Such a collection contains 6 direct bipartite
99: networks: 1) papers to paper authors, 2) papers to references, 3)
100: papers to paper journals, 4) papers to terms, 5) references to
101: reference authors, and 6) references to reference journals.
102: Additionally, there are 15 indirect bipartite networks in
103: collections of papers as defined by the diagram. Examples of
104: interesting indirect networks are paper author to reference author
105: networks, and paper journal to reference journal networks, which can
106: be used for author co-citation analysis \cite{white81} and journal
107: co-citation analysis \cite{mccain91} respectively.
108:
109: \begin{figure}
110: \resizebox{0.45\textwidth}{!}{%
111: \includegraphics{fred.eps}}%
112: \caption{Diagram showing a collection of papers as a series of
113: coupled bipartite networks.\label{coupled}}
114: \end{figure}
115:
116:
117: Modeling the growth of these bipartite networks helps characterize
118: the underlying processes driving a research specialty, such as
119: knowledge accretion, researcher productivity, or collaboration
120: processes. Bipartite growth models produce many network metrics,
121: allowing comprehensive validation of models against real collections
122: of papers.
123:
124: \section{Basic bipartite Yule processes}
125:
As originally proposed, Yule processes do not model networks, but
simply model the formation of power-law frequency distributions of
items \cite{albert02,price76,simon55}. For a bipartite Yule
129: process, assume a bipartite network where nodes fall into two
130: partitions: 1) primary nodes and 2) secondary nodes. Typically,
131: primary nodes are papers while secondary nodes are entities that are
132: associated with papers, such as authors, references, journals, or
133: terms.
134:
Figure~\ref{pr} shows a diagram of a bipartite paper-reference network,
136: where the primary nodes are papers and the secondary nodes are
137: references, and papers are linked to references by citations.
138:
139: \begin{figure}
140: \resizebox{0.35\textwidth}{!}{%
141: \includegraphics{paper_ref.eps}}%
142: \caption{Diagram showing a bipartite network of papers and the
143: references that they cite.\label{pr}}
144: \end{figure}
145:
Figure~\ref{basic} shows a diagram of a basic bipartite Yule process,
which grows the network as follows:
147:
148: \begin{figure}
149: \resizebox{.45\textwidth}{!}{%
150: \includegraphics{basic.eps}}%
151: \caption{Diagram of a basic bipartite Yule process.\label{basic}}
152: \end{figure}
153:
154: \begin{itemize}
155: \item The network grows by adding primary nodes one at a time.
156:
157: \item When a new primary node is added, it links to $N$ secondary nodes.
158: $N$ is a random deviate drawn from a discrete probability distribution
159: that is a characteristic of the type of network being modeled. For
160: paper-reference networks $N$ is lognormally distributed
161: \cite{morris04a}, while for paper-author networks $N$ is 1-shifted
Poisson distributed \cite{goldstein04group,morris04b}. For
paper-journal networks, $N$ is unity, since a paper is linked to
only one journal, the one in which it was published. As defined here,
a primary node does not link to any specific secondary node more
than once.
167:
168:
169: \item For each of the $N$ links, there is a probability, $\alpha$, that it will link to a newly
170: appearing secondary node.
171:
\item If a link is to an existing secondary node, the linked node is selected using
preferential attachment; that is, the probability of linking to a
secondary node is proportional to the number of links that the node
possesses.
176: \end{itemize}
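
The steps above can be summarized compactly. As a sketch in notation
introduced here purely for illustration (it is not taken from the
cited references), let $k_j$ denote the current link degree of
secondary node $j$ and let $S$ be the set of existing secondary
nodes. Each of the $N$ links of a new primary node then attaches
according to
\begin{equation}
P(\mathrm{new\ node}) = \alpha, \qquad
P(j) = (1-\alpha)\,\frac{k_j}{\sum_{i \in S} k_i}, \quad j \in S,
\end{equation}
so that the probabilities over all outcomes sum to one. This sketch
ignores the small correction needed to enforce the rule that a
primary node links to a given secondary node at most once, which
would exclude already-linked nodes from the sum.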
177:
178:
179: The stationary distribution of the link degree of the secondary
nodes is a Yule distribution \cite{johnson92,simon55}, which has a
power-law tail with exponent $1+1/(1-\alpha)$. The stationary distribution
182: is independent of the distribution of $N$, but for finite
183: collections of papers the distribution of $N$ profoundly affects the
184: tail of the distribution \cite{morris04a}.
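
For reference, a common parameterization of the Yule distribution
(stated here as an aid to the reader; the symbols $\rho$ and $f(k)$
are introduced for illustration) writes the probability that a
secondary node has link degree $k$ as
\begin{equation}
f(k) = \rho\, B(k, \rho+1), \qquad \rho = \frac{1}{1-\alpha},
\qquad k = 1, 2, \ldots,
\end{equation}
where $B$ is the beta function. For large $k$,
$f(k) \approx \rho\,\Gamma(\rho+1)\, k^{-(\rho+1)}$, which recovers
the exponent $1+1/(1-\alpha)$. As a worked case, $\alpha = 0.5$ gives
$\rho = 2$, so that $f(k) = 4/[k(k+1)(k+2)]$ and the tail exponent
is 3, while $\alpha \rightarrow 0$ gives an exponent approaching 2.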
185:
186: \section{Practical bipartite Yule processes}
187: In practice, the basic bipartite Yule process outlined in the
preceding section must be modified to account for the
189: characteristics of the specific type of bipartite network being
190: studied.
191:
192: \subsection{Paper-reference Yule process}
Figure~\ref{prproc} shows a diagram of a bipartite Yule process modified for
the characteristics of paper-reference networks. The details of this
model, its scope, and a discussion of evidence of its validity
appear in \cite{morris04a}. Paper-reference networks in collections
197: of papers covering scientific specialties are characterized by the
198: accretion of highly cited exemplar references, which are cited at
199: rates far higher than would be predicted by simple preferential
200: attachment. These exemplar references tend to appear during the
201: initial growth of the network and their rate of appearance decreases
202: exponentially as papers are added to the collection.
203:
204: As each paper is added to the collection, it links to a lognormally
205: distributed number of references, as discussed in \cite{morris04a}.
206: For each reference cited by a paper, there is a probability $\alpha$
207: that the citation is to a newly appearing reference. When a new
208: reference appears, there is a small probability that the reference
209: will be a highly attractive exemplar reference. If so, the reference
210: receives a large initial attraction, $A_0$. Newly created
non-exemplar references receive no initial attraction. If a
citation is to an existing reference, the probability that any
particular existing reference will be cited is proportional to the
sum of its attraction and the number of times it has been cited. A
specific reference cannot be cited more than once by a paper.
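
In symbols introduced here for illustration, let $c_j$ be the number
of citations received so far by reference $j$ and let $A_j$ be its
initial attraction, with $A_j = A_0$ for an exemplar reference and
$A_j = 0$ otherwise. Given that a citation is to an existing
reference, the selection rule described above can be sketched as
\begin{equation}
P(j) = \frac{A_j + c_j}{\sum_{i \in R'} (A_i + c_i)},
\end{equation}
where $R'$ is the set of existing references not already cited by
the paper being added. In this sketch a reference enters the network
through its first citation, so $c_j \geq 1$ and every weight is
strictly positive even when $A_j = 0$.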
216:
217: \begin{figure}
218: \resizebox{.45\textwidth}{!}{%
219: \includegraphics{paper_ref_flowchart.eps}}%
220: \caption{Diagram showing a bipartite Yule process for
221: paper-reference networks.\label{prproc}}
222: \end{figure}
223:
224: \subsection{Paper-author Yule process}
Figure~\ref{paproc} shows a diagram of the basic bipartite Yule process
modified for the characteristics of paper-author networks. The
details of this model, its scope, and a discussion of evidence of
its validity appear in \cite{goldstein04group,morris04b}. In this
case the Yule process is applied to teams
230: of researchers rather than individual researchers. As each paper is
231: added, there is a probability that the paper will be authored by a
232: new research team. If so, a team of $N_G$ authors is added to the
233: network, but only $N(\lambda)$ appear as authors of the team's first
234: paper, where $N(\lambda)$ is a random deviate drawn from a 1-shifted
Poisson distribution whose parameter is $\lambda$. Otherwise, an
existing team is chosen using preferential attachment;
that is, the probability that a team will author the new paper is
238: proportional to the number of papers that the team has previously
239: published.
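
For concreteness, a 1-shifted Poisson deviate can be written as
$N(\lambda) = 1 + X$, where $X$ is Poisson distributed with parameter
$\lambda$ (this reading of `1-shifted' is an assumption adopted here
for illustration), so that
\begin{equation}
P\bigl(N(\lambda) = n\bigr) =
\frac{e^{-\lambda}\lambda^{n-1}}{(n-1)!}, \qquad n = 1, 2, \ldots,
\end{equation}
with mean $1+\lambda$. The shift guarantees that every paper has at
least one author, and under this reading the expected fraction of
single-authored papers is $e^{-\lambda}$.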
240:
241: \begin{figure}
242: \resizebox{0.45\textwidth}{!}{%
243: \includegraphics{paper_auth_flowchart.eps}}%
244: \caption{Diagram showing a bipartite Yule process for paper-author
245: networks.\label{paproc}}
246: \end{figure}
247:
248:
When selecting authors for an existing team's paper, $N(\lambda)$
authors are chosen using preferential
attachment; specifically, the probability of selecting an author is
252: proportional to 1 plus the number of papers that the author has
253: published. Inter-team collaborations (weak ties) are modeled as
254: random events; when an existing author is to be selected there is a
255: probability $\beta$ that the author will be drawn randomly from some
256: other team.
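
As a sketch in notation introduced here for illustration, let $T$ be
the set of authors on the selected team and $p_a$ the number of
papers previously published by author $a$. With probability
$1-\beta$ an author slot is filled from within the team according to
\begin{equation}
P(a) = \frac{1 + p_a}{\sum_{b \in T} (1 + p_b)}, \qquad a \in T,
\end{equation}
and with probability $\beta$ it is filled by an author drawn at
random from some other team. The added 1 in the weight gives team
members who have not yet appeared on a paper a nonzero chance of
being selected.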
257:
258:
259: \section{Network metrics}
260: Simulation using a bipartite Yule process fully preserves the
261: topology of the network phenomenon being studied. The adjacency
262: matrix for a bipartite network is a roughly lower triangular
rectangular matrix. Figure~\ref{matrix} shows the adjacency matrices of the
264: paper-reference network, paper-author network, and paper-journal
265: network in an actual collection of papers.
266:
267: \begin{figure*}
268: \resizebox{1\textwidth}{!}{%
269: \includegraphics{figurematrix.eps}}%
270: \caption{Diagrams of adjacency matrices of bipartite networks in a
271: collection of 902 papers on the topic of complex
272: networks.\label{matrix}}
273: \end{figure*}
274:
275:
276: From each bipartite network, two co-occurrence networks can be
277: derived with their own characteristic topology. For example, a
278: paper-reference network yields two unipartite networks, a
279: bibliographic coupling network of papers linked by common references
280: and a co-citation network of references linked by their common
281: papers. A paper-author network yields a collaboration network of
282: authors connected by common papers and also a network of papers
283: connected by common authors.
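
In matrix form, using symbols introduced here for illustration, let
$B$ be the binary adjacency matrix of the paper-reference network,
with $B_{ij} = 1$ if paper $i$ cites reference $j$ and $B_{ij} = 0$
otherwise. The two co-occurrence networks then follow from the
standard matrix products
\begin{equation}
C^{\mathrm{bc}} = B B^{T}, \qquad C^{\mathrm{cc}} = B^{T} B,
\end{equation}
where the off-diagonal element $C^{\mathrm{bc}}_{ik}$ counts the
references shared by papers $i$ and $k$, and $C^{\mathrm{cc}}_{jl}$
counts the papers that cite both references $j$ and $l$. The
diagonal elements give the number of references in each paper and
the number of citations to each reference, respectively. The same
construction applies to the paper-author network.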
284:
285: Network metrics that characterize a bipartite network can be derived
286: from link degree distributions in the bipartite network and link
287: degree distributions in the associated unipartite co-occurrence
288: networks. Many of these metrics can be tied to indicators of the
289: underlying research process generating the collection of papers.
290:
291: A set of useful metrics for paper-reference networks includes:
292: \begin{itemize}
293: \item \textit{reference per paper distribution} - This tends to be a
294: lognormal distribution whose mean, $m$, is from 15 to 30 references
295: per paper \cite{morris04a}.
296: \item \textit{paper per reference
297: distribution} - This tends to be a power-law distribution with a
298: characteristic exponent that ranges from 2 to 4
\cite{naranan71,redner98}.
300: \item \textit{bibliographic coupling strength per
301: paper pair distribution} - This is the link weight distribution of
302: the bibliographic coupling network.
303:
304: \item \textit{co-citation coupling strength per reference pair distribution} -
305: This is the link weight distribution of the co-citation network.
\item \textit{bibliographic coupling clustering coefficient
distribution} - This is the distribution of the clustering coefficients
for the bibliographic coupling network.
309: \end{itemize}
310: In paper-reference networks, the mean references per paper is
311: typically about 30, while the mean papers per reference is typically
312: about 1.4, the mean of a zeta (pure power-law) distribution with
313: exponent of 3. This constrains the ratio of references to papers in
314: the collection to be about 20, that is, a collection of papers
315: typically has about 20 times more references than papers.
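
The arithmetic behind this estimate can be made explicit. Writing
$P$ for the number of papers, $R$ for the number of references, and
$L$ for the number of paper-to-reference links (symbols introduced
here for illustration), every link is counted once from each side,
so that
\begin{equation}
L = \bar{n}_{r}\, P = \bar{n}_{p}\, R
\quad\Longrightarrow\quad
\frac{R}{P} = \frac{\bar{n}_{r}}{\bar{n}_{p}}
\approx \frac{30}{1.4} \approx 21,
\end{equation}
where $\bar{n}_{r}$ is the mean references per paper and
$\bar{n}_{p}$ is the mean papers per reference. The value of 1.4
follows from the mean of a zeta distribution with exponent 3,
$\zeta(2)/\zeta(3) \approx 1.645/1.202 \approx 1.37$.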
316:
A set of useful metrics for paper-author networks includes:
318: \begin{itemize}
319: \item \textit{authors per paper distribution} - This tends to be a 1-shifted
320: Poisson distribution whose mean varies from 2 for fields such as
321: mathematics to more than 10 for biomedical fields \cite{morris04b}.
322: \item \textit{paper per author distribution} - This tends to be a
323: power-law (Lotka's Law), whose exponent ranges from 2 to 4
324: \cite{lotka26}.
325: \item \textit{collaborating author distribution} - This is the
326: distribution of the number of unique co-authors per author in the
327: collection, and is the link degree distribution of the unweighted
328: co-authorship network.
329: \item \textit{co-authorship per author pair
330: distribution} - This is the link weight distribution of the weighted
331: co-authorship network.
\item \textit{co-authorship clustering coefficient
distribution} - This is the distribution of the clustering
coefficients of the unweighted co-authorship network.
\item \textit{minimum co-authorship path length
distribution} - This is the distribution of minimum path lengths
between author pairs in the unweighted co-authorship network.
338: \end{itemize}
339:
340: \section{Examples}
341: \subsection{Example simulation of paper-reference network}
342: The Yule model for paper-reference networks was tested on a
343: collection of papers that cover the topic of complex networks. This
344: collection was gathered on September 8th, 2003 from ISI's Web of
345: Science product using a series of queries to find all papers that
346: cite key references and authors in the specialty. The collection
347: contains 902 papers with 31355 citations to 19185 references. The
348: Yule parameter, $\alpha$, estimated by dividing the number of
349: references by the number of citations to references, is 0.61. The
350: mean references per paper is 34.8. The parameters used for the
351: bipartite Yule simulation of this collection can be found in
352: \cite{morris04a}.
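
These figures can be checked directly from the counts given above:
\begin{equation}
\alpha \approx \frac{19185}{31355} \approx 0.61, \qquad
\frac{31355}{902} \approx 34.8, \qquad
\frac{19185}{902} \approx 21.3,
\end{equation}
where the second quantity is the mean number of references per paper
and the third is the ratio of references to papers in the
collection, in line with the typical ratio of about 20 noted
earlier.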
353:
354: \begin{figure*}
355: \resizebox{.9\textwidth}{!}{%
356: \includegraphics{figure7.eps}}%
357: \caption{Comparison plots of paper per reference frequency (upper
358: left), bibliographic coupling strength frequency (upper right),
359: co-citation strength frequency (lower left), and bibliographic
360: coupling clustering coefficient distribution (lower right), from a
361: collection of 902 papers on the topic of complex networks.
362: \label{pr_results}}
363: \end{figure*}
364:
Figure~\ref{pr_results} shows plots comparing network metrics from the actual data
to a Yule simulation of network growth. The upper left plot is of
papers per reference frequencies. Power-law exponents obtained by
maximum likelihood estimation (MLE) are 3.0 for the actual
frequencies, and 2.85 for the simulation. The paper-reference Yule
370: process mimics the phenomenon of exceptionally highly cited exemplar
371: references in the extreme lower right of the plot. The upper right
372: plot is of frequency of bibliographic coupling strength per paper
373: pair. The Yule process-based simulation frequencies match the actual
374: frequencies well. The series of high bibliographic coupling strength
375: pairs in the lower right from actual data corresponds to pairs of
376: review papers with long lists of almost identical references, a
377: phenomenon not modeled by the Yule process. The lower left plot of
Figure~\ref{pr_results} is of frequency of co-citation strength per reference pair.
379: The simulated frequencies match the actual frequencies well across
380: the whole plot. The lower right plot is of bibliographic coupling
381: clustering coefficient distribution. The simulated distribution
382: matches the shape and scale of the actual data.
383:
384: \subsection{Example simulation of a paper-author network}
385: The Yule model for paper-author networks was tested on three
386: collections of papers representing specialties with a wide range of
387: collaboration intensities. A collection of 1391 papers on the topic
388: of distance learning with 51\% single-authored papers represents a
389: specialty with little collaboration. A collection of 900 papers on
390: the topic of complex networks with 21\% single-authored papers
represents a specialty with a typical amount of collaboration.
392: Finally, a collection of 3095 papers on the topic of atrial ablation
393: with 7\% single-authored papers represents a specialty with heavy
394: collaboration \cite{morris04b}. The parameters used for bipartite
395: Yule simulation of these paper-author networks can be found in
396: \cite{morris04b}.
397:
Figures~\ref{distance}, \ref{networks}, and \ref{atrial} show the
comparison of Yule model simulations to
399: actual data for these three collections using two metrics: 1) paper
400: per author frequency (Lotka's Law), and 2) collaborating author
401: frequency.
402:
403: \begin{figure*}
404: \resizebox{.9\textwidth}{!}{%
405: \includegraphics{figure_distance.eps}}%
406: \caption{Comparison of bipartite Yule simulation against actual data
407: for plots of paper per author frequencies and collaborating author
408: frequencies for the distance education paper collection.\label{distance}}
409: \end{figure*}
410:
411: \begin{figure*}
412: \resizebox{.9\textwidth}{!}{%
413: \includegraphics{figure_complex.eps}}%
414: \caption{Comparison of bipartite Yule simulation against actual data
415: for plots of paper per author frequencies and collaborating author
416: frequencies for the complex networks paper collection.\label{networks}}
417: \end{figure*}
418:
419: \begin{figure*}
420: \resizebox{.9\textwidth}{!}{%
421: \includegraphics{figure_atrial.eps}}%
422: \caption{Comparison of bipartite Yule simulation against actual data
423: for plots of paper per author frequencies and collaborating author
424: frequencies for the atrial ablation paper collection.\label{atrial}}
425: \end{figure*}
426:
The left plots in Figures~\ref{distance}, \ref{networks}, and
\ref{atrial} are paper per author frequency
428: plots. The bipartite Yule process produces excellent matches to
429: actual data. The inset plots show Yule model predicted paper per
430: author distributions derived by gathering statistics from 1000
431: simulations for each collection. A line representing an MLE fitted
432: zeta (pure power-law) distribution is shown in each inset. The Yule
433: model produces excellent fits to the zeta distribution for all three
434: collections, confirming the Yule model's usefulness as a predictor
435: of Lotka's Law. Note that the deviation of the distributions from
436: the zeta distribution in the tail of the distributions is due to
437: truncating the simulations at the number of papers in each
collection. The plots on the right side of Figures~\ref{distance},
\ref{networks}, and \ref{atrial} show
that the bipartite Yule model produces good matches of collaborating
author frequencies to actual data across the wide range of
441: collaboration intensities represented by the three collections.
442:
443: \begin{figure}
444: \resizebox{.5\textwidth}{!}{%
445: \includegraphics{couple_ap_p_r.eps}}%
446: \caption{Example of coupled bipartite networks. The paper-author
447: network is coupled to the paper-reference network through common
448: papers. \label{example}}
449: \end{figure}
450:
451: \section{Future work}
452: The research on bipartite Yule processes discussed here will be
extended to modeling of coupled bipartite networks. Figure~\ref{example} shows
an example of coupled bipartite networks, where a paper-author
network is coupled to a paper-reference network through common
456: papers. The challenge is to invent a model that reproduces the
457: correlation of groups of authors to groups of references, a
458: phenomenon that cannot be modeled using two separate bipartite
459: processes.
460:
461:
462:
545:
546: % Create the reference section using BibTeX:
547: \bibliography{yule}
548:
549: \end{document}
550: %
551: % ****** End of file template.aps ******
552: