cs0309023/Cite.tex
1: \documentclass[12pt]{article}
2: %
3: % Vladimir Batagelj
4: % Efficient Algorithms for Citation Network Analysis
5: % ----------------------------------------------------------------------------
6: % version : Jan 21, 1991  Pittsburgh, Some Mathematics of Network Analysis
7: % version : Aug 28, 1994  slides
8: % version : May  5, 1996  LaTeX
9: % version : Aug 31, 1997  corrections
10: % version : Sep 15, 2001  used in Layouts for GD01 Graph-Drawing Competition
11: % version : Sep  1, 2002  extensions, real-life example
12: % version : Aug-Sep 2003  extensions, islands, US patents
13: %
14: % Pictures:
15: %   networkc.eps, preprint.eps, mainP.eps, CPM.eps, som07LH.eps, main.eps,
16: %   CPMpath.eps, size.eps, islandMa.eps, island50.eps, island38.eps
17: % ----------------------------------------------------------------------------
18: %
19: 
20: \usepackage{latexsym}
21: \usepackage{times}
22: \usepackage[dvips]{graphicx}
23: 
24: \newcommand{\Qed}{\hspace*{1mm}\hfill$\Box$\endgraf}
25: \def\RR{\hbox{\sf I\kern-.14em\hbox{R}}}
26: \def\NN{\hbox{\sf I\kern-.13em\hbox{N}}}
27: \def\Min{\mathop{\rm Min}\nolimits}
28: \def\Max{\mathop{\rm Max}\nolimits}
29: \newcommand{\Units}{\mathbf{U}}
30: \newcommand{\Net}{\mathbf{N}}
31: \newcommand{\DK}[1]{\stackrel{\rightharpoonup}{\mathbf{K}}_{#1}}
32: \newcommand{\inv}[1]{#1^\mathrm{inv}}
33: \newcommand{\trecl}[1]{{#1}^\star}
34: \newcommand{\tracl}[1]{\overline{#1}}
35: \newcommand{\url}[1]{{\textbf{\texttt{\small #1}}}}
36:  \renewcommand{\textfraction}{.05}
37:  \renewcommand{\topfraction}{.95}
38: 
39: \oddsidemargin 5pt \evensidemargin 5pt \marginparwidth 20pt
40: \marginparsep 10pt \topmargin -12 true mm \headheight 12pt \headsep 25pt
41: \textheight 23 true cm \textwidth 16 true cm
42: \columnsep 10pt \columnseprule 0pt
43: 
44: 
45: \title{Efficient Algorithms for Citation Network Analysis}
46: \author{Vladimir Batagelj \\
47:         University of Ljubljana, Department of Mathematics,\\
48:         Jadranska 19, 1\,111 Ljubljana, Slovenia \\
49:         e-mail: \texttt{vladimir.batagelj@uni-lj.si}}
50: \date{}
51: %\date{\today}
52: %\date{September 14, 2003}
53: 
54: \begin{document}
55: \maketitle
56: 
57: \begin{abstract}
58: In the paper very efficient, linear in number of arcs, algorithms
59: for determining Hummon and Doreian's arc weights SPLC and SPNP in
60: citation network are proposed, and some theoretical properties
61: of these weights are presented. The nonacyclicity problem in
62: citation networks is discussed. An approach to identify
63: on the basis of arc weights an important small subnetwork is proposed
64: and illustrated on the citation networks of SOM (self organizing maps)
65: literature and US patents.
66: \\[4pt]
67: \textbf{Keywords:} large network, acyclic, citation network,
68: main path, CPM path, arc weight,
69:    algorithm, self organizing maps, patent
70: \end{abstract}
71: 
72: \section{Introduction}
73: 
74: The citation network analysis started with the
75: paper of Garfield et al. (1964) \cite{Gar64} in which the introduction
76: of the notion of citation network is attributed to Gordon Allen.
77: In this paper, on the example of Asimov's history of DNA \cite{Asimov},
78: it was shown that the analysis "\textit{demonstrated a high
79: degree of coincidence between an historian's account of events and the citational
80: relationship between these events}". An early overview of possible
81: applications of graph theory in citation network analysis was made
82: in 1965 by Garner \cite{3D}.
83: 
84: The next important step was made by Hummon and Doreian (1989)
85: \cite{HumDor89,HumDor90,HuDoFr90}. They proposed three indices
86: (NPPC, SPLC, SPNP) --
87: weights of arcs that provide us with automatic way to identify
88: the (most) important part of the citation network -- the main path
89: analysis.
90: 
91: In this paper we make a step further. We show how to efficiently
92: compute the Hummon and Doreian's weights, so that they can be used
93: also for analysis of very large citation networks with several
94: thousands of vertices. Besides this some theoretical properties
95: of the Hummon and Doreian's weights are presented.
96: 
97: The proposed methods are implemented in \texttt{\textbf{Pajek}} --
98: a program, for Windows (32 bit), for analysis of \emph{large networks}.
99: It is freely available, for noncommercial use, at its homepage
100: \cite{pajek}.
101: 
102: 
103: 
104: For basic notions of graph theory see Wilson and Watkins \cite{GT}.
105: 
106: \section{Citation Networks}
107: 
108: In a given set of units $\Units $ (articles, books, works,
109: \ldots) we introduce a $citing$ relation
110: $R \subseteq \Units  \times \Units $
111:  \[ u R v \equiv v \mbox{ cites } u \]
112: which determines a \emph{citation network} $\Net = (\Units ,R)$.
113: 
114: In Table~\ref{citnets} some characteristics of real life citation networks
115: are presented. Most of these networks were obtained from the Eugene Garfield's
116: collection of citation data \cite{Gar64,Gar02} produced using
117: \textit{\textbf{HistCite}} Software (formerly called \textit{\textbf{HistComp}}
118: -- \textit{comp}iled \textit{Hist}oriography program)
119: \cite{Gar01}. All of these networks are the result of searches in the Web of Science
120: and are used with the permission of ISI of Philadelphia,
121: \texttt{\textbf{www.isinet.com}}. These networks in \texttt{\textbf{Pajek}}'s format
122: are available from \texttt{\textbf{Pajek}}'s web site \cite{data}.
123: 
124: In Table~\ref{citnets}: $n = |\Units|$ is the number of vertices;
125: $m = |R|$ is the number of arcs;
126: $m_0$ is the number of loops;
127: $n_0$ is the number of isolated vertices;
128: $n_C$ is the size of the largest weakly connected component;
129: $k_C$ is the number of nontrivial weakly connected components;
130: $h$ is the depth of network (minimum number of levels);
131: $\Delta_{in}$ is the maximum input degree;
132: and $\Delta_{out}$ is the maximum output degree.
133: The last three columns contain the numbers of strongly connected components
134: (cyclic parts) of size 2, 3 and 4.
135: 
136: 
137: \begin{table}
138: \caption{Citation network characteristics\label{citnets}}
139: \begin{center}\footnotesize
140: \begin{tabular}{l|r|r|r|r|r|r|r|r|r|r|r|r|}
141: network              &  $n$ &   $m$ & $m_0$ & $n_0$ & $n_C$ & $k_C$ & $h$&$\Delta_{in}$&$\Delta_{out}$&$2$& $3$ & $4$  \\ \hline
142: DNA                  &   40 &    60 &     0 &     1 &    35 &     3 &  11 &          7 &          5 &   0 &   0 &  0   \\
143: Coupling             &  223 &   657 &     1 &     5 &   218 &     1 &  16 &         19 &        134 &   0 &   0 &  0   \\
144: Small world          &  396 &  1988 &     0 &   163 &   233 &     1 &  16 &         60 &        294 &   0 &   0 &  0   \\
145: Small \& Griffith    & 1059 &  4922 &     1 &    35 &  1024 &     1 &  28 &         89 &        232 &   2 &   0 &  0   \\
146: Cocitation           & 1059 &  4929 &     1 &    35 &  1024 &     1 &  28 &         90 &        232 &   2 &   0 &  0   \\
147: Scientometrics       & 3084 & 10416 &     1 &   355 &  2678 &    21 &  32 &        121 &        105 &   5 &   2 &  1   \\
148: Kroto                & 3244 & 31950 &     1 &     0 &  3244 &     1 &  32 &        166 &       3243 &   6 &   0 &  0   \\
149: SOM                  & 4470 & 12731 &     2 &   698 &  3704 &    27 &  24 &         51 &        735 &  11 &   0 &  0   \\
150: Zewail               & 6752 & 54253 &     1 &   101 &  6640 &     5 &  75 &        166 &        227 &  38 &   1 &  2   \\
151: Lederberg            & 8843 & 41609 &     7 &   519 &  8212 &    35 &  63 &        135 &       1098 &  54 &   4 &  0   \\
152: Desalination         & 8851 & 25751 &     7 &  1411 &  7143 &   115 &  27 &         73 &        137 &  12 &   0 &  1   \\
153: US patents     & 3774768 & 16522438 &     1 &     0 & 3764117 & 3627 & 32 &        779 &        770 &   0 &   0 &  0   \\ \hline
154: \end{tabular}
155: \end{center}
156: \end{table}
157: 
158: A citing relation is usually \emph{irreflexive},
159: $\forall u \in \Units  : \lnot u R u$,
160: and (almost) \emph{acyclic} -- no vertex is reachable from
161: itself by a nontrivial path, or formally
162: $\forall u \in \Units  \forall k \in \NN^+ : \lnot u R^k u$.
163: In the following we shall assume that it has this property.
164: We shall postpone the question how to deal with nonacyclic
165: citation networks till the end of the theoretical part of
166: the paper.
167: 
168: For a relation $Q \subseteq \Units  \times \Units $ we denote by
169: $\inv{Q}$ its \emph{inverse} relation,
170: $u \inv{Q} v \equiv v Q u$,
171: and by
172:  \[ Q(u) = \{ v \in \Units  : u Q v \} \]
173: the set of successors of unit $u \in \Units $.
174: If $Q$ is acyclic then also $\inv{Q}$ is acyclic.
175: This means that the network $\inv{\Net} = (\Units, \inv{R})$,
176: $u \inv{R} v \equiv  u \mbox{ cites } v$, is a network of the same
177: type as the original citation network $\Net = (\Units,R)$.
178: Therefore it is just a matter of 'taste' which relation
179: to select.
180: 
181: Let $I = \{ (u,u) : u \in \Units  \}$ be the \emph{identity} relation
182: on $\Units $ and $\tracl{Q} = \bigcup_{k \in \NN^+} Q^k$
183: the \emph{transitive closure} of relation $Q$. Then
184: $Q$ is acyclic iff $\tracl{Q} \cap I = \emptyset$.
185: The relation $\trecl{Q} = \tracl{Q} \cup I$ is the
186: \emph{transitive and reflexive closure} of relation $Q$.
187: 
188: 
189: Since the set of units $\Units $ is finite and $R$ is acyclic we
190: know from the theory of relations that:
191: \begin{itemize}
192:  \item The set of units $\Units $ can be \emph{topologically ordered} --
193:   there exists a surjective mapping (permutation) $i : \Units  \to 1 .. |\Units |$
194:   with the property
195:   \[ u R v \Rightarrow i(u) < i(v) \]
196:  \item Let $\Min R = \{ u \in \Units  : \inv{R}(u) = \emptyset \}$ be the set
197:   of \emph{minimal} elements and
198:    $\Max R = \{ u \in \Units  : R(u) = \emptyset \}$
199:   the set of \emph{maximal} elements. Then $\Min R \ne \emptyset$ and
200:   $\Max R \ne \emptyset$.
201:  \item Every unit $u \in \Units$ and every arc $(u,v) \in R$ belong
202:        to at least one path from  $\Min R $ to $\Max R$:\\
203:        $\forall u \in \Units : R^\star (u) \cap \Max R \ne \emptyset$ \\
204:        $\forall u \in \Units : {\inv{R}}^\star (u) \cap \Min R \ne \emptyset$
205: \end{itemize}
206: 
207: \begin{figure}
208:  \begin{center}
209:   \includegraphics[width=60mm,viewport=10 4 217 280]{./pics/networkc.eps}
210:   \caption{Citation Network in Standard Form\label{net}}
211:  \end{center}
212: \end{figure}
213: 
214: 
215: To simplify the presentation we transform a citation network $\Net = (\Units ,R)$ to its
216: \emph{standard form} $\Net' = (\Units',R')$ (see Figure~\ref{net})
217: by extending the set of units
218: $\Units ' := \Units  \cup \{ s, t \}$, $s, t \notin \Units $ with a common
219: \emph{source} (initial  unit) $s$ and a common \emph{sink}
220: (terminal unit) $t$, and by adding the corresponding arcs to relation $R$
221: \[ R' := R \ \cup\ \{s\} \times \Min R\ \cup\ \Max R \times \{t\}
222:          \ \cup\ \{ (t,s) \} \]
223: This eliminates problems with networks with several connected components
224: and/or several initial/terminal units.
225: In the following we shall assume that the citation network
226: $\Net = (\Units,R)$ is in the standard form.
227: Note that, to make the theory smoother, we added to $R'$ also the
228: 'feedback' arc $(t,s)$, thus destroying its acyclicity.
229: 
230: 
231: \section{Analysis of Citation Networks}
232: 
233: An approach to the analysis of citation network is to determine
234: for each unit / arc its \emph{importance} or \emph{weight}. These values
235: are used afterward to determine the essential substructures in the
236: network.
237: In this paper we shall focus on the methods of assigning weights
238: $w : R \to \RR^+_0$
239: to arcs proposed by Hummon and Doreian \cite{HumDor89,HumDor90}:
240: \begin{itemize}
241:  \item \emph{node pair projection count}  (NPPC) method:
242:   $w_d(u,v) = |\trecl{\inv{R}}(u)|\cdot|\trecl{R}(v)|$
243:  \item \emph{search path link count}  (SPLC) method: $w_l(u,v)$ equals
244:   the number of "\textit{all possible search paths through the network
245:   emanating from an origin node}"  through the arc $(u,v) \in R$,
246:   \cite[p. 50]{HumDor89}.
247:  \item \emph{search path node pair} (SPNP) method:
248:   $w_p(u,v)$  "\textit{accounts for
249:   all connected vertex pairs along the paths through the arc $(u,v) \in R$}",
250:   \cite[p. 51]{HumDor89}.
251: \end{itemize}
252: 
253: \subsection{Computing NPPC weights}
254: 
255: To compute $w_d$ for sets of units of moderate size (up to some thousands of units)
256: the matrix representation of $R$ can be used and its transitive
257: closure computed by Roy-Warshall's algorithm \cite{algo}. The quantities
258: $|\trecl{R}(v)|$ and $|\trecl{\inv{R}}(u)|$ can be obtained
259: from closure matrix as row/column sums.
260: An $O(nm)$ algorithm for computing $w_d$ can be constructed using
261: Breath First Search from each $u \in \Units$ to determine
262: $|\trecl{\inv{R}}(u)|$ and $|\trecl{R}(v)|$.
263: Since it is of order at least $O(n^2)$ this algorithm is not suitable
264: for larger networks (several ten thousands of vertices).
265: 
266: \subsection{Search path count method}
267: 
268: To compute the SPLC and SPNP weights we introduce a related
269: \textit{search path count} (SPC) method for which the
270: weights $N(u,v)$, $u R v$ count the number of
271: different paths  from $s$ to $t$ (or from $\Min R$ to $\Max R$)
272: through the arc $(u,v)$.
273: 
274: To compute $N(u,v)$ we introduce two auxiliary quantities:
275: let $N^-(v)$ denotes the number of different $s$-$v$ paths,
276: and $N^+(v)$ denotes the number of different $v$-$t$ paths.
277: 
278: Every $s$-$t$ path $\pi$ containing the arc $(u,v) \in R$
279: can be uniquely expressed in the form
280: \[ \pi = \sigma \circ (u,v) \circ \tau \]
281: where $\sigma$ is a $s$-$u$ path and $\tau$ is a $v$-$t$ path.
282: Since every pair $(\sigma,\tau)$ of
283: $s$-$u$ / $v$-$t$ paths gives a corresponding
284: $s$-$t$ path  it follows:
285:  \[ N(u,v) = N^-(u)\cdot N^+(v), \qquad (u,v) \in R \]
286: where
287: \[
288: N^-(u) =
289: \cases{
290:  1  & $u = s$ \cr
291:  \sum_{v : v R u} N^-(v) \quad & otherwise}
292: \]
293: and
294: \[
295: N^+(u) =
296: \cases{
297:  1   & $u = t$ \cr
298:  \sum_{v : u R v} N^+(v) \quad & otherwise}
299: \]
300: This is the basis of an efficient algorithm for computing
301: the weights $N(u,v)$ --
302: after the topological sort of the network \cite{algo}
303: we can compute, using the above relations in topological order,
304: the weights in time of order $O(m)$.
305: The topological order ensures that all the quantities in
306: the right side expressions of the above equalities are already
307: computed when needed. The counters $N(u,v)$ are used as SPC
308: weights $w_c(u,v) = N(u,v)$.
309: 
310: \subsection{Computing SPLC and SPNP weights}
311: 
312: The description of SPLC method in \cite{HumDor89} is not very
313: precise. Analyzing the table of SPLC weights from
314: \cite[p. 50]{HumDor89} we see that we have to consider
315: \textbf{each} vertex as an origin of search paths.
316: This is equivalent to apply the SPC method on the
317: extended network
318:  $\Net_l = (\Units',R_l)$
319: \[ R_l := R'\ \cup\ \{ s \} \times (\Units  \setminus  \cup R(s) )  \]
320: 
321: It seems that there are some errors in the table of SPNP
322: weights in \cite[p. 51]{HumDor89}. Using the definition
323: of the SPNP weights we can again reduce their computation
324: to SPC method applied on the extended network
325: $\Net_p = (\Units',R_p)$
326: \[ R_p := R \ \cup\  \{ s \} \times \Units \  \cup \  \Units
327:   \times \{ t \} \ \cup\ \{ (t,s) \} \]
328: in which every unit $u \in U$ is additionaly linked
329: from the source $s$ and to the sink $t$.
330: 
331: 
332: \subsection{Computing the numbers of paths of length $k$}
333: 
334: We could use also a direct approach to determine the
335: weights $w_p$. Let $L^-(u)$ be the number of different
336: paths terminating in $u$
337: and $L^+(u)$ the number of different
338: paths originating in $u$.
339: Then for $uRv$ it holds $ w_p(u,v)  = L^-(u)\cdot L^+(v)$.
340: 
341: The procedure to determine $L^-(u)$ and $L^+(u)$ can be compactly described using two
342: families of polynomial generating functions\\
343: \[ P^-(u;x) = \sum_{k=0}^{h(u)} p^-(u,k) x^k \qquad
344: \mbox{and}  \qquad  P^+(u;x) = \sum_{k=0}^{h^-(u)} p^+(u,k) x^k, \quad  u \in \Units \]
345: where $h(u)$ is the depth of vertex $u$ in network $(\Units,R)$, and
346: $h^-(u)$ is the depth of vertex $u$ in network $(\Units,\inv{R})$,
347: The coefficient $p^-(u,k)$ counts the number of paths of length $k$ to $u$,
348: and $p^+(u,k)$ counts the number of paths of length $k$ from $u$.
349: 
350: Again, by the basic principles of combinatorics
351: \[
352: P^-(u;x) =
353: \cases{
354:  0   & $u=s$ \cr
355:  1 + x \cdot \sum_{v : v R u} P^-(v;x) \quad & otherwise}
356: \]
357: and
358: \[
359: P^+(u;x) =
360: \cases{
361:  0   & $u=t$ \cr
362:  1 + x \cdot \sum_{v : u R v} P^+(v;x) \quad & otherwise}
363: \]
364: and both families can be determined using the definitions and
365: computing the polynomials in the (reverse for $P^+$) topological
366: ordering of $\Units$. The complexity of this procedure is at most
367: $O(hm)$. Finally
368: \[  L^-(u) = P^-(u;1)     \qquad \mathrm{and} \qquad
369:     L^+(v) = P^+(v;1)   \]
370: In real life citation networks the depth $h$ is relatively small as can be seen
371: from the Table~\ref{citnets}.
372: 
373: The complexity of this approach is higher than the complexity of the
374: method proposed in subsection 3.3 -- but we get more detailed information about paths.
375: May be it would make sense to consider 'aging' of references by
376: $ L^-(u) = P^-(u;\alpha)$, for selected $\alpha$, $0 < \alpha \leq 1$.
377: 
378: \subsection{Vertex weights}
379: 
380: The quantities used to compute the arc weights $w$ can be used
381: also to define the corresponding vertex weights $t$
382: \begin{eqnarray*}
383:  t_d(u) & = & |\trecl{\inv{R}}(u)|\cdot|\trecl{R}(u)| \\
384:  t_c(u) & = & N^-(u)\cdot N^+(u) \\
385:  t_l(u) & = & N'^-(u)\cdot N'^+(u) \\
386:  t_p(u) & = & L^-(u)\cdot L^+(u)
387: \end{eqnarray*}
388: They are counting the number of paths of selected type through
389: the vertex $u$.
390: 
391: \subsection{Implementation details}
392: 
393: In our first implementation of the SPNP method the values of
394: $L^-(u)$ and $L^+(u)$ for some large networks (Zewail and Lederberg)
395: exceeded the range of Delphi's \texttt{LargeInt} (20 decimal places).
396: We decided to use the \texttt{Extended} real numbers
397: (range $= 3.6 \times 10^{-4951}\ ..\ 1.1 \times 10^{4932}$,
398: 19-20 significant digits) for counters. This
399: range is safe also for very large citation networks.
400: 
401: 
402: To see this, let us denote $N^*(k) = \max_{u: h(u)=k} N^-(u)$.
403: Note that $h(s) = 0$ and $u R v \Rightarrow h(u) < h(v)$.
404: Let $u^* \in \Units$ be a unit on which the maximum is attained
405: $N^*(k) = N^-(u^*)$. Then
406: \begin{eqnarray*}
407:  N^*(k) & = & \sum_{v:v R u^*} N^-(v) \leq  \sum_{v:v R u^*} N^*(h(v)) \leq \sum_{v:v R u^*} N^*(k-1) = \\
408:    & = & \deg_{in}(u^*) \cdot N^*(k-1) \leq \Delta_{in}(k) \cdot N^*(k-1)
409: \end{eqnarray*}
410: where $\Delta_{in}(k)$ is the maximal input degree at depth $k$. Therefore
411: $N^*(h) \leq \prod_{k=1}^h \Delta_{in}(k) \leq  \Delta_{in}^h$. A similar inequality
412: holds also for $N^+(u)$. From both it follows
413: \[ N(u,v) \leq \Delta_{in}^{h(u)} \cdot \Delta_{out}^{h^-(v)} \leq \Delta^{H-1} \]
414: where $H = h(t)$ and $\Delta = \max(\Delta_{in}, \Delta_{out})$.
415: Therefore  for $H \leq 1000$ and $\Delta \leq 10000$ we get
416: $N(u,v)  \leq \Delta^{H-1} \leq 10^{4000}$ which is still in the range of
417:  \texttt{Extended} reals. Note also that in the derivation of this inequality
418: we were very generous -- in real-life networks $N(u,v)$ will be much smaller
419: than $\Delta^{H-1}$.
420: 
421: Very large/small numbers that result as weights in large networks are
422: not easy to use. One possibility to overcome this problem is to use the
423: logarithms of the obtained weights -- logarithmic transformation
424: is monotone and therefore preserve the ordering of weights (importance
425: of vertices and arcs). The transformed values are also more convenient
426: for visualization with line thickness of arcs.
427: 
428: \section{Properties of weights}
429: 
430: \subsection{General properties of weights}
431: 
432: 
433: Directly from the definitions of weights we get
434: \[  w_k(u,v;R) = w_k(v,u;\inv{R}), \qquad k=d,c,p \]
435: and
436: \[ w_c(u,v) \leq w_l(u,v) \leq  w_p(u,v) \]
437: 
438: % 5. avgust 2002
439: Let $\Net_A = (\Units_A, R_A)$ and $\Net_B = (\Units_B, R_B)$,
440: $\Units_A \cap \Units_B = \emptyset$ be two citation networks,
441: and $\Net_1 = (\Units'_A, R'_A)$
442: and $\Net_2 = ((\Units_A \cup \Units_B)', (R_A \cup R_B)')$
443: the corresponding standardized networks of the first network
444: and of the union of both networks. Then it holds for all
445: $u,v \in \Units_A$ and for all $p,q \in R_A$
446: \[  \frac{t_k^{(1)}(u)}{t_k^{(1)}(v)} = \frac{t_k^{(2)}(u)}{t_k^{(2)}(v)}, \qquad
447: \mbox{and} \qquad
448:     \frac{w_k^{(1)}(p)}{w_k^{(1)}(q)} = \frac{w_k^{(2)}(p)}{w_k^{(2)}(q)}, \qquad k=d,c,l,p \]
449: where $t^{(1)}$ and $w^{(1)}$ is a weight on network $\Net_1$, and
450: $t^{(2)}$ and $w^{(2)}$ is a weight on network $\Net_2$.
451: This means that adding or removing components in a network
452: do not change the ratios (ordering) of the weights inside components.
453: 
454: Let $\Net_1 = (\Units,R_1)$ and $\Net_2 = (\Units,R_2)$ be two citation networks over the same
455: set of units $\Units$ and $R_1 \subseteq R_2$ then
456: \[ w_k(u,v;R_1)  \leq w_k(u,v;R_2), \qquad k=d,c,p  \]
457: 
458: \subsection{NPPC weights}
459: 
460: In an acyclic network for every arc $(u,v) \in R$ hold
461: \[ \trecl{\inv{R}}(u) \cap \trecl{R}(v) = \emptyset \quad \mathrm{and} \quad
462:    \trecl{\inv{R}}(u) \cup \trecl{R}(v) \subseteq \Units \]
463: therefore $|\trecl{\inv{R}}(u)| + |\trecl{R}(v)| \leq n$ and,
464: using the inequality $\sqrt{ab} \leq \frac{1}{2} (a+b)$, also
465: \[ w_d(u,v) = |\trecl{\inv{R}}(u)| \cdot |\trecl{R}(v)| \leq \frac{1}{4} n^2 \]
466: 
467: Close to the source or sink the weights $w_d$ are small,
468: since the sets $\trecl{R}(u)$ (and $\trecl{\inv{R}}(u)$) are monotonic
469: along the paths in a sense
470: \[ u \tracl{R} v \Rightarrow \trecl{R}(u) \subset \trecl{R}(v) \]
471: The weights $w_d$ are larger in the 'middle' of the network.
472: 
473: A more uniform (but less sensitive)
474: weight would be $w_s(u,v) = |\trecl{\inv{R}}(u)| + |\trecl{R}(v)|$
475: or in the normalized form $w'_s(u,v) = \frac{1}{n} w_s(u,v)$.
476: 
477: \subsection{SPC weights}
478: 
479: 
480: For the flow $N(u,v)$ the \emph{Kirchoff's node law} holds:
481: 
482: For every node $v$ in a citation network in standard
483: form it holds
484: \[ \mbox{incoming flow} = \mbox{outgoing flow} = t_c(v)\]
485: 
486: \noindent\textbf{Proof:}
487: \[ \sum_{x:xRv} N(x,v) = \sum_{x:xRv} N^-(x)\cdot N^+(v) =
488:    (\sum_{x:xRv} N^-(x))\cdot N^+(v) = N^-(v)\cdot N^+(v) \]
489: \[ \sum_{y:vRy} N(v,y) = \sum_{y:vRy} N^-(v)\cdot N^+(y) =
490:    N^-(v)\cdot\sum_{y:vRy} N^+(y) = N^-(v)\cdot N^+(v) \]
491: \Qed
492: 
493: 
494: From the Kirchoff's node law it follows that
495: the \emph{total flow} through the citation network equals
496: $N(t,s)$. This gives us a natural way to normalize the weights
497:  \[ w(u,v) = \frac{N(u,v)}{N(t,s)} \quad \Rightarrow \quad
498:     0 \leq w(u,v) \leq 1 \]
499: If $C$ is a minimal arc-cut-set
500:  \[ \sum_{(u,v) \in C} w(u,v) = 1 \]
501: 
502: Let $\DK{n} = \{ (u,v): u,v \in 1..n \land u < v \}$ be the
503: complete acyclic directed graph on $n$ vertices then the value
504: of $N(u,v;\DK{n})$ is maximum over all citation networks on $n$
505: units. It is easy to verify that
506:  \[ N(1,n;\DK{n}) = 2^{n-2} \]
507: and in general
508:  \[ N(i,j;\DK{n}) = 2^{j-i-1}, i < j \]
509: From this result we see that the exhaustive search algorithm proposed
510: in Hummon and Doreian \cite{HumDor89,HumDor90} can require
511: exponential time to compute the arc weights $w$.
512: 
513: % oceni, za koliko se vrednost razlikujejo - ali res ni bistvene razlike ?
514: 
515: \section{Nonacyclic citation networks}
516: 
517: The problem with cycles is that if there is a cycle in a network
518: then there is also an infinite number of trails between some
519: units. There are some standard approaches to overcome the problem:
520: \begin{itemize}
521:  \item to introduce some 'aging' factor which makes the total weight of all trails
522:    converge to some finite value;
523:  \item to restrict the definition of a weight to some finite subset of
524:    trails -- for example paths or geodesics.
525: \end{itemize}
526: But, new problems arise: What is the right value of the 'aging' factor?
527: Is there an efficient algorithm to count the restricted trails?
528: 
529: \begin{figure}
530:  \begin{center}
531:   \includegraphics[width=140mm,viewport=0 0 376 186,clip]{./pics/preprint.eps}
532:   \caption{Preprint transformation\label{preprint}}
533:  \end{center}
534: \end{figure}
535: 
536: 
537: The other possibility, since a citation network is usually almost acyclic,
538: is to transform it into an acyclic network
539: \begin{itemize}
540:  \item by identification (shrinking) of cyclic groups (nontrivial strong
541:        components), or
542:  \item by deleting some arcs, or
543:  \item by transformations such as the 'preprint' transformation
544:        (see Figure~\ref{preprint}) which is based on the following idea:
545:        Each paper from a strong component is duplicated with its 'preprint'
546:        version. The papers inside strong component cite preprints.
547: \end{itemize}
548: 
549: Large strong components in citation network are unlikely --
550: their presence usually indicates an error in the data.
551: An exception from this rule is the
552: citation network of High Energy Particle Physics literature \cite{HEP}
553: from \textbf{arXiv}. In it different versions of the same paper
554: are treated as a unit. This leads to large strongly connected
555: components. The idea of preprint transformation can be used also in
556: this case to eliminate cycles.
557: 
558: 
559: \section{First Example: SOM citation network}
560: 
561: The purpose of this example is not the analysis of the selected
562: citation network on SOM (self-organizing maps) literature \cite{Gar02,SOM,SOMLVQ},
563: but to present typical steps and results in citation network analysis.
564: We made our analysis using program  \texttt{\textbf{Pajek}}.
565: 
566: 
567: First we test the network for acyclicity.
568: Since in the SOM network there are 11 nontrivial strong components
569: of size 2, see Table~\ref{citnets},
570: we have to transform the network into acyclic one. We decided to do
571: this by shrinking each component into a single vertex. This operation
572: produces some loops that should be removed.
573: 
574: Now, we can compute the citation weights. We selected the
575: SPC (search path count) method. It returns the following results:
576: the network with citation weights on arcs,  the main path
577: network and the vector with vertex weights.
578: 
579: 
580: \begin{figure}[!]
581:  \begin{center}
582:   \includegraphics[height=140mm,viewport=150 20 710 775,clip=]{./pics/mainP.eps}\quad
583:   \includegraphics[height=140mm,viewport=320 20 580 775,clip=]{./pics/CPM.eps}
584:   \caption{Main path and CPM path in SOM network with SPC weights\label{main}}
585:  \end{center}
586: \end{figure}
587: 
588: 
589: In a citation network, a \emph{main path}  (sub)network is
590: constructed starting from the source vertex
591: and selecting at each step in the end vertex/vertices the arc(s)
592: with the highest weight, until a sink vertex is reached.
593: 
594: Another possibility is to apply  on the network $\Net = (\Units,R,w)$
595: the critical path method (CPM) from operations research.
596: 
597: First we draw the main path network. The arc weights are represented
598: by the thickness of arcs. To produce a nice picture
599: of it we apply the Pajek's macro \texttt{Layers} which contains a
600: sequence of operations for determining a layered layout of an
601: acyclic network (used also in analysis of genealogies represented
602: by p-graphs). Some experiments with settings of
603: different options are needed to obtain a right picture,
604: see left part of Figure~\ref{main}. In its right part the
605: CPM path is presented.
606: 
607: We see that the upper parts of both paths are identical, but
608: they differ in the continuation. The arcs in the CPM path are
609: thicker.
610: 
611: We could display also the complete SOM network using
612: essentially the same procedure as for the displaying of
613: main path. But the obtained picture would be too complicated
614: (too many vertices and arcs). We have to identify some
615: simpler and important subnetworks inside it.
616: 
617: Inspecting the distribution of values of weights on arcs (lines)
618: we select a threshold 0.007 and determine the corresponding
619: \emph{arc-cut} -- delete all arcs with weights
620: lower than selected threshold and afterwards  delete also all
621: isolated vertices (degree $= 0$).
622: 
623: Now, we are ready to draw the reduced network. We first produce
624: an automatic layout.
625: We notice some small unimportant components. We preserve only
626: the large main component, draw it and improve the obtained layout
627: manually. To preserve the level structure we use the option
628: that allows only the horizontal movement of vertices.
629: 
630: \begin{figure}[!]
631:  \begin{center}
632:   \includegraphics[width=160mm,viewport=70 15 795 765,clip=]{./pics/som07LH.eps}
633:   \caption{Main subnetwork at level 0.007\label{maina}}
634:  \end{center}
635: \end{figure}
636: 
637: 
638: Finally we label the 'most important vertices'
639: with their labels. A vertex is considered important if it is an
640: endpoint of an arc with the weight above the selected
641: threshold (in our case 0.05).
642: 
643: The obtained picture of SOM 'main subnetwork'
644: is presented in Figure~\ref{maina}.
645: We see that the SOM field evolved in
646: two main branches. From CARPENTER-1987 the strongest (main path)
647: arc is leading to the right branch that after some steps disappears.
648: The left, more vital branch is detected by the CPM path.
649: Further investigation of this is left
650: to the readers with additional knowledge about the SOM field.
651: 
652: 
653: \begin{table}[!]
654: \caption{15 Hubs and Authorities \label{huau}}
655: \begin{center}\small
656: \begin{tabular}{r|l|l|l|l|}
657:     Rank  & $h$       &   Hub Id                        &       $a$       &   Authority Id                          \\ \hline
658:        1  & 0.06442   &   CLARK-JW-1991-V36-P1259       &       0.85214   &   HOPFIELD-JJ-1982-V79-P2554  \\
659:        2  & 0.06366   &   \#GARDNER-E-1988-V21-P257     &       0.33427   &   KOHONEN-T-1982-V43-P59        \\
660:        3  & 0.05794   &   HUANG-SH-1994-V17-P212        &       0.14531   &   KOHONEN-T-1990-V78-P1464    \\
661:        4  & 0.05721   &   GULATI-S-1991-V33-P173        &       0.12398   &   CARPENTER-GA-1987-V37-P54   \\
662:        5  & 0.05513   &   SHUBNIKOV-EI-1997-V64-P989    &       0.10376   &   \#GARDNER-E-1988-V21-P257    \\
663:        6  & 0.05496   &   MARSHALL-JA-1995-V8-P335      &       0.09353   &   HOPFIELD-JJ-1986-V233-P625  \\
664:        7  & 0.05488   &   VEMURI-V-1993-V36-P203        &       0.07882   &   MCELIECE-RJ-1987-V33-P461   \\
665:        8  & 0.05409   &   CHENG-B-1994-V9-P2            &       0.07656   &   KOHONEN-T-1988-V1-P3          \\
666:        9  & 0.05360   &   BUSCEMA-M-1998-V33-P17        &       0.07372   &   RUMELHART-DE-1985-V9-P75    \\
667:       10  & 0.05258   &   XU-L-1993-V6-P627             &       0.07271   &   KOSKO-B-1988-V18-P49          \\
668:       11  & 0.05249   &   WELLS-DM-1998-V41-P173        &       0.07246   &   ANDERSON-JA-1977-V84-P413   \\
669:       12  & 0.05233   &   SCHYNS-PG-1991-V15-P461       &       0.07033   &   AMARI-SI-1977-V26-P175        \\
670:       13  & 0.05173   &   SMITH-KA-1999-V11-P15         &       0.06709   &   KOSKO-B-1987-V26-P4947        \\
671:       14  & 0.05149   &   BONABEAU-E-1998-V9-P1107      &       0.05802   &   PERSONNAZ-L-1985-V46-PL359  \\
672:       15  & 0.05126   &   KOHONEN-T-1990-V78-P1464      &       0.05702   &   GROSSBERG-S-1987-V11-P23    \\ \hline
673: \end{tabular}
674: \end{center}
675: \end{table}
676: 
677: 
678: As a complementary information we can determine
679: Kleinberg's hubs and authorities vertex weights \cite{ha}.
680: Papers that are cited by many other papers are called authorities;
681: papers that cite many other documents are
682: called hubs.
683: Good authorities are those that are cited by good hubs
684: and good hubs cite good authorities. The 15 highest
685: ranked hubs and authorities are presented in Table~\ref{huau}.
686: We see that the main authorities are located in eighties
687: and the main hubs in nineties.
688: Note that, since we are using the relation
689: $u R v \equiv u \mbox{ is cited by } v$, we have to
690: interchange the roles of hubs and authorities produced by
691: \texttt{\textbf{Pajek}}.
692: 
693: An elaboration of the hubs and authorities approach to the analysis
694: of citation networks complemented with visualization can be found in
695: Brandes and Willhalm (2002) \cite{BW}.
696: 
697: \section{Second Example: US patents}
698: 
699: The network of US patents from 1963 to 1999 \cite{patents} is an
700: example of very large network (3774768 vertices and 16522438 arcs)
701: that, using some special options in \texttt{\textbf{Pajek}},
702: can still be analyzed on PC with at least 1 G memory.
703: The SPC weights are determined in a range of 1 minute.
704: This shows that the proposed approach can be used also for
705: very large networks.
706: 
707: The obtained main path and CPM path are presented in Figure~\ref{mainpat}.
708: Collecting from the
709: \textbf{\textit{United States Patent and Trademark Office}} \cite{uspto}
710: the basic data about the patents from both paths, see
711: Table~\ref{patinfo}-\ref{patinfoD}, we see that they deal with
712: 'liquid crystal displays'.
713: 
714: \begin{figure}[!]
715:  \begin{center}
716:   \includegraphics[height=175mm,viewport=65 20 375 635,clip=]{./pics/main.eps}\quad
717:   \includegraphics[height=175mm,viewport=0 20 280 785,clip=]{./pics/CPMpath.eps}
718:   \caption{Main path and CPM path subnetwork of Patents\label{mainpat}}
719:  \end{center}
720: \end{figure}
721: 
722: \begin{table}
723: \caption{Patents on the liquid-crystal display\label{patinfo}}
724: \begin{center}
725: %\scriptsize
726: \renewcommand{\arraystretch}{0.83}
727: \begin{tabular}{|r|r|l|}
728: \hline
729: patent  & date         & author(s) and title \\
730: \hline
731: 2544659 & Mar 13, 1951 & Dreyer.
732:         Dichroic light-polarizing sheet and the like and the\\
733:     & & formation and use thereof\\
734: 
735: 2682562 &  Jun 29, 1954  &   Wender, et al.
736:         Reduction of aromatic carbinols\\
737: 
738: 3322485 &  May 30, 1967  &   Williams.
739:         Electro-optical elements utilazing an organic\\
740:     & & nematic compound\\
741: 
742: 3512876 &  May 19, 1970  & Marks.
743:         Dipolar electro-optic structures\\
744: 
745: 3636168 &  Jan 18, 1972  &   Josephson.
746:         Preparation of polynuclear aromatic compounds\\
747: 
748: 3666948 &  May 30, 1972  &   Mechlowitz, et al.
749:         Liquid crystal termal imaging system\\
750:     & & having an undisturbed image on a disturbed background\\
751: 
752: 3675987 & Jul 11, 1972 &  Rafuse.
753:         Liquid crystal compositions and devices \\
754: 
755: 3691755 &  Sep 19, 1972  &   Girard.
756:         Clock with digital display\\
757: 
758: 3697150 &  Oct 10, 1972  &   Wysochi.
759:         Electro-optic systems in which an electrophoretic-\\
760:     & & like or dipolar material is dispersed throughout a liquid\\
761:     & & crystal to reduce the turn-off time\\
762: 
763: 3731986 & May  8, 1973 &  Fergason.
764:         Display devices utilizing liquid crystal light\\
765:     & & modulation \\
766: 
767: 3740717 &  Jun 19, 1973 &  Huener, et al.
768:            Liquid crystal display \\
769: 
770: 3767289 &  Oct 23, 1973  &   Aviram, et al.
771:         Class of stable trans-stilbene compounds,\\
772:     & & some displaying nematic mesophases at or near room\\
773:     & & temperature and others in a range up to 100$^\circ$C\\
774: 
775: 3773747 &  Nov 20, 1973  &   Steinstrasser.
776:         Substituted azoxy benzene compounds\\
777: 
778: 3795436 & Mar  5, 1974 &  Boller, et al.
779:         Nematogenic material which exhibit the Kerr\\
780:     & & effect at isotropic temperatures \\
781: 
782: 3796479 &  Mar 12, 1974  &   Helfrich, et al.
783:         Electro-optical light-modulation cell\\
784:     & & utilizing a nematogenic material which exhibits the Kerr\\
785:     & & effect at isotropic temperatures\\
786: 
787: 3806230 &  Apr 23, 1974  & Haas.
788:         Liquid crystal imaging system having optical storage\\
789:     & & capabilities\\
790: 
791: 3809458 &  May  7, 1974 &  Huener, et al.
792:         Liquid crystal display\\
793: 
794: 3872140 & Mar 18, 1975 &  Klanderman, et al.
795:         Liquid crystalline compositions and\\
796:     & & method \\
797: 
798: 3876286 & Apr  8, 1975 & Deutscher, et al.
799:         Use of nematic liquid crystalline substances\\
800: 
801: 3881806 & May  6, 1975 &  Suzuki.
802:         Electro-optical display device \\
803: 
804: 3891307 & Jun 24, 1975 & Tsukamoto, et al.
805:         Phase control of the voltages applied to\\
806:     & & opposite electrodes for a cholesteric to nematic phase\\
807:     & & transition display \\
808: 
809: 3947375 &  Mar 30, 1976 &  Gray, et al.
810:         Liquid crystal materials and devices \\
811: 
812: 3954653 &  May  4, 1976 &  Yamazaki.
813:         Liquid crystal composition having high dielectric\\
814:     & & anisotropy and display device incorporating same \\
815: 
816: 3960752 & Jun  1, 1976 &  Klanderman, et al.
817:         Liquid crystal compositions \\
818: 
819: 3975286 &  Aug 17, 1976 &  Oh.
820:         Low voltage actuated field effect liquid crystals\\
821:     & & compositions and method of synthesis \\
822: 
823: 4000084 &  Dec 28, 1976 &  Hsieh, et al.
824:         Liquid crystal mixtures for electro-optical\\
825:     & & display devices \\
826: 
827: 4011173 & Mar  8, 1977 &  Steinstrasser.
828:         Modified nematic mixtures with\\
829:     & & positive dielectric anisotropy \\
830: 
831: 4013582 &  Mar 22, 1977 &  Gavrilovic.
832:         Liquid crystal compounds and electro-optic\\
833:     & & devices incorporating them \\
834: 
835: 4017416 &  Apr 12, 1977 &  Inukai, et al.
836:         P-cyanophenyl 4-alkyl-4'-biphenylcarboxylate,\\
837:     & & method for preparing same and liquid crystal compositions\\
838:     & & using same \\
839: 
840: \hline
841: \end{tabular}
842: \end{center}
843: \end{table}
844: 
845: \begin{table}
846: \caption{Patents on the  liquid-crystal display\label{patinfoB}}
847: \begin{center}
848: \renewcommand{\arraystretch}{0.83}
849: \begin{tabular}{|r|r|l|}
850: \hline
851: patent  & date         & author(s) and title \\
852: \hline
853: 
854: 4029595 &  Jun 14, 1977 &  Ross, et al.
855:         Novel liquid crystal compounds and electro-optic\\
856:     & & devices incorporating them \\
857: 
858: 4032470 &  Jun 28, 1977 &  Bloom, et al.
859:         Electro-optic device \\
860: 
861: 4077260 &  Mar  7, 1978 &  Gray, et al.
862:         Optically active cyano-biphenyl compounds and\\
863:     & & liquid crystal materials containing them \\
864: 
865: 4082428 & Apr  4, 1978 &  Hsu.
866:         Liquid crystal composition and method \\
867: 
868: 4083797 &  Apr 11, 1978 &  Oh.
869:         Nematic liquid crystal compositions \\
870: 
871: 4113647 &  Sep 12, 1978 &  Coates, et al.
872:         Liquid crystalline materials \\
873: 
874: 4118335 &  Oct  3, 1978 &  Krause, et al.
875:         Liquid crystalline materials of reduced viscosity \\
876: 
877: 4130502 &  Dec 19, 1978 &  Eidenschink, et al.
878:         Liquid crystalline cyclohexane derivatives \\
879: 
880: 4149413 & Apr 17, 1979 &  Gray, et al.
881:         Optically active liquid crystal mixtures and\\
882:     & & liquid crystal devices containing them \\
883: 
884: 4154697 &   May 15, 1979 &   Eidenschink, et al.
885:         Liquid crystalline hexahydroterphenyl\\
886:     & & derivatives \\
887: 
888: 4195916 &  Apr  1, 1980 &  Coates, et al.
889:         Liquid crystal compounds \\
890: 
891: 4198130 &  Apr 15, 1980 &  Boller, et al.
892:         Liquid crystal mixtures \\
893: 
894: 4202791 &  May 13, 1980 &  Sato, et al.
895:         Nematic liquid crystalline materials \\
896: 
897: 4229315 & Oct 21, 1980 &  Krause, et al.
898:         Liquid crystalline cyclohexane derivatives \\
899: 
900: 4261652 &  Apr 14, 1981 &  Gray, et al.
901:         Liquid crystal compounds and materials and \\
902:     & & devices containing them \\
903: 
904: 4290905 &  Sep 22, 1981 &  Kanbe.
905:         Ester compound \\
906: 
907: 4293434 &  Oct  6, 1981 &  Deutscher, et al.
908:         Liquid crystal compounds \\
909: 
910: 4302352 & Nov 24, 1981 &  Eidenschink, et al.
911:         Fluorophenylcyclohexanes, the preparation\\
912:     & & thereof and their use as components of liquid crystal dielectrics \\
913: 
914: 4330426 &  May 18, 1982 &  Eidenschink, et al.
915:         Cyclohexylbiphenyls, their preparation and\\
916:     & & use in dielectrics and electrooptical display elements \\
917: 
918: 4340498 & Jul 20, 1982 &  Sugimori.
919:         Halogenated ester derivatives \\
920: 
921: 4349452 &  Sep 14, 1982 &  Osman, et al.
922:         Cyclohexylcyclohexanoates\\
923: 
924: 4357078 &  Nov  2, 1982 &  Carr, et al.
925:         Liquid crystal compounds containing an alicyclic \\
926:     & & ring and exhibiting a low dielectric anisotropy and liquid\\
927:     & & crystal materials and devices incorporating such compounds \\
928: 
929: 4361494 &  Nov 30, 1982 &  Osman, et al.
930:         Anisotropic cyclohexyl cyclohexylmethyl ethers \\
931: 
932: 4368135 &  Jan 11, 1983 &  Osman.
933:         Anisotropic compounds with negative or positive\\
934:     & & DC-anisotropy and low optical anisotropy \\
935: 
936: 4386007 & May 31, 1983 &  Krause, et al.
937:         Liquid crystalline naphthalene derivatives \\
938: 
939: 4387038 &  Jun  7, 1983 &  Fukui, et al.
940:         4-(Trans-4'-alkylcyclohexyl) benzoic acid \\
941:     & & 4'"-cyano-4"-biphenylyl esters \\
942: 
943: 4387039 &  Jun  7, 1983 &  Sugimori, et al.
944:         Trans-4-(trans-4'-alkylcyclohexyl)-cyclohexane\\
945:     & & carboxylic acid 4'"-cyanobiphenyl ester \\
946: 
947: 4400293 &  Aug 23, 1983 &  Romer, et al.
948:         Liquid crystalline cyclohexylphenyl derivatives \\
949: 
950: 4415470 &  Nov 15, 1983 &  Eidenschink, et al.
951:         Liquid crystalline fluorine-containing \\
952:     & & cyclohexylbiphenyls and dielectrics and electro-optical display\\
953:     & & elements based thereon \\
954: 
955: 4419263 &  Dec  6, 1983 &  Praefcke, et al.
956:         Liquid crystalline cyclohexylcarbonitrile\\
957:     & & derivatives \\
958: 
959: 4422951 & Dec 27, 1983 &  Sugimori, et al.
960:         Liquid crystal benzene derivatives \\
961: 
962: 4455443 &  Jun 19, 1984 &  Takatsu, et al.
963:         Nematic halogen Compound \\
964: 
965: 4456712 &  Jun 26, 1984 &  Christie, et al.
966:         Bismaleimide triazine composition \\
967: 
968: 4460770 &  Jul 17, 1984 &  Petrzilka, et al.
969:         Liquid crystal mixture \\
970: 
971: 4472293 & Sep 18, 1984 &  Sugimori, et al.
972:         High temperature liquid crystal substances of\\
973:     & & four rings and liquid crystal compositions containing the same \\
974: 
975: \hline
976: \end{tabular}
977: \end{center}
978: \end{table}
979: 
980: \begin{table}
981: \caption{Patents on the liquid-crystal display\label{patinfoC}}
982: \begin{center}
983: \renewcommand{\arraystretch}{0.83}
984: \begin{tabular}{|r|r|l|}
985: \hline
986: patent  & date         & author(s) and title \\
987: \hline
988: 
989: 4472592 &  Sep 18, 1984 &  Takatsu, et al.
990:         Nematic liquid crystalline compounds \\
991: 
992: 4480117 &  Oct 30, 1984 &  Takatsu, et al.
993:         Nematic liquid crystalline compounds \\
994: 
995: 4502974 &  Mar  5, 1985 &  Sugimori, et al.
996:         High temperature liquid-crystalline ester\\
997:     & & compounds \\
998: 
999: 4510069 &  Apr  9, 1985 &  Eidenschink, et al.
1000:         Cyclohexane derivatives \\
1001: 
1002: 4514044 &  Apr 30, 1985 &  Gunjima, et al.
1003:         1-(Trans-4-alkylcyclohexyl)-2-(trans-4'-(p-sub\-\\
1004:     & & stituted phenyl) cyclohexyl)ethane and liquid crystal mixture \\
1005: 
1006: 4526704 & Jul  2, 1985 &  Petrzilka, et al.
1007:         Multiring liquid crystal esters \\
1008: 
1009: 4550981 & Nov  5, 1985 &  Petrzilka, et al.
1010:         Liquid crystalline esters and mixtures \\
1011: 
1012: 4558151 &  Dec 10, 1985 &  Takatsu, et al.
1013:         Nematic liquid crystalline compounds \\
1014: 
1015: 4583826 &  Apr 22, 1986 &  Petrzilka, et al.
1016:         Phenylethanes \\
1017: 
1018: 4621901 &  Nov 11, 1986 &  Petrzilka, et al.
1019:         Novel liquid crystal mixtures \\
1020: 
1021: 4630896 &  Dec 23, 1986 &  Petrzilka, et al.
1022:         Benzonitriles \\
1023: 
1024: 4657695 &  Apr 14, 1987 &  Saito, et al.
1025:         Substituted pyridazines \\
1026: 
1027: 4659502 & Apr 21, 1987 &  Fearon, et al.
1028:         Ethane derivatives \\
1029: 
1030: 4695131 &  Sep 22, 1987 &  Balkwill, et al.
1031:         Disubstituted ethanes and their use in liquid\\
1032:     & & crystal materials and devices \\
1033: 
1034: 4704227 &  Nov  3, 1987 &  Krause, et al.
1035:         Liquid crystal compounds \\
1036: 
1037: 4709030 &  Nov 24, 1987 &  Petrzilka, et al.
1038:         Novel liquid crystal mixtures \\
1039: 
1040: 4710315 & Dec  1, 1987 &  Schad, et al.
1041:         Anisotropic compounds and liquid crystal\\
1042:     & & mixtures therewith \\
1043: 
1044: 4713197 &  Dec 15, 1987 &  Eidenschink, et al.
1045:         Nitrogen-containing heterocyclic compounds \\
1046: 
1047: 4719032 &  Jan 12, 1988 &  Wachtler, et al.
1048:         Cyclohexane derivatives \\
1049: 
1050: 4721367 &  Jan 26, 1988 &  Yoshinaga, et al.
1051:         Liquid crystal device \\
1052: 
1053: 4752414 &  Jun 21, 1988 &  Eidenschink, et al.
1054:         Nitrogen-containing heterocyclic compounds \\
1055: 
1056: 4770503 &  Sep 13, 1988 &  Buchecker, et al.
1057:         Liquid crystalline compounds \\
1058: 
1059: 4795579 &  Jan  3, 1989 &  Vauchier, et al.
1060:         2,2'-difluoro-4-alkoxy-4'-hydroxydiphenyls and\\
1061:     & & their derivatives, their production process and\\
1062:     & & their use in liquid crystal display devices \\
1063: 
1064: 4797228 & Jan 10, 1989 &  Goto, et al.
1065:         Cyclohexane derivative and liquid crystal\\
1066:     & & composition containing same \\
1067: 
1068: 4820839 &  Apr 11, 1989 &  Krause, et al.
1069:         Nitrogen-containing heterocyclic esters \\
1070: 
1071: 4832462 &  May 23, 1989 &  Clark, et al.
1072:         Liquid crystal devices \\
1073: 
1074: 4877547 & Oct 31, 1989 &  Weber, et al.
1075:         Liquid crystal display element \\
1076: 
1077: 4957349 &  Sep 18, 1990 &  Clerc, et al.
1078:         Active matrix screen for the color display of\\
1079:     & & television pictures, control system and process for producing\\
1080:     & & said screen \\
1081: 
1082: 5016988 &  May 21, 1991 &  Iimura.
1083:         Liquid crystal display device with a birefringent\\
1084:     & & compensator \\
1085: 
1086: 5016989 &  May 21, 1991 &  Okada.
1087:         Liquid crystal element with improved contrast and\\
1088:     & & brightness \\
1089: 
1090: 5122295 & Jun 16, 1992 &  Weber, et al.
1091:         Matrix liquid crystal display \\
1092: 
1093: 5124824 &  Jun 23, 1992 &  Kozaki, et al.
1094:         Liquid crystal display device comprising a \\
1095:     & & retardation compensation layer having a maximum principal\\
1096:     & & refractive index in the thickness direction \\
1097: 
1098: 5171469 & Dec 15, 1992 &  Hittich, et al.
1099:         Liquid-crystal matrix display \\
1100: 
1101: 5175638 &  Dec 29, 1992 &  Kanemoto, et al.
1102:         ECB type liquid crystal display device having\\
1103:     & & birefringent layer with equal refractive indexes in the thickness\\
1104:     & & and plane directions\\
1105: 
1106: \hline
1107: \end{tabular}
1108: \end{center}
1109: \end{table}
1110: 
1111: \begin{table}
1112: \caption{Patents on the liquid-crystal display\label{patinfoD}}
1113: \begin{center}
1114: \renewcommand{\arraystretch}{0.83}
1115: \begin{tabular}{|r|r|l|}
1116: \hline
1117: patent  & date         & author(s) and title \\
1118: \hline
1119: 
1120: 5243451 &  Sep 7, 1993 &  Kanemoto, et al.
1121:         DAP type liquid crystal device with cholesteric\\
1122:     & & liquid crystal birefringent layer\\
1123: 
1124: 5283677 &  Feb  1, 1994 &  Sagawa, et al.
1125:         Liquid crystal display with ground regions \\
1126:     & & between terminal groups\\
1127: 
1128: 5308538 & May  3, 1994 &  Weber, et al.
1129:         Supertwist liquid-crystal display \\
1130: 
1131: 5319478 &  June 7, 1994  &   Funfschilling, et al.
1132:         Light control systems with a circular polarizer\\
1133:     & & and a twisted nematic liquid crystal having a minimum path\\
1134:     & & difference of .lambda./2\\
1135: 
1136: 5374374 & Dec 20, 1994 &  Weber, et al.
1137:         Supertwist liquid-crystal display \\
1138: 
1139: 5408346 &  Apr 18, 1995  &   Trissel, et al.
1140:         Optical collimating device employing cholesteric\\
1141:     & & liquid crystal and a non-transmissive reflector\\
1142: 
1143: 5539578 &  Jul 23, 1996  &   Togino, et al.
1144:         Image display apparatus\\
1145: 
1146: 5543077 & Aug  6, 1996 &  Rieger, et al.
1147:         Nematic liquid-crystal composition \\
1148: 
1149: 5555116 &  Sep 10, 1996 &  Ishikawa, et al.
1150:         Liquid crystal display having adjacent\\
1151:     & & electrode terminals set equal in length \\
1152: 
1153: 5683624 & Nov  4, 1997 &  Sekiguchi, et al.
1154:         Liquid crystal composition \\
1155: 
1156: 5771124 &  Jun 23, 1998  &   Kintz, et al.
1157:         Compact display system with two stage magnification\\
1158:     & & and immersed beam splitter\\
1159: 
1160: 5855814 & Jan  5, 1999 &  Matsui, et al.
1161:         Liquid crystal compositions and liquid crystal\\
1162:     & & display elements\\
1163: 
1164: 5991084 &  Nov 23, 1999 &  Hildebrand, et al.
1165:         Compact compound magnified virtual image\\
1166:     & & display with a reflective/transmissive optic\\
1167: 
1168: 6005720 &  Dec 21, 1999 &  Watters, et al.
1169:         Reflective micro-display system \\ \hline
1170: 
1171: 
1172: \end{tabular}
1173: \end{center}
1174: \end{table}
1175: 
1176: 
1177: 
1178: But, in this network there should be thousands of 'themes'.
1179: How to identify them?
1180: Using the arc weights we can define a \emph{theme} as a connected
1181: small subnetwork of size in the interval $k$ .. $K$
1182: (for example, between $k = \frac{1}{3}h$ and $K = 3h$)
1183: with stronger internal cohesion relatively to its
1184: neighborhood.
1185: 
1186: To find such subnetworks we use again the arc-cuts.
1187: We select a treshold $t$ and delete all arcs with weight
1188: lower than $t$. In the so reduced network we determine (weakly)
1189: connected components. The components of size in range $k .. K$,
1190: we call them $(k,K)$-\emph{islands},
1191: represent the themes since:
1192: \begin{itemize}
1193:  \item they are connected and of selected size,
1194:  \item all arcs linking them to their outside neighbors have weight lower
1195:        than $t$, and
1196:  \item each vertex of an island is linked with some other
1197:        vertex in the same island with an arc with a weight
1198:        at least $t$.
1199: \end{itemize}
1200: We discard components of size smaller than $k$ as 'noninteresting'.
1201: 
1202: The components of size larger then $K$ are too large. They contain
1203: several themes. To identify them we repeat the procedure on the
1204: network of these components with a higher threshold value $t'$.
1205: Recently we developed an algorithm, named \emph{Islands} \cite{Islands},
1206: that by 'continuosly' changing the threshold identifies all maximal
1207: $(k,K)$-islands.
1208: 
1209: We determined for SPC weights all (2,90)-islands in the US Patents
1210: network. The reduced network of islands has 470137 vertices, 307472 arcs and for
1211: different $k$: $C_2 = $187610, $C_5 = $8859,$C_{30} = $101,
1212: $C_{50} = $30 islands. The detailed island size frequency distribution
1213: is given in Table~\ref{fris} and presented in a log-log scale
1214: in Figure~\ref{power} that shows that it obeys the power law.
1215: 
1216: \begin{table}
1217: \caption{Island size frequency distribution\label{fris}}
1218: \begin{center}
1219: {\renewcommand{\baselinestretch}{0.7}\small
1220: \begin{verbatim}
1221:          [1]   0 139793  29670 9288 3966 1827 997 578 362 250
1222:         [11] 190    125    104   71   47   37  36  33  21  23
1223:         [21]  17     16      8    7   13   10  10   5   5   5
1224:         [31]  12      3      7    3    3    3   2   6   6   2
1225:         [41]   1      3      4    1    5    2   1   1   1   1
1226:         [51]   2      3      3    2    0    0   0   0   0   1
1227:         [61]   0      0      0    0    1    0   0   2   0   0
1228:         [71]   0      0      1    1    0    0   0   1   0   0
1229:         [81]   2      0      0    0    0    1   2   0   0   7
1230: \end{verbatim}
1231: }
1232: \end{center}
1233: \end{table}
1234: 
1235: \begin{figure}[!]
1236:  \begin{center}
1237:   \includegraphics[width=140mm,viewport=0 10 500 470,clip=]{./pics/size.eps}
1238:   \caption{Island size frequency distribution \label{power}}
1239:  \end{center}
1240: \end{figure}
1241: 
1242: \begin{figure}[!]
1243:  \begin{center}
1244:   \includegraphics[width=160mm,viewport=45 25 620 780,clip=]{./pics/islandMa.eps}
1245:   \caption{Main island 'liquid-crystal display'  \label{M}}
1246:  \end{center}
1247: \end{figure}
1248: 
1249: 
1250: \begin{figure}[!]
1251:  \begin{center}
1252:   \includegraphics[width=160mm,viewport=65 20 660 690,clip=]{./pics/island50.eps}
1253:   \caption{Island 'producing a foam' \label{A}}
1254:  \end{center}
1255: \end{figure}
1256: 
1257: 
1258: \begin{table}
1259: \caption{Some patents from the 'foam' island\label{Ainfo}}
1260: \begin{center}
1261: \begin{tabular}{|r|r|l|}
1262: \hline
1263: patent  & date         & author(s) and title \\
1264: \hline
1265: 4060439 &  Nov 29, 1977 & Rosemund, et al.
1266:            Polyurethane foam composition and method of\\
1267:       & &  making same\\
1268: 
1269: 4292369 & Sep 29, 1981 & Ohashi, et al.
1270:         Fireproof laminates\\
1271: 
1272: 4357430 &  Nov 2, 1982 &  VanCleve.
1273:            Polymer/polyols, methods for making same and\\
1274:       & &  polyurethanes based thereon\\
1275: 
1276: 4459334 & Jul 10, 1984 & Blanpied, et al.
1277:         Composite building panel\\
1278: 
1279: 4496625 & Jan 29, 1985 &  Snider ,   et al.
1280:         Alkoxylated aromatic amine-aromatic polyester\\
1281:         & & polyol blend and polyisocyanurate foam therefrom\\
1282: 
1283: 4544679 & Oct 1, 1985  & Tideswell, et al.
1284:         Polyol blend and polyisocyanurate foam\\
1285:         & & produced therefrom\\
1286: 
1287: 4714717 & Dec 22, 1987 & Londrigan, et al.
1288:         Polyester polyols modified by low molecular\\
1289:         & & weight glycols and cellular foams therefrom\\
1290: 
1291: 4927863 & May 22, 1990 & Bartlett, et al.
1292:         Process for producing closed-cell polyurethane \\
1293:         & & foam compositions expanded with mixtures of blowing agents\\
1294: 
1295: 4996242 &  Feb 26, 1991 & Lin.
1296:            Polyurethane foams manufactured with mixed\\
1297:       & &  gas/liquid blowing agents\\
1298: 
1299: 5169873 & Dec 8, 1992  & Behme, et al.
1300:         Process for the manufacture of foams with the aid\\
1301:         & & of blowing agents containing fluoroalkanes and fluorinated\\
1302:         & & ethers, and foams obtained by this process\\
1303: 
1304: 5187206 & Feb 16, 1993 & Volkert, et al.
1305:         Production of cellular plastics by the\\
1306:     & & polyisocyanate polyaddition process, and low-boiling,\\
1307:     & & fluorinated or perfluorinated, tertiary alkylamines\\
1308:     & & as blowing agent-containing emulsions for this purpose\\
1309: 
1310: 5308881 & May  3, 1994 & Londrigan, et al.
1311:         Surfactant for polyisocyanurate foams\\
1312:     & & made with alternative blowing agents\\
1313: 
1314: 5558810 &  Sep 24, 1996 & Minor, et al.
1315:            Pentafluoropropane compositions\\
1316: 
1317: \hline
1318: \end{tabular}
1319: \end{center}
1320: \end{table}
1321: 
1322: \begin{figure}[!]
1323:  \begin{center}
1324:   \includegraphics[width=160mm,viewport=45 20 560 680,clip=]{./pics/island38.eps}
1325:   \caption{Island 'fiber optics and bags' \label{C}}
1326:  \end{center}
1327: \end{figure}
1328: 
1329: \begin{table}
1330: \caption{Some patents from 'fiber optics and bags' island\label{Cinfo}}
1331: \begin{center}
1332: \begin{tabular}{|r|r|l|}
1333: \hline
1334: patent  & date          & author(s) and title \\
1335: \hline
1336: 4461536 & Jul 24, 1984 & Shaw, et al.
1337:         Fiber coupler displacement transducer\\
1338: 
1339: 4511582 & Apr 16, 1985 & Bair.
1340:         Phenanthrene derivatives\\
1341: 
1342: 4530800 & Jul 23, 1985  & Bair.
1343:         Perylene derivatives\\
1344: 
1345: 4589728 & May 20, 1986 & Dyott, et al.
1346:         Optical fiber polarizer\\
1347: 
1348: 4676378 & Jun 30, 1987  & Baxley, et al.
1349:         Bag pack\\
1350: 
1351: 4719047 & Jan 12, 1988  & Bair.
1352:         Anthracene derivatives\\
1353: 
1354: 4784453 & Nov 15, 1988  & Shaw, et al.
1355:         Backward-flow ladder architecture and method\\
1356: 
1357: 4785938 & Nov 22, 1988 & Benoit, Jr., et al.
1358:         Thermoplastic bag pack\\
1359: 
1360: 4810052 & Mar  7, 1989 & Fling.
1361:         Fiber optic bidirectional data bus tap\\
1362: 
1363: 4811417 & Mar  7, 1989 & Prince, et al.
1364:         Handled bag with supporting slits in handle\\
1365: 
1366: 4829090 & May 9, 1989   & Bair.
1367:         Chrysene derivatives\\
1368: 
1369: 4981216 & Jan  1, 1991  & Wilfong, Jr.
1370:         Easy opening bag pack and supporting rack\\
1371:         & & system and fabricating method\\
1372: 
1373: 4997249 & Mar 5, 1991   & Berry, et al.
1374:         Variable weight fiber optic transversal filter\\
1375: 
1376: 5188235 & Feb 23, 1993  & Pierce, et al.
1377:         Bag pack\\
1378: 
1379: 5307935 & May  3, 1994 & Kemanjian.
1380:         Packs of self opening plastic bags and method of\\
1381:         & & fabricating the same\\
1382: 
1383: 5363965 & Nov 15, 1994  & Nguyen.
1384:         Self-opening thermoplastic bag system\\
1385: 
1386: \hline
1387: \end{tabular}
1388: \end{center}
1389: \end{table}
1390: 
1391: The main island has 90 vertices and contains middle parts of the main
1392: path and the CPM path. They also have a short common part.
1393: Again, the greedy strategy of the main path leads to a less vital branch.
1394: Considering the basic data about the patents from
1395: Table~\ref{patinfo}-\ref{patinfoC}, we see that also the main island
1396: deals with 'liquid crystal displays'.
1397: 
1398: For additional illustration of results obtained by Islands
1399: algorithm we selected two smaller islands at lower levels --
1400: see Figure~\ref{A} (50 vertices) and Figure~\ref{C} (38 vertices).
1401: Retreiving the basic data about some patents in these islands from
1402: \textbf{\textit{United States Patent and Trademark Office}},
1403: see Table~\ref{Ainfo} and Table~\ref{Cinfo}, we can label the
1404: corresponding theme of the first island as 'producing a foam'.
1405: The theme of the second island deals initially with 'fiber optics',
1406: but in the upper part it switches to 'bag pack system'.
1407: 
1408: 
1409: \section{Conclusions}
1410: 
1411: In the paper we proposed an approach to the analysis of citation
1412: networks that can be used also for very large networks with millions
1413: of vertices and arcs.
1414: 
1415: On test cases, the methods SPC, SPLC, NPPC produced almost the
1416: same results. Since the method SPC  has additional
1417: 'nice' properties it could be considered as a 'first choice' --
1418: but, to make a grounded recommendation,
1419: additional experiences should be gained from the analyses of real-life
1420: large citation networks.
1421: 
1422: The granularity of the results strongly depends on the range
1423: for 'interesting themes' $k$ .. $K$ -- varying these two parameters
1424: we get larger or smaller sets of themes.
1425: 
1426: Instead of arc-cuts we could consider also vertex-cuts
1427: with respect to $p$-cores on SPC weights \cite{pCores}
1428: with a $p$-function
1429: \[ p(v,W) = \max( \sum_{u \in W : u R v} w(u,v),
1430:                   \sum_{u \in W : v R u} w(v,u) ) \]
1431: 
1432: The subnetworks approach only filters out the structurally important
1433: subnetworks thus providing a researcher with a smaller manageable
1434: structures which can be further analyzed using more sophisticated
1435: and/or substantial methods.
1436: 
1437: \section{Acknowledgments}
1438: 
1439: The search path count algorithm was developed during my visit in
1440: Pittsburgh in 1991 and presented at the Network seminar
1441: \cite{Bat91}. It was presented to the broader audience
1442: at EASST'94 in Budapest \cite{Bat94}. In 1997 it was
1443: included in program \texttt{\textbf{Pajek}} \cite{pajek}.
1444: The 'preprint' transformation was developed as a part of the
1445: contribution for the Graph drawing contest 2001 \cite{GD01}.
1446: The algorithm for the path length counts was developed in August 2002
1447: and the Islands algorithm in August 2003.
1448: 
1449: The author would like to thank Patrick Doreian and Norm Hummon
1450: for introducing him into the field of citation network analysis,
1451: Eugene Garfield for making available the data on real-life
1452: networks and providing some relevant references,
1453: and Andrej Mrvar and Matja\v{z} Zaver\v{s}nik
1454: for implementing the algorithms in  \texttt{\textbf{Pajek}}.
1455: 
1456: This work was supported by the Ministry of Education, Science and Sport of
1457: Slovenia, Project 0512-0101.
1458: 
1459: \newpage
1460: 
1461: \begin{thebibliography}{99}
1462: 
1463: \bibitem{Asimov}  Asimov I.: The Genetic Code,
1464:   New American Library, New York, 1963.
1465: 
1466: \bibitem{Bat91} Batagelj V.: Some Mathematics of Network Analysis.
1467:  Network Seminar, Department of Sociology,
1468:  University of Pittsburgh, January 21, 1991.
1469: 
1470: \bibitem{Bat94} Batagelj V.: An Efficient Algorithm for Citation Networks
1471:  Analysis. Paper presented at EASST'94, Budapest, Hungary,
1472:  August 28-31, 1994.
1473: 
1474: \bibitem{pajek} Batagelj V., Mrvar A.: \texttt{\textbf{Pajek}} -- program for
1475:  analysis and visualization of large networks. \\
1476:  \url{http://vlado.fmf.uni-lj.si/pub/networks/pajek/}\\
1477:  \url{http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/extreme.htm}
1478: 
1479: \bibitem{GD01} Batagelj V., Mrvar A.:
1480:  Graph Drawing Contest 2001 Layouts \\
1481:  \url{http://vlado.fmf.uni-lj.si/pub/GD/GD01.htm}
1482: 
1483: \bibitem{pCores} Batagelj V., Zaver\v{s}nik M.:
1484:  Generalized Cores.  Submitted, 2002.\\
1485:  \url{http://arxiv.org/abs/cs.DS/0202039}
1486: 
1487: \bibitem{Islands} Batagelj V., Zaver\v{s}nik M.:
1488:  Islands -- identifying themes in large networks. In preparation, August 2003.
1489: % \url{http://arxiv.org/abs/cs.DS/0202039}
1490: 
1491: \bibitem{BW} Brandes U., Willhalm T.:
1492:  Visualization of
1493:  bibliographic networks with a reshaped landscape metaphor.
1494:  Joint Eurographics -- IEEE TCVG Symposium on Visualization,
1495:  D. Ebert, P. Brunet, I. Navazo (Editors), 2002.\\
1496:  \url{http://algo.fmi.uni-passau.de/\symbol{126}brandes/}\\
1497:  \url{\strut\qquad publications/bw-vbnrl-02.pdf}
1498: 
1499: \bibitem{algo}  Cormen T.H.,  Leiserson C.E., Rivest R.L.,  Stein C.:
1500:  Introduction to Algorithms, Second Edition. MIT Press, 2001.
1501: 
1502: \bibitem{Gar64} Garfield E, Sher IH, and Torpie RJ.:
1503:  The Use of Citation Data in Writing the History of Science.
1504:  Philadelphia: The Institute for Scientific Information, December 1964.\\
1505: \url{http://www.garfield.library.upenn.edu/papers/}\\
1506: \url{\strut\qquad useofcitdatawritinghistofsci.pdf}
1507: 
1508: \bibitem{Gar01} Garfield E.:
1509:  From Computational Linguistics to Algorithmic Historiography,
1510:  paper presented at the Symposium in Honor of Casimir Borkowski
1511:  at the University of Pittsburgh School of Information Sciences,
1512:  September 19, 2001.\\
1513:  \url{http://garfield.library.upenn.edu/papers/pittsburgh92001.pdf}
1514: 
1515: \bibitem{Gar02} Garfield E., Pudovkin A.I.,  Istomin, V.S.:
1516:  \textit{\textbf{Histcomp}} -- (\textit{comp}iled \textit{Hist}oriography program)\\
1517:  \url{http://garfield.library.upenn.edu/histcomp/guide.html}\\
1518:  \url{http://www.garfield.library.upenn.edu/histcomp/index.html}
1519: 
1520: \bibitem{3D} Garner R.:
1521:  A computer oriented, graph theoretic analysis of citation index structures.
1522:  Flood B. (Editor), Three Drexel information science studies, Philadelphia:
1523:  Drexel University Press 1967.\\
1524:  \url{http://www.garfield.library.upenn.edu/rgarner.pdf}
1525: 
1526: \bibitem{HumDor89} Hummon N.P., Doreian P.:
1527:  Connectivity in a Citation Network: The Development of DNA Theory.
1528:  Social Networks, {\bf 11}(1989) 39-63.
1529: 
1530: \bibitem{HumDor90} Hummon N.P., Doreian P.:
1531:  Computational Methods for Social Network Analysis.
1532:  Social Networks, {\bf 12}(1990) 273-288.
1533: 
1534: \bibitem{HuDoFr90} Hummon N.P., Doreian P., Freeman L.C.:
1535:  Analyzing the Structure of the Centrality-Productivity Literature
1536:  Created Between 1948 and 1979.
1537:  Knowledge: Creation, Diffusion, Utilization, {\bf 11}(1990)4, 459-480.
1538: 
1539: \bibitem{ha} Kleinberg J.:
1540:  Authoritative sources in a hyperlinked environment.
1541:  In Proc 9th ACMSIAM Symposium on Discrete Algorithms, 1998, p. 668-677.\\
1542:  \url{http://www.cs.cornell.edu/home/kleinber/auth.ps}\\
1543:  \url{http://citeseer.nj.nec.com/kleinberg97authoritative.html}
1544: 
1545: \bibitem{GT} Wilson, R.J., Watkins, J.J.:
1546:  \emph{Graphs: An Introductory Approach}.
1547:  New York: John Wiley and Sons, 1990.
1548: 
1549: \bibitem{data} Pajek's datasets -- citation networks:\\
1550:  \url{http://vlado.fmf.uni-lj.si/pub/networks/data/cite/}
1551: 
1552: \bibitem{HEP} KDD Cup 2003:\\
1553:  \url{http://www.cs.cornell.edu/projects/kddcup/index.html}\\
1554:  \url{http://arxiv.org/}
1555: 
1556: \bibitem{patents} Hall, B.H., Jaffe, A.B. and Tratjenberg M.:
1557:  The NBER U.S. Patent Citations Data File. NBER Working Paper 8498 (2001).\\
1558:  \url{http://www.nber.org/patents/}
1559: 
1560: \bibitem{uspto} The United States Patent and Trademark Office. \\
1561:  \url{http://patft.uspto.gov/netahtml/srchnum.htm}
1562: 
1563: \bibitem{SOMLVQ}
1564:  Bibliography on the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ)\\
1565:  \url{http://liinwww.ira.uka.de/bibliography/Neural/SOM.LVQ.html}
1566: 
1567: \bibitem{SOM}
1568:  Neural Networks Research Centre: Bibliography of SOM papers.\\
1569:  \url{http://www.cis.hut.fi/research/refs/}
1570: 
1571: \end{thebibliography}
1572: \end{document}
1573: 
1574: