1: \documentclass{elsart}
2: %\usepackage{natbib}
3: \newcommand{\be}{\begin{equation}}
4: \newcommand{\ee}{\end{equation}}
5: \newcommand{\lb}[1]{\label{#1}}
6: \newcommand{\sty}{\scriptstyle}
7: \newcommand{\ssty}{\scriptscriptstyle}
8: \newcommand{\apg}{\:^{>}_{\sim}\:}
9: \newcommand{\apl}{\:^{<}_{\sim}\:}
10: \newcommand{\eg}{{\it e.g.\ }}
11: \begin{document}
12: \runauthor{Moura Jr.\ and Ribeiro}
13: \begin{frontmatter}
14: \title{Zipf Law for Brazilian Cities}
15: \author[NJMJr]{Newton J.\ Moura Jr}
16: \author[MBR]{and Marcelo B.\ Ribeiro}%\thanksref{corresponding}}
17:
18: \address[NJMJr]{IBGE -- Brazilian Institute for Geography and Statistics,
19: Geosciences Directorate, Geodesics Department,
20: Av.\ Brasil 15671, Rio de Janeiro, RJ 21241-051, Brazil;
21: e-mail:~newtonjunior@ibge.gov.br}
22: \address[MBR]{Physics Institute, University of Brazil -- UFRJ, CxP 68532,
23: Rio de Janeiro, RJ 21945-970, Brazil; e-mail: mbr@if.ufrj.br}
24: %\thanks[corresponding]{Corresponding author.}
25: \begin{abstract}
26: This work studies the Zipf Law for cities in Brazil. Data from
27: censuses of 1970, 1980, 1991 and 2000 were used to select a
28: sample containing only cities with 30,000 inhabitants
29: or more. The results show that the population distribution in
30: Brazilian cities does follow a power law similar to the ones found in
31: other countries. Estimates of the power law exponent were found
32: to be $2.22 \pm 0.34$ for the 1970 and 1980 censuses, and $2.26 \pm
33: 0.11$ for censuses of 1991 and 2000. More accurate results were
34: obtained with the maximum likelihood estimator, showing an exponent
35: equal to $2.41$ for 1970 and $2.36$ for the other three years.
36:
37: \vspace{5.0mm}
\hspace{-3.5mm}{\it PACS:} \ 89.75.Da; 89.65.Cd; 89.75.-k; 05.45.Df
39: \end{abstract}
40: \begin{keyword}
41: Complex Systems; Power Laws; Population of Cities; Fractals
42: \end{keyword}
43: \end{frontmatter}
44:
45: \section{Introduction}
46:
47: It was first observed by Auerbach \cite{au}, although it is often
48: attributed to Zipf \cite{z}, that the way in which urban aggregates are
49: distributed, that is, the way the populations of cities are distributed,
50: follows a power law behaviour with exponent $\alpha \approx 2$. If we
51: assign probabilities to this distribution the resulting behaviour
is also a power law, known as the {\it Zipf law}. This law seems to
have a universal character, holding at the world level \cite{zan} as
well as for single nations. The exponent also seems to be independent
of the area of the nation and of the social and economic conditions
of its population \cite{mas}.
57:
58: Power law exponents of cities have been measured in many countries.
59: It was reported by \cite{zan} that 2,400 cities in the U.S.A.\ have
60: $\alpha = 2.1 \pm 0.1$, whereas \cite{n05} reported $\alpha=2.30 \pm
0.05$ for the U.S.A.\ census of year 2000. According to \cite{zan}
1,300 municipalities in Switzerland have $\alpha=2.0 \pm 0.1$. Taking
together 2,700 cities of the world with populations larger than
100,000 inhabitants produces $\alpha=2.03 \pm 0.05$ \cite{zan}. One
should notice that those exponents were calculated by least squares
fitting, a method known to introduce biased results if the data are not
properly handled \cite{g04}. Despite this, most results obtained so
68: far indicate that the exponent seems to follow the universal value
69: of $\alpha \approx 2$.
70:
Such power law behaviour seems to be a manifestation of the
dynamics of complex systems, whose striking feature is that they show
universal laws characterized by exponents in scale invariant
distributions that happen to be basically independent of the
details of the microscopic dynamics. Social behaviour is an example
of the interaction of the elements of a complex system, in this case
human beings, giving rise to a cooperative evolution which
strongly differs from the individual dynamics. So,
the demographic distribution of human beings on the Earth's
surface, which has sharp peaks of concentrated population (the
cities) alternating with relatively large extensions where the
population density is much lower, follows a power law typical of
complex system dynamics.
84:
85: The aim of this paper is to present empirical evidence that the
86: population distribution of Brazilian cities also follows a
87: power law with exponent close to the universal value. We
88: have selected a sample from Brazil's decennial censuses of 1970,
89: 1980, 1991 and 2000 and obtained probability distribution
90: functions of Brazilian cities with a lower cutoff of 30,000
inhabitants. Our procedure took care to reduce the large
statistical fluctuations at the tail, which are known to introduce
large biases in the determination of the exponent \cite{n05,g04}.
94: Our results show that Brazilian cities do follow the universal
95: pattern: conservative estimates produced $\alpha = 2.22 \pm 0.34$
96: in 1970 and 1980. For 1991 and 2000 we obtained $\alpha=2.26 \pm
97: 0.11$.
98:
99: The paper is organized as follows. In \S 2 we present the data and our
100: selection methodology, whereas in \S 3 we present the methods to
101: analyze the data. \S 4 shows the results obtained using three different
102: techniques to calculate the exponent $\alpha$. The paper ends with a
103: concluding section.
104:
105: \section{The Data}\lb{data}
106:
107: Brazil is estimated to reach a population of approximately 185
108: million inhabitants by the end of 2005, the 5th place in the
109: ranking of the world's most populous countries. This population
110: occupies over 5 thousand cities, and although most of them have
111: very few inhabitants, 15 cities have more than one million people,
112: with two of them, S\~ao Paulo and Rio de Janeiro, having more than
113: 5 million inhabitants. In order to obtain a sample for the
114: purposes of this work we need to define first of all what
115: we mean by a {\it city}. After surveying the administrative way
116: Brazil is governed we concluded that in Brazil's case we should
117: {\it equate} city to {\it municipality}, defined as being the
118: territorially smallest administrative subdivision of a country
119: that has its own democratically elected representative leadership.
120: This means that Brazil's entire territory is subdivided in
121: municipalities, or cities. Some of them have very big areas,
122: actually bigger than many European countries, but those are usually
123: located in regions very sparsely populated.
124:
Censuses of Brazil's entire population have been carried out for over
a hundred years, at roughly ten-year intervals since 1890. However, data in
digital form are only available at IBGE, the government institution
responsible for the censuses, from 1970 onwards. Data in between censuses are
obtained by very small sampling and extrapolation. Considering this
we decided to take data only from the official, entire population,
censuses available in digital format, namely those of 1970,
1980, 1991 and 2000. These data show that the number of Brazilian
municipalities increased by over 30\% from 1970 to 2000 (see table
\ref{tab1}). This is
clearly a consequence of the fact that the definition of a city is
administrative, reflecting Brazil's internal politics, and has been
varying over the last decades.
137:
The sharp increase in the number of municipalities within the time span
of our data will not affect our study because, as mentioned above, most
Brazilian cities have small populations and fall below the cutoff
adopted here. Moreover, although the Brazilian concept of a city is that
of a territorial subdivision, which includes both rural and urban
inhabitants, an examination of the data shows that
cities with more than 30 thousand inhabitants have their population
almost entirely concentrated in the urban area.\footnote{Nowadays
IBGE defines what is a rural, as opposed to an urban, area by satellite
imagery. See also footnote at page \protect\pageref{rural}.} We have,
therefore, decided to include only cities with
30 thousand people or more in our sample, which meant a significant
reduction in the number of municipalities as compared to the
original raw data (see table \ref{tab1}). The exclusion of the smaller
cities represents in fact the exclusion of the rural population from
our sample. In 1970 40\% of Brazilians were living in cities with fewer
than 30 thousand people, whereas in 2000 this figure was reduced to
26\%. In other words, roughly speaking the percentage of Brazilians
living in urban areas has increased from 60\% in 1970 to 74\% in 2000.
156: \begin{table}[t]
157: \caption{\it Number of cities in Brazil.}\lb{tab1}
158: \vspace{2mm}
159: \begin{tabular}{|c|c|c|c|c|}
160: \hline
161: year of census & 1970 & 1980 & 1991 & 2000 \\
162: \hline \hline
163: all cities & 3958 & 3806 & 4277 & 5238 \\
164: \hline
165: cities with $\ge$ 30,000 & 614 & 787 & 905 & 955 \\
166: \hline
167: \end{tabular}
168: \end{table}
169:
170: \section{Data Analysis}
171:
172: Once our sample is selected, we need to define our method of analysis.
173: Here we shall follow closely the methodology for fitting power law
174: distributions and estimating goodness-of-fit parameters as proposed by
\cite{n05}. We will start with a very brief introductory description
of power law statistics in order to fix the notation.
177:
178: Let ${p}(x)\: dx$ be the fraction of cities with population between $x$ and
179: $x+dx$. So ${p}(x)$ defines a certain distribution of the data $x$. It is
180: useful to express this distribution in terms of the {\it cumulative
181: distribution function} $\mathcal{P}(x)=\int_x^\infty
182: {p}(x^\prime)dx^\prime$, which is simply the probability that a city has
183: a population equal to or greater than $x$.
184: If the fraction ${p}(x)$ follows a power law of the type,
185: \be {p}(x) = C x^{-\alpha}, \lb{1} \ee
186: where $\alpha$ and $C$ are constants, then
187: $\mathcal{P}(x)$ also follows a power law, given by
188: \be \mathcal{P}(x) =
189: \frac{C}{(\alpha - 1)} \; x^{-(\alpha -1)}.
190: \lb{3}
191: \ee
192: Such power law distributions are
193: also known as {\it Zipf law} or {\it Pareto distribution}. From
194: equation (\ref{1}) it is obvious that ${p}(x)$ diverges for any positive
195: value of the exponent $\alpha$ as $x \rightarrow 0$, and this means
196: that the distribution must deviate from a power law below some minimum
197: value $x_{\mathrm{min}}$. In other words, we can only assume that the
198: distribution follows a Zipf law for $x$ above $x_{\mathrm{min}}$, and
199: in this case equation (\ref{1}) can be normalized as $\int_{x_{\mathrm{min}
200: }}^\infty {p}({x^\prime}) \; d{x^\prime} =1$ to obtain the constant $C$
201: only if $x$ and the exponent $\alpha$ obey the following conditions:
202: $ \alpha > 1 $, $ x \ge x_{\mathrm{min}}$. Power laws with exponents less
203: than unity cannot be normalized and do not usually occur in nature \cite{n05}.
204: The normalized constant $C$, given in terms of $\alpha$ and $x_{\mathrm{min}}$,
205: allows us to write the power laws (\ref{1}) and (\ref{3}) as follows,
206: \be \ln p(x) = -\alpha \ln x + B, \lb{lnp} \ee
207: \be \ln \mathcal{P}(x) = \left( 1 - \alpha \right) \ln x + \beta,
208: \lb{lnP}
209: \ee
210: where
211: \be B= \ln \left[ \left( \alpha -1 \right) {x_{\mathrm{min}}}^{(\alpha
212: -1)} \right],
213: \lb{B}
214: \ee
215: \be \beta = \left( \alpha -1 \right) \ln x_{\mathrm{min}}.
216: \lb{beta}
217: \ee
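As a brief sketch of the step leading to equations (\ref{lnp}) to
(\ref{beta}), following the standard normalization argument discussed in
\cite{n05}, imposing the normalization condition on equation (\ref{1})
with $\alpha > 1$ gives
\[ 1 = \int_{x_{\mathrm{min}}}^\infty C \, {x^\prime}^{-\alpha} \; d{x^\prime}
= \frac{C}{(\alpha -1)} \; {x_{\mathrm{min}}}^{-(\alpha -1)}
\quad \Longrightarrow \quad
C = \left( \alpha -1 \right) {x_{\mathrm{min}}}^{(\alpha -1)}. \]
Substituting this constant in equation (\ref{3}) yields
\[ \mathcal{P}(x) = \left( \frac{x}{x_{\mathrm{min}}}
\right)^{-(\alpha -1)}, \]
so that $\mathcal{P}(x_{\mathrm{min}})=1$. Taking logarithms of $p(x)$
and $\mathcal{P}(x)$ then gives $B = \ln C$ and
$\beta = (\alpha -1) \ln x_{\mathrm{min}}$, in agreement with equations
(\ref{B}) and (\ref{beta}).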
218:
If we now define $p(x_i)$ as being {\it the number
of cities with population equal to or bigger than $x_i$} (a cumulative
count, not to be confused with the density ${p}(x)$ of equation
(\ref{1})), we are
able to create for each sample a set of $n$ observed values
$\{x_i\}, (i=1,\ldots,n), (x_1=x_{\mathrm{min}})$, from which we can
estimate $\alpha$. To do so we first need to build histograms
of the data, which requires choosing the step that separates successive
observed values $\{x_i\}$. The main difficulty that arises in this
procedure is the large fluctuation in the tail: the bins there
contain far fewer observed values than the earlier bins,
which enhances the statistical fluctuations \cite{n05}. In order to
decrease such fluctuations we have adopted logarithmic binning, so that
the bins span increasingly larger intervals whose steps increase
exponentially according to the following rule,
232: \be x_i= 2^{^{\scriptstyle (i-1)}} x_{\mathrm{min}}. \lb{step} \ee
233: The resulting data is shown in table \ref{tab2} and plotted in figures
234: \ref{fig1} and \ref{fig2}, where one can clearly see a power law
behaviour for all years.\footnote{Previous attempts made by us at
plotting $\mathcal{P}(x_i)$ vs.\ $x_i$ with $x_{\mathrm{min}}<30,000$
showed no power law behaviour in Brazilian cities with population
smaller than about 25,000--30,000 inhabitants. So, the transition to
a power law behaviour does seem to indicate the change between rural
and urban population, that is, the transition from spread out human
settlements to the human population aggregations we call cities.
Hence, this cutoff in $x_i$ can be used as the critical
value that allows us to obtain the fractions of urban and rural
populations in a country.}\lb{rural} The cumulative distribution
$\mathcal{P}(x_i)$ was obtained by dividing $p(x_i)$ by the total
number of cities with 30,000 inhabitants or more in each census
year (see table \ref{tab1}). This means that $\mathcal{P}(x_i)$
is the probability that a Brazilian city in the sample has population
equal to or greater than $x_i$.
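As an illustrative check that uses only the values listed in tables
\ref{tab1} and \ref{tab2}, the second bin edge follows from equation
(\ref{step}) as
\[ x_2 = 2^{(2-1)} \, x_{\mathrm{min}} = 2 \times 30{,}000 = 60{,}000, \]
and for the 1970 census the corresponding cumulative probability is
\[ \mathcal{P}(x_2) = \frac{p(x_2)}{p(x_1)} = \frac{187}{614}
\approx 0.3046. \]
The remaining entries of table \ref{tab2} follow in the same way.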
250: \begin{table}[t]
251: \caption{\it Distribution functions of Brazilian municipalities.}\label{tab2}
252: \vspace{1mm}
253: \begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
254: \hline
255: \multicolumn{2}{|c|}{year}&\multicolumn{2}{|c|}{1970}
256: &\multicolumn{2}{|c|}{1980}&\multicolumn{2}{|c|}{1991}
257: &\multicolumn{2}{|c|}{2000}\\ \hline \hline
258: $i$ & $x_{i}$ & $p(x_i)$ & $\mathcal{P}(x_i)$ & $p(x_i)$ &
259: $\mathcal{P}(x_i)$ & $p(x_i)$ &
260: $\mathcal{P}(x_i)$ & $p(x_i)$ &
261: $\mathcal{P}(x_i)$ \\\hline
262: 1 & 30,000 & 614 & 1 & 787 & 1 & 905 & 1 & 955 & 1 \\
263: 2 & 60,000 & 187 & 0.3046& 287 & 0.3647& 383 & 0.4232& 447 & 0.4681 \\
264: 3 & 120,000& 67 & 0.1091& 114 & 0.1449& 152 & 0.1680& 187 & 0.1958 \\
265: 4 & 240,000& 26 & 0.0423& 45 & 0.0572& 67 & 0.0740& 92 & 0.0963 \\
266: 5 & 480,000& 10 & 0.0163& 18 & 0.0229& 27 & 0.0298& 34 & 0.0356 \\
267: 6 & 960,000& 5 & 0.0081& 10 & 0.0127& 12 & 0.0133& 14 & 0.0147 \\
268: 7 & 1,920,000& 2 & 0.0033& 2 & 0.0025& 4 & 0.0044& 6 & 0.0063 \\
269: 8 & 3,840,000& 2 & 0.0033& 2 & 0.0025& 2 & 0.0022& 2 & 0.0021 \\
270: 9 & 7,680,000& - & - & 1 & 0.0013& 1 & 0.0011& 1 & 0.0010 \\
271: \hline
272: \end{tabular}
273: \end{table}
274: \begin{figure}[b]
275: \input{pop7080.tex}
276: \caption{\it Graph of the cumulative distribution function
277: $\mathcal{P}(x_i)$ against the population ${x_i}$ of Brazilian cities
278: with 30,000 people or more in the years of 1970 and 1980. One can
279: clearly see the decaying straight line pattern of a power law
280: behaviour with very little change over the time span of the
sample. One can also notice some fluctuations at the tail of the
plot, reflecting the very small number of cities with large
population.} \label{fig1}
284: \end{figure}
285: \begin{figure}[b]
286: \input{pop9100.tex}
287: \caption{\it Same graph as in the previous figure, but with data of
288: 1991 and 2000 censuses. As before, one can clearly see the decaying
289: straight line pattern of a power law behaviour. However, the statistical
290: fluctuations at the tail have virtually disappeared as compared to the
291: tail in figure \protect\ref{fig1}, reflecting the fact that there is a
292: bigger number of cities with more than one million inhabitants in Brazil
293: from 1991 on than in the previous years.} \label{fig2}
294: \end{figure}
295:
As discussed in \S \ref{data} above, our samples assumed
$x_{\mathrm{min}}=30,000$, which still leaves $\alpha$ to be
determined. To do so we have applied three different methods to
obtain the exponent: maximum likelihood estimator, least squares
regression and parameter averaging (a very simple bootstrap).
These three methods should converge to similar values
of $\alpha$ and, taken together, are capable of detecting possible
systematic biases in the value of the exponent, which are known to
arise from simple fits to the plots (see \cite{n05,g04}). One
should notice that least squares fitting is a good method for
determining the exponent of a power law distribution
{\it provided} the large fluctuations at the tail are significantly
reduced, which is the purpose of the logarithmic binning adopted
here (see \cite{g04}).
309:
310: \section{Results}
311:
312: \subsection{Maximum Likelihood Estimator}
313:
314: A simple and reliable method for extracting the exponent is to
315: employ the following formula discussed in \cite{n05},
316: \be \alpha= 1+n { \left[ \; \sum_{i=1}^n \ln \left(
317: \frac{x_i}{x_{\mathrm{min}}} \right) \right] }^{-1},
318: \lb{alpha}
319: \ee
320: obtained by means of the maximum likelihood estimator (MLE).
321: The results are shown in table \ref{tab3}, whereas figures \ref{fig3},
322: \ref{fig4}, \ref{fig5} and \ref{fig6} show the exponent fits of table
\ref{tab3} drawn as lines for each data set.
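For completeness, a short sketch of how equation (\ref{alpha}) arises is
given here. It follows the standard argument discussed in \cite{n05} and
assumes that the $n$ observed values $x_i \ge x_{\mathrm{min}}$ are
independent draws from the normalized form of equation (\ref{1}); the
symbol $\mathcal{L}$ for the likelihood is introduced only for this
illustration. The log-likelihood is
\[ \ln \mathcal{L}(\alpha) = \sum_{i=1}^n \ln p(x_i)
= n \ln \left( \alpha -1 \right)
+ n \left( \alpha -1 \right) \ln x_{\mathrm{min}}
- \alpha \sum_{i=1}^n \ln x_i . \]
Setting its derivative with respect to $\alpha$ equal to zero,
\[ \frac{n}{\alpha -1} - \sum_{i=1}^n \ln \left(
\frac{x_i}{x_{\mathrm{min}}} \right) = 0, \]
and solving for $\alpha$ recovers equation (\ref{alpha}).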
324: \begin{table}[t]
325: \caption{\it Results for $\alpha$.}\lb{tab3}
326: \vspace{1mm}
327: \begin{tabular}{|c|c|c|c|c|}\hline
328: Method & 1970 & 1980 & 1991 & 2000 \\\hline
$\alpha_{\scriptscriptstyle \rm MLE}$ & 2.41& 2.36 & 2.36 & 2.36 \\
$\alpha_{\scriptscriptstyle \rm LSF}$ & 2.23& 2.23 & 2.25 & 2.26 \\
$\alpha_{\scriptscriptstyle \rm PAE}$ & 2.22 $\pm$ 0.34 &
2.22 $\pm$ 0.34 & 2.25 $\pm$ 0.10 & 2.26 $\pm$ 0.11\\\hline
333: \end{tabular}
334: \end{table}
335:
336: \subsection{Least Squares Fitting}
337:
As noticed above, if the large uneven variation in the tail is severely
reduced, the possible bias introduced in determining the power law
exponent by least squares fitting is also reduced, as discussed in
\cite{g04}. In addition, we are applying this method together with two
other methodologies, which gives us confidence in the final
results. Results of least squares fitting (LSF) are shown in table
\ref{tab3}, whereas figures \ref{fig3}, \ref{fig4}, \ref{fig5} and
\ref{fig6} show the line fits.
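A minimal sketch of such a fit is given here, assuming (the text does
not state this explicitly) that the regression is performed on the
binned points $(\ln x_i, \ln \mathcal{P}(x_i))$ of table \ref{tab2}. The
symbols $u_i$, $v_i$, $m$ and $N$ are introduced only for this
illustration, with $u_i = \ln x_i$, $v_i = \ln \mathcal{P}(x_i)$ and $N$
the number of binned points. The least squares slope is
\[ m = \frac{N \sum_{i=1}^N u_i v_i - \sum_{i=1}^N u_i \sum_{i=1}^N v_i}
{N \sum_{i=1}^N {u_i}^2 - \left( \sum_{i=1}^N u_i \right)^2}, \]
and comparing with equation (\ref{lnP}), whose slope is
$\left( 1 - \alpha \right)$, gives
\[ \alpha_{\scriptscriptstyle \rm LSF} = 1 - m . \]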
346:
347: \subsection{Parameter Averaging Estimator}
348:
This is in fact a very simple bootstrap estimator where, instead of
taking many random samples, we have taken all possible combinations
of two points, without repetition, obtained the value of $\alpha$ from
the angular coefficient of the line through each pair, and calculated
the average and standard deviation of all such values
of $\alpha$. The aim was to produce an estimate of the error.
By taking only two points we have obtained a conservative estimate,
in the sense that using more than two points would decrease the error.
However, viewing the results of the parameter averaging estimator (PAE)
together with those of the other two estimators showed us that this
conservative method is enough for the purposes of this work. The results are
also shown in table \ref{tab3} and their line fits can be found
in figures \ref{fig3}, \ref{fig4}, \ref{fig5} and \ref{fig6}.
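As a hedged illustration of a single two-point estimate, assume that the
pairs are drawn from the binned points of table \ref{tab2}, which the
text does not state explicitly; the indices $j$, $k$ and the symbol
$\alpha_{jk}$ are introduced only for this sketch. From equation
(\ref{lnP}), the line through points $j$ and $k$ gives
\[ \alpha_{jk} = 1 - \frac{\ln \mathcal{P}(x_k) - \ln \mathcal{P}(x_j)}
{\ln x_k - \ln x_j} . \]
For the first two points of the 1970 data this reads
\[ \alpha_{12} = 1 - \frac{\ln 0.3046 - \ln 1}{\ln 60{,}000 - \ln 30{,}000}
\approx 1 + \frac{1.1888}{0.6931} \approx 2.72 . \]
Under the same assumption, $N$ binned points produce $N(N-1)/2$ pairs,
that is, 28 pairs for the 8 points of 1970 and 36 pairs for the 9 points
of the other years.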
361: \begin{figure}[b]
362: \input{70.tex}
363: \caption{\it Plot of $\mathcal{P}(x_i)$ vs.\ the population $x_i$ for
364: 1970 data with the fits shown in table \protect\ref{tab3} drawn
365: as lines. Notice that LSF and PAE estimates are almost
366: equal to one another and their line fits are superposed. In
367: addition, one can also notice that MLE does seem to provide a
368: better fit for data with larger statistical fluctuations at
369: the tail.}\label{fig3}
370: \end{figure}
371: \begin{figure}[b]
372: \input{80.tex}
373: \caption{\it Plot of $\mathcal{P}(x_i)$ vs.\ the population $x_i$ for
374: 1980 data with the fits shown in table \protect\ref{tab3} drawn
375: as lines. As in figure \protect\ref{fig3}, LSF and PAE results
376: are almost the same, with their line fits being drawn on top
of each other. Again, MLE seems to handle best the fluctuations
at the tail.} \label{fig4}
379: \end{figure}
380: \begin{figure}[b]
381: \input{91.tex}
382: \caption{\it Plot of $\mathcal{P}(x_i)$ vs.\ the population $x_i$ for
383: 1991 data with the fits shown in table \protect\ref{tab3} drawn
384: as lines. LSF and PAE results are exactly the same and the
385: exponent found with MLE is within the standard deviation of the
386: PAE result.} \label{fig5}
387: \end{figure}
388: \begin{figure}[b]
389: \input{00.tex}
390: \caption{\it Plot of $\mathcal{P}(x_i)$ vs.\ the population $x_i$ for
391: 2000 data with the fits shown in table \protect\ref{tab3} drawn
as lines. As in figure \protect\ref{fig5}, LSF and PAE results
are the same and the MLE estimate is within PAE's standard
deviation. This census has the smallest fluctuations at the tail
of the four years studied, and it is also the one for which the
three fitting methods differ least from each other (see table
\protect\ref{tab3}).}\label{fig6}
399: \end{figure}
400:
401: \subsection{Discussion}
402:
403: The results obtained show that LSF and PAE estimators
404: produced basically the same results, whereas all MLE derived
exponents are a little higher. If we take MLE as the best estimator,
the other two suffered a bias of 8\%, 6\%, 5\% and 4\% for 1970,
1980, 1991 and 2000, respectively. Those biases are well within
the error obtained with the PAE estimator, showing that once the
statistical fluctuations at the tail are successfully reduced by
means of an appropriate logarithmic binning (appropriate choice of
step and $x_{\mathrm{min}}$), the LSF estimator provides a good
methodology. In fact, the bias decreases from its maximum in 1970 to
its minimum in 2000 simultaneously with a decrease in the statistical
414: fluctuations at the tail in these same years, brought about by the
415: introduction in the sample of more observed values at the tail due
416: to the increase in the number of cities with more than a million
417: inhabitants. In addition, a visual inspection of the fits in figures
418: \ref{fig3}, \ref{fig4}, \ref{fig5}, \ref{fig6} shows that MLE appears
419: to be a better fitting methodology when statistical fluctuations are
420: larger (1970 and 1980) as compared to smaller fluctuations in the data
421: stemming from the 1991 and 2000 data sets.
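The text does not state how the percentages above were computed. One
reading consistent with table \ref{tab3}, offered here only as an
illustrative check, is the relative difference between the MLE exponent
and the PAE central value, denoted by the hypothetical symbol $\delta$,
\[ \delta = \frac{\alpha_{\scriptscriptstyle \rm MLE} -
\alpha_{\scriptscriptstyle \rm PAE}}{\alpha_{\scriptscriptstyle \rm MLE}}, \]
which gives
\[ \delta_{1970} = \frac{2.41-2.22}{2.41} \approx 0.079, \quad
\delta_{1980} = \frac{2.36-2.22}{2.36} \approx 0.059, \]
\[ \delta_{1991} = \frac{2.36-2.25}{2.36} \approx 0.047, \quad
\delta_{2000} = \frac{2.36-2.26}{2.36} \approx 0.042 . \]
These values round to the quoted 8\%, 6\%, 5\% and 4\%.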
422:
423: As an extension of our analysis it is interesting to probe why
424: other authors obtain different results from the universal value
425: of $\alpha \approx 2$ for the power law exponent of cities, apart
426: from the large fluctuations at the tail and LSF fitting mentioned
427: above. For instance, \cite{mss} reported $\alpha \approx 1$ for
428: cities in Indonesia for the 1961 to 1990 decennial censuses. For
429: Indonesia's year 2000 census they found an exponent smaller than
430: one (see \cite{mss}, table 2). Inasmuch as we saw above that a
431: normalized power law must have $\alpha > 1$, a possible, and likely,
432: cause for these unexpected results is the absence of, or inappropriate,
433: $x_{\mathrm{min}}$ definition for their samples. Then, without
434: a proper normalization it is probable that their exponent estimates
435: suffered contamination from the region of the plot where there is
no power law behaviour. In other words, the set of observed values
from which \cite{mss} calculated $\alpha$ was probably contaminated
438: with data from small cities with few inhabitants, and which should
439: have been removed from the data set used to calculate $\alpha$. As
440: seen above, finding $x_{\mathrm{min}}$ is a critical step to avoid
441: such a contamination.
442:
443: To summarize our results, conservative estimates for the exponent of
444: the Zipf law in Brazilian cities are reached by taking all methods
445: within the error margin. That results in $\alpha = 2.22 \pm 0.34$ for
446: 1970 and 1980, and $\alpha=2.26 \pm 0.11$ for 1991 and 2000. On the
447: other hand, accurate results come from MLE estimates, producing
448: $\alpha=2.41$ for 1970 and $\alpha=2.36$ for the other years.
449:
450: \section{Conclusion}
451:
452: In this paper we have discussed the Zipf law in Brazilian cities. We
453: have obtained data from censuses carried out in Brazil in the years of
454: 1970, 1980, 1991 and 2000 from where we selected a sample which included
455: only cities with 30,000 or more inhabitants. Then we calculated the
456: cumulative distribution function $\mathcal{P}(x_i)$ of Brazilian cities,
which gives the probability that a city has a population equal to or
greater than $x_i$. We found that this distribution does follow a decaying
power law, whose exponent $\alpha$ was estimated by three different
methods: maximum likelihood estimator, least squares fitting and parameter
averaging estimator. Our results show that a conservative estimate,
462: which includes the results of all three methods, produces
463: $\alpha = 2.22 \pm 0.34$ in 1970 and 1980, and $\alpha=2.26 \pm 0.11$
464: for 1991 and 2000. More accurate results are given by the maximum
465: likelihood estimator, showing $\alpha=2.41$ for 1970 and $\alpha=2.36$
466: for all other years.
467:
468: \begin{thebibliography}{999}
469: \bibitem{au} F. Auerbach, {\em Petermanns Geographische Mitteilungen}
470: {\bf 59} (1913), 74-76
471: \bibitem{z} G.K. Zipf, {\em Human Behaviour and the Principle of Least
472: Effort}, Addison-Wesley, Reading, 1949
\bibitem{zan} D.H. Zanette, S.C. Manrubia, {\em Phys. Rev. Lett.}
{\bf 79} (1997) 523
475: \bibitem{mas} G.\ Malescio, N.V.\ Dokholyan, S.V.\ Buldyrev, H.\ Eugene
476: Stanley, {\em preprint}, cond-mat/0005178 v1 (2000)
477: \bibitem{n05} M.E.J.\ Newman, {\em Contemporary Physics} {\bf 46} (2005)
478: 323, cond-mat/0412004 v2
\bibitem{g04} M.L. Goldstein, S.A. Morris, G.G. Yen,
{\em Eur. Phys. J. B} {\bf 41} (2004) 255, cond-mat/0402322 v3
481: \bibitem{mss} I.\ Mulianta, H.\ Situngkir, Y.\ Surya, {\em preprint},
482: nlin.PS/0409006 v1 (2004)
483: \end{thebibliography}
484: \end{document}
485: