% 0511:physics0511216/ms.tex

\documentclass{elsart}
%\usepackage{natbib}
\newcommand{\be}{\begin{equation}}
\newcommand{\ee}{\end{equation}}
\newcommand{\lb}[1]{\label{#1}}
\newcommand{\sty}{\scriptstyle}
\newcommand{\ssty}{\scriptscriptstyle}
\newcommand{\apg}{\:^{>}_{\sim}\:}
\newcommand{\apl}{\:^{<}_{\sim}\:}
\newcommand{\eg}{{\it e.g.\ }}
\begin{document}

\runauthor{Moura Jr.\ and Ribeiro}
\begin{frontmatter}
\title{Zipf Law for Brazilian Cities}
\author[NJMJr]{Newton J.\ Moura Jr}
\author[MBR]{and Marcelo B.\ Ribeiro}%\thanksref{corresponding}}

\address[NJMJr]{IBGE -- Brazilian Institute for Geography and Statistics,
              Geosciences Directorate, Geodesics Department,
              Av.\ Brasil 15671, Rio de Janeiro, RJ 21241-051, Brazil;
              e-mail:~newtonjunior@ibge.gov.br}
\address[MBR]{Physics Institute, University of Brazil -- UFRJ, CxP 68532,
              Rio de Janeiro, RJ 21945-970, Brazil; e-mail: mbr@if.ufrj.br}
%\thanks[corresponding]{Corresponding author.}

\begin{abstract}
This work studies the Zipf Law for cities in Brazil. Data from the
censuses of 1970, 1980, 1991 and 2000 were used to select a
sample containing only cities with 30,000 inhabitants
or more. The results show that the population distribution in
Brazilian cities does follow a power law similar to the ones found in
other countries. Estimates of the power law exponent were found
to be $2.22 \pm 0.34$ for the 1970 and 1980 censuses, and $2.26 \pm
0.11$ for the 1991 and 2000 censuses. More accurate results were
obtained with the maximum likelihood estimator, showing an exponent
equal to $2.41$ for 1970 and $2.36$ for the other three years.

\vspace{5.0mm}
\hspace{-3.5mm}{\it PACS:} \ 89.75.Da; 89.65.Cd; 89.75.-k; 05.45.Df
\end{abstract}
\begin{keyword}
Complex Systems; Power Laws; Population of Cities; Fractals
\end{keyword}
\end{frontmatter}

\section{Introduction}

It was first observed by Auerbach \cite{au}, although it is often
attributed to Zipf \cite{z}, that the way in which urban aggregates are
distributed, that is, the way the populations of cities are distributed,
follows a power law behaviour with exponent $\alpha \approx 2$. If we
assign probabilities to this distribution the resulting behaviour
is also a power law, known as the {\it Zipf law}. This law seems to
have a universal character, holding at the world level \cite{zan} as
well as for single nations. The exponent also seems to be independent
of the area of the nation and of the social and economic conditions
of its population \cite{mas}.

Power law exponents of cities have been measured in many countries.
It was reported by \cite{zan} that 2,400 cities in the U.S.A.\ have
$\alpha = 2.1 \pm 0.1$, whereas \cite{n05} reported $\alpha=2.30 \pm
0.05$ for the U.S.A.\ census of year 2000. According to \cite{zan},
1,300 municipalities in Switzerland have $\alpha=2.0 \pm 0.1$. Taking
together 2,700 cities of the world with populations larger than
100,000 inhabitants produces $\alpha=2.03 \pm 0.05$ \cite{zan}. One
should notice that those exponents were calculated by least squares
fitting, a method known to introduce biased results if the data are not
properly handled \cite{g04}. Despite this, most results obtained so
far indicate that the exponent seems to follow the universal value
of $\alpha \approx 2$.

Such power law behaviour seems to be a manifestation of the
dynamics of complex systems, whose striking feature is that they show
universal laws characterized by exponents in scale invariant
distributions that happen to be basically independent of the
details of the microscopic dynamics. Social behaviour is an example
of interaction among the elements of a complex system, in this case
human beings, giving rise to a cooperative evolution which in
itself strongly differs from the individual dynamics. So,
the demographic distribution of human beings on the Earth's
surface, which has sharp peaks of concentrated population (the
cities) alternating with relatively large extensions where the
population density is much lower, follows a power law typical of
complex system dynamics.

The aim of this paper is to present empirical evidence that the
population distribution of Brazilian cities also follows a
power law with exponent close to the universal value. We
have selected a sample from Brazil's decennial censuses of 1970,
1980, 1991 and 2000 and obtained probability distribution
functions of Brazilian cities with a lower cutoff of 30,000
inhabitants. Our procedure took great care to avoid large
statistical fluctuations at the tail, which could otherwise introduce
large biases in the determination of the exponent \cite{n05,g04}.
Our results show that Brazilian cities do follow the universal
pattern: conservative estimates produced $\alpha = 2.22 \pm 0.34$
in 1970 and 1980. For 1991 and 2000 we obtained $\alpha=2.26 \pm
0.11$.

The paper is organized as follows. In \S 2 we present the data and our
selection methodology, whereas in \S 3 we present the methods to
analyze the data. \S 4 shows the results obtained using three different
techniques to calculate the exponent $\alpha$. The paper ends with a
concluding section.

\section{The Data}\lb{data}

Brazil is estimated to reach a population of approximately 185
million inhabitants by the end of 2005, which places it 5th in the
ranking of the world's most populous countries. This population
occupies over 5 thousand cities, and although most of them have
very few inhabitants, 15 cities have more than one million people,
with two of them, S\~ao Paulo and Rio de Janeiro, having more than
5 million inhabitants. In order to obtain a sample for the
purposes of this work we first need to define what
we mean by a {\it city}. After surveying the way Brazil is
administratively governed we concluded that in Brazil's case we should
{\it equate} city to {\it municipality}, defined as the
territorially smallest administrative subdivision of a country
that has its own democratically elected representative leadership.
This means that Brazil's entire territory is subdivided into
municipalities, or cities. Some of them have very large areas,
actually larger than many European countries, but those are usually
located in very sparsely populated regions.

Censuses of Brazil's entire population have been taking place for over
a hundred years, at ten-year intervals, since 1890. However, data in
digitized form have only been available at IBGE, the government institution
responsible for censuses, since 1970. Data in between censuses are
obtained by very small sampling and extrapolation. Considering this,
we decided to take data only from the official, entire population,
censuses available in digital format, namely those of the years 1970,
1980, 1991 and 2000. These data show that the number of Brazilian
municipalities increased by over 30\% from 1970 to 2000. This is
clearly a consequence of the fact that the definition of a city is
administrative, reflecting Brazil's internal politics, and has been
varying over the last decades.

The fact that the number of municipalities has shown a sharp increase
within the time span of our data will not affect our study because, as
mentioned above, most Brazilian cities have small populations. Moreover,
although the Brazilian concept of a city means a territorial subdivision,
which includes both rural and urban inhabitants, an examination of the
data shows that cities with more than 30 thousand inhabitants have their
population almost entirely concentrated in the urban area.\footnote{Nowadays
IBGE defines what is a rural, as opposed to an urban, area by satellite
imagery. See also the footnote on page \protect\pageref{rural}.} We have,
therefore, decided to include only cities with
30 thousand people or more in our sample, which meant a significant
reduction in the number of municipalities as compared to the
original raw data (see table \ref{tab1}). The exclusion of the smaller
cities represents in fact the exclusion of the rural population from
our sample. In 1970, 40\% of Brazilians were living in cities with fewer
than 30 thousand people, whereas in 2000 this figure was reduced to
26\%. In other words, roughly speaking, the percentage of Brazilians
living in urban areas increased from 60\% in 1970 to 74\% in 2000.

\begin{table}[t]
\caption{\it Number of cities in Brazil.}\lb{tab1}
\vspace{2mm}
\begin{tabular}{|c|c|c|c|c|}
\hline
year of census & 1970 & 1980 & 1991 & 2000 \\
\hline \hline
all cities & 3958 & 3806 & 4277 & 5238 \\
\hline
cities with $\ge$ 30,000 inhabitants & 614 & 787 & 905 & 955 \\
\hline
\end{tabular}
\end{table}

\section{Data Analysis}

Once our sample is selected, we need to define our method of analysis.
Here we shall follow closely the methodology for fitting power law
distributions and estimating goodness-of-fit parameters as proposed by
\cite{n05}. We will start with a very brief introductory description
of power law statistics in order to fix the notation.

Let ${p}(x)\: dx$ be the fraction of cities with population between $x$ and
$x+dx$. So ${p}(x)$ defines a certain distribution of the data $x$. It is
useful to express this distribution in terms of the {\it cumulative
distribution function} $\mathcal{P}(x)=\int_x^\infty
{p}(x^\prime)dx^\prime$, which is simply the probability that a city has
a population equal to or greater than $x$.
If the fraction ${p}(x)$ follows a power law of the type,
\be {p}(x) = C x^{-\alpha}, \lb{1} \ee
where $\alpha$ and $C$ are constants, then
$\mathcal{P}(x)$ also follows a power law, given by
\be \mathcal{P}(x) =
     \frac{C}{(\alpha - 1)} \; x^{-(\alpha -1)}.
    \lb{3}
\ee
Such power law distributions are
also known as the {\it Zipf law} or {\it Pareto distribution}. From
equation (\ref{1}) it is obvious that ${p}(x)$ diverges for any positive
value of the exponent $\alpha$ as $x \rightarrow 0$, and this means
that the distribution must deviate from a power law below some minimum
value $x_{\mathrm{min}}$. In other words, we can only assume that the
distribution follows a Zipf law for $x$ above $x_{\mathrm{min}}$, and
in this case equation (\ref{1}) can be normalized as
$\int_{x_{\mathrm{min}}}^\infty {p}({x^\prime}) \; d{x^\prime} =1$
to obtain the constant $C$
only if $x$ and the exponent $\alpha$ obey the following conditions:
$ \alpha > 1 $, $  x \ge x_{\mathrm{min}}$. Power laws with exponents less
than unity cannot be normalized and do not usually occur in nature \cite{n05}.
The normalization constant $C$, given in terms of $\alpha$ and $x_{\mathrm{min}}$,
allows us to write the power laws (\ref{1}) and (\ref{3}) as follows,
\be \ln p(x) = -\alpha \ln x + B, \lb{lnp} \ee
\be \ln \mathcal{P}(x) = \left( 1 - \alpha \right) \ln x + \beta,
    \lb{lnP}
\ee
where
\be B= \ln \left[ \left( \alpha -1 \right) {x_{\mathrm{min}}}^{(\alpha
       -1)} \right],
       \lb{B}
\ee
\be \beta = \left( \alpha -1 \right) \ln x_{\mathrm{min}}.
     \lb{beta}
\ee
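As a short consistency check on equations (\ref{B}) and (\ref{beta}),
the normalization can be worked out explicitly. The lines below are only
a sketch of that standard calculation and use no symbols beyond those
already defined. For $\alpha > 1$,
\be \int_{x_{\mathrm{min}}}^\infty C \, {x^\prime}^{-\alpha} \, d{x^\prime}
    = \frac{C}{\alpha-1} \; {x_{\mathrm{min}}}^{-(\alpha-1)} = 1
    \quad \Longrightarrow \quad
    C = \left( \alpha-1 \right) {x_{\mathrm{min}}}^{(\alpha-1)},
\ee
so that $B = \ln C$. Substituting this value of $C$ in equation (\ref{3})
gives
\be \mathcal{P}(x) = \left( \frac{x}{x_{\mathrm{min}}}
    \right)^{-(\alpha-1)},
\ee
whose logarithm is equation (\ref{lnP}) with $\beta$ as in equation
(\ref{beta}). In particular, $\mathcal{P}(x_{\mathrm{min}})=1$, as expected
for a distribution normalized above $x_{\mathrm{min}}$.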

If we now define the distribution $p(x_i)$ as being {\it the number
of cities with population equal to or bigger than $x_i$} (so that, from
here on, $p(x_i)$ denotes a cumulative count and not the density of
equation (\ref{1})), we are
able to create for each sample a set of $n$ observed values
$\{x_i\}, (i=1,\ldots,n), (x_1=x_{\mathrm{min}})$, from which we can
estimate $\alpha$. To do so we first need to create histograms
with the data once we define the step separating each set of
observed values $\{x_i\}$. The main difficulty that arises in this
procedure is the large fluctuation in the tail, where bins
have a far smaller number of observed values than previous bins,
which enhances the statistical fluctuations \cite{n05}. In order to
decrease such fluctuations we have adopted logarithmic binning, so that
bins span increasingly larger intervals whose steps increase
exponentially according to the following rule,
\be x_i= 2^{(i-1)} \, x_{\mathrm{min}}. \lb{step} \ee
The resulting data are shown in table \ref{tab2} and plotted in figures
\ref{fig1} and \ref{fig2}, where one can clearly see a power law
behaviour for all years.\footnote{Previous attempts made by us at
plotting $\mathcal{P}(x_i)$ vs.\ $x_i$ with $x_{\mathrm{min}}<30{,}000$
showed no power law behaviour in Brazilian cities with population
smaller than about 25,000--30,000 inhabitants. So, the transition to
a power law behaviour does seem to indicate the change between rural
and urban population, that is, the transition from spread out human
settlements to the human population aggregations we call cities.
Hence, this cutoff in $x_i$ can be used as the critical
value that allows us to obtain the fractions of urban and rural
populations in a country.}\lb{rural} The cumulative distribution
$\mathcal{P}(x_i)$ was obtained by dividing $p(x_i)$ by the total
number of cities with 30,000 inhabitants or more in each year when
an entire population census occurred. This means that $\mathcal{P}(x_i)$
is the probability that a Brazilian city in the sample has a population
equal to or greater than $x_i$ (see table \ref{tab1}).
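
A simple consequence of the binning rule (\ref{step}) may help in reading
table \ref{tab2}. What follows is an illustrative check only, and not one
of the three estimators used in this work. Since each bin edge doubles the
previous one, equation (\ref{lnP}) implies that, for an exact power law,
consecutive values of the cumulative distribution have a constant ratio,
\be \frac{\mathcal{P}(x_{i+1})}{\mathcal{P}(x_i)} = 2^{-(\alpha-1)},
    \qquad \mbox{or} \qquad
    \alpha = 1 - \log_2 \left[
    \frac{\mathcal{P}(x_{i+1})}{\mathcal{P}(x_i)} \right].
\ee
For instance, the first two rows of the 2000 census in table \ref{tab2}
give $\mathcal{P}(x_2)/\mathcal{P}(x_1)=0.4681$, that is,
$\alpha \approx 1 - \log_2 (0.4681) \approx 2.10$. A single pair of
points is of course a very noisy estimate, so this should be read only
as a rough plausibility check.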

\begin{table}[t]
\caption{\it Distribution functions of Brazilian municipalities.}\label{tab2}
\vspace{1mm}
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
 \hline
 \multicolumn{2}{|c|}{year}&\multicolumn{2}{|c|}{1970}
 &\multicolumn{2}{|c|}{1980}&\multicolumn{2}{|c|}{1991}
 &\multicolumn{2}{|c|}{2000}\\ \hline \hline
   $i$ & $x_{i}$ & $p(x_i)$ & $\mathcal{P}(x_i)$ & $p(x_i)$ &
   $\mathcal{P}(x_i)$ & $p(x_i)$ &
   $\mathcal{P}(x_i)$ & $p(x_i)$ &
   $\mathcal{P}(x_i)$  \\\hline
1 & 30,000 & 614 & 1     & 787 & 1     & 905 & 1     & 955 & 1      \\
2 & 60,000 & 187 & 0.3046& 287 & 0.3647& 383 & 0.4232& 447 & 0.4681 \\
3 & 120,000& 67  & 0.1091& 114 & 0.1449& 152 & 0.1680& 187 & 0.1958 \\
4 & 240,000& 26  & 0.0423& 45  & 0.0572& 67  & 0.0740& 92  & 0.0963 \\
5 & 480,000& 10  & 0.0163& 18  & 0.0229& 27  & 0.0298& 34  & 0.0356 \\
6 & 960,000&  5  & 0.0081& 10  & 0.0127& 12  & 0.0133& 14  & 0.0147 \\
7 & 1,920,000& 2 & 0.0033& 2   & 0.0025& 4   & 0.0044& 6   & 0.0063 \\
8 & 3,840,000& 2 & 0.0033& 2   & 0.0025& 2   & 0.0022& 2   & 0.0021 \\
9 & 7,680,000& - & -     & 1   & 0.0013& 1   & 0.0011& 1   & 0.0010 \\
\hline
   \end{tabular}
\end{table}

\begin{figure}[b]
\input{pop7080.tex}
\caption{\it Graph of the cumulative distribution function
$\mathcal{P}(x_i)$ against the population ${x_i}$ of Brazilian cities
with 30,000 people or more in the years of 1970 and 1980. One can
clearly see the decaying straight line pattern of a power law
behaviour with very little change over the time span of the
sample. One can also notice some fluctuations at the tail of the
plot, reflecting the very small number of cities with large
populations.} \label{fig1}
\end{figure}

\begin{figure}[b]
\input{pop9100.tex}
\caption{\it Same graph as in the previous figure, but with data from the
1991 and 2000 censuses. As before, one can clearly see the decaying
straight line pattern of a power law behaviour. However, the statistical
fluctuations at the tail have virtually disappeared as compared to the
tail in figure \protect\ref{fig1}, reflecting the fact that there is a
larger number of cities with more than one million inhabitants in Brazil
from 1991 on than in the previous years.} \label{fig2}
\end{figure}

As discussed in \S \ref{data} above, our samples assumed
$x_{\mathrm{min}}=30{,}000$, which still leaves $\alpha$ to be
determined. To do so we have applied three different methods to
obtain the exponent: maximum likelihood estimator, least squares
regression and parameter averaging (a very simple bootstrap).
These three methods should converge to similar values
of $\alpha$ and, taken together, are capable of detecting possible
systematic biases in the value of the exponent, known to
arise from simple fits to the plots (see \cite{n05,g04}). One
should notice that least squares fitting is a good method for
determining the exponent of a power law distribution,
{\it provided} the large fluctuations in the tail are significantly
reduced, here by means of logarithmic binning (see \cite{g04}).

\section{Results}

\subsection{Maximum Likelihood Estimator}

A simple and reliable method for extracting the exponent is to
employ the following formula discussed in \cite{n05},
\be \alpha= 1+n { \left[ \; \sum_{i=1}^n \ln \left(
            \frac{x_i}{x_{\mathrm{min}}} \right) \right] }^{-1},
    \lb{alpha}
\ee
obtained by means of the maximum likelihood estimator (MLE).
The results are shown in table \ref{tab3}, whereas figures \ref{fig3},
\ref{fig4}, \ref{fig5} and \ref{fig6} show the exponent fits of table
\ref{tab3} drawn as lines for each data set.
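
For completeness, equation (\ref{alpha}) can be recovered in a few lines.
The sketch below assumes, as in the standard treatment of \cite{n05}, that
the $x_i$ are $n$ independent observations drawn from the normalized form
of equation (\ref{1}), $p(x)=\left[ (\alpha-1)/x_{\mathrm{min}} \right]
(x/x_{\mathrm{min}})^{-\alpha}$; the symbol $\mathcal{L}$ for the
likelihood is introduced here only for illustration. The log-likelihood
of the sample is then
\be \ln \mathcal{L}(\alpha) = n \ln \left( \alpha-1 \right)
    - n \ln x_{\mathrm{min}}
    - \alpha \sum_{i=1}^n \ln \left( \frac{x_i}{x_{\mathrm{min}}} \right),
\ee
and setting its derivative with respect to $\alpha$ to zero,
\be \frac{n}{\alpha-1} - \sum_{i=1}^n \ln \left(
    \frac{x_i}{x_{\mathrm{min}}} \right) = 0,
\ee
gives equation (\ref{alpha}). The same reference quotes the statistical
error of this estimate as $\sigma \approx (\alpha-1)/\sqrt{n}$. Purely as
an illustration, if $n$ is taken to be the 955 cities of the 2000 sample
and $\alpha=2.36$, this would give $\sigma \approx 0.04$. This figure is
only a rough evaluation under that assumption and is not one of the
results listed in table \ref{tab3}.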

\begin{table}[t]
\caption{\it Results for $\alpha$.}\lb{tab3}
\vspace{1mm}
\begin{tabular}{|c|c|c|c|c|}\hline
  Method & 1970 & 1980 & 1991 & 2000 \\\hline
  $\alpha_{\scriptscriptstyle \rm MLE}$ & 2.41 & 2.36 & 2.36 & 2.36 \\
  $\alpha_{\scriptscriptstyle \rm LSF}$ & 2.23 & 2.23 & 2.25 & 2.26 \\
  $\alpha_{\scriptscriptstyle \rm PAE}$ & 2.22 $\pm$ 0.34 &
           2.22 $\pm$ 0.34 & 2.25 $\pm$ 0.10 & 2.26 $\pm$ 0.11\\\hline
\end{tabular}
\end{table}

\subsection{Least Squares Fitting}

As noted above, if the large uneven variation in the tail is severely
reduced, the possible bias introduced in determining the power law
exponent by least squares fitting is also reduced, as discussed in
\cite{g04}. In addition, we are applying this method together with two
other methods, which gives us confidence in the final
results. Results of least squares fitting (LSF) are shown in table
\ref{tab3}, whereas figures \ref{fig3}, \ref{fig4}, \ref{fig5} and
\ref{fig6} show the line fits.
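
To make the procedure explicit, the following is a minimal sketch of such
a fit. It assumes that the straight line of equation (\ref{lnP}) is fitted
to the $m$ points of table \ref{tab2}, with $u_i=\ln x_i$ and
$v_i=\ln \mathcal{P}(x_i)$, a detail the text above does not spell out;
the symbols $u_i$, $v_i$, $m$ and $b$ are introduced here only for
illustration. The fitted slope is
\be b = \frac{\sum_{i=1}^m \left( u_i-\bar{u} \right)
    \left( v_i-\bar{v} \right)}{\sum_{i=1}^m \left( u_i-\bar{u}
    \right)^2}, \qquad
    \bar{u}=\frac{1}{m}\sum_{i=1}^m u_i, \quad
    \bar{v}=\frac{1}{m}\sum_{i=1}^m v_i,
\ee
and, since the slope in equation (\ref{lnP}) is $1-\alpha$, the exponent
follows as $\alpha_{\scriptscriptstyle \rm LSF} = 1-b$. With the binning
rule (\ref{step}) the abscissas $u_i$ are equally spaced by $\ln 2$. In
such an unweighted fit every point counts equally, regardless of how many
cities it represents, which is why fluctuations in the sparsely populated
tail can bias the result.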

\subsection{Parameter Averaging Estimator}

This is in fact a very simple bootstrap estimator, where instead of
taking many random samples we have just taken all possible combinations
of two points, without repetition, obtained $\alpha$ from the angular
coefficient of the line through each pair, and calculated the average
and standard deviation of all values
of $\alpha$. The aim was to produce an estimate of the error.
By taking only two points we have obtained a conservative estimate,
in the sense that using more than two points would decrease the error.
However, viewing the results of the parameter averaging estimator (PAE)
together with those of the other two estimators showed us that this
conservative method is enough for the purposes of this work. The results are
also shown in table \ref{tab3} and their line fits can be found
in figures \ref{fig3}, \ref{fig4}, \ref{fig5} and \ref{fig6}.
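
The procedure can be written down explicitly as follows. This is a sketch
under the assumption that the pairs are taken from the points
$(x_i,\mathcal{P}(x_i))$ of table \ref{tab2}; the indices $j,k$ and the
symbols $\alpha_{jk}$, $m$, $N$ and $\sigma_{\scriptscriptstyle \rm PAE}$
are introduced here only for illustration. For two points $j<k$, equation
(\ref{lnP}) gives
\be \alpha_{jk} = 1 - \frac{\ln \mathcal{P}(x_k) - \ln \mathcal{P}(x_j)}
    {\ln x_k - \ln x_j},
\ee
and with $m$ points there are $N=m(m-1)/2$ such pairs, that is, 28 pairs
for the eight 1970 points and 36 pairs for the nine points of the other
years. The estimate and its error are then the mean and the standard
deviation of the $N$ values,
\be \alpha_{\scriptscriptstyle \rm PAE} = \frac{1}{N} \sum_{j<k}
    \alpha_{jk}, \qquad
    \sigma_{\scriptscriptstyle \rm PAE} = \left[ \frac{1}{N} \sum_{j<k}
    \left( \alpha_{jk} - \alpha_{\scriptscriptstyle \rm PAE} \right)^2
    \right]^{1/2},
\ee
where the choice of $N$ rather than $N-1$ in the denominator is an
assumption, since the text does not specify it. As a single worked pair,
the 1970 points $j=1$ and $k=4$ of table \ref{tab2} give
$\alpha_{14} = 1 - \ln (0.0423) / \ln 8 \approx 2.52$.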

\begin{figure}[b]
\input{70.tex}
\caption{\it Plot of $\mathcal{P}(x_i)$ vs.\ the population $x_i$ for
         1970 data with the fits shown in table \protect\ref{tab3} drawn
         as lines. Notice that LSF and PAE estimates are almost
         equal to one another and their line fits are superposed. In
         addition, one can also notice that MLE does seem to provide a
         better fit for data with larger statistical fluctuations at
         the tail.}\label{fig3}
\end{figure}

\begin{figure}[b]
\input{80.tex}
\caption{\it Plot of $\mathcal{P}(x_i)$ vs.\ the population $x_i$ for
         1980 data with the fits shown in table \protect\ref{tab3} drawn
         as lines. As in figure \protect\ref{fig3}, LSF and PAE results
         are almost the same, with their line fits being drawn on top
         of each other. Again, MLE seems to handle the fluctuations
         at the tail best.} \label{fig4}
\end{figure}

\begin{figure}[b]
\input{91.tex}
\caption{\it Plot of $\mathcal{P}(x_i)$ vs.\ the population $x_i$ for
         1991 data with the fits shown in table \protect\ref{tab3} drawn
         as lines. LSF and PAE results are exactly the same and the
         exponent found with MLE is within the standard deviation of the
         PAE result.} \label{fig5}
\end{figure}

\begin{figure}[b]
\input{00.tex}
\caption{\it Plot of $\mathcal{P}(x_i)$ vs.\ the population $x_i$ for
         2000 data with the fits shown in table \protect\ref{tab3} drawn
         as lines. As in figure \protect\ref{fig5}, LSF and PAE results
         are the same and the MLE estimate is within PAE's standard
         deviation. This data set is for the census with the smallest
         fluctuations at the tail as compared to the previous cases of
         years 1970, 1980 and 1991, and it is the one where all three
         fitting methods show the smallest differences among each other
         (see table \protect\ref{tab3}).}\label{fig6}
\end{figure}

\subsection{Discussion}

The results obtained show that the LSF and PAE estimators
produced basically the same results, whereas all MLE derived
exponents are a little higher. If we take MLE as the best estimator,
the other two suffered a bias of 8\%, 6\%, 5\% and 4\% for 1970,
1980, 1991 and 2000, respectively. Those biases are well within
the error obtained with the PAE estimator, showing that once the
statistical fluctuations at the tail are successfully reduced by
means of an appropriate logarithmic binning (appropriate choice of
step and $x_{\mathrm{min}}$), the LSF estimator provides a good
methodology. In fact, the bias decreases from its maximum in 1970 to
its minimum in 2000 simultaneously with a decrease in the statistical
fluctuations at the tail in these same years, brought about by the
introduction in the sample of more observed values at the tail due
to the increase in the number of cities with more than a million
inhabitants. In addition, a visual inspection of the fits in figures
\ref{fig3}, \ref{fig4}, \ref{fig5} and \ref{fig6} shows that MLE appears
to be a better fitting methodology when statistical fluctuations are
larger (1970 and 1980), as compared to the smaller fluctuations in the
1991 and 2000 data sets.
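
The percentages quoted above can be reproduced from table \ref{tab3}.
The short calculation below assumes that the difference is taken relative
to the LSF value, a normalization the text does not state but which
reproduces the quoted figures after rounding; the symbol $\delta$ is
introduced here only for illustration. Writing
$\delta = (\alpha_{\scriptscriptstyle \rm MLE} -
\alpha_{\scriptscriptstyle \rm LSF}) / \alpha_{\scriptscriptstyle \rm LSF}$,
\begin{eqnarray*}
\delta_{1970} &=& \frac{2.41-2.23}{2.23} \approx 0.081, \\
\delta_{1980} &=& \frac{2.36-2.23}{2.23} \approx 0.058, \\
\delta_{1991} &=& \frac{2.36-2.25}{2.25} \approx 0.049, \\
\delta_{2000} &=& \frac{2.36-2.26}{2.26} \approx 0.044,
\end{eqnarray*}
which round to 8\%, 6\%, 5\% and 4\%, respectively.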

As an extension of our analysis it is interesting to probe why
other authors obtain results different from the universal value
of $\alpha \approx 2$ for the power law exponent of cities, apart
from the large fluctuations at the tail and LSF fitting mentioned
above. For instance, \cite{mss} reported $\alpha \approx 1$ for
cities in Indonesia for the 1961 to 1990 decennial censuses. For
Indonesia's year 2000 census they found an exponent smaller than
one (see \cite{mss}, table 2). Inasmuch as we saw above that a
normalized power law must have $\alpha > 1$, a possible, and likely,
cause for these unexpected results is the absence of, or an inappropriate,
$x_{\mathrm{min}}$ definition for their samples. Then, without
a proper normalization it is probable that their exponent estimates
suffered contamination from the region of the plot where there is
no power law behaviour. In other words, the set of observed values
from which \cite{mss} calculated $\alpha$ was probably contaminated
with data from small cities with few inhabitants, which should
have been removed from the data set used to calculate $\alpha$. As
seen above, finding $x_{\mathrm{min}}$ is a critical step in avoiding
such contamination.

To summarize our results, conservative estimates for the exponent of
the Zipf law in Brazilian cities are reached by taking all methods
within the error margin. That results in $\alpha = 2.22 \pm 0.34$ for
1970 and 1980, and $\alpha=2.26 \pm 0.11$ for 1991 and 2000. On the
other hand, more accurate results come from the MLE estimates, producing
$\alpha=2.41$ for 1970 and $\alpha=2.36$ for the other years.

\section{Conclusion}

In this paper we have discussed the Zipf law in Brazilian cities. We
have obtained data from censuses carried out in Brazil in the years of
1970, 1980, 1991 and 2000, from which we selected a sample which included
only cities with 30,000 or more inhabitants. Then we calculated the
cumulative distribution function $\mathcal{P}(x_i)$ of Brazilian cities,
which gives the probability that a city has a population equal to or
greater than $x_i$. We found that this distribution does follow a decaying
power law, whose exponent $\alpha$ was estimated by three different
methods: maximum likelihood estimator, least squares fitting and parameter
averaging estimator. Our results show that a conservative estimate,
which includes the results of all three methods, produces
$\alpha = 2.22 \pm 0.34$ in 1970 and 1980, and $\alpha=2.26 \pm 0.11$
for 1991 and 2000. More accurate results are given by the maximum
likelihood estimator, showing $\alpha=2.41$ for 1970 and $\alpha=2.36$
for all other years.

\begin{thebibliography}{999}
\bibitem{au}  F.\ Auerbach, {\em Petermanns Geographische Mitteilungen}
        {\bf 59} (1913) 74--76
\bibitem{z}   G.K.\ Zipf, {\em Human Behavior and the Principle of Least
        Effort}, Addison-Wesley, Reading, 1949
\bibitem{zan} D.H.\ Zanette, S.C.\ Manrubia, {\em Phys.\ Rev.\ Lett.}
        {\bf 79} (1997) 523
\bibitem{mas} G.\ Malescio, N.V.\ Dokholyan, S.V.\ Buldyrev, H.\ Eugene
        Stanley, {\em preprint}, cond-mat/0005178 v1 (2000)
\bibitem{n05} M.E.J.\ Newman, {\em Contemporary Physics} {\bf 46} (2005)
        323, cond-mat/0412004 v2
\bibitem{g04} M.L.\ Goldstein, S.A.\ Morris, G.G.\ Yen,
        {\em Eur.\ Phys.\ J.\ B} {\bf 41} (2004) 255, cond-mat/0402322 v3
\bibitem{mss} I.\ Mulianta, H.\ Situngkir, Y.\ Surya, {\em preprint},
        nlin.PS/0409006 v1 (2004)
\end{thebibliography}
\end{document}