% arXiv:physics/0608297 (source file t.tex)

\documentclass[aps]{revtex4}
%\documentclass[prb]{revtex4}% Physical Review B

\usepackage{graphics,epsfig}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{rotate}

\begin{document}

\title{A network-based prediction of retail store commercial
categories and optimal locations}

\author{Pablo Jensen}

\email{pablo.jensen@ens-lyon.fr}
\affiliation{Laboratoire de Physique, CNRS UMR 5672, Ecole Normale
  Sup\'erieure de Lyon, 46 All\'ee d'Italie, 69364 Lyon Cedex 07,
  France \\ Laboratoire d'Economie des Transports, CNRS UMR 5593,
  ISH-Universit\'e Lyon-2, 14, Av. Berthelot, 69007 Lyon, France}
\date{\today}

\begin{abstract}
  I study the spatial organization of retail commercial activities.
  These are organized in a network comprising ``anti-links'', i.e.,
  links of negative weight. From location data alone, network analysis
  yields a community structure that closely follows the commercial
  classification of the US Department of Labor. The interaction
  network also allows one to build a ``quality'' index of optimal
  location niches for stores, which has been tested empirically.
\end{abstract}

\pacs{89.65.-s;89.75.-k;05.65.+b}

\maketitle

%\vspace{.5cm}

Walking in any big city reveals the extreme diversity of retail store
location patterns. Fig.~\ref{carte} shows a map of the city of Lyon
(France) including all the drugstores, shoe stores and furniture
stores. A qualitative commercial organisation is visible in this map:
shoe stores aggregate in the town shopping center, furniture
stores are partially dispersed over secondary poles, and drugstores are
strongly dispersed across the whole town. Understanding such
features and, more generally, the commercial logic behind the spatial
distribution of retail stores seems a complex task. Many factors
could play an important role, arising from the distinct characteristics
of the stores or of the location sites. Stores differ by product sold,
surface area, number of employees, total monthly sales or opening
date. Locations differ by price of space, local consumer
characteristics, visibility (corner locations, for example) or
accessibility. Only by taking into account most of these complex
features of the retail world can we hope to understand the logic of
store commercial strategies, let alone find potentially interesting
locations for new businesses.

Here I show that location data alone suffice to reveal many important
facts about the commercial organisation of retail trade~\cite{data}.
First, I quantify the interactions among activities using network
analysis, and find a few homogeneous commercial categories for the 55
trades in Lyon. These groups closely match the usual commercial
categories: personal services, home furniture, food stores and
apparel stores. Second, I introduce a quality indicator for the
location of a given activity and test its relevance empirically. I
stress that these results are obtained from a mathematical analysis of
{\it location} data only. This supports the importance of business
location for retailers, a point that is intuitively well known in the
field and summarized by the retailing ``mantra'': {\it the three
  things that matter most in a retailer's world are: location,
  location and \ldots\ location}.

\vspace{.5cm}
{\bf Finding meaningful commercial categories}
\vspace{.2cm}

To analyze in detail the interactions between stores of different
trades, I start from spatial pair correlations. These functions are
used to reveal store-store interactions, just as atom-atom interactions
are deduced from atomic distribution functions in materials
science~\cite{pair}. Tools from that discipline cannot be used
directly, though, because there is no underlying crystalline substrate
to define a reference distribution. Nor is a homogeneous space
appropriate, since the density of consumers is not uniform and some
town areas cannot host stores, as is clearly seen from the blank spaces
of the map (due to the presence of rivers, parks, or residential areas
defined by town regulations).

A clever idea proposed by G.~Duranton and H.~G.~Overman~\cite{duranton}
is to take as reference a random distribution of stores located on the
array of {\it all existing} sites (black dots in Fig.~\ref{carte}).
This automatically takes into account the geographical peculiarities
of each town. I then use the ``M'' index~\cite{puech} to quantify the
spatial interactions between categories of stores. The definition of
$M_{AB}$ at a given distance~$r$ is straightforward: draw a disk of
radius~$r$ around each store of category A, count the total number of
stores ($n_{tot}$) and the number of B stores ($n_B$) inside it, and
compare the ratio $n_B/n_{tot}$ to the town-wide ratio $N_B/N_{tot}$,
where capital $N$ refers to the total number of stores in town.
$M_{AB}$ is the ratio $(n_B/n_{tot})/(N_B/N_{tot})$ averaged over all
A stores. If $M_{AB}$ is larger than 1, A ``attracts'' B; if it is
smaller than 1, there is repulsion between these two
activities~\cite{note_ave}. To ascertain the statistical significance
of the repulsion or attraction, I have simulated 800 random
distributions of the $N_B$ stores on all possible sites, calculating
for each distribution the $n_B/n_{tot}$ ratio around the same A
locations. This gives the statistical fluctuations and allows one to
count how many random runs deviate from 1 as much as the real ratio
does. I consider the result significant if fewer than $3\%$ of the
random runs deviate more than the real one ($97\%$ confidence level).
I have chosen $r=100$~m, as this represents a typical distance a
customer accepts to walk to visit different stores~\cite{note_dist}.
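
The M index and its Monte Carlo test can be sketched in a few lines of
Python. This is a minimal illustration, not the code used for this
study, and several details are assumptions: stores are planar
coordinates in meters, the reference ratio is taken literally as
$N_B/N_{tot}$, A stores with no neighbor are skipped, the random
distributions are obtained by shuffling the activity labels of all
sites not occupied by A stores, and a run counts as ``extreme'' when it
deviates from 1 at least as much as the observed value, in the same
direction. The function names \texttt{m\_index} and
\texttt{significance} are hypothetical.
\begin{verbatim}
import math
import random


def m_index(sites, labels, a, b, r):
    """M_AB(r): mean over A stores of (n_B/n_tot) / (N_B/N_tot).

    sites: list of (x, y) in meters; labels: activity of each site.
    The A store itself is not counted in its own disk."""
    ref = sum(1 for lab in labels if lab == b) / len(sites)  # N_B/N_tot
    ratios = []
    for i, (xi, yi) in enumerate(sites):
        if labels[i] != a:
            continue
        n_tot = n_b = 0
        for j, (xj, yj) in enumerate(sites):
            if j != i and math.hypot(xi - xj, yi - yj) <= r:
                n_tot += 1
                n_b += labels[j] == b
        if n_tot > 0:  # assumption: A stores with empty disks skipped
            ratios.append((n_b / n_tot) / ref)
    return sum(ratios) / len(ratios) if ratios else float("nan")


def significance(sites, labels, a, b, r, runs=800, seed=0):
    """Return (M_obs, fraction of random runs at least as extreme).

    Random runs shuffle the labels of all non-A sites: B stores are
    redistributed over existing sites while the A locations stay
    fixed (intended for a != b)."""
    rng = random.Random(seed)
    m_obs = m_index(sites, labels, a, b, r)
    free = [k for k, lab in enumerate(labels) if lab != a]
    extreme = 0
    for _ in range(runs):
        pool = [labels[k] for k in free]
        rng.shuffle(pool)
        shuffled = list(labels)
        for k, lab in zip(free, pool):
            shuffled[k] = lab
        m_rand = m_index(sites, shuffled, a, b, r)
        if m_obs >= 1:
            extreme += m_rand >= m_obs   # attraction: upper tail
        else:
            extreme += m_rand <= m_obs   # repulsion: lower tail
    return m_obs, extreme / runs
\end{verbatim}
With the criterion used in the text, the pair (A, B) would be kept as
significant when the returned fraction is below 0.03. The brute-force
neighbor search is quadratic in the number of stores, which is
acceptable for a sketch; a spatial index would be preferable for a
whole town.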

I can now define a network structure of retail stores. Nodes are
the 55 retail activities (Table~I). The weighted~\cite{weight} links
are given by $a_{AB} \equiv \log(M_{AB})$, which is positive for
spatial attraction and negative for repulsion between activities A
and B~\cite{note_stat}. This retail network is the first social
network with quantified ``anti-links'', i.e., repulsive links between
nodes~\cite{repulsion}. Together with the usual (positive) links and
the absence of any significant link, the anti-links form an essential
part of the network. If only positive links are used, the analysis
leads to different, less satisfactory results (see below).
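
The construction of the link weights, including the significance rule
of Ref.~\cite{note_stat}, can be sketched as follows. In this
hypothetical Python illustration the dictionaries \texttt{m} (values of
$M_{AB}$) and \texttt{significant} (outcome of the Monte Carlo test)
are assumed inputs, and pairs that fail the test get zero weight, which
is how the absence of a significant link is represented here.
\begin{verbatim}
import math


def link_matrix(m, significant):
    """a[A][B] = log(M_AB) if both the A->B and the B->A
    interactions are significant, 0 otherwise."""
    acts = list(m)
    a = {x: {} for x in acts}
    for x in acts:
        for y in acts:
            if significant[x][y] and significant[y][x]:
                a[x][y] = math.log(m[x][y])  # >0 attraction, <0 repulsion
            else:
                a[x][y] = 0.0                # no significant link
    return a


# Hypothetical two-activity example.
m = {"A": {"A": 0.5, "B": 2.0}, "B": {"A": 1.5, "B": 1.0}}
sig = {"A": {"A": True, "B": True}, "B": {"A": True, "B": False}}
a = link_matrix(m, sig)
\end{verbatim}
Here \texttt{a["A"]["B"]} equals $\log 2$, the same-trade link of A is
negative (self-repulsion), and the non-significant B-B entry is zero.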

To divide the store network into communities, I adapt the ``Potts''
algorithm~\cite{potts}. This algorithm identifies the store types with
magnetic spins and groups them into several homogeneous magnetic
domains so as to minimize the system energy. Anti-links can then be
interpreted as antiferromagnetic interactions between the spins.
Therefore, this algorithm naturally groups the activities that attract
each other, and places trades that repel each other into different
groups. A natural definition~\cite{potts,radicchi} of the satisfaction
($-1 \leq s_i \leq 1$) of site $i$ for belonging to group $\sigma_i$ is
\begin{equation}
s_i \equiv \frac{\sum_{j \neq i} a_{ij}\, \pi_{\sigma_i \sigma_j}}{\sum_{j \neq i} |a_{ij}|},
\label{s}
\end{equation}
where $\pi_{\sigma_i \sigma_j} \equiv 1$ if $\sigma_i = \sigma_j$ and
$\pi_{\sigma_i \sigma_j} \equiv -1$ if $\sigma_i \neq \sigma_j$.
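
Equation~(\ref{s}) can be evaluated directly. In the hypothetical
Python sketch below, the link weights are stored as a list of lists
\texttt{a}, with \texttt{a[i][j]} $=a_{ij}$, and \texttt{groups[i]}
holds the group label $\sigma_i$; a node without any significant link
is given zero satisfaction, which is an assumption since
Eq.~(\ref{s}) is undefined in that case.
\begin{verbatim}
def satisfaction(a, groups, i):
    """s_i: signed fraction of the total |a_ij| of node i that is
    satisfied by the current grouping (+1 same group, -1 otherwise)."""
    num = den = 0.0
    for j, a_ij in enumerate(a[i]):
        if j == i:
            continue
        pi = 1 if groups[i] == groups[j] else -1
        num += a_ij * pi
        den += abs(a_ij)
    return num / den if den > 0 else 0.0


# Toy frustrated triplet: 0 attracts 1, 1 attracts 2, 0 repels 2.
a = [[0.0, 1.0, -1.0],
     [1.0, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
groups = [0, 0, 1]
print([round(satisfaction(a, groups, i), 3) for i in range(3)])
# prints [1.0, 0.333, 0.333]
\end{verbatim}
In this toy grouping, activities 0 and 1 share a group while 2 is kept
apart; all three nodes still end up with positive satisfaction despite
the frustration.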

To obtain the group structure, I run a standard simulated annealing
algorithm~\cite{sa} to maximize the overall site satisfaction (without
the normalizing denominator):
\begin{equation}
K \equiv \sum_{\substack{i,j = 1\\ i \neq j}}^{55} a_{ij}\, \pi_{\sigma_i \sigma_j}.
\label{K}
\end{equation}
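
A bare-bones simulated annealing maximization of $K$ in Eq.~(\ref{K})
is sketched below in Python. The move set (relabeling one activity at a
time), the geometric cooling schedule, the initial temperature and the
number of steps are illustrative assumptions, not the settings of the
actual study; allowing as many labels as there are nodes leaves the
number of groups free, as in the text. Since $\pi$ is symmetric, moving
node $i$ changes $K$ by
$\Delta K = \sum_{j\neq i}(a_{ij}+a_{ji})(\pi^{\rm new}_{ij}-\pi^{\rm old}_{ij})$,
which is computed locally.
\begin{verbatim}
import math
import random


def k_value(a, groups):
    """K = sum over i != j of a_ij * pi(sigma_i, sigma_j)."""
    n = len(a)
    return sum(a[i][j] * (1 if groups[i] == groups[j] else -1)
               for i in range(n) for j in range(n) if i != j)


def anneal(a, steps=20000, t0=1.0, t_min=1e-3, seed=0):
    """Maximize K with Metropolis moves and geometric cooling.
    Returns the best grouping found and its K value."""
    rng = random.Random(seed)
    n = len(a)
    groups = [rng.randrange(n) for _ in range(n)]  # up to n labels
    k = k_value(a, groups)
    best, best_k = list(groups), k
    cool = (t_min / t0) ** (1.0 / steps)
    t = t0
    for _ in range(steps):
        i, new = rng.randrange(n), rng.randrange(n)
        old = groups[i]
        if new != old:
            delta = 0.0
            for j in range(n):
                if j != i:
                    before = 1 if groups[j] == old else -1
                    after = 1 if groups[j] == new else -1
                    delta += (a[i][j] + a[j][i]) * (after - before)
            if delta >= 0 or rng.random() < math.exp(delta / t):
                groups[i] = new
                k += delta
                if k > best_k:
                    best, best_k = list(groups), k
        t *= cool
    return best, best_k
\end{verbatim}
On a toy network made of two mutually repelling sets of three
attracting nodes, this sketch recovers the two sets as groups, reaching
the maximum possible $K=\sum_{i\neq j}|a_{ij}|$.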

The Potts algorithm divides the retail store network into five
homogeneous groups (Table~I; note that the number of groups is not
fixed in advance but is a variable of the maximization). This group
division reaches a global satisfaction of $80\%$ of the maximum $K$
value and captures more than $90\%$ of the positive interactions inside
groups. Except for one category (``Repair of boots, shoes''), our
groups are communities in the strong sense of Ref.~\cite{radicchi}:
the grouping achieves a positive satisfaction for every element of the
group. This is remarkable since hundreds of ``frustrated'' triplets
exist~\cite{note_frust}. Taking into account only the positive links
and using the modularity algorithm~\cite{modularity} leads to two large
communities, whose commercial interpretation is less clear.
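
Frustrated triplets, as defined in Ref.~\cite{note_frust}, can be
counted with a simple loop over all triples of activities. The Python
sketch below is illustrative and assumes symmetric weights
($a_{ij}=a_{ji}$), so that each link has a single sign; with asymmetric
weights one would first need a rule combining $a_{ij}$ and $a_{ji}$.
The toy matrix is hypothetical.
\begin{verbatim}
from itertools import combinations


def frustrated_triplets(a):
    """Count unordered triplets with two attractive links and one
    repulsive link (A attracts B, B attracts C, A repels C)."""
    count = 0
    for i, j, k in combinations(range(len(a)), 3):
        links = (a[i][j], a[j][k], a[i][k])
        n_pos = sum(1 for w in links if w > 0)
        n_neg = sum(1 for w in links if w < 0)
        if n_pos == 2 and n_neg == 1:
            count += 1
    return count


a = [[0.0, 1.0, -1.0, 0.0],
     [1.0, 0.0, 0.5, 0.2],
     [-1.0, 0.5, 0.0, -0.3],
     [0.0, 0.2, -0.3, 0.0]]
print(frustrated_triplets(a))  # prints 2
\end{verbatim}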

Two arguments support the commercial relevance of this
classification. First, the grouping closely follows the usual
categories defined in commercial classifications, such as the U.S.
Department of Labor Standard Industrial Classification
System~\cite{sic} (see Table~I). It is remarkable that, starting
exclusively from location data, one can recover most of such a
meaningful commercial structure. A similarly meaningful classification
has also been found for Brussels and Marseilles stores (to be presented
elsewhere), suggesting that the classification may be universal for
European towns. There are only a few exceptions, mostly non-food
proximity stores that belong to the ``Food store'' group, or vice
versa. Second, the different groups are homogeneous with respect to
correlation with population density. Most activities of groups 1 and 2
(18 out of 26) locate according to population density, while most of
the remaining activities (22 out of 29) ignore this
characteristic~\cite{note_corr}. The exceptions can be explained by the
small number of stores or by the strong
heterogeneity~\cite{note_hetero} of those activities.
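
The test of Ref.~\cite{note_corr} amounts to checking whether the
least-squares slope of store density against population density
differs from zero. A minimal Python sketch follows; it uses the usual
OLS standard error of the slope and, as a simplifying assumption, the
normal critical value 1.2816 for a two-sided test at the $80\%$ level
rather than a Student $t$ quantile. The function names are
hypothetical; the inputs would be the two densities in each of the 50
sectors.
\begin{verbatim}
import math


def ols_slope_t(x, y):
    """Least-squares slope of y on x and its t statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2
              for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of slope
    return slope, slope / se


def population_correlated(pop_density, store_density, z_crit=1.2816):
    """'P' if the zero-slope hypothesis is rejected, else 'U'."""
    _, t = ols_slope_t(pop_density, store_density)
    return "P" if abs(t) > z_crit else "U"
\end{verbatim}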

\vspace{.5cm}
{\bf From interactions to location niches}
\vspace{.2cm}

Thanks to the quantification of retail store interactions, we can
construct a mathematical index to detect promising locations for
retail stores automatically. The basic idea is that a location that
resembles the average location of the existing bakeries might well be a
good location for a new bakery. To characterize the average
environment of activity $i$, we use the average number of neighbor
stores (inside a circle of radius 100~m) of each activity $j$ around
the stores of activity $i$, thus obtaining the list of {\it average}
values $\overline{nei_{ij}}$. We then use the network matrix $a_{ij}$
to quantify deviations from this average. For example, if an
environment lacks a bakery (or other shops that are usually repelled by
bakeries), this should increase the suitability of that location for a
new bakery. We then calculate the quality $Q_i(x,y)$ of an environment
around $(x,y)$ for an activity $i$ as
\begin{equation}
Q_i(x,y) \equiv \sum_{j = 1}^{55} a_{ij} \left(nei_{ij}(x,y)-\overline{nei_{ij}}\right),
\label{quality}
\end{equation}
where $nei_{ij}(x,y)$ represents the number of stores of activity $j$
within 100~m of $(x,y)$. To calculate the location quality for an
existing store, one removes it from the town and calculates $Q$ at its
location.
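
Equation~(\ref{quality}) translates directly into code. In the
hypothetical Python sketch below, stores are planar coordinates in
meters with activity labels, \texttt{a\_i} maps each activity $j$ to
$a_{ij}$, neighbors are counted within 100~m, and an existing store is
removed from the count through its index, as described above. The
helper names, the toy town and its link weights are illustrative only.
\begin{verbatim}
import math


def neighbor_counts(sites, labels, x, y, r=100.0, exclude=None):
    """Number of stores of each activity within r of (x, y)."""
    counts = {}
    for k, ((xs, ys), lab) in enumerate(zip(sites, labels)):
        if k != exclude and math.hypot(xs - x, ys - y) <= r:
            counts[lab] = counts.get(lab, 0) + 1
    return counts


def mean_neighbors(sites, labels, act, r=100.0):
    """Average neighbor counts around the stores of activity act."""
    own = [k for k, lab in enumerate(labels) if lab == act]
    total = {}
    for k in own:
        x, y = sites[k]
        for lab, c in neighbor_counts(sites, labels, x, y, r, k).items():
            total[lab] = total.get(lab, 0) + c
    return {lab: c / len(own) for lab, c in total.items()}


def quality(a_i, mean_i, sites, labels, x, y, r=100.0, exclude=None):
    """Q_i(x, y) = sum_j a_ij * (nei_ij(x, y) - mean nei_ij)."""
    nei = neighbor_counts(sites, labels, x, y, r, exclude)
    return sum(w * (nei.get(j, 0) - mean_i.get(j, 0.0))
               for j, w in a_i.items())


# Hypothetical toy town: two bakeries, each next to a cafe,
# plus a cluster of two cafes far away.
sites = [(0.0, 0.0), (50.0, 0.0), (1000.0, 0.0), (1050.0, 0.0),
         (3000.0, 0.0), (3030.0, 0.0)]
labels = ["bakery", "cafe", "bakery", "cafe", "cafe", "cafe"]
a_bakery = {"bakery": -1.0, "cafe": 0.5}   # assumed link weights
mean_bakery = mean_neighbors(sites, labels, "bakery")
q_far = quality(a_bakery, mean_bakery, sites, labels, 2000.0, 0.0)
q_cafes = quality(a_bakery, mean_bakery, sites, labels, 3010.0, 0.0)
\end{verbatim}
In this toy setting the empty site at $x=2000$~m gets $Q=-0.5$, because
it lacks the cafe that bakeries usually have nearby, while the site
between the two isolated cafes gets $Q=0.5$.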

As often in social contexts, it is difficult to test the relevance of
our quality index empirically. In principle, one should open several
bakeries at different locations and test whether those located at the
``best'' places (as defined by $Q$) are on average more successful.
Since it may be difficult to fund this kind of experiment, I use
location data from two years, 2003 and 2005. It turns out
(Fig.~\ref{0305}) that bakeries that closed between these two years
were located on significantly lower-quality sites. Conversely, new
bakeries (not present in the 2003 database) do locate preferentially at
better places than a random choice would dictate. This stresses the
importance of location for bakeries, and the relevance of the quality
defined here to quantify the interest of each possible site. The
correlation may be weaker for retail activities whose locations are
less critical for commercial success. Practical applications of $Q$
are under development together with Lyon's Chamber of Commerce and
Industry: advice to newcomers on good locations, and advice to city
mayors on improving commercial opportunities in specific town sectors.
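
The significance levels quoted in Fig.~\ref{0305} can be estimated in
several ways. One simple possibility, shown here as a Python sketch and
not necessarily the procedure used in this work, is a resampling test:
draw many random subsets of bakeries of the same size as the set of
closed ones, and count how often their mean quality is as low as the
observed one. The function name and its defaults are hypothetical.
\begin{verbatim}
import random


def resampling_p_value(subset_q, all_q, runs=10000, seed=0):
    """Fraction of random subsets of all_q (same size as subset_q,
    drawn without replacement) whose mean quality is at most the
    mean quality of subset_q."""
    rng = random.Random(seed)
    k = len(subset_q)
    observed = sum(subset_q) / k
    hits = 0
    for _ in range(runs):
        sample = rng.sample(all_q, k)
        if sum(sample) / k <= observed:
            hits += 1
    return hits / runs
\end{verbatim}
For the closed bakeries, \texttt{subset\_q} would hold their $Q$ values
and \texttt{all\_q} those of all bakeries; a small returned fraction
indicates that the closed bakeries sit on unusually poor sites.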

This study shows that, through locations, the retail world is now
accessible to physicists. This opens many research directions, such
as: are there optimal store distributions whose overall quality is
higher than the actual one? Can one define store-store interaction
``potentials'' by analogy with those used for atomic species? Moreover,
new tools are needed to describe networks containing anti-links,
starting with a basic one: ``how to define a node degree?''.

\vspace{1cm}

\begin{center} {\bf Table I} Retail store groups obtained from the
  Potts algorithm. Our groups closely match the categories of the U.S.
  Department of Labor Standard Industrial Classification (SIC) System:
  group 1 corresponds to Personal Services, 2 to Food Stores, 3 to
  Home Furniture, 4 to Apparel and Accessory Stores and 5 to Used
  Merchandise Stores. The columns correspond to: group number,
  activity name, satisfaction $s$, activity concentration $c_{same}$
  (see below), median distance travelled by customers, correlation with
  population density (U stands for uncorrelated, P for
  population-correlated) and finally the number of stores of that
  activity in Lyon, $N_{stores}$. The activity concentration
  $c_{same}$ represents the number of stores located less than 100~m
  from another store of the same activity, normalized to the number
  expected from a random distribution. For space reasons, only
  activities with more than 50 stores are shown.
\end{center}

{\it
\begin{tabbing}
  group\=activity  \hspace{6cm} \= $s$ \hspace{.95cm} \= $c_{same}$
  \hspace{.5cm} \=distance \hspace{.5cm}
  \= pop.\ corr. \hspace{.5cm} \= $N_{stores}$\\ \\
  1\>Bookstores and newspapers\>1.00\>1.00\> \>U\>250\\
  1\>Repair of electronic household goods\>0.71\>1.00\>1.16\>P\>54\\
  1\>Make up, beauty treatment\>0.68\>1.00\>1.20\>P\>255\\
  1\>Hairdressers\>0.67\>0.67\>0.99\>P\>844\\
  1\>Power laundries\>0.66\>1.00\>1.48\>P\>210\\
  1\>Drug stores\>0.55\>0.21\>1.09\>P\>235\\
  1\>Bakery (from frozen bread)\>0.54\>0.29\>0.00\>P\>93\\ \\

  2\>Other repair of personal goods\>1.00\>1.00\> \>U\>111\\
  2\>Photographic studios\>1.00\>1.00\> \>P\>94\\
  2\>Delicatessen\>0.91\>1.00\>0.77\>U\>246\\
  2\>Grocery (surface $< 120$~m$^2$)\>0.77\>0.61\>0.00\>P\>294\\
  2\>Cakes\>0.77\>1.00\>0.35\>P\>99\\
  2\>Miscellaneous food stores\>0.75\>2.22\>0.00\>P\>80\\
  2\>Bread, cakes\>0.70\>1.00\> \>U\>56\\
  2\>Tobacco products\>0.70\>0.38\> \>P\>162\\
  2\>Hardware, paints (surface $< 400$~m$^2$)\>0.69\>1.00\> \>U\>63\\
  2\>Meat\>0.64\>1.41\>0.86\>P\>244\\
  2\>Flowers\>0.58\>0.65\>1.52\>P\>200\\
  2\>Retail bakeries (home made)\>0.47\>0.36\>0.00\>P\>248\\
  2\>Alcoholic and other beverages\>0.17\>1.00\>0.77\>U\>67\\ \\

  3\>Computer\>1.00\>1.00\>3.07\>P\>251\\
  3\>Medical and orthopaedic goods\>1.00\>1.00\> \>U\>63\\
  3\>Sale and repair of motor vehicles\>1.00\>1.00\>1.68\>P\>285\\
  3\>Sport, fishing, camping goods\>1.00\>1.00\>2.73\>U\>119\\
  3\>Sale of motor vehicle accessories\>0.67\>0.00\>0.00\>U\>54\\
  3\>Furniture, household articles\>0.62\>3.15\>2.57\>U\>172\\
  3\>Household appliances\>0.48\>1.00\>3.08\>U\>171\\ \\

  4\>Cosmetic and toilet articles\>1.00\>2.09\>2.57\>U\>98\\
  4\>Jewellery\>1.00\>5.85\>2.77\>U\>230\\
  4\>Shoes\>1.00\>5.76\>2.43\>U\>178\\
  4\>Textiles\>1.00\>2.39\>3.87\>U\>103\\
  4\>Watches, clocks and jewellery\>1.00\>5.02\>2.77\>U\>92\\
  4\>Clothing\>0.91\>5.10\>3.16\>U\>914\\
  4\>Tableware\>0.83\>1.96\>2.43\>U\>183\\
  4\>Opticians\>0.78\>1.98\>1.55\>U\>137\\
  4\>Other retail sale in specialized stores\>0.77\>1.51\>2.32\>U\>367\\
  4\>Other personal services\>0.41\>1.00\> \>U\>92\\
  4\>Repair of boots, shoes\>$-0.18$\>1.00\> \>U\>77\\ \\

  5\>Second-hand goods\>0.97\>16.13\>3.52\>U\>410\\
  5\>Framing, upholstery\>0.81\>1.67\> \>U\>135\\

\end{tabbing}
}
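
The concentration $c_{same}$ defined in the caption of Table~I can be
estimated as sketched below. This hypothetical Python illustration
assumes that the random reference is obtained by placing the stores of
the activity on randomly chosen existing sites (in the spirit of the
reference distribution of Ref.~\cite{duranton}) and averaging over a
modest number of draws; these choices and the function names are
illustrative.
\begin{verbatim}
import math
import random


def n_with_close_twin(points, r):
    """Number of points lying within r of at least one other point."""
    return sum(
        1 for i, p in enumerate(points)
        if any(j != i and math.dist(p, q) <= r
               for j, q in enumerate(points)))


def c_same(sites, labels, act, r=100.0, runs=200, seed=0):
    """Observed number of act stores with a same-activity store
    within r, divided by its mean over random placements of the
    same number of stores on existing sites (assumed reference)."""
    rng = random.Random(seed)
    own = [sites[k] for k, lab in enumerate(labels) if lab == act]
    observed = n_with_close_twin(own, r)
    expected = sum(
        n_with_close_twin(rng.sample(sites, len(own)), r)
        for _ in range(runs)) / runs
    return observed / expected if expected > 0 else float("nan")
\end{verbatim}
Values well above 1 indicate that stores of the activity cluster with
each other, as for clothing or jewellery in Table~I.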

\vspace{.4cm}

\begin{thebibliography}{100}

\bibitem{data} Christophe Baume and Frederic Miribel (Chamber of
  Commerce, Lyon) have kindly provided extensive location data for 8500
  stores of the city of Lyon.

\bibitem{pair} See, for example, T. Egami and S. Billinge, {\it
    Underneath the Bragg Peaks: Structural Analysis of Complex
    Materials}, Pergamon Materials Series (2003).

\bibitem{duranton} G. Duranton and H. G. Overman, Review of Economic
  Studies (to be published, 2006), available at \begin{verbatim}
  http://158.143.49.27/~overman/research/nonrandom_final.pdf
\end{verbatim} (accessed Sept. 7, 2005).

\bibitem{puech} E. Marcon and F. Puech, to be published (2006),
  available at \begin{verbatim}
  http://team.univ-paris1.fr/teamperso/puech/textes/Marcon-Puech_ImprovingDistance-BasedMethods.pdf \end{verbatim}
  (accessed Sept. 7, 2005).

336:

337: \bibitem{note_ave}

338: One could argue that the average is dominated by the denser regions,

339: thus eliminating the influence of peripheral areas. This effect

340: exists, even if it is partially corrected through the ponderation by

341: the total number of stores. I have tried several other statistical

342: representation of the relative concentration, such as the mode or the

343: median, but none performed as well as the average. The median, for

344: example, fails because most $A$ stores have no $B$ stores around them,

345: leading to mostly null interaction coefficients.

346:

347: \bibitem{note_dist}

348: Alternatively, one can fully count stores closer than 50 m and linearly

349: decrease the counting coefficient until 150 m. This leads to similar

350: results.

\bibitem{weight}
Important differences introduced by including weighted links are
stressed, for example, in M. Barth\'elemy, A. Barrat, R. Pastor-Satorras
and A. Vespignani, Physica A {\bf 346}, 34 (2005).

\bibitem{note_stat} For a pair interaction to be significant, I demand
  that both $a_{AB}$ and $a_{BA}$ be different from zero, to avoid
  artificial correlations~\cite{puech}. For Lyon, I end up
  with 300 significant interactions (roughly $10\%$ of all possible
  interactions), half of which are repulsive.

\bibitem{repulsion} While store-store attraction is easy to justify
  (the ``market share'' strategy, where stores gather in commercial
  poles to attract customers), direct repulsion is generally limited
  to stores of the same trade, which locate far from each other to
  capture nearby customers (the ``market power'' strategy). The
  repulsion quantified here is induced (indirectly) by the price of
  space (the square meter is too expensive downtown for car dealers) or
  by different location strategies. For introductory texts on retail
  organization and its spatial analysis, see: B.~J.~L. Berry {\it et
  al.}, {\it Market Centers and Retail Location: Theory and
  Application} (Prentice Hall, Englewood Cliffs, NJ, 1988), and the
  Web book on regional science by E.~M. Hoover and F. Giarratani,
  available at
  http://www.rri.wvu.edu/WebBook/Giarratani/contents.htm.

\bibitem{potts} J. Reichardt and S. Bornholdt, Phys. Rev. Lett. {\bf
    93}, 218701 (2004). Note that the presence of anti-links
  automatically ensures that the ground state is not the homogeneous
  one, in which all spins point in the same direction (i.e., all nodes
  belong to the same cluster). Hence there is no need for a
  $\gamma$ coefficient here.

\bibitem{radicchi} F. Radicchi, C. Castellano, F. Cecconi, V. Loreto,
  and D. Parisi, Proc. Natl. Acad. Sci. USA {\bf 101}, 2658 (2004).

\bibitem{sa} S. Kirkpatrick, C.~D. Gelatt Jr. and M.~P. Vecchi, Science
  {\bf 220}, 671 (1983).

\bibitem{note_frust} A frustrated (A,B,C) triplet is one for which A
  attracts B, B attracts C, but A repels C, which is the case for the
  triplet shown in Fig.~\ref{carte}.

\bibitem{modularity} M. E. J. Newman and M. Girvan, Phys. Rev. E {\bf
    69}, 026113 (2004).

\bibitem{sic} See, for example, the U.S. Department of Labor Internet
  page: \begin{verbatim} http://www.osha.gov/pls/imis/sic_manual.html
\end{verbatim} (accessed Sept. 28, 2005).

\bibitem{note_corr} To calculate the correlation between store and
  population densities for a given activity, I count both densities in
  each of the 50 commercially homogeneous sectors of Lyon. I then test
  with standard econometric tools (see J. H. Stock and M. W. Watson,
  {\it Introduction to Econometrics}, Addison-Wesley, 2003) the
  hypothesis that store and population densities are uncorrelated
  (zero slope of the least-squares fit), at a confidence level of
  $80\%$.

\bibitem{note_hetero} Several retail categories defined by the
  Chamber of Commerce are unfortunately heterogeneous: for example,
  ``Meat'' refers to neighborhood butcher stores, but also to a big
  commercial pole of kosher butchers who attract customers from
  faraway towns. ``Bookstores and newspapers'' refers to big stores
  selling books and CDs as well as to the neighborhood newspaper
  stand. In contrast, bakeries are precisely classified into four
  different categories: a truly French commercial structure!

\end{thebibliography}

\begin{figure}
\centerline{
\epsfxsize=6cm
\epsfbox{carte-lyon.eps}}
\caption{\vspace{.3cm}
(Color online) Map of Lyon showing the locations of all the retail
stores (black dots), shoe stores, furniture dealers and drugstores.}
\label{carte}
\end{figure}

\vspace{2cm}

\begin{figure}
\centerline{(a) \hspace{.1cm}
\epsfxsize=4cm
\epsfbox{0305a.eps}
(b) \hspace{.1cm}
\epsfxsize=4cm
\epsfbox{0305b.eps}}
\caption{(Color online) The landscape defined by the quality index is
  closely correlated with the location decisions of bakeries. (a) The
  19 bakeries that closed between 2003 and 2005 had an average quality
  of $-2.2\times 10^{-3}$, compared to the average of all bakeries
  ($4.6\times 10^{-3}$), the difference being significant with
  probability 0.997. Taking into account the small number of closed
  bakeries and the importance of many other factors in the closing
  decision (family problems, bad management, etc.), the sensitivity of
  the quality index is remarkable. (b) The 80 new bakeries in the 2005
  database (20 truly new, the others reflecting an improvement of the
  database) have an average quality of $-6.8\times 10^{-4}$, compared
  to the average quality of all possible sites in Lyon
  ($-1.6\times 10^{-2}$), a difference significant with probability
  higher than 0.9999.}
\label{0305}
\end{figure}

\end{document}