\documentclass[aps]{revtex4}
%\documentclass[prb]{revtex4}% Physical Review B

\usepackage{graphics,epsfig}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{rotate}

\begin{document}


\title{A network-based prediction of retail stores commercial 
categories and optimal locations}

\author{Pablo Jensen} 

\email{pablo.jensen@ens-lyon.fr}
\affiliation{Laboratoire de Physique, CNRS UMR 5672, Ecole Normale
  Sup\'erieure de Lyon, 46 All\'ee d'Italie, 69364 Lyon Cedex 07,
  France \\ Laboratoire d'Economie des Transports, CNRS UMR 5593,
  ISH-Universit\'e Lyon-2, 14, Av. Berthelot, 69007 Lyon, France}
\date{\today}

\begin{abstract}
  I study the spatial organization of retail commercial activities.
  These are organized in a network comprising ``anti-links'', i.e.,
  links of negative weight. From pure location data, network analysis
  leads to a community structure that closely follows the commercial
  classification of the U.S. Department of Labor. The interaction
  network makes it possible to build a ``quality'' index of optimal
  location niches for stores, which has been empirically tested.
\end{abstract}

\pacs{89.65.-s;89.75.-k;05.65.+b}


\maketitle

%\vspace{.5cm}

Walking in any big city reveals the extreme diversity of retail store
location patterns. Fig.~\ref{carte} shows a map of the city of Lyon
(France) including all the drugstores, shoe stores and furniture
stores. A qualitative commercial organization is visible in this map:
shoe stores aggregate at the town shopping center, while furniture
stores are partially dispersed on secondary poles and drugstores are
strongly dispersed across the whole town.  Understanding this kind of
feature and, more generally, the commercial logic of the spatial
distribution of retail stores, seems a complex task. Many factors
could play important roles, arising from the distinct characteristics
of the stores or of the location sites.  Stores differ by product sold,
surface, number of employees, total sales per month or opening
date.  Locations differ by price of space, local consumer
characteristics, visibility (corner locations for example) or
accessibility. Only by taking into account most of these complex
features of the retail world can we hope to understand the logic of
store commercial strategies, let alone find potentially interesting
locations for new businesses.

Here I show that location data alone suffice to reveal many important
facts about the commercial organization of retail trade \cite{data}.
First, I quantify the interactions among activities using network
analysis. I find a few homogeneous commercial categories for the 55
trades in Lyon. These groups closely match the usual commercial
categories: personal services, home furniture, food stores and
apparel stores.  Second, I introduce a quality indicator for the
location of a given activity and empirically test its relevance. I
stress that these results are obtained from a mathematical analysis of
{\it location} data alone.  This supports the importance of business
location for retailers, a point that is intuitively well known in the
field and summarized by the retailing ``mantra'': {\it the three
  points that matter most in a retailer's world are location,
  location and ...  location}.

\vspace{.5cm}
{\bf Finding meaningful commercial categories}
\vspace{.2cm}

To analyze in detail the interactions of stores of different trades, I
start from the spatial pair correlations. These functions are used to
reveal store-store interactions, as atom-atom interactions are deduced
from atomic distribution functions in materials science \cite{pair}.
Tools from that discipline cannot be used directly, though, because
there is no underlying crystalline substrate to define a reference
distribution.  Neither is a homogeneous space appropriate, since the
density of consumers is not uniform and some town areas cannot host
stores, as is clearly seen in the blank spaces of the map (due to the
presence of rivers, parks, or residential spaces defined by town
regulations).

A clever idea proposed by G. Duranton and H. G. Overman
\cite{duranton} is to take as reference a random distribution of
stores located on the array of {\it all existing} sites (black dots in
Fig.~\ref{carte}).  This automatically takes into account the
geographical peculiarities of each town. I then use
the ``M'' index \cite{puech} to quantify the spatial interactions
between categories of stores.  The definition of $M_{AB}$ at a given
distance $r$ is straightforward: draw a disk of radius $r$
around each store of category A, count the total number of stores
($n_{tot}$) and the number of B stores ($n_B$) inside it, and compare
the ratio $n_B / n_{tot}$ to the average ratio $N_B / N_{tot}$, where
capital $N$ refers to the number of stores in the whole town. If the
quotient of these two ratios, averaged over all A stores, is larger
than 1, A ``attracts'' B; if it is smaller than 1, the two activities
repel each other \cite{note_ave}. To ascertain the statistical
significance of the repulsion or attraction, I have simulated 800
random distributions of the $N_B$ stores on all possible sites,
calculating for each distribution the $n_B / n_{tot}$ ratio around the
same A locations.  This gives the statistical fluctuations and makes
it possible to count how many times the random (normalized) ratio
deviates from 1 as much as the real one.  I assume that
if fewer than $3\%$ of the random runs deviate more than the
real one, the result is significant ($97\%$ confidence level). I
have chosen $r=100$~m, as this represents a typical distance a customer
accepts to walk to visit different stores \cite{note_dist}.

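The verbal definition above can be transcribed compactly. The
following is only a sketch in notation introduced here for
illustration ($N_A$ for the number of A stores, $n_B(a,r)$ and
$n_{tot}(a,r)$ for the counts inside the disk centered on store $a$);
conventions such as whether the central store itself is counted are
those of Ref. \cite{puech} and are not fixed by this transcription:

\[
M_{AB}(r) = \frac{1}{N_A} \sum_{a \in A}
\frac{n_B(a,r) / n_{tot}(a,r)}{N_B / N_{tot}} .
\]

As a purely hypothetical numerical illustration, if an A store has
$n_{tot} = 10$ neighbors within 100 m, of which $n_B = 2$ are B stores,
while B stores represent $N_B / N_{tot} = 0.05$ of all stores in town,
this store contributes $0.2 / 0.05 = 4$ to the average, i.e. a local
attraction. Likewise, writing $M^{(k)}_{AB}$ for the value obtained in
the $k$-th of the 800 random runs (again a notation assumed here), the
significance test amounts to estimating

\[
p = \frac{1}{800} \, \# \left\{ k : |M^{(k)}_{AB} - 1| > |M_{AB} - 1| \right\}
\]

and calling the interaction significant when $p < 0.03$.
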
I can now define a network structure of retail stores.  Nodes are
defined as the 55 retail activities (Table I). The weighted
\cite{weight} links are given by $a_{AB} \equiv \log(M_{AB})$, which
reveal the spatial attraction or repulsion between activities A and B
\cite{note_stat}.  This retail network represents the first social
network with quantified ``anti-links'', i.e., repulsive links between
nodes \cite{repulsion}.  The anti-links add to the usual (positive)
links and to the absence of any significant link, forming an essential
part of the network. If only positive links are used, the analysis
leads to different results, which are less satisfactory (see below).

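The logarithm makes attraction and repulsion symmetric around zero. As
a worked illustration (the numerical values are hypothetical, and the
natural logarithm is assumed since the base is not specified above):

\[
M_{AB} = 2 \;\Rightarrow\; a_{AB} = \log 2 \simeq 0.69 , \qquad
M_{AB} = \tfrac{1}{2} \;\Rightarrow\; a_{AB} = -\log 2 \simeq -0.69 , \qquad
M_{AB} = 1 \;\Rightarrow\; a_{AB} = 0 .
\]

A pair that is twice as frequent as expected thus receives a link of
the same magnitude as, and opposite sign to, a pair that is half as
frequent as expected, while a pair distributed as in the random
reference receives no link.
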
To divide the store network into communities, I adapt the ``Potts''
algorithm \cite{potts}. This algorithm treats the store types as
magnetic spins and groups them in several homogeneous magnetic domains
to minimize the system energy.  Anti-links can then be interpreted as
anti-ferromagnetic interactions between the spins.  Therefore, this
algorithm naturally groups the activities that attract each other, and
places trades that repel each other into different groups. A natural
definition \cite{potts,radicchi} of the satisfaction
($-1 \leq s_i \leq 1$) of site $i$ to belong to group $\sigma_i$ is:

\begin{equation}
s_i \equiv \frac{\sum_{j \neq i} a_{ij} \pi_{\sigma_i \sigma_j}}{\sum_{j \neq i} |a_{ij}|}
\label{s}
\end{equation}

where $\pi_{\sigma_i \sigma_j} \equiv 1$ if $\sigma_i = \sigma_j$ and
$\pi_{\sigma_i \sigma_j} \equiv -1$ if $\sigma_i \neq \sigma_j$.

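As a worked illustration of Eq. (\ref{s}), with hypothetical link
values, consider a node $i$ with only three significant links:
$a_{i1} = 0.5$ towards a node of its own group, and $a_{i2} = 0.3$ and
$a_{i3} = -0.2$ towards nodes of other groups. Then

\[
s_i = \frac{(0.5)(+1) + (0.3)(-1) + (-0.2)(-1)}{0.5 + 0.3 + 0.2}
    = \frac{0.4}{1.0} = 0.4 .
\]

The attraction kept inside the group and the repulsion placed across
groups both raise $s_i$, whereas the attraction cut by the partition
lowers it; $s_i = 1$ is reached only when every attractive link is
internal and every anti-link is external.
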
To obtain the group structure, I run a standard simulated annealing
algorithm \cite{sa} to maximize the overall site satisfaction (without
the normalizing denominator):

\begin{equation}
K \equiv \sum_{\substack{i,j = 1 \\ i \neq j}}^{55} a_{ij} \pi_{\sigma_i \sigma_j} 
\label{K}
\end{equation}

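For concreteness, the elementary step of such an annealing can be
written down explicitly. The following is a generic sketch of the
standard Metropolis rule of Ref. \cite{sa} applied to $K$, not a
description of the specific implementation or cooling schedule used
here; the temperature $T$ and the group labels $\alpha$, $\beta$ are
notation introduced for this illustration. Moving a single node $k$
from group $\alpha$ to group $\beta \neq \alpha$ only changes the terms
of $K$ that involve $k$:

\[
\Delta K = 2 \sum_{j \,:\, \sigma_j = \beta} (a_{kj} + a_{jk})
         - 2 \sum_{j \neq k \,:\, \sigma_j = \alpha} (a_{kj} + a_{jk}) ,
\]

and the move is accepted with probability

\[
P_{\mathrm{acc}} = \min \left[ 1, \exp(\Delta K / T) \right] ,
\]

while $T$ is slowly decreased. One way to let the number of groups vary
during the maximization is to allow $\beta$ to be a currently empty
group, for which the first sum vanishes.
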
The Potts algorithm divides the retail store network into five
homogeneous groups (Table I; note that the number of groups is not
fixed in advance but is a variable of the maximization). This group
division reaches a global satisfaction of $80\%$ of the maximum $K$
value and captures more than $90\%$ of the positive interactions inside
groups.  Except for one category (``Repair of shoes''), the groups are
communities in the strong sense of Ref.~\cite{radicchi}.  This means
that the grouping achieves a positive satisfaction for every element
of the group. This is remarkable since hundreds of ``frustrated''
triplets exist \cite{note_frust}. Taking into account only the
positive links and using the modularity algorithm \cite{modularity}
leads to two large communities, whose commercial interpretation is
less clear.

Two arguments support the commercial relevance of this
classification.  First, the grouping closely follows the usual
categories defined in commercial classifications, such as the U.S.
Department of Labor Standard Industrial Classification System
\cite{sic} (see Table I). It is remarkable that, starting exclusively
from location data, one can recover most of such a significant
commercial structure. A similarly meaningful classification has also
been found for Brussels and Marseilles stores (to be presented
elsewhere), suggesting the universality of the classification for
European towns. There are only a few exceptions, mostly non-food
proximity stores which belong to the ``Food store'' group or vice
versa. Second, the different groups are homogeneous in relation to
correlation with population density. The majority of activities from
groups 1 and 2 (18 out of 26) locate according to population density,
while most of the remaining activities (22 out of 29) ignore this
characteristic \cite{note_corr}. Exceptions can be explained by the
small number of stores or the strong heterogeneities
\cite{note_hetero} of those activities.

\vspace{.5cm}
{\bf From interactions to location niches}
\vspace{.2cm}

Thanks to the quantification of retail store interactions, I can
construct a mathematical index to automatically detect promising
locations for retail stores. The basic idea is that a location that
resembles the average location of the existing bakeries might well be a
good location for a new bakery. To characterize the average
environment of activity $i$, I use the average number of neighbor
stores (inside a circle of radius 100 m) of each activity $j$,
thus obtaining the list of {\it average} $\overline{nei_{ij}}$.  I
then use the network matrix $a_{ij}$ to quantify deviations from this
average. For example, if an environment lacks a bakery (or other shops
that are usually repelled by bakeries), this should increase the
suitability of that location for a new bakery. I then calculate the
quality $Q_i(x,y)$ of an environment around $(x,y)$ for an activity
$i$ as:

\begin{equation}
Q_i(x,y) \equiv \sum_{j = 1}^{55} a_{ij} (nei_{ij}(x,y)-\overline{nei_{ij}})
\label{quality}
\end{equation}

where $nei_{ij}(x,y)$ represents the number of neighbor stores of
activity $j$ around $(x,y)$. To calculate the location quality for an
existing store, one removes it from town and calculates $Q$ at its
location.

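As a worked illustration of Eq. (\ref{quality}), with hypothetical
numbers and only two neighbor activities, suppose that activity $i$ is
attracted by activity 1 ($a_{i1} = 0.4$) and repelled by activity 2
($a_{i2} = -0.5$), and that stores of activity $i$ have on average
$\overline{nei_{i1}} = 1$ and $\overline{nei_{i2}} = 2$ such neighbors
within 100 m. A candidate site with $nei_{i1} = 3$ and $nei_{i2} = 0$
then has

\[
Q_i = 0.4 \, (3 - 1) + (-0.5) \, (0 - 2) = 0.8 + 1.0 = 1.8 .
\]

Both an excess of attractive neighbors and a deficit of repulsive ones
raise the quality, while a site that exactly reproduces the average
environment has $Q_i = 0$.
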
As often in social contexts, it is difficult to test empirically the
relevance of the quality index. In principle, one should open several
bakeries at different locations and test whether those located at the
``best'' places (as defined by $Q$) are on average more successful.
Since it may be difficult to fund this kind of experiment, I use
location data from two years, 2003 and 2005. It turns out
(Fig.~\ref{0305}) that bakeries closed between these two years were
located on significantly lower quality sites. Conversely, new bakeries
(not present in the 2003 database) do locate preferentially on better
places than a random choice would dictate. This stresses the importance
of location for bakeries, and the relevance of the quality defined here
to quantify the interest of each possible site. Possibly, the
correlation would be less satisfactory for retail activities whose
locations are not so critical for commercial success. Practical
applications of $Q$ are under development together with Lyon's Chamber
of Commerce and Industry: advice to newcomers on good locations, and
advice to city mayors on improving commercial opportunities in
specific town sectors.

This study shows that, through locations, the retail world is now
accessible to physicists. This opens many research directions, such
as: are there optimum store distributions, whose overall quality is
higher than the actual one? Can one define store-store interaction
``potentials'' by analogy with those used for atomic species? Moreover,
new tools are needed to describe networks containing anti-links,
starting with a basic question: how should a node degree be defined?

\vspace{1cm}

\begin{center} {\bf Table I} Retail store groups obtained from the
  Potts algorithm. The groups closely match the categories of the U.S.
  Department of Labor Standard Industrial Classification (SIC) System:
  group 1 corresponds to Personal Services, 2 to Food Stores, 3 to
  Home Furniture, 4 to Apparel and Accessory Stores and 5 to Used
  Merchandise Stores. The columns correspond to: group number,
  activity name, satisfaction $s$, activity concentration $c_{same}$
  (defined below), median distance travelled by customers, correlation
  with population density (U stands for uncorrelated, P for population
  correlated) and finally number of stores of that activity in Lyon.
  The activity concentration $c_{same}$ represents the number of stores
  located nearer than 100 m from another similar store, normalized to
  the number expected from a random distribution. For space reasons,
  only activities with more than 50 stores are shown.
\end{center}

{\it
\begin{tabbing} 
  group\=activity  \hspace{6cm} \= s \hspace{.95cm} \= $c_{same}$
  \hspace{.5cm} \=distance \hspace{.5cm} 
  \= pop corr \hspace{.5cm} \= $N_{stores}$\\ \\
  1\>bookstores and newspapers\>1.00\>1.00\> \>U\>250\\ 
  1\>Repair of electronic household goods\>0.71\>1.00\>1.16\>P\>54\\ 
  1\>make up, beauty treatment\>0.68\>1.00\>1.20\>P\>255\\ 
  1\>hairdressers\>0.67\>0.67\>0.99\>P\>844\\ 
  1\>Power Laundries\>0.66\>1.00\>1.48\>P\>210\\ 
  1\>Drug Stores\>0.55\>0.21\>1.09\>P\>235\\ 
  1\>Bakery (from frozen bread)\>0.54\>0.29\>0.00\>P\>93\\ \\
 
  2\>Other repair of personal goods\>1.00\>1.00\> \>U\>111\\ 
  2\>Photographic Studios\>1.00\>1.00\> \>P\>94\\ 
  2\>delicatessen\>0.91\>1.00\>0.77\>U\>246\\ 
  2\>grocery (surface $< 120\ \mathrm{m}^2$)\>0.77\>0.61\>0.00\>P\>294\\ 
  2\>cakes\>0.77\>1.00\>0.35\>P\>99\\ 
  2\>Miscellaneous food stores\>0.75\>2.22\>0.00\>P\>80\\ 
  2\>bread, cakes\>0.70\>1.00\> \>U\>56\\ 
  2\>tobacco products\>0.70\>0.38\> \>P\>162\\ 
  2\>hardware, paints (surface $< 400\ \mathrm{m}^2$)\>0.69\>1.00\> \>U\>63\\ 
  2\>meat \>0.64\>1.41\>0.86\>P\>244\\ 
  2\>flowers\>0.58\>0.65\>1.52\>P\>200\\ 
  2\>retail bakeries (home made)\>0.47\>0.36\>0.00\>P\>248\\ 
  2\>alcoholic and other beverages\>0.17\>1.00\>0.77\>U\>67\\ \\

  3\>Computer\>1.00\>1.00\>3.07\>P\>251\\ 
  3\>medical and orthopaedic goods\>1.00\>1.00\> \>U\>63\\ 
  3\>Sale and repair of motor vehicles\>1.00\>1.00\>1.68\>P\>285\\ 
  3\>sport, fishing, camping goods\>1.00\>1.00\>2.73\>U\>119\\ 
  3\>Sale of motor vehicle accessories\>0.67\>0.00\>0.00\>U\>54\\ 
  3\>furniture, household articles \>0.62\>3.15\>2.57\>U\>172\\ 
  3\>household appliances\>0.48\>1.00\>3.08\>U\>171\\ \\

  4\>cosmetic and toilet articles\>1.00\>2.09\>2.57\>U\>98\\ 
  4\>Jewellery\>1.00\>5.85\>2.77\>U\>230\\ 
  4\>shoes\>1.00\>5.76\>2.43\>U\>178\\ 
  4\>textiles\>1.00\>2.39\>3.87\>U\>103\\ 
  4\>watches, clocks and jewellery\>1.00\>5.02\>2.77\>U\>92\\ 
  4\>clothing\>0.91\>5.10\>3.16\>U\>914\\ 
  4\>tableware\>0.83\>1.96\>2.43\>U\>183\\ 
  4\>opticians\>0.78\>1.98\>1.55\>U\>137\\ 
  4\>Other retail sale in specialized stores\>0.77\>1.51\>2.32\>U\>367\\ 
  4\>Other personal services \>0.41\>1.00\> \>U\>92\\ 
  4\>Repair of boots, shoes \>-0.18\>1.00\> \>U\>77\\ \\
 
  5\>second-hand goods \>0.97\>16.13\>3.52\>U\>410\\ 
  5\>framing, upholstery\>0.81\>1.67\> \>U\>135\\ 

\end{tabbing}
}

\vspace{.4cm}

\begin{thebibliography}{100}

\bibitem{data} Christophe Baume and Frederic Miribel (Chamber of
  Commerce, Lyon) have kindly provided extensive location data for 8500
  stores of the city of Lyon.

\bibitem{pair} See for example, T. Egami and S. Billinge, {\it
    Underneath the Bragg Peaks: Structural Analysis of Complex
    Materials}, Pergamon Materials Series (2003).

\bibitem{duranton} G. Duranton and H. G. Overman, Review of Economic
  Studies (to be published, 2006), available at \begin{verbatim}
  http://158.143.49.27/~overman/research/nonrandom_final.pdf
\end{verbatim} (accessed
  Sept. 7th 2005).

\bibitem{puech} E. Marcon and F. Puech, to be published (2006),
  available at \begin{verbatim}
  http://team.univ-paris1.fr/teamperso/puech/textes/Marcon-Puech_ImprovingDistance-BasedMethods.pdf \end{verbatim}
  (accessed Sept. 7th 2005).

\bibitem{note_ave} 
One could argue that the average is dominated by the denser regions,
thus eliminating the influence of peripheral areas. This effect
exists, even if it is partially corrected through the weighting by
the total number of stores. I have tried several other statistical
representations of the relative concentration, such as the mode or the
median, but none performed as well as the average. The median, for
example, fails because most $A$ stores have no $B$ stores around them,
leading to mostly null interaction coefficients.

\bibitem{note_dist} 
Alternatively, one can fully count stores closer than 50 m and linearly
decrease the counting coefficient until 150 m. This leads to similar
results.

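Written out, such a counting coefficient would read as follows (a
sketch: the symbol $w(d)$ is introduced here for illustration, and the
coefficient is assumed to reach zero at 150 m):

\[
w(d) =
\begin{cases}
1 & d \leq 50 \ \mathrm{m} , \\
(150 - d)/100 & 50 \ \mathrm{m} < d < 150 \ \mathrm{m} , \\
0 & d \geq 150 \ \mathrm{m} ,
\end{cases}
\]

with $d$ in meters, each store at distance $d$ contributing $w(d)$
instead of 1 to $n_B$ and $n_{tot}$.
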
\bibitem{weight} 
Important differences introduced by including weighted links are
stressed, for example, in M. Barthelemy, A. Barrat, R. Pastor-Satorras
and A. Vespignani, Physica A {\bf 346}, 34 (2005).

\bibitem{note_stat} For a pair interaction to be significant, I demand
  that both $a_{AB}$ and $a_{BA}$ be different from zero, to avoid
  artificial correlations \cite{puech}.  For the city of Lyon, I end up
  with 300 significant interactions (roughly $10\%$ of all possible
  interactions), of which half are repulsive.

\bibitem{repulsion} While store-store attraction is easy to justify
  (the ``market share'' strategy, where stores gather in commercial
  poles to attract customers), direct repulsion is generally limited
  to stores of the same trade, which locate far from each other to
  capture neighboring customers (the ``market power'' strategy).  The
  repulsion quantified here is induced (indirectly) by the price of
  space (the square meter is too expensive downtown for car stores) or
  by different location strategies. For introductory texts on retail
  organization and its spatial analysis, see B. J. L. Berry et al.,
  {\it Market Centers and Retail Location: Theory and Application},
  Englewood Cliffs, N.J.: Prentice Hall (1988), and the Web book on
  regional science by E. M. Hoover and F. Giarratani, available at
  http://www.rri.wvu.edu/WebBook/Giarratani/contents.htm.

\bibitem{potts} J. Reichardt and S. Bornholdt, Phys. Rev. Lett. {\bf
    93}, 218701 (2004).  Note that the presence of anti-links
  automatically ensures that the ground state is not the homogeneous
  one, in which all spins point in the same direction (i.e. all nodes
  belong to the same cluster). Hence, there is no need for a
  $\gamma$ coefficient here.

\bibitem{radicchi} F. Radicchi, C. Castellano, F. Cecconi, V. Loreto,
  and D. Parisi, Proc. Natl. Acad. Sci. USA {\bf 101}, 2658 (2004).

\bibitem{sa} S. Kirkpatrick, C. D. Gelatt Jr. and M. P. Vecchi, Science
  {\bf 220}, 671 (1983).

\bibitem{note_frust} A frustrated (A,B,C) triplet is one for which A
  attracts B, B attracts C, but A repels C, which is the case for the
  triplet shown in Fig.~\ref{carte}.

\bibitem{modularity} M. E. J. Newman and M. Girvan, Phys. Rev. E {\bf
    69}, 026113 (2004).

\bibitem{sic} See for example the U.S. Department of Labor Internet
  page: \begin{verbatim} http://www.osha.gov/pls/imis/sic_manual.html
\end{verbatim} (accessed Sept. 28th 2005).

\bibitem{note_corr} To calculate the correlation of store and
  population density for a given activity, I count both densities for
  each of the 50 commercially homogeneous sectors of Lyon. I then test
  with standard econometric tools (see J. H. Stock and M. W. Watson,
  {\it Introduction to Econometrics}, Addison-Wesley, 2003) the
  hypothesis that store and population densities are uncorrelated
  (zero slope of the least squares fit), with a confidence level of
  $80\%$.

\bibitem{note_hetero} Several retail categories defined by the
  Chamber of Commerce are unfortunately heterogeneous: for example,
  ``Meat'' refers to the proximity butcher stores, but also to a big
  commercial pole of kosher butchers who attract customers from
  faraway towns. ``Bookstores and newspapers'' refers to big stores
  selling books and CDs as well as to the proximity newspaper
  stand. In contrast, bakeries are precisely classified in 4 different
  categories: it is a French commercial structure!

\end{thebibliography}

\begin{figure}
\centerline{
\epsfxsize=6cm
\epsfbox{carte-lyon.eps}}
\caption{\vspace{.3cm} 
(Color online) Map of Lyon showing the location of all the retail
stores, shoe stores, furniture dealers and drugstores.}
\label{carte}
\end{figure}

\vspace{2cm}

\begin{figure}
\centerline{(a) \hspace{.1cm}
\epsfxsize=4cm
\epsfbox{0305a.eps}
(b) \hspace{.1cm}
\epsfxsize=4cm
\epsfbox{0305b.eps}}
\caption{(Color online) The landscape defined by the quality index is closely
  correlated to the location decisions of bakeries. (a) The 19
  bakeries that closed between 2003 and 2005 had an average quality of
  $-2.2 \times 10^{-3}$, to be compared to the average of all bakeries
  ($4.6 \times 10^{-3}$), the difference being significant with
  probability 0.997. Taking into account the small number of closed
  bakeries and the importance of many other factors in the closing
  decision (family problems, bad management...), the sensitivity of
  the quality index is remarkable. (b) The 80 new bakeries in the 2005
  database (20 truly new, the rest being an improvement of the
  database) have an average quality of $-6.8 \times 10^{-4}$, to be
  compared to the average quality of all possible sites in Lyon
  ($-1.6 \times 10^{-2}$), a difference significant with probability
  higher than 0.9999.}
\label{0305}
\end{figure}
\end{document}