1: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2: % This is the fourth paper on word length by J.C. and H.A. 	%
3: %								%
4: %   Compile with LaTeX2e (TeX -> LaTeX)				%
5: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
6: \documentclass[a4paper,12pt,twoside]{article}
7: \usepackage{graphicx,wl4-mydefs}
8: \begin{document}
9: \title{\bf\Large Testing for Mathematical Lineation in\\
10: Jim Crace's {\it Quarantine}\\
11: and T.~S.~Eliot's {\it Four Quartets}}
12: \author{{\large John Constable}\\
13: {\tt jbc12@cam.ac.uk}\\
14: {\it Magdalene College, Cambridge CB3 0AG, United Kingdom}\\
15: and\\
16: Hideaki Aoyama\\
17: {\tt aoyama@phys.h.kyoto-u.ac.jp}\\
18: \it Faculty of Integrated Human Studies\\
19: \it Kyoto University, Kyoto 606-8501, Japan}
20: \date{}
21: %Submitted to {\it the Journal of the Belgian Linguistics Society}%}
22: \maketitle
23: \begin{abstract}
24: The mathematical distinction between prose and verse may be detected in writings that are not apparently lineated, for example in T. S. Eliot's {\it Burnt Norton}, and Jim Crace's {\it Quarantine}. In this paper we offer comments on appropriate statistical methods for such work, and also on the nature of formal innovation in these two texts. Additional remarks are made on the roots of lineation as a metrical form, and on the prose-verse continuum. 
25: \end{abstract}
26: 
27: 
28: \section{Introduction}
29: In Aoyama and Constable (1999) and Constable and Aoyama (1999)
30: we have developed a technique, the $Q_n$  distribution, for detecting 
31: a mathematical distinction between prose and 
32: isometrically lineated text (verse) in English. For prose the $Q_n$  
33: distribution is flat, for isometrically lineated verse it is peaked
34: at the value of $n$ which equals the number of syllables in the normative 
35: line. Our major focus in that work was on
36: the theoretical value of such a distinction, while in this paper we 
37: will turn to look more closely at a point raised before only in passing
38: (see Constable and Aoyama (1999:515)), namely the possibility 
39: that the features we
40: have identified might be employed as a `test' or diagnostic for 
41: lineation, enabling an analyst to detect the presence of mathematically
42: lineated text where it is not visually represented on the page (syllabic verse
43: printed as prose for example)  or is concealed by the layout on the page, 
44: as for example when a sonnet 
45: is embedded in prose. We cautioned against the belief that our 
46: procedure would enable the detection of small passages of verse embedded in prose on the grounds that the effects of such lineation on the $Q_n$ 
47: distribution would be so small
48: that any peaks resulting would be swamped by random fluctuations in 
49: the rest of the distribution.
50: However, we suggested that for longer texts the case might be different, 
51: and our aim here is to examine
52: two works, T. S. Eliot's poem {\it Four Quartets} (1936--1943) and Jim Crace's novel {\it Quarantine} (1997),
53: with a view to assessing the
54: reliability of the procedure, and developing statistical techniques for 
55: determining the strength of inferences
56: based on the computation of $Q_n$ distributions.
57: 
58: In concluding we offer further comments on the experimental goals of these two texts, the roots of lineation as a metrical form, and the prose-verse continuum.
59:  
60: 
61: \section{Mathematical Preliminaries}
62: The essential feature of the syllabic structure of 
63: English prose found by Aoyama and Constable (1999) is  
64: {\it random segmentation}, by which we mean that the
65: probability of having a word boundary after any syllable 
66: is constant.  
67: Mathematically, this implies that the probability of
68: the syllable length of the $k$-th word, $S_k$, being equal
69: to $S$ is given by a geometric distribution:
70: \begin{equation}
71: P\{S_k=S\}=q(1-q)^{S-1},
72: \label{eqn:geom}
73: \end{equation}
74: where $q$ is the probability of the occurrence of a
75: word boundary after any syllable, this latter probability being a parameter
76: which varies between data sets.
77: The reader might note that the right-hand side of
78: Eq.(\ref{eqn:geom}) is independent of $k$, that is, 
79: the probability distribution does not depend on where the
80: word is located in the text.
81: 
82: Random segmentation implies that
83: the expectation value of the number of syllables
84: of the $k$-th word is independent of $k$:
85: \begin{equation}
86: E\{S_k\}=s, 
87: \label{eqn:whitenoise1}
88: \end{equation}
89: where the mean syllable length $s$ is given by $s=1/q$. 
90: It also implies the following result for the covariance of 
91: the syllable lengths of the $k$-th word and the $k'$-th word:
92: \begin{equation}
93: E\{(S_k-E\{S_k\})(S_{k'}-E\{S_{k'}\})\}=\delta_{k,k'}\Delta,
94: \label{eqn:whitenoise2}
95: \end{equation}
96: where the variance $\Delta$ is given by $\Delta=(1-q)/q^2$.
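
For completeness, the values of $s$ and $\Delta$ quoted above can be checked directly from the geometric distribution (\ref{eqn:geom}). The following is a brief sketch that uses only the standard power series $\sum_{S\ge1}S\,x^{S-1}=(1-x)^{-2}$ and $\sum_{S\ge1}S^2x^{S-1}=(1+x)(1-x)^{-3}$, evaluated at $x=1-q$:
\begin{eqnarray*}
E\{S_k\}&=&\sum_{S=1}^\infty S\,q(1-q)^{S-1}=\frac{q}{q^2}=\frac1q,\\
E\{S_k^2\}&=&\sum_{S=1}^\infty S^2\,q(1-q)^{S-1}=\frac{q(2-q)}{q^3}=\frac{2-q}{q^2},\\
\Delta&=&E\{S_k^2\}-E\{S_k\}^2=\frac{1-q}{q^2}=s^2-s.
\end{eqnarray*}
As an illustration, the mean word length $s=1.303$ reported for {\it Quarantine} below gives $\Delta=s^2-s\simeq0.395$, that is, $\sqrt{\Delta}\simeq0.628$.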
97: 
98: The finding noted above was made mainly through studies of
99: the $Q_n$ distribution, supplemented with studies of the 
100: correlations between word lengths.
101: In Constable and Aoyama (1999), we found by examining the $Q_n$ distributions that isometrically lineated verse shows systematic deviation from the above properties.
102: In investigating the properties of {\it Quarantine} and
103: {\it Four Quartets}, we find it useful to employ two other
104: standard statistical tools, Fourier analysis and 
105: correlation functions,
106: in addition to the $Q_n$ distribution.
107: In the following, we will define these three quantities 
108: to prepare for the analysis.
109: 
110: \subsection{Fourier analysis}
111: The Fourier component $\tilde{S}_m$ is defined by the following 
112: equation:
113: \begin{equation}
114: S_k=\frac1{\sqrt{K}}\sum_{m=0}^{K-1} \tilde{S}_m e^{2\pi i mk/K},
115: \label{fouriercom}
116: \end{equation}
117: where $K$ is the total number of words in the data set.
118: The Fourier component can be directly calculated by 
119: the following inversion formula:
120: \begin{equation}
121: \tilde{S}_m=\frac1{\sqrt{K}}\sum_{k=1}^K S_k e^{-2\pi i mk/K}.
122: \end{equation}
123: These Fourier components satisfy the following relations:
124: \begin{eqnarray}
125: \tilde{S}_{m+K}&=&\tilde{S}_m,\\
126: \tilde{S}_{K-m}&=&\tilde{S}^*_m.\label{eqn:symm}
127: \end{eqnarray}
128: 
129: If the data set is randomly segmented, 
130: the expectation values for the Fourier coefficients $\tilde{S}_m$
131: satisfy the following equation:
132: \begin{equation}
133: E\{\tilde{S}_m\}=\sqrt{K} s\, \delta_{m,0}, \quad
134: E\{|\tilde{S}_m|^2\}=\Delta+K s^2\,\delta_{m,0},
135: \end{equation}
136: which can be proved by using 
137: Eqs.(\ref{eqn:whitenoise1}) and (\ref{eqn:whitenoise2}).
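
The second relation can be verified for $m\ne0$ in a few lines. The sketch below assumes $0<m<K$ and uses only Eqs.(\ref{eqn:whitenoise1}) and (\ref{eqn:whitenoise2}) together with the identity $\sum_{k=1}^K e^{-2\pi i mk/K}=0$, which allows the constant $s$ to be subtracted from each $S_k$ without changing $\tilde{S}_m$:
\begin{eqnarray*}
\tilde{S}_m&=&\frac1{\sqrt{K}}\sum_{k=1}^K (S_k-s)\, e^{-2\pi i mk/K},\\
E\{|\tilde{S}_m|^2\}&=&\frac1K\sum_{k,k'=1}^K 
E\{(S_k-s)(S_{k'}-s)\}\,e^{-2\pi i m(k-k')/K}\\
&=&\frac1K\sum_{k=1}^K\Delta=\Delta.
\end{eqnarray*}
The first line also shows that $E\{\tilde{S}_m\}=0$ for $m\ne0$.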
138: 
139: Fourier analysis is sensitive to any
140: periodic structure in the data: if there is a periodicity with
141: a period of $\ell$ in the sequence $\{S_1, S_2, \cdots, S_K\}$,
142: the coefficient $\tilde{S}_{K/\ell}$ (or its absolute value) will 
143: be large compared to the other Fourier coefficients.
144: The degree of predominance depends on the strength of the
145: periodicity: if the periodicity is weak, the predominance of 
146: $\tilde{S}_{K/\ell}$ will be weak.
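
A simple idealised case may make this concrete. As an assumption for illustration only, suppose that the word lengths carry a purely sinusoidal modulation of hypothetical amplitude $a$, namely $S_k=s+a\cos(2\pi k/\ell)$, with $\ell>2$ and $K$ a multiple of $\ell$. The inversion formula then gives
\begin{eqnarray*}
\tilde{S}_{K/\ell}&=&\frac1{\sqrt{K}}\sum_{k=1}^K 
\left[s+a\cos\left(\frac{2\pi k}{\ell}\right)\right] e^{-2\pi i k/\ell}\\
&=&\frac{a}{2\sqrt{K}}\sum_{k=1}^K\left(1+e^{-4\pi i k/\ell}\right)
=\frac{a\sqrt{K}}{2},
\end{eqnarray*}
since the sums of $e^{-2\pi i k/\ell}$ and $e^{-4\pi i k/\ell}$ over $k$ vanish. In this idealisation the signal grows as $\sqrt{K}$, whereas the random background is of order $\sqrt{\Delta}$ independently of $K$, so that even a small amplitude $a$ becomes visible in a sufficiently long text.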
147: 
148: \subsection{The Correlation Function of Word Length}
149: We will  now turn to a consideration of the correlation function:
150: \begin{equation}
151: G_\ell\equiv
152: \frac{E\{(S_k-E\{S_k\})(S_{k+\ell}-E\{S_{k+\ell}\})\}}
153: {E\{(S_k-E\{S_k\})^2\}},
154: \end{equation}
155: where the subscript of $S$ is defined modulo $K$, i.e.,
156: $S_{K+1}\equiv S_1$, and so on. Since the value of $K$ is typically of 
157: the order $10^3 \gg 1$, this does not greatly affect the value of $G_\ell$.
158: From Eq.(\ref{eqn:whitenoise2}), it is evident that
159: randomly segmented data lead to the following:
160: \begin{equation}
161: G_\ell=\delta_{\ell,0}.
162: \label{deltakk}
163: \end{equation}
164: 
165: 
166: \subsection{The $Q_n$ distribution}
167: The probability $Q_n$ is defined as the probability
168: that a sequence of adjacent words has
169: a total of $n$ syllables.
170: To be precise, let us define $L_{n,k}$ to be the number
171: of occurrences in which $k$ sequential words have $n$ syllables in 
172: total.  From this definition, it is evident that the
173: following identity is satisfied:
174: \begin{equation}
175: \sum_{n=1}^\infty L_{n,k}=K.
176: \end{equation}
177: This is because, since there are $K$ words in the data, there are $K$ 
178: such sequences. 
179: The $Q_n$ distribution is defined by the following:
180: \begin{equation}
181: Q_n\equiv \frac1K \sum_{k=1}^n L_{n,k}.
182: \end{equation}
183: The upper limit of the sum in the above is induced by the property
184: that $L_{n,k}=0$ for $n<k$, which follows from the fact
185: that any English word is at least one syllable long.
186: 
187: An alternative, but equivalent, definition
188: of $Q_n$ is that it is the probability that a word boundary occurs 
189: $n$ syllables after a word boundary. 
190: In this sense, $Q_n$ may be called the word-boundary correlation function.
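
That $Q_n$ is flat for randomly segmented text can be seen from the following short sketch, which assumes only the geometric distribution (\ref{eqn:geom}) and the independence of successive word lengths. There are $\binom{n-1}{k-1}$ ways of dividing $n$ syllables into $k$ words of at least one syllable each, and each such division has probability $q^k(1-q)^{n-k}$. The expected fraction of $k$-word sequences with $n$ syllables in total is therefore $\binom{n-1}{k-1}q^k(1-q)^{n-k}$, and
\begin{eqnarray*}
E\{Q_n\}&=&\sum_{k=1}^n \binom{n-1}{k-1} q^k(1-q)^{n-k}\\
&=&q\sum_{j=0}^{n-1}\binom{n-1}{j} q^j(1-q)^{n-1-j}
=q\left[q+(1-q)\right]^{n-1}=q,
\end{eqnarray*}
which is independent of $n$ and equal to $1/s$.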
191: 
192: \section{Jim Crace, {\it Quarantine}}
193: \subsection{Fourier analysis}
194: \begin{figure}[ht]
195: \begin{center}
196: \includegraphics[width=6cm]{qfourierj.eps}
197: \caption{The real part (upper plot) and the
198: imaginary part (lower plot) of the Fourier components
199: $\tilde{S}_m$ for {\it Quarantine} for $m=1$ to $80{,}009$.}
200: \label{fig:qfourier}
201: \end{center}
202: \end{figure}
203: The plot of the Fourier coefficients $\tilde{S}_m$
204: for {\it Quarantine} (with $K=80{,}010$ words)
205: is given in Fig.\ref{fig:qfourier}.
206: Due to the identity (\ref{eqn:symm}), the real part of
207: $\tilde{S}_m$ is symmetric around the middle-point $m=K/2$,
208: while the imaginary part of $\tilde{S}_m$ is anti-symmetric
209: around the middle-point.
210:  
211: \begin{figure}[ht]
212: \begin{center}
213: \includegraphics[width=11cm]{qfgaussj.eps}
214: \caption{The comparison between the Gaussian accumulated distribution
215: (\ref{eqn:gac}) and the behaviours of the
216: actual Fourier coefficients of {\it Quarantine}.}
217: \label{fig:qfgauss}
218: \end{center}
219: \end{figure}
220: It is apparent that the real part of $\tilde{S}_m$
221: has a significant peak, denoted by the letter $A$,
222: at the middle-point $m=K/2$, where $\tilde{S}_{K/2}=-2.662$.
223: No other structure is visible and the rest of the Fourier coefficients
224: are consistent with the white noise characteristics (\ref{eqn:whitenoise1}) 
225: and (\ref{eqn:whitenoise2}).
226: In fact, the average number of syllables per word in this text is
227: $s=1.303$, which leads to $\sqrt{\Delta}=0.628$.
228: This number is consistent with the plotted result, as is apparent in Fig.\ref{fig:qfgauss}.
229: If the random segmentation hypothesis is valid,
230: the central limit theorem dictates that
231: the distribution of the Fourier coefficients $\tilde{S}_m$
232: for $m\ne0$ should follow the Gaussian distribution,
233: \begin{equation}
234: P_{\rm G}(s)=
235: \frac1{\sqrt{2\pi\Delta}}\exp\left[-\frac{s^2}{2\Delta}\right],
236: \end{equation}
237: where $s={\rm Re}(\tilde{S}_m), {\rm Im}(\tilde{S}_m)$.
238: In Fig.\ref{fig:qfgauss}, the solid curve is the accumulated distribution,
239: \begin{equation}
240: P(\ge s)\equiv \int_s^\infty P_{\rm G}(s')\,ds',
241: \label{eqn:gac}
242: \end{equation}
243: with the above value of $\Delta$.
244: The dots are plots of the positive ${\rm Re}(\tilde{S}_m)$
245: versus their rank (the largest being 1, the next 2, and so on) divided by
246: the total number of positive ${\rm Re}(\tilde{S}_m)$.
247: The open circles are similar plots for the negative ${\rm Re}(\tilde{S}_m)$.
248: Within statistical accuracy the dots and open circles should follow
249: the solid curve if the Gaussian distribution applies.
250: In Fig.\ref{fig:qfgauss} we see that almost all the points are close to the solid curve, except for the isolated point $A$ at the 
251: far right, which is the peak $A$ at $m=K/2$ in Fig.\ref{fig:qfourier}.
252: The fact that this point $A$ is far above the theoretical curve 
253: suggests that it is very unlikely that this value of $A$ 
254: is achieved simply by a statistical accident.
255: The argument runs as follows.
256: At this point, $s=2.662$, we find that the accumulated probability
257: distribution is $P(\ge s)=2.037\times10^{-9}$, which means that
258: the probability that a point exists beyond this value of $s$ is
259: $2.037\times10^{-9}$. On the other hand, there are 19,861
260: negative ${\rm Re}(\tilde{S}_m)$, of which the point $A$ is the largest in magnitude.
261: So the ``measured probability'' is $1/19{,}861=5.035\times10^{-5}$, which
262: is the vertical coordinate of this point.
263: Therefore, the existence of this point $A$ simply by accident
264: is quite unlikely (the theoretical probability is only $4.045\times10^{-5}$ times the measured one).
265: Putting this another way, we may argue as follows.
266: Of the $K/2$ independent coefficients, about half (that is, about $K/4$) have a negative real part, so the expected number of points
267: beyond this value of $s$ is equal to $P(\ge s)\times (K/4)=
268: 4.07\times10^{-5}$.  This should be contrasted with the 
269: existence of a single point $A$, which leads to the conclusion that
270: the existence of $A$ by chance is almost impossible.
271: This argument establishes that the peak $A$ in Fig.\ref{fig:qfourier}
272: is not an accident of statistical fluctuation, but is a real effect.
273: 
274: The existence of the peak $A$ shows that there is a strong period of 
275: 2 in {\it Quarantine}.  This is readily seen from the fact that
276: the contribution of
277: the $m=K/2$ term in the expression of the Fourier series (\ref{fouriercom})
278: is $(-1)^k\tilde{S}_{K/2}/\sqrt{K}$.
279: Since $\tilde{S}_{K/2}$ is negative, we find that throughout 
280: {\it Quarantine}, words in even positions (2nd, 4th, 6th, $\cdots$ words)
281: tend to be shorter than those in odd positions. That is to say, there is a tendency for shorter and longer words to alternate in Crace's text. The relationship between this fact and the rhythmical patterning and consequent lineation noted below may be easily guessed at, but has yet to be examined in any scrupulous way.
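
The size of this alternation can be estimated from the value of $\tilde{S}_{K/2}$. The following is a rough sketch: the symbols $\bar{S}_{\rm even}$ and $\bar{S}_{\rm odd}$ are introduced here only for illustration and denote the mean syllable lengths of the words in even and odd positions respectively. Setting $m=K/2$ in the inversion formula, where $e^{-i\pi k}=(-1)^k$ and $K$ is even, gives
\begin{eqnarray*}
\tilde{S}_{K/2}&=&\frac1{\sqrt{K}}\sum_{k=1}^K (-1)^k S_k
=\frac{\sqrt{K}}{2}\left(\bar{S}_{\rm even}-\bar{S}_{\rm odd}\right),\\
\bar{S}_{\rm even}-\bar{S}_{\rm odd}&=&\frac{2\tilde{S}_{K/2}}{\sqrt{K}}
=\frac{2\times(-2.662)}{\sqrt{80{,}010}}\simeq-0.019.
\end{eqnarray*}
On this estimate the mean difference is small, about $1.4\%$ of the mean word length $s=1.303$, and it is only the length of the text that makes it stand out so clearly from the random background.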
282: 
283: 
284: \subsection{The Correlation Function}
285: The correlation function $G_\ell$ for {\it Quarantine} is plotted in Fig.\ref{fig:qcorr}, where it will be seen 
286: that there is no apparent structure and the result is 
287: consistent with the prediction (\ref{deltakk}) from the random segmentation
288: hypothesis. 
289: The only departure from (\ref{deltakk}) is a slight deviation
290: at $\ell=2$.  We find that this is qualitatively consistent with the analysis
291: done for the $Q_2$ dip in English prose by Aoyama and Constable (1999).
292: \begin{figure}[ht]
293: \begin{center}
294: \includegraphics[width=11cm]{qcorr.ai}
295: \caption{The correlation function $G_\ell$ for {\it Quarantine}.}
296: \label{fig:qcorr}
297: \end{center}
298: \end{figure}
299: 
300: \subsection{The $Q_n$ distribution}
301: \begin{figure}[ht]
302: \begin{center}
303: \includegraphics[width=11cm]{qqn.ai}
304: \caption{The $Q_n$ distribution for {\it Quarantine}.}
305: \label{fig:qqn}
306: \end{center}
307: \end{figure}
308: Fig.\ref{fig:qqn} shows the $Q_n$ distribution for {\it Quarantine}. The peaks at $n=4$, 6, 8, 10, and so on, are clearly evident. It should also be noted that the value for $n=2$ is not, as might be thought, inconsistent with these results. In fact a dip at $n=2$ is normal in English prose (see Aoyama and Constable 1999), and that found in {\it Quarantine} proves to be rather smaller than would otherwise be expected. 
309: 
310: \begin{figure}[ht]
311: \begin{center}
312: \includegraphics[width=13cm]{pq.ai}
313: \caption{The probability distributions of word length (dots) and the 
314: geometric distribution (solid line) for
315: all of the prose data in Aoyama and Constable (1999) and {\it Quarantine}.
316: The geometric distributions were defined so that the average number
317: of syllables per word agrees with the actual distributions for 
318: each case.}
319: \label{fig:pq}
320: \end{center}
321: \end{figure}
322: 
323: In other words, what appears to be a puzzling dip at $n=2$, where one might naively expect a peak related to those at the higher multiples of two, is in fact a smaller than usual depression. 
324: The underlying cause of this reduced depression appears to be an enhancement of disyllables, as can be seen by comparing the word length distribution of Crace's text with both an ideal geometric distribution and the data from a large sample of English prose (see Fig.\ref{fig:pq}).
325: Crace uses slightly more disyllables than would be predicted from the geometric distribution, whereas the prose corpus texts employ slightly fewer.
326: 
327: \section{T. S. Eliot, {\it Four Quartets}}
328: \subsection{Fourier coefficients}
329: We have found that the Fourier coefficients $\tilde{S}_m$
330: for any of the four sections
331: have no significant structure, unlike {\it Quarantine}.
332: They are all consistent with 
333: the white noise characteristics (\ref{eqn:whitenoise1}) 
334: and (\ref{eqn:whitenoise2}).
335: 
336: \subsection{The Correlation Function}
337: The correlation function $G_\ell$ for each of the four sections is plotted in Fig.\ref{fig:fcorr}, where it will be seen 
338: that there is no apparent structure.  
339: \begin{figure}[ht]
340: \begin{center}
341: \includegraphics[width=11cm]{fcorr.ai}
342: \caption{The correlation function $G_\ell$ for each of the sections.}
343: \label{fig:fcorr}
344: \end{center}
345: \end{figure}
346: 
347: \begin{figure}[ht]
348: \begin{center}
349: \includegraphics[width=11cm]{fqn.ai}
350: \caption{The $Q_n$ plot of the four sections.}
351: \label{fig:fqn}
352: \end{center}
353: \end{figure}
354: 
355: \subsection{The $Q_n$ distribution}
356: Fig.\ref{fig:fqn} shows the $Q_n$ distributions 
357: for each of the four sections of {\it Four Quartets}.
358: As is apparent in this plot,  {\it Burnt Norton} alone
359: has a marked peak,  at $n=4$, a fact which suggests that a substantial 
360: part of it is composed in units of four syllables.\footnote{Similar peaks
361: are observed for isometrically lineated verse. See Constable
362: and Aoyama (1999).}
363: Of course due to the randomness of the original syllable distribution,
364: one might observe a similar peak by chance. Some care is required in
365: determining whether this $Q_4$ peak is the result of authorial 
366: compositional ordering, or a simple accident.
367: We can determine the statistical significance of the peak $Q_4$ 
368: by calculating the average 
369: ($\bar Q$) and the standard deviation ($\sigma_Q$)
370: of $Q_n$ from $n=1$ to 200. For the {\it Burnt Norton} section 
371: we find the following values:
372: \begin{eqnarray}
373: \bar{Q}&\equiv&\frac1N \sum_{n=1}^N Q_n=0.6914,\cr
374: \sigma_Q&\equiv&\sqrt{\frac1N \sum_{n=1}^N (Q_n-\bar{Q})^2}=0.007219,
375: \end{eqnarray}
376: where $N=200$.  
377: The value of $Q_4$ in this section is
378: \begin{equation}
379: Q_4=0.7218\simeq\bar{Q}+4.2\,\sigma_Q,
380: \end{equation}
381: showing that this is a ``4$\sigma$ effect''.
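
The coefficient follows directly from the three values quoted above; as a check of the arithmetic,
\begin{displaymath}
\frac{Q_4-\bar{Q}}{\sigma_Q}=\frac{0.7218-0.6914}{0.007219}
=\frac{0.0304}{0.007219}\simeq4.2.
\end{displaymath}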
382: 
383: \begin{figure}[ht]
384: \begin{center}
385: \includegraphics[width=11cm]{gauss.ai}
386: \caption{The log-linear plot of the accumulated probability density
387: $P(\ge Q)$ (dashed line) and the distribution of the
388: measured $Q_n$ $(n=1\sim1000)$ for the {\it Burnt Norton} section.
389: The agreement with the Gaussian distribution is evident, 
390: except for $Q_4$.}
391: \label{fig:gauss}
392: \end{center}
393: \end{figure}
394: Another way to gauge the significance of the $Q_4$ peak is to make a plot similar to Fig.\ref{fig:qfgauss}.
395: The probability of such a large deviation from the mean
396: value $\bar{Q}$ is given by the accumulated
397: probability $P(\ge Q)$, which is defined as the probability
398: that a value larger than or equal to $Q$ is observed;
399: \begin{equation}
400: P(\ge Q)\equiv\int_Q^\infty P_{\rm G}(Q')\,dQ',
401: \label{eqn:gauss}
402: \end{equation}
403: where $P_{\rm G}(Q)$ is the Gaussian distribution;
404: \begin{equation}
405: P_{\rm G}(Q)=\frac1{\sqrt{2\pi}\,\sigma_Q}
406: \exp\left[-\frac{(Q-\bar{Q})^2}{2\sigma_Q^2}\right].
407: \end{equation} 
408: For $Q_4=0.7218$, we find that $P(\ge Q_4)=1.3\times 10^{-5}$.
409: In other words, the probability of observing as large a peak 
410: as that for $Q_4$ simply by accident is about $1.3\times 10^{-5}$.
411: This situation is shown graphically in Fig.\ref{fig:gauss},
412: which is similar to Fig.\ref{fig:qfgauss}.
413: In this figure, the accumulated probability density
414: $P(\ge Q)$ defined in Eq.(\ref{eqn:gauss}) is shown by the dashed line
415: and the actual distribution of $Q_n$ in this section by dots.
416: The agreement between the continuous Gaussian distribution
417: and the actual measured values is excellent, {\it except for $Q_4$}.
418: This both justifies the use of the Gaussian distribution above, 
419: and visualizes the abnormality of the $Q_4$ peak.
420: 
421: This result is consistent with the lack of any structure in
422: the Fourier coefficients $\tilde{S}_m$, since the latter are
423: sensitive only to periodic structures in the syllable numbers
424: $\{S_1, S_2, \cdots, S_K\}$, and would be ideal for studying 
425: such structures. That is to say, if the four-syllable units
426: appeared in a fixed combination of word lengths and in a periodic manner, 
427: the Fourier coefficients would make their existence evident. However, 
428: four-syllable units may appear in various combinations of word lengths,
429: such as 1+3, 2+2, 1+1+2, and so on; furthermore, they may appear randomly in the section.
430: Both of these facts would disrupt the periodic structure, and
431: render the Fourier analysis useless.
432: The same is true for the correlation function $G_\ell$.
433: On the other hand, our word-boundary correlation function $Q_n$ is by definition
434: sensitive to the existence of the four syllable units
435: even in this situation. 
436: 
437: \section{Conclusion and Further Comments}
438: 
439: We have shown that the $Q_n$ computation is a sound and fairly sensitive register of one fundamental feature of lineated text, and complementary in some respects to two other statistical methods. In concluding what has been so far a methodological paper we will return to the language materials under consideration and offer several brief comments on the significance of the $Q_n$ peaks detected in the {\it Four Quartets} and {\it Quarantine}. 
440: 
441: \subsection{T. S. Eliot, {\it Four Quartets}}
442: The metrical status of Eliot's {\it Four Quartets} has been much debated, notably in Cooper (1998). $Q_n$ analysis will not resolve all aspects of this debate, but does contribute significantly in regard to lineation. Our findings are that with the exception of {\it Burnt Norton} the {\it Quartets} are not mathematically lineated. In this we are in basic agreement with Cooper, who carried out an exhaustive rhythmical scansion. We supplement his remarks by observing that whilst the later three {\it Quartets} appear on reading to be to some degree rhythmically regular (and there are passages which are very obviously composed in isometric lines), {\it Burnt Norton} alone is mathematically lineated overall, the basic mathematical line being a segment of four syllables. It seems likely that this results from a two-beat duple segment, running as follows: offbeat, beat, offbeat, beat. It should be noted from the $Q_n$ distribution that there is no subsequent peak at $n=8$, suggesting that while these four-syllable segments are frequent, they are not so frequently adjacent as to produce an eight-syllable mathematical line. This is surprising, and suggests deliberate avoidance on Eliot's part, as if he were unwilling to let his rhythms move too close to the familiar four-beat octosyllabic unit.
443: 
444: It seems possible that the poem as it survives traces an experiment with rhythmical patterning, and that Eliot's neglect of the four-syllable line in the subsequent sections is a refinement of method, and the adoption of a revised compositional principle which permitted the production of the desired rhythmical effects without relying on clear echoes of regular metrical structures. {\it Burnt Norton} is an early attempt to employ rhythmical metre in a form which is looser in its lineation structure than standard verse. As it happens it is still mathematically lineated. The subsequent three {\it Quartets} reveal further and more adventurous developments of this experiment. It would be interesting to know whether this anticipates in any way the character of the metre employed in Eliot's later plays, and we note this as being a topic for future research.
445: 
446: \subsection{Jim Crace, {\it Quarantine}}
447: Crace's writing, certainly the more recent books, has been very widely regarded as of a type differing from standard prose. The critic Frank Kermode has noted of {\it Quarantine} that it is from `the end of the fiction spectrum where the novel is most like a poem, most turned in on itself, most closely wrought for the sake of art and internal cohesion' (Kermode 1998). John Banville, discussing another of Crace's novels, {\it Being Dead}, has even pointed out that technically the prose often uses verse fragments, and that much of this book is `written in a kind of broken blank verse, and indeed could be successfully laid out in verse form', adding that Crace is `particularly fond of iambic pentameter' 
448: (Banville 2000). 
449: 
450: Certainly, {\it Quarantine} is readily recognized as being rhythmically more regular than ordinary prose, but there are no visually salient lines, and even careful examination during reading fails to find any convincing or consistent division. Although it is possible, as Banville says of {\it Being Dead}, to find occasional lines, even short sequences, it is difficult to successfully display any sizeable piece of the text as isometric verse. Casual examination can make no further progress. However, the $Q_n$ procedure is sufficiently sensitive to detect lineation even when distributed in small and isolated packets, and of {\it Quarantine} it reveals that syllabic groups of two, four, six (which is, curiously, less prominent), eight, ten, and subsequent multiples of two are significantly more common than they would be in standard prose. {\it Quarantine} is non-randomly segmented, and even though it does not employ a core isometric line length, and its `lines' do not follow on one from another, it is still, and in a novel and important sense, {\it lineated}.
451: 
452: The fact that there is no particular preference for one line length over another, that is no preference, say, for decasyllabic groups over octosyllabic groups, is of very great interest, and suggests that Crace did not set out to write in lines and then to conceal them by printing his work with a prose layout. Instead we suspect that the lineation detected by the $Q_n$ procedure arises as a byproduct of a deliberate organisation of the rhythmical patterning of the text.
453: 
454: The character of this general organisation can be inferred from the $Q_2$ data. As noted above (see Section 3.3), although apparently anomalous, because it seems to register a depression rather than a peak, this data point is in fact consistent with the subsequent peaks at multiples of two, since the depression in Crace's text is actually smaller than expected. The fundamental cause of this, as noted in our discussion above, is simply that Crace has employed slightly more disyllabic words than would be expected in standard prose, and thus we may conclude that lineation arises in Crace's text from a predisposition to employ disyllabic words so as to facilitate the construction of alternating rhythms.
455: 
456: As far as we know Crace has not commented in public on his reasons for employing this structuring. If asked he would perhaps reply, taking up terms similar to those used by Kermode above, that he felt it gave his work a self-supporting and self-consciously artistic structural brace that unpatterned prose did not possess. While this might be a satisfactory proximal psychological explanation, we suspect that a deeper account of the resulting experiences for both reader and writer is available and may more adequately account for the appeal of this formal device. Anyone who has read Crace's work will agree with John Updike that it has a `hallucinatory' quality (Updike 1999). That this is not readily accounted for by reference to the facts of the narrative is worth remark, and is  a quality which recalls the character of much of the most successful poetry.
457: 
458: The exact consequences of the variety of mathematical lineation found in {\it Quarantine} are yet to be fully understood, but it is so far clear that at least some of the previous restrictions noted in regard to isometrically lineated text also apply to Crace's composition. The peaked $Q_n$ distribution indicates that some degree of syntactic distortion occurs, and we have seen above that dictional distortion is present in the enhancement of disyllabic words. It is also possible that, even though he was not composing deliberately in a single isometrical line but varying his segment length, some reduction in mean word length has taken place, in order to increase the syntactical options (a known consequence of isometric lineation: see Constable and Aoyama 1999). Both these distortions have been discussed by Constable (1998) within his Disruption Theory, which proposes that readers attempting to process text which is evenly and subtly disrupted can neither reject it as damaged nor successfully interpret it, and consequently experience an illusion of profundity (see also the sketch in Constable and Aoyama (1999)). It seems almost certain, however, that with the variety of lineation employed in Crace's {\it Quarantine} these disruptions are less marked. It should also be noted that a further source of disruption, that caused by rhythmical patterning, may well be present, namely an alteration in the frequency of stressed and unstressed syllables leading to an increase in the frequency of content terms. This disruption is also found in standard duple verse, but in Crace's text it may be the principal source of disruption, with syntactical and word length distortions playing a much smaller part. The tendency to use more disyllables is relevant here, though its dictional effects, on the frequency of the various parts of speech for example, are at present unknown.
459: 
460: On such a view the text is experimental in the sense that it varies the proportions of the disruptive influences in order to produce new experiences for readers. It might be seen as an attempt to provide prose with some of the richness of effect familiar from isometric verse without suffering to the same degree its disablements, and should be compared with experimental works such as Eliot's {\it Four Quartets} which approach the problem from the other side, and beginning from a verse base move towards the freedom of prose without losing the benefits of disruption (see Constable (1998) for a discussion of the decline of isometric verse and the rise of prose in terms of disruption theory).
461: 
462: \subsection{The Roots of Lineation and Formal Innovation}
463: Comparison of these two texts sheds light on the relationship between mathematical lineation and rhythmical organisation. In one case, {\it Four Quartets}, we have rhythmical organisation with and without mathematical lineation, whereas in {\it Quarantine} we find both mathematical lineation and rhythmical organisation, but with lineation only appearing as an unconscious byproduct of the attempt to create regular rhythmical effects.
464: 
465: This may shed some light on the roots of visible and salient lineation as a metrical rule, since we can see that even when an author is not aware of lineation as a constraint, which is arguably the case with preliterate composers, lineation may arise as a byproduct of rhythmical regularity. What we have in Crace is some evidence suggesting that lineation is a likely, indeed a very probable, outcome of rhythmical organisation. We hypothesize that the roots of lineation as a metrical restriction are deep, at least in English, and that mathematical lineation may be coeval with rhythmical patterning. Its extraction and development into a separately understood metrical object, the line, is likely to have come much later. What is particularly interesting is that while lineation has been recognized as independent of rhythm (as in syllabic verse), there has been no parallel recognition of regular rhythmical patterning without isometric lineation.
466: 
467: But, and this is the most interesting point arising from Eliot's texts, lineation would appear not to be necessary for some degree of rhythmical organisation, and with care it seems possible that it may be avoided. We speculate that Eliot's experimental work in {\it Four Quartets} is not an isolated example, and that others may be found in seventeenth-century drama, particularly in Shakespeare. Nevertheless, even if such antecedents are found, it seems likely that regularly rhythmical but unlineated text is a largely unexplored technical possibility, one which does not come about by chance with any great frequency and one which is difficult to achieve deliberately, certainly without detailed knowledge. It is also possible that, given other features of English, namely those relating to word length, part of speech frequency, and stress, mathematical lineation is an all but unavoidable consequence of definite rhythmical patterning if noticeably unnatural output is to be avoided. Further theoretical work is expected to permit more conclusive comments on this matter.
468: 
469: Interesting though {\it Quarantine} is, our tests show that there is still room for further experimentation in this vein, and the abstract understanding of lineation offered in this paper and our previous papers may go some way towards assisting writers in devising composition methods which further minimize lineation. The psychological effects of text of this type are quite unknown, but it is conceivable that a diction somewhat distorted by the construction of rhythmical regularity would combine very well with a syntax undisrupted by non-random word length ordering, and perhaps produce a formal constraint as vigorous and attractive to readers as isometric verse once was.
470: 
471: \subsection{The Prose-Verse Continuum}
472: The characters of Eliot's {\it Quartets} and Crace's {\it Quarantine} as revealed here both act as an incitement to reconceive the relationship between verse and prose. Colloquial and academic literary-critical discussion has generally treated `verse' and `prose' as intuitively obvious exclusive categories separated by an ill-defined grey area. This essentialist tendency has been encouraged by the binary leanings of generative metrics, and our own work on the computable distinction between prose and verse might even be seen as reinforcing this habit. However, there is nothing in the facts of the matter which obliges us to conceive of prose and verse in this rigid way, and we believe that careful consideration of our theoretical research will show that there are very good reasons for taking up an alternative conception, namely that what we think of as {\it typical verse} and {\it typical prose} are {\it distinguishable clusters of texts positioned on a continuum}.
473: 
474: Underlying such a view is the fundamental insight that while isometric verse involves an ordering of word length, this is not a feature which is either absolutely present or absolutely absent, but one that varies in degree between texts. Thus we may say that there is a continuum between random disorder of word length on one side and maximal order on the other. From an abstract theoretical perspective we can see that any particular text may lie at some point along this continuum, but practically speaking there are considerable difficulties in the way of assigning it to a precise position. However, this need not deter us from recognizing the utility of the continuum as a conception, and in this light the difficulties of text assignment appear as technical challenges rather than as obstacles to accepting the continuum in the first place.
475: 
476: A key issue in addressing this matter will be a refined understanding of word length ordering. All the prose texts we have so far studied in fact exhibit a very small degree of order in the fine structure of the word length profile (see Aoyama and Constable (1999), and Constable and Aoyama (1999)), and so seem to lie at some distance from the theoretical prose pole, while at the other extreme no text so far examined is without randomness in its word length characteristics, and thus lies at some distance from the theoretical verse pole. In between we can be fairly confident that some texts we have studied lie closer to the theoretical maximum of order, such as Longfellow's {\it Hiawatha}, which has a very marked repeating structure in its $Q_n$, while others seem to lie closer to the disordered pole, such as most blank verse by Wordsworth, whose normative line length peaks are modest in size and whose multiple peaks diminish rapidly as a result of weak long range correlation (i.e. high frequency of variant lines). The location of a text such as Crace's {\it Quarantine} is more difficult to determine. As noted above, while it exhibits a repeating structure it has no major peak (i.e. no predominant line length), but instead a whole series of minor peaks. At present we are not in a position to distinguish definitively between the degree of order in such a text and that in, for example, a fundamentally isometric text such as Wordsworth's {\it Prelude}. It is tempting to suppose that {\it Quarantine} must lie closer to the prose end of the continuum, on account of its lack of concentrated order, but this, though convenient, may be no more than a residual effect of the exclusive categories that the conception of the continuum was introduced to replace. 
Satisfactory resolution of this matter will not be possible until some better method has been determined for remarking upon the degrees of order and disorder in a text's lineation structure, and it may be noted here in passing that Fourier analysis will probably be of use in this regard.
477: 
478: % Bibliography
479: \begin{thebibliography}{9}
480: \bibitem{wl1}
481: Aoyama, Hideaki and John Constable. 1999. 
482: ``Word Length Frequency and Distribution in English: 
483: Part I, Prose: Observations, Theory, and Implications''. 
484: {\it Literary and Linguistic Computing} 14/3.339-358.
485: \bibitem{jb2000}
486: Banville, J. 2000.  
487: ``A Rare Species''. 
488: {\it New York Review of Books}, 13 Apr. 2000 
489: (quoted from NYRB web site: 
490: http://www.nybooks.com/nyrev/WWWarchdisplay.cgi?20000413030R).
491: \bibitem{wl2}
492: Constable, John and Hideaki Aoyama. 1999.
493: ``Word Length Frequency and Distribution in English: Part II, 
494: An Empirical and Mathematical Examination of the 
495: Character and Consequences of Isometric Lineation''.
496: {\it Literary and Linguistic Computing} 14/4.507-536.
497: \bibitem{jc98}
498: Constable, John. 1998.
499: ``The Character and Future of Rich Poetic Effects''.
500: {\it The View from Kyoto: Essays on Twentieth-Century Poetry} 
501: ed. by Shoichiro Sakurai, 89-108. Kyoto: Rinsen Books.
502: \bibitem{gbc98}
503: Cooper, G. Burns. 1998.
504: {\it Mysterious Music: rhythm and free verse}. Stanford: Stanford U.P.
505: \bibitem{fm98}
506: Kermode, F. 1998.
507: ``Review of Jim Crace, {\it Quarantine}''. {\it New York Times}, 12 April 1998
508: (quoted from the {\it New York Times} web page: 
509: http://search.nytimes.com/books/search/).
510: \bibitem{ju99}
511: Updike, John. 1999.
512: {\it More Matter: essays and criticism}. New York: Knopf.
513: \end{thebibliography}
514: \end{document}
515: 
516: