
1: \batchmode

2: \documentclass{llncs}

3:

4: \usepackage{fancyvrb}

5: % \usepackage[round]{natbib}

6: \usepackage[dvipdfm]{graphicx}

7:

8: \bibliographystyle{plain}

9:

10: \title{Unsupervised Grammar Induction in a Framework of Information

11: Compression by Multiple Alignment, Unification and Search}

12:

13: \author{J Gerard Wolff}

14:

15: \institute{CognitionResearch.org.uk,\\

16: Telephone: +44(0)1248 712962,\\

17: Email: jgw@cognitionresearch.org.uk.}

18:

19: \begin{document}

20:

21: \maketitle

22:

23: \begin{abstract}

24:

25: This paper describes a novel approach to grammar induction that has been developed within a framework designed to integrate learning with other aspects of computing, AI, mathematics and logic. This framework, called {\em information compression by multiple alignment, unification and search} (ICMAUS), is founded on principles of Minimum Length Encoding pioneered by Solomonoff and others. Most of the paper describes SP70, a computer model of the ICMAUS framework that incorporates processes for unsupervised learning of grammars. An example is presented to show how the model can infer a plausible grammar from appropriate input. Limitations of the current model and how they may be overcome are briefly discussed.

26:

27: \end{abstract}

28:

29: \section{Introduction}

30:

31: This paper describes a novel approach to unsupervised grammar induction that has been developed within a research programme whose overarching goal is the {\em integration} of diverse functions---learning, recognition, reasoning and others---within one relatively simple framework. This has had a substantial impact on the way in which the learning processes are organised.

32:

33: The new framework called {\em information compression by multiple alignment, unification and search} (ICMAUS) originated in earlier research developing the SNPR model of grammar induction \cite{wolff_1988,wolff_1982}. Without supervision, the SNPR model successfully learns artificial context-free phrase-structure grammars (CF-PSGs) using a technique of `hierarchical chunking' combined with a search for disjunctive (part of speech) categories and processes for generalising grammatical rules and correcting over-generalisations.

34:

35: In the ICMAUS programme, the aim has been to match or exceed these capabilities within a system that has been generalised to model a range of other aspects of computing, AI, mathematics and logic. It became apparent at an early stage that this would mean a radical reorganisation of the SNPR model. In the ICMAUS framework a concept of {\em multiple alignment}---to be described---has replaced hierarchical chunking as the predominant mode of organisation. With this new orientation, the system provides an interpretation for concepts in computing, mathematics and logic and it has a range of AI capabilities described in \cite{wolff_icmaus_overview} and earlier papers cited there. The present paper describes how the system has been developed for unsupervised learning of grammars.

36:

37: A much fuller account of the research described here may be found in \cite{wolff_unsupervised_learning}, available from http://www.cognitionresearch.org.uk/papers/ul/ul.htm.

38:

39: \subsection{Relationship with Other Research on Grammar Induction}

40:

41: This research extends the tradition of distributional linguistics pioneered by Harris \cite{harris_1951}, Fries \cite{fries_1952} and others.

42:

43: At the heart of the ICMAUS system are principles of Minimum Length Encoding (MLE) pioneered by Solomonoff \cite{solomonoff_1964} (see also \cite{li_vitanyi_1997}). In this framework, grammar induction is conceived as a process of optimisation rather than a process of identifying a target grammar `in the limit' as postulated by Gold \cite{gold_1967}. In the MLE framework, there is no target grammar, merely a process of searching for grammars that are `good' in terms of MLE principles.

44:

45: Recent studies that are, perhaps, most closely related to the present research include: \cite{adriaans_et_al_2000,allison_wallace_yee_1992,clark_2001,denis_2001,henrichsen_2002,johnson_reizler_2002,klein_manning_2001,nevill-manning_witten_1997,oliveira_sv_1996,rapp_et_al_1994,solan_etal_2002,van_zaanen_2002,van_zaanen_thesis_2002,watkinson_manandhar_2001}. Not all of these studies have adopted MLE principles but they deal with issues and processes that relate to the present research. The idea of combining learning with parsing---to be described---has also been developed by Nakamura (see \cite{nakamura_ishiwata_2000} and this workshop).

46:

47: Compared with other work on unsupervised learning of grammar-like structures, the most distinctive features of the ICMAUS research are:

48:

49: \begin{itemize}

50:

51: \item The integration of learning with other areas of AI, computation, mathematics and logic.

52:

53: \item \sloppy The multiple alignment concept as it has been developed in the ICMAUS framework, described below. There is, however, a clear affinity with `alignment-based learning' \cite{van_zaanen_2002,van_zaanen_thesis_2002}.

54:

55: \end{itemize}

56:

57: \section{The ICMAUS Framework}\label{ICMAUS_section}

58:

59: In the ICMAUS framework, {\em all} knowledge is stored as {\em patterns}: arrays of symbols in one or two dimensions.\footnote{In work to date, the focus has been on one-dimensional patterns.} Despite the simplicity of this format, it is possible within the ICMAUS system to represent several different kinds of knowledge including context-free and context-sensitive grammars, networks, trees, if-then rules and others.

60:

61: Given the generality of this format for knowledge, the learning techniques described in this paper are relevant to the learning of {\em any} kind of knowledge, not just `grammars', narrowly conceived.

62:

63: The ICMAUS framework is intended as an abstract model of {\em any} kind of system for computing or cognition, either natural or artificial. In broad terms, the system works by receiving `New' information from its environment and transferring it to a repository of `Old' information. At the same time, it tries to compress the information as much as possible by finding patterns that match each other and merging or `unifying' patterns that are the same. In these broad terms it is similar to a ZIP program but it differs in the thoroughness of the search for `good' unifications of patterns and in the `multiple alignment' concept, to be described.

64:

65: \subsection{Multiple Alignment}\label{multiple_alignment_section}

66:

67: The concept of {\em multiple alignment} in the ICMAUS framework has been borrowed from the field of bio-informatics and adapted as described in \cite{wolff_icmaus_overview}.

68:

69: An example of an ICMAUS multiple alignment is shown in Figure \ref{alignment_1}. Row 0 contains the New pattern `o n e o f t h e m d o e s' and all the other rows contain Old patterns, one pattern per row. By convention, the New pattern is always shown in row 0 but otherwise the assignment of patterns to rows is entirely arbitrary.

70:

71: \begin{figure}[!hbt]

72: \fontsize{06.00pt}{07.20pt}

73: \begin{center}

74: \begin{BVerbatim}

75: 0                             o n e               o f            t h e m                d o e s   0

76:                               | | |               | |            | | | |                | | | |

77: 1                             | | |               | |   < N Np 0 t h e m >              | | | |   1

78:                               | | |               | |   | |              |              | | | |

79: 2                             | | |   < Q 0 < P   | | > < N              > >            | | | |   2

80:                               | | |   | |   | |   | | |                    |            | | | |

81: 3                             | | |   | |   < P 2 o f >                    |            | | | |   3

82:                               | | |   | |                                  |            | | | |

83: 4                             | | |   | |                                  |   < V Vs 1 d o e s > 4

84:                               | | |   | |                                  |   | | |            |

85: 5 S Num     ; < NP            | | |   | |                                  | > < V |            > 5

86:      |      | | |             | | |   | |                                  | |     |

87: 6    |      | | |    < N Ns 3 o n e > | |                                  | |     |              6

88:      |      | | |    | | |          | | |                                  | |     |

89: 7    |      | < NP 0 < N |          > < Q                                  > >     |              7

90:      |      |            |              |                                          |

91: 8   Num SNG ;            Ns             Q                                          Vs             8

92: \end{BVerbatim}

93: \end{center}

94: \caption{A multiple alignment with `o n e o f t h e m d o e s' in New and patterns representing grammatical rules in Old.}

95: \label{alignment_1}

96: \end{figure}

97:

98: Apart from the pattern in row 8, the patterns from Old in this example are like re-write rules in a CF-PSG with the re-write arrow omitted. If we ignore row 8, the alignment shown in Figure \ref{alignment_1} is very much like a conventional parsing, marking the main components of the sentence: words and phrases and the sentence pattern itself (shown in row 5).

99:

100: Row 8 shows how the `discontinuous' dependency that exists between the singular noun in the subject of the sentence (`Ns') and the singular verb (`Vs') can be marked within the alignment in a relatively direct manner. Despite the simplicity of the format for representing knowledge, the formation of multiple alignments enables the system to express `context sensitive' aspects of language and other kinds of knowledge.

101:

102: In each Old pattern there are two kinds of symbols: {\em ID-symbols} like `$<$', `N', `Np', `0' and `$>$' in `$<$ N Np 0 t h e m $>$' serve to identify the pattern and the remaining symbols (`t h e m' in this example) are {\em C-symbols} that represent the contents or substance of the pattern.

103:

104: Much more detail, with many more examples, may be found in \cite{wolff_2000}.

105:

106: \section{SP70}\label{SP70_section}

107:

108: All the main components of the ICMAUS framework outlined in Section \ref{ICMAUS_section} are now realised within the SP70 software model (version 9.2). The model is able to abstract plausible grammars from sets of simple sentences without prior knowledge of word segments or the classes to which they belong, and the computational complexity of the model appears to be acceptable (Section \ref{computational_complexity}). However, in its current form, the model has at least two significant shortcomings and some other deficiencies, discussed briefly in Section \ref{discussion_section}.

109:

110: \subsection{Objectives}

111:

112: In the development of this model, the main problems that have been addressed are:

113:

114: \begin{itemize}

115:

116: \item How to identify significant segments in the `corpus' of raw data when the boundary between one segment and the next is not marked explicitly.

117:

118: \item How to identify disjunctive classes of syntactically-equivalent segments (e.g., `nouns', `verbs' and `adjectives').

119:

120: \item How to combine the learning of segmental structure with the learning of disjunctive classes.

121:

122: \item How to learn segments and disjunctive classes through two or more levels of abstraction.

123:

124: \item \sloppy How to generalise grammatical rules beyond the data and how to correct over-generalisations without feedback from a `teacher' or the provision of `negative' samples or the grading of the data from `easy' to `hard' ({\em cf.} \cite{gold_1967}).

125:

126: \end{itemize}

127:

128: Solutions to these problems were found in the SNPR model \cite{wolff_1988,wolff_1982} but, as noted earlier, the organisation of this model is quite unsuited to the wider goals of the present research---integration of diverse functions within one framework. The SP70 model (v.~9.2) provides solutions to the first three problems and partial solutions to the fourth and fifth problems. Further development is planned as indicated in Section \ref{discussion_section}, below.

129:

130: \subsection{Overall Structure of the Model}

131:

132: Figure \ref{SP70_figure} shows the high-level organisation of the SP70 model.

133:

134: \begin{figure}[!hbt]

135: \begin{center}

136: \fontsize{08.00pt}{09.60pt}

137: \begin{BVerbatim}

138: SP70()

139: {

140:      1 Read a set of patterns into New. Old is initially empty.

141:      2 Compile an alphabet of symbol types in New and, for each type,

142:           find its frequency of occurrence and the number of bits

143:           required to encode it (using the Shannon-Fano-Elias method).

144:      3 While (there are unprocessed patterns in New)

145:      {

146:           3.1 Identify the first or next pattern from New as the

147:                `current pattern from New' (CPFN).

148:           3.2 Apply the function CREATE_MULTIPLE_ALIGNMENTS() to

149:                create multiple alignments, each one between the

150:                CPFN and one or more patterns from Old.

151:           3.3 During 3.2, the CPFN is copied into Old, one symbol

152:                at a time, in such a way that the CPFN can be

153:                aligned with its copy but that any one symbol in

154:                the CPFN cannot be aligned with the corresponding

155:                symbol in the copy.

156:           3.4 Sort the alignments formed by this function in order

157:                of their compression scores and select the best

158:                few for further processing.

159:           3.5 Process the selected alignments with the function

160:                DERIVE_PATTERNS(). This function derives encoded

161:                patterns from alignments and adds them to Old.

162:      }

163:

164:      4 Apply the function SIFTING_AND_SORTING() to create one or

165:           more alternative grammars for the patterns in New, each

166:           one scored in terms of MLE principles. Each grammar is

167:           a subset of the patterns in Old.

168: }

169: \end{BVerbatim}

170: \end{center}

171: \caption{The organisation of SP70. The workings of the functions {\em create\_multiple\_alignments()}, {\em derive\_patterns()} and {\em sifting\_and\_sorting()} are explained in the text.}

172: \label{SP70_figure}

173: \end{figure}

174:

175: The function {\em create\_multiple\_alignments()} referred to in Figure \ref{SP70_figure} creates zero or more multiple alignments, each one comprising the current pattern from New (CPFN) and one or more patterns from Old. This function is essentially the same as the main component of the SP61 model, described quite fully in \cite{wolff_2000}. Readers are referred to this source for a more detailed description of how multiple alignments are formed in the ICMAUS framework.
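
Operation 2 in Figure \ref{SP70_figure} can be illustrated with a minimal sketch in Python. It assumes that the encoding cost of a symbol type with relative frequency $p$ is the Shannon-Fano-Elias codeword length, $\lceil \log_2 (1/p) \rceil + 1$ bits. The function name {\em symbol\_costs()} and the choice of input (the eight sentences of Section \ref{example_section}) are illustrative assumptions, and the costs computed inside SP70 may differ in detail.

```python
import math
from collections import Counter


def symbol_costs(patterns):
    """Return {symbol: (frequency, cost_in_bits)} for every symbol type.

    Assumption: the cost is the Shannon-Fano-Elias codeword length,
    ceil(log2(1/p)) + 1, where p is the symbol's relative frequency.
    """
    counts = Counter(sym for pattern in patterns for sym in pattern.split())
    total = sum(counts.values())
    return {
        sym: (freq, math.ceil(math.log2(total / freq)) + 1)
        for sym, freq in counts.items()
    }


# The eight New patterns used in the example section.
new = [
    "t h a t b o y r u n s",
    "t h a t g i r l r u n s",
    "t h a t b o y w a l k s",
    "t h a t g i r l w a l k s",
    "s o m e b o y r u n s",
    "s o m e g i r l r u n s",
    "s o m e b o y w a l k s",
    "s o m e g i r l w a l k s",
]
costs = symbol_costs(new)
for sym in sorted(costs):
    print(sym, *costs[sym])
```

Frequent symbol types such as `s' receive shorter codes than rare ones such as `h', which is what allows frequently used patterns to be encoded cheaply.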

176:

177: \subsection{Deriving Patterns from Alignments}\label{derive_patterns_section}

178:

179: In operation 3.5 in Figure \ref{SP70_figure}, the {\em derive\_patterns()} function is applied to a selection of the best alignments formed and, in each case, it looks for sequences of unmatched symbols within the alignment and also sequences of matched symbols.

180:

181: Consider the alignment shown in Figure \ref{alignment_2}. From an alignment like that, the function finds the unmatched sequences `g i r l' and `b o y' and, within row 1, it also finds the matched sequences `t h a t' and `r u n s'. With respect to row 1, the focus of interest is the matched and unmatched sequences of C-symbols---ID-symbols are ignored.

182:

183: \begin{figure}[!hbt]

184: \begin{center}

185: \begin{BVerbatim}

186: 0        t h a t g i r l r u n s   0

187:          | | | |         | | | |

188: 1 < %1 9 t h a t b o y   r u n s > 1

189: \end{BVerbatim}

190: \end{center}

191: \caption{A simple alignment from which other patterns may be derived.}

192: \label{alignment_2}

193: \end{figure}

194:

195: A copy of each of the four sequences is made, ID-symbols are added to each copy and the copy is added to Old. In addition, another `abstract' pattern is made that records the sequence of matched and unmatched patterns within the alignment. The result in this case is five patterns like those shown in Figure \ref{patterns_figure_1}.

196:

197: \begin{figure}[!hbt]

198: \begin{center}

199: \begin{BVerbatim}

200: < %7 12 t h a t >

201: < %9 14 b o y >

202: < %9 15 g i r l >

203: < %8 13 r u n s >

204: < %10 16 < %7 > < %9 > < %8 > >

205: \end{BVerbatim}

206: \end{center}

207: \caption{Patterns derived from the alignment shown in Figure \ref{alignment_2}.}

208: \label{patterns_figure_1}

209: \end{figure}

210:

211: It should be clear that the set of patterns in Figure \ref{patterns_figure_1} is, in effect, a simple grammar for the two sentences in Figure \ref{alignment_2}, with patterns representing grammatical rules in much the same style as those shown in Figure \ref{alignment_1}. The abstract pattern `$<$ \%10 16 $<$ \%7 $>$ $<$ \%9 $>$ $<$ \%8 $>$ $>$' describes the overall structure of this kind of sentence with slots that may receive individual words at appropriate points in the pattern.

212:

213: Notice how the symbol `\%9' serves to mark `b o y' and `g i r l' as alternatives in the middle of the sentence. This is a grammatical class in the tradition of distributional or structural linguistics (see, for example, \cite{harris_1951,fries_1952}).
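
The derivation just described can be sketched in Python for the simple two-row case of Figure \ref{alignment_2}. This is a sketch under stated assumptions, not the SP70 implementation: the standard-library {\em difflib.SequenceMatcher} stands in for the alignment process, only C-symbols are supplied, and the class symbols (\%1, \%2, ...) and discrimination numbers are generated by a hypothetical naming scheme, so they differ from those in Figure \ref{patterns_figure_1}.

```python
from difflib import SequenceMatcher
from itertools import count


def derive_patterns(new, old):
    """Derive word-like patterns and one abstract pattern from two rows.

    `new` and `old` are lists of C-symbols. Each matched run and each
    unmatched slot gets a class symbol; unmatched alternatives that
    occupy the same slot share that class symbol.
    """
    ids = count(1)  # hypothetical discrimination symbols
    patterns, slots = [], []
    # Stand-in aligner: difflib, not the SP70 alignment process.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for class_no, (tag, i1, i2, j1, j2) in enumerate(matcher.get_opcodes(), 1):
        cls = f"%{class_no}"
        # A matched run yields one pattern; an unmatched slot yields one
        # pattern per non-empty alternative.
        segments = [old[i1:i2]] if tag == "equal" else [old[i1:i2], new[j1:j2]]
        for seg in filter(None, segments):
            patterns.append(f"< {cls} {next(ids)} {' '.join(seg)} >")
        slots.append(f"< {cls} >")
    # The abstract pattern records the order of the slots.
    patterns.append(f"< %{len(slots) + 1} {next(ids)} {' '.join(slots)} >")
    return patterns


derived = derive_patterns(
    "t h a t g i r l r u n s".split(),
    "t h a t b o y r u n s".split(),
)
for pattern in derived:
    print(pattern)
```

The two alternatives `b o y' and `g i r l' receive the same class symbol because they fill the same unmatched slot between the matched runs.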

214:

215: \subsection{Sifting and Sorting of Patterns}\label{sifting_and_sorting_section}

216:

217: In the example just shown, all the patterns derived from the alignment are `correct'. But in many cases, patterns that are derived in this way and added to Old are `wrong'. The wrong patterns are weeded out in the {\em sifting\_and\_sorting()} stage of processing (operation 4 in Figure \ref{SP70_figure}), where the system develops one or more alternative grammars for the patterns in New in accordance with MLE principles. Figure \ref{sifting_and_sorting_figure} shows the overall structure of the {\em sifting\_and\_sorting()} function.

218:

219: \begin{figure}[!hbt]

220: \begin{center}

221: \fontsize{08.00pt}{09.60pt}

222: \begin{BVerbatim}

223: SIFTING_AND_SORTING()

224: {

225:      1 For each pattern in Old, set its frequency of occurrence to 0.

226:      2 While (there are still unprocessed patterns in New)

227:      {

228:           2.1 Identify the first or next pattern from New as the CPFN.

229:           2.2 Apply the function CREATE_MULTIPLE_ALIGNMENTS() to

230:                create multiple alignments, each one between the CPFN

231:                and one or more patterns from Old.

232:           2.3 From amongst the best of the multiple alignments formed,

233:                select `full' alignments in which all the symbols of

234:                the CPFN are matched and all the C-symbols are

235:                matched in each pattern from Old.

236:           2.4 For each pattern from Old, count the maximum number of

237:                times it appears in any one of the full alignments

238:                selected in operation 2.3. Add this count to the

239:                frequency of occurrence of the given pattern.

240:      }

241:      3 Compute frequencies of symbol types and their encoding costs.

242:           From these values, compute encoding costs of patterns in

243:           Old and new compression scores for each of the full

244:           alignments created in operation 2.

245:      4 Using the alignments created in 2 and the values computed in

246:           operation 3, COMPILE_ALTERNATIVE_GRAMMARS().

247: }

248: \end{BVerbatim}

249: \end{center}

250: \caption{The organisation of the {\em sifting\_and\_sorting()} function. The {\em compile\_alternative\_grammars()} function is described in the text.}

251: \label{sifting_and_sorting_figure}

252: \end{figure}
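
Operation 2.4 in Figure \ref{sifting_and_sorting_figure} can be sketched as follows. The representation is an assumption made for illustration: each full alignment is reduced to a list of identifiers of the Old patterns it contains, and the identifiers used in the example (`S', `that' and so on) are hypothetical labels rather than SP70 symbols.

```python
from collections import Counter


def update_frequencies(frequencies, full_alignments):
    """Add, for each Old pattern, the maximum number of times it appears
    in any one of the full alignments for the current pattern from New.

    `frequencies` is a Counter keyed by pattern identifier; each
    alignment is a list of the Old pattern identifiers it contains.
    """
    best = Counter()
    for alignment in full_alignments:
        for pattern_id, n in Counter(alignment).items():
            best[pattern_id] = max(best[pattern_id], n)
    frequencies.update(best)  # Counter.update adds counts
    return frequencies


freq = Counter()
# Two alternative full alignments for one pattern from New.
update_frequencies(freq, [["S", "that", "girl", "runs"],
                          ["S", "that", "girl", "girl", "runs"]])
# One full alignment for the next pattern from New.
update_frequencies(freq, [["S", "some", "boy", "runs"]])
```

Taking the maximum over alternative alignments, rather than the sum, avoids counting the same pattern several times merely because the current pattern from New can be aligned in more than one way.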

253:

254: \subsubsection{Compiling a Set of Alternative Grammars}\label{compile_grammars}

255:

256: \sloppy A set of alternative grammars for the patterns in New that are good in terms of MLE principles is derived (in the {\em compile\_alternative\_grammars()} function) in operation 4 of Figure \ref{sifting_and_sorting_figure}. Each grammar is a subset of the patterns that have been added to Old during operation 3 of Figure \ref{SP70_figure}.

257:

258: The process of compiling good grammars is essentially a hill-climbing search through the abstract space of alternative grammars, trying to minimise $(G + E)$ for each grammar, where $G$ is the size of the given grammar (in bits) and $E$ is the size of all the New patterns (in bits) after they have been encoded in terms of the grammar. Minimising $(G + E)$ is, of course, the central idea in grammar induction using MLE principles. In what follows, $(G + E)$ is abbreviated as $T$.

259:

260: The grammars are built in stages, at first trying to minimise $T$ for the first New pattern alone, then trying to minimise $T$ for the first and second New pattern, followed by the first, second and third, and so on.
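
The staged search can be stated a little more explicitly. The notation in this paragraph is introduced only for illustration. Let $g$ be a candidate grammar (a subset of the patterns in Old), let $G(g)$ be its size in bits and let $E_i(g)$ be the size in bits of the $i$th New pattern after it has been encoded in terms of $g$. The quantity that the search tries to minimise at stage $k$ is then
\[
T_k(g) = G(g) + \sum_{i=1}^{k} E_i(g), \qquad k = 1, \ldots, N,
\]
where $N$ is the number of patterns in New. Because the search is a form of hill climbing, the grammar retained at each stage is a good one found by the search and not necessarily the global minimum of $T_k$.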

261:

262: \section{Computational Complexity}\label{computational_complexity}

263:

264: In a serial processing environment, the time complexity of SP70 is approximately O$(N^2)$ where $N$ is the number of patterns in New. In a parallel processing environment, the time complexity may approach O$(N)$, depending on how well the parallel processing is applied. In serial or parallel environments, the space complexity should be O$(N)$.

265:

266: The time complexity of the program may be improved when it has been developed, as envisaged, so that the New patterns are processed in batches, with a purging of Old between each batch to remove all patterns except those in the best grammar. In this case, the time complexity should be O$(N)$.

267:

268: \section{Example}\label{example_section}

269:

270: When New contains the eight sentences shown in Figure \ref{example_2_patterns}, the best grammar found by SP70 is the one shown in Figure \ref{example_2_grammar}.

271:

272: \begin{figure}[!hbt]

273: \begin{center}

274: \begin{BVerbatim}

275: t h a t b o y r u n s

276: t h a t g i r l r u n s

277: t h a t b o y w a l k s

278: t h a t g i r l w a l k s

279: s o m e b o y r u n s

280: s o m e g i r l r u n s

281: s o m e b o y w a l k s

282: s o m e g i r l w a l k s

283: \end{BVerbatim}

284: \end{center}

285: \caption{Eight sentences supplied to SP70 as New.}

286: \label{example_2_patterns}

287: \end{figure}

288:

289: \begin{figure}[!hbt]

290: \begin{center}

291: \begin{BVerbatim}

292: < %2 2 s o m e >

293: < %2 3 t h a t >

294: < %1 5 b o y >

295: < %1 6 g i r l >

296: < %3 4 r u n s >

297: < %3 7 w a l k s >

298: < 1 < %2 > < %1 > < %3 > >

299: \end{BVerbatim}

300: \end{center}

301: \caption{The best grammar (in terms of MLE principles) that is found by SP70 when New contains the eight sentences shown in Figure \ref{example_2_patterns}.}

302: \label{example_2_grammar}

303: \end{figure}
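
As a simple check on Figure \ref{example_2_grammar}, the grammar can be expanded by filling each slot of the sentence pattern with every member of the corresponding class. The sketch below transcribes the grammar by hand into a Python dictionary; the names {\em classes}, {\em sentence\_slots} and {\em generate()} are illustrative and not part of SP70.

```python
from itertools import product

# Word patterns of the grammar, grouped by their class symbol.
classes = {
    "%2": ["s o m e", "t h a t"],
    "%1": ["b o y", "g i r l"],
    "%3": ["r u n s", "w a l k s"],
}
# Slots of the sentence pattern `< 1 < %2 > < %1 > < %3 > >`.
sentence_slots = ["%2", "%1", "%3"]


def generate(slots, classes):
    """Return every sentence obtained by filling each slot in turn."""
    return {" ".join(words) for words in product(*(classes[s] for s in slots))}


generated = generate(sentence_slots, classes)

# The eight sentences supplied to SP70 as New.
new = {
    "t h a t b o y r u n s",
    "t h a t g i r l r u n s",
    "t h a t b o y w a l k s",
    "t h a t g i r l w a l k s",
    "s o m e b o y r u n s",
    "s o m e g i r l r u n s",
    "s o m e b o y w a l k s",
    "s o m e g i r l w a l k s",
}
print(generated == new)
```

With two alternatives in each of three slots the grammar generates $2 \times 2 \times 2 = 8$ sentences, which are exactly the sentences in New, so for this sample the grammar neither over-generalises nor under-generalises.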

304:

305: \subsection{Intermediate Results}

306:

307: As the first phase of learning proceeds (operation 3 of Figure \ref{SP70_figure}), intermediate results are often much less tidy than the example shown in Section \ref{derive_patterns_section}. For example, when Old contains only the first pattern shown in Figure \ref{example_2_patterns}, the only alignment that the program can create is:

308:

309: \begin{center}

310: \begin{BVerbatim}

311: 0 t h a  t b o y r u n s         0

312:          |

313: 1 < %1 9 t h a t b o y r u n s > 1

314: \end{BVerbatim}

315: \end{center}

316:

317: Notice that the Old pattern (in row 1) is, in effect, {\em the same pattern} as the New pattern (in row 0) so it is not permissible to match `o' in the New pattern, for example, with `o' in the Old pattern because that would mean matching a given symbol with itself!
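
The constraint can be made concrete with a short sketch that lists the permissible matches between a pattern and its own copy. The function name and the representation of a match as a pair of positions are assumptions made for illustration.

```python
def permitted_self_matches(symbols):
    """Pairs (i, j): symbol i of the pattern may match symbol j of its copy.

    A symbol may not be matched with its own copy, so i != j is required.
    """
    return [
        (i, j)
        for i, a in enumerate(symbols)
        for j, b in enumerate(symbols)
        if a == b and i != j
    ]


cpfn = "t h a t b o y r u n s".split()
matches = permitted_self_matches(cpfn)
print(matches)
```

In this pattern `t' is the only symbol type that occurs more than once, so the only permissible matches are between the two instances of `t'. The alignment shown above uses one of them, matching the second `t' in row 0 with the first `t' in row 1.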

318:

319: From the alignment just shown, the program derives `bad' patterns like `$<$ \%3 14 t h a $>$', `$<$ \%4 18 b o y r u n s $>$' and `$<$ \%4 17 h a t b o y r u n s $>$' and these are added to Old. However, as later patterns are processed, the repository of Old patterns begins to accumulate enough patterns that are good in MLE terms so that it is able to create quite respectable looking parsings like this:

320:

321: \begin{center}

322: \fontsize{07.00pt}{08.40pt}

323: \begin{BVerbatim}

324: 0                    t h a t                g i r l                w a l k s     0

325:                      | | | |                | | | |                | | | | |

326: 1                    | | | |                | | | |   < %8 %36 926 w a l k s >   1

327:                      | | | |                | | | |   | |                    |

328: 2 < %10 220 < %7     | | | | > < %9         | | | | > < %8                   > > 2

329:             | |      | | | | | | |          | | | | |

330: 3           | |      | | | | | < %9 %18 215 g i r l >                            3

331:             | |      | | | | |

332: 4           < %7 208 t h a t >                                                   4

333: \end{BVerbatim}

334: \end{center}

335:

336: In the {\em sifting\_and\_sorting()} phase, all the `bad' patterns are discarded and the `good' patterns are cleaned up by removing unnecessary ID-symbols and renaming the retained ID-symbols in a tidy manner.

337:

338: \subsection{Values for $G$, $E$, $T$ and Compression}

339:

340: Figure \ref{plotting_table} shows changing values for $G$, $E$ and $T$ for the best grammar found (in terms of MLE principles) as successive patterns from New are processed in {\em compile\_alternative\_grammars()}. It is interesting to see that, as successive patterns are processed, progressively more compression is achieved, represented by the falling values for ($T$ / `original'), shown in the last column.

341:

342: \setlength{\tabcolsep}{2mm}

343: \begin{table}[!hbt]

344: \begin{center}

345: \begin{tabular}{| r | r | r | r | r | r |} \hline

346: \em Pattern & \em G &     \em E & \em T   &   \em Original &  \em Compression \\ \hline\hline

347: 1 &           7970.49 &   26.78 &     7997.27 &   7943.70 &       1.00 \\

348: 2 &           11085.38 &  191.29 &    11276.67 &  16569.42 &      0.68 \\

349: 3 &           14665.26 &  302.09 &    14967.35 &  25195.14 &      0.59 \\

350: 4 &           14665.26 &  397.57 &    15062.83 &  34502.87 &      0.44 \\

351: 5 &           17650.07 &  563.32 &    18213.39 &  42488.08 &      0.42 \\

352: 6 &           17650.07 &  713.75 &    18363.82 &  51155.30 &      0.36 \\

353: 7 &           17650.07 &  887.00 &    18537.07 &  59822.52 &      0.31 \\

354: 8 &           17650.07 &  1044.92 &   18694.99 &  69171.76 &      0.27 \\ \hline

355: \end{tabular}

356: \end{center}

357: \caption{Cumulative values (in bits) of $G$, $E$ and $T$ for the best grammar found as successive patterns from New are processed in {\em compile\_alternative\_grammars()}. For comparison purposes, the cumulative sizes of the original patterns (excluding ID-symbols) are shown in the `original' column and values for compression ($T$ / `original') are shown in the last column.}

358: \label{plotting_table}

359: \end{table}
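
As a worked check on the arithmetic in Table \ref{plotting_table}, the final row gives
\[
T = G + E = 17650.07 + 1044.92 = 18694.99 \mbox{ bits}, \qquad
\frac{T}{\mbox{original}} = \frac{18694.99}{69171.76} \approx 0.27 .
\]
From pattern 5 to pattern 8 the value of $G$ is unchanged at 17650.07 bits, so the growth in $T$ over those rows comes entirely from $E$. The increments in $E$ are $713.75 - 563.32 = 150.43$, $887.00 - 713.75 = 173.25$ and $1044.92 - 887.00 = 157.92$ bits, whereas the `original' column grows by $69171.76 - 59822.52 = 9349.24$ bits for the last pattern alone.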

360:

361: \section{Discussion}\label{discussion_section}

362:

363: \subsection{Evaluation}

364:

365: In accordance with the `looks-good-to-me' approach to the evaluation of grammar induction systems \cite{van_zaanen_thesis_2002}, the grammar shown in Figure \ref{example_2_grammar} looks like an appropriate grammar for the patterns shown in Figure \ref{example_2_patterns}.\footnote{A possible improvement might be a grammar that isolates the `s' in `r u n s' and `w a l k s' as a separate morpheme.} This may seem like a sloppy method of evaluation but it should not be forgotten that the human brain is, by a wide margin, the best learning system on the planet. This provides a justification for using human judgement of what does or does not `look good' as a means of evaluating the output of artificial learning systems. With any system that is sufficiently robust to be applied to realistic samples of natural language, there is no alternative to (human) judgements about what is or is not a `correct' grammar for a given language or (human) conventions about how language is segmented into words. Statistical tests may be applied to establish whether or not there is a significant level of agreement between structures established by human judgement and the results of artificial learning \cite{wolff_1977,wolff_1980}.

366:

367: Notice that the use of a `target' grammar as a criterion of success (as in Gold's approach to learning \cite{gold_1967}) does not overcome the problem that, for any given language sample, there are many alternative grammars that are compatible with the sample and some are `better' than others.

368:

369: \subsection{Reorganisation Needed}

370:

371: The example in the previous section is good enough to show that the approach is sound but experiments with other examples have shown that the model suffers from two main weaknesses:

372:

373: \begin{itemize}

374:

375: \item Although the model in its current form can isolate basic segments and tie them together in an overall abstract structure, it is not good at finding intermediate levels of abstraction.

376:

377: \item In the development of the model to date, no attempt has been made to enable the system to detect discontinuous dependencies such as number dependency between the subject of a sentence and its main verb (as mentioned in Section \ref{multiple_alignment_section}). Although this kind of capability may seem like a refinement that we can afford to do without at this stage of development, a deficiency in this area seems to have an impact on the program's performance at an elementary level.

378:

379: \end{itemize}

380:

381: A possible solution to both problems is a reorganisation of the model so that learning is integrated even more closely with parsing. Recent work has shown that operation 2.2 in the {\em sifting\_and\_sorting()} function (Figure \ref{sifting_and_sorting_figure}) can be omitted---the multiple alignments from operation 3.2 in Figure \ref{SP70_figure} can be used instead. It is also envisaged that New patterns will be processed in batches and that, after each batch, {\em sifting\_and\_sorting()} will be applied and Old patterns that are not proving useful will be discarded.

382:

383: \section{Conclusion}

384:

385: SP70 is not yet an `industrial strength' system for unsupervised learning but I believe the framework has considerable potential and provides a sound basis for further development.

386:

387: A key attraction of this approach to learning is that the ICMAUS framework provides a unified view of a variety of issues in AI thus facilitating the integration of grammar induction with other aspects of intelligence. Given the generality of the framework, the learning techniques described here are relevant to the learning of {\em any} kind of knowledge, not just grammars.

388:

389: \section*{Acknowledgements}

390:

391: I am grateful to Pat Langley and Menno van Zaanen for constructive comments on the report on which this paper is based. The paper has also benefitted from comments and suggestions by an anonymous referee.

392:

393: \raggedright

394:

395: \begin{thebibliography}{10}

396:

397: \bibitem{adriaans_et_al_2000}

398: P.~Adriaans, M.~Trautwein, and M.~Vervoort.

399: \newblock Towards high speed grammar induction on large text corpora.

400: \newblock In V.~Hlav{\'a}{\v{c}}, K.~G. Jeffery, and J.~Wiedermann, editors, {\em

401:   SOFSEM 2000}, volume 1963 of {\em Lecture Notes in Computer Science}, pages

402:   173--186. Springer, 2000.

403:

404: \bibitem{allison_wallace_yee_1992}

405: L.~Allison, C.~S. Wallace, and C.~N. Yee.

406: \newblock Minimum message length encoding, evolutionary trees and

407:   multiple-alignment.

408: \newblock In {\em Proceedings of the Hawaii International Conference on System

409:   Sciences, HICSS-25}, January 1992.

410:

411: \bibitem{clark_2001}

412: A.~Clark.

413: \newblock Unsupervised induction of stochastic context-free grammars using

414:   distributional clustering.

415: \newblock In Daelemans and Zajac \cite{daelemans_zajac_2001}, pages 105--112.

416:

417: \bibitem{daelemans_zajac_2001}

418: W.~Daelemans and R.~Zajac, editors.

419: \newblock {\em Proceedings of CoNLL-2001 (at ACL-2001, Toulouse, France)},

420:   2001.

421:

422: \bibitem{denis_2001}

423: F.~Denis.

424: \newblock Learning regular languages from simple positive examples.

425: \newblock {\em Machine Learning}, 44(1/2):37--66, 2001.

426:

427: \bibitem{fries_1952}

428: C.~C. Fries.

429: \newblock {\em The Structure of English}.

430: \newblock Harcourt, Brace \& World, New York, 1952.

431:

432: \bibitem{gold_1967}

433: E.~M. Gold.

434: \newblock Language identification in the limit.

435: \newblock {\em Information and Control}, 10:447--474, 1967.

436:

437: \bibitem{harris_1951}

438: Z.~S. Harris.

439: \newblock {\em Methods in Structural Linguistics}.

440: \newblock University of Chicago Press, Chicago, 1951.

441:

442: \bibitem{henrichsen_2002}

443: P.~J. Henrichsen.

444: \newblock {GraSp}: Grammar learning from unlabelled speech corpora.

445: \newblock In D.~Roth and A.~{van den Bosch}, editors, {\em Proceedings of

446:   CoNLL-2002 (at COLING-2002)}, pages 22--28, 2002.

447:

448: \bibitem{johnson_reizler_2002}

449: M.~Johnson and S.~Riezler.

450: \newblock Statistical models of syntax learning and use.

451: \newblock {\em Cognitive Science}, 26:239--253, 2002.

452:

453: \bibitem{klein_manning_2001}

454: D.~Klein and C.~Manning.

455: \newblock Distributional phrase structure induction.

456: \newblock In Daelemans and Zajac \cite{daelemans_zajac_2001}.

457:

458: \bibitem{li_vitanyi_1997}

459: M.~Li and P.~Vit\'{a}nyi.

460: \newblock {\em An Introduction to Kolmogorov Complexity and Its Applications}.

461: \newblock Springer-Verlag, New York, 1997.

462:

463: \bibitem{nakamura_ishiwata_2000}

464: K.~Nakamura and T.~Ishiwata.

465: \newblock Synthesizing context free grammars from sample strings based on

466:   inductive {CYK} algorithm.

467: \newblock In {\em Proceedings of the International Colloquium on Grammatical

468:   Inference ({ICGI} 2000)}, pages 186--195, 2000.

469:

470: \bibitem{nevill-manning_witten_1997}

471: C.~G. Nevill-Manning and I.~H. Witten.

472: \newblock Compression and explanation using hierarchical grammars.

473: \newblock {\em Computer Journal}, 40(2/3):103--116, 1997.

474:

475: \bibitem{oliveira_sv_1996}

476: A.~L. Oliveira and A.~Sangiovanni-Vincentelli.

477: \newblock Using the minimum description length principle to infer reduced

478:   ordered decision graphs.

479: \newblock {\em Machine Learning}, 25(1):23--50, 1996.

480:

481: \bibitem{rapp_et_al_1994}

482: P.~E. Rapp, I.~D. Zimmerman, E.~P. Vining, N.~Cohen, A.~M. Albano, and M.~A.

483:   Jimenez-Montano.

484: \newblock The algorithmic complexity of neural spike trains increases during

485:   focal seizures.

486: \newblock {\em Journal of Neuroscience}, 14(8):4731--4739, 1994.

487:

488: \bibitem{solan_etal_2002}

489: Z.~Solan, E.~Ruppin, D.~Horn, and S.~Edelman.

490: \newblock Automatic acquisition and efficient representation of syntactic

491:   structures.

492: \newblock In {\em Proceedings of NIPS-2002}, 2002.

493:

494: \bibitem{solomonoff_1964}

495: R.~J. Solomonoff.

496: \newblock A formal theory of inductive inference. {P}arts {I} and {II}.

497: \newblock {\em Information and Control}, 7:1--22 and 224--254, 1964.

498:

499: \bibitem{watkinson_manandhar_2001}

500: S.~Watkinson and S.~Manandhar.

501: \newblock A psychologically plausible and computationally effective approach to

502:   learning syntax.

503: \newblock In Daelemans and Zajac \cite{daelemans_zajac_2001}.

504:

505: \bibitem{van_zaanen_thesis_2002}

506: M.~van Zaanen.

507: \newblock {\em Bootstrapping Structure into Language: {A}lignment-{B}ased

508:   {L}earning}.

509: \newblock PhD thesis, University of Leeds, Leeds, UK, January 2002.

510:

511: \bibitem{van_zaanen_2002}

512: M.~van Zaanen.

513: \newblock Implementing {A}lignment-{B}ased {L}earning.

514: \newblock In {\em Proceedings of the International Colloquium on Grammatical

515:   Inference ({ICGI} 2002); Amsterdam, the Netherlands}, September 2002.

516:

517: \bibitem{wolff_1977}

518: J.~G. Wolff.

519: \newblock The discovery of segments in natural language.

520: \newblock {\em British Journal of Psychology}, 68:97--106, 1977.

521: \newblock Copy: www.cognitionresearch.org.uk/lang\_learn.html\#wolff\_1977.

522:

523: \bibitem{wolff_1980}

524: J.~G. Wolff.

525: \newblock Language acquisition and the discovery of phrase structure.

526: \newblock {\em Language \& Speech}, 23:255--269, 1980.

527: \newblock Copy: www.cognitionresearch.org.uk/lang\_learn.html\#wolff\_1980.

528:

529: \bibitem{wolff_1982}

530: J.~G. Wolff.

531: \newblock Language acquisition, data compression and generalization.

532: \newblock {\em Language \& Communication}, 2:57--89, 1982.

533: \newblock Copy: www.cognitionresearch.org.uk/lang\_learn.html\#wolff\_1982.

534:

535: \bibitem{wolff_1988}

536: J.~G. Wolff.

537: \newblock Learning syntax and meanings through optimization and distributional

538:   analysis.

539: \newblock In Y.~Levy, I.~M. Schlesinger, and M.~D.~S. Braine, editors, {\em

540:   Categories and Processes in Language Acquisition}, pages 179--215. Lawrence

541:   Erlbaum, Hillsdale, NJ, 1988.

542: \newblock Copy: www.cognitionresearch.org.uk/lang\_learn.html\#wolff\_1988.

543:

544: \bibitem{wolff_2000}

545: J.~G. Wolff.

546: \newblock Syntax, parsing and production of natural language in a framework of

547:   information compression by multiple alignment, unification and search.

548: \newblock {\em Journal of Universal Computer Science}, 6(8):781--829, 2000.

549: \newblock Copy: www.jucs.org/jucs\_6\_8. Three articles that are the basis of

550:   this article may be obtained from www.iicm.edu/wolff/1998d1, d2, d3.

551:

552: \bibitem{wolff_unsupervised_learning}

553: J.~G. Wolff.

554: \newblock Unsupervised learning in a framework of information compression by

555:   multiple alignment, unification and search.

556: \newblock Technical report, CognitionResearch.org.uk, 2002.

557: \newblock Copy: http://uk.arxiv.org/abs/cs.AI/0302015 or

558:   http://www.cognitionresearch.org.uk/papers/ul/ul.htm.

559:

560: \bibitem{wolff_icmaus_overview}

561: J.~G. Wolff.

562: \newblock Information compression by multiple alignment, unification and search

563:   as a unifying principle in computing and cognition.

564: \newblock {\em Artificial Intelligence Review}, 19(3):193--230, 2003.

565: \newblock Copy: www.cognitionresearch.org.uk/papers/overview/overview.htm.

566:

567: \end{thebibliography}

568:

569: \end{document}

570: