% 0605:cs0605024/ior.tex

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%         A Formal Measure of Machine Intelligence          %%
%%            Shane Legg and Marcus Hutter (2005)            %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\documentclass[twocolumn]{article}
\usepackage[dvips]{graphicx}
\usepackage{latexsym,amssymb}
\topmargin=-15mm  \oddsidemargin=0mm \evensidemargin=0mm
\textwidth=16cm \textheight=24cm

\def\OO{\mathcal O} \def\RR{\mathcal R} \def\UU{\mathcal U} \def\PP{\mathcal P}
\def\AA{\mathcal A}
\def\NNN{\mathbb N} \def\QQQ{\mathbb Q} \def\BBB{\mathbb B}
\def\E{\mathbf{E}}

\begin{document}

\title{\vspace{-3ex}\normalsize\sc Technical Report \hfill IDSIA-10-06
\vskip 2mm\bf\Large\hrule height5pt \vskip 6mm
A Formal Measure of Machine Intelligence
\vskip 6mm \hrule height2pt}
\author{{\bf Shane Legg} and {\bf Marcus Hutter}\\[3mm]
\normalsize IDSIA, Galleria 2, CH-6928\ Manno-Lugano, Switzerland\\
\normalsize \{shane,marcus\}@idsia.ch \hspace{9ex} http://www.idsia.ch/ }
\date{14 April 2006}
\maketitle

\begin{abstract}
A fundamental problem in artificial intelligence is that nobody really
knows what intelligence is.  The problem is especially acute when we
need to consider artificial systems which are significantly different
to humans.  In this paper we approach this problem in the following
way: We take a number of well known informal definitions of human
intelligence that have been given by experts, and extract their
essential features.  These are then mathematically formalised to
produce a general measure of intelligence for arbitrary machines.  We
believe that this measure formally captures the concept of machine
intelligence in the broadest reasonable sense.
\end{abstract}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Most of us think that we recognise intelligence when we see it, but we
are not really sure how to precisely define or measure it.  We
informally judge the intelligence of others by relying on our past
experiences in dealing with people.  Naturally, this naive approach is
highly subjective and imprecise.  A more principled approach would be
to use one of the many standard intelligence tests that are available.
Contrary to popular wisdom, these tests, when correctly applied by a
professional, deliver statistically consistent results and have
considerable power to predict the future performance of individuals in
many mentally demanding tasks.  However, while these tests work well
for humans, if we wish to measure the intelligence of other things,
perhaps of a monkey or a new machine learning algorithm, they are
clearly inappropriate.

One response to this problem might be to develop specific kinds of
tests for specific kinds of entities, just as intelligence tests for
children differ from intelligence tests for adults.  While this works
well when testing humans of different ages, it breaks down when we
need to measure the intelligence of entities which are profoundly
different from each other in terms of their cognitive capacities, speed,
senses, environments in which they operate, and so on.  To measure the
intelligence of such diverse systems in a meaningful way we must step
back from the specifics of particular systems and establish the
underlying fundamentals of what it is that we are really trying to
measure.  That is, we need to establish a notion of intelligence that
goes beyond the specifics of particular kinds of systems.

The difficulty of doing this is readily apparent.  Consider, for
example, the memory and numerical computation tasks that appear in
some intelligence tests and which were once regarded as defining
hallmarks of human intelligence.  We now know that these tasks are
absolutely trivial for a machine and thus do not test the machine's
intelligence.  Indeed, even the mentally demanding task of playing
chess has been largely reduced to brute force search.  As technology
advances, our concept of what intelligence is continues to evolve with
it.

How then are we to develop a concept of intelligence that is
applicable to all kinds of systems?  Any proposed definition must
encompass the essence of human intelligence, as well as other
possibilities, in a consistent way.  It should not be limited to any
particular set of senses, environments or goals, nor should it be
limited to any specific kind of hardware, such as silicon or
biological neurons.  It should be based on principles which are
sufficiently fundamental so as to be unlikely to alter over time.
Furthermore, the intelligence measure should ideally be formally
expressed, objective, and practically realisable.

This paper approaches this problem in the following way.  In
\emph{Section \ref{sec:infint}} we consider a range of definitions of
human intelligence that have been put forward by well-known
psychologists.  From these we extract the most common and essential
features and use them to create an informal definition of
intelligence.  \emph{Section \ref{sec:aef}} then introduces the
framework which we use to construct our formal measure of
intelligence.  This framework is formally defined in \emph{Section
\ref{sec:formframe}}.  In \emph{Section \ref{sec:ior}} we use our
developed formalism to produce a formal definition of intelligence.
\emph{Section \ref{sec:conc}} closes with a short summary.

A preliminary sketch of the ideas in this paper appeared in the poster
\cite{Legg:05iors}.  It can be shown that the intelligence measure
presented here is in fact a variant of the Intelligence Order Relation
that appears in the theory of AIXI, the provably optimal universal
agent \cite{Hutter:04uaibook}.  A longer journal version of this paper
is being written in which we give the proposed measure of machine
intelligence and its relation to other such tests a much more
comprehensive treatment.

Naturally, we expect such a bold initiative to be met with resistance.
However, we hope that the reader will appreciate the value of our
approach: With a formally precise definition put forward we aim to
better our understanding of what is a notoriously subjective and
slippery concept.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The concept of intelligence}\label{sec:infint}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Although definitions of human intelligence given by experts in the
field vary, most of their views cluster around a few common
perspectives.  Perhaps the most common perspective, roughly stated, is
to think of intelligence as being the ability to successfully operate
in uncertain environments by learning and adapting based on
experience.  The following often quoted definitions, which can be
found in \cite{Sternberg:00}, \cite{Wechsler:58}, \cite{Bingham:37}
and \cite{Gottfredson:97msoi}, all express this notion of intelligence
but with different emphasis in each case:

\begin{itemize}

\item ``The capacity to learn or to profit by experience.''
\mbox{--~W. F. Dearborn}

\item ``Ability to adapt oneself adequately to relatively new situations in
life.''
\mbox{--~R. Pinter}

\item ``A person possesses intelligence insofar as he has learned, or can
learn, to adjust himself to his environment.''
\mbox{--~S. S. Colvin}

\item ``We shall use the term `intelligence' to mean the ability of an
organism to solve new problems\ldots.''
\mbox{--~W. V. Bingham}

\item ``A global concept that involves an individual's ability to
act purposefully, think rationally, and deal effectively with
the environment.''
\mbox{--~D. Wechsler}

\item ``Intelligence is a very general mental capability that, among
other things, involves the ability to reason, plan, solve problems,
think abstractly, comprehend complex ideas, learn quickly and learn
from experience.''
\mbox{--~L. S. Gottfredson and 52 expert signatories}

\end{itemize}

These definitions have certain common features; in some cases they are
explicitly stated, while in others they are more implicit.  Perhaps
the most elementary feature is that intelligence is seen as a property
of an entity which is interacting with an external environment,
problem or situation.  Indeed this much is common to practically all
proposed definitions of intelligence.  As we will be referring back to
these concepts regularly, we will refer to the entity whose
intelligence is in question as the \emph{agent}, and the external
environment, problem or situation that it faces as the
\emph{environment}.  An environment could be a large complex world in
which the agent exists, similar to the usual meaning, or something as
narrow as a game of tic-tac-toe.

The second common feature of these definitions is that an agent's
intelligence is related to its ability to succeed in an environment.
This implies that the agent has some kind of objective.  Perhaps we
could consider an agent intelligent, in an abstract sense, without
having any objective.  However, without any objective whatsoever, the
agent's intelligence would have no observable consequences.
Intelligence then, at least the concrete kind that interests us, comes
into effect when an agent has an objective to apply its intelligence
to.  Here we will refer to this as its \emph{goal}.

The emphasis on learning, adaptation and experience in these definitions
implies that the environment is not fully known to the agent and may
contain surprises and new situations which could not have been
anticipated in advance.  Thus intelligence is not the ability to deal
with one fixed and known environment, but rather the ability to deal
with some range of possibilities which cannot be wholly anticipated.
This means that an intelligent agent may not be the best possible in
any specific environment, particularly before it has had sufficient
time to learn.  What is important is that the agent is able to learn
and adapt so as to perform well over a wide range of specific
environments.

Although there is a great deal more to this topic than we have
presented here, the above brief analysis gives us the necessary
building blocks for our informal working definition of intelligence:
\begin{quote}
\emph{Intelligence measures an agent's ability to achieve goals in a
wide range of environments.}
\end{quote}

We realise that some researchers who study intelligence will take
issue with this definition.  Given the diversity of views on the
nature of intelligence, a debate which is still being fought, this is
unavoidable.  Nevertheless, we are confident that our proposed
informal working definition is fairly mainstream.  We also believe
that our definition captures what we are interested in achieving in
machines: A very general and flexible capacity to succeed when faced
with a wide range of problems and situations.  Even those who
subscribe to different perspectives on the nature and correct
definition of intelligence will surely agree that this is a central
objective for anyone wishing to extend the power and usefulness of
machines.  It is also a definition that can be successfully
formalised.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The agent-environment framework}\label{sec:aef}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In the previous section we identified three essential components for
our model of intelligence: An agent, an environment, and a goal.
Clearly, the agent and the environment must be able to interact with
each other; specifically, the agent needs to be able to send signals
to the environment and also receive signals being sent from the
environment.  Similarly, the environment must be able to receive
signals from, and send signals to, the agent.  In our terminology we
will adopt the agent's perspective on these communications and refer
to the signals from the agent to the environment as \emph{actions},
and the signals from the environment to the agent as
\emph{perceptions}.

What is missing from this setup is the goal.  As discussed in the
previous section, our definition of an agent's intelligence requires
there to be some kind of goal for the agent to try to achieve.  This
implies that the agent somehow knows what the goal is.  One
possibility would be for the goal to be known in advance and for this
knowledge to be built into the agent.  The problem with this, however,
is that it limits each agent to just one goal.  We need to allow
agents which are more flexible than this.

If the goal is not known in advance, the other alternative is to
somehow inform the agent of what the goal is.  For humans this is
easily done using language.  In general, however, the possession of a
sufficiently high level of language is too strong an assumption to
make about the agent.  Indeed, even for something as intelligent as a
dog or a cat, direct explanation will obviously not work.

Fortunately there is another possibility.  We can define an additional
communication channel with the simplest possible semantics: A signal
that indicates how good the agent's current situation is.  We will
call this signal the \emph{reward}.  The agent's goal is then simply
to maximise the amount of reward it receives, so in a sense its goal
is fixed.  This is not limiting, though, as we have not said anything
about what causes different levels of reward to occur.  In a complex
setting the agent might be rewarded for winning a game or solving a
difficult puzzle.  From a broad perspective then, the goal is
flexible.  If the agent is to succeed in its environment, that is,
receive a lot of reward, it must learn about the structure of the
environment and in particular what it needs to do in order to get
reward.

\begin{figure}[t]
\centerline{\includegraphics[width=0.77\columnwidth]{agent-env.eps}}
\caption{\label{agent-env}The agent and the environment interact by
sending action, observation and reward signals to each other.}
\end{figure}

Not surprisingly, this is exactly the way in which we condition an
animal to achieve a goal: by selectively rewarding certain behaviours.
In a narrow sense the animal's goal is fixed, perhaps to get more
treats to eat, but in a broader sense this may require doing a trick
or solving a puzzle.

In our framework we will include the reward signal as a part of the
perception generated by the environment.  The perceptions also contain
a non-reward part, which we will refer to as \emph{observations}.
This now gives us the complete system of interacting agent and
environment in Figure~\ref{agent-env}.  The goal, in the broad
flexible sense, is implicitly defined by the environment as this is
what defines when rewards are generated.  Thus in this framework, to
test an agent in any given way, it is sufficient to fully define the
environment.

In artificial intelligence, this framework is used in the area of
reinforcement learning \cite{Sutton:98}.  By appropriately renaming
things, it also describes the controller-plant framework used in
control theory.  It is a widely used and very general structure that
can describe seemingly any kind of learning or control problem.  The
interesting point for us is that this type of framework follows
naturally from our informal definition of intelligence.  The only
difficulty was how to deal with the notion of success, or profit.
This requires the existence of some kind of objective or goal, and the
most flexible and elegant way to bring this into our framework is by
using a simple reward signal.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{A formal framework for intelligence} \label{sec:formframe}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Having made the basic framework explicit, we can now formalise things.
See \cite{Hutter:04uaibook} for a more complete technical description
along with many more example agents and environments.

The agent sends information to the environment by sending
\emph{symbols} from some finite set, for example, $\AA := \{
\mathit{left}, \mathit{right}, \mathit{forwards}, \mathit{backwards}
\}$.  We will call this set the \emph{action space} and denote it by
$\AA$.  Similarly, the environment sends signals to the agent with
symbols from a finite set called the \emph{perception space}, which we
will denote $\PP$.  The \emph{reward space}, denoted by $\RR$, will
always be a finite subset of the rational unit interval $[0,1] \cap
\QQQ$.  Every perception consists of two separate parts: an
observation and a reward.  For example, we might have $\PP := \{
(\mathit{cold}, 0.0), (\mathit{warm}, 1.0), (\mathit{hot}, 0.3),
(\mathit{roasting}, 0.0) \}$.

To denote symbols being sent we will use the lower case variable names
$a$, $o$ and $r$ for actions, observations and rewards respectively.
We will also index these in the order in which they occur, thus $a_1$
is the agent's first action, $a_2$ is the second action and so on.
The agent and the environment will take turns at sending symbols,
starting with the environment.  This produces a history of
observations, rewards and actions which we will denote by $o_1 r_1
a_1 o_2 r_2 a_2 o_3 r_3 a_3 o_4 \ldots$.  Our restriction to finite
action and perception spaces is deliberate, as an agent should not be
able to receive or generate information without bound in a single
cycle in time.  Of course, the action and perception spaces can still
be extremely large, if required.

334:

335: Formally, the agent is a function, denoted by $\pi$, which takes the

336: current history as input and chooses the next action as output.  A

337: convenient way of representing the agent is as a probability measure

338: over actions conditioned on the current history.  Thus $\pi( a_3 | o_1

339: r_1 a_1 o_2 r_2 )$ is the probability of action $a_3$ in the third

340: cycle, given that the current history is $o_1 r_1 a_1 o_2 r_2$.  A

341: deterministic agent is simply one that always assigns a probability of

342: 1 to some action for any given history.  How the agent produces the

343: distribution over actions for any given history is left completely

344: open.  Of course in artificial intelligence the agent will be a

345: machine and so $\pi$ will be a computable function.

346:

347: The environment, denoted $\mu$, is defined in a similar way.

348: Specifically, for any $k \in \NNN$ the probability of $o_k r_k$, given

349: the current history $o_1 r_1 a_1 \ldots o_{k-1} r_{k-1} a_{k-1}$, is

350: $\mu( o_k r_k | o_1 r_1 a_1 \ldots o_{k-1} r_{k-1} a_{k-1} )$.  For

351: the moment we will not place any further restrictions on the

352: environment.

353:

354: Our next task is to formalise the idea of ``profit'' or ``success''

355: for an agent.  Informally, we know that the agent must try to maximise

356: the amount of reward it receives, however this could mean several

357: different things.

358:

359: \vspace{1em}

360:

361: \noindent {\bf Example.}  Define the reward space $\RR := \{ 0, 1 \}$, an action

362: space $\AA := \{ 0, 1 \}$ and an observation space that just contains

363: the null string, $\OO := \{ \varepsilon \}$.  Now define a simple

364: environment,

365: \[

366: \mu( r_k | o_1 \ldots a_{k-1} ) := 1 - | r_k - a_{k-1} |.

367: \]

368: As the agent always get a reward equal to its action, the optimal

369: agent for this environment is clearly $\pi_{opt} ( a_k | o_1 \ldots

370: r_k ) := a_k$.  Consider now two other agents for this environment,

371: $\pi_1 ( a_k | o_1 \ldots r_k ) = \frac{1}{2}$ and

372: \begin{displaymath}

373: \pi_2( a_k | o_1 \ldots r_k ) := \left\{

374: \begin{array}{ll}

375: 1 & \mathrm{for\ } a_k = 0 \land k \leq 100,\\

376: 1 & \mathrm{for\ } a_k = 1 \land 100 < k \leq 5000,\\

377: \frac{1}{2} & \mathrm{for\ }  5000 < k,\\

378: 0 & \mathrm{otherwise}.

379: \end{array} \right.

380: \end{displaymath}

381:

382:

383: For $1 \leq k \leq 100$ the expected reward per cycle for $\pi_1$ is

384: higher than it is for $\pi_2$.  Thus in the short term $\pi_1$ is the

385: most successful.  On the other hand, for $100 < k \leq 5000$, $\pi_2$

386: has switched to the optimal strategy of always guessing that 1 head

387: will be thrown, while $\pi_1$ has not.  Thus in the medium term

388: $\pi_2$ is more successful.  Finally, for $k > 5000$, both agents use

389: random actions and thus in the limit they are equally successful.

390:

391: Which is the better agent?  If you want to maximise short term

392: rewards, it is agent $\pi_1$.  If you want to maximise medium term

393: rewards, then it is agent $\pi_2$.  And if you only care about the

394: long run, both agents are equally successful.  Which agent you prefer

395: depends on your temporal preferences, something which is currently

396: outside of our formulation.

397:

398: The standard way of formalising this in reinforcement learning is to

399: assume that the value of rewards decay geometrically into the future

400: at a rate given by a discount parameter $\gamma \in (0,1)$, that is,

401: \begin{equation}\label{eqn:disval}

402: V^{\pi}_{\mu}(\gamma) := \: \frac{1}{\Gamma} \E \left(

403: \sum_{i=1}^\infty \gamma^i r_i \right)

404: \end{equation}

405: where $r_i$ is the reward in cycle $i$ of a given history, the

406: normalising constant is $\Gamma := \sum_{i=1}^\infty \gamma^i$, and

407: the expected value is taken over all histories of $\pi$ and $\mu$

408: interacting.  By increasing $\gamma$ towards 1 we weight long term

409: rewards more heavily, conversely by reducing it we balance the

410: weighting towards short term rewards.
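
Equation~\ref{eqn:disval} can be evaluated numerically once the expected reward in each cycle is known.  The sketch below is an approximation under stated assumptions: it uses the closed form $\Gamma = \gamma/(1-\gamma)$, truncates the infinite sum after $n$ cycles (accurate only when $\gamma^n$ is negligible), and, for simplicity, credits the reward earned by action $a_k$ to cycle $k$, ignoring the one-cycle delay.  It reuses the expected reward sequences of the agents $\pi_1$ and $\pi_2$ from the example above; the function name \texttt{discounted\_value} is hypothetical.

```python
def discounted_value(expected_rewards, gamma: float) -> float:
    """Normalised discounted value (1/Gamma) * sum_i gamma^i E[r_i].

    expected_rewards[i - 1] holds E[r_i].  Cycles beyond the end of the list
    are treated as reward 0, so the list must be long enough for the
    truncated tail (about gamma^n) to be negligible.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in the open interval (0, 1)")
    big_gamma = gamma / (1.0 - gamma)  # closed form of sum_{i>=1} gamma^i
    total = sum(gamma**i * r for i, r in enumerate(expected_rewards, start=1))
    return total / big_gamma


n = 20000  # truncation point; gamma**n is tiny for both gammas used below
pi1 = [0.5] * n  # pi_1 earns reward 1 with probability 1/2 in every cycle
pi2 = [0.0 if k <= 100 else 1.0 if k <= 5000 else 0.5 for k in range(1, n + 1)]

for gamma in (0.9, 0.999):
    print(gamma, discounted_value(pi1, gamma), discounted_value(pi2, gamma))
```

With $\gamma = 0.9$ the ranking favours $\pi_1$, while with $\gamma = 0.999$ it favours $\pi_2$, so the preferred agent depends on the free parameter $\gamma$.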

Of course this has not actually answered the question of how to weight
near term rewards versus longer term rewards.  Rather it has simply
expressed this weighting as a parameter.  While that is adequate for
some purposes, what we would like is a single test of intelligence for
machines, not a range of tests that vary according to some free
parameter.  That is, we would like the temporal preferences to be
included in the model, not external to it.

One possibility might be to use quadratic discounting, $\gamma_t :=
\frac{1}{t^2}$.  This has some nice properties; in particular, the
agent's effective horizon, that is, how far into the future it needs
to look, grows in proportion to its current age
\cite{Hutter:04uaibook}.  However, an even more elegant solution is
possible.
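
The claim about the effective horizon of the discount sequence $\gamma_t = 1/t^2$ can be checked numerically.  The sketch below assumes one common convention, namely that the effective horizon $h_k$ at age $k$ is the smallest $h$ for which the discount weight remaining from cycle $k+h$ onwards is at most half of the weight remaining from cycle $k$ onwards.  It computes these tails exactly using the identity $\sum_{t \geq 1} t^{-2} = \pi^2/6$ (here $\pi$ is the circle constant, not an agent); the function names are hypothetical.

```python
import math

ZETA2 = math.pi ** 2 / 6  # sum_{t>=1} 1/t^2


def tail(k: int) -> float:
    """Remaining discount weight sum_{t>=k} 1/t^2, for k >= 1."""
    return ZETA2 - math.fsum(1.0 / t**2 for t in range(1, k))


def effective_horizon(k: int) -> int:
    """Smallest h >= 0 with tail(k + h) <= tail(k) / 2."""
    remaining = tail(k)
    target = remaining / 2
    h = 0
    while remaining > target:
        remaining -= 1.0 / (k + h) ** 2  # drop the weight of cycle k + h
        h += 1
    return h


for age in (10, 100, 1000):
    print(age, effective_horizon(age))
```

For these ages the horizon comes out at approximately $h_k \approx k$, whereas for geometric discounting the same criterion reduces to $\gamma^h \leq \frac{1}{2}$, which does not depend on $k$ at all.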

If we look at the value function in Equation~\ref{eqn:disval}, we see
that geometric discounting plays two roles.  Firstly, together with
the normalising constant $\Gamma$, it keeps the total reward finite,
in this case with a maximum value of 1.  Secondly, it weights the
reward at different points in the future, which in effect defines a
temporal preference.  We can achieve both of these, without needing
an external parameter, by simply requiring that the total reward
returned by the environment cannot exceed 1.  For such a reward
summable environment $\mu$ we can now define the value function to be
simply,
\begin{equation}\label{eqn:unival}
V^{\pi}_{\mu} := \: \E \left( \sum_{i=1}^\infty r_i \right)
\leq 1.
\end{equation}

One way of viewing this is that the rewards returned by the
environment now have the temporal preference factored in and thus we
do not need to add it separately.  The cost is that this is an
additional condition that we place on the environments.  Previously we
required that each reward signal be in a finite subset of $[0,1] \cap
\QQQ$; now we have the additional constraint that the sum is bounded.

It may seem that there is a philosophical problem here.  If an
environment $\mu$ is an artificial game, like chess, then it seems
fairly natural for $\mu$ to meet any requirements in its definition,
such as having a bounded reward sum.  However, if we think of the
environment $\mu$ as being ``the universe'' in which the agent lives,
then it seems unreasonable to expect that it should be required to
respect such a bound.  The flaw in this argument is that a
``universe'' does not have any notion of reward for particular agents.

Strictly speaking, reward is an interpretation of the state of the
environment.  In humans this interpretation is built in: consider, for
example, the pain that is experienced when you touch something hot.
Should reward then really be a part of the agent rather than the
environment?  If we gave the agent complete control over rewards then
our framework would become meaningless: The perfect agent could simply
give itself constant maximum reward.  Indeed, humans cannot easily do
this, at least not without taking drugs designed to interfere with
their pleasure-pain mechanism.

Thus the most accurate framework would consist of an agent, an
environment and a separate goal system that interprets the state of
the environment and rewards the agent appropriately.  In such a setup
the bounded rewards restriction would be a part of the goal system
and thus the above philosophical problem does not occur.  However, for
our current purposes it seems sufficient just to fold this goal
mechanism into the environment and add an easily implemented
constraint to how the environment may generate rewards.
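
One way to implement such a constraint is sketched below.  This is an illustrative assumption, not a construction given in this paper: a wrapper keeps track of the reward already paid out and emits a reward of 0 whenever paying the requested amount would push the total above 1.  Assuming that 0 lies in the finite reward space $\RR$, every emitted reward stays in $\RR$ and $\sum_i r_i \leq 1$ holds for every history.  The class name \texttt{BudgetedRewards} and the constant reward rule in the usage lines are hypothetical.

```python
from fractions import Fraction
from typing import Callable, List

# Hypothetical reward rule: the reward an environment would like to pay in
# the current cycle, given the history.  It must return values from the
# finite reward space, which is assumed to contain 0.
RewardRule = Callable[[List[object]], Fraction]


class BudgetedRewards:
    """Enforce sum_i r_i <= budget by paying 0 once the budget would be exceeded."""

    def __init__(self, rule: RewardRule, budget: Fraction = Fraction(1)) -> None:
        self.rule = rule
        self.remaining = budget  # reward still available to pay out

    def reward(self, history: List[object]) -> Fraction:
        requested = self.rule(history)
        if requested > self.remaining:
            return Fraction(0)  # paying would break the bound, so pay nothing
        self.remaining -= requested
        return requested


# Usage: a rule that would pay 1/100 in every cycle forever.
env = BudgetedRewards(lambda history: Fraction(1, 100))
rewards = [env.reward([]) for _ in range(150)]
print(sum(rewards))      # 1
print(rewards.count(0))  # 50
```

The running budget is stored as object state for brevity; formally it is a function of the history, since it equals 1 minus the rewards already recorded there.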

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{A formal measure of intelligence}\label{sec:ior}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

We have now formally defined the space of agents, how agents interact
with environments, and how we measure the performance of an agent in
any specific environment.  Before we can put all this together into a
single performance measure, we first need to define what we mean by
``a wide range of environments.''

As our goal is to produce a measure of intelligence that is as broad
and encompassing as possible, the space of environments used in our
definition should be as large as possible.  Given that our environment
is a probability measure with a certain structure, an obvious
possibility would be to consider the space of all probability measures
of this form.  Unfortunately, this extremely broad class of
environments causes problems.  As the space of all probability
measures is uncountably infinite, we cannot list the members of this
set, nor can we always describe environments in a finite way.

The solution is to require the environmental measures to be
computable.  Not only is this necessary if we are to have an effective
measure of intelligence, it is also not all that restrictive.  There
is a countably infinite number of environments in this set, with no
upper bound on their complexity.  Furthermore, it is only the measure
which describes the environment that must be computable.  For example,
although a typical sequence of 1's and 0's generated by flipping a
coin is not computable, the probability measure which describes this
process is computable.  Thus, even environments which behave randomly
are included in our space of environments.  This appears to be the
largest reasonable space of environments.  Indeed, no physical system
has ever been shown to lie outside of this set.  If such a physical
system were found, it would overturn the Church--Turing thesis and
alter our view of the universe.

How can we combine the agent's performance over all these
environments?  As there are infinitely many environments, we cannot
simply take a uniform distribution over them, since no uniform
probability distribution exists on a countably infinite set.  We must
therefore weight some environments more highly than others.  If we
consider the agent's perspective on the problem, this question is the
same as asking: Given several different hypotheses which are
consistent with the data, which hypothesis should be considered the
most likely?  This is a frequently occurring problem in inductive
inference where we must employ a philosophical principle to decide
which hypothesis is the most likely.  The most successful approach is
to invoke the principle of Occam's razor: Given multiple hypotheses
which are consistent with the data, the simplest should be preferred.
This is generally considered the rational and intelligent thing to do.

Consider, for example, the following type of question which commonly
appears in intelligence tests.  There is a sequence such as 2, 4, 6,
8, and the test subject needs to predict the next number.  Of course
the pattern is immediately clear: The numbers are increasing by 2 each
time.  An intelligent person would easily identify this pattern and
predict the next number to be 10.  However, the polynomial $2k^4
-20k^3 +70k^2 -98k +48$ is also consistent with the data, in which
case the next number in the sequence would be 58.  Why then do we
consider the first answer to be more likely?  It is because we use,
perhaps unconsciously, the principle of Occam's razor.  Furthermore,
the fact that the test defines this as the correct answer shows that
it too embodies the concept of Occam's razor.  Thus, although we don't
usually mention Occam's razor when defining intelligence, the ability
to effectively use Occam's razor is clearly a part of intelligent
behaviour.
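
Both hypotheses in the sequence example above are easy to check.  The sketch below also confirms the identity $2k^4 - 20k^3 + 70k^2 - 98k + 48 = 2k + 2(k-1)(k-2)(k-3)(k-4)$, which explains why the two rules agree on the first four terms and then diverge: the extra term vanishes for $k = 1, \ldots, 4$ and equals 48 at $k = 5$.  The function names are illustrative only.

```python
def linear(k: int) -> int:
    """The 'obvious' rule: the numbers increase by 2 each time."""
    return 2 * k


def quartic(k: int) -> int:
    """The polynomial 2k^4 - 20k^3 + 70k^2 - 98k + 48 from the text."""
    return 2 * k**4 - 20 * k**3 + 70 * k**2 - 98 * k + 48


print([linear(k) for k in range(1, 6)])   # [2, 4, 6, 8, 10]
print([quartic(k) for k in range(1, 6)])  # [2, 4, 6, 8, 58]

# The quartic is the linear rule plus a term that vanishes at k = 1, ..., 4.
assert all(
    quartic(k) == linear(k) + 2 * (k - 1) * (k - 2) * (k - 3) * (k - 4)
    for k in range(1, 50)
)
```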

Our formal measure of intelligence needs to reflect this.
Specifically, we need to test the agents in such a way that they are,
at least on average, rewarded for correctly applying Occam's razor.
Formally, this means that our a priori distribution over environments
should be weighted towards simpler environments.  The problem now
becomes: How should we measure the complexity of environments?

As each environment is computable, it can be represented by a program,
or more formally, a binary string $p \in \BBB^*$ on some prefix
universal Turing machine $\UU$.  Thus we can use Kolmogorov complexity
to measure the complexity of an environment $\mu$,
\[
K( \mu ) := \min_{p \in \BBB^*} \big\{ |p| : \UU(p)
\mathrm{\ computes\ } \mu \big\}.
\]
This measure is independent of the choice of $\UU$ up to an additive
constant that is independent of $\mu$; thus, we simply pick one
universal Turing machine $\UU$ and fix it.  The correct way to turn
this into a prior distribution is by taking $2^{-K(\mu)}$.  Because
the programs of a prefix machine form a prefix-free set, these weights
sum to at most 1 by the Kraft inequality.  This is known as the
algorithmic probability distribution and it has a number of important
properties, particularly in the context of universally optimal
learning agents.  See \cite{Li:97} or \cite{Hutter:04uaibook} for an
overview of Kolmogorov complexity and universal prior distributions.

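Kolmogorov complexity itself is incomputable, so the following Python
sketch only illustrates the Kraft-inequality step on a toy scale.  It
takes a small set of binary strings that stand in for shortest
programs and checks that the set is prefix-free.  It then verifies
that the weights $2^{-|p|}$ sum to at most one.  The environment names
and code words are hypothetical; they are not programs for any real
reference machine $\UU$.

\begin{verbatim}
```python
from fractions import Fraction
from itertools import combinations

# Hypothetical "shortest programs" for four toy environments.
programs = {
    "mu_1": "0",
    "mu_2": "10",
    "mu_3": "110",
    "mu_4": "1110",
}


def is_prefix_free(codes):
    """True if no code word is a prefix of another one."""
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(codes, 2))


# Weight each environment by 2^(-length of its program), exactly.
weights = {mu: Fraction(1, 2 ** len(p)) for mu, p in programs.items()}
total = sum(weights.values())

print(is_prefix_free(programs.values()))  # True
print(total)                              # 15/16
```
\end{verbatim}

Exact fractions are used so that the Kraft sum is not blurred by
floating-point rounding.  Adding the code word \texttt{1111} would
raise the total to exactly one.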

Putting this all together, we can now define our formal measure of
intelligence for arbitrary systems.  Let $E$ be the space of all
programs that compute environmental measures of summable reward with
respect to a prefix universal Turing machine $\UU$, and let $K$ be the
Kolmogorov complexity function.  The intelligence of an agent $\pi$ is
defined as
\[
\Upsilon(\pi) :=  \sum_{\mu \in E} 2^{-K(\mu)} V^{\pi}_{\mu} = V^{\pi}_{\xi},
\]
where $\xi := \sum_{\mu \in E} 2^{-K(\mu)} \mu$.  The second equality
follows from the linearity of $V^{\pi}_{\mu}$ in $\mu$.  Here
$\xi$~is the Solomonoff-Levin universal a priori distribution
generalised to reactive environments.

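The following Python sketch illustrates the definition of $\Upsilon$
on a deliberately tiny, hypothetical scale.  Instead of all computable
environments, it uses three one-step environments, each consisting of
a single action followed by a 0/1 reward.  The values standing in for
$K(\mu)$ are invented numbers, not true Kolmogorov complexities.
Under these assumptions, \texttt{upsilon} computes $\sum_\mu
2^{-K(\mu)} V^\pi_\mu$ directly.  \texttt{mixture\_value} evaluates
the same policy in the single mixture environment $\xi$, which makes
the linearity argument concrete.

\begin{verbatim}
```python
from fractions import Fraction

# Hypothetical one-step environments: each maps an action to the
# probability that the reward is 1 (otherwise the reward is 0).
# "K" is an invented stand-in for the complexity K(mu).
environments = {
    "always_left": {"K": 2, "p_reward": {"left": Fraction(1), "right": Fraction(0)}},
    "always_right": {"K": 3, "p_reward": {"left": Fraction(0), "right": Fraction(1)}},
    "coin": {"K": 4, "p_reward": {"left": Fraction(1, 2), "right": Fraction(1, 2)}},
}


def value(policy, p_reward):
    """Expected reward of a stochastic one-step policy."""
    return sum(prob * p_reward[a] for a, prob in policy.items())


def upsilon(policy):
    """Sum over environments of 2^(-K) times the policy's value."""
    return sum(Fraction(1, 2 ** env["K"]) * value(policy, env["p_reward"])
               for env in environments.values())


def mixture_value(policy):
    """Value in the mixture xi = sum_mu 2^(-K) mu (not normalised)."""
    xi = {a: sum(Fraction(1, 2 ** env["K"]) * env["p_reward"][a]
                 for env in environments.values())
          for a in ("left", "right")}
    return value(policy, xi)


uniform = {"left": Fraction(1, 2), "right": Fraction(1, 2)}
go_left = {"left": Fraction(1), "right": Fraction(0)}

for pi in (uniform, go_left):
    assert upsilon(pi) == mixture_value(pi)  # linearity of V in mu

print(upsilon(uniform), upsilon(go_left))  # 7/32 9/32
```
\end{verbatim}

The policy that always goes left scores higher than the uniformly
random one because it exploits the simplest, and hence most heavily
weighted, environment.  As with the universal prior, the weights here
sum to less than one.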

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Properties of the intelligence measure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

To better understand how this measure behaves, consider some example
agents.

\emph{A random agent.}  The agent with the lowest intelligence, at
least among those that are not actively trying to perform badly, would
be one that chooses its actions uniformly at random.  We will call
this agent $\pi^\mathtt{rand}$.  In general such an agent will not be
very successful, as it fails to exploit any regularities in the
environment, no matter how simple they are.  It follows that the
values of $V^{\pi^\mathtt{rand}}_\mu$ will typically be low compared
to those of other agents, and thus $\Upsilon (\pi^\mathtt{rand})$ will
be low.

\emph{A very specialised agent.}  From the equation for $\Upsilon$, we
see that an agent could have very low intelligence but still perform
extremely well at a few very specific and complex tasks.  Consider,
for example, IBM's Deep Blue chess supercomputer, which we will
represent by $\pi^\mathtt{dblue}$.  When $\mu^\mathtt{chess}$
describes the game of chess,
$V^{\pi^\mathtt{dblue}}_{\mu^\mathtt{chess}}$ is very high.  However,
$2^{-K(\mu^\mathtt{chess})}$ is small because chess is a complex
environment.  For $\mu \neq \mu^\mathtt{chess}$, the value function
will be low relative to other agents, as $\pi^\mathtt{dblue}$ only
plays chess.  Therefore, the value of $\Upsilon (\pi^\mathtt{dblue})$
will be very low.  Intuitively, this is because Deep Blue is too
inflexible and narrow to have general intelligence.

\emph{A general but simple agent.}  Imagine an agent that does very
basic learning by building up a table of observation and action pairs
and keeping statistics on the rewards that follow.  Each time an
observation that has been seen before occurs, the agent takes the
action with the highest estimated expected reward in the next cycle
with 90\% probability, or a random action with 10\% probability.  We
will call this agent $\pi^\mathtt{basic}$.  Many environments, both
complex and very simple, will have at least some structure that such
an agent can take advantage of.  Thus for almost all $\mu$ we will
have $V^{\pi^\mathtt{basic}}_\mu > V^{\pi^\mathtt{rand}}_\mu$, and so
$\Upsilon (\pi^\mathtt{basic}) > \Upsilon (\pi^\mathtt{rand})$.
Intuitively, this is what we would expect, as $\pi^\mathtt{basic}$,
while very simplistic, is surely more intelligent than
$\pi^\mathtt{rand}$.

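To make the comparison between $\pi^\mathtt{rand}$ and
$\pi^\mathtt{basic}$ concrete, the following Python sketch runs both
agents in a hypothetical ``echo'' environment of our own invention.
In each cycle a random symbol from $\{0,1,2\}$ is observed, and the
reward is 1 exactly when the action repeats that symbol.  The classes
\texttt{RandomAgent} and \texttt{BasicAgent} are illustrative
assumptions, not the paper's definitions.  In particular,
\texttt{BasicAgent} reads ``expected reward in the next cycle'' as the
average immediate reward.  It also treats untried actions
optimistically so that each one is tried at least once.

\begin{verbatim}
```python
import random
from collections import defaultdict

ACTIONS = (0, 1, 2)


class RandomAgent:
    """pi^rand: chooses actions uniformly at random and learns nothing."""

    def __init__(self, rng):
        self.rng = rng

    def act(self, obs):
        return self.rng.choice(ACTIONS)

    def update(self, obs, action, reward):
        pass


class BasicAgent:
    """Sketch of pi^basic: reward statistics per (observation, action),
    greedy with probability 0.9 on previously seen observations."""

    def __init__(self, rng, epsilon=0.1):
        self.rng = rng
        self.epsilon = epsilon           # probability of a random action
        self.total = defaultdict(float)  # (obs, action) -> summed reward
        self.count = defaultdict(int)    # (obs, action) -> times tried
        self.seen = set()                # observations encountered so far

    def estimate(self, obs, action):
        n = self.count[(obs, action)]
        # Untried actions are treated optimistically so each gets tried.
        return self.total[(obs, action)] / n if n else float("inf")

    def act(self, obs):
        if obs in self.seen and self.rng.random() >= self.epsilon:
            return max(ACTIONS, key=lambda a: self.estimate(obs, a))
        return self.rng.choice(ACTIONS)

    def update(self, obs, action, reward):
        self.seen.add(obs)
        self.total[(obs, action)] += reward
        self.count[(obs, action)] += 1


def average_reward(agent, steps=5000, seed=1):
    """Hypothetical echo environment: reward 1 for repeating the symbol."""
    env_rng, total = random.Random(seed), 0
    for _ in range(steps):
        obs = env_rng.choice(ACTIONS)
        action = agent.act(obs)
        reward = 1 if action == obs else 0
        agent.update(obs, action, reward)
        total += reward
    return total / steps


rand_score = average_reward(RandomAgent(random.Random(0)))
basic_score = average_reward(BasicAgent(random.Random(0)))
print(f"random: {rand_score:.2f}  basic: {basic_score:.2f}")
```
\end{verbatim}

In expectation, the random agent earns $1/3$ per cycle here.  Once the
basic agent has filled in its table, it earns about $0.9 + 0.1/3
\approx 0.93$.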

\emph{A simple agent with more history.}  A natural extension of
$\pi^\mathtt{basic}$ is to use a longer history of actions,
observations and rewards in its internal table.  Let
$\pi^\mathtt{2back}$ be the agent that builds a table of statistics
for the expected reward conditioned on the last two actions, rewards
and observations.  By definition, $\pi^\mathtt{2back}$ is a
generalisation of $\pi^\mathtt{basic}$ and thus can adapt to any
regularity that $\pi^\mathtt{basic}$ can adapt to.  It follows that
in general $V^{\pi^\mathtt{2back}}_\mu > V^{\pi^\mathtt{basic}}_\mu$,
and so $\Upsilon (\pi^\mathtt{2back}) > \Upsilon
(\pi^\mathtt{basic})$, as we would intuitively expect.

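The following Python sketch shows one environment in which the extra
history pays off.  The hypothetical class \texttt{HistoryAgent} keeps
reward statistics indexed by the current observation together with
the last $k$ (observation, action, reward) triples.  With $k = 0$ it
is a version of $\pi^\mathtt{basic}$; with $k = 2$ it is a version of
$\pi^\mathtt{2back}$.  The test environment is our own invention: the
observation is always 0, and a reward is given exactly when the action
differs from the previous one.  This regularity depends on the
previous action, so an agent that conditions only on the current
observation cannot see it.  The 10\% exploration rate and the
optimistic treatment of untried actions are illustrative choices.

\begin{verbatim}
```python
import random
from collections import defaultdict, deque

ACTIONS = (0, 1)


class HistoryAgent:
    """Reward statistics indexed by the current observation plus the
    last k (observation, action, reward) triples."""

    def __init__(self, k, rng, epsilon=0.1):
        self.k = k
        self.rng = rng
        self.epsilon = epsilon           # probability of a random action
        self.history = deque(maxlen=k)   # last k (obs, action, reward)
        self.total = defaultdict(float)  # (context, action) -> summed reward
        self.count = defaultdict(int)    # (context, action) -> times tried
        self.seen = set()                # contexts encountered so far

    def context(self, obs):
        return (tuple(self.history), obs)

    def estimate(self, ctx, action):
        n = self.count[(ctx, action)]
        # Untried actions are treated optimistically so each gets tried.
        return self.total[(ctx, action)] / n if n else float("inf")

    def act(self, obs):
        ctx = self.context(obs)
        if ctx in self.seen and self.rng.random() >= self.epsilon:
            return max(ACTIONS, key=lambda a: self.estimate(ctx, a))
        return self.rng.choice(ACTIONS)

    def update(self, obs, action, reward):
        ctx = self.context(obs)
        self.seen.add(ctx)
        self.total[(ctx, action)] += reward
        self.count[(ctx, action)] += 1
        if self.k > 0:
            self.history.append((obs, action, reward))


def run(agent, steps=5000):
    """Hypothetical environment: observation always 0, reward 1 exactly
    when the action differs from the previous action."""
    prev_action, total = None, 0
    for _ in range(steps):
        obs = 0
        action = agent.act(obs)
        reward = 1 if prev_action is not None and action != prev_action else 0
        agent.update(obs, action, reward)
        prev_action, total = action, total + reward
    return total / steps


basic = run(HistoryAgent(k=0, rng=random.Random(0)))
two_back = run(HistoryAgent(k=2, rng=random.Random(0)))
print(f"basic: {basic:.2f}  2back: {two_back:.2f}")
```
\end{verbatim}

For $k = 0$, a greedy step that earns a reward only makes the chosen
action look better, so the next greedy step repeats it and earns
nothing.  This caps that agent's average well below the $k = 2$
agent's.  The $k = 2$ agent learns to alternate and earns roughly
$0.9 + 0.1/2$ per cycle.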

In a similar way, agents of increasing complexity and adaptability can
be defined which will have still greater intelligence.  However, with
more complex agents it is usually difficult to establish theoretically
whether one agent has more or less intelligence than another.
Nevertheless, it is hopefully clear from these simple examples that
the more flexible and powerful an agent is, the higher its machine
intelligence.

\emph{A human.}  For extremely simple environments, a human should be
able to identify their simple structure and exploit it to maximise
reward.  For more complex environments, however, it is hard to know
how well a human would perform without experimental results.

\emph{Super-human intelligence.}  It can be easily proven that the
theoretical AIXI agent \cite{Hutter:04uaibook} is the maximally
intelligent agent with respect to $\Upsilon$.  AIXI has been proven to
have many universal optimality properties, including being Pareto
optimal and self-optimising in any environment in which this is
possible for a general agent.  Thus it is clear that agents with very
high $\Upsilon$ must be extremely powerful.

In addition to sensibly ordering many simple learning agents, this
formal definition has many significant and desirable properties:

\emph{Valid}.  The most important property of a measure of
intelligence is that it does indeed measure ``intelligence''.  As
$\Upsilon$ formalises a mainstream informal definition, we believe
that it is a valid measure.

\emph{Meaningful}.  An agent with a high $\Upsilon$ value must
perform well over a very wide range of environments; in particular, it
must perform well in almost all simple environments.  If such an agent
existed, it would clearly be very powerful and practically useful.
The measure also sensibly orders the intelligence of simple learning
agents.

\emph{Repeatable}.  We can test an agent repeatedly using $\Upsilon$
without problem.  This is because the measure is defined across all
well-defined environments, not just a specific test subset which an
agent might adapt to.

\emph{Absolute}.  $\Upsilon$ gives us a single real absolute value,
unlike the pass-fail Turing test \cite{Turing:50}.  This is important
if we want to distinguish between similar learning algorithms that are
not close to human-level intelligence.

\emph{Wide range}.  As we have seen, $\Upsilon$ can measure
performance from extremely simple agents right up to the super-powerful
AIXI agent.  Other tests cannot handle such an enormous range.

\emph{General}.  The test is clearly non-specific to the
implementation of the agent, as the inner workings of the agent are
left completely undefined.  It is also very general in terms of what
senses or actuators the agent might have, as all information exchanged
between the agent and the environment takes place over basic
Shannon-like communication channels.

\emph{Dynamic}.  One aspect of our test of intelligence is that it
is, in the terminology of intelligence testing, a highly dynamic test
\cite{Sternberg:02}.  Normally, intelligence tests for humans only
test the ability to solve one-off problems.  There are no dynamic
aspects to the test where the test subject has to interact with
something and learn and adapt their behaviour accordingly.  This makes
it very hard to test things like an individual's ability to quickly
pick up new skills and adapt to new situations.  One way to overcome
these problems is to use more sophisticated dynamic tests, in which an
active tester constantly interacts with the test subject, much like
what happens in our formal intelligence measure.

\emph{Unbiased}.  The test is not weighted towards ability in certain
specific kinds of areas or problems; rather, it is simply weighted
towards simpler environments, no matter what they are.

\emph{Fundamental}.  The test is based on the theory of information,
Turing computation and complexity theory.  These are all fundamental
ideas which are likely to remain very stable over time irrespective of
changes in technology.

\emph{Formal}.  Unlike many tests of intelligence, $\Upsilon$ is
completely and formally specified in mathematical terms.

\emph{Objective}.  Unlike the Turing test, which requires a panel of
judges to decide whether an agent is intelligent, $\Upsilon$ is free
of such subjectivity.

Our definition of intelligence also has some weaknesses.  One is that
the environmental distribution $2^{-K(\mu)}$ we have used is invariant
to changes in the reference machine $\UU$ only up to a multiplicative
constant.  While this affords us some protection, it still means that
the relative intelligence of agents can change if we change our
reference machine.  One approach to this problem might be to limit the
complexity of the reference machine, for example by limiting its
state-symbol complexity.  We expect that for highly intelligent
machines that can deal with a wide range of environments of varying
complexity, the effect of changing from one simple reference machine
to another will be minor.  For agents which are less complex than the
reference machine, however, such a change could be significant.

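A short derivation shows how large this effect can be.  Write
$K_{\UU}$ and $K_{\UU'}$ for the complexities with respect to two
reference machines, and $\Upsilon_{\UU}$, $\Upsilon_{\UU'}$ for the
corresponding intelligence measures.  By the invariance theorem there
is a constant $c := c_{\UU\UU'}$, independent of $\mu$, with
$|K_{\UU}(\mu) - K_{\UU'}(\mu)| \le c$.  Hence, for every $\mu \in E$,
\[
2^{-c}\, 2^{-K_{\UU'}(\mu)} \le 2^{-K_{\UU}(\mu)} \le 2^{c}\, 2^{-K_{\UU'}(\mu)}.
\]
Assuming non-negative rewards, so that $V^{\pi}_{\mu} \ge 0$, we can
multiply by $V^{\pi}_{\mu}$ and sum over $\mu \in E$ to obtain
\[
2^{-c}\, \Upsilon_{\UU'}(\pi) \le \Upsilon_{\UU}(\pi) \le 2^{c}\, \Upsilon_{\UU'}(\pi).
\]
Suppose two agents' $\Upsilon$ values differ by less than a factor of
$2^{2c}$ under one machine.  Then this bound no longer rules out their
being ranked in the opposite order under the other machine.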

A theoretical problem is that our distribution over environments is
not computable.  While this is fine for a theoretical definition of
intelligence, it makes the measure impossible to implement directly.
One solution is to use a more tractable measure of complexity, such as
Levin's $Kt$ complexity \cite{Levin:73search} or Schmidhuber's Speed
prior \cite{Schmidhuber:02speed}.  Both of these consider the
complexity of an algorithm to be determined by both its description
length and its running time.  This also makes good intuitive sense,
because we would not usually consider a very short algorithm that
takes an enormous amount of time to compute to be a particularly
simple one.

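For a string $x$, Levin's complexity is usually written $Kt(x) :=
\min_p \{ |p| + \log_2 t(p) : \UU(p) \mbox{ outputs } x \mbox{ within
} t(p) \mbox{ steps} \}$.  The following Python sketch compares this
cost with plain description length on a hypothetical list of candidate
programs that all produce the same output.  The lengths and running
times are invented for illustration.  Both minima are also taken only
over the listed candidates, not over all programs.  The sketch shows
how a short but extremely slow program stops being the simplest once
running time is charged.

\begin{verbatim}
```python
import math

# Hypothetical candidates producing the same output x:
# (name, description length in bits, running time in steps).
candidates = [
    ("short_but_slow", 20, 2 ** 60),
    ("longer_but_fast", 45, 2 ** 10),
]


def description_length(candidate):
    """Plain length |p|, the quantity minimised by K."""
    return candidate[1]


def levin_cost(candidate):
    """Levin's cost |p| + log2 t(p), the quantity minimised by Kt."""
    _, length, steps = candidate
    return length + math.log2(steps)


k_choice = min(candidates, key=description_length)
kt_choice = min(candidates, key=levin_cost)

print("simplest under K :", k_choice[0], description_length(k_choice))
print("simplest under Kt:", kt_choice[0], levin_cost(kt_choice))
```
\end{verbatim}

Here the short program costs $20 + 60 = 80$ under $Kt$, while the
longer but faster one costs only $45 + 10 = 55$.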

The only closely related work to ours is the C-Test
\cite{Hernandez:00btt}.  While our intelligence measure is fully
dynamic and interactive, the C-Test is a purely static sequence
prediction test similar to standard IQ tests for humans.  The C-Test
always ensures that each question has an unambiguous answer, in the
sense that there is always one consistent hypothesis with
significantly lower complexity than the alternatives.  Perhaps this is
useful for some kinds of tests, but we believe that it is unrealistic
and limiting.  Like our intelligence test, the C-Test also has to deal
with the problem of the incomputability of Kolmogorov complexity.  By
using Levin's $Kt$ complexity, the C-Test was able to compute a number
of test problems which were used to test humans.  The ``compression
test'' \cite{Mahoney:99} for machine intelligence is similarly
restricted to sequence prediction.  We consider the linguistic
complexity tests of Treister-Goren et~al.\ to be far too narrow.
The psychometric approach of Bringsjord and Schimanski is only
appropriate if the machine has a sufficiently human-like intelligence.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusions}\label{sec:conc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Despite the obvious significance of formal definitions of intelligence
for research, and calls for more direct measures of machine
intelligence to replace the problematic Turing test and other
imitation-based tests \cite{Johnson:92}, very little work has been
done in this area.  In this paper we have attempted to tackle this
problem head on.  Although the test has a few weaknesses, it also has
many unique strengths.  In particular, we believe that it expresses
the essentials of machine intelligence in an elegant and powerful way.
Furthermore, more tractable measures of complexity should lead to
practical tests based on this theoretical model.

\subsection*{Acknowledgments}

This work was supported by SNF grant 200020-107616.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%         Bibliography        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{small}
\begin{thebibliography}{}\parskip=-0.2ex

\bibitem[Bin37]{Bingham:37}
W.~V. Bingham.
\newblock {\em Aptitudes and aptitude testing}.
\newblock Harper \& Brothers, New York, 1937.

\bibitem[Got97]{Gottfredson:97msoi}
L.~S. Gottfredson.
\newblock Mainstream science on intelligence: An editorial with 52 signatories,
  history, and bibliography.
\newblock {\em Intelligence}, 24(1):13--23, 1997.

\bibitem[HO00]{Hernandez:00btt}
J.~Hern{\'a}ndez-Orallo.
\newblock Beyond the {T}uring test.
\newblock {\em Journal of Logic, Language and Information}, 9(4):447--466,
  2000.

\bibitem[Hut04]{Hutter:04uaibook}
M.~Hutter.
\newblock {\em Universal Artificial Intelligence: Sequential Decisions based on
  Algorithmic Probability}.
\newblock Springer, Berlin, 2004.
\newblock 300 pages, http://www.idsia.ch/$_{^{\sim}}$marcus/ai/uaibook.htm.

\bibitem[Joh92]{Johnson:92}
W.~L. Johnson.
\newblock Needed: {A} new test of intelligence.
\newblock {\em SIGART Newsletter (ACM Special Interest Group on
  Artificial Intelligence)}, 3, 1992.

\bibitem[Lev73]{Levin:73search}
L.~A. Levin.
\newblock Universal sequential search problems.
\newblock {\em Problems of Information Transmission}, 9:265--266, 1973.

\bibitem[LH05]{Legg:05iors}
S.~Legg and M.~Hutter.
\newblock A universal measure of intelligence for artificial agents.
\newblock In {\em Proc. 21st International Joint Conf. on Artificial
  Intelligence ({IJCAI-2005})}, number IDSIA-04-05, pages 1509--1510,
  Edinburgh, 2005.

\bibitem[LV97]{Li:97}
M.~Li and P.~M.~B. Vit\'anyi.
\newblock {\em An introduction to {Kolmogorov} complexity and its
  applications}.
\newblock Springer, 2nd edition, 1997.

\bibitem[Mah99]{Mahoney:99}
M.~V. Mahoney.
\newblock Text compression as a test for artificial intelligence.
\newblock In {\em {AAAI}/{IAAI}}, 1999.

\bibitem[SB98]{Sutton:98}
R.~Sutton and A.~Barto.
\newblock {\em Reinforcement learning: An introduction}.
\newblock MIT Press, Cambridge, MA, 1998.

\bibitem[Sch02]{Schmidhuber:02speed}
J.~Schmidhuber.
\newblock The {Speed Prior:} a new simplicity measure yielding near-optimal
  computable predictions.
\newblock In {\em Proc. 15th Annual Conference on Computational Learning Theory
  (COLT 2002)}, Lecture Notes in Artificial Intelligence, pages 216--228,
  Sydney, Australia, July 2002. Springer.

\bibitem[SG02]{Sternberg:02}
R.~J. Sternberg and E.~L. Grigorenko, editors.
\newblock {\em Dynamic Testing: {T}he nature and measurement of learning
  potential}.
\newblock Cambridge University Press, 2002.

\bibitem[Ste00]{Sternberg:00}
R.~J. Sternberg, editor.
\newblock {\em Handbook of Intelligence}.
\newblock Cambridge University Press, 2000.

\bibitem[Tur50]{Turing:50}
A.~M. Turing.
\newblock Computing machinery and intelligence.
\newblock {\em Mind}, 59(236):433--460, October 1950.

\bibitem[Wec58]{Wechsler:58}
D.~Wechsler.
\newblock {\em The measurement and appraisal of adult intelligence}.
\newblock Williams \& Wilkins, Baltimore, 4th edition, 1958.

\end{thebibliography}
\end{small}

\end{document}

%--------------------End-of-IOR.tex---------------------------%