1:
2: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
3: %% A Formal Measure of Machine Intelligence %%
4: %% Shane Legg and Marcus Hutter (2005) %%
5: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
6:
7: \documentclass[twocolumn]{article}
8: \usepackage[dvips]{graphicx}
9: \usepackage{latexsym,amssymb}
10: \topmargin=-15mm \oddsidemargin=0mm \evensidemargin=0mm
11: \textwidth=16cm \textheight=24cm
12:
\def\AA{\mathcal A} \def\OO{\mathcal O} \def\RR{\mathcal R} \def\UU{\mathcal U} \def\PP{\mathcal P}
14: \def\NNN{\mathbb N} \def\QQQ{\mathbb Q} \def\BBB{\mathbb B}
15: \def\E{\mathbf{E}}
16:
17: \begin{document}
18:
19: \title{\vspace{-3ex}\normalsize\sc Technical Report \hfill IDSIA-10-06
20: \vskip 2mm\bf\Large\hrule height5pt \vskip 6mm
21: A Formal Measure of Machine Intelligence
22: \vskip 6mm \hrule height2pt}
23: \author{{\bf Shane Legg} and {\bf Marcus Hutter}\\[3mm]
24: \normalsize IDSIA, Galleria 2, CH-6928\ Manno-Lugano, Switzerland\\
25: \normalsize \{shane,marcus\}@idsia.ch \hspace{9ex} http://www.idsia.ch/ }
26: \date{14 April 2006}
27: \maketitle
28:
29: \begin{abstract}
30: A fundamental problem in artificial intelligence is that nobody really
31: knows what intelligence is. The problem is especially acute when we
32: need to consider artificial systems which are significantly different
33: to humans. In this paper we approach this problem in the following
34: way: We take a number of well known informal definitions of human
35: intelligence that have been given by experts, and extract their
36: essential features. These are then mathematically formalised to
37: produce a general measure of intelligence for arbitrary machines. We
38: believe that this measure formally captures the concept of machine
39: intelligence in the broadest reasonable sense.
40: \end{abstract}
41:
42: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
43: \section{Introduction}
44: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
45:
46: Most of us think that we recognise intelligence when we see it, but we
47: are not really sure how to precisely define or measure it. We
48: informally judge the intelligence of others by relying on our past
49: experiences in dealing with people. Naturally, this naive approach is
50: highly subjective and imprecise. A more principled approach would be
51: to use one of the many standard intelligence tests that are available.
52: Contrary to popular wisdom, these tests, when correctly applied by a
53: professional, deliver statistically consistent results and have
54: considerable power to predict the future performance of individuals in
55: many mentally demanding tasks. However, while these tests work well
56: for humans, if we wish to measure the intelligence of other things,
57: perhaps of a monkey or a new machine learning algorithm, they are
58: clearly inappropriate.
59:
60: One response to this problem might be to develop specific kinds of
tests for specific kinds of entities, just as intelligence tests for
children differ from intelligence tests for adults. While this works
63: well when testing humans of different ages, it comes undone when we
64: need to measure the intelligence of entities which are profoundly
65: different to each other in terms of their cognitive capacities, speed,
66: senses, environments in which they operate, and so on. To measure the
67: intelligence of such diverse systems in a meaningful way we must step
68: back from the specifics of particular systems and establish the
69: underlying fundamentals of what it is that we are really trying to
70: measure. That is, we need to establish a notion of intelligence that
71: goes beyond the specifics of particular kinds of systems.
72:
73: The difficulty of doing this is readily apparent. Consider, for
74: example, the memory and numerical computation tasks that appear in
75: some intelligence tests and which were once regarded as defining
76: hallmarks of human intelligence. We now know that these tasks are
77: absolutely trivial for a machine and thus do not test the machine's
78: intelligence. Indeed even the mentally demanding task of playing
79: chess has been largely reduced to brute force search. As technology
80: advances, our concept of what intelligence is continues to evolve with
81: it.
82:
83: How then are we to develop a concept of intelligence that is
84: applicable to all kinds of systems? Any proposed definition must
85: encompass the essence of human intelligence, as well as other
86: possibilities, in a consistent way. It should not be limited to any
87: particular set of senses, environments or goals, nor should it be
88: limited to any specific kind of hardware, such as silicon or
89: biological neurons. It should be based on principles which are
90: sufficiently fundamental so as to be unlikely to alter over time.
91: Furthermore, the intelligence measure should ideally be formally
92: expressed, objective, and practically realisable.
93:
94: This paper approaches this problem in the following way. In
95: \emph{Section \ref{sec:infint}} we consider a range of definitions of
96: human intelligence that have been put forward by well known
97: psychologists. From these we extract the most common and essential
98: features and use them to create an informal definition of
99: intelligence. \emph{Section \ref{sec:aef}} then introduces the
100: framework which we use to construct our formal measure of
101: intelligence. This framework is formally defined in \emph{Section
102: \ref{sec:formframe}}. In \emph{Section \ref{sec:ior}} we use our
103: developed formalism to produce a formal definition of intelligence.
104: \emph{Section \ref{sec:conc}} closes with a short summary.
105:
106: A preliminary sketch of the ideas in this paper appeared in the poster
107: \cite{Legg:05iors}. It can be shown that the intelligence measure
108: presented here is in fact a variant of the Intelligence Order Relation
109: that appears in the theory of AIXI, the provably optimal universal
110: agent \cite{Hutter:04uaibook}. A long journal version of this paper
111: is being written in which we give the proposed measure of machine
112: intelligence and its relation to other such tests a much more
113: comprehensive treatment.
114:
115: Naturally, we expect such a bold initiative to be met with resistance.
116: However, we hope that the reader will appreciate the value of our
117: approach: With a formally precise definition put forward we aim to
118: better our understanding of what is a notoriously subjective and
119: slippery concept.
120:
121: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
122: \section{The concept of intelligence}\label{sec:infint}
123: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
124:
125: Although definitions of human intelligence given by experts in the
126: field vary, most of their views cluster around a few common
127: perspectives. Perhaps the most common perspective, roughly stated, is
128: to think of intelligence as being the ability to successfully operate
129: in uncertain environments by learning and adapting based on
130: experience. The following often quoted definitions, which can be
131: found in \cite{Sternberg:00}, \cite{Wechsler:58}, \cite{Bingham:37}
132: and \cite{Gottfredson:97msoi}, all express this notion of intelligence
133: but with different emphasis in each case:
134:
135: \begin{itemize}
136:
137: \item ``The capacity to learn or to profit by experience.''
138: \mbox{--~W. F. Dearborn}
139:
140: \item ``Ability to adapt oneself adequately to relatively new situations in
141: life.''
--~R. Pintner
143:
144: \item ``A person possesses intelligence insofar as he has learned, or can
145: learn, to adjust himself to his environment.''
146: --~S. S. Colvin
147:
148: \item ``We shall use the term `intelligence' to mean the ability of an
149: organism to solve new problems\ldots.''
150: \mbox{--~W. V. Bingham}
151:
152: \item ``A global concept that involves an individual's ability to
153: act purposefully, think rationally, and deal effectively with
154: the environment.''
155: \mbox{-- D. Wechsler}
156:
157: \item ``Intelligence is a very general mental capability that, among
158: other things, involves the ability to reason, plan, solve problems,
159: think abstractly, comprehend complex ideas, learn quickly and learn
160: from experience.''
161: \mbox{--~L. S. Gottfredson and 52 expert signatories}
162:
163: \end{itemize}
164:
165: These definitions have certain common features; in some cases they are
166: explicitly stated, while in others they are more implicit. Perhaps
167: the most elementary feature is that intelligence is seen as a property
168: of an entity which is interacting with an external environment,
169: problem or situation. Indeed this much is common to practically all
170: proposed definitions of intelligence. As we will be referring back to
171: these concepts regularly, we will refer to the entity whose
172: intelligence is in question as the \emph{agent}, and the external
173: environment, problem or situation that it faces as the
174: \emph{environment}. An environment could be a large complex world in
175: which the agent exists, similar to the usual meaning, or something as
176: narrow as a game of tic-tac-toe.
177:
178: The second common feature of these definitions is that an agent's
179: intelligence is related to its ability to succeed in an environment.
180: This implies that the agent has some kind of an objective. Perhaps we
181: could consider an agent intelligent, in an abstract sense, without
having any objective. However, without any objective whatsoever, the
183: agent's intelligence would have no observable consequences.
184: Intelligence then, at least the concrete kind that interests us, comes
185: into effect when an agent has an objective to apply its intelligence
186: to. Here we will refer to this as its \emph{goal}.
187:
188: The emphasis on learning, adaption and experience in these definitions
189: implies that the environment is not fully known to the agent and may
190: contain surprises and new situations which could not have been
191: anticipated in advance. Thus intelligence is not the ability to deal
192: with one fixed and known environment, but rather the ability to deal
193: with some range of possibilities which cannot be wholly anticipated.
194: This means that an intelligent agent may not be the best possible in
195: any specific environment, particularly before it has had sufficient
196: time to learn. What is important is that the agent is able to learn
197: and adapt so as to perform well over a wide range of specific
198: environments.
199:
200: Although there is a great deal more to this topic than we have
201: presented here, the above brief analysis gives us the necessary
202: building blocks for our informal working definition of intelligence:
203: \begin{quote}
204: \emph{Intelligence measures an agent's ability to achieve goals in a
205: wide range of environments.}
206: \end{quote}
207:
208: We realise that some researchers who study intelligence will take
209: issue with this definition. Given the diversity of views on the
210: nature of intelligence, a debate which is still being fought, this is
211: unavoidable. Nevertheless, we are confident that our proposed
212: informal working definition is fairly mainstream. We also believe
213: that our definition captures what we are interested in achieving in
214: machines: A very general and flexible capacity to succeed when faced
215: with a wide range of problems and situations. Even those who
216: subscribe to different perspectives on the nature and correct
217: definition of intelligence will surely agree that this is a central
218: objective for anyone wishing to extend the power and usefulness of
219: machines. It is also a definition that can be successfully
220: formalised.
221:
222: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
223: \section{The agent-environment framework}\label{sec:aef}
224: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
225:
226: In the previous section we identified three essential components for
227: our model of intelligence: An agent, an environment, and a goal.
228: Clearly, the agent and the environment must be able to interact with
229: each other; specifically, the agent needs to be able to send signals
230: to the environment and also receive signals being sent from the
environment. Similarly, the environment must be able to receive signals
from, and send signals to, the agent. In our terminology we will adopt the
233: agent's perspective on these communications and refer to the signals
234: from the agent to the environment as \emph{actions}, and the signals
235: from the environment as \emph{perceptions}.
236:
237: What is missing from this setup is the goal. As discussed in the
238: previous section, our definition of an agent's intelligence requires
239: there to be some kind of goal for the agent to try to achieve. This
240: implies that the agent somehow knows what the goal is. One
241: possibility would be for the goal to be known in advance and for this
242: knowledge to be built into the agent. The problem with this however
243: is that it limits each agent to just one goal. We need to allow
244: agents which are more flexible than this.
245:
246: If the goal is not known in advance, the other alternative is to
247: somehow inform the agent of what the goal is. For humans this is
248: easily done using language. In general however, the possession of a
249: sufficiently high level of language is too strong an assumption to
250: make about the agent. Indeed, even for something as intelligent as a
251: dog or a cat, direct explanation will obviously not work.
252:
253: Fortunately there is another possibility. We can define an additional
254: communication channel with the simplest possible semantics: A signal
255: that indicates how good the agent's current situation is. We will
256: call this signal the \emph{reward}. The agent's goal is then simply
257: to maximise the amount of reward it receives, so in a sense its goal
258: is fixed. This is not limiting though as we have not said anything
259: about what causes different levels of reward to occur. In a complex
260: setting the agent might be rewarded for winning a game or solving a
261: difficult puzzle. From a broad perspective then, the goal is
262: flexible. If the agent is to succeed in its environment, that is,
263: receive a lot of reward, it must learn about the structure of the
264: environment and in particular what it needs to do in order to get
265: reward.
266:
267: \begin{figure}[t]
268: \centerline{\includegraphics[width=0.77\columnwidth]{agent-env.eps}}
269: \caption{\label{agent-env}The agent and the environment interact by
270: sending action, observation and reward signals to each other.}
271: \end{figure}
272:
273: Not surprisingly, this is exactly the way in which we condition an
274: animal to achieve a goal: by selectively rewarding certain behaviours.
275: In a narrow sense the animal's goal is fixed, perhaps to get more
276: treats to eat, but in a broader sense this may require doing a trick
277: or solving a puzzle.
278:
279: In our framework we will include the reward signal as a part of the
280: perception generated by the environment. The perceptions also contain
281: a non-reward part, which we will refer to as \emph{observations}.
282: This now gives us the complete system of interacting agent and
283: environment in Figure~\ref{agent-env}. The goal, in the broad
284: flexible sense, is implicitly defined by the environment as this is
285: what defines when rewards are generated. Thus in this framework, to
286: test an agent in any given way, it is sufficient to fully define the
287: environment.
288:
289: In artificial intelligence, this framework is used in the area of
290: reinforcement learning \cite{Sutton:98}. By appropriately renaming
291: things, it also describes the controller-plant framework used in
292: control theory. It is a widely used and very general structure that
293: can describe seemingly any kind of learning or control problem. The
294: interesting point for us is that this type of framework follows
295: naturally from our informal definition of intelligence. The only
296: difficulty was how to deal with the notion of success, or profit.
297: This requires the existence of some kind of objective or goal, and the
298: most flexible and elegant way to bring this into our framework is by
299: using a simple reward signal.
300:
301:
302: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
303: \section{A formal framework for intelligence} \label{sec:formframe}
304: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
305:
306: Having made the basic framework explicit, we can now formalise things.
307: See \cite{Hutter:04uaibook} for a more complete technical description
308: along with many more example agents and environments.
309:
310: The agent sends information to the environment by sending
311: \emph{symbols} from some finite set, for example, $\AA := \{ left,
312: right, forwards, backwards \}$. We will call this set the
313: \emph{action space} and denote it by $\AA$. Similarly, the
314: environment sends signals to the agent with symbols from a finite set
315: called the \emph{perception space}, which we will denote $\PP$. The
316: \emph{reward space}, denoted by $\RR$, will always be a finite subset
of the rational unit interval $[0,1] \cap \QQQ$. Every perception
consists of two separate parts: an observation, drawn from a finite
\emph{observation space} $\OO$, and a reward from $\RR$. For
319: example, we might have $\PP := \{ (cold, 0.0), (warm, 1.0), (hot,
320: 0.3), (roasting, 0.0) \}$.
321:
322: To denote symbols being sent we will use the lower case variable names
323: $a$, $o$ and $r$ for actions, observations and rewards respectively.
324: We will also index these in the order in which they occur, thus $a_1$
325: is the agent's first action, $a_2$ is the second action and so on.
326: The agent and the environment will take turns at sending symbols,
327: starting with the environment. This produces a history of
observations, rewards and actions which we will denote by $o_1 r_1
329: a_1 o_2 r_2 a_2 o_3 r_3 a_3 o_4 \ldots$. Our restriction to finite
330: action and perception spaces is deliberate as an agent should not be
331: able to receive or generate information without bound in a single
332: cycle in time. Of course, the action and perception spaces can still
333: be extremely large, if required.
334:
335: Formally, the agent is a function, denoted by $\pi$, which takes the
336: current history as input and chooses the next action as output. A
337: convenient way of representing the agent is as a probability measure
over actions conditioned on the current history. Thus $\pi( a_2 | o_1
r_1 a_1 o_2 r_2 )$ is the probability of action $a_2$ in the second
340: cycle, given that the current history is $o_1 r_1 a_1 o_2 r_2$. A
341: deterministic agent is simply one that always assigns a probability of
342: 1 to some action for any given history. How the agent produces the
343: distribution over actions for any given history is left completely
344: open. Of course in artificial intelligence the agent will be a
345: machine and so $\pi$ will be a computable function.
346:
347: The environment, denoted $\mu$, is defined in a similar way.
348: Specifically, for any $k \in \NNN$ the probability of $o_k r_k$, given
349: the current history $o_1 r_1 a_1 \ldots o_{k-1} r_{k-1} a_{k-1}$, is
350: $\mu( o_k r_k | o_1 r_1 a_1 \ldots o_{k-1} r_{k-1} a_{k-1} )$. For
351: the moment we will not place any further restrictions on the
352: environment.
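
As a small illustration of how $\pi$ and $\mu$ combine (our own
unrolling of the two definitions above, with $\Pr$ as ad hoc notation
for the induced distribution over histories), the probability of the
first two perceptions and the first action is the product of the
conditional probabilities, taken in the order in which the symbols are
sent,
\[
  \Pr( o_1 r_1 a_1 o_2 r_2 ) = \mu( o_1 r_1 ) \, \pi( a_1 | o_1 r_1 ) \,
  \mu( o_2 r_2 | o_1 r_1 a_1 ).
\]
Longer histories extend this product by one factor per turn,
alternating between $\mu$ and $\pi$.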
353:
354: Our next task is to formalise the idea of ``profit'' or ``success''
355: for an agent. Informally, we know that the agent must try to maximise
the amount of reward it receives; however, this could mean several
357: different things.
358:
359: \vspace{1em}
360:
361: \noindent {\bf Example.} Define the reward space $\RR := \{ 0, 1 \}$, an action
362: space $\AA := \{ 0, 1 \}$ and an observation space that just contains
363: the null string, $\OO := \{ \varepsilon \}$. Now define a simple
364: environment,
365: \[
366: \mu( r_k | o_1 \ldots a_{k-1} ) := 1 - | r_k - a_{k-1} |.
367: \]
As the agent always gets a reward equal to its previous action, the
optimal agent for this environment is clearly $\pi_{opt} ( a_k | o_1
\ldots r_k ) := a_k$, which chooses action 1 with probability 1.
Consider now two other agents for this environment,
$\pi_1 ( a_k | o_1 \ldots r_k ) := \frac{1}{2}$ and
372: \begin{displaymath}
373: \pi_2( a_k | o_1 \ldots r_k ) := \left\{
374: \begin{array}{ll}
375: 1 & \mathrm{for\ } a_k = 0 \land k \leq 100,\\
376: 1 & \mathrm{for\ } a_k = 1 \land 100 < k \leq 5000,\\
377: \frac{1}{2} & \mathrm{for\ } 5000 < k,\\
378: 0 & \mathrm{otherwise}.
379: \end{array} \right.
380: \end{displaymath}
381:
382:
For $1 \leq k \leq 100$ the expected reward per cycle for $\pi_1$ is
higher than it is for $\pi_2$. Thus in the short term $\pi_1$ is the
more successful. On the other hand, for $100 < k \leq 5000$, $\pi_2$
has switched to the optimal strategy of always choosing action 1,
while $\pi_1$ has not. Thus in the medium term
$\pi_2$ is more successful. Finally, for $k > 5000$, both agents use
random actions and thus in the limit they are equally successful.
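
These comparisons can be made explicit with a short calculation; the
symbol $S_n$ is our own shorthand and not part of the formal
framework. As the reward $r_{k+1}$ equals the action $a_k$, the
expected reward following cycle $k$ is just the probability that the
agent chooses action 1 in that cycle. Writing $S_n(\pi) := \E \left(
\sum_{k=1}^{n} r_{k+1} \right)$ for the expected total reward earned
by the first $n$ actions, we get $S_n(\pi_1) = \frac{n}{2}$ and
\begin{displaymath}
  S_n(\pi_2) = \left\{
  \begin{array}{ll}
   0 & \mathrm{for\ } n \leq 100,\\
   n - 100 & \mathrm{for\ } 100 < n \leq 5000,\\
   \frac{n}{2} + 2400 & \mathrm{for\ } 5000 < n.
  \end{array} \right.
\end{displaymath}
Thus $\pi_1$ leads for $n < 200$, the two agents are level at $n =
200$, and $\pi_2$ is ahead thereafter. Beyond $n = 5000$ the lead
stays fixed at 2400, so the difference in average reward per cycle,
$2400/n$, tends to zero.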
390:
391: Which is the better agent? If you want to maximise short term
392: rewards, it is agent $\pi_1$. If you want to maximise medium term
393: rewards, then it is agent $\pi_2$. And if you only care about the
394: long run, both agents are equally successful. Which agent you prefer
395: depends on your temporal preferences, something which is currently
396: outside of our formulation.
397:
398: The standard way of formalising this in reinforcement learning is to
assume that the value of rewards decays geometrically into the future
400: at a rate given by a discount parameter $\gamma \in (0,1)$, that is,
401: \begin{equation}\label{eqn:disval}
402: V^{\pi}_{\mu}(\gamma) := \: \frac{1}{\Gamma} \E \left(
403: \sum_{i=1}^\infty \gamma^i r_i \right)
404: \end{equation}
405: where $r_i$ is the reward in cycle $i$ of a given history, the
406: normalising constant is $\Gamma := \sum_{i=1}^\infty \gamma^i$, and
407: the expected value is taken over all histories of $\pi$ and $\mu$
interacting. By increasing $\gamma$ towards 1 we weight long term
rewards more heavily; conversely, by reducing it we shift the
weighting towards short term rewards.
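
To illustrate how $\gamma$ sets the effective horizon, consider the
following worked values, which follow directly from the geometric
series; the specific choices $\gamma = 0.99$ and $\gamma = 0.9$ are
ours and serve purely as an example. The normalising constant is
$\Gamma = \frac{\gamma}{1-\gamma}$, and the fraction of the total
weight that falls on the first $n$ cycles is
\[
  \frac{1}{\Gamma} \sum_{i=1}^{n} \gamma^i = 1 - \gamma^n.
\]
For $\gamma = 0.99$ the first 100 cycles carry a weight of $1 -
0.99^{100} \approx 0.63$, whereas for $\gamma = 0.9$ they carry more
than $0.9999$ of the weight, so that rewards after cycle 100 are
almost irrelevant.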
411:
412: Of course this has not actually answered the question of how to weight
413: near term rewards versus longer term rewards. Rather it has simply
414: expressed this weighting as a parameter. While that is adequate for
415: some purposes, what we would like is a single test of intelligence for
416: machines, not a range of tests that vary according to some free
417: parameter. That is, we would like the temporal preferences to be
418: included in the model, not external to it.
419:
420: One possibility might be to use harmonic discounting, $\gamma_t :=
421: \frac{1}{t^2}$. This has some nice properties, in particular the
422: agent needs to look forward into the future in a way that is
423: proportional to its current age \cite{Hutter:04uaibook}. However an
424: even more elegant solution is possible.
425:
426: If we look at the value function in Equation~\ref{eqn:disval}, we see
427: that geometric discounting plays two roles. Firstly, it normalises
428: the total reward received which makes the sum finite, in this case
429: with a maximum value of 1. Secondly, it weights the reward at
430: different points in the future which in effect defines a temporal
preference. We can fulfil both of these roles, without needing an
external parameter, by simply requiring that the total reward returned
by the environment cannot exceed 1. For a reward-summable environment
$\mu$ we can now define the value function to be simply
435: \begin{equation}\label{eqn:unival}
436: V^{\pi}_{\mu} := \: \E \left( \sum_{i=1}^\infty r_i \right)
437: \leq 1.
438: \end{equation}
439:
440: One way of viewing this is that the rewards returned by the
441: environment now have the temporal preference factored in and thus we
442: do not need to add this. The cost is that this is an additional
443: condition that we place on the environments. Previously we required
444: that each reward signal was in a finite subset of $[0,1] \cap \QQQ$,
445: now we have the additional constraint that the sum is bounded.
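
A simple case that satisfies this constraint, offered here only as an
illustrative sketch with a hypothetical environment
$\mu^{\mathtt{game}}$, is a game that pays a reward of 1 in the cycle
in which the agent wins and 0 in every other cycle, and that can be
won at most once. Then $\sum_{i=1}^\infty r_i \in \{0,1\}$ for every
history, and
\[
  V^{\pi}_{\mu^{\mathtt{game}}} = \E \left( \sum_{i=1}^\infty r_i \right)
  = \Pr( \pi \mathrm{\ wins} ) \leq 1,
\]
so the value of an agent in this environment is simply its probability
of winning.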
446:
447: It may seem that there is a philosophical problem here. If an
448: environment $\mu$ is an artificial game, like chess, then it seems
449: fairly natural for $\mu$ to meet any requirements in its definition,
450: such as having a bounded reward sum. However if we think of the
451: environment $\mu$ as being ``the universe'' in which the agent lives,
452: then it seems unreasonable to expect that it should be required to
453: respect such a bound. The flaw in this argument is that a
454: ``universe'' does not have any notion of reward for particular agents.
455:
456: Strictly speaking, reward is an interpretation of the state of the
457: environment. In humans this is built in, for example, the pain that
458: is experienced when you touch something hot. In which case, maybe it
459: should really be a part of the agent rather than the environment? If
460: we gave the agent complete control over rewards then our framework
461: would become meaningless: The perfect agent could simply give itself
462: constant maximum reward. Indeed humans cannot easily do this either,
463: at least not without taking drugs designed to interfere with their
464: pleasure-pain mechanism.
465:
466: Thus the most accurate framework would consist of an agent, an
467: environment and a separate goal system that interpreted the state of
468: the environment and rewarded the agent appropriately. In such a set
469: up the bounded rewards restriction would be a part of the goal system
and thus the above philosophical problem does not occur. However, for
our current purposes it seems sufficient just to fold this goal
472: mechanism into the environment and add an easily implemented
473: constraint to how the environment may generate rewards.
474:
475:
476: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
477: \section{A formal measure of intelligence}\label{sec:ior}
478: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
479:
We have now formally defined the space of agents and environments, how
they interact with each other, and how we measure the performance of
an agent in any specific environment. Before we can put all this
together into a single performance measure, we first need to define
what we mean by ``a wide range of environments.''
485:
486: As our goal is to produce a measure of intelligence that is as broad
487: and encompassing as possible, the space of environments used in our
488: definition should be as large as possible. Given that our environment
489: is a probability measure with a certain structure, an obvious
490: possibility would be to consider the space of all probability measures
491: of this form. Unfortunately, this extremely broad class of
492: environments causes problems. As the space of all probability
493: measures is uncountably infinite, we cannot list the members of this
494: set, nor can we always describe environments in a finite way.
495:
496: The solution is to require the environmental measures to be
497: computable. Not only is this necessary if we are to have an effective
498: measure of intelligence, it is also not all that restrictive. There
499: are an infinite number of environments in this set, with no upper
500: bound on their complexity. Furthermore, it is only the measure which
501: describes the environment that must be computable. For example,
502: although a typical sequence of 1's and 0's generated by flipping a
503: coin is not computable, the probability measure which describes this
504: process is computable. Thus, even environments which behave randomly
505: are included in our space of environments. This appears to be the
506: largest reasonable space of environments. Indeed, no physical system
507: has ever been shown to lie outside of this set. If such a physical
system were found, it would overturn the Church-Turing thesis and alter
509: our view of the universe.
510:
511: How can we combine the agent's performance over all these
512: environments? As there are an infinite number of environments, we
513: cannot simply take a uniform distribution over them. Mathematically,
514: we must weight some environments more highly than others. If we
515: consider the agent's perspective on the problem, this question is the
516: same as asking: Given several different hypotheses which are
517: consistent with the data, which hypothesis should be considered the
518: most likely? This is a frequently occurring problem in inductive
519: inference where we must employ a philosophical principle to decide
520: which hypothesis is the most likely. The most successful approach is
521: to invoke the principle of Occam's razor: Given multiple hypotheses
522: which are consistent with the data, the simplest should be preferred.
523: This is generally considered the rational and intelligent thing to do.
524:
525: Consider for example the following type of question which commonly
526: appears in intelligence tests. There is a sequence such as 2, 4, 6,
527: 8, and the test subject needs to predict the next number. Of course
528: the pattern is immediately clear: The numbers are increasing by 2 each
529: time. An intelligent person would easily identify this pattern and
predict the next number to be 10. However, the polynomial $2k^4 -20k^3
531: +70k^2 -98k +48$ is also consistent with the data, in which case the
532: next number in the sequence would be 58. Why then do we consider the
533: first answer to be more likely? It is because we use, perhaps
534: unconsciously, the principle of Occam's razor. Furthermore, the fact
535: that the test defines this as the correct answer shows that it too
536: embodies the concept of Occam's razor. Thus, although we don't
537: usually mention Occam's razor when defining intelligence, the ability
538: to effectively use Occam's razor is clearly a part of intelligent
539: behaviour.
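
The claim about the polynomial is easy to check by rewriting it in
factored form (a rearrangement that we add here for verification),
\[
  2k^4 - 20k^3 + 70k^2 - 98k + 48 = 2k + 2(k-1)(k-2)(k-3)(k-4).
\]
The product vanishes for $k = 1, \ldots, 4$, giving the values 2, 4,
6, 8, while for $k = 5$ it contributes $2 \cdot 4 \cdot 3 \cdot 2
\cdot 1 = 48$, giving $10 + 48 = 58$. The second hypothesis thus
agrees with all of the data but needs a considerably longer
description than ``add 2''.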
540:
541: Our formal measure of intelligence needs to reflect this.
542: Specifically, we need to test the agents in such a way that they are,
543: at least on average, rewarded for correctly applying Occam's razor.
544: Formally, this means that our a priori distribution over environments
545: should be weighted towards simpler environments. The problem now
546: becomes: How should we measure the complexity of environments?
547:
548: As each environment is computable, it can be represented by a program,
549: or more formally, a binary string $p \in \BBB^*$ on some prefix
550: universal Turing machine $\UU$. Thus we can use Kolmogorov complexity
551: to measure the complexity of an environment $\mu \in E$,
552: \[
553: K( \mu ) := \min_{p \in \BBB^*} \big\{ |p| : \UU(p)
554: \mathrm{\ computes\ } \mu \big\}.
555: \]
This measure is independent of the choice of $\UU$ up to an additive
constant that is independent of $\mu$, so we simply pick one
universal Turing machine $\UU$ and fix it. The correct way to turn
this into a prior distribution is to weight each environment $\mu$ by
$2^{-K(\mu)}$. This is known as the algorithmic probability
distribution and it has a number of important properties,
particularly in the context of universally optimal learning agents.
See \cite{Li:97} or \cite{Hutter:04uaibook} for an overview of
Kolmogorov complexity and universal prior distributions.
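
As a brief illustration of how this prior weights environments (a
sketch using only standard facts about prefix machines; $\nu \in E$ is
simply a second environment introduced here for the comparison), an
environment whose shortest program is $n$ bits shorter than that of
another receives $2^n$ times the weight:
\[
\frac{2^{-K(\mu)}}{2^{-K(\nu)}} = 2^{K(\nu)-K(\mu)}.
\]
Moreover, assuming as usual that the programs of a prefix machine form
a prefix-free set, the shortest programs of distinct environments are
distinct members of that set, and the Kraft inequality gives
\[
\sum_{\mu \in E} 2^{-K(\mu)} \le 1.
\]
The weights therefore form a (possibly unnormalised) distribution over
$E$.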
565:
Putting this all together, we can now define our formal measure of
intelligence for arbitrary systems. Let $E$ be the space of all
programs that compute environmental measures of summable reward with
respect to a prefix universal Turing machine $\UU$, and let $K$ be the
Kolmogorov complexity function. The intelligence of an agent $\pi$ is
defined as
572: \[
573: \Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} V^{\pi}_{\mu} = V^{\pi}_{\xi},
574: \]
where $\xi := \sum_{\mu \in E} 2^{-K(\mu)} \mu$ and the second equality
follows from the linearity of $V$ in the environment. Here $\xi$~is the
Solomonoff-Levin universal a priori distribution generalised to
reactive environments.
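
The linearity step can be spelled out as follows. This is only a
sketch: it assumes that $V^{\pi}_{\mu}$ is the expected total reward
when $\pi$ interacts with $\mu$, and the symbols $R$ for the total
reward and $\E^{\pi}_{\mu}$ for the corresponding expectation are
notation introduced here. For a fixed agent $\pi$ the probability of
any finite interaction history is linear in the environment, so the
expectation is as well, and
\begin{eqnarray*}
\sum_{\mu \in E} 2^{-K(\mu)} V^{\pi}_{\mu}
& = & \sum_{\mu \in E} 2^{-K(\mu)} \E^{\pi}_{\mu} [R] \\
& = & \E^{\pi}_{\xi} [R] \; = \; V^{\pi}_{\xi}.
\end{eqnarray*}
If in addition the rewards are normalised so that
$0 \le V^{\pi}_{\mu} \le 1$, then
$0 \le \Upsilon(\pi) \le \sum_{\mu \in E} 2^{-K(\mu)} \le 1$.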
578:
579:
580: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
581: \section{Properties of the intelligence measure}
582: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
583:
To better understand how this measure behaves, consider some
example agents.
586:
\emph{A random agent.} The agent with the lowest intelligence, at
least among those that are not actively trying to perform badly, would
be one that chooses its actions uniformly at random. We will call this
$\pi^\mathtt{rand}$. In general such an agent will not be very
591: successful as it will fail to exploit any regularities in the
592: environment, no matter how simple they are. It follows then that the
593: values of $V^{\pi^\mathtt{rand}}_\mu$ will typically be low compared
594: to other agents, and thus $\Upsilon (\pi^\mathtt{rand})$ will be low.
595:
596: \emph{A very specialised agent.} From the equation for $\Upsilon$, we
597: see that an agent could have very low intelligence but still perform
598: extremely well at a few very specific and complex tasks. Consider,
599: for example, IBM's Deep Blue chess supercomputer, which we will
600: represent by $\pi^\mathtt{dblue}$. When $\mu^\mathtt{chess}$
601: describes the game of chess,
602: $V^{\pi^\mathtt{dblue}}_{\mu^\mathtt{chess}}$ is very high. However
603: $2^{-K(\mu^\mathtt{chess})}$ is small, and for $\mu \neq
604: \mu^\mathtt{chess}$ the value function will be low relative to other
605: agents as $\pi^\mathtt{dblue}$ only plays chess. Therefore, the value
606: of $\Upsilon (\pi^\mathtt{dblue})$ will be very low. Intuitively,
607: this is because Deep Blue is too inflexible and narrow to have general
608: intelligence.
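
The argument can be made slightly more explicit by splitting the sum
in the definition of $\Upsilon$ (a sketch, assuming that value
functions are normalised to lie in $[0,1]$):
\begin{eqnarray*}
\Upsilon(\pi^\mathtt{dblue}) & = &
2^{-K(\mu^\mathtt{chess})} V^{\pi^\mathtt{dblue}}_{\mu^\mathtt{chess}} \\
& & {} + \sum_{\mu \neq \mu^\mathtt{chess}}
2^{-K(\mu)} V^{\pi^\mathtt{dblue}}_{\mu}.
\end{eqnarray*}
The first term is at most $2^{-K(\mu^\mathtt{chess})}$ however well
Deep Blue plays, so excellence at chess alone can contribute only a
tiny amount. For instance, if the shortest description of chess took
$1000$ bits (a purely hypothetical figure, used only for
illustration), this term could not exceed $2^{-1000}$. The remaining
terms carry almost all of the weight, and it is there that
$\pi^\mathtt{dblue}$ scores poorly.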
609:
610: \emph{A general but simple agent.} Imagine an agent that does very
611: basic learning by building up a table of observation and action pairs
612: and keeping statistics on the rewards that follow. Each time an
613: observation that has been seen before occurs, the agent takes the
614: action with highest estimated expected reward in the next cycle with
615: 90\% probability, or a random action with 10\% probability. We will
616: call this agent $\pi^\mathtt{basic}$. It is immediately clear that
617: many environments, both complex and very simple, will have at least
618: some structure that such an agent would take advantage of. Thus for
619: almost all $\mu$ we will have $V^{\pi^\mathtt{basic}}_\mu >
620: V^{\pi^\mathtt{rand}}_\mu$ and so $\Upsilon (\pi^\mathtt{basic}) >
621: \Upsilon (\pi^\mathtt{rand})$. Intuitively, this is what we would
622: expect as $\pi^\mathtt{basic}$, while very simplistic, is surely more
623: intelligent than $\pi^\mathtt{rand}$.
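
Written out, this comparison rests on the following identity, which is
a restatement of the definition rather than an additional result:
\begin{eqnarray*}
& & \Upsilon (\pi^\mathtt{basic}) - \Upsilon (\pi^\mathtt{rand}) \\
& = & \sum_{\mu \in E} 2^{-K(\mu)}
\big( V^{\pi^\mathtt{basic}}_\mu - V^{\pi^\mathtt{rand}}_\mu \big).
\end{eqnarray*}
The difference is positive whenever the weighted gains on environments
where $\pi^\mathtt{basic}$ does better outweigh the weighted losses
elsewhere. Because the weights $2^{-K(\mu)}$ favour simple
environments, gains on simple environments count the most.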
624:
625: \emph{A simple agent with more history.} A natural extension of
626: $\pi^\mathtt{basic}$ is to use a longer history of actions,
627: observations and rewards in its internal table. Let
628: $\pi^\mathtt{2back}$ be the agent that builds a table of statistics
629: for the expected reward conditioned on the last two actions, rewards
and observations. By definition, $\pi^\mathtt{2back}$ is a
generalisation of $\pi^\mathtt{basic}$ and thus can
adapt to any regularity that $\pi^\mathtt{basic}$ can adapt to. It
633: follows then that in general $V^{\pi^\mathtt{2back}}_\mu >
634: V^{\pi^\mathtt{basic}}_\mu$ and so $\Upsilon (\pi^\mathtt{2back}) >
635: \Upsilon (\pi^\mathtt{basic})$, as we would intuitively expect.
636:
637: In a similar way agents of increasing complexity and adaptability can
638: be defined which will have still greater intelligence. However with
639: more complex agents it is usually difficult to theoretically establish
640: whether one agent has more or less intelligence than another.
641: Nevertheless, it is hopefully clear from these simple examples that
642: the more flexible and powerful an agent is, the higher its machine
643: intelligence.
644:
645: \emph{A human.} For extremely simple environments, a human should be
646: able to identify their simple structure and exploit this to maximise
reward. For more complex environments, however, it is hard to know how
well a human would perform without experimental results.
649:
650: \emph{Super-human intelligence.} It can be easily proven that the
651: theoretical AIXI agent \cite{Hutter:04uaibook} is the maximally
652: intelligent agent with respect to $\Upsilon$. AIXI has been proven to
653: have many universal optimality properties, including being Pareto
654: optimal and self-optimising in any environment in which this is
655: possible for a general agent. Thus it is clear that agents with very
656: high $\Upsilon$ must be extremely powerful.
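
In symbols, this can be sketched as follows, assuming that AIXI is
taken to be the agent $\pi^\xi$ that maximises expected value under
$\xi$, as in \cite{Hutter:04uaibook}:
\[
\pi^\xi := \arg\max_{\pi} V^{\pi}_{\xi}.
\]
For any agent $\pi$ we then have
\[
\Upsilon(\pi) = V^{\pi}_{\xi} \le V^{\pi^\xi}_{\xi} = \Upsilon(\pi^\xi),
\]
which is exactly the statement that no agent scores higher than AIXI
under $\Upsilon$.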
657:
658: In addition to sensibly ordering many simple learning agents, this
659: formal definition has many significant and desirable properties:
660:
661: \emph{Valid}. The most important property of a measure of
662: intelligence is that it does indeed measure ``intelligence''. As
$\Upsilon$ formalises a mainstream informal definition, we believe
that it is a valid measure.
665:
\emph{Meaningful}. An agent with a high $\Upsilon$ value must perform
well over a very wide range of environments; in particular, it must
perform well in almost all simple environments. If such an agent
existed, it would clearly be very powerful and practically useful. The
measure also sensibly orders the intelligence of simple learning agents.
671:
\emph{Repeatable}. We can test an agent using $\Upsilon$
repeatedly without problem. This is because it is defined across all
well-defined environments, not just a specific test subset which an
agent might adapt to.
676:
677: \emph{Absolute}. $\Upsilon$ gives us a single real absolute value,
678: unlike the pass-fail Turing test \cite{Turing:50}. This is important
679: if we want to make distinctions between similar learning algorithms
680: that are not close to human level intelligence.
681:
\emph{Wide range}. As we have seen, $\Upsilon$ can measure performance
from extremely simple agents right up to the super powerful AIXI
agent. Other tests cannot handle such an enormous range.
685:
686:
\emph{General}. The test is clearly non-specific to the
implementation of the agent as the inner workings of the agent are left
completely undefined. It is also very general in terms of what senses
or actuators the agent might have as all information exchanged between
the agent and the environment takes place over basic Shannon-like
communication channels.
693:
694: \emph{Dynamic}. One aspect of our test of intelligence is that it is,
695: in the terminology of intelligence testing, a highly dynamic test
696: \cite{Sternberg:02}. Normally intelligence tests for humans only test
697: the ability to solve one-off problems. There are no dynamic aspects
698: to the test where the test subject has to interact with something and
699: learn and adapt their behaviour accordingly. This makes it very hard
700: to test things like the individual's ability to quickly pick up new
701: skills and adapt to new situations. One way to overcome these
702: problems is to use more sophisticated dynamic tests. In these tests
703: there is an active tester who constantly interacts with the test
704: subject, much like what happens in our formal intelligence measure.
705:
706: \emph{Unbiased}. The test is not weighted towards ability in certain
707: specific kinds of areas or problems, rather it is simply weighted
708: towards simpler environments no matter what they are.
709:
710: \emph{Fundamental}. The test is based on the theory of information,
711: Turing computation and complexity theory. These are all fundamental
712: ideas which are likely to remain very stable over time irrespective of
713: changes in technology.
714:
\emph{Formal}. Unlike many tests of intelligence, $\Upsilon$ is
specified completely formally and mathematically.
717:
\emph{Objective}. Unlike the Turing test, which requires a panel of
judges to decide if an agent is intelligent or not, $\Upsilon$ is free
of such subjectivity.
721:
722:
Our definition of intelligence also has some weaknesses. One is that
the environmental distribution $2^{-K(\mu)}$ that we have used is
invariant to changes in the reference machine $\UU$ only up to a
multiplicative constant. While this affords us some protection, it
still means that the relative intelligence of agents can change if we
change our reference machine. One approach to this problem might be
729: to limit the complexity of the reference machine, for example by
730: limiting its state-symbol complexity. We expect that for highly
731: intelligent machines that can deal with a wide range of environments
732: of varying complexity, the effect of changing from one simple
733: reference machine to another will be minor. For agents which are less
734: complex than the reference machine however, such a change could be
735: significant.
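
To quantify this dependence, the following sketch uses the standard
invariance theorem; the subscripts and the constant $c$ are notation
introduced here. Let $K_{\UU}$ and $K_{\UU'}$ denote complexity with
respect to two reference machines $\UU$ and $\UU'$, and let
$\Upsilon_{\UU}$ and $\Upsilon_{\UU'}$ be the corresponding measures
over the same class $E$. There is a constant $c \ge 0$, depending on
the two machines but not on $\mu$, such that
$|K_{\UU}(\mu) - K_{\UU'}(\mu)| \le c$, and hence
\[
2^{-c}\, 2^{-K_{\UU'}(\mu)} \le 2^{-K_{\UU}(\mu)}
\le 2^{c}\, 2^{-K_{\UU'}(\mu)}.
\]
Summing against the nonnegative values $V^{\pi}_{\mu}$ gives
\[
2^{-c}\, \Upsilon_{\UU'}(\pi) \le \Upsilon_{\UU}(\pi)
\le 2^{c}\, \Upsilon_{\UU'}(\pi).
\]
These bounds constrain each agent's score but do not by themselves
preserve the ordering: two agents are guaranteed to keep their relative
order only if their scores differ by more than a factor of $2^{2c}$.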
736:
737: A theoretical problem is that our distribution over environments is
738: not computable. While this is fine for a theoretical definition of
739: intelligence, it makes the measure impossible to directly implement.
The solution is to use a more tractable measure of complexity such as
Levin's $Kt$ complexity \cite{Levin:73search}, or Schmidhuber's Speed
Prior \cite{Schmidhuber:02speed}. Both of these consider the
complexity of an algorithm to be determined by both its description
length and running time. Intuitively, this also makes good sense,
because we would not usually consider a very short algorithm that
takes an enormous amount of time to compute to be a particularly
simple one.
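
For reference, Levin's complexity of a string $x$ is usually written
in the following form, where $t(p)$, a symbol introduced here, denotes
the number of steps that $\UU$ takes to output $x$ when run on $p$:
\[
Kt(x) := \min_{p} \big\{ |p| + \log_2 t(p) : \UU(p) = x \big\}.
\]
Each doubling of the running time thus costs the same as one extra bit
of program length. How exactly to carry this over to interactive
environments, for example by measuring time per interaction cycle, is
not fixed by this definition and would be a further design choice.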
748:
749:
750: The only closely related work to ours is the C-Test
751: \cite{Hernandez:00btt}. While our intelligence measure is fully
752: dynamic and interactive, the C-Test is a purely static sequence
753: prediction test similar to standard IQ tests for humans. The C-Test
754: always ensures that each question has an unambiguous answer in the
755: sense that there is always one consistent hypothesis with
756: significantly lower complexity than the alternatives. Perhaps this is
757: useful for some kinds of tests, but we believe that it is unrealistic
758: and limiting. Like our intelligence test, the C-Test also has to deal
759: with the problem of the incomputability of Kolmogorov complexity. By
760: using Levin's $Kt$ complexity, the C-Test was able to compute a number
of test problems which were used to test humans. The ``compression
test''~\cite{Mahoney:99} for machine intelligence is similarly
restricted to sequence prediction. We consider the linguistic
complexity tests of Treister-Goren et~al.\ to be far too narrow.
765: The psychometric approach of Bringsjord and Schimanski is only
766: appropriate if the machine has a sufficiently human-like intelligence.
767:
768:
769: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
770: \section{Conclusions}\label{sec:conc}
771: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
772:
Despite the obvious significance of formal definitions of intelligence
for research, and calls for more direct measures of machine
intelligence to replace the problematic Turing test and other
imitation-based tests \cite{Johnson:92}, very little work has been
done in this area. In this paper we have attempted to tackle this
778: problem head on. Although the test has a few weaknesses, it also has
779: many unique strengths. In particular, we believe that it expresses
780: the essentials of machine intelligence in an elegant and powerful way.
781: Furthermore, more tractable measures of complexity should lead to
782: practical tests based on this theoretical model.
783:
784:
785: \subsection*{Acknowledgments}
786:
This work was supported by SNF grant 200020-107616.
788:
789: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
790: % Bibliography %
791: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
792:
793: \begin{small}
794: \begin{thebibliography}{}\parskip=-0.2ex
795:
796: \bibitem[Bin37]{Bingham:37}
797: W.~V. Bingham.
798: \newblock {\em Aptitudes and aptitude testing}.
799: \newblock Harper \& Brothers, New York, 1937.
800:
801: \bibitem[Got97]{Gottfredson:97msoi}
802: L.~S. Gottfredson.
803: \newblock Mainstream science on intelligence: An editorial with 52 signatories,
804: history, and bibliography.
805: \newblock {\em Intelligence}, 24(1):13--23, 1997.
806:
807: \bibitem[HO00]{Hernandez:00btt}
808: J.~Hern{\'a}ndez-Orallo.
809: \newblock Beyond the {T}uring test.
810: \newblock {\em Journal of Logic, Language and Information}, 9(4):447--466,
811: 2000.
812:
813: \bibitem[Hut04]{Hutter:04uaibook}
814: M.~Hutter.
815: \newblock {\em Universal Artificial Intelligence: Sequential Decisions based on
816: Algorithmic Probability}.
817: \newblock Springer, Berlin, 2004.
818: \newblock 300 pages, http://www.idsia.ch/$_{^{\sim}}$marcus/ai/uaibook.htm.
819:
820: \bibitem[Joh92]{Johnson:92}
821: W.~L. Johnson.
822: \newblock Needed: {A} new test of intelligence.
823: \newblock {\em SIGARTN: SIGART Newsletter (ACM Special Interest Group on
824: Artificial Intelligence)}, 3, 1992.
825:
826: \bibitem[Lev73]{Levin:73search}
827: L.~A. Levin.
828: \newblock Universal sequential search problems.
829: \newblock {\em Problems of Information Transmission}, 9:265--266, 1973.
830:
831: \bibitem[LH05]{Legg:05iors}
832: S.~Legg and M.~Hutter.
833: \newblock A universal measure of intelligence for artificial agents.
834: \newblock In {\em Proc. 21st International Joint Conf. on Artificial
835: Intelligence ({IJCAI-2005})}, number IDSIA-04-05, pages 1509--1510,
836: Edinburgh, 2005.
837:
838: \bibitem[LV97]{Li:97}
839: M.~Li and P.~M.~B. Vit\'anyi.
840: \newblock {\em An introduction to {Kolmogorov} complexity and its
841: applications}.
842: \newblock Springer, 2nd edition, 1997.
843:
844: \bibitem[Mah99]{Mahoney:99}
845: M.~V. Mahoney.
846: \newblock Text compression as a test for artificial intelligence.
847: \newblock In {\em {AAAI}/{IAAI}}, 1999.
848:
849: \bibitem[SB98]{Sutton:98}
850: R.~Sutton and A.~Barto.
851: \newblock {\em Reinforcement learning: An introduction}.
852: \newblock Cambridge, MA, MIT Press, 1998.
853:
854: \bibitem[Sch02]{Schmidhuber:02speed}
855: J.~Schmidhuber.
856: \newblock The {Speed Prior:} a new simplicity measure yielding near-optimal
857: computable predictions.
858: \newblock In {\em Proc. 15th Annual Conference on Computational Learning Theory
859: (COLT 2002)}, Lecture Notes in Artificial Intelligence, pages 216--228,
860: Sydney, Australia, July 2002. Springer.
861:
862: \bibitem[SG02]{Sternberg:02}
863: R.~J. Sternberg and E.~L. Grigorenko, editors.
864: \newblock {\em Dynamic Testing: {T}he nature and measurement of learning
865: potential}.
866: \newblock Cambridge University Press, 2002.
867:
868: \bibitem[Ste00]{Sternberg:00}
869: R.~J. Sternberg, editor.
870: \newblock {\em Handbook of Intelligence}.
871: \newblock Cambridge University Press, 2000.
872:
873: \bibitem[Tur50]{Turing:50}
874: A.~M. Turing.
875: \newblock Computing machinery and intelligence.
876: \newblock {\em Mind}, October 1950.
877:
878: \bibitem[Wec58]{Wechsler:58}
879: D.~Wechsler.
880: \newblock {\em The measurement and appraisal of adult intelligence}.
\newblock Williams \& Wilkins, Baltimore, 4th edition, 1958.
882:
883: \end{thebibliography}
884: \end{small}
885:
886: \end{document}
887:
888: %--------------------End-of-IOR.tex---------------------------%
889: