User:Samfreed/Dreyfus

Book cover of the 1979 paperback edition

What Computers Can't Do: The Limits of Artificial Intelligence (ISBN 0-06-090613-8) is a controversial work on artificial intelligence, authored by Hubert Dreyfus, a professor of philosophy at the University of California, Berkeley. The book deals with the idea that thought, intelligence or reason can be reduced to computation. After a summary of the idea's history, Dreyfus proceeds to attack this project and to show why it is impossible, regardless of the claims of the Artificial Intelligence (AI) research community. The attack consists of two strands: philosophical argumentation to show the qualitative difference between Human and Machine intelligence, and an exposure of the consistently unwarranted nature of the AI community's optimism. At least partly because of this ridiculing strand, no book has ever produced as much controversy and emotion in the AI community.

The book initially appeared under this title in 1972, and a second edition with a new introduction was published under the same name in 1979 (ISBN 0-06-090624-3). A third edition was published under the name What Computers Still Can't Do (ISBN 0-262-54067-3) in 1992.

The text below is a summary of the book, based on the 1979 edition.

Introduction

I

The quest for automated reason, culminating in Artificial Intelligence (AI), is as old as western culture itself: Socrates (as relayed by Plato) says: "I want to know of what is the characteristic of piety that makes all action pious... That I may have it to turn to and to use it as a standard whereby to judge your actions and those of other men". Aristotle (2 generations later) recognises that there is a difficulty in that matters such as piety and blame are matters of perception, and cannot be mechanically decided. The Platonic project requires that all appeal to intuition and judgement be eliminated. This "cybernetic" tradition, as founded by Socrates & Plato, is followed also by Hobbes: "...for reason is nothing but reckoning".

Leibniz worked on assigning a specific number to each and every object in the world, as a prelude to an algebraic solution to all possible problems. In the early 19th century, George Boole set out to "investigate the fundamental laws of those operations of the mind by which reasoning is performed, to give expression to them in the symbolic language of a calculus", thereby inventing Boolean Algebra. In 1835 Charles Babbage designed the first mechanical digital computer along Boole's lines, although it was never built.

In 1944 Howard Aiken built the first practical electric digital computer, using 3,000 telephone relays. Thus Alan Turing's "universal machine" was born. Any process that could be formalized could now be programmed. At last, the tradition founded by Plato found the computing machine of its dreams: implementing pure logic, with no recourse to intuition or judgement.

In 1950 Alan Turing proposed his now-famous "Turing Test", to ascertain when intelligence is achieved. The time was now ripe to produce the appropriate symbolism and instructions to achieve Leibniz's project. At the end of his article, Turing wonders what the next step would be: programming something abstract such as chess, or teaching a computer like a child, starting with English, pointing out objects and naming them. As if on cue, both projects were begun.

Claude Shannon, the inventor of Information Theory, mused in 1950 about the difficulty of chess, saying that thinking ahead 40 moves in a few directions of play is just as bad as thinking only two steps ahead in all directions. He also noted the difficulty in distinguishing which lines of play may be important. In 1955 Allen Newell wrote a survey of the problem, lamenting that the proposed solutions may not work. The next year, a group in Los Alamos produced a program that played legal chess on a reduced board. In 1957 Alex Bernstein programmed an IBM 704 which played two "passable amateur games".

Meanwhile, linguistic programming made little or no progress, despite some efforts. At this point, as a result of the analysis of how Humans deal with problems, the idea of heuristics was brought up: that people use rough-and-ready rules, which are not generally correct (as in logic or algorithms), but work most of the time. The proponents of this approach were Newell, Shaw and Simon, who developed the idea of computerized Theorem-proving using Heuristics. Later (1957) they came up with the General Problem Solver, seeking to apply Means-ends analysis Heuristics to produce "intelligent, adaptive, and creative behaviour". Seemingly, a system had at last been found to turn reasoning into reckoning, and the enthusiasm was boundless: "Intuition, insight, and learning are no longer exclusive possessions of Humans: any large high-speed computer can be programmed to exhibit them also".

II

Extrapolation, though a useful tool in many fields, has been abused badly in popular perception of the ability of computers. Note how outlandish the following claim from a US newspaper, in 1968, still is today:

Cosmos, the West German publishing house... has come up with a new idea in gifts... It's a genuine (if small) computer, and it costs about $20. Battery operated, it looks like a portable typewriter. But it can be programmed like any big computer to translate foreign languages, diagnose illnesses, even provide a weather forecast.

This wild run of imagination continues with titles like "Meet 'Shakey', the first electronic person" (Life Magazine, 1970), where leading computer scientists are quoted saying that within 3-15 years we will have "a machine with the general intelligence of an average Human Being,... and in a few [more] months ... it will be at genius level". A good understanding of the hopes and expectations of the late 1960s can be gained from the film 2001: A Space Odyssey, which was not a flight of fancy, but a well-documented attempt at predicting technology. Marvin Minsky, a leading light of the AI community, was the technical consultant for the film.

Though there has been much prediction of machine thought and understanding, and such prediction has been adopted by many as "established fact", there is little evidence of anything of the sort actually being accomplished. For example, Herbert Simon, following his General Problem Solver (1957), predicted that within 10 years a computer would:

Be world champion in chess;
Discover and prove an important new mathematical theorem;
In the same timescale, most theories in psychology would take the form of computer programs.

The ten years are over; by the time of the first edition of Dreyfus's "What Computers Can't Do", 14 years had passed.

Already 5 years after these predictions, William Ross Ashby declared that prediction No. 2 had come to pass. It went unnoticed that the "major new theorem" supposedly discovered by computer turned out to be merely an unusual (but known) way to prove a basic theorem in geometry.

Prediction No. 1, about chess, can serve as a model for the production of intellectual smog in this area. In 1958 Newell, Shaw and Simon presented a "not yet fully debugged" chess-playing program that was "good in the ... opening". No further detailed accounts of this program were ever published. However, later the same year the authors announced "we have written a program that plays chess". Public gullibility and the enthusiasm of the researchers resulted in Norbert Wiener declaring (1959) that "chess-playing machines as of now will counter the moves of a master game with the moves recognised as right in the textbooks, up to some point in the middle-game". In fact, the program was so bad as to be beaten by a 10-year-old in 1960. Fact, however, had ceased to be relevant.

Ignoring these embarrassments, Simon published three years later a report of another program that played a "highly creative" endgame, involving "combinations as difficult as any that have been recorded in chess history", with barely a mention of its severe limitations. By glossing over the limitations, the claim that chess can be played end-to-end seems to have been made. At this point it seemed that the world championship would fall to a computer at any moment. On the cover of the Oct 1968 edition of Science, Donald Michie wrote "today machines can play chess at championship level", while chess Masters were far less impressed.

Part I: Ten Years of research in Artificial Intelligence

Phase I (1957-1962) Cognitive Simulation

I. Analysis of Work in Language Translation, Problem Solving, and Pattern Recognition

Language translation

The case of Computerized translation can serve as a paradigm for other cases in AI - the initial results were very encouraging, but the later difficulties proved overwhelming. As Yehoshua Bar-Hillel summed it up:

During the first year of research in machine translation, a considerable amount of progress was made... It created among many of the workers actively engaged in this field the strong feeling that a working system was just around the corner. Though it is understandable that such an illusion should have been formed at the time, it was an illusion. It was created ... by the fact that a large number of problems was rather readily solved... It was not sufficiently realized that the gap between such output... and high quality translation proper was still enormous, and that the problems solved until then were indeed many but just the simplest ones whereas the "few" remaining problems were the harder ones - very hard indeed.

One should note that all the science-fiction fantasies of "2001" etc. are dependent on computer comprehension of "natural language", and are therefore nowhere near realization.

Problem Solving

Here, in the field of problem solving using heuristics derived from interviews with human subjects, we again find initial success followed by disappointment: in 1957 the "Logic Theorist" proved 38 out of 52 theorems from the Principia Mathematica, and two years later the "cannibal and missionary" problem was solved by the GPS. Analysing traces of these runs, researchers found similarities with Human protocols, and announced that at least this kind of thought is no longer mysterious. Soon, however, Simon made more sweeping claims, implying that Heuristics can explain intuition and judgement.

However, the difficulties reasserted themselves on the organisational level: since computer time and memory are expensive, and problems complex, it became necessary to select avenues of search, and not brute-force all possibilities. However, at that level, the categories involved are far more complex, and therefore the heuristic programs could no longer deal effectively with the planning level. The GPS was quietly abandoned in 1967, ten years after the initial predictions.

Pattern Recognition

Pattern recognition is fundamental to all other human perception, and therefore to any Human activity. Again, there has been early success, e.g. in reading Morse code, or a limited set of handwritten words, and some printed fonts. All these algorithms work by looking for a pre-determined set of features in the input. But all these have been ad-hoc solutions to specific well-defined problems. The problem remains that extracting the salient features is still done by a Human.

Even if the features could be extracted from samples by computer, there is still an assumption here that the problem is one of features. Yet there are no invariant shared features in handwriting and speech.

Conclusion

As we saw, the pattern of success, enthusiasm, failure, and (sometimes) pessimism is well established. However, optimism is boundless, as Feigenbaum and Feldman say: "the forecast for progress in research in human cognitive processes is most encouraging". But Dreyfus asks, what about the prospects? Feigenbaum and Feldman claim that progress is being made, but they define progress as any progress, even the smallest. By this definition any person climbing a tree can claim to be making progress in his quest for the moon.

Rather than climbing blindly, it is time to examine the underlying issues:

II. The Underlying significance of Failure to Achieve Predicted Results

Diminishing results indicate the presence of an unexpected barrier - whether that barrier is one that can be overcome by more energy/resources or is a discontinuity like the end of the tree in the case of the tree-climbing moon explorer remains to be seen. Let us look next at some underlying differences between the ways humans and machines think.

Fringe consciousness vs. Heuristically guided search

There are games which are easily enumerated, and therefore solved by a computer, like tic-tac-toe. Other games, like Checkers, are easy enough to evaluate in mid-game, leading to reasonable computer-playing programs. Chess's midgame situations, on the other hand, are difficult to evaluate, and the number of possibilities is significant for every turn, leading to exponential growth of the search-tree. Pruning this tree is problematic, because every avenue that is left unexplored may contain a critical game-deciding move.
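The exponential growth described above can be made concrete with a small calculation. The following is a minimal sketch, not anything taken from the book: the function name and the two branching factors are assumptions chosen only to contrast a small game with a chess-like one.

```python
def search_tree_size(branching_factor, depth):
    """Count every position in a full game tree explored to `depth` plies.

    The tree has 1 root, b positions after one ply, b*b after two, and so on,
    so the total is b**0 + b**1 + ... + b**depth.
    """
    return sum(branching_factor ** ply for ply in range(depth + 1))


# Assumed, illustrative branching factors (not figures from Dreyfus).
SMALL_GAME_BRANCHING = 3
CHESS_LIKE_BRANCHING = 30

for depth in (2, 4, 6):
    small = search_tree_size(SMALL_GAME_BRANCHING, depth)
    chess_like = search_tree_size(CHESS_LIKE_BRANCHING, depth)
    print(f"depth {depth}: small game {small}, chess-like game {chess_like}")
```

Under these assumptions, looking just two plies ahead already means 931 positions in the chess-like game against 13 in the small one, and each extra ply multiplies the chess-like count by roughly thirty. This is why any program that counts out must prune, and why pruning risks discarding the decisive move.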

The pruning problem has led to various suggestions, such as Newell's "random element", suggesting that once in a while unlikely scenarios be explored, such as sacrificing a queen. But that is unsatisfactory, because the queen-sacrifice should be considered in all pertinent situations, and a heuristic for what makes a situation pertinent is yet to be discovered.

Note the following protocol quoted from Simon:

Again I notice that one of the pieces is not defended, the Rook, and there must be ways of taking advantage of this. Suppose now, if I push the pawn up at Bishop four, if the Bishop retreats I have a queen check and I can pick up the Rook. If, etc., etc.

Look at two distinct types of thinking going on here: first he "notices", and only after that noticing does the player start counting out the possibilities, as a computer would. This explains how a master-player can consider only 100-200 situations and play a good game, while a computer works through 26,000 different board layouts and still plays a poor game. The point here is not the quality of the move, but the difference between 26,000 and 200. This shows that the Human is definitely doing something other than "counting out" or examining alternatives.

Humans cannot usually give a detailed account of how they select the area of interest, say on a chess board. However, such a selection is determined by the overall situation of the board. This ability to notice something without awareness of the process has been called by William James "the fringes of consciousness". This encompasses the unconscious attention that is "scattered" around the object of our direct attention. It is this sort of attention to outlying areas that allows the overall context on the board to affect our choice of an "interesting" area on the board, and allows us to "zero in" on that area.

Simon and Newell give passing acknowledgement to this situation, in noting that the protocols of players contain such terms as "developed position", "control of the center", "weak king side" and "a won position", and admitting that they have found no specific features to allow heuristics to identify these.

A related matter is that experienced chess-players can remember board situations much better than weaker players: "because of the large number of prior associations which an experienced player has acquired, he does not visualise a chess position as a conglomeration of scattered squares and wooden pieces, but as an organized pattern (like the "Gestalt", or integrated configuration, emphasised by the Gestalt psychologists)."

So, computer-programs do not use past experience at all, while humans do. Moreover, computer programs have no notion of strategy, but treat every turn as one would treat a chess problem from a book. The question is how does all this background information generate the figure/background distinction in human perception. None of this exists in computerised counting-out.

The possible argument that all this is based on unconscious counting-out, and that the distinction is therefore spurious, fails: if such rapid counting-out of 26,000 positions were possible, why does this brilliant mind eventually stop and count out explicitly and oh-so-slowly the few positions that it does choose to count out? There is no evidence other than the a-priori prejudice of computer scientists to support the "unconscious counting out" theory.

This distinction clarifies the early success and later failure of work in cognitive simulation. In all game-playing programs, early success is attained by working on those games or parts of games in which heuristically guided counting out is feasible; failure occurs at the point where complexity is such that global awareness would be necessary to avoid an overwhelming exponential growth of possibilities to be counted.

Ambiguity tolerance vs. context-free precision

It is the ambiguity of language that has caused the initial progress of automated dictionaries and naive syntax processors to grind to a halt. Neither the order of the words nor the surrounding textual context determines unequivocally what the intent of a sentence is. The leading researchers in this field admit that they have encountered "very mysterious semantic processes that enable most reasonable people to interpret most reasonable sentences unequivocally most of the time".

Sentences are always heard by humans within a context, and within that context most of the "possible" interpretations of the sentence do not pertain. However, computers have no notion of context. This argument bears some relation to Wittgenstein's language-game argument. This leads to the second quality of human thinking inherently absent in computers - the ability to tolerate ambiguity, and to continue functioning in a situation or interaction while bearing in mind more than one interpretation, because one knows that either "it doesn't matter" or "it will be revealed in the future".

In a sense, what we called fringe consciousness above is the ability to glean much inexplicit information from the fringes or context of the situation. Ambiguity tolerance is the ability to summarily ignore the multitude of possible but implausible interpretations.

Since correct interpretation of language depends on understanding a situation, many experts sought to resolve this problem by an appeal to learning. In the field of computer learning of languages, all that has been achieved so far is a Pavlovian learning of meaningless syllable associations (by a program called EPAM, by Feigenbaum). But this is not learning a language at all, as all language depends on context, and the whole point of the tradition of meaningless syllables (starting from Ebbinghaus) is to exclude context. It is no surprise that machines and Humans "learn" these associations in a similar way - this is not a real cognitive task at all!

Note (Wittgenstein's example) that when an adult points at a table and says "brown", the child has no way of knowing whether "brown" means "table", the colour, or this specific object. It is only in the context of shared motivations between parent and child that a child can get to grips with these inherent ambiguities. As Wittgenstein himself put it:

Can someone be a man's teacher in this? Certainly. From time to time he gives him the right tip.... This is what learning and teaching are like here... What one acquires here is not a technique; one learns correct judgements. There are also rules, but they do not form a system, and only experienced people can apply them right. Unlike calculation rules.

Essential/inessential discrimination vs. trial-and-error search

As we have seen in the examples above, work in AI moves quickly to solve the simple problems that can be handled by simple counting-out or dictionary lookup, and then progress gets stuck wherever people use insight and the computer is at a loss.

An example of how humans and the GPS diverge is the application of the rule A · B → A and the rule A · B → B to both sides of the conjunction $(-R\wedge -P)\centerdot (R\wedge Q)$. Humans report applying the rule on both sides simultaneously, while the GPS views each application as a separate operation. The human sees the symmetry, the machine doesn't. It is in these cases of divergence that AI theorists tend to assume, with no evidence, that what is going on in the human mind is just more of the same counting-out but "unconsciously". As Max Wertheimer points out, trial-and-error excludes the most important element of problem solving, which is having a grasp of the essential structure of a problem (this he calls insight). This shortcoming of AI is addressed by Newell, Shaw and Simon as the "Heuristics of Planning".

This problem of lack of insight is often "solved" by the researcher lending his program some of his own insight in analysing the problem at hand, and making this insight part of the software of the computer. In chess, too, the thing that distinguishes masters from novices is not the ability to count out, but the ability to separate the essential elements of a situation from the inessential.

Minsky was already aware of this problem in 1961, but does not seem able to conceive of a mechanism that is not heuristic searching:

When we call for the use of "reasoning", we intend no suggestion of giving up the game by invoking an intelligent subroutine. The program that administers the search will be just another heuristic program. Almost certainly it will be composed largely of the same sorts of objects and processes that will comprise the subject-domain programs

However, such an "administrator" program would also need a higher-level program, and so an infinite regress is created. One way to try to escape the difficulties of this model is to fall back on the "learning algorithm", but that too fails as we saw above.

Perspicuous Grouping vs. Character Lists

Computers recognize things according to a list of specific traits, but Humans proceed by ways which not only require all the specifically-human abilities listed above, but also utilize what Wittgenstein would call "family resemblance". It is no wonder, then, that in pattern recognition failure has been the most total, and progress the hardest.

Insight is required to see a figure (say of a letter) for what it is, regardless of variation in orientation, size and skew. Ignoring these minor variations requires a very complex process of "normalization" in computer algorithms which is never present in any human introspective report.

Fringe consciousness

There seems to be an assumption among AI researchers that recognition equals classification, and therefore must employ lists of traits, noise reduction, classification rules etc. However, this way lies exponential growth like that in chess, and humans do not need to concentrate at all to read text or recognize animals and people. Fringe consciousness, as we described it above for chess, simply gives us the result - there is no "counting out".

Context-Dependent Ambiguity Reduction

There are three ways in which recognition is not classification:

First, the naive view of classification of an object by a defining feature is just that, naive: not all dogs have tails, nor do all pens have points, and so on. What would be seen as the defining feature behind the classification of an object in a class can also be context dependent.

Second, what is seen by a human, for example in another human face, is dependent on context, in the sense that the same expressionless face can be seen as quietly happy, sorrowful, or interested depending on what was shown before; in the experiment in question a child, a dead body, and a bowl of soup were shown respectively.

Third, when we recognize someone as a member of a family, it is not because of any one specific characteristic, or any four; see Wittgenstein.
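The first of these points can be illustrated with a toy classifier that works purely from a list of defining traits. This is a hedged sketch written for this summary, not a reconstruction of any program Dreyfus discusses: the trait lists and the `classify` function are hypothetical.

```python
# Hypothetical "defining trait" lists, invented purely for illustration.
DEFINING_TRAITS = {
    "dog": {"four legs", "fur", "tail"},
    "pen": {"barrel", "ink", "point"},
}


def classify(observed_traits):
    """Return each class whose complete trait list appears in the observation."""
    return [
        name
        for name, required in DEFINING_TRAITS.items()
        if required <= observed_traits  # subset test: every required trait is present
    ]


# A typical dog is classified as expected.
print(classify({"four legs", "fur", "tail", "collar"}))

# A dog without a tail matches no class, although a person would still see a dog.
print(classify({"four legs", "fur", "collar"}))
```

Loosening the list so that the tailless dog is accepted merely moves the problem: whatever single trait or fixed set is chosen as defining, some member of the class will lack it, which is the point of the family-resemblance argument.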

Dreyfus summarises thus: Humans can recognise patterns under the following increasingly difficult conditions:

The patterns may be skewed, incomplete, deformed, and embedded in noise;
The traits required for recognition may be "so fine and so numerous" that, even if they could be formalized, a search through a branching list of such traits would soon become unmanageable as new patterns for discrimination were added;
The traits may depend upon external and internal context and are thus not amenable to context-free specification;
There may be no common traits but a "complicated network of overlapping similarities," capable of assimilating ever new variations.

Any system which can equal human performance must therefore be able to:

Distinguish the essential from the inessential;
Use cues which remain on the fringe of consciousness;
Take account of the context;
Perceive the individual as typical, i.e. situate the individual with respect to a paradigm case.

Any progress towards truly impressive AI, such as "2001", awaits reliable pattern recognition, which in turn requires all the human-only traits we have seen so far.

Conclusion

We have surveyed the difficulties of the various AI projects, culminating in pattern recognition, where all the difficulties coalesce into one. In any case, the time allotted for Simon's three predictions is well up, and they remain unfulfilled. This has led some to tame their enthusiasm, but not Simon, who predicted in 1965 that in twenty years' time any job done by a human would be doable by computer.

Phase II (1962-1967) Semantic Information Processing

When computers became available to the research community, in the early 1950s, research split into three directions:

Cybernetics, which focused on loosely-specified self-organising mechanisms. Little was accomplished, despite some wildly optimistic predictions.
Cognitive Simulation, as discussed above.
What we will call here Artificial Intelligence (not the acronym AI), which was an attempt to build intelligent systems without any pretence of similarity to biological or human forms.

We will now turn to this ad-hoc tradition. We will find that this new group of programs is characterized by a very clever selection of problems. None of these developments solves any of the above-mentioned problems.

I. Analysis of Semantic Information Processing Programs

Analysis of a program that "understands English" - Bobrow's STUDENT

Daniel G. Bobrow developed a program called STUDENT which "understood English", insofar as the English text expressed algebraic equations. It worked by locating all "operators" such as "plus", "times" and "equals", and treating the text between the operators as a variable name. Thus "days times 24 equals hours" works perfectly, but "number of times I went to the movies" is not treated as a single scalar variable; it is read as a product of "number of" and "I went to the movies", which is clearly not the intention. Bobrow himself was cautious and exact in reporting his work, and always put double quotes around "understands" when referring to it, but Minsky wrote in Scientific American that "STUDENT... understands English", and later in the same text compounded his misunderstanding by claiming such ingenuity for STUDENT that he said: "its learning is too brilliant to be called so".
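The operator-splitting behaviour described above can be imitated in a few lines. The sketch below is an assumption-laden illustration, not Bobrow's actual code: the operator vocabulary and the `split_on_operators` function are invented here, and the real STUDENT was considerably more elaborate.

```python
import re

# Hypothetical operator vocabulary; STUDENT's real one is not reproduced here.
OPERATORS = ("plus", "times", "equals")

# Match any operator as a whole word; the capturing group makes re.split
# keep the operators in its output.
OPERATOR_PATTERN = re.compile(r"\b(" + "|".join(OPERATORS) + r")\b")


def split_on_operators(sentence):
    """Split a sentence at operator words, treating the text in between as variable names."""
    pieces = OPERATOR_PATTERN.split(sentence)
    return [piece.strip() for piece in pieces if piece.strip()]


print(split_on_operators("days times 24 equals hours"))
# ['days', 'times', '24', 'equals', 'hours']

print(split_on_operators("number of times I went to the movies"))
# ['number of', 'times', 'I went to the movies']
```

The second call shows the failure noted in the text: the word "times" is taken as multiplication because nothing in a purely syntactic scan distinguishes it from the "times" in the first sentence.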

Evans's Analogy Program

Thomas Evans wrote a program called "ANALOGY", which solved the graphic-analogy questions found in IQ tests. These require the person to select one of several D diagrams so that the selected D is to C as B is to A. This program was successful within its limited scope. Like others, it gave rise to much speculation, in this case going as far as Technological Singularity.

The program worked by applying transformative rules, and searching the resulting tree of possibilities for one of the possible answers, D1, D2, D3. This, obviously, is not how people approach such a problem.

Quillian's Semantic memory program

Quillian, under Simon, correctly noting Bobrow's failure and the general inadequacy of syntax as a conveyor of meaning, developed a program to understand words as a network. This heuristic-based program creates elaborate data structures, using processes never reported by any human. Though it is unclear from the published descriptions, it would seem that the computational complexity involved is at least N², and therefore unlikely to be similar to what Humans do. Quillian encoded only a few dozen words, a far cry from the 850 words required for a very basic vocabulary.

II. Significance of Current Difficulties

All the efforts we have seen so far are restricted to very specific problems. Even where an attempt is being made at being generic, the techniques are limited, and no attempt is being mounted to integrate what little progress has been achieved in two or more efforts. In predicting great things and delivering these miserable results, the excuse is always the same: this is the "first step" or "another step toward". Again, by this definition climbing a tree to get to the moon is progress.

Not only has the AI community not provided a means of storing all the information required for intelligent behaviour, it has not presented any argument why a collection of disjoint facts, however large, would in any way be useful. Judging from their behaviour, human beings avoid rather than resolve the problems facing these researchers, by avoiding explicit discrete information representation altogether.

Part II: Assumptions Underlying Persistent Optimism

Introduction

All work in Cognitive Simulation and in Artificial Intelligence is predicated on one basic assumption: that Humans in some fundamental way process information in ways that computers can emulate. This is no small assumption, because all computer-based information is explicit, discrete, linear, rule-based and definitive, while we have no evidence that human thought is so. The assumption that humans function like general-purpose symbol-manipulating machines amounts to:

A Biological Assumption - That at some level people operate in a digital manner
A Psychological Assumption - As Hobbes put it, all thought is reckoning
An Epistemological Assumption - That all knowledge can be formalized
An Ontological Assumption - That our world is comprised of context-free facts

The Biological Assumption

One must bear in mind that every generation can only think in terms of the phenomena and artefacts to which they are exposed, hence we tend to forgive Aristotle for thinking of the brain as a cooling device; we should also forgive the tendency of recent generations to think of the brain in terms of telephone relays, or digital computers.

But even if the brain were some sort of computer, there is no evidence to suggest that it would be wired as a heuristic device, rather it may well be like the early cybernetic neural networks. Moreover, there is no evidence that the brain functions digitally, rather the opposite: information passes along the axons as volleys of signals - with frequencies and phases which are not based on any synchronizing device - so the information may just as well be in the analogue qualities of the signal. Moreover, there is evidence that the breadth of the axons themselves functions as a frequency filter for the signals within, so the same axon may have different information running through different parts.

Thus the view that the brain is some sort of Turingian general-purpose symbol-manipulating machine is an empirical assumption that has had its day. In fact, the difference between the strongly interactive nature of the brain and the compartmentalized discrete nature of a digital computer seems to point in the exact opposite direction.

The Psychological Assumption

We have put paid to the idea that the brain is basically a digital computer. But can the same be said of the mind? Is there an information-processing level of the mind, where comparison, list search, classification etc. happen? There is a current fashion of thinking of the mind as an information-processing device, which is an improvement on behaviourism, but is it true? That we can metaphorically talk about the mind processing information is indeed evident from current parlance, but is that all that is going on, or is it just a capability of the mind, to do, for example, arithmetic? What evidence other than analogy with our latest toys (computers) supports the idea of a program or some sort of flow-chart in the mind?

The term "information processing" is ambiguous. If what is meant is that the mind can notice and be influenced by relevant facts, then that is clear and true. But in the world of computers, information is viewed as in cybernetics: as bits of 0-or-1 which are literally processed, mechanically and non-semantically, by wires and switches. As Warren Weaver puts it, in cybernetics information should not be confused with meaning; the semantic part of communications is irrelevant to the engineering aspects. Note that this is the exact opposite of gestalt psychology. In a sense, this conflict is the precise definition of a programmer's job: to take meaningful statements about, say, an accounting methodology, and translate them into the meaningless 0s and 1s of a computer, which make the computer blindly and meaninglessly perform what to humans would be a useful function. It is the ambition of AI for the computer to do this translation itself.

Since the term information processing has taken on a technical sense, we will henceforth call what humans do "information processing" in quotation marks.

Much confusion has arisen from conflating the common algorithmic way to achieve an end with the way that humans actually accomplish the same task. For example, a grainy picture would look grainy to a person, while the simplest way for a computer to detect graininess would be to calculate the first derivative and sum its absolute value across a digitized image. The fact that this is how a computer would be programmed to do it does not mean that humans do anything of the sort. This error is so common that reputable scientists such as Jerry Fodor fall into the trap, and continue to assume that "every operation in the nervous system is identical to some sequence of elementary operations". This is as absurd as asserting that the moon is sequentially and explicitly calculating its trajectory around the earth and the sun.
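The derivative-and-sum procedure mentioned above can be sketched in a few lines. This is only an illustrative sketch and is not taken from the book: the function name `graininess`, the representation of an image as a list of rows of integer brightness values, and the use of differences between adjacent pixels as a discrete stand-in for the first derivative are all assumptions made here.

```python
def graininess(image):
    """Sum the absolute brightness differences between adjacent pixels.

    `image` is assumed to be a list of equal-length rows of numbers.
    Differences between horizontal and vertical neighbours serve as a
    discrete approximation of the first derivative.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # difference with the right-hand neighbour
                total += abs(image[r][c + 1] - image[r][c])
            if r + 1 < rows:  # difference with the neighbour below
                total += abs(image[r + 1][c] - image[r][c])
    return total


# Two hypothetical 3x3 test images: one uniform, one noisy.
smooth = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
grainy = [[10, 90, 10], [90, 10, 90], [10, 90, 10]]

print(graininess(smooth), graininess(grainy))
```

The routine returns a larger number for the noisy image than for the uniform one. That is exactly the kind of algorithmic route to a result which, on the argument summarized here, should not be mistaken for a description of how a person sees graininess.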

For a psychological explanation to be of interest, the proposed model needs not only to transform roughly the same inputs into roughly the same outputs as a human would, but also to do so by means that are similar in some way. Recall, for example, that both humans and machines count out moves in chess; the difference is only in selecting what is to be counted out. Recently, Miller et al. stated that a Plan for an organism is essentially the same as a program for a computer, and that they are very impressed with Newell, Shaw and Simon's work showing hierarchical structures as a basis for cognitive simulation. They therefore conclude that humans must be thinking hierarchically. However, as we have seen, this work is far from impressive: we must now look at how the work of Newell, Shaw & Simon is evaluated.

I. Empirical Evidence for the Psychological Assumption: Critique of the Scientific Methodology of Cognitive Simulation

First we turn to the empirical justification of the psychological assumption, and later to the a priori justifications.

Newell & Simon compared protocols of how humans solved problems or played games with the traces of how algorithms attempted the same. They conclude that their work "provide[s] a general framework for understanding problem-solving behaviour ... and finally reveals with great clarity that free behaviour of a reasonably intelligent human can be understood as a product of a complex but finite and determinate set of laws".

This conclusion is strange, because Newell and Simon themselves point out several discrepancies in their data. Moreover, a scientific rule must apply across a reasonably large set of cases, while the cases before us are few and carefully selected. If the discrepancies were treated as such, and subjected to further research, one may be tempted to allow their findings as preliminarily true, but neither they nor others seem to take the discrepancies seriously, as if admitting to a discrepancy somehow absolves it, and makes it go away. Besides, such tolerance of "odd cases" can be justified only if the theory in question is a powerful one that works in many other cases, but as we have seen above there is not one example of evidence that humans do think as "a product of a complex but finite and determinate set of laws", unless the human is consciously trying out an algorithm.

In their earlier writings, Newell & Simon refer to the psychological assumption as a working hypothesis: "we postulate that the subject's behaviour is governed by a program organized from a set of elementary information processes". This is justified by appeal to parsimony. It seems that they applied this principle in an unusual way: as "whatever makes our life easy" rather than "whatever makes the theory simple". Adding speculative and unsubstantiated unconscious mental processes, and then asserting that these behave as computer programs, is a very strange application of Occam's razor indeed. However, Newell and Simon are unstoppable in their assumption, and assert:

There is a growing body of evidence that elementary information processes used by the human brain in thinking are highly similar to a subset of the elementary information processes that are incorporated in the instruction codes of present day computers.

However, there is no such evidence. The only "Growing body" is the body of speculation and assumption, and of works taking these assumptions to be true in the face of all evidence. The hypothesis here is instrumental in producing the "evidence" for itself! This serves as the basis of the assumption that all setbacks both in Cognitive Simulation and in Artificial Intelligence are purely accidental. The confidence is such, that it would seem that this assumption is being held a-priori.

II. A Priori Arguments for the Psychological Assumption

Look at how Miller et al. introduce their work:

Any complete description of behaviour should be adequate to serve as a set of instructions, that is, it should have the characteristics of a plan that could guide the action described.

In what sense can a set of instructions be complete? As Lewis Carroll observed, there is always need for more detail in formal instructions. And what sort of instructions do you give for distinguishing blue from red?

However, this notion that understanding actions is equivalent to a detailing of instructions has a long history in western thought. Plato thought that all sensible (non-arbitrary) action must fall under some rule, and any person acting non-arbitrarily is following that rule either consciously or unconsciously. For Plato, these rules are pre-programmed, and can be elicited by interview. The attraction of this theory is that moral theory would make much more sense if it were true.

On one level, this assumption makes sense: if we were to take man as a physical object, and remember that all behaviour is reducible to chemical and physical events in the body, and that there is no evidence of a "soul" or such non-physical intervention, then man is a natural machine, subject to natural laws, and therefore subject to simulation in a computer. However, true as all this may be, the resultant explanation would not be psychological at all. All discussion of humans as agents, minds, etc. would not exist in this context, and there would be, for example, no heuristics. This is the empirical school of psychology.

There is, however, another level, called the Phenomenological, where talk of humans as agents, and of tables and chairs, intentions and objects, language and other people, all make sense. But at this level there are no mechanistic psychological theories, so this is no solution for the cognitive simulators either.

If there is to be a psychology, then there must be an object under study - but as we have seen, not a physical object. One attempt at formulating a psychology was started by David Hume, and continues as stimulus-response psychology. Another option is to see the object as an information-processing device, and the laws as Kantian reason. This school is called idealist, mentalist, intellectualist, or now, "cognitive psychology".

Until the advent of the computer, the empiricist school was in the ascendant, not least because the mentalist school always seemed to need a "transcendental ego", a homunculus, to apply the rules. The computer provided a model of what non-human rule-application would look like. This was jumped upon with such delight that mentalists never stopped to think whether this third level, between the physical and the phenomenological, is coherent at all. Look at the following (typical) quote from Miller:

When an organism executes a Plan he proceeds step by step, completing one part and then moving to the next

Here all three levels exist in unstable and ungrammatical suspension. All this would not be so bad if it were not evidence of deep confusion - and this confusion is of the utmost importance since what is being attempted here is no less than establishing a third scientific level to discuss humans other than the physical or phenomenological. Note, for example, that when one describes the functions of the eye in physical terms (light etc.) sight is not what is being described, because sight is what the human is experiencing, not the organism, which is a passive physical object.

All the terminology of "percepts", "sense data", "processing", "transformation" etc. is ambiguous. Are we talking about photons, electrons or radiation? Probably not, so what exactly are these? Phenomenologically we perceive objects directly: no light rays, no sense-input. If Neisser et al. want to introduce percepts, which are neither light rays nor a perspectival view of a physical object, then let them clarify what this "percept" may be, and then proceed to argue that it exists.

The only way out is to assume what we came to prove, that the mind is a computer, and then it all makes sense, but we are hunting for the basis of this assumption, not for further motivation for why it might be nice if it were true. It is fine to assume that the mind is like a computer, but then that is a hypothesis, and should not be treated as an axiom.

Conclusion

So, as we have seen, all evidence for the existence of a "psychological machine" is at best circular, and at worst plain confusion. As to the empirical question of whether such a machine is possible, the balance of evidence so far is not encouraging. This is not to say that man is not a physical object subject to the laws of physics - but man is definitely not a digital computer defined by discrete states.

The Epistemological Assumption

It is now clear that Cognitive Simulation is hopelessly doomed. However, we can still find grounds for optimism: maybe human behaviour, though not composed of rules, is nonetheless describable by rules, in the same way that the moon and the planets do not calculate differential equations but are still describable using them? This would be a description of the competence, not a causal explanation of the performance. This is a subtle but important distinction: the psychological assumption, and the Cognitive Simulators, want to formalize behaviour by the same rules by which the mind (or brain) produces it. The epistemological assumption, and with it the Artificial Intelligence community, wants to describe a certain behaviour and simulate that, without reference to any psychological reality.

The epistemological assumption comprises two claims:

1. That all non-arbitrary behaviour can be formalized,
2. That the formalism can be used to reproduce the behaviour.

We will attack these by showing that the first claim is an unwarranted extension of the success of physics, and that the second fails because a theory of competence is too abstract to produce behaviour.

I. A Mistaken Argument from the Success of Physics

Minsky's confidence in the epistemological assumption is based on Turing's. But Turing does not assert that a computer can do anything a human can, only that a computer can do any well-defined task a human can do.

There is another possibility: to use a digital computer to simulate the human, or at least the nervous system, in toto, taking into account all synapses, chemical interactions, etc. But note that what would be simulated here is not the "information processing" of the human, or the mind, but the overall chemical infrastructure. In a sense, it would be a simulation of a machine that, in turn, can indeed engage in the information processing. This machine, in this case, is the human body.

But in any case, this is not what the AI and CS researchers are claiming when they say that man is a Turing machine, for if that were the case they might as well say that a cat or an aeroplane is a Turing machine. What they mean, as Minsky puts it, is "Mental processes resemble ... the kind of processes found in computer programs: arbitrary symbol associations, treelike storage schemes, conditional transfers, and the like". All AI is dedicated to logically manipulating representations of the world, not to solving physical equations.

Also, bear in mind that the computational power required for a computer to simulate a human body is arguably so immense that there are not enough atoms on earth to build one. Nonetheless, when pressed, Turing, Minsky et al. seek refuge in this confusion of "information processing" and physical laws, as if it justified their assumptions.

II. A Mistaken Argument from the Success of Modern Linguistics

Noam Chomsky's recent success in establishing a formal theory of language was taken as a huge accomplishment by the AI community, as it seems to confirm the first half of the epistemological assumption. However, many parts of language, which would be required for any performance of language, have been left out of Chomsky's formalism, leaving just a description of the syntax. Can there be a formal description of what Chomsky et al. call pragmatics? No, on two counts: the in-principle one will be dealt with in the next chapter, but the descriptive one is as follows:

Consider the sentence "The idea is in the pen". How is a computer to interpret that? Rule-driven behaviour can only do one of two things: apply a rule unequivocally, or make a blind stab. Humans, on the other hand, can relate to a non-standard utterance without putting it under either of the headings "incomprehensible" or "clearly understood".

The idea of meta-rules to deal with slight (or big) violations of grammar does not work. Firstly, for any finite set of rules like this, some poet will be able to say something that a human would at least partially understand and that would fall outside the rules. Secondly, this idea falls foul of the same issues as other postulations of unconscious processes, namely that we have no evidence for these processes' existence, other than that it would be convenient for the theorists in question. Also, if there is a rule for everything, why do some utterances definitely feel odd? How are they different?

The problem with natural language is that algorithms are timeless and objective, while human communications are situated in time, place, etc. - in a context, including a culture and a field of interests, which is never explicit.

Conclusion

But the optimistic formalizer may say that the fact that we don't have rules of pragmatics is only a temporary failing, not a matter of impossibility. We must show that the epistemological assumption is untenable on its own terms.

Wittgenstein noted that we don't use language according to strict rules, nor have we been taught language according to rules. The insistence on rule-following behaviour requires one of two things: either an infinite regress of rules-about-how-to-follow-rules (à la Lewis Carroll), or an eventual rule that can be applied with no reference to further rules. Where the regress stops and a rule is applied self-evidently is a point of difference between Wittgenstein and the AI community:

For Wittgenstein, the regress of rules stops at an indeterminate, situation-dependent point, and action takes over from deliberation. Since the computer is in no sense in a situation, this is a step that AI cannot follow: the machine's only non-rule behaviour is in the input device's sensors that generate the data. Humans do not have determinate actions directly dependent on input, like the frog's reaction to a moving black spot. The idea that the underlying rules-that-need-no-interpretation for humans are frog-like is unappealing, both from experience and from research.

The Ontological Assumption

We now turn to the last of the four assumptions, that all facts are enumerable and can therefore be made available to a computer. This idea that all facts can be made explicit is drawn from a 2,000-year-old tradition in philosophy, and is supported by a misunderstanding of the recent success of physics.

As Newell described GPS, he stated that the "...task environment [must be] defined in terms of discrete objects". This is the case for all computer programs and representations. But in order to understand anything, or even just to decipher (for example) handwritten text, one needs context: knowledge about "the ways of the world". Once we add in the "information processing" model, what one gets as an answer to the demand for context is a great mass of discrete facts. Minsky calculated in Semantic Information Processing how many facts of this nature a "humanoid intelligence" may need to behave sensibly in reasonable situations, and came up with numbers ranging from 100,000 to ten million. This leads to the "large database problem", which also includes the problem of the inflexibility of pre-judged categories of data. However, humans do not suffer from any "large database problem"; it is the ontological assumption itself which is generating the problem, not the nature of intelligence.

The human understanding of a chair, for example, is not a discrete and finite set of facts, but requires an understanding of the institution of furniture, the structure of the human body, and the inevitability of fatigue. None of these, in turn, is any more discrete than the original chair.

There are two traditions in western philosophy: one, the mentalist (starting from Plato), believes in detailed facts, while the other, the Soulist, believes in some inexplicable spirit which is the mind, or soul. It has been the goal of philosophy since Plato to eliminate uncertainty in all fields, and therefore a tradition evolved through Leibniz, Hume and Russell, holding that all complex facts are somehow made of elementary ones, and that these logical elements have a one-to-one relationship with atomic things in the world. This tradition culminates in the early Wittgenstein's assertion in the Tractatus that the world is composed of facts, and nothing else.

The moment this agenda had been made so very explicit, several (mainly continental) philosophers (Merleau-Ponty, Heidegger and the later Wittgenstein) started publishing its limitations. We will explore more of this in section III, but we have already seen that the mentalist view does not square with our experience. Why, then, does it have such a hold on our minds?

The initial success of physics, such as Galileo's ability to describe the motions of objects, gave much encouragement to this mentalist tradition, allowing Hobbes to state that "all thought is reckoning". But this is to confuse the physical state of the universe, which can arguably be described atom by atom, with a Human's situation in the world.

As McCarthy sums up the assumption:

One of the basic entities in our theory is the situation. Intuitively, a situation is the complete state of affairs at some instant in time. The laws of motion of a system determine all future situations from a given situation. Thus, a situation corresponds to the notion of a point in phase space.

But from a (human or other) agent's point of view, the same situation can occur in different physical settings, and different situations can occur in the exact same physical setup, depending on the interests of the involved parties. For example, McCarthy discusses at length "being at home", and defines it as being inside one's house. But this is very partial. Is one not at home while in the garden? Is one at home when visiting a property one owns but lets out? Is one "at home" before the furniture has been moved in? Being at home is a human situation; being in a house is a physical state.

It is easy to see why the AI community wants to reduce the situational to the physical: Physics provides an excellent understanding by means of equations, while the human situation is exceedingly difficult to understand. Consider the sentence "He follows Marx". Is "he" a communist, or a fan of Groucho? It is now clear that one's knowledge of language and one's knowledge of the world are inseparable, and that a complete description of such knowledge would not only be infinite, but also impossible in principle.

Correct interpretation of anything seems to require context, and correct identification of the context seems to require ever larger contexts, and there is no guarantee that these contexts are hierarchical or in any other way systematically organized. Therefore, there has been a tendency to look to the ultimate, or topmost, context as a starting point. This ultimate context of "our shared culture" or "being a life form" is also non-discrete, and therefore non-programmable. But nothing short of grasping this ultimate context could give us artificial intelligence, as nothing short of being a life form gives us natural intelligence. But how are we to proceed? Without some particular case of concern, we are just faced again with an "infinite database" of contexts.

This is not, however, how humans work. Humans take the preceding context as a guideline for the present one: we carry over from the immediate past a set of expectations and interpretations that help us understand the present. But from a programming point of view, this leads to an infinite regress in time instead of the above infinite regress in context. A way out could be to try to program a newborn baby's intelligence and point of view, complete with the various simple reflexes and reactions, and then try to move forward from there. So far, no work has been done in this direction - AI seems to want to spring into being full-grown, like Athena from Zeus's forehead. Even if we were to start with a baby-intelligence, it is not clear how the nature of the intelligence would move from fixed and reflex-like to the flexible adult intelligence.

A possible way out would be to deny the division of fact and situation: since we have seen that we cannot do anything outside of context, we should subsume the fact in the situation. The fact is not "in" the situation, it is part of the situation. In order for there to exist facts, a context is required.
