Summary of 'Programs With Common Sense' (1959) by John McCarthy

24 Feb 2017 - London, England

The Advice Taker

In 1959, John McCarthy noted that while interesting work was being done to solve problems requiring a high level of human intelligence, many simpler verbal reasoning processes had not yet been implemented using machines.

Taking inspiration from the field of formal logic, which dates back to Aristotle (384–322 BC), McCarthy sought to design a machine with “common sense”. He proposed a program, named the Advice Taker, that could draw conclusions and improve from a set of premises (“advice”) defined in a formal language. Unlike previous research on the subject [1], McCarthy wished to describe the program’s procedures and heuristics in rich detail. The motivation behind this approach was to create a machine that could learn from experience as effectively as humans do and discover abstract concepts through relatively simple representations.

McCarthy briefly mentioned that one known way to make machines capable of intelligent behaviour is for them to simulate all possible actions and test the results. Behaviours can be represented using nerve nets [2], Turing machines [3] or calculator programs [4]. His criticism of these targeted the low frequency of encountering interesting behaviour and the fact that small changes in behaviour expressed at a high level of abstraction do not have simple representations.

This led him to define a set of features he deemed essential for the evolution of human level intelligence:

1. All behaviours must be representable in the system.
2. Interesting changes in behaviour must be expressible in a simple way.
3. All aspects of behaviour (except the most routine) must be improvable, including the improvement mechanism itself.
4. The machine must have or evolve concepts of partial success because on difficult problems, decisive successes or failures come too infrequently.
5. The system must be able to create subroutines which can be included in procedures as units.

McCarthy’s paper focused mainly on the second point. To begin, he stated that for a program to be capable of learning something, it must first be capable of being told it. He then distinguished between the imperative commands an engineer would use to instruct a computer program and the declarative sentences we would use to instruct a human. Declarative sentences have several advantages: they can draw on previous knowledge, they have logical consequences, their order matters less (which allows afterthoughts), and they depend less on the previous state of the system, so the instructor needs less knowledge of that state.

Construction of the Advice Taker

The Advice Taker program possesses the following key features:

A way to represent expressions in the computer.
The expressions should be declarative in nature and allow logical inference.
An immediate deduction routine which deduces a set of immediate (one-step) conclusions given a set of premises. The intelligent behaviour would exist in the selection of which premises the deduction routine should be applied to.
Some formulas input to the system may define object properties, e.g. the number 1776 has a property that associates it with the event of the American Revolution.
Formulas in the system (other than declarative sentences) could include individuals, functions and programs.
The program is intended to operate cyclically, applying the immediate deduction routine to a list of premises and a list of individuals.
Some conclusions may have the form of imperative sentences to be obeyed.
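
To make the idea of an immediate deduction routine concrete, here is a minimal Python sketch. It is not McCarthy's design: the paper does not specify a representation, so the tuple encoding of formulas, the `?x` convention for variables and the function names `match`, `substitute` and `immediate_conclusions` are all assumptions made for illustration.

```python
def is_var(term):
    """Variables are strings starting with '?', e.g. '?x' (a convention assumed here)."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, bindings):
    """Match a pattern against a ground fact; return extended bindings or None."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == fact else None
        return {**bindings, pattern: fact}
    if isinstance(pattern, tuple) and isinstance(fact, tuple):
        if len(pattern) != len(fact):
            return None
        for sub_pattern, sub_fact in zip(pattern, fact):
            bindings = match(sub_pattern, sub_fact, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == fact else None


def substitute(term, bindings):
    """Replace every variable in term with its bound value."""
    if is_var(term):
        return bindings[term]
    if isinstance(term, tuple):
        return tuple(substitute(part, bindings) for part in term)
    return term


def immediate_conclusions(rules, facts):
    """Return conclusions derivable in one rule application and not already known."""
    new_facts = set()
    for premises, conclusion in rules:
        candidates = [{}]
        for premise in premises:
            extended = []
            for bindings in candidates:
                for fact in facts:
                    result = match(premise, fact, bindings)
                    if result is not None:
                        extended.append(result)
            candidates = extended
        for bindings in candidates:
            derived = substitute(conclusion, bindings)
            if derived not in facts:
                new_facts.add(derived)
    return new_facts


# A transitivity rule written as (premises, conclusion); the facts are illustrative.
AT_TRANSITIVE = ((("at", "?x", "?y"), ("at", "?y", "?z")), ("at", "?x", "?z"))
facts = {("at", "I", "desk"), ("at", "desk", "home"), ("at", "home", "county")}

print(immediate_conclusions([AT_TRANSITIVE], facts))
```

One call yields only the one-step conclusions at(I, home) and at(desk, county). Feeding those back in and calling the routine again yields at(I, county), which is one way to picture the cyclic operation described above. This sketch applies every rule to every fact; the paper's point is that the interesting behaviour lies in choosing which premises to apply it to.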

A priori knowledge

As an example, McCarthy described a scenario where you are at your desk and wish to go to the airport. Before any deduction could take place and a solution to the problem obtained, the following a priori premises would be input to the Advice Taker:

First we’d need to define the “at” predicate (the relation between a place and a sub-place) in the form:
- at(x, y) and its transitivity at(x, y), at(y, z) → at(x, z)
This could be used to provide the Advice Taker with several facts:
- at(I, desk)
- at(desk, home)
- at(car, home)
- at(home, county)
- at(airport, county)
We can then define rules related to “going” and whether a place is “walkable” or “drivable”:
- did(go(x, y, z)) → at(I, y)
- walkable(x), at(y, x), at(z, x), at(I, y) → can(go(y, z, walking))
- drivable(x), at(y, x), at(z, x), at(car, y), at(I, car) → can(go(y, z, driving))
And use those rules to define two further facts:
- walkable(home)
- drivable(county)
We define a premise “canachult” which states that if x holds, we can perform y and in doing so obtain z:
- (x → can(y)),(did(y) → z) → canachult(x, y, z)
This is also semi-transitive, where for two sets of actions with a linked variable:
- canachult(x, y, z), canachult(z, u, v) → canachult(x, prog(y, u), v)
- prog(y, u) represents the execution of those actions to obtain v.
The final premise will be the one which causes the action to be taken:
- x, canachult(x, prog(y, z), w), want(w) → do(y)
The problem would be represented with the following statement, causing the deductive process to begin:
- want(at(I, airport))
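
As a rough illustration of how these premises might be held as data, the Python sketch below stores the direct "at" facts as pairs and applies the walkable rule to them. The names `AT`, `WALKABLE` and `walking_options` are hypothetical, and the check that skips a trip from a place to itself is an added assumption that does not appear in the rule above.

```python
# Direct "at" facts from the example, stored as (inner, outer) pairs.
AT = {
    ("I", "desk"),
    ("desk", "home"),
    ("car", "home"),
    ("home", "county"),
    ("airport", "county"),
}
WALKABLE = {"home"}


def walking_options(at, walkable, me="I"):
    """Apply walkable(x), at(y, x), at(z, x), at(I, y) -> can(go(y, z, walking))."""
    options = set()
    for x in walkable:
        # Everything directly inside the walkable place x.
        inside_x = {inner for (inner, outer) in at if outer == x}
        for y in inside_x:
            if (me, y) not in at:
                continue  # the rule requires at(I, y)
            for z in inside_x:
                if z != y:  # assumption: skip the trivial trip from y to y
                    options.add(("can", ("go", y, z, "walking")))
    return options


print(walking_options(AT, WALKABLE))
```

With the facts above, the only conclusion is can(go(desk, car, walking)). The drivable rule could be written the same way, but one of its premises, at(I, car), only becomes available once the walk has been carried out, which is why the paper needs canachult to chain the two steps.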

Deductive arguments

Given the above rules, facts and a goal, the Advice Taker should deduce the argument below. The final proposition would initiate action to achieve the goal:

at(I, desk) → can(go(desk, car, walking))
at(I, car) → can(go(home, airport, driving))
did(go(desk, car, walking)) → at(I, car)
did(go(home, airport, driving)) → at(I, airport)
canachult(at(I, desk), go(desk, car, walking), at(I, car))
canachult(at(I, car), go(home, airport, driving), at(I, airport))
canachult(at(I, desk), prog(go(desk, car, walking), go(home, airport, driving)), at(I, airport))
do(go(desk, car, walking))
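
The last three steps of the argument can be sketched in Python. This is an illustration under assumptions rather than the paper's mechanism: the `Canachult` tuple type and the `compose` and `decide` helpers are names invented here, and formulas are encoded as plain tuples.

```python
from typing import NamedTuple


class Canachult(NamedTuple):
    """canachult(pre, action, post): if pre holds, doing action yields post."""
    pre: tuple
    action: tuple
    post: tuple


def compose(first, second):
    """canachult(x, y, z), canachult(z, u, v) -> canachult(x, prog(y, u), v)."""
    if first.post != second.pre:
        raise ValueError("the result of the first step must be the premise of the second")
    return Canachult(first.pre, ("prog", first.action, second.action), second.post)


def decide(known, plan, wants):
    """x, canachult(x, prog(y, z), w), want(w) -> do(y)."""
    is_prog = plan.action[0] == "prog"
    if is_prog and plan.pre in known and plan.post in wants:
        return ("do", plan.action[1])
    return None


walk = Canachult(("at", "I", "desk"), ("go", "desk", "car", "walking"), ("at", "I", "car"))
drive = Canachult(("at", "I", "car"), ("go", "home", "airport", "driving"), ("at", "I", "airport"))

plan = compose(walk, drive)
print(decide({("at", "I", "desk")}, plan, {("at", "I", "airport")}))
```

The composed plan only tells the program to perform the first action, the walk to the car. After that, the state has changed and the cycle of observation and deduction would run again.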

But how would the initial premises be collected and the deduction routine operate? McCarthy conceded he could not yet provide a full explanation of this but explored some high level ideas in the remainder of the paper:

It should be possible to store all initial premises in memory. The premises should not give the Advice Taker any particular advantage in this problem when compared with a human possessing the same ability to navigate.
Although the premises are stored in memory, we still need to describe the process the Advice Taker will follow to determine which are relevant and should be applied to the goal.
The overall deduction routine could start with an observation routine which:
- Looks at a main list M which is initialised with the goal want(at(I, airport))
- Observational statements about the contents of M are then added to an observation list O.
- The observation routine may have many outputs, but in the first instance it would formally observe something to the effect of “the only statement on M has the form want(u(x))”.
A deduce and obey routine would then apply to list O along with a smaller list F, consisting of the Advice Taker’s fixed properties. The purpose of this routine is to extract certain statements from the properties of items in the observation list O.
- If want had arisen before, the observation routine earlier may have added it from M to O as an object with properties of statements that are relevant to building an argument, thereby generalising from past experience. In this case we assume that it has not been encountered before and does not take the status of an object.
- The first deductive step should ascertain that the goal premise want(at(I, airport)) is related to getting somewhere and search for appropriate stored rules and facts that could assist in the deductive argument.
- Low level abstractions deemed relevant would be found including walkable(x), at(y, x), at(z, x), at(I, y) → can(go(y, z, walking)) and walkable(home).
- Higher levels of abstractions would be found such as canachult premises referenced above that relate to doing something to obtain a new state.
- The a priori formula want(at(I, x)) → do(observe(whereamI)) should be found, causing the Advice Taker to invoke a general whereami routine to obtain the first premise at(I, desk).
- We might expect that the deduce and obey routine would use this information to create a property list for want(at(I, x)). One property may be a rule that begins with the premises at(I, y), want(I, x) and has as its conclusion a search for the property list of go(y, x, z).
- We would expect this to fail and heuristics of the Advice Taker to then initiate a search for a y such that at(I, y) and at(airport, y).
- This search would look at property lists of the origin (home) and the destination (airport), ultimately causing the drivable definition to be found with one of its premises at(I, car).
- A repetition of the above would find the walkable rule, which completes the set of premises since the other at premises would have been found as by-products of previous searches, enabling the argument above to be derived.
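
The search for a y such that at(I, y) and at(airport, y) can be pictured with a short Python sketch. The paper describes this search only informally in terms of property lists, so the approach here is a stand-in: it assumes each thing has a single immediate container with no cycles, and the names `CONTAINER`, `enclosing_places` and `common_place` are hypothetical.

```python
# Immediate containers taken from the example's at(inner, outer) facts.
CONTAINER = {
    "I": "desk",
    "desk": "home",
    "car": "home",
    "home": "county",
    "airport": "county",
}


def enclosing_places(thing, container):
    """List the places that contain thing, innermost first."""
    places = []
    while thing in container:
        thing = container[thing]
        places.append(thing)
    return places


def common_place(a, b, container):
    """Return the innermost y with at(a, y) and at(b, y), or None if there is none."""
    around_b = set(enclosing_places(b, container))
    for place in enclosing_places(a, container):
        if place in around_b:
            return place
    return None


print(common_place("I", "airport", CONTAINER))  # county
print(common_place("I", "car", CONTAINER))      # home
```

The first result, county, is the place that the drivable rule applies to; the second, home, is the place that the walkable rule applies to. This mirrors the order in which the two rules are found in the steps above.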

Conclusion

McCarthy hoped that the heuristic rules mentioned on the property lists were plausible to the reader. He concluded with the observation that many of the statements encountered were of stimulus-response format and that obeying these rules could be likened to unconscious human thought. Conscious thought, on the other hand, could be viewed as the process of identifying and deducing logical conclusions from a set of premises.

The final section presented criticism of the paper from Prof. Y. Bar-Hillel, who claimed that the work belonged in the Journal of “Half-Baked Ideas” and was careless in its specification. McCarthy remarked that he was not proposing a practical real-world problem for the program to solve but rather an example intended to allow us to think about the kinds of reasoning involved and how a machine may be made to perform them.

References & Sidenotes

You can read the original paper here. I’d also recommend reading John McCarthy’s legacy, which provides further analysis of the paper and considers the impact of John McCarthy’s work on formal knowledge systems and artificial intelligence as a whole.

[1] Newell, A., Shaw, J. C. and Simon, H. A. (1957). Empirical Explorations of the Logic Theory Machine: A Case Study in Heuristic. Proceedings of the Western Joint Computer Conference, published by the Institute of Radio Engineers, New York, 1957, pp. 218–230.

[2] Minsky, M. L. (1956). Heuristic Aspects of the Artificial Intelligence Problem. Lincoln Laboratory Report, pp. 34–55.

[3] McCarthy, John (1956). The Inversion of Functions Defined by Turing Machines, in Automata Studies, Annals of Mathematical Study No. 34, Princeton, pp. 177–181.

[4] Friedberg, R. (1958). A Learning Machine, Part I. IBM Journal of Research and Development 2, No. 1.