Experiments on the Mechanization of Game-learning. 2--Rule-Based Learning and the Human Window
Author(s) - Donald Michie
Publication year - 1982
Publication title - The Computer Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.319
H-Index - 64
eISSN - 1460-2067
pISSN - 0010-4620
DOI - 10.1093/comjnl/25.1.105
Subject(s) - deliverable , computer science , mechanization , window (computing) , intelligibility (philosophy) , artificial intelligence , machine learning , operating system , engineering , systems engineering , ecology , philosophy , epistemology , biology , agriculture
The first successful learning programs were developed in the 1950s and belonged to a general category which was at that time commonly known as 'hill-climbing'. Global mathematical models of system performance were typically constructed in forms permitting multi-dimensional representation in systems of orthogonal co-ordinate axes. Numerical parameters were then automatically tuned at run time in response to sensed deviations from computed criteria of optimality. This scheme embraces classical adaptive control, along with many studies of machine learning in games and game-like situations. How the problem appeared to AI workers of that epoch can be gleaned from the proceedings of a celebrated symposium held in 1957 under the title: Mechanisation of Thought Processes (HMSO, London). The present paper is the second of a series. The first appeared nearly 20 years ago and proposed a new principle for attaining the above purposes, now known as 'rule-based' learning. The idea was to partition the problem-domain into a mosaic of smaller sub-domains and to associate a separate rule of action with each. In the pre-learning state a rule may be stochastic or even vacuous. In the learning mode, entry into a given subdomain invokes a global procedure for collecting data on the sensed consequences of executing the associated rule and for up-dating the rule's content in the light of these. A worked toy example was supplied in the form of a computer simulation of a machine for learning to play Noughts and Crosses (Tic-Tac-Toe). Distinct equivalence classes into which the positions can be grouped were taken as the separate sub-domains. These were represented as separate 'boxes', as in a card-filing system. Successful tests of the 'boxes' principle were subsequently made on a hard dynamical problem, namely automatically controlling the support point of an inverted pendulum within a bounded space. 
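The 'boxes' principle described above can be sketched in a few lines: each sub-domain (box) carries its own independently modifiable action rule, which starts vacuous and is updated only from the sensed consequences of entering that box. The class below is an illustrative reconstruction, not the original BOXES code; the names and the simple additive scoring are assumptions.

```python
import random

class Boxes:
    """Minimal sketch of the 'boxes' principle: the problem domain is
    partitioned into sub-domains, each associated with a separate,
    independently modifiable action rule."""

    def __init__(self, actions):
        self.actions = actions
        # One score table per box; rules begin vacuous (all scores equal).
        self.boxes = {}

    def decide(self, box_id):
        """Entering a box invokes its own rule; ties are broken at random,
        so a pre-learning rule is effectively stochastic."""
        scores = self.boxes.setdefault(box_id, {a: 0.0 for a in self.actions})
        best = max(scores.values())
        return random.choice([a for a, s in scores.items() if s == best])

    def reinforce(self, box_id, action, outcome):
        """Update only the rule attached to the box that was entered,
        in the light of the sensed consequence of the action taken."""
        self.boxes[box_id][action] += outcome
```

For Noughts and Crosses, `box_id` would be the equivalence class of the board position (positions identical under rotation and reflection sharing one box), and `outcome` a credit propagated back after a win, loss, or draw.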
The adaptive pole-balancer was required to deliver 20 left-right decisions per second to a motor-controlled cart on which was balanced a pole free to move in the vertical plane defined by a straight bounded track (Fig. 1). The task of the BOXES program was to acquire by trial and error, or from being shown by a human tutor, or from a combination of both, the ability to control the cart within the bounds set by the ends of the track without permitting the pole's angular deviation from the vertical to exceed a pre-set tolerance. For experimental runs the precise specification was: the system fails if any of four monitored variables (position on track, velocity, pole angle, angular velocity of pole) pass outside fixed bounds. Initially decisions were taken randomly, and fail-free periods were measured in seconds. After learning, the fail-free periods lasted half an hour or more. BOXES was the first system to be driven by a set of independently modifiable production rules, and thus foreshadowed today's 'expert systems'. The mode of learning was primitive, being confined to revising the action-recommendations associated with stored situation-patterns, the latter being fixed. The next series of experiments at Edinburgh, involving computer-coordination of hand-eye robots, focused on the situation-perception component of machine learning. In the FREDDY robot work, the learning module built new patterns in memory as the basis of adaptive perception of situation-categories. The stored structures were descriptions in semantic net form of the visual appearances of various objects such as cup, spectacles, axle, wheel, and so forth. From these the system was required at run time to recognize instances from images sampled from the television camera acting as the robot's eye. In robot vision 'programming by example' becomes a necessity.
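In the pole-balancing task, the boxes arise by thresholding each of the four monitored variables into coarse bands, so that every combination of bands names one box; the failure condition is simply any variable leaving its fixed bounds. The following sketch illustrates that discretization. The specific thresholds and bounds are invented for illustration and are not the values used by Michie and Chambers.

```python
def band(value, thresholds):
    """Index of the coarse band a value falls in, given ascending thresholds."""
    for i, t in enumerate(thresholds):
        if value < t:
            return i
    return len(thresholds)

def box_of(x, x_dot, theta, theta_dot):
    """Map the four monitored variables (cart position, cart velocity,
    pole angle, pole angular velocity) to a box identifier.
    Thresholds here are illustrative only."""
    return (band(x, (-0.8, 0.8)),
            band(x_dot, (-0.5, 0.5)),
            band(theta, (-0.1, 0.0, 0.1)),
            band(theta_dot, (-0.5, 0.5)))

def failed(x, x_dot, theta, theta_dot, bounds=(2.4, 3.0, 0.21, 3.0)):
    """The system fails if any of the four monitored variables
    passes outside its fixed bound (illustrative bound values)."""
    return any(abs(v) > b for v, b in zip((x, x_dot, theta, theta_dot), bounds))
```

At each of the 20 decision points per second, the controller would compute `box_of(...)` for the current state and emit the left or right push recommended by that box's rule, crediting or penalizing the rules of the boxes visited when a run ends in failure.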
The infeasibility of programming in the ordinary sense is apparent if one compares the police task of identifying a culprit from photographs with identification purely on the basis of verbal description. The Edinburgh versatile assembly program was thus operating in the foothills of a domain now commonly