Knowledge-Based Cascade-Correlation
Author(s) - Thomas R. Shultz, François Rivest
Publication year - 2000
Language(s) - English
DOI - 10.1109/ijcnn.2000.10008
Neural network modeling typically ignores the role of knowledge in learning by starting from random weights. A new algorithm extends cascade-correlation by recruiting previously learned networks as well as single hidden units. Knowledge-based cascade-correlation (KBCC) finds, adapts, and uses its relevant knowledge to speed learning. In this paper, we describe KBCC and illustrate its performance on a small but clear problem.

1 Existing knowledge and new learning

Most research on learning in neural networks has assumed that learning is done "from scratch", without the influence of previous knowledge. However, it is clear that when people learn, they make extensive use of their existing knowledge [1-3]. Use of prior knowledge in learning is responsible for the ease and speed with which people are able to learn new material, and for interference effects. A major limitation of neural network models of human cognition and learning is that these networks begin learning from only a random set of connection weights. This implements a tabula rasa view of each learning task that few contemporary researchers would accept.

In this paper, we propose a fundamental extension of cascade-correlation (CC), a generative learning algorithm that has been useful in the simulation of cognitive development [4-9]. CC builds its own network topology by recruiting new hidden units into a feed-forward network as needed in order to reduce network error [10]. Our extension, called knowledge-based cascade-correlation (KBCC), recruits previously learned networks in addition to the untrained hidden units recruited by CC. We refer to existing networks as potential source knowledge and to a current learning task as a target. Previously learned source networks compete with each other and with single hidden units to be recruited into the target network. KBCC is similar to recent neural network research on transfer [11], sequential learning [12], lifelong learning [13], multi-tasking [14], knowledge insertion [15], modularity [16], and input recoding [17], but it tries to accomplish these functions by storing and searching for knowledge within a unified generative network approach.

2 Description of KBCC

KBCC is similar to CC, except that KBCC treats previously learned networks like single candidate hidden units, in that they are all candidates for recruitment into a target network. A candidate unit and a candidate network both describe a differentiable function. The connection scheme for a sample KBCC network is shown in Figure 1. This scheme is similar to that in CC, except that a recruited network can have multiple weighted sums as inputs and multiple outputs, whereas a single recruited unit has only one weighted sum as input and a single output. Among the notational conventions we use in formulating KBCC are:

$w_{o_u,o}$: Weight between output $o_u$ of unit $u$ and output unit $o$.
$w_{o_u,i_c}$: Weight between output $o_u$ of unit $u$ and input $i_c$ of candidate $c$.
$f'_{o,p}$: Derivative of the activation function of output unit $o$ with respect to its input at pattern $p$.
$\nabla_{i_c} f_{c,o_c,p}$: Partial derivative of candidate $c$'s output $o_c$ with respect to its input $i_c$ at pattern $p$.
$V_{o,p}$: Activation of output unit $o$ at pattern $p$.
$V_{o_c,p}$: Activation of output $o_c$ of candidate $c$ at pattern $p$.
$V_{o_u,p}$: Activation of output $o_u$ of unit $u$ at pattern $p$.
$T_{o,p}$: Target value of output $o$ at pattern $p$.
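This notation feeds the candidate-scoring step of the algorithm. As a sketch of how the symbols combine, and assuming KBCC scores candidates with the standard cascade-correlation covariance measure of [10] (the paper's exact normalization may differ), the residual error and a recruitment score for candidate $c$ can be written as:

    % Residual error at output o for pattern p, in the notation above.
    % S_c is the standard CC covariance objective [10], here also summed over
    % the candidate's outputs o_c, since a KBCC candidate network may have
    % several outputs; the precise normalization used in the paper is an
    % assumption.
    \[
      E_{o,p} = V_{o,p} - T_{o,p}, \qquad
      S_c = \sum_{o_c} \sum_{o}
            \Bigl|\, \sum_{p} \bigl(V_{o_c,p} - \overline{V}_{o_c}\bigr)
                              \bigl(E_{o,p} - \overline{E}_{o}\bigr) \Bigr|
    \]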
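To make the recruitment competition concrete, the following minimal Python sketch shows the idea that single units and frozen source networks share one candidate interface, differing only in their number of weighted-sum inputs and outputs. All names here (Candidate, SigmoidUnit, SourceNetwork, recruit) are ours rather than the paper's, and the phase that trains candidate input weights is omitted for brevity.

    import numpy as np

    class Candidate:
        # A candidate is any differentiable function. Each candidate input i_c
        # receives its own weighted sum of the target network's features
        # (inputs plus previously recruited units), via weights w_{o_u, i_c}.
        def __init__(self, n_in, n_out, rng):
            self.n_in, self.n_out, self.rng = n_in, n_out, rng

        def connect(self, n_features):
            # One weight vector per candidate input; trained in full KBCC,
            # fixed random here for brevity.
            self.W_in = self.rng.normal(scale=0.5, size=(n_features, self.n_in))

        def forward(self, F):
            raise NotImplementedError

    class SigmoidUnit(Candidate):
        # A classic CC candidate: one weighted sum in, one output.
        def __init__(self, rng):
            super().__init__(n_in=1, n_out=1, rng=rng)

        def forward(self, F):
            return 1.0 / (1.0 + np.exp(-(F @ self.W_in)))

    class SourceNetwork(Candidate):
        # A previously learned network, kept frozen; only the input weights
        # W_in adapt. It may have several inputs and several outputs.
        def __init__(self, net_fn, n_in, n_out, rng):
            super().__init__(n_in, n_out, rng)
            self.net_fn = net_fn

        def forward(self, F):
            return self.net_fn(F @ self.W_in)

    def covariance_score(V_c, E):
        # The S_c measure from the LaTeX block above: absolute covariances of
        # centered candidate outputs against centered residual errors, summed
        # over candidate outputs o_c and target-network outputs o.
        Vc = V_c - V_c.mean(axis=0)
        Ec = E - E.mean(axis=0)
        return np.abs(Vc.T @ Ec).sum()

    def recruit(candidates, F, E):
        # Recruit the candidate whose outputs best track the residual error.
        return max(candidates, key=lambda c: covariance_score(c.forward(F), E))

    # Toy usage: 20 patterns, 3 features, a 2-output target network.
    rng = np.random.default_rng(0)
    F = rng.normal(size=(20, 3))                 # feature activations V_{o_u,p}
    E = rng.normal(size=(20, 2))                 # residual errors E_{o,p}
    frozen = lambda X: np.tanh(X)                # stand-in frozen source network
    pool = [SigmoidUnit(rng), SourceNetwork(frozen, n_in=2, n_out=2, rng=rng)]
    for c in pool:
        c.connect(n_features=3)
    winner = recruit(pool, F, E)

In the full algorithm, the input weights of every candidate are first trained to maximize the score before the best candidate is installed in the target network, just as CC trains its pool of candidate units before recruiting one.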