Learning in a Labyrinth: Learning from Model-Based Feedback
Jerker Denrell, Christina Fang, Daniel A. Levinthal
iibjd@hhs.se, fang@management.wharton.upenn.edu, levinthal@wharton.upenn.edu
Current models of experiential learning suffer from an important limitation:
they assume that the choice of a specific action is immediately followed by an
observable outcome. However, in many situations outcomes can be observed only
after a series of actions has been performed. As a result, our models of
learning miss a fundamental challenge of learning tasks: actions and payoffs
are often separated in time.
We create a formal computational model of learning in these situations in which
the actor develops a mental model of the value of intermediate states. We model
a particular method of credit assignment, termed temporal differencing, that
has recently been introduced in the literature on adaptive systems in computer
science. In this method, an actor's own mental model of the environment is used
to provide interim feedback regarding the value of actions in lieu of feedback
from the environment. We explore the evolution of an actor's mental model over
time as a complex problem-solving task is repeated. While the problem structure
is assumed to remain fixed, the initial conditions for this task are varied
with each iteration.
As experience is gained with the problem, stage-setting or antecedent actions
begin to be recognized as valuable, and distinct problem-solving routines
develop. These routines become more elaborate with experience, resulting in
increased efficiency at problem solving. The process, however, requires
repeated trials: the positional values of intermediary states are imputed only
gradually. Thus, while a valid cognitive map quickly emerges for states close
to the solution, recognizing the positional value of more distant states
requires numerous trials. Although more extensive credit assignment can produce
faster learning, we show that it may lead to less intelligent associations
between the ultimate outcome and prior actions. As a result, managing the
appropriate degree of credit assignment is a form of the familiar
exploration/exploitation tradeoff.
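
To make the credit-assignment idea concrete, the following is a minimal sketch of one-step temporal differencing, not the authors' model. It assumes a one-dimensional corridor in place of a labyrinth, a random-walk actor, a single payoff of 1 on reaching the goal, and illustrative values for the learning rate `alpha` and discount factor `gamma`; the function name `td0_corridor` is hypothetical. The actor updates the value of each state from its own estimate of the next state, and the starting state is redrawn on every trial.

```python
import random


def td0_corridor(n_states=10, episodes=200, alpha=0.1, gamma=0.9, seed=0):
    """One-step temporal-difference value estimates for a corridor.

    Assumptions (illustrative only): the last state is the goal, the actor
    moves left or right at random, and the only payoff is 1 on reaching
    the goal.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    values = [0.0] * n_states  # the actor's mental model of state values

    for _ in range(episodes):
        state = rng.randrange(goal)  # initial condition varies per trial
        while state != goal:
            step = rng.choice((-1, 1))
            nxt = min(max(state + step, 0), goal)  # walls at both ends
            reward = 1.0 if nxt == goal else 0.0
            # Interim feedback: the payoff (if any) plus the actor's own
            # discounted estimate of the state it has just entered.
            target = reward + gamma * values[nxt]
            values[state] += alpha * (target - values[state])
            state = nxt
    return values


if __name__ == "__main__":
    for n in (5, 500):
        estimates = td0_corridor(episodes=n)
        print(n, [round(v, 3) for v in estimates])
```

Comparing the estimates after a few trials with those after many trials shows the pattern described above under these assumptions: the state adjacent to the goal acquires value on the first trial, while value reaches distant states only through repeated trials.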