INSTRUMENTAL CONDITIONING

1.  THORNDIKE’S PUZZLE BOX = hungry cats had to learn to escape from the box to get a bowl of food -- Thorndike measured the LATENCY to escape.  Results = over trials, LATENCY declined

 

 

 

The gradual nature of the curve convinced Thorndike that the animals had not formed a rational understanding of the situation but rather that THE FOOD REWARD GRADUALLY STAMPED IN AN ASSOCIATION BETWEEN CUES IN THE PUZZLE BOX AND ESCAPE.  He formalized his belief in his "LAW OF EFFECT":

 

"WHEN A RESPONSE IS REPEATEDLY FOLLOWED BY A SATISFYING STATE OF AFFAIRS, THAT RESPONSE WILL INCREASE IN FREQUENCY"

 

 

2.  DISCRETE TRIALS PROCEDURES = during training, 1) each trial ends when you remove the animal from the apparatus and 2) the instrumental response is performed only once during each trial.  Usually, discrete trial procedures use some type of maze.

 

The use of mazes was pioneered by W.S. SMALL at Clark University so that he could study learning in rats.

 

His inspiration came from a Scientific American article describing how rats live in underground burrows and must make their way through "maze-like" passages all the time -- so he borrowed from nature -- that is, he brought nature into the laboratory and set up what he thought was the equivalent of these underground mazes.

 

Measure 1) RUNNING SPEED = how fast an animal gets from the start box to the goal box -- usually increases over trials (animals get faster); or 2) LATENCY = the time it takes the animal to leave the start box and begin moving down the alley -- usually gets shorter over trials (again, animals get faster).

 

3.  FREE-OPERANT PROCEDURES = procedures involving responses made by the animal at a pace it sets (i.e., the animal is "free" to operate on its environment by responding whenever and however often it likes).  Skinner eliminated the maze altogether and designed a chamber with the start box and the goal box in the same place, so the animal didn't have to run anywhere -- the "Skinner box," or operant chamber.

steps in training:

1ST STEP = MAGAZINE TRAINING = the animal learns that the sound of the food magazine signals food delivery (classical conditioning -- sign tracking!)

2ND STEP = SHAPING = rewarding successive approximations to the desired behavior

 

Measure the animal's behavior using a CUMULATIVE RECORDER.

This is a device with a rotating drum that pulls paper out at a constant rate -- a pen sits on the paper.  If no response occurs, the pen stays still and draws a flat, horizontal line as the paper comes out of the machine.  If the animal performs a lever press, the pen moves up one step on the paper and stays up -- the next response moves the pen up one more step, and so on.  It's called cumulative because you can measure the total number of responses just by looking at the vertical distance between where the pen started and where it stopped after the animal quit responding.

VERTICAL DISTANCE = total number of responses

HORIZONTAL DISTANCE = how much time has elapsed

SLOPE OF THE LINE = rate of responding (how fast the animal was pressing the bar)
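
To make the geometry concrete, here is a minimal Python sketch (the press times are made-up illustration data, not from a real recorder) showing how all three measures fall out of a list of lever-press timestamps:

    session_length = 10.0                          # seconds of recording (paper moves at a constant rate)
    press_times = [2.0, 3.5, 4.0, 6.5, 7.0, 9.5]   # when each lever press occurred (hypothetical)

    total_responses = len(press_times)             # vertical distance = total number of responses
    rate = total_responses / session_length        # slope = rate of responding (presses per second)

    def cumulative_count(t, presses):
        """Height of the cumulative record at time t: responses made so far."""
        return sum(1 for p in presses if p <= t)

    for t in (0, 2, 4, 6, 8, 10):                  # horizontal distance = elapsed time
        print(t, cumulative_count(t, press_times))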

 

 

 

4.  REINFORCEMENT SCHEDULES = Rules that determine when a response will be reinforced.

 

CONTINUOUS REINFORCEMENT (CRF) = every response is reinforced.

PARTIAL or INTERMITTENT REINFORCEMENT -- 2 types: Ratio & Interval

      1.  RATIO SCHEDULE = reinforcement depends on the number of responses emitted, example "piecework" where people are paid by how many items they make (NOT by how long it takes to make them)

a.  FIXED RATIO = fixed number, in FR30 every 30th response is rewarded.

b.  VARIABLE RATIO = variable number, in VR30 reinforcement comes after a variable number of responses averaging 30.

      2.  INTERVAL SCHEDULE = reinforcement depends on how much time has passed since the last reinforcement, example getting mail -- you can visit the mailbox a zillion times a day, but you're still not going to get anything until 24 hrs after today's batch.

a.  FIXED INTERVAL = fixed amount of time, in FI30 the first response made after 30 seconds has elapsed is rewarded.

b.  VARIABLE INTERVAL = variable amount of time, in VI30 the first response made after an average of 30 seconds has elapsed is rewarded.
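
Putting the four rules side by side, here is a minimal Python sketch of the decision each schedule makes when a response comes in.  None of this is a standard API -- the function names are mine, and drawing variable requirements from an exponential distribution is just one common modeling choice:

    import random

    def make_schedule(kind, value):
        """kind: 'FR', 'VR', 'FI', or 'VI'.  value: responses (ratio) or seconds (interval).
        Returns respond(now) -> True if that response earns the reinforcer."""
        state = {"responses": 0, "last_reinforcer": 0.0}

        def next_requirement():
            # Fixed schedules reuse the value; variable schedules draw a number averaging it.
            return value if kind in ("FR", "FI") else random.expovariate(1.0 / value)

        state["requirement"] = next_requirement()

        def respond(now):
            state["responses"] += 1
            if kind in ("FR", "VR"):   # ratio: enough responses since the last reinforcer?
                earned = state["responses"] >= state["requirement"]
            else:                      # interval: first response after the interval elapses
                earned = (now - state["last_reinforcer"]) >= state["requirement"]
            if earned:
                state["responses"] = 0
                state["last_reinforcer"] = now
                state["requirement"] = next_requirement()
            return earned

        return respond

    fr30 = make_schedule("FR", 30)   # every 30th response pays off
    vi30 = make_schedule("VI", 30)   # first response after ~30 s (on average) pays off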

 

 

Each schedule has a different effect on behavior:

 

 

FIXED INTERVAL = you get very little responding after a reinforcement, but the rate steadily accelerates and reaches a peak just before the next reinforcement is due.  This is called an FI SCALLOP.  WHAT DOES THIS TELL US ABOUT ANIMALS -- WHAT ARE THEY DOING?  THEY ARE JUDGING TIME -- Very, very important implications -- if you reward every hour then the behavior will only occur every hour.

FIXED RATIO = you get "pause and run": after each reward there is a pause, followed by a rapid run of presses up to the next reward.

 

With VARIABLE SCHEDULES (either VI or VR) you get much more regular responding, because reinforcement can occur at any time -- it cannot be predicted.  Which schedule(s) do you think Las Vegas knows all about?

 

One last schedule -- a FIXED-TIME SCHEDULE is the automatic delivery of a reinforcer at a given time (like every 2 minutes).  It can be contrasted with a FIXED INTERVAL SCHEDULE in that on a FIXED-TIME SCHEDULE the reinforcer is NOT contingent upon any response -- that is, the animal is reinforced no matter what he is doing at the time.  Skinner called the behavior this "accidental" reinforcement produces SUPERSTITIOUS BEHAVIOR, because the animal acts as though his behavior produces reinforcement when in reality nothing (or anything) he does will result in getting the reward.

 

 

5.  RESPONSE-OUTCOME CONTINGENCIES

 

Some definitions:

APPETITIVE STIMULUS = A pleasant event.

AVERSIVE STIMULUS = An unpleasant event.

POSITIVE CONTINGENCY = a response "turns on" a stimulus = a rat can press the bar which will activate the food magazine and he will get some food.

NEGATIVE CONTINGENCY = a response "turns off" a stimulus = a rat can be sitting in the Skinner box and the experimenter can deliver a loud noise – if the rat presses the bar the noise will be turned off.

 

Four common procedures bring together our 2 types of events (APPETITIVE and AVERSIVE) and our 2 types of response-outcome contingencies (POSITIVE and NEGATIVE):

 

 

 

a. POSITIVE REINFORCEMENT = procedures in which the response turns on an APPETITIVE STIMULUS.  If the response occurs, then the appetitive stimulus is presented.  If the response doesn't occur, then the appetitive stimulus is not presented.  This is a POSITIVE CONTINGENCY and the rate of responding increases.

POSITIVE REINFORCEMENT = APPETITIVE STIMULUS + POSITIVE CONTINGENCY --> RESPONDING INCREASES

 

 

b. PUNISHMENT = procedures in which the response turns on an AVERSIVE STIMULUS.  If the response occurs, the aversive stimulus is presented.  If the response doesn't occur, then the aversive stimulus is not presented.  This is a POSITIVE CONTINGENCY and the rate of responding decreases.

PUNISHMENT = AVERSIVE STIMULUS + POSITIVE CONTINGENCY --> RESPONDING DECREASES

 

 

c. NEGATIVE REINFORCEMENT = procedures in which the response turns off or prevents an AVERSIVE STIMULUS.  If the response occurs, the AVERSIVE STIMULUS either doesn't come on at all or gets turned off.  If the response doesn't occur, then the aversive stimulus gets turned on or stays on.  This is a NEGATIVE CONTINGENCY and the rate of responding increases.

NEGATIVE REINFORCEMENT = AVERSIVE STIMULUS + NEGATIVE CONTINGENCY --> RESPONDING INCREASES

 

2 types of NEGATIVE REINFORCEMENT -- one is called ESCAPE, in which the response turns off the AVERSIVE STIMULUS.  Example: an experimenter turns on a loud noise, and the rat presses the bar once and turns it off.  By performing the response, the animal has ESCAPED the aversive situation.

And the other is called AVOIDANCE, in which the response prevents an AVERSIVE STIMULUS.  Example: a rat is first classically conditioned by pairing a light with footshock.  Present the light, and the rat runs over, presses the bar once, and prevents the shock from ever happening.  By performing the response, the animal has AVOIDED the aversive stimulus.

 

d. OMISSION TRAINING = procedures in which the response turns off or prevents an APPETITIVE STIMULUS.  If the response occurs, then the APPETITIVE STIMULUS gets "omitted".  If the response doesn't occur, then the appetitive stimulus occurs.  This is a NEGATIVE CONTINGENCY and the rate of responding decreases.

OMISSION TRAINING = APPETITIVE STIMULUS + NEGATIVE CONTINGENCY --> RESPONDING DECREASES
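
The four procedures are just the four cells of a 2 x 2 table, which a small Python mapping makes explicit (the dictionary layout is my own summary of the definitions above):

    # (stimulus type, contingency) -> (procedure, effect on responding)
    PROCEDURES = {
        ("appetitive", "positive"): ("POSITIVE REINFORCEMENT", "responding increases"),
        ("aversive",   "positive"): ("PUNISHMENT",             "responding decreases"),
        ("aversive",   "negative"): ("NEGATIVE REINFORCEMENT", "responding increases"),
        ("appetitive", "negative"): ("OMISSION TRAINING",      "responding decreases"),
    }

    name, effect = PROCEDURES[("aversive", "negative")]
    print(name, "->", effect)   # NEGATIVE REINFORCEMENT -> responding increases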

 

 

6.  STIMULUS CONTROL = A phenomenon in which the likelihood of a response varies according to the stimuli present at the time.  A response is under stimulus control if its probability of occurrence differs in the presence of different stimuli.

 

7.  GENERALIZATION = Responding to one stimulus due to training involving some other similar stimulus.  See CLASSICAL CONDITIONING above.

 

8.  DISCRIMINATION = Differential responding to 2 stimuli.  During DISCRIMINATION TRAINING, 2 stimuli are presented: the reinforcer is presented in the presence of one stimulus (S+), but not in the presence of the other (S-).  See CLASSICAL CONDITIONING above.

 

9.  PEAK SHIFT = a shift in the generalization gradient away from S-.  If subjects are given a generalization test following training with a single stimulus, the peak of the generalization gradient will be located at the training stimulus.  However, if subjects are given discrimination training involving 2 stimuli, the greatest responding during the generalization test occurs not to S+ but to a stimulus further away from S-.

 

 

10.  EXTINCTION = reinforcement is discontinued.  When EXTINCTION is first introduced after a period of reinforcement, there is a BURST of responding -- then the rate of responding gradually decreases.  Wait until the next day and put the animal back into the Skinner box -- you get SPONTANEOUS RECOVERY, similar to classical conditioning.

 

11.  SPONTANEOUS RECOVERY = the return of an extinguished response after a period of time following the last extinction trial.

 

12.  PARTIAL REINFORCEMENT EFFECT (PRE) = The higher the proportion of responses that are not reinforced during training, the more persistent responding is during extinction.

 

13.  FRUSTRATION & EXTINCTION-INDUCED AGGRESSION = An increase in the vigor of the behavior that immediately follows nonreinforcement of a previously reinforced response; it’s the emotional response induced by the withdrawal of an expected reinforcer.  Under certain circumstances, frustration may be sufficiently severe to include aggressive reactions.

 

Azrin, Hutchinson & Hake (1966) placed 2 pigeons in a Skinner box -- one was restrained in the corner so he couldn't respond, and the other was trained to peck a key for reinforcement.  The key pecker basically ignored the restrained bird as long as he got his reinforcement.  When EXTINCTION was introduced, the key pecker attacked the restrained bird -- FRUSTRATION.

14.  NEGATIVE REINFORCEMENT & AVOIDANCE 

NEGATIVE REINFORCEMENT = AVERSIVE STIMULUS + NEGATIVE CONTINGENCY --> RESPONDING INCREASES

 

Studies on AVOIDANCE rely on both CLASSICAL CONDITIONING and INSTRUMENTAL CONDITIONING procedures.

 

DISCRIMINATED, or SIGNALLED AVOIDANCE involves discrete trials.

A trial begins with the presentation of a CS -- like a tone.

If the animal makes the desired response, like running from one side of a cage to another, then he has successfully AVOIDED shock = this is called an AVOIDANCE trial.

If he does not make the desired response, he gets a shock.  The shock stays on until he makes the desired response.  When he does, the shock is turned off = this is called an ESCAPE trial.

During the early part of training, most of the trials are ESCAPE trials.  Once the animal learns that the CS predicts the US, then most of the trials become AVOIDANCE trials.
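
In code terms, each discrete trial classifies itself by when the response arrives; a minimal Python sketch (the latencies and the 10-second CS-shock delay are hypothetical values of my own):

    def trial(response_latency, cs_shock_delay=10.0):
        """One discriminated-avoidance trial.
        response_latency: seconds from CS onset to the response.
        cs_shock_delay: seconds from CS onset until shock starts if no response."""
        if response_latency < cs_shock_delay:
            return "AVOIDANCE trial: response came before the shock, so it never occurs"
        return "ESCAPE trial: the shock came on and stayed on until the response"

    print(trial(15.0))   # early in training, most trials look like this
    print(trial(4.0))    # late in training, most trials look like this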

 

The most popular apparatus used in DISCRIMINATED AVOIDANCE is called a SHUTTLE BOX which is a cage separated into 2 halves by an arched door.  Each half has a separate wire grid floor through which we can pass an electrical current.  The animal is put in one side of the box and the CS is presented.  If the animal crosses over into the other side of the box, he avoids the shock.  After some sort of intertrial interval (say 1 minute), the CS will be turned on again and the rat will have to cross over into the opposite compartment again in order to avoid the shock.

So throughout the experiment, the rat will “shuttle” back and forth between the 2 sides of the box.

 

 

15.  SIDMAN AVOIDANCE = An avoidance procedure devised by Murray Sidman that does NOT involve a warning stimulus (it is also called free-operant or unsignalled avoidance).  An aversive event such as a shock is scheduled to occur at fixed time intervals (the shock-shock interval); if the subject makes the required response at any time during this interval, the next programmed shock is postponed for a fixed period (the response-shock interval).
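
A short simulation sketch of the two clocks (the tick loop, the parameter values, and the random responder are my own illustration, not Sidman's procedure verbatim):

    import random

    def sidman(session=600.0, ss=5.0, rs=20.0, respond_prob=0.2):
        """Free-operant avoidance in 1-second ticks.
        ss = shock-shock interval, rs = response-shock interval (seconds)."""
        next_shock = ss
        shocks = 0
        t = 0.0
        while t < session:
            if random.random() < respond_prob:   # the subject happens to respond
                next_shock = t + rs              # the next programmed shock is postponed
            if t >= next_shock:                  # no response came in time
                shocks += 1
                next_shock = t + ss              # the shock-shock clock restarts
            t += 1.0
        return shocks

    print(sidman(respond_prob=0.0))   # never responds: shocked every ss seconds
    print(sidman(respond_prob=0.5))   # responds often: most shocks are postponed away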

 

 

16. CHOICE BEHAVIOR EXPERIMENTS = very rarely does reinforcement operate on a single response in isolation.  Instead of simply choosing whether or not to make a response, we are often confronted with a choice between two or more responses, each with its own set of reinforcers.  Choice behavior experiments are those in which more than one response can be made.

 

Measure responding using a RELATIVE RATE OF RESPONDING measure for each choice.  Example: a pigeon is trained to peck either Key A or Key B.  The RELATIVE RATE OF RESPONDING for Key A equals the responses on A divided by the total responses (responses on A plus responses on B):

 

RELATIVE RATE OF RESPONDING for key A = RA/(RA+RB)

 

RELATIVE RATE OF RESPONDING for key B = RB/(RA+RB)

 

What happens if the pigeon pecks the same number of times on A and B..., say 10 times on each?  What's the RELATIVE RATE OF RESPONDING for key A?    0.5

What's the RELATIVE RATE OF RESPONDING for key B?    0.5

 

What happens if the pigeon pecks different numbers of times on A and B..., say 8 times on A but only 2 times on B?

What's the RELATIVE RATE OF RESPONDING for key A?    0.8

What's the RELATIVE RATE OF RESPONDING for key B?    0.2
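
The same arithmetic as a one-line Python function (the function name is mine), checked against both examples:

    def relative_rate(r_this, r_other):
        """Relative rate of responding for one alternative."""
        return r_this / (r_this + r_other)

    print(relative_rate(10, 10))   # Key A, 10 pecks each -> 0.5
    print(relative_rate(8, 2))     # Key A, 8 vs 2 -> 0.8
    print(relative_rate(2, 8))     # Key B, 2 vs 8 -> 0.2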

 

 

17.  HERRNSTEIN’S MATCHING LAW (Herrnstein, 1961) = when you have a choice among several activities, the percentage of time that you devote to one of these activities will match the percentage of the available reinforcers that you have gained from this activity.
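
In symbols, using the relative-rate measure from above: RA/(RA+RB) = rA/(rA+rB), where RA and RB are the responses made on each alternative and rA and rB are the reinforcers earned from each.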

 

Example 1 = CHOICE BEHAVIOR EXPERIMENT : both keys (A & B) are on the exact same VI60 schedule.  Results : they will peck equally often on each of the keys and they will get just as many reinforcements on key A as on key B -- so the RATE OF REINFORCEMENT will be equal.

Example 2 = CHOICE BEHAVIOR EXPERIMENT : Key A has a VI 6-min schedule -- in one hour, what's the absolute maximum number of reinforcements a pigeon can get?  10 (because there are 10 6-minute intervals in one hour).  Key B has a VI 2-min schedule -- in one hour, what's the absolute maximum number of reinforcements a pigeon can get?  30 (there are 30 2-minute intervals in one hour).  So that's 3 times the amount that's possible on key A.

 

Results: the pigeon will MATCH the number of responses to the likelihood of getting reinforced -- he will respond 3 times as often on Key B compared to Key A.
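
A quick check of that prediction in Python (a hypothetical session where the pigeon earns the maximum on each key; the variable names are mine):

    reinforcers_A = 10   # max per hour on the VI 6-min key
    reinforcers_B = 30   # max per hour on the VI 2-min key

    share_B = reinforcers_B / (reinforcers_A + reinforcers_B)
    print(share_B)                   # 0.75 -- three quarters of all pecks go to Key B
    print(share_B / (1 - share_B))   # 3.0 -- three times as many responses on B as on A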

 

 

19.  CONTRAST EFFECTS = A change in a reinforcer's effectiveness due to prior experience with other reinforcers (usually a reinforcer is "shifted" -- replaced with one having a different level of positive or negative valence).  The effects of a shift in reward were originally demonstrated by Crespi -- in fact, behavioral shifts following reward shifts were collectively called "the CRESPI EFFECT".  A more recent study was done by Mellgren (1972).

 

Mellgren ran four groups of rats in a runway.

During Phase 1 -- Group 1 and 2 got 2 pellets of food each time they ran down the runway.  Group 3 & 4 got 22 pellets of food.

During Phase 2 -- half of the rats were "shifted".  So Group 1 = remained the same and got 2 pellets (Small-Small); Group 2 = was shifted up and got 22 pellets (Small-Large); Group 3 = remained the same and got 22 pellets (Large-Large); and Group 4 = was shifted down and got 2 pellets (Large-Small)

 

Results:

The Small-Small group didn't change much

The Large-Large group didn't change much.

But, rats shifted from a Small to a Large reward ran faster for the large reward than the ones that had received the large reward all along.  This is called a POSITIVE BEHAVIORAL CONTRAST -- so a POSITIVE BEHAVIORAL CONTRAST is defined as increased responding for a favorable reward because of prior experience with a less attractive outcome.

And rats shifted from a Large to a Small reward ran slower for the small reward than the ones that had received the small reward all along.  And this is called a NEGATIVE BEHAVIORAL CONTRAST -- so a NEGATIVE BEHAVIORAL CONTRAST is defined as depressed responding for an unfavorable reward because of prior experience with a better outcome.

 

 

19.  PREMACK PRINCIPLE = A basic law that states that the opportunity to perform a response can be used to reinforce any other response whose probability of occurrence is lower.

 

 

20.  YERKES-DODSON LAW = There is an inverse relationship between task difficulty and optimum motivation: the more difficult the problem, the lower the optimum motivation.