Friday, August 6, 2010

The Prisoners' Dilemma: An Evolutionary Approach

Group Strategies and the Prisoners’ Dilemma
Or
Making it Real: The PD when Death is on the Line



Abstract



This is a rewrite of my masters dissertation from the University of Connecticut, 2001.

The Prisoners’ Dilemma has been modeled on many computer simulations in an attempt to determine under what conditions cooperation might arise. The simulation employed here adds a dimension to the Prisoners’ Dilemma by allowing whole populations to compete against each other under more realistic conditions. The rewards and penalties are not merely points accumulated, but are translated into real benefits whereby members of a population can have offspring or die prematurely from starvation depending on how they perform against competing strategies. Each player in a simulation starts out with a base amount of ‘life points.’(1) From there the simulation is played a number of times (iterations) and the players allowed to interact with each other. With each iteration, two players are selected at random from the total population and allowed to engage in one Prisoners’ Dilemma game. The results are added to or subtracted from each player’s life points. After accumulating a certain number of life points, a player is allowed to have an offspring which inherits his parent’s strategy. Lose too many points and the player dies.

Further, the possibility of group strategies emerges as populations can behave differently toward outsiders than they do toward members of their own group. Successful behavior not only benefits the individual, but insures greater representation of one's group in the next generation. One strategy in particular, which cooperates with members of its own group but defects against all outsiders, has proven to be remarkably successful. I have named this strategy ‘Cunning Neighbor’ (CN.) Two other group strategies, Wise Neighbor and Good Neighbor, are also successful, though not as much as CN.


Introduction


Strategies like Tit for Tat and PAVLOV do not take into account membership in a group. In an environment governed by natural selection, group strategies could be a powerful force incurring more fitness for an individual group member than for an individual acting solely in his own interest, thereby sustaining the requirement that natural selection be about the individual inheriting the genes while allowing for the evolution of group strategies. By using the simulation developed to incorporate more realism into the Prisoners’ Dilemma game (Loux, 2001) the possibility of some unique group strategies has emerged. There can be strategies that treat outsiders differently than members of ones own group or which learn about the outside world and then communicate that learning to members of the group. Learning and cooperating strategies should do well in a simulation where knowledge learned can be shared and not just used for one’s own future reference. Three group strategies that this simulation employs are Cunning Neighbor (CN) (2), Wise Neighbor (WN) and Good Neighbor (GN).

In Cunning Neighbor, a member will always defect against members of an outside group but always cooperate with members of his own group. This is a hybrid of All-C and All-D. Within group, the code of conduct is to always cooperate. When interacting with outsiders, however, the code is to always defect. CN proved to be surprisingly effective at invading and dominating a population(3). In one simulation three populations compete. These employ the strategies Tit for Tat (TFT), All-D and All-C. As is expected, after one hundred iterations All-D dominates the population (See figure 1.)

At this point a group of Cunning Neighbors migrates into the game. Five CN enter a population of eight TFT, 21 All-D and seven All-C. After 100 more iterations, All-D still dominates with 34 members (See figure 2.) TFT is five and All-C is five. The original five CN have grown to ten. At one thousand iterations All-C and TFT have gone extinct (at iteration 471 and 565, respectively. See figure 3.) CN stands at 77 members and All-D at 82. By iteration 1,009 CN dominates the population by 81 to 80 All-D. By iteration 2,000 CN has grown to about two thirds of the population. By iteration 10,000 CN has overwhelmed the environment with 2,138 individuals to 233 All-D going from an initial population density of twelve percent to ninety percent (See figure 4.)

Figure 1. After 100 iterations between Always Cooperate, Always Defect and Tit For Tat. All-D is clearly dominating the population.


Figure 2. Cunning Neighbor enters the population at iteration 101. After 200 iterations Cunning Neighbor has taken the number two place over TFT and All-C.


Figure 3. After 1,000 iterations CN is beginning to overtake All-D.



Figure 4. After 10,000 iterations CN has established an overwhelming dominance of the population.

In a straight contest between TFT and CN, CN proved to be very effective (See figure 5.) The balance quickly tipped in CN’s favor and TFT was driven to a minority by iteration 200. CN continued to overwhelm the population but was unable to completely eliminate TFT(4). After 1,000 iterations, CN represented 338 members to TFT’s three.


Figure 5. This table only shows the first 100 iterations. Beyond that TFT is no longer visible on the chart.

Good Neighbor (GN) is a strategy that always cooperates with other neighbors, just like CN. When it comes to outsiders, a GN will try to be nice at first. But unlike TFT that must learn on a case by case basis who is friendly and who is nasty, a GN can make a judgment about the outside population as a whole. He does this by keeping track of how many times he receives the prizes, R, S, T and P. Lots of R's and T's indicate a friendly environment. Lot's of S's and P's indicate a hostile environment. (See Appendix A. Good Neighbor for a discussion on how GN functions.)

In a friendly environment, a Good Neighbor will cooperate. In a hostile environment, he will defect. This is true even against players that a GN has not encountered before. This makes GN less naïve than TFT, which must learn this lesson anew with each and every new player encountered and then remember the outcome for future reference. This is obviously memory intensive. Creating a ‘prejudice’ based on past experiences with a group as a whole requires less information processing then having to keep track of each and every interaction with every individual.

Good neighbor operates similar to Cunning Neighbor except that it tries to be cooperative with outsiders while maintaining a demeanor of ‘niceness’. A Good Neighbor will cooperate first, but if he senses a hostile world, he will switch to defection. If while defecting he senses a switch to cooperation on the part of an outside group, he will switch back to cooperating. This makes GN a better learner than Pavlov which goes solely by points earned. In a hostile environment Pavlov jumps back and forth from the frying pan into the fire by switching from C to D. GN will detect that the outside world is hostile and cut his losses by sticking with defection. He wants to be a nice guy, but won’t be taken to the cleaners, either.

As could be expected, Good Neighbor will do less well than Cunning Neighbor in a nasty environment. In a contest between GN, All-D and TFT, TFT does the worst. All-D does better than GN, but GN is able to function (See figure 6.) Notice the stepping stone characteristic of the graph. This is due to the fact that each Good Neighbor must learn for himself what the outside environment is like. Since the outside world consists of defectors and Tit for Tatters, it is an ambiguous environment and requires a number of interactions to form an opinion. Add to this the fact that newborn members must start from scratch and create their own impression of the outside world.

Figure 6. Good Neighbor, Always Defect and Tit For Tat compete.

In a contest between GN and All-D, Good Neighbor will not do as well as All-D, but will be able to survive (See Figure 7.) Each Good Neighbor begins by being friendly, but immediately switches to always defecting.

Figure 7. Good Neighbor and Always Defect.

Wise Neighbor (WN) is like GN, except that he has the ability to pass on experiences to the rest of its group (See Appendix B. Wise Neighbor for a discussion on how the simulation calculates Wise Neighbor’s responses.) Experiences with the outside world are recorded for the whole group and not just for each individual. In a population of five Wise Neighbors, the first four can learn about the outside world. When WN number five has his first encounter with an outsider he will cooperate or defect based on how well the other four have fared. This overcomes the problem of isolation and youthful naiveté of GN. With GN, each neighbor must learn for himself what the outside environment is like. New entities born into the WN community can learn from the past experiences of the group. Knowledge can be encoded in the wisdom of the group, in other words.

Figure 8. Wise Neighbor and Always Defect

Wise Neighbor does very well against Always Defect (See figure 8.) It very quickly detects the hostile environment and defends against it. It is also able to transmit this information on to other members of the group, including newborn members, thereby preparing members of the next generation for their first encounters with the outside world.

Not surprisingly, Cunning Neighbor does very well against individual strategies like Tit for Tat. Even when a game is played for one hundred iterations and has grown in size from the original players, CN is able to infiltrate it and quickly dominate. By 10,000 iterations it represents 90 percent of the population, up from 12 percent when it first entered the population.

Good Neighbor and Wise Neighbor do poorer. They are able to detect a hostile environment and defend themselves against it. They are also able to live peacefully in a nice environment. The problem with these strategies is in an ambiguous environment. In a three way game with Cunning Neighbor and All-C, it takes a while for GN or WN to decide which strategy to pursue: Cooperate or Defect. Since these strategies shun the Temptation, they bounce back and forth between Cooperating and Defecting until the outside environment becomes mostly hostile at which point they fall into a pattern of All-D, even against the remaining nice outsiders.

Wise Neighbor is better off than Good Neighbor. Once one WN determines that the outside environment is hostile, every other WN will defect. With GN each neighbor will cooperate first and then decide what the outside environment is like.



Figure 9. Cunning Neighbor, Always Cooperate and Wise Neighbor

Figure 10. Cunning Neighbor, Always Cooperate and Good Neighbor.

Some Implications

Interestingly, Cunning Neighbor proved to be a highly effective strategy as well as being very information efficient. All it requires for a CN to decide on a strategy is one piece of information: Friend or Foe? Tit For Tat, on the other hand, requires a lot of information processing. Since the simulation operates in a population that can easily grow into the hundreds of thousands, each player must have a photographic memory of every other player and how that player responded in the past.

The simulation program creates a record for each game played. It records the ID's of the two players, the iteration of the game where this encounter takes place and the decisions of the two players. This is a very small record stored in a database. The next time a player like Tit For Tat or Tit for Two Tats plays, the simulation must search that database looking for records of an encounter between these two players in the past. If a simulation has run for ten or twenty thousand times, it must search ten or twenty thousand records. As simulations get bigger, the amount of time to execute grows longer with the increased risk of mistaken identify causing an inappropriate response. With an empty database, the best time for 10,000 iterations was about one minute. With a larger database containing a much bigger history, the maximum run time was over an hour and ten minutes(5).

From the standpoint of information processing this is significant. It also suggests that a strategy that requires personal recognition is costly in information processing terms. Something like Cunning Neighbor is much easier to encode. All that is needed is a recognition system based on the most general of criteria. A secret handshake, tell tale sign or Shibboleth is much easier to process than having to remember every past encounter with every potential partner.

For the other group strategies, a slightly more sophisticated technique is needed, but not nearly as data processing intensive as Tit For Tat. What is needed is the same recognition mechanism, plus a general memory of the overall responses of the opponents as a whole. Instead of individual opponents, opponents are lumped together into a collective 'them' and an overall impression formed.

Cunning Neighbor is deterministic: Us against Them. The other strategies develop prejudices and can become biased. They react to new outsiders based entirely on how they have viewed them in the past. If members of a certain outside group tend to be friendly, future encounters will be treated favorably. If not, a ‘once burned, twice learned’ approach will be used instead.

Has this proven selection of group strategies? Possibly, but only in that loyalty to ones group can ultimately increase the fitness of the individual. Loyalty to ones group makes it possible for the individual to exploit that group. Since ‘fitness’ is a measure of ‘exploitation,’ this is a well understood and accepted principal of evolution(6). By identifying with ones group and working toward the good of that group one can exploit the resources available only in association with that group. The memory of past interactions can play a powerful role in shaping future decisions. Not just regarding interactions with ‘outsiders’ but those within ones own group as well. You do not cheat your neighbor just because he is your neighbor, but because he will remember and that will cloud his interactions with you in the future. None the less in different circumstances you may try to get more than you deserve in the hopes that you can get away with it. This balancing act between cooperation and self indulgence, between being nice and trying to get away with something, forms the basis of civilization’s teeter tooter existence.

Furthermore, a group loyal strategy is still only one strategy in the toolkit of individual survival. Sometimes an individual might be loyal to their group and sometimes he may act with total selfishness. It all depends on what’s appropriate and what that individual thinks will work at that instant. Each is a possible tool of interaction which can be employed equally as is necessary. Civilizations might be noble or they might be savage. Or they might just be the collection of both noble and savage behaviors that have come together and seem to work at this moment in history.

The balancing act between altruistic and selfish impulses swings back and forth. Too selfish and a social group falls apart due to a lack of trust. Too trusting and it becomes vulnerable to outside invasion. This balance of selfish and altruistic must self regulate between set extremes to persist.
After all, every culture has its myths of man’s higher and lower natures struggling with each other, good vs. evil, angels and demons, gods pushing stones up impossible mountains but never giving up, neither side relenting or overcoming the other, nor ever ceasing the struggle. Humanity is forever in the grips of his internal contradictions. For in both directions lies destruction.

In this paper I have tried to layout some conditions under which members of a group, a tribe or a village may form group identity and a loosely formed, though delicate, cooperation as well as a framework of interaction with outside groups. This may help illuminate, if not explain, the checkered past of human interactions within our species.

The tools of genetic behavior provided by evolution are varied and contradictory. What makes sense in one interaction might not in another. What may be considered savage in one context is perfectly acceptable in another, even if justifying that savagery requires belief systems beyond belief.

Yet not all encounters between divergent cultures end in war, though war and conquest is one strategy. Sometimes trade, cooperation and alliances based on mutual enlightened self interest and the promise to cooperate can be as powerful, though not as memorable, as conflict. It all depends on what works. There is a reason why mediators between nations, groups, families and individuals labor to help us ‘get along.’ Contention and cooperation coexist in the same heart.

Appendices


Appendix A. Good Neighbor

Good neighbor can recognize members of its neighborhood and automatically cooperates with others of its own kind. Members are entered into the simulation in groups which are flagged as members of the same group. When a Good Neighbor interacts with outsiders, he cooperates first and then subsequently relies upon his memory of past interactions. GN’s memory consists of four counters that record the frequency that it has received each of the four possible payoffs. These counters are indicated as f(R), f(S), f(T) and f(P). The GN will cooperate in this case.

For example, if on its first interaction with a member of another population the opponent defects (GN always cooperates on its first move) GN will receive the sucker payoff. At the completion of this interaction, the outcome is tallied as usual with the points added to the entities vitality, etc. In addition to this, an incremental count is made of what prize was received. So if the GN cooperated and the opponent defected, a 1 is added to the counter for sucker prizes received. At the end of this iteration, the counters for f(R), f(S), f(T) and f(P) are 0, 1, 0 and 0, respectively. This will give this entity an overall negative view of the outside world. The actual algorithm for when to cooperate is: if f(R)+f(T) >= f(S)+f(P), then cooperate. When all counters are zero, f(R)+f(T) will be equal to f(S)+f(P), so a GN will cooperate on the first round automatically.

The next time this entity plays a game with someone from outside of its group, it will defect. If on this occasion the outsider cooperates, then the GN will receive the temptation. A 1 is added to the counter for temptation prizes received. At the end of this iteration, the counters for f(R), f(S), f(T) and f(P) will be 0, 1, 1 and 0, respectively. Since the algorithm that GN uses in evaluating the outside world is, if f(R)+f(T) is greater than or equal to f(S)+f(P), then cooperate, this entity will now have a positive impression of the outside world. The next time it encounters an outsider, it will cooperate.
Good Neighbor attempts to be nice in that it shuns the temptation. It considers cooperation to be preferable to defection, but will stick with a strategy of defection if it encounters a hostile world.

Appendix B. Wise Neighbor

In the case of wise neighbor, counters are kept for how well the group fares against outsiders. Information is shared, so it is not recorded at the individual level, but at the group level. There are four counters for all of the members of one group of WN’s. At the beginning of the simulation the counters are all set to zero. When the first member of the group encounters an outsider, it cooperates if f(R)+f(T) is greater than or equal to f(S)+f(P). Then the outcome is counted. Suppose that the outsider also cooperated. A 1 is added to the counter for f(R), the reward for cooperating. The next time a member of this group encounters an outsider, it can look at the experience of the group as a whole. So if a different member of the group encounters an outsider, it does not have to rely solely on its own experience. The group has already had one positive experience with outsiders, so this entity will cooperate in the new situation: f(R)+f(T) = 1, f(S)+f(P) = 0.

Suppose this time the outsider defects. This entity receives the sucker prize. A 1 is added to the counter for f(S). Since this is a group tally, this information is available to the next member of this group who encounters an outsider. This time, f(R)+f(T)= 1 and f(S)+f(P)=1. Since f(R)+f(T) is greater than or equal to f(S)+f(P), the next encounter will result in a member of this group cooperating. The overall impression of the outside world is positive or ambiguous at best. WN will try to cooperate whenever possible, just like GN. WN’s extra benefit is that it can share information among members of the group.


References


Axelrod, Robert and Douglas Dion
1988 The Further Evolution of Cooperation. Science, Vol 242, p. 9 December
1988, 1385-1389.

Axlerod, Robert
1984 The Evolution of Cooperation. Basic Books, A subsidiary of Perseus
Books, L.L.C.

Barkow, Jerome H, Leda Cosmides & John Tooby
1992 The Adapted Mind. Oxford University Press, New York.

Bendor, Jonathan, Roderick M. Kramer and Suzanne Stout
1991 When in Doubt… Cooperation in a Noisy Prisoner’s Dilemma. Journal of
Conflict Resolution, Vol 35 No. 4, December 1991, p. 691-719.

Bendor, Jonathan
1993 Uncertainty and the Evolution of Cooperation. Journal of Conflict
Resolution, Vol 37, no. 4, December 1993, p 709-734.

Cressman, R.
1996 Evolutionary Stability in the Finitely Repeated Prisoner’s Dilemma Game.
Journal of Economic Theory, Vol. 68, p. 234-248.

Danielson, Peter.
1995 Prisoner's Dilemma Popularized: Game Theory and Ethical Progress.
Dialogue (Waterloo, Ont.), v. 34 (Spring '95) p. 295-304.

Davis, Wayne
1998 Prisoner’s Dilemma.
http://netrunners.mur.csu.edu.au/~osprey/prisoner.html

Dixit, Avinash K. and Barry J. Nalebuff
1991 Thinking Strategically. The Competitive Edge in Business, Politics,
and Everyday Life. W.W.Norton and Co., New York.

Frank, Robert
1988 Passions Within Reason: The Strategic Role of the Emotions.
Norton, New York.

Houston, Alasdair I.
1993 Mobility Limits Cooperation. TREE vol. 8, no. 6, June 1993.

Linster, Bruce G.
1992 Evolutionary Stability in the Infinitely Repeated Prisoners' Dilemma
played by two-state Moore machines. Southern Economic Journal,
v. 58 (Apr.'92) p. 880-903.

Lloyd, Alun L.
1995 Computing Bouts of the Prisoner’s Dilemma. Scientific American, June
1995, p.110-115.

Lomborg, Bjorn
1996 Nucleus and Shield: The Evolution of Social Structure in the Iterated
Prisoner’s Dilemma. American Sociological Review, 1996, Vol. 61
(April:278-307).

Loux, Jonathan
2001 The Prisoner’s Dilemma Where Death is on the Line

Marinoff, Louis
1990 The Inapplicability of Evolutionarily Stable Strategy to the Prisoner’s
Dilemma. The British Journal for the Philosophy of Science, v. 41
(Dec. '90) p. 461-72.

Macy, Michael W. and John Skvoretz
1998 The Evolution of Trust and Cooperation Between Strangers: A
Computational Model. American Sociological Review, 1998, Vol. 63
(October:638-660).

Moore, F. C. T. (Francis Charles Timothy)
1994 Taking the Sting out of the Prisoner's Dilemma. The Philosophical
Quarterly, v. 44 (Apr. '94) p. 223-33.

Nowak, Martin and Karl Sigmund
1993 Chaos and the Evolution of Cooperation. Evolution, Vol 90. pp.
5091-5094, June 1993.

To, Theodore
1988 More Realism in the Prisoner’s Dilemma. Journal of Conflict Resolution,
Vol 32 No. 2, June 1988, p. 402-408.

Wu, Jianzhong and Robert Axelrod
1995 How to Cope with Noise in the Iterated Prisoner’s Dilemma. Journal
of Conflict Resolution, Vol. 39, No. 1, March 1995, p. 183-189.


End Notes


1. For the purposes of this simulation, the reward structure has been changed from the traditional 3,0,5,1 to 2,-1,3,0. This allows the results to have a positive, neutral or negative effect on the players.

2. This is similar to Macy and Skvoretz (1998) neighbors except that there is no embeddedness. Each encounter is totally random.

3. Game simulations referenced herein use the following characteristics: Entities are entered in groups of five. The mortality and noise rates are set to two percent. Game executions are anywhere from one hundred to ten thousand iterations. Final results are displayed as number of live entities remaining at the end of the simulation.

4. This is an artifact of the simulation. Players are picked randomly from the population of living entities. As this pool of available players increases and one group’s representation in the pool shrinks, members from that group get selected less and less often. One attempt was made to remedy this by having the first player chosen in a round robin manner from the pool of living entities and then its partner chosen randomly from the rest. At the end of each run the entity number of the last player played was stored on the database so that the next time the game was run it would start up with the next player and continue through the pool. However, this created a different dilemma. Those groups of players which were entered into the game first were all chosen one after the other to play the game, giving that group an advantage over the rest of the population. The simulation gave a slight bias according to entry. The selecting algorithm was switched back to purely random choices for both players.

5. This is not intended to be a precise comparison. Time of day and system overload is not taken into account. This simulation runs on a multi user IBM mainframe along with all other major systems at the University of Connecticut. The processors involved are VM-ESA and OS/390 systems running DB2 for a database engine.

6. Fitness refers to how well an organism ‘fits’ its environment, not how ‘physically fit’ it is. This is one of the most misunderstood concepts in evolution. Fitness really measures the ability of an organism to exploit an environment by whatever means; good or bad, through strength or guile, moral or immoral. These, after all, are all human values, invented by human beings to compel others to act in a certain, exploitable, way. The fact that we humans talk in terms of these arbitrary values is, itself, an adaptation which allows us to fit our environment. Our environment being large groups of humans acing together. To evolution, anything can be an environment and any activity which allows something to extract resources from that environment can be considered 'fit.' The only criteria is that it works, not what we may think about it.