Experiment Replication: Repeated Prisoner's Dilemma Game
Background
The prisoner's dilemma is a classic example of a non-zero-sum game, in which the best choice for each individual is not necessarily the best choice for the group. A classic version of the dilemma goes as follows:
The police arrested two suspects, A and B, but there was not enough evidence to charge them. Therefore, the police detained the suspects separately, met with them individually, and offered the same options to both:
If one confesses and testifies against the other (defects) while the other remains silent, the one who confessed will be released immediately and the silent one will be sentenced to 3 years in prison.
If both remain silent (cooperate), they will both be sentenced to 1 year in prison.
If both testify against each other (defect), they will both be sentenced to 2 years in prison.
Neither prisoner knows whether the other will cooperate or defect, but defecting is better either way: if the other stays silent, defecting means release instead of 1 year; if the other defects, defecting means 2 years instead of 3. Defecting is therefore the rational choice for each prisoner, so both defect and each serves two years.
However, this outcome is not Pareto optimal: had both chosen to cooperate, each would have served only one year, so both would have been better off.
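The dominance argument can be checked by enumerating the four outcomes. A minimal sketch, with the sentences from the story encoded as years in prison (fewer is better); the `YEARS` table and `best_response` function are illustrative names, not part of the original code.

```python
# Years in prison for (my_move, their_move); fewer is better.
# "C" = stay silent (cooperate), "D" = testify (defect).
YEARS = {
    ("C", "C"): 1,  # both silent: 1 year each
    ("C", "D"): 3,  # I stay silent, the other testifies
    ("D", "C"): 0,  # I testify, the other stays silent: released
    ("D", "D"): 2,  # both testify: 2 years each
}


def best_response(their_move):
    """Return the move that minimizes my sentence, given the other's move."""
    return min("CD", key=lambda mine: YEARS[(mine, their_move)])


# Defecting is the best response whatever the other prisoner does...
dominant = all(best_response(theirs) == "D" for theirs in "CD")
# ...yet mutual cooperation leaves both better off than mutual defection.
pareto_better = YEARS[("C", "C")] < YEARS[("D", "D")]
print(dominant, pareto_better)  # True True
```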
Repeated Prisoner's Dilemma Game
The outcome is completely different if the two prisoners are allowed to play the game repeatedly. A prisoner who has been betrayed can then retaliate against the opponent in the next round, which to some extent encourages the opponent to cooperate.
Robert Axelrod organized a repeated prisoner's dilemma tournament in which participants submitted programs, each implementing its own strategy, to compete in head-to-head matches; the winner was the strategy with the highest total score. The most impressive strategy turned out to be one of the simplest: "Tit for Tat," which cooperates in the first round and thereafter repeats whatever the opponent did in the previous round. This article attempts to replicate his results.
Simulation
First, define a set of strategies. I implemented every strategy I could think of, including always_cooperation, always_defect, random, tit_for_tat and its variants, retaliatory_strike, tester, and so on. Readers are welcome to add their own.
```python
def always_cooperation(we=None, them=None):
    ...
```
"Generous tit-for-tat" behaves like tit-for-tat, except that after the opponent defects it still cooperates in the next round with 10% probability. "Retaliatory strike" starts by cooperating but switches to permanent defection once it has been betrayed. "Tester" opens with a defection, cooperates if the opponent defects in return, but defects again once the opponent has cooperated for five consecutive rounds.
Create ten players for each strategy and have them play games of 200 rounds each. The players are, of course, not told how many rounds a game lasts; a player who knew which round was the last would have a reason to defect in it. The reward and punishment rules for the players are defined as follows:
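Only the first line of the original strategy code survives. A minimal sketch of the strategies described above, assuming each strategy receives both players' earlier moves as lists of booleans (True = cooperate, False = defect). The name `random_strategy`, its 50% cooperation rate, this reading of "tester", and the `play` helper are assumptions made for the sketch.

```python
import random

# Assumptions: a move is a bool (True = cooperate, False = defect), and
# `we` / `them` are lists of both players' earlier moves, oldest first.


def always_cooperation(we=None, them=None):
    return True


def always_defect(we=None, them=None):
    return False


def random_strategy(we=None, them=None):
    # Hypothetical name for the article's "random" strategy (avoids
    # shadowing the `random` module); cooperates half the time.
    return random.random() < 0.5


def tit_for_tat(we=None, them=None):
    # Cooperate first, then copy the opponent's previous move.
    return them[-1] if them else True


def generous_tit_for_tat(we=None, them=None):
    # Like tit_for_tat, but cooperate anyway with 10% probability after a defection.
    if not them or them[-1]:
        return True
    return random.random() < 0.1


def retaliatory_strike(we=None, them=None):
    # Cooperate until the opponent defects once, then defect forever.
    return not them or all(them)


def tester(we=None, them=None):
    # One reading of the description: open with a defection, cooperate after
    # the opponent defects, defect again after five consecutive cooperations.
    if not them:
        return False
    if not them[-1]:
        return True
    return not (len(them) >= 5 and all(them[-5:]))


def play(strategy_a, strategy_b, rounds=10):
    """Hypothetical helper: let two strategies play and return both move lists."""
    a_moves, b_moves = [], []
    for _ in range(rounds):
        # Both moves are chosen before either history is updated.
        a, b = strategy_a(a_moves, b_moves), strategy_b(b_moves, a_moves)
        a_moves.append(a)
        b_moves.append(b)
    return a_moves, b_moves


print(play(tit_for_tat, tester, rounds=4))
# ([True, False, True, True], [False, True, True, True])
```

In the usage example, tester opens with a defection, tit-for-tat punishes it once, and both then settle into cooperation until tester's five-round test comes due.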
```python
if st1 and st2:
    ...
```
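Only the first line of the original payoff code survives, so its point values are unknown. A minimal sketch, assuming True = cooperate and Axelrod's classic values: 3 points each for mutual cooperation, 1 each for mutual defection, and 5 for a defector against a cooperator, who gets 0. The `payoff` function is hypothetical.

```python
def payoff(st1, st2):
    """Return the points (player 1, player 2) earn in one round.

    Assumption: True = cooperate, False = defect; values are Axelrod's
    3/5/0/1, not necessarily the ones used in the article.
    """
    if st1 and st2:       # both cooperate: reward R
        return 3, 3
    if st1 and not st2:   # player 1 is exploited: sucker's payoff S vs temptation T
        return 0, 5
    if not st1 and st2:   # player 1 exploits player 2
        return 5, 0
    return 1, 1           # both defect: punishment P


T = payoff(False, True)[0]
R = payoff(True, True)[0]
P = payoff(False, False)[0]
S = payoff(True, False)[0]
# The defining conditions of a prisoner's dilemma.
assert T > R > P > S and 2 * R > T + S
print(payoff(True, True), payoff(True, False), payoff(False, False))
# (3, 3) (0, 5) (1, 1)
```

Any values with T > R > P > S and 2R > T + S describe a prisoner's dilemma; the second condition keeps two players from doing better by taking turns exploiting each other than by cooperating steadily.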
Each player starts with 100,000 points. After every game (200 rounds), a fixed cost is deducted from each player's score (480 points in this setup). A player whose score falls below zero is forced to switch to another strategy that is still in use by other players. This raises the selection pressure, so that only the most robust strategies survive.
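The deduction-and-switch step can be sketched as a single function. The starting score and the 480-point cost come from the text; how a bankrupt player picks its new strategy and what score it restarts with are not stated, so both are assumptions here (copy a randomly chosen solvent player that uses a different strategy, then reset to the starting score). `apply_selection` is a hypothetical name.

```python
import random

START_SCORE = 100_000  # every player's starting score (from the article)
COST_PER_GAME = 480    # deducted after every 200-round game (from the article)


def apply_selection(strategies, scores, rng=random):
    """Charge the per-game cost, then move bankrupt players to another strategy.

    strategies: one strategy name per player (mutated in place)
    scores:     one score per player (mutated in place)

    Assumptions, not stated in the article: a bankrupt player copies the
    strategy of a randomly chosen solvent player that uses a different
    strategy (so common strategies are adopted more often), and its score
    is reset to START_SCORE.
    """
    for i in range(len(scores)):
        scores[i] -= COST_PER_GAME
    solvent = [i for i, s in enumerate(scores) if s >= 0]
    for i, s in enumerate(scores):
        if s < 0:
            others = [j for j in solvent if strategies[j] != strategies[i]]
            if others:
                strategies[i] = strategies[rng.choice(others)]
            scores[i] = START_SCORE
    return strategies, scores


strategies = ["always_cooperation", "tit_for_tat", "always_defect"]
scores = [300, 50_000, 900]
apply_selection(strategies, scores, random.Random(0))
print(scores)  # [100000, 49520, 420]
```

Picking a random solvent player, rather than a random strategy name, makes widespread strategies more likely to be copied, which is one way to read "a strategy currently in use in the field".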
Simulate 1500 games and render them into a video. The full code is as follows:
```python
import random
...
```
The videos are as follows:
It quickly becomes clear that "always cooperate" dies out early, as do the other strategies that cooperate indiscriminately. "Always defect" and "retaliatory strike" dominate in the middle stage and drive every strategy's score down rapidly, but the three "tit-for-tat" variants eventually prevail. This suggests that truly good people and truly evil people do not exist in society; those who come out on top are good people who are able to defend themselves.
Additionally, "generous tit-for-tat" performed better than plain "tit-for-tat," which points to the importance of forgiving others' mistakes.
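One well-known mechanism by which forgiveness can pay is sketched here as an illustration, not as a reproduction of the article's run. When a single defection slips in, two tit-for-tat players lock into an endless echo of alternating retaliation, while generous tit-for-tat eventually breaks the cycle. The sketch uses the same assumed move encoding (True = cooperate); the `mutual_cooperation` helper and the injected mistake are hypothetical.

```python
import random


def tit_for_tat(we, them):
    return them[-1] if them else True


def generous_tit_for_tat(we, them):
    if not them or them[-1]:
        return True
    return random.random() < 0.1  # forgive a defection 10% of the time


def mutual_cooperation(strategy, rounds=200, error_round=5, seed=0):
    """Count rounds of mutual cooperation when player A errs once.

    Hypothetical experiment: both players use `strategy`, and in round
    `error_round` player A defects by mistake regardless of its strategy.
    """
    random.seed(seed)
    a_moves, b_moves, count = [], [], 0
    for t in range(rounds):
        a, b = strategy(a_moves, b_moves), strategy(b_moves, a_moves)
        if t == error_round:
            a = False  # the single mistaken defection
        a_moves.append(a)
        b_moves.append(b)
        count += int(a and b)
    return count


print(mutual_cooperation(tit_for_tat))  # 5: the pair never recovers
print(mutual_cooperation(generous_tit_for_tat))
```

With tit-for-tat, the single mistake turns into an alternating cooperate/defect echo for the rest of the game. With generous tit-for-tat, each round of the echo has a 10% chance of being forgiven, after which both players cooperate again.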
Adjusting the Initial Distribution
It seems likely that a single "tit-for-tat" player surrounded by "always defect" players would not last long. Does this mean that the final outcome depends on the initial distribution of strategies? In the program above, adding a strategy's name to the strategy_increase list raises its initial number of players from 10 to 50. For instance:
```python
strategy_increase = ['tit_for_tat', 'generous_tit_for_tat']
```
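How strategy_increase might shape the starting population can be sketched as follows. The helper `initial_population`, the strategy list, and the name `random_strategy` are assumptions; the article's third tit-for-tat variant is not named, so it is omitted.

```python
# Hypothetical strategy list; the article's own list is not fully preserved.
STRATEGY_NAMES = ["always_cooperation", "always_defect", "random_strategy",
                  "tit_for_tat", "generous_tit_for_tat",
                  "retaliatory_strike", "tester"]
strategy_increase = ['tit_for_tat', 'generous_tit_for_tat']


def initial_population(names, increase, base=10, boosted=50):
    """Hypothetical helper: one entry per player, `boosted` copies for listed strategies."""
    return [name for name in names
            for _ in range(boosted if name in increase else base)]


players = initial_population(STRATEGY_NAMES, strategy_increase)
print(len(players))  # 150
```

Five strategies at 10 players plus two at 50 gives 150 players, so the two tit-for-tat strategies start with two thirds of the population.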
Every simulation ended in one of two ways: either the "tit-for-tat" camp won or "retaliatory strike" prevailed. The "tit-for-tat" camp is relatively fragile: if the combined share of "always defect" and "retaliatory strike" exceeds a certain level, "tit-for-tat" struggles to gain the upper hand. Notably, even when "always cooperate" dominates at the start, its dominance does not last; ironically, it becomes a breeding ground for unfriendly strategies such as "always defect".
Conversely, if "tit-for-tat" dominates at the start, all the unfriendly strategies soon disappear, leaving room for a variety of friendly strategies.
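A back-of-the-envelope calculation suggests why such a threshold can exist. This is a sketch under assumed numbers, not the article's model: Axelrod's payoffs (3/5/0/1), 200-round games, the 480-point cost per game, random pairing, and a large population containing only tit-for-tat and always-defect players, a fraction p of whom always defect.

```python
from fractions import Fraction

# Assumed numbers: Axelrod's payoffs, 200-round games, 480-point cost per game,
# random pairing in a large population of tit-for-tat and always-defect players.
ROUNDS, COST = 200, 480
TFT_VS_TFT = 3 * ROUNDS              # 600: cooperate every round
TFT_VS_ALLD = 0 + 1 * (ROUNDS - 1)   # 199: exploited once, then mutual defection
ALLD_VS_TFT = 5 + 1 * (ROUNDS - 1)   # 204: exploits once, then mutual defection
ALLD_VS_ALLD = 1 * ROUNDS            # 200: mutual defection every round


def net_per_game(p):
    """Expected net points per game (after the cost) when a fraction p always defects."""
    tft = (1 - p) * TFT_VS_TFT + p * TFT_VS_ALLD - COST
    alld = (1 - p) * ALLD_VS_TFT + p * ALLD_VS_ALLD - COST
    return tft, alld


# Tit-for-tat stops gaining points once (1 - p) * 600 + p * 199 < 480.
threshold = Fraction(TFT_VS_TFT - COST, TFT_VS_TFT - TFT_VS_ALLD)
print(threshold, round(float(threshold), 3))  # 120/401 0.299
```

Under these assumptions tit-for-tat gains points only while fewer than about 30% of players always defect. Always defect never covers the 480-point cost against these two opponents (it earns at most 204 per game), so in this toy model it can only thrive by exploiting unconditional cooperators, which fits the observation about "always cooperate" above.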
This highlights that love and peace cannot save the world, but law and justice can.