I posted puzzles about Rock-Paper-Scissors on Thursday. One person got both answers, in the comments on Dreamwidth, and you can check that out if you want the short version. I'll throw in extra thoughts here, including an idea of how even the "right" answer might become problematic.

Problem 1: Good old Rock. The payout from the loser to the winner is as follows: $9 if the winner throws Rock; $3 if the winner throws Scissors; $1 if the winner throws Paper.
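
(Throughout, I'll illustrate with little Python sketches; the names and encodings in them are mine, not part of the puzzle.) Here's that payoff structure as a lookup table, from one player's point of view:

    # Net dollars to the "row" player for each (their throw, opponent's throw).
    # The prize depends on the WINNING throw: Rock $9, Scissors $3, Paper $1.
    PAYOFF = {
        ("R", "R"):  0, ("R", "P"): -1, ("R", "S"): +9,
        ("P", "R"): +1, ("P", "P"):  0, ("P", "S"): -3,
        ("S", "R"): -9, ("S", "P"): +3, ("S", "S"):  0,
    }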

My solution method was totally cheating: Mathematical intuition suggested that the answer involved throwing Rock, Paper, and Scissors in the same proportions as the payouts (9:3:1). Which proportion should be assigned to which option isn't entirely obvious, though. My first thought was to play Rock 9/13 of the time, Scissors 3/13, and Paper 1/13, which isn't right; but I (unaccountably) wrote it down as 9/13 Paper, 3/13 Rock, and 1/13 Scissors, which did work.

I'll get to a mathematically valid way of arriving at the answer shortly. For now, let's see how to test an answer.

The opponent knows my strategy, so they know the expected value (average result) of each play they could make. If they play Rock, they have a 1/13 chance of winning $9 against Scissors, and a 9/13 chance of losing $1 to Paper. That gives a total expected value (EV) of $0 for Rock. If they play Paper, they have a 3/13 chance of winning $1 against Rock, and a 1/13 chance of losing $3 to Scissors. So playing Paper has an EV of $0. If they play Scissors, they have a 9/13 chance of winning $3 against Paper, and a 3/13 chance of losing $9 to Rock. Playing Scissors has an EV of $0.
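
If you'd rather let a computer do that checking, here's a sketch (reusing the PAYOFF table above) that computes all three EVs:

    # EV of each opponent throw against Rock 3/13, Paper 9/13, Scissors 1/13.
    mix = {"R": 3/13, "P": 9/13, "S": 1/13}
    for theirs in "RPS":
        ev = sum(prob * PAYOFF[(theirs, mine)] for mine, prob in mix.items())
        print(theirs, round(ev, 12))
    # Prints 0.0 for all three throws.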

There are two ways to know that that's an ideal solution:

1. As per the hint I gave with the puzzle, all of your opponent's moves have the same expected value. They have no reason to choose one over another. If you changed the odds in your strategy at all, at least one of their moves would get better, and at least one would get worse. For example, if you got greedy and made Rock a little more likely, then Paper would become your opponent's best move, and they could start throwing Paper every time, doing better than they could have if you had left the solution alone. (There's a numeric version of this example just after this list.) So the existing solution is ideal.

2. Your opponent has the advantage in this puzzle: You're never going to beat them, because at worst they could just duplicate your strategy and tie. Since this solution causes a tie, and you can't do better than a tie, this is an ideal solution.
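
Here's that "greedy" example from point 1 made concrete, with a perturbation of my own choosing (shifting a little weight from Scissors to Rock):

    # Get greedy: bump Rock up at Scissors' expense.
    greedy = {"R": 7/26, "P": 18/26, "S": 1/26}
    for theirs in "RPS":
        ev = sum(prob * PAYOFF[(theirs, mine)] for mine, prob in greedy.items())
        print(theirs, round(ev, 4))
    # R -0.3462, P 0.1538, S -0.3462: Paper is now strictly their best move,
    # worth about $0.15 per round to them, at your expense.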

All right, now let's derive the solution in a way that wouldn't get a failing mark if you turned it in as homework.

We have an option now: We could be optimistic, and assume that there exists some solution that causes a tie. (If that's incorrect, it won't give us a wrong answer: It'll give us no answer, or maybe something stupid like an imaginary answer.) So we set the EV of each of their plays (Rock, Paper, and Scissors) to 0. Let's call the probability of our throwing Rock, Paper, and Scissors R, P, and S respectively. Now let's mathematize some statements.

A. If they play Rock, they win against Scissors as much as they lose to Paper: 9S = 1P
B. If they play Paper, they win against Rock as much as they lose to Scissors: 1R = 3S
C. If they play Scissors, they win against Paper as much as they lose to Rock: 3P = 9R
D. You may play Rock, Paper, or Scissors, but nothing else: R + P + S = 1

Note that statements A, B and C are slightly redundant: Any two of them will imply the third one. So let's ignore statement C, and start solving. Rephrase statement D, but substitute "3S" for "R" (statement B) and "9S" for "P" (statement A).
3S + 9S + S = 1
13S = 1
S = 1/13
Now, by statement A, P = 9/13, and by statement B, R = 3/13. So we have our answer, and as we saw above, it checks out.
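
If you'd rather make a computer do the algebra, a sympy sketch (assuming you have sympy installed) reproduces it:

    from sympy import symbols, solve, Eq

    R, P, S = symbols("R P S", nonnegative=True)
    print(solve(
        [Eq(9*S, P),         # A: their Rock breaks even
         Eq(R, 3*S),         # B: their Paper breaks even
         Eq(R + P + S, 1)],  # D: the probabilities sum to 1
        [R, P, S],
    ))  # R = 3/13, P = 9/13, S = 1/13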

But what if you're not so optimistic? What if you aren't confident that the solution yields a tie? It turns out that there's still barely enough information to derive it, knowing only that you will allow your opponent no single "best play".

A. Their EVs from playing Rock and Paper are the same: 9S - 1P = 1R - 3S
B. Their EVs from playing Rock and Scissors are the same: 9S - 1P = 3P - 9R
C. Their EVs from playing Paper and Scissors are the same: Never mind; it's implied by statements A and B.
D. You may play Rock, Paper, or Scissors, but nothing else: R + P + S = 1

Now you have three independent equations in three variables, which you can solve by substitution, matrices, or whatever mathematical voodoo you feel like using. I did it just for kicks, but there's not much point in writing it all out here.
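
For the curious, the matrix version is a few lines of numpy:

    import numpy as np

    # A: 9S - P = R - 3S   ->  -R -  P + 12S = 0
    # B: 9S - P = 3P - 9R  ->  9R - 4P +  9S = 0
    # D:                        R +  P +   S = 1
    M = np.array([[-1, -1, 12],
                  [ 9, -4,  9],
                  [ 1,  1,  1]], dtype=float)
    R, P, S = np.linalg.solve(M, [0, 0, 1])
    print(R, P, S)  # ~0.2308, ~0.6923, ~0.0769: again 3/13, 9/13, 1/13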

Problem 2: On the House. The payouts are the same as in problem 1, but instead of coming from the loser, they come from the bank. Does your strategy change, and if so, how?

My initial thought, which was shared by others, but not quite correct, ran thusly: Playing a single round of this, you could probably do okay by throwing Paper with high probability, and Rock with low (but not very low) probability; they'll pick Scissors to beat your likely Paper, but if your random number generator gives you Rock, then you beat their Scissors. On average, you might do okay. But what if the game is played over and over?

There was, some time ago, a competition held between various programs for handling an iterated Prisoner's Dilemma. (Short version: Each turn, you can cooperate with or betray the other player. Cooperating hurts you, but betraying hurts them way more.) The winning program didn't always cooperate or betray; it used a "tit-for-tat" algorithm that started by cooperating, and then just did the same thing the opposing program had done on the previous turn. So it would cooperate with another cooperative program, but not with a program that didn't play nicely.

The best "cooperation" one could hope for in this game is to take turns playing Rock and Scissors, so that the two players can split the $9/turn winnings. So you'd want to design a strategy that tries to cooperate that way, until it gets screwed by the opponent, at which point it reverts to the optimal "single round" strategy (which would presumably be worse for both players). The opponent, knowing this strategy, would continue to cooperate out of self-interest. Clearly, this would be the optimal strategy, since you're getting half of the theoretical maximum payout, and you're never going to do better than your opponent (given their advantage of knowing your strategy).
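
As a sketch (entirely my own construction, not a vetted strategy), that "cooperate until screwed" idea might look like:

    def make_cooperator(fallback):
        # Hypothetical grim-trigger cooperator for the bank-payout game:
        # take turns with the opponent at throwing Rock (the $9 win), and
        # revert permanently to `fallback` -- some single-round mixed
        # strategy supplied by the caller -- if they ever break the pattern.
        state = {"betrayed": False}

        def play(turn, their_last_throw):
            if not state["betrayed"] and turn > 0:
                # On my Rock turns they should have thrown Scissors, and
                # vice versa; anything else counts as a betrayal.
                expected = "S" if (turn - 1) % 2 == 0 else "R"
                if their_last_throw != expected:
                    state["betrayed"] = True
            if state["betrayed"]:
                return fallback()
            return "R" if turn % 2 == 0 else "S"

        return play

An opponent who knows this is your strategy alternates the complementary Scissors/Rock pattern and collects $4.50 a turn, as long as that beats whatever the fallback would leave them.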

Convincing, huh? But then, start tweaking that best "single round" strategy, and things get strange. If you start off with a big chance of Paper and a small chance of Rock, your opponent loves to play Scissors! The large chance of winning $3 is way better than the small chance to win $1 by playing Paper. (What do they care whether you sometimes win $9? It's not coming out of their pocket now.) That means you have quite a bit of leeway to increase your Rock probability and still keep them playing Scissors. In fact, since a Scissors win is three times as good as a Paper win for them, you could make your Rock play just about three times as likely as your Paper play. That's huge for you!

So play Rock 3/4 of the time. (Actually, just slightly under 3/4, so your opponent doesn't decide that Paper is just as good as Scissors. You could call it 0.74999999, or 3/4 - epsilon, or whatever. We'll assume you've done it, and just call it 3/4 for simplicity.)
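
Here's a quick scan (again, my own sketch) over mixes of the form "Rock with probability r, Paper otherwise," showing the opponent's best response and what you then make:

    # Bank pays the winner, so each side's EV counts only their own wins.
    for r in (0.5, 0.6, 0.7, 0.74, 0.7499, 0.76):
        p = 1 - r
        their_ev = {"S": 3 * p,  # Scissors beats your Paper for $3
                    "P": 1 * r,  # Paper beats your Rock for $1
                    "R": 0.0}    # Rock never wins; you never throw Scissors
        best = max(their_ev, key=their_ev.get)
        mine = {"S": 9 * r, "P": 0.0, "R": 1 * p}[best]  # your EV vs. their best response
        print(f"r={r}: they play {best}, you average ${mine:.2f}")
    # Your take climbs to $6.75 at r=0.7499, then crashes to $0.00 the
    # moment Paper overtakes Scissors for them.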

Now what do you win? This is easy, since your opponent always plays Scissors: You get nothing when you play Paper, 1/4 of the time, and you get $9 when you play Rock, 3/4 of the time. So your average winnings are $6.75. Whoa!

Meanwhile, your opponent is winning $3, 1/4 of the time. Their average winnings are $0.75. Sucks to be them! And yet, they still have no better response to your strategy. I find that a bit mind-blowing.

In plain English terms, why did this solution turn out so far from the intuition that you can't beat the opponent? I'll put some of my thoughts here, but would be happy to hear yours in the comments.

Once the money started coming from the bank, your opponent stopped caring how much you won. With the asymmetric nature of the payouts, your challenge wasn't to beat your opponent. Rather, it was to set up a situation in which their best move would be for them to do exactly what you wanted.

But what about their advantage? They know your plan, and get to respond to it as best they can. Why didn't that help them? In this case, it was your advantage! Imagine two parties negotiating a deal that's beneficial to both of them, but each is trying to maximize their own benefit. They can offer, counter-offer, threaten to walk away from an unfair deal, etc. This case is a little like that, except not at all, because you get to make the first and only offer! In this case, "They know your strategy" means "They heard your offer." And your offer is to let them keep $0.75 if they play by your rules. If they do anything else, they will get less. They can't do anything to communicate with you, which means that they cannot counter-offer. And I originally stipulated that they would play only to maximize their own profit, with no spite. Therefore, there is no threat of them walking away if they don't like the deal. They just have to suck it up.

When you remove that stipulation, you get a different sort of game. There's a very simple version: the Ultimatum Game. It goes like this: Player 1 is offered a certain amount of free money (e.g., from "the bank"). But Player 2 has veto power over this gift. The entire game consists of Player 1 choosing a percentage of the money to give to Player 2, and then Player 2 deciding whether or not the players get the money at all.

If Player 2 thinks the percentage is insultingly small, they'll veto the gift. On the surface, this looks like pure spite (because even getting a little bit is better than getting nothing), but it's sound strategy to make Player 1 expect this behavior: Player 1 is then forced to give a larger percentage. As I recall, Americans did particularly poorly at this game: They tended to expect too much and offer too little, resulting in many more vetoes. (BTW, does anyone else remember the details? Were there iterated rounds, or was any communication allowed?)
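
For concreteness, the bare bones of a single round (my own sketch; the real experiments were surely more elaborate) could be:

    def ultimatum_round(pot, offer_fraction, veto_threshold):
        # Player 1 offers a fraction of the pot; Player 2 vetoes any offer
        # below their private threshold. Returns (p1_payout, p2_payout).
        if offer_fraction < veto_threshold:
            return (0.0, 0.0)  # veto: nobody gets anything
        offer = pot * offer_fraction
        return (pot - offer, offer)

    print(ultimatum_round(100, 0.40, 0.30))  # (60.0, 40.0) -- accepted
    print(ultimatum_round(100, 0.10, 0.30))  # (0.0, 0.0)   -- vetoed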

Now to screw with all our heads a little more: How clever is your "opponent" in my puzzle? You know their strategy by virtue of knowing that they will use whatever strategy will maximize their winnings. (In this case, "Always play Scissors.") But what if their strategy isn't, "Maximize winnings, given the strategy that Player 1 uses," but rather, "Maximize winnings in the long run, even if that means playing suboptimally in the short term"? What if their best strategy is to screw you over (playing suboptimally) if you choose a strategy that allows them less than $3 per turn? If they know that you are trying to maximize your winnings, and they expect you to believe that this strategy of theirs is optimal, then, in a case of self-fulfilling prophecy, it becomes optimal! You're going to allow them more of the winnings (e.g., play Paper more) because you'll both get hosed if you don't.

In this line of thought, even though there is no communication, negotiation for a cut of the profits is possible simply because both players are essentially mind readers, by virtue of each one knowing that the other will play the best game possible. If the "best game possible" is, in the long view, a stubborn refusal to settle for anything less than half, we might be pushed right back to the $4.50/turn Rock/Scissors cooperative game!

But what makes "half" the optimal answer for an asymmetric game? Even the "gift percentage" game isn't about splitting the gift evenly; it's about giving enough to appease the other player. The ball is sort of in your court in this game, and you could force your opponent to accept $0.75. On the other hand, they could punish you (strategically, not vindictively) for that strategy by always playing Paper. They would make almost the same profit, and cost you all of yours. So, your opponent is arguably in a superior bargaining position. From there, maybe their "optimal" strategy is really to settle for no less than 2/3 of the profits!

But how would anyone figure that out? (I imagine that something like this must be covered in game theory.)

And is anyone else reminded of the battle in Jet Li's "Hero" that took place within the opponents' minds, as they sat there and figured out who would win?