Derangement
Back around Janury of 2011, a friend asked me what the odds are of two people having each other as their Secret Santas in a Secret Santa group of 12. I believe this was based on a real world scenario. The question sucked me in. Below is my rambling approach to trying to solve it. I actually got to a computed answer pretty quickly, but rambled on for a while trying to find a formula for the answer instead of a brute-force approach.
So I spent a number of hours last night playing with your question. It is not so simple. In the end, I had to look up some of the formulas from probability that I had forgotten.
The first question is how many ways are there for 12 people to have a Secret Santa arrangement. A valid arrangement is everybody has been assigned somebody that is not themselves. If we represent the configuration by a string of n digits, where the digit at place i designates whom person i has to buy a gift for, then any valid configuration is one where the digit at place i is not equal to i.
For one person, there is no valid configuration. For two people, there is only one valid configuration: 21. For three people, there are two valid configurations: 231 and 312. For four people, there are 9 valid configurations: 2143 2413 2341 3142 3421 3412 4123 4321 4312.
These configurations where digit i not equal to i are called derangements, which I think is a wonderful word. Hence the title of this page. Here is the wikipedia page. They are somewhat complicated. Which is why I had to look them up for sets bigger than 5. My attempts to deduce the math failed.
Anyway, in the derangement for 12, there are 176,214,841 valid configurations. This means if you're doing a Secret Santa with 12 people, there are more than 176 million ways for people to be assigned to each other.
So after thinking about it for a while, I decided the probability for at least one pair is simply how many ways can you pick a pair from 12 multiplied by how many ways you can configure the remaining ten. Since we don't care about the order of the pair, the number of ways you can pick a pair from 12 is 12*11/2, which is 66. The number of ways you can configure the other 10 is simply the number of derangements for 10, which is 1334961. So I figured the probability would be 66 * 1334961 / 176214841, which is just a touch over 50%. My gut feeling was the answer would be somewhere between a quarter and a half, so this seemed to confirm my instinct.
I could not, however, find anyone online who had tried to solve the same problem. So I figured I would try out my formula on smaller sets where I can check the results. Unfortunately, when I tried it on a set of 4, I got (4*3)/2 * !2 / 9 which is 6/9. The correct answer is 3/9. The problem, it turns out, is that I am double counting pairs. Because if one set of numbers is a pair, then the other will be too. So I can't count the pair 12 and separately the pair 34, because they will only occur together.
My formula is accurate for a set of 5, though. Because a set of five can only have at most one pair. Hence no doublecounting. For six, my formula is slightly off again, I get 135/265 and I think the correct answer is 105/265. But I'm not sure. So the problem is my formula double counts pairs in some cases, but I do not know how to account for that.
So my next approach was to consider how many ways a set of n people can be configured. A set of two people can only be a pair, a set of three people can only be a triple. A set of 4 can be a loop of 4 or two pairs. A set of 5 can be a loop of 5 or a pair and a triple. And so on.
For twelve people, I worked out the following valid configurations:
12
10+2
9+3
8+4
8+2+2
7+5
7+3+2
6+6
6+4+2
6+3+3
6+2+2+2
5+5+2
5+4+3
5+3+2+2
4+4+2+2
4+3+3+2
4+2+2+2+2
3+3+3+3
3+3+2+2+2
2+2+2+2+2+2
What bothers me, though, is that I just came up with 20 combinations, and last night at home I came up with 21. So what I did then was calculate how many possible configurations each of the above options had. Then by dividing by 176214841 I got the probability of each option. Unfortunately, the probabilities I calculated for each of them added up to 166%. So again it seems like I'm counting stuff double.
Anyway, I'll take another look when I get home and see if I can find my mistake. And also why I got 20 options now, but 21 last night.
OK. Got home. Checked my notes from last night. My list above is missing 4+4+4. Sigh.
Anyway, while eating dinner, I thought about this, and I decided that if I could work out all the combinations up to a set of 6, then I could maybe get a better idea of what I'm counting double and how to account for that. The number of derangements for a set of six is only 265, so I figured I can handle working those out. And then I realized: "Hey! I'm a computer scientist! I should be making a computer do this for me."
So at first my plan was to write a program that only printed out the valid derangements for a set of six. But then I thought why not do it proper and have the computer just figure this out for me? So I did. I wrote this program in C (my favorite programming language):
#include <stdio.h> #include <stdlib.h> int* theArray; int len; int countP = 0; int countD = 0; int* countPairs; int countAtLeastOnePair = 0; void initArray() { int i; theArray = (int*) malloc(len * sizeof(int)); for (i=0;i<len;++i) theArray[i] = i; countPairs = (int*) malloc((len/2+2) * sizeof(int)); for (i = 0; i<len/2+1;++i) countPairs[i] = 0; } void printArray() { int i; printf( "%d", theArray[0]); for (i = 1; i < len; ++i) printf(" %d", theArray[i]); printf("\n"); } int isDeranged() { int d = 1; int i; for (i = 0;i< len;i++) { if (theArray[i] == i) { d = 0; break; } } return d; } int numPairs() { int c = 0; int i, j; for (i = 0; i < len-1; i++) { for (j = i+1; j < len; j++) { if (theArray[i] == j && theArray[j] == i) ++c; } } return c; } void swap( int a, int b) { int temp; temp = theArray[a]; theArray[a] = theArray[b]; theArray[b] = temp; } void permute(int i, int n) { int j, c; if (i==n) { if (isDeranged()) { printArray(); countD++; c = numPairs(); countPairs[c]++; if (c > 0) countAtLeastOnePair++; } countP++; } else { for (j=i;j<n;j++) { swap(i,j); permute(i+1,n); swap(i,j); } } } int main(int argc, char* argv[]) { int i; if (argc == 1) len = 4; else len = atoi(argv[1]); if (len <= 0) len = 4; initArray(); permute(0,len); printf("\n\nStats:\n Permutations: %d\n", countP); printf(" Derangements: %d\n", countD); printf("\nPairs:\n"); for (i = 0; i <= len/2;++i) printf(" %2d: %d\n", i, countPairs[i]); printf("\nDerangements with at least one pair: %d\n", countAtLeastOnePair); printf("Odds of a pair: %2.2f%%\n", (100.0*countAtLeastOnePair)/countD); return 0; }
So in C, a program starts in the main function. So that's where things start. The main function checks to see if there is a command line argument. If so, it sets the size of the set to be that argument, otherwise it defaults to 4. It then recursively works through every permutation of the string representing who is gifting who. For each permutation, it checks if this permutation is a valid derangement. If it is, it bumps the count of derangements. It also counts how many pairs the derangement has. If more than 0, it bumps the count of derangements with at least one pair.
In the end, it prints out all the derangements and some statistics. Let's have a look:
derange 1 Stats: Permutations: 1 Derangements: 0 Pairs: 0: 0 Derangements with at least one pair: 0 Odds of a pair: -nan%
Ooops. For one person, there are 0 derangements. To calculate the odds of a pair, I divide the derangements with at least one pair by the total number of derangements. Division by 0. Hence -nan (Not A Number) as the result. Let's move on to size 2.
derange 2 1 0 Stats: Permutations: 2 Derangements: 1 Pairs: 0: 0 1: 1 Derangements with at least one pair: 1 Odds of a pair: 100.00%
The only valid derangement for size 2 is if person 0 gifts person 1 and person 1 gifts person 0 (I start with 0 since I'm a computer scientist and we like starting with 0). This configuration is a pair, so the odds of a pair with two people is 100%. Go figure. But the program is getting the right answers so far. Let's try 3.
derange 3 1 2 0 2 0 1 Stats: Permutations: 6 Derangements: 2 Pairs: 0: 2 1: 0 Derangements with at least one pair: 0 Odds of a pair: 0.00%
There are two valid configurations for three people, none of which involve a pair. Odds of a pair 0%. Program is looking good. On to 4.
derange 4 1 0 3 2 1 2 3 0 1 3 0 2 2 0 3 1 2 3 0 1 2 3 1 0 3 2 1 0 3 2 0 1 3 0 1 2 Stats: Permutations: 24 Derangements: 9 Pairs: 0: 6 1: 0 2: 3 Derangements with at least one pair: 3 Odds of a pair: 33.33%
This is where it really starts to get interesting. The result matches the result I had manually calculated last night. The derangements with pairs are 1032, 2301, and 3210. The others are all loops of four. Next up, five!
derange 5 Stats: Permutations: 120 Derangements: 44 Pairs: 0: 24 1: 20 2: 0 Derangements with at least one pair: 20 Odds of a pair: 45.45%
At this point I turned off printing the individual derangements, since it was getting to be too much. But for five, my formula works: n * (n-1)/2 * !(n-2) = 5*4/2*!3. !n is the notation for the number of derangements for n. !3 we already know is 2, so my formula says 20, the program says 20, life is good. What is relevant here is to look at the printout of how many derangements had how many pairs. Here we see there were only derangements with 0 or 1 pair. Hence my formula can't double count anything. Hence why it got 5 right. Let's move on.
derange 6 Stats: Permutations: 720 Derangements: 265 Pairs: 0: 160 1: 90 2: 0 3: 15 Derangements with at least one pair: 105 Odds of a pair: 39.62%
OK, things look good for my program. The 105 derangements with at least one pair matches the 105 I calculated manually last night. So my program is right, but again, my formula is now definitely proven wrong. It says 6*5/2*!4. !4 is 9, so we get 15*9 = 135. My formula is off by 30. But this is interesting. In the listing of how many derangements have how many pairs, we see that 90 have one pair and 15 have three pairs. The way my formula works, for those configurations with three pairs, my formula actually counts them three times, which is twice more than correct. Twice 15 is 30. There is a pattern! Perhaps I can find a mathematical solution! Let's see what 7 says.
derange 7 Stats: Permutations: 5040 Derangements: 1854 Pairs: 0: 1140 1: 504 2: 210 3: 0 Derangements with at least one pair: 714 Odds of a pair: 38.51%
OK. What does my formula say? If my theory holds, my formula should give me a result that is 210 bigger than the correct result, since there are 210 configurations with two pairs. Let's see: 7*6/2*!5 = 21 * 44 = 924. 714 + 210 = 924. Ha! OK. Let's look at 8.
derange 8 Stats: Permutations: 40320 Derangements: 14833 Pairs: 0: 8988 1: 4480 2: 1260 3: 0 4: 105 Derangements with at least one pair: 5845 Odds of a pair: 39.41%
OK, so my formula should be over the correct result by 1260+3*105= 1575. Formula says 8*7/2*!6 = 28*265 = 7420. 1575+5845=7420. OK. So I now have a better feel for what my formula is double counting. But I still need a formula for how many configurations with 2, 3, 4, ... pairs a given set will have. I am not sure about that math yet.
Anyway, I'm going to take a break for a little while. But I've already let the program calculate the answer to your question. So here it is:
derange 12 Stats: Permutations: 479001600 Derangements: 176214841 Pairs: 0: 106877320 1: 53450496 2: 13347180 3: 2217600 4: 311850 5: 0 6: 10395 Derangements with at least one pair: 69337521 Odds of a pair: 39.35%
And hey, at least my instinct on the number was correct. It is between a quarter and a half.
OK. It is Wednesday morning, the 19th. I got to work a little while ago and am now procrastinating a bit. I just wanted to check one thing, which is that my "pattern" or "theory" for how much my formula is off by still holds for n=12. Looking at the pairs above, I should be off by 13347180 + 2 * 2217600 + 3 * 311850 + 5 * 10395 = 18769905. Anyway, my formula says 12*11/2 * !10 = 66 * 1334961 = 88107426. 69337521 + 18769905 = 88107426. So the pattern holds. Now I just need to figure out how to mathematically evaluate the pattern.
So for every possible number of pairs, I need to calculate how many possibilities there are.
But for now, I need to get back to work.
At this point my friend added another wrinkle to the quesion, asking for the distinction between the odds of exactly one pair vs at least one pair.
So the change is simple. In the function main, I changed
printf("Odds of a pair: %2.2f%%\n", (100.0*countAtLeastOnePair)/countD);
to
printf("Odds of exactly one pair: %2.2f%%\n", (100.0*countPairs[1])/countD); printf("Odds of at least one pair: %2.2f%%\n", (100.0*countAtLeastOnePair)/countD);
So without any further bla bla bla, here are the results for a group of 12 people:
derange 12 Stats: Permutations: 479001600 Derangements: 176214841 Pairs: 0: 106877320 1: 53450496 2: 13347180 3: 2217600 4: 311850 5: 0 6: 10395 Derangements with at least one pair: 69337521 Odds of exactly one pair: 30.33% Odds of at least one pair: 39.35%
Exactly one pair is still pretty likely. I don't have all my notes here at work, and I'm not sure if "exactly one pair" is easier than "at least one pair" or not. I do not think it is harder than "at least one pair."
I still find it odd that I can't find anyone online talking about this specific problem. Perhaps I will ask my math professor friend Rebecca what she thinks. Perhaps I'm just bad at finding the right google search terms. But most people talking about secret santa probability talk about stuff like the odds that nobody will draw themselves on the first round and such. There are some interesting probability themes to secret santa, though.
It is technically the 24th of January, 40 minutes past midnight. Yesterday I had started having doubts that the number of ways for a loop of 12 was really 11! 11! is 39916800, and 39916800 / 176214841 is 22.65%. As mentioned above, when I added up my odds for all the combinations I came up with, I got to 166%. So it seemed unlikely that more than a fifth of the time there would be one big loop.
So I modified my program some more. I added a function to check if a configuration is one big loop. The function looks like this:
int isBigLoop() { int l = 1; int i = 0; int c = 0; while (i < len-1) { c = theArray[c]; if ( c == 0 ) { l = 0; break; } i++; } return l; }
I then have my program also count how many big loops there are and add that to the output. So now the output for 12 is:
Stats: Permutations: 479001600 Derangements: 176214841 Full length loops: 39916800 Pairs: 0: 106877320 1: 53450496 2: 13347180 3: 2217600 4: 311850 5: 0 6: 10395 Derangements with at least one pair: 69337521 Odds of a pair: 39.35%
So for 12 the number of full length loops is 39916800. Which is 11!. So 11! must be right. So I need to figure out where I messed up that is causing my probabilities for the 21 ways of making groups in 12 people to add up to 166%.
Anyway, working on my program. I want to write a function to tell me how long the longest loop in a given configuration is. I was gonna just slightly modify my isBigLoop function, but then I realized that the longest loop in a given configuration will not always start with the first element, so it's not that easy.
Anyway, the reason for computing how many configurations have a longest loop of n is to check if the number of 10-2 configurations is really 66*9!. Since we already know the number of 12 configurations is 11!. So here's the function I ended up writing to determine the longest loop size:
int longestLoop() { int m = 0; int i = 0; int c = 0; for (i = 0; i < len; i++) { int j = 1; c = theArray[i]; while ( c != i ) { c = theArray[c]; j++; } if (j > m) m = j; } return m; }
Here's the output for 12:
Stats: Permutations: 479001600 Derangements: 176214841 Full length loops: 39916800 Pairs: 0: 106877320 1: 53450496 2: 13347180 3: 2217600 4: 311850 5: 0 6: 10395 Longest loops: 2: 10395 3: 800800 4: 6756750 5: 16765056 6: 22730400 7: 25090560 8: 22453200 9: 17740800 10: 23950080 11: 0 12: 39916800 Derangements with at least one pair: 69337521 Odds of exactly one pair: 30.33% Odds of at least one pair: 39.35%
So this says the number of 10-2 configurations is 23950080. Which equals 66*9!. So this looks good. Of course, this doesn't really help me on my path to a nice formula, but it at least makes me feel good to know I'm on the right path.
Anyway, the first version of my program I wrote on the linux server that hosts my web page. But now that I have my fancy new computer, I wanted to run it locally, so in addition to the couple changes listed above, I also made a couple changes to make in run on Windows. So just so you're fully updated, here is the complete program as it currently looks:
#include "stdafx.h"
#include
It is now Monday afternoon, January 24. I am tired because I stayed up until almost 5 am. Silly me. Anyway, I decided to work through the math for all 21 combinations for 12 people, so that I can verify for sure that that is a valid approach. So I made an excel spreadsheet to make the combinations easy for me. So, I'm going to write Cn(n,k) to mean n choose k. So Cn(12,2) means 12 choose 2, a.k.a. 66. Here's what I started with:
12 11! 10+2 Cn(12,2) * 9! 9+3 Cn(12,3) * 2! * 8! 8+4 Cn(12,8) * 7! * 3! 8+2+2 Cn(12,8) * 7! * Cn(4,2) 7+5 Cn(12,7) * 6! * 4! 7+3+2 Cn(12,7) * 6! * Cn(5,3) * 2! 6+6 Cn(12,6) * 5! * 5! 6+4+2 Cn(12,6) * 5! * Cn(6,4) * 3! 6+3+3 Cn(12,6) * 5! * Cn(6,3) * 2! * 2! 6+2+2+2 Cn(12,6) * 5! * Cn(6,2) * Cn(4,2) 5+5+2 Cn(12,5) * 4! * Cn(7,5) * 4! 5+4+3 Cn(12,5) * 4! * Cn(7,4) * 3! * 2! 5+3+2+2 Cn(12,5) * 4! * Cn(7,3) * 2! * Cn(4,2) 4+4+4 Cn(12,4) * 3! * Cn(8,4) * 3! * 3! 4+4+2+2 Cn(12,4) * 3! * Cn(8,4) * 3! * Cn(4,2) 4+3+3+2 Cn(12,4) * 3! * Cn(8,3) * 2! * Cn(5,3) * 2! 4+2+2+2+2 Cn(12,4) * 3! * Cn(8,2) * Cn(6,2) * Cn(4,2) 3+3+3+3 Cn(12,3) * 2! * Cn(9,3) * 2! * Cn(6,3) * 2! * 2! 3+3+2+2+2 Cn(12,3) * 2! * Cn(9,3) * 2! * Cn(6,2) * Cn(4,2) 2+2+2+2+2+2 Cn(12,2) * Cn(10,2) * Cn(8,2) * Cn(6,2) * Cn(4,2)
So I summed up those 21 products and got 253473792. Which is not equal to the 176214841 I expected. But since my program is awesome, I already know that there should be 10395 combinations for 2+2+2+2+2+2, but my calculation says 7484400. So quickly dividing 7484400 by 10395, I get 720. Which I know is 6! Which leads me to an Aha! moment.
The calculation I am doing treats the first pair being AB distinctly from the second pair being AB. For our purposes, pair 1 being AB and pair 2 being CD is the same as pair 1 being CD and pair 2 being AB. So we need to play with the math a bit more. For every combination that has a certain loop length more than once, we need to divide by that loop length factorial.
So I worked out what I need to divide each grouping by:
12 1 10+2 1 9+3 1 8+4 1 8+2+2 2 7+5 1 7+3+2 1 6+6 2 6+4+2 1 6+3+3 2 6+2+2+2 6 5+5+2 2 5+4+3 1 5+3+2+2 2 4+4+4 6 4+4+2+2 2*2 4+3+3+2 2 4+2+2+2+2 24 3+3+3+3 24 3+3+2+2+2 2*6 2+2+2+2+2+2 720
So now when I divide by these values and then sum up, I get the expected 176214841. Yay!
Anyway, here's a picture of my Excel spreadsheet:
All this, and sadly I still don't have my formula.
Anyway, I decided to put what I had so far into a nice formula format, just to see how far away I am. It looks like this:
So I need to figure out if there's a generic way to represent "how many configurations have i pairs." And then I need to figure if I can simplify the summation, since a summation in a formula means you can't just quickly plug values in and get an answer. So I'm going to have to think about this a bit more.
Also, that formula is of course for "at least one pair" and not for "exactly one pair".
This is as far as I ever got