Birthday probability problem | Probability and Statistics | Khan Academy


One of you all sent a fairly
interesting problem, so I thought I would work it out. The problem is I have a
group of 30 people, so 30 people in a room. They’re randomly
selected 30 people. And the question is what is the
probability that at least 2 people have the same birthday? This is kind of a fun question
because that’s the size of a lot of classrooms. What’s the probability that at
least someone in the classroom shares a birthday with someone
else in the classroom? That’s a good way
to phrase it as well. This is the same thing as
saying, what is the probability that someone shares a birthday with
at least someone else. They could share it with 2
other people or 4 other people in the room. And at first this problem seems
really hard because there are a lot of circumstances
that make this true. I could have exactly 2 people
have the same birthday. I could have exactly 3 people
have the same birthday. I could have exactly 29 people
have the same birthday and all of these make this true, so do
I add the probability of each of those circumstances? And then add them up and then
that becomes really hard. And then I would have to
say, OK, whose birthdays am I comparing? And I would have to
do combinations. It becomes a really difficult
problem unless you make kind of one very simplifying
take on the problem. This is the opposite of–
well let me draw the probability space. Let’s say that this is
all of the outcomes. Let me draw it with
a thicker line. So let’s say that’s all
of the outcomes of my probability space. So that’s 100% of the outcomes. We want to know– let me draw
it in a color that won’t be offensive to you. That doesn’t look that
great, but anyway. Let’s say that this is the
probability, this area right here– and I don’t know
how big it really is, we’ll figure it out. Let’s say that this is the
probability that someone shares a birthday with
at least someone else. What’s this area over here? What’s this green area? Well, that means if these are
all the cases where someone shares a birthday with someone
else, this green area is all the cases where no one shares a
birthday with anyone. Or you could say, all 30 people
have different birthdays. This is what we’re
trying to figure out. I’ll just call it the
probability that someone shares. I’ll call it the probability
of sharing, probability of s. If this whole area is area 1 or
area 100%, this green area right here, this is going
to be 1 minus p of s. This is going to be
1 minus p of s. Or if we said that this is the
probability– or another way we could say it, actually this is
the best way to think about it. If this is different, so
this is the probability of different birthdays. This is the probability
that all 30 people have 30 different birthdays. No one shares with anyone. The probability that someone
shares with someone else plus the probability that no one
shares with anyone– they all have distinct birthdays–
that’s got to be equal to 1. Because we’re either going to
be in this situation or we’re going to be in that situation. Or you can say they’re
equal to 100%. Either way, 100% and 1
are the same number. It’s equal to 100%. So if we figure out the
probability that everyone has a different birthday we could
subtract it from 100%. So let’s see. We could just rewrite this. The probability that someone
shares a birthday with someone else, that’s equal to 100%
minus the probability that everyone has distinct,
separate birthdays. And the reason why I’m doing
that is because as I started off in the video, this is
kind of hard to figure out. You know, I can figure out the
probability that 2 people have the same birthday, 5 people,
and it becomes very confusing. But here, if I wanted to just
figure out the probability that everyone has a distinct
birthday, it’s actually a much easier probability
to solve for. So what’s the probability
that everyone has a distinct birthday? So let’s think about it. Person one. Just for simplicity, let’s
imagine the case that we only have 2 people in the room. What’s the probability that they
have different birthdays? Let’s see, person one, their
birthday could be 365 days out of 365 days of the year. You know, whenever
their birthday is. And then person two, if we
wanted to ensure that they don’t have the same birthday,
how many days could person two be born on? Well, they could be born
on any day that person one was not born on. So there are 364
possibilities out of 365. So if you had 2 people, the
probability that no one is born on the same
birthday– this first term, 365/365, is just 1. It’s just going to be
equal to 364/365. Now what happens if
we had 3 people? So first of all the
first person could be born on any day. Then the second person could
be born on 364 possible days out of 365. And then the third person,
what’s the probability that the third person isn’t
born on either of these people’s birthdays? So 2 days are taken up, so
the probability is 363/365. You multiply them out. You get 365 times 36– actually
I should rewrite this one. Instead of saying this is 1,
let me write this as– the numerator is 365 times
364 over 365 squared. Because I want you
to see the pattern. Here the probability is 365
times 364 times 363 over 365 to the third power. And so, in general, if you just
kept doing this to 30, if I just kept this process for 30
people– the probability that no one shares the same birthday
would be equal to 365 times 364 times 363– I’ll have
30 terms up here. All the way down to what? All the way down to 336. That’ll actually be 30
terms divided by 365 to the 30th power. And you can just type this into
your calculator right now. It’ll take you a little time to
type in 30 numbers, and you’ll get the probability that no one
shares the same birthday with anyone else. But before we do that let
me just show you something that might make it a
little bit easier. Is there any way that I can
mathematically express this with factorials?
Let’s think about it. 365 factorial is what? 365 factorial is equal to 365
times 364 times 363 times– all the way down to 1. You just keep multiplying. It’s a huge number. Now, if I just want the 365
times the 364 in this case, I have to get rid of all of
these numbers back here. One thing I could do is I
could divide this thing by all of these numbers. So 363 times 362– all
the way down to 1. So that’s the same thing as
dividing by 363 factorial. 365 factorial divided by 363
factorial is essentially this because all of these
terms cancel out. So this is equal to 365
factorial over 363 factorial over 365 squared. And of course, for this case,
it’s almost silly to worry about the factorials, but it
becomes useful once we have something larger than
two terms up here. So by the same logic, this
right here is going to be equal to 365 factorial over 362
factorial over 365 to the third power. And actually, just another
interesting point. How did we get this 365? Sorry, how did we get
this 363 factorial? Well, 365 minus 2
is 363, right? And that makes sense because we
only wanted two terms up here. We only wanted two
terms right here. So we wanted to divide by a
factorial that’s two less. And so we’d only get the
highest two terms left. This is also equal to– you
could write this as 365 factorial divided by (365 minus
2) factorial. 365 minus 2 is 363, so that’s 363 factorial, and then you just end
up with those two terms, and that’s that there. And then likewise, this right
here, this numerator you could rewrite as 365 factorial
divided by 365 minus 3– and we had 3 people– factorial. And that should hopefully
make sense, right? This is the same thing as 365
factorial over– well, 365 minus 3 is 362– 362 factorial. And so that’s equal to
365 times 364 times 363 all the way down. Divided by 362 times
all the way down. And that’ll cancel out with
everything else and you’d be just left with that. And that’s that right there. So by that same logic, this top
part here can be written as 365 factorial over what? 365 minus 30 factorial. And I did all of that just so
I could show you kind of the pattern and because this is
frankly easier to type into a calculator if you know where
the factorial button is. So let’s figure out what
this entire probability is. So turning on the calculator,
we want– so let’s do the numerator. 365 factorial divided by–
well, what’s 365 minus 30? That’s 335. Divided by 335 factorial and
that’s the whole numerator. And now we want to divide
the numerator by 365 to the 30th power. Let the calculator think
and we get 0.2936. Equals 0.2936. Actually 37 if you rounded,
which is equal to 29.37%. Now, just so you remember what
we were doing all along, this was the probability that no one
shares a birthday with anyone. This was the probability of
everyone having distinct, different birthdays
from everyone else. And we said, well, the
probability that someone shares a birthday with someone else,
or maybe more than one person, is equal to all of the
possibilities– kind of the 100%, the probability space,
minus the probability that no one shares a birthday
with anybody. So that’s equal to
100% minus 29.37%. Or another way you could write
it as that’s 1 minus 0.2937, which is equal to– so if I
want to subtract that from 1. 1 minus– that just
means the previous answer. That means 1 minus 0.2937. You get 0.7063. So the probability that someone
shares a birthday with someone else is 0.7063–
it keeps going. Which is approximately
equal to 70.6%. Which is kind of a neat result
because if you have 30 people in a room you might say,
oh wow, what are the odds that someone has the same
birthday as someone else? It’s actually pretty high. 70% of the time, if you have a
group of 30 people, at least 1 person shares a birthday
with at least one other person in the room. So that’s kind of
a neat problem. And kind of a neat result
at the same time. Anyway, see you in
the next video.
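
The calculation walked through in the video can be sketched in a few lines of Python. This is only an illustration, not part of the original lesson: the function names are made up for this example, and it assumes 365 equally likely birthdays with no leap day. It computes the probability that everyone has a distinct birthday both as the running product 365/365 × 364/365 × … and in the factorial form 365!/(365 − n)!/365^n, using `math.perm`, which returns n!/(n − k)!.

```python
import math

def prob_all_distinct(n, days=365):
    """Probability that n people all have different birthdays,
    built up one person at a time: 365/365 * 364/365 * 363/365 * ..."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p

def prob_all_distinct_factorial(n, days=365):
    """Same probability in factorial form: days! / (days - n)! / days**n.
    math.perm(days, n) equals days! / (days - n)!."""
    return math.perm(days, n) / days ** n

p_none = prob_all_distinct(30)
print(f"no shared birthday:  {p_none:.4f}")      # 0.2937
print(f"at least one shared: {1 - p_none:.4f}")  # 0.7063
```

Both functions should agree up to floating-point rounding; the product form simply avoids building the very large intermediate integers.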

Comments

  1. @renduke That's not the same question and definitely not the same probability. To use your analogy, you should write random numbers between 1 and 365 on 30 notes; after you've finished writing the numbers, go check whether the same number appears on at least 2 notes. 70% of the time you will find such a number. Regarding the hat – well, you can use it to take it off to Sal, who is probably the best teacher in the world 🙂
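
The "30 notes" experiment described in this comment can be tried out with a small simulation. This is a rough sketch under the same assumptions as the video (365 equally likely days); the function name, the number of trials, and the seed are arbitrary choices made for this example.

```python
import random

def simulate_shared_birthday(n_people=30, days=365, trials=20_000, seed=0):
    """Estimate the chance that at least two of n_people notes carry the
    same random number between 1 and days."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        notes = [rng.randint(1, days) for _ in range(n_people)]
        # A set drops duplicates, so a shorter set means a repeated number.
        if len(set(notes)) < n_people:
            hits += 1
    return hits / trials

print(simulate_shared_birthday())  # should land close to 0.71
```

The estimate fluctuates from run to run, but with enough trials it should settle near the 70.6% derived in the video.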

  2. @abennett4 It occurs in 97 of 400 years, actually. Our calendar operates on a 400-year cycle (including days of the week, i.e. it will be a Thursday on January 19th 2412, because today, January 12th 2012, is a Thursday).

    However it is also far more complicated than you have tried to simplify it there.

  3. The assumptions/simplifications not stated in this video (and I don't like people who don't state that they are making assumptions/simplifications like this, but otherwise it's an awesome video) are:
    A. We are ignoring leap years, which do complicate things
    B. We are assuming an even distribution of birthdays throughout the year (which doesn't happen)

  4. Surprising conclusion! 70.63%, wow!! I guess it's even higher if we take into consideration that more people are born in specific months..

  5. Hi,

    Thanks for your video. What tool are you using to make your video? I'm an educator and this is a good tool for explaining stuff to students.

    Yours

    Sam

  6. @asif26ten Yea but that's the correct answer. As he showed, about 30% of the time no one in a class of 30 shares a birthday, so your classes help account for that 30%.

  7. I know how leap year works. I wrote code for the phone company to determine day-of-week. But day-of-week has nothing to do with the probability problem. When I said " *about* 24 out of 100 years " it was an approximation (hence the word "about"). Last time I checked 24 out of 100 is approximately 97 out of 400. The point being, there being leap years does not change the birthday probability by any significant degree.

  8. It would not change the answer by any significant amount, considering the probability of being born on a leap day is approximately 1 out of 1,461. So he didn't actually forget a day. He forgot approximately (leap year does not occur every 4 years – there are exceptions about 1 every 100 years) 1/4 of a day. To factor this all in would have made the problem too complicated and not have changed the answer significantly.

  9. If 24/100 is accurate enough, then so is 1/4

    I didn't think day of the week had anything to do with it either, it was just an additional factoid on the same topic.

    Congrats, I could write code for something like that pretty easily, so don't give me "I wrote code for the phone company" crap.

    Yes, I know it doesn't affect the problem posed.

  10. Thank you for your help, you really helped me. I really appreciate it. Like if he helped you with this and you now know what to do.

  11. Sorry to get your dander, you're the one working on a 400 year cycle, not me.
    Anyway, what simple algorithm would you use to determine day of week?

  12. Let Monday be day 0 (Tuesday 1, Wednesday 2 etc.)

    X = (Year – 1900)

    DayCounter = X + X/4 – X/100 + X/400
    Day = DayCounter modulo 7.

    That day is the start of the year. There are a few other things to do, but it's not hard. Just a couple of lines of code.

  13. A lot more lines of code, actually. The simpler, more eloquent method is to simply set a benchmark (say 11/22/1963, a Friday) and just count forwards (Friday, Saturday, Sunday…) or backwards (Friday, Thursday, Wednesday…) until you reach the date in question. Much better!

  14. The significant change would be that instead of this video lasting 13 minutes, it would last 13 hours. 13 hours of no one making babies, thus complicating the birthday probability problem even more.

  15. Where did 53,917 come from? My comment was meant to be humorous in that nobody watching a 13 hour video on a Birthday Problem would be having sex at the time. Lighten up!

  16. To my knowledge, factorials are not defined for negative values, so the equation P(s)=1-365!/((365-n)!*365^n) is only valid in the domain 1 to 365. So I would make the equation P(s)=1-(365!/((365-n)!*365^n))(sgn(365.5-n)/2+0.5) so the probability is always 1 when n is larger than 365.
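
In code, the domain issue raised here is usually handled with an explicit branch rather than a sign function. The following is a minimal sketch, assuming equally likely days; the function name `prob_shared` and the `days` parameter are hypothetical and introduced only for illustration.

```python
import math

def prob_shared(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming all days are equally likely."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n > days:
        # Pigeonhole principle: more people than days forces a repeat,
        # so (days - n)! is never needed for these n.
        return 1.0
    # math.perm(days, n) equals days! / (days - n)!
    return 1.0 - math.perm(days, n) / days ** n

print(round(prob_shared(30), 4))  # 0.7063
print(prob_shared(400))           # 1.0
```

Changing `days` gives nearby variants of the problem: `days=366` models a year with a leap day counted as an ordinary day, and `days=12` covers a shared birth month under the rough assumption that all months are equally likely.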

  17. In most classrooms, the kids are (at least supposed to be) born in the same year. So that year is either a leap year, or it's not. If not, the odds are 70.63%. If a leap year, use 366 instead of 365 for the starting number and get 70.53%.

  18. i first saw this in 1978, by a QA manager turned professor in Statistics from Bendix Corp, then a maker of auto manufacturing…..as a young undergrad, I thought it was supercool…the teacher won a coke from nearly everyone in the classroom! ….statistics can make BELIEVERS of us all- witness the global warming data! thanks for the pleasant journey back in time….but still 'real' today.

  19. Well. If there are 400 people in the room what is the probability that all of them have different birthdays?
    Hint: 400 people cannot have different birthdays, when the year only contains 365 days. So the answer to your question is that if there were 400 people in the room, you should know that the chance of minimum 2 people having the same birthday is 100%.

  20. That's what I want to know also. I was sure that the number of ways 30 people can have distinct birthdays out of 365 was 365C30

  21. He did the problem wrong for sure. This is a combination problem and he is treating it as permutation. Order doesn't matter in this case AND 70% just doesn't make sense.

  22. Great video; great problem!

    But I might point out that P (probability) should not be expressed in %,
    as rigorously P is a value inside an interval [0, 1].

  23. A lot of 50 items has 40 good items and 10 bad items. Suppose we test five items from the lot. Let X be the number of defective items in the sample. Find the probability,
    P[X=k], if we sample with replacement and if we sample without replacement. I am stuck here, anyone who can assist me?

  24. excellent video. As a former math teacher (a gazillion years ago), this is something we used to do in class, too. Now I have to go back and relearn factorials.

  25. for everyone that's trying and failing in excel / calculator, etc. Try wolfram alpha and type in the formula in the search field: (365!/335!)/365^30.
    Oh and by the way: For a probability of 50% you need 23 people in one room.
    ((365!/(365-n)!)/365^n=0.5)

  26. You have explained this fantastically. I came in expecting to find some flaw but wow. Math is beautiful. Thank you for demonstrating this.

  27. He says it's for "at least two people". How can you find it for the case where only 2 people share the same bday?

  28. Thanks man, it was so useful. I have a Cryptology exam shortly, It helped to understand probability of collisions in Hash function… Thank you so much 

  29. Am happy to say that you are wrong, my friend! 70.63% is only the probability that someone shares the same day and month with someone else!
    If you consider a full birthdate you'll need to multiply by 74, an average for the difference between the youngest and the oldest person among the population of 30 people.
    The result should be around 0.00% if you round the result to two decimals. Which makes sense because in certain companies, they use birthdate as a first quick ID

  30. you could also do nPr. (365P30)/(365^30)=.2937 
                                          1-.2937= .706

  31. The only problem I have with the birthday paradox, is it's based on a magical world with even distribution of birthdays throughout the year, but that doesn't actually happen.  In the US, for example, the odds of having a birthday on Dec 25 is statistically much lower than Sept 16.  Then consider the complexity leap day babies throw into the mix by technically only having a birthday once every four years.  It seems like knowing there are actually a lot more babies in September than the rest of the year, your odds of collision would actually be significantly higher. 

  32. Thank you for the careful explanations, we did this in my Math History class but I didn't get the full concept as the professor was in a bit of a rush. I don't like to plug and chug numbers into equations without knowing why the equation works.

  33. why did you consider the favourable outcomes to be the number of permutations of 30 birthdays of 365 and not the number of combinations

  34. By this math, if there are more than 365 people in a room (let's say 400 or 450), then would there be a 100% probability of at least 2 people having same birthday??

  35. Totally wrong answer. This answer assumes an equal probability of a person being born on every day. That is not the case (how inconvenient!). If you're going to knowingly oversimplify for the sake of illustrating a principle, at the very least state that to be the case!

    A cool thing to note is that the unequal distribution of birthdays increases the probability of two people having the same birthday, further contradicting the common intuitive conclusion =)

  36. There are some assumptions:
    1. Feb 29th is excluded, therefore 365 days have been taken. (It doesn't make much difference even if we take 366 days)
    2. All days in a year are equally likely for each person's b'day.
    3. The b'day events are independent of each other.
    4. There are no twins in the group (hence everybody's b'day is assumed to be different).

  37. Great vid! One thing i can't quite get my head around is when we approach 100% … clearly that would happen at around 40-50 people (I haven't done the math), but surely it's technically possible for 365 people to have different birthdays? That would mean the 100% would be false. If someone could answer this I'd be appreciative!

  38. I came here from sharkee, who did not explain why the birthday paradox was indeed true. You did however, and I thank you for that.

  39. Me: How do I do a problem where I find out the probability of two people sharing a birthday?
    Every damned source: That's a good question.. But a BETTER question is what is the chance of these people NOT sharing a birthday?

  40. Very nicely done video! Of course, some would say you "wimped out" by not taking into account February 29th. It does not change the numerical result significantly, but coming up with an exact formula in that case is not quite so easy!

    Let's see what happens if we take Feb 29th into account. Let G0 be the set of cases in which none of the 30 people were born on Feb 29th. Let G1 be the set of cases in which exactly one of the 30 people was born on Feb 29th. Let P0 be the probability that our 30 people are one of the cases in G0 and let P1 be the probability that our 30 people are one of the cases in G1. Then

    P0 = (1-L)^30 and P1 = 30L(1-L)^29, where L=[probability of a person chosen at random being born on Feb 29th]=1/[1+(365)(4)]

    There are two important facts to note about G0 and G1. First, the set of all cases in which no two people have the same birthday is contained in the union of G0 and G1. Second, all cases in G0 are equally likely and all cases in G1 are equally likely; this means that we have reduced the problem to one of counting.

    Let F0 = [fraction of cases of G0 in which no two people have the same birthday]
    =[probability that drawing a case from G0 at random results in no birthdays overlapping]

    and let F1 = [fraction of cases of G1 in which no two people have the same birthday]
    =[probability that drawing a case from G1 at random results in no birthdays overlapping]

    It is easy to see that F0 = (365!/335!)/(365^30), and F1 = (365!/336!)/(365^29). Now we can write down the probability that we seek,
    P=[probability of no two people having the same birthday]=(F0)(P0) + (F1)(P1)

    If we make all the substitutions, we can write this as

    P=[(365!/335!)/(365^30)][(365/365.25)^30][229/224]
    = [(365!/335!)/(365^30)][1.0015362190] = (0.29368375728)(1.0015362190) = 0.2941349

    The probability that at least 2 people in the room have the same birthday is (1 – P) = 0.7058651, that is, a little over 70%.

    Now, the result if we ignore February 29th, as is done in the video, that is if all years had 365 days, is
    [probability that no two people have the same birthday] = F0 = (365!/335!)/(365^30) = 0.29368375728, and
    [probability that at least two people have the same birthday] = 1 – 0.29368375728 = 0.7063162

    Thus we see that, taking leapyears into account increases the calculated probability that no two people have the same birthday by a factor of [(365/365.25)^30][229/224] = 1.0015362190, and decreases the calculated probability that at least two people have the same birthday by a factor of 0.9993612 (= 1/1.00063916). That is not a very significant difference, but it certainly was interesting to derive!

    Of course, if we wanted to be more accurate yet, we could take into account that years that are multiples of 100, but not of 400, are not leapyears; for example, 1700, 1800, 1900 and 2100 are not leapyears, but 1600 and 2000 are leapyears. As long as no one in the room was born before 1901 or after 2099, we can ignore that fact.
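
The leap-day derivation in this comment can be checked numerically. The sketch below simply follows the comment's own definitions of L, P0, P1, F0 and F1 (one leap day every four years, every other day equally likely); the function name is hypothetical.

```python
import math

def p_distinct_with_leap_day(n=30):
    """Probability that n people have distinct birthdays when Feb 29 is
    allowed, following the G0/G1 split described in the comment."""
    L = 1 / (1 + 365 * 4)              # chance of being born on Feb 29
    p0 = (1 - L) ** n                  # nobody born on Feb 29
    p1 = n * L * (1 - L) ** (n - 1)    # exactly one person born on Feb 29
    f0 = math.perm(365, n) / 365 ** n              # (365!/335!)/365^30 for n = 30
    f1 = math.perm(365, n - 1) / 365 ** (n - 1)    # (365!/336!)/365^29 for n = 30
    return f0 * p0 + f1 * p1

p = p_distinct_with_leap_day()
print(f"{p:.7f}")      # about 0.2941349
print(f"{1 - p:.7f}")  # about 0.7058651
```

Cases with two or more people born on Feb 29 contribute nothing, since they already share a birthday; this is why only the two terms appear.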

  41. For 152 people and up I am getting 100% as the answer. Can this be due to a lack of precision?
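
Limited precision is one plausible explanation: the true probability stays strictly below 1 until there are more than 365 people, but it gets so close to 1 that a calculator or a double-precision float (about 16 significant digits) may display or store it as exactly 1. A quick way to see this is to redo the calculation with exact rational arithmetic. This sketch uses Python's `fractions` module; the function name is made up for the example, and the usual 365-equally-likely-days assumption applies.

```python
from fractions import Fraction
import math

def exact_prob_distinct(n, days=365):
    """Exact probability (as a fraction) that n people all have
    different birthdays."""
    return Fraction(math.perm(days, n), days ** n)

p_none = exact_prob_distinct(152)
p_shared = 1 - p_none

print(p_none > 0)     # True: distinct birthdays are still possible
print(p_shared < 1)   # True: the exact answer is not quite 100%
print(float(p_none))  # a tiny positive number, far below typical display precision
```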

  42. First birthday can come in 365 ways – 2nd in 364 – 3rd in 363 and so on…..but these 30 folks can again change their positions among themselves – why is that not considered in the solution? I think the numerator is simply selecting 30 out of 365 which is nCr and not nPr. What do you think?

  43. Did anyone notice that he did it wrong?! You’re supposed to subtract 29 instead of 30, so it would be 336! instead of 335! in the denominator

  44. But what if you take 365 or more people? There is still a chance that no one shares a birthday, according to your solution. But if we take 366 people, they must share a birthday!

  45. Hello. The paradox of birthday is very well known but  refers to at least two people.
    Then, what is the probability if at least 3 people or n people have the same birthday? Thank you.

  46. Hello! Can you give me direction how to solve this task: The probability that at least 2 people in a room of 20 share the same MONTH of birthday?
