# Lecture 3: Birthday Problem, Properties of Probability | Statistics 110

So I wanted to do one of the most famous
problems in probability to start with, and then we’ll go back into
the non-naive definition. At the very end of last time,
we started the non-naive definition, so we’ll get back to that in a little bit. But first, this is a problem that
everyone who studies probability should be familiar with,
that’s called the Birthday Problem. It’s a very natural problem,
some of you have seen it, I’m sure. But it’s very, very surprising to most people the first
time that they see this result. Even if you’ve seen it before, I wanted to talk more about how
to think about the problem. So the Birthday Problem
is very simple to state. The problem is just, you have a group
of people at a party, or something. And you wanna know how likely is it
that two people have the same birthday. Maybe three people will
have the same birthday, maybe there are several pairs
that have the same birthday. But how likely is it that you can
find at least one pair of people who share the same birthday? Because, of course, if it’s a large party,
then no one would be surprised. But if it’s a small number of people, most people would think that
that’s a pretty big coincidence. All right, okay, so we wanna know,
for example, we might want to ask. How many people do you need in order to
have at least a 50/50 chance that two people will have the same birthday? We want to answer questions like that,
okay? So that’s called the Birthday Problem. So just to state it in general,
let’s say we have K people, and we wanna find the probability
that two have the same birthday. Not a specific two, of course, but in that group that you can find
two with the same birthday. Well, we need to make
a few assumptions about birthdays before we can
actually solve this. Was anyone here in this room
born on a February 29th? Okay, you were? Yeah, okay, well, will you be
offended if I exclude February 29th?
>>[LAUGH]
>>Okay, so we have permission to exclude
February 29th, it’s a hassle. We can do this problem dealing with
leap years if we really want, but it’s a hassle. February 29th is less likely
than any other day, so I don’t want to treat it
as 366 days in a year. But, we want to use 365, it’s not
gonna make that much of a difference, it’s pretty rare,
we have a large enough class. At this point, you know how to compute the
probability that at least one person in this room has February 29th. Now we know the probability is one,
but before that, we didn’t know. You can compute that probability if
you know how many people are here, it’s a large class. Okay, well, anyway, but
I got permission to exclude February 29th, so we’re gonna assume
there’s 365 days in a year. Just for simplicity, and
we’re gonna assume that the other 365 days are equally likely. So, I mean,
that seems like a plausible assumption, it sounds plausible to assume
the other 365 days are equally likely. That’s an empirical question,
on a mathematical question, is that true or not? If you look at data on this,
it’s actually not exactly true. There are seasonal effects,
which kind of strangely differ, and different countries have
different seasonal effects. There are more babies born nine
months after a holiday, and you can try to figure out why. Not a huge difference, but
there are some small differences. But anyway, we’re gonna assume
they’re equally likely, okay? And we’re also gonna assume independence,
I’m gonna say independence of births. Later we’ll talk more about the formal
definition of independence, but right now I just mean that
in the intuitive sense. Like, for example,
if we knew that in this group of K people, if we knew that we had twins,
then that’s going to change things. But we assume that everyone, one person’s birthday has no
effect on anyone else’s birthday. So using the word independent
kind of in an intuitive way here. So those are our assumptions, and
now we want to find the probability. Well, a lot of the challenge of this
course is recognizing pattern and structure, right? And this problem sounds like
something we haven’t done, but it should start reminding you of
some of the homework stuff, right? The courses, the robberies, this is
pretty similar to the robberies problem. Just a different application, different
setting, but it has a similar structure. So, therefore, you’re all kind of
experts at this kind of problem now, and I can do it pretty quickly. Okay, so, first of all, if K is greater than 365, then the probability is 1. To visualize that, we can think of this as
like a balls and boxes kind of thing, but, remember, I was talking
last time about labeling? We should think of these
people as labeled from 1 to K, we’re not going to treat people as
indistinguishable particles, right? They’re people,
they have individuality, okay? So if I want to draw a picture, I’m gonna
draw 365 boxes, but I’ll put a dot, dot, dot, so
I don’t actually have to draw 365 boxes. This is the January 1st box, or
bucket, or whatever you wanna call it, this is the January 2nd one,
this is the December 31st. So imagine I have these 365 boxes,
and then each person, I’m assuming that the people are labeled. But I’m just gonna put dots for now. So this would be the case where three
people are born on January 1st. One on January 2nd, two on December 31st,
and however many are in between there. So we’d imagine a situation like this,
okay. Now if you have more than 365 people but
you only have 365 boxes, then clearly there must be a box
with more than one dot in it, right? Does that seem like intuitively obvious? There’s more people than
boxes in that case. That’s called the pigeonhole principle
in math, it’s a very simple fact, right? If you have more dots than boxes then you
have to have a box with multiple dots, but it’s a useful principle,
simple but useful. And okay, so that’s the answer
when K is bigger than 365. And I also wanted to mention, by the way, that this kind of problem is
extremely important in computer science. Because often you have problems
where you have to have clever ways of storing information in
the different data structures. And problems happen when two things
try to store them in the same way, that’s called a collision. And you wanna know what’s the probability
they are gonna have collisions like that. So it’s a fun problem to talk about, but
it also has a lot of applications in CS and engineering and elsewhere. Okay, well, that was an easy
problem if K is bigger than 365, but now let’s do K less than or equal to 365. Now I’ve asked a lot of people,
over the years, I’ve asked a lot of people who have
never seen this problem to just guess. Just to guess intuitively, how many
people would you need in order to have a 50/50 chance that there
is a birthday match? A typical guess is like, 150 or 180, maybe take 365 divided by two,
definitely well over 100. Everyone I’ve ever asked this question
to, if they hadn’t seen it before, says something over 100. Okay, so
that’s the intuition of almost everyone on this before they’ve studied it. The answer is 23. So that says if you have 23 people,
here we have hundreds in this room. Just take a couple rows here. I’m not gonna spend 20 minutes
surveying all your birthdays. But if we just took a few rows here, it’s extremely likely that there’s
gonna be a birthday match. So we want to just see mathematically, first of all we want to see
mathematically, how do you compute that? And then secondly, what’s the intuition? How can it be 23 when there
are 365 days in a year? But if you have 23 people,
which is a small fraction of 365, why is it already 50/50? It’s slightly over 50/50, I think it’s
like 50.7% chance if you have 23 people. Okay, so
let’s find the probability of a match. Now, it’s a little bit easier to
work with the complement first, so let’s find the probability of no match. Then we can always do 1 minus that to get
the probability that there’s at least one match. So that’s often a useful strategy. You want to think about, is it easier
to find the probability of the event or the complement of the event? In this case,
it’s easier to do the complement. Now this is really similar to homework,
so I’ll just do this quickly. We’re just gonna use the naive
definition of probability that’s justified because I
assumed equally likely, okay? So the denominator, we immediately
know the denominator is 365 to the k. It’s just immediate from
the multiplication rule. Numerator is just, imagine the people coming in, assume that
we’ve given each person an ID number, and imagine that they’re coming to the party
one by one, in order of their ID number. That’s the easiest way to think about it. The first person could have
any birthday, that is, 365. Second person could have any birthday
that’s not the same as the first person, so that’s 364. Multiplication rule again. So this is very, very straightforward
from what we’ve done already. The next person could be any
birthday except the first two. So 363 and so on, so you just multiply. And then the only thing you have to be
careful about is to make sure we have the correct number of terms here. If k equals 1,
the probability of no match should be one, because there’s only one person,
so there’s no one to match with. So this is going to be 365 - k + 1. It would be really messed
up if we forgot the + 1. One of the most common
mistakes in programming is to be off by one, by not being careful
about the number of terms. So I wanna have k terms here,
because there are k people. So the last term is gonna be 365 - k + 1. Okay, so the probability
of a match is 1 minus that. So we’re already done with the problem. So this is easy. I did it quickly because you’ve
done similar stuff at this point. Well, there are ways
to approximate this in terms of an exponential, and we’ll talk about some of
the approximations in a later lecture. But for now, this is something
obviously you wouldn’t do by hand, but something you could do very easily
on a computer or a calculator. And if you do the computation, here’s what you
get for the probability of a match in a few cases. Well, as I mentioned, it’s a little above 50%. So this is 50.7% if k=23. So that’s the first case
where it exceeds 50%. Now, let’s just look at
a couple other cases. If k = 50, say, so
a little more than double 23, obviously it’s gonna be more likely. The question now is how much more likely? Well, if you do the computation,
you’ll find that with 50 people, there’s a 97% chance. So it goes way up. Now 50 people, again, most people’s intuition was that with 150
people it’s still not even gonna be 50/50. This says that with only 50,
not 150, it’s 97% likely. And suppose that we had 100 people. Well, 100 is a fairly large number, but
it’s still less than a third of 365. It’s not an enormous number. With 100 people, it’s 99.999 something, I’ll just say, greater than 99.999%. I think it’s 99.9994, but maybe there’s
another 0 in there or something. It’s at least 99.999% likely,
with 100 people. Okay, so
that’s the result of the birthday problem. But, mathematically, this is basically
something that’s hard to argue with. If anyone wants to argue, feel free,
but your argument will be futile. This is-
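
The figures quoted above can be checked with a few lines of code. This is a minimal sketch in Python, assuming 365 equally likely birthdays and independence as stated earlier; the function name `p_match` is made up for this illustration. It multiplies the k factors 365/365, 364/365, ..., (365 - k + 1)/365 to get the probability of no match, then subtracts that from 1.

```python
def p_match(k, days=365):
    """Probability that at least two of k people share a birthday.

    Assumes `days` equally likely birthdays and independent people.
    """
    if k > days:
        return 1.0  # pigeonhole principle: more people than days
    p_no_match = 1.0
    # k factors: days/days, (days-1)/days, ..., (days-k+1)/days
    for i in range(k):
        p_no_match *= (days - i) / days
    return 1.0 - p_no_match


for k in (23, 50, 100):
    print(f"k = {k:3d}: P(match) = {p_match(k):.7f}")

# Smallest group size with better than a 50/50 chance of a match
first_k = next(k for k in range(1, 366) if p_match(k) > 0.5)
print("first k with P(match) > 1/2:", first_k)
```

Under these assumptions the search reports 23 as the first group size where a match is more likely than not, in line with the 50.7% quoted above.
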
>>[LAUGH]
>>This is unobjectionable, but it’s not that intuitive yet. This is just correct, and
you’ve done similar problems now, but how could it be, with 100
people, 99.999% likely? Well, how could that be? So I want to talk a little
more about the intuition. The basic intuition is that
looking at K is intuitive. It’s just like how many people. That’s actually not the most
relevant quantity here. The more relevant quantity is not K,
but K choose 2. There are K people, but
there are K choose 2 pairs of people. So that’s the more relevant
thing to be looking at. K choose 2 is K(K - 1)/2. And, for example, let’s do 23 choose 2. So we said that it’s just about
50% if you have 23 people. 23 choose 2 is 23 times 22 divided by 2, which is
23 times 11. Does everyone know how to
easily multiply by 11? Add the 2 and the 3 and
then put it in the middle: 253. Okay, so with 23 people,
there are 253 pairs of people and now it’s sounding a little more plausible,
I think. 23 is a fairly small number but 253,
that’s a lot of possible pairs. Any one of those pairs,
they may have the same birthday, right? So now this is getting
closer to the order of 365, where it seems more
plausible at that point, so. We’ll come back to, later in the course,
some more complicated versions of this. So this one is straightforward
at this point. But suppose we asked another question,
because we’re talking about coincidences and if two people have
birthdays that are one day apart, they also might find that
a bit of a coincidence. It’s not as surprising as
having the same birthday but it’s like,
my birthday is the day after yours. That’s a little bit surprising. So, if we asked the same question,
where instead of saying they have to have the same birthday,
it’s the same or off by one. Then the 23 reduces to 14. And that again surprises most people,
I think. With 14 people,
there’s a better than 50/50 chance that two of them have the same birthday or
just one day apart. Now if you wanna prove that,
that result about the 14, it’s much more complicated than this. You can try to do this, and say, well,
this should be 362 instead of 364, or something, cuz I’m avoiding that day and
the day before and the day after, so I guess you’d try to subtract 3
from that but then, it doesn’t work. Because you’re like filling in the year
and you kind of have these gaps. You can’t just multiply all this.
That’s a much harder problem, and later in the course, we’ll come back to ways to approximate
more complicated problems like that. So the basic idea here is coincidences,
there’s a saying that the biggest coincidence of all would
be if there were no coincidences, right. If you think of all the possible
coincidences that there could ever be in the world. There’s a mind-boggling number
of possible coincidences, so some of them are gonna happen. And what this is saying is,
it’s only 23 people, that’s 253 coincidences that
could have occurred, right? So it’s not at all surprising, all right,
so that’s the birthday problem. And I also wanted to mention,
there’s a very nice applet. There’s a link on the course website
with the Probability by Surprise applets developed at Stanford, that you may
want to play around with later. It kind of helps to build intuition, I
think, by running a little simulation, or you can easily find birthday
problem applets online. Try it for yourself,
run some little simulations, and just see how the matches are coming. Cuz it still seems, only 23 people,
but you can start to build some intuition that way, but
that’s one way to think of it, all right? So that’s the birthday problem, so now let’s come back to the non-naive
definition of probability. And I’ll just remind you, cuz we only did
it quickly, and it’s easy to state again. We only need two axioms, one, so we’re defining this, remember,
we have the sample space. And we have a function P, P stands for
probabilities, pretty good notation, okay? And there are only two axioms, one is that
probability of the empty set is 0, and the probability of the full
sample space equals 1. And secondly,
this says that the empty set is an event. If you’re not yet
clear on how to think of events as sets, then you really wanna master
that as soon as possible. The empty set is an event, but it’s an event that can never occur,
we give it probability 0. The full sample space is an event
that always occurs, it’s guaranteed. So, just by convention, we say we want for
impossible events to have probability 0. And events that are certain, we want to
have probability 1, by convention, okay? And then there’s only one other axiom, which is the fact that
the probability of a union, This could be a finite union where I go
from n=1 up to m, or up to infinity. Either way, I’m just going to write it
this way because it’s more general. But you should know that this also
applies in the finite union case. Equals the sum of P of An,
n=1 to infinity. And then there’s a very, very important condition: this
holds if A1, A2, and so on are disjoint events,
that is, they don’t overlap. So that’s the key condition and
this should be pretty intuitive. If you think about a Venn diagram,
this is our sample space S. And I’m not going to try
to draw infinitely many, but let’s just draw three events. So here we have A1, A2, A3,
just three blobs or three ovals. As a first intuition of probability,
we can think of probability as area. Later, we’ll see more sophisticated ways
to think about it, but that’s fine for now to just think of it as area. So suppose that this rectangle, it’s
a little too curved to be a rectangle, but it doesn’t matter. Suppose that the area of this region is 1,
think of probability as area. Then all it says is that
the union is this, this and this. The area is this area plus this area,
plus this area, that’s it. So that’s what the whole thing is saying,
and it says we also wanna extend that, too, if we have infinitely many of them,
but it’s the same idea, okay? So that’s the axiom, these are the axioms,
I’ll write the word axioms here. These are just the two
conditions that P has to satisfy, and this was a big breakthrough. I mentioned before that a big breakthrough
in probability was to think of it in terms of sets and events and unions and
intersections, and things like that. And the other big breakthrough
mathematically speaking was just writing down these two axioms and exploring that. Kolmogorov, who was
a famous mathematician, is one of the people most responsible for this. Because before we had these axioms,
it was harder to say what’s a correct argument in probability,
what’s an incorrect argument. There were a lot of philosophical debates
about, what is the meaning of probability? And those debates continue to this day, those debates in philosophy have
continued for hundreds of years. And I think, in statistics,
it’s important to think about these foundational issues, right,
what does probability really mean? Okay, in the real world it’s important,
but that’s not the main subject for
this course. And it didn’t lead to as much
progress as just having these rules. Because basically,
from a mathematical point of view, as long as you have a sample space, and
you have a function P satisfying this and this. And you know P is a function
that takes events as input and returns a number between 0 and
1, satisfies this and this. As long as that’s true,
then we consider P a probability, and every theorem about probability
that we do then is applicable, and we don’t have to worry
what does it really mean. Okay, there are different interpretations
of probability that people can debate and people can use. But as long as the interpretation
satisfies this and this, then we’ll be okay. So, just from these two simple rules, we can derive every theorem that’s
ever been derived about probability. Okay, so
these should also be very intuitive, so I’ll just call this properties. Some properties of probability that we
can develop quickly using these axioms, okay, so
our first one is very very simple. The probability of A complement
equals 1 minus the probability of A. Okay, so we’ve already used this
fact before because this fact is something you can see immediately in
the case of the naive definition. Because we are looking at favorable and
unfavorable. So the total number of outcomes
equals number of favorable, plus number of unfavorable. This is always true,
whether using the naive definition or not. In the picture, it should be intuitively
clear because, let A1 equal A. A complement is everything that’s
outside of A1, or outside of A, okay? If the total area is 1, then if
this is 0.3, outside has to be 0.7. So that’s intuitively obvious, once you look at the Venn diagram,
but how do we prove it? Well, the proof is very short as well, we know that 1 equals
the probability of the whole space. That’s axiom one, okay, we also know that we can write
S as A union A complement. S is the same thing as A union A complement, cuz
that’s everything that’s inside A, that’s everything that’s outside A. Put those together,
that’s everything, okay. Now, P of A union A complement is P
of A plus P of A complement, because A and A complement are disjoint. In set notation, we’d write that as just
saying that their intersection is empty. There’s nothing that could be in A and
its complement by definition. So, they’re disjoint,
so I’m using axiom two. That equals that plus that, and that’s
the same thing we were trying to show. So the proof is just immediate: use
axiom one, use axiom two and we’re done. All right, so that’s property one,
I’m calling it, no one else calls it property one,
but property one for today. Property two, another useful fact: if A is contained in B, so A and B are events but
this event is inside this event. In words, we would say that what
this relationship means is that if A occurs, then B occurs, okay? So this is a larger event. Then, P of A is less than or
equal to P of B. Again, this is obvious, intuitively in
terms of the Venn diagram because B is like a bigger oval
containing the smaller one. It has a bigger area. That’s obvious from the picture. But that’s not a formal proof and
we want to prove this fact. Okay so let’s just quickly
prove this using the axioms. So the proof would go, I’m still
going to draw a little picture for some intuition: here’s A and here’s B. B is this bigger thing
that contains A, okay? Now, I’m going to think of B
as being split into two parts. There’s A, and then there’s a ring of
stuff that’s in B, that’s not in A. Okay?
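
The split just described can be checked on a tiny example. This is only a sketch in Python under the naive definition (equally likely outcomes); the sample space `S`, the events `A` and `B`, and the helper `prob` are all made up for illustration.

```python
from fractions import Fraction

# Hypothetical sample space: 12 equally likely outcomes
S = set(range(1, 13))
A = {1, 2, 3}
B = {1, 2, 3, 4, 5, 6, 7}   # A is contained in B


def prob(event):
    """Naive definition: number of favorable outcomes over total outcomes."""
    return Fraction(len(event), len(S))


ring = B - A                 # the stuff that's in B but not in A

print("A and ring disjoint:", A.isdisjoint(ring))
print("A union ring equals B:", (A | ring) == B)
print("P(B) =", prob(B), " P(A) + P(ring) =", prob(A) + prob(ring))
```
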
So, I’m going to decompose B into two parts. So I’m gonna think of B as A union,
union with this ring. This ring is the stuff that’s
in B that’s not in A, so I’d write that as B
intersect A complement. The stuff that’s in B that’s not in A,
okay? And this is disjoint. That is A is disjoint from this, that’s just saying that A is
disjoint from this ring. So these are disjoint, so
therefore we know that P(B) = P(A) + P(B intersect A complement). Just immediate from axiom two, P of B is
P of A plus P of B intersect A complement. And probability, by definition,
has to be nonnegative, so you’re adding something here, adding something
nonnegative, so this is at least P of A. So that’s the end of the proof. So it’s just almost immediate
from axiom two. Question?
>>Where did you find
the fact that [INAUDIBLE]?
>>Probability’s not
supposed to be negative. Well, I didn’t write that,
I wrote that yesterday. P is between,
you could call that an axiom if you want. In that case, call this axiom zero. I stated that last time. I didn’t write it again this time. The reason I didn’t write it as one of the
axioms is that if you’re writing it out in full you would say, P is a function
defined on events and taking values between zero and
one, such that this and this. But if you wanna consider that an axiom,
you can. It’s assumed that probability is always
between zero and one, by assumption. Okay, so those are two simple properties. Let’s do one that’s a little
more complicated, and also, very very very useful. And that’s, the question is,
how do we get the probability of a union? Sorry, the boards are pretty squeaky. So this is property three: how do
we get the probability of a union? Now we know that if A and B are disjoint,
we can just do P(A) + P(B), okay. But if you think about the Venn
diagram again, here’s B, and here’s the intersection. If we want the area, what we should do is
add the area of A plus the area of B. But then we double-counted the stuff
in the intersection, so we subtract that off again. So, we’re gonna prove that that is in
fact the correct way to do it, whether or not the Venn diagram picture
is applicable or not, that suggests the intuition,
and let’s prove the result. And then I’ll do an example
of how you use this. We’ll be using this kind of thing a lot. Because we often want the probability
of a union and we’re not necessarily so lucky as to have disjoint sets. So we wanna know what happens
if they’re not disjoint. Okay, and the result is that we
can add up the probabilities, but we’ve double-counted the intersection, so
we subtract the probability of the intersection, okay? So let’s prove that. Well again, let’s think of it,
kind of the strategy here, you know when you’re confronted
with a problem like this and you don’t know how to prove
it, a strategy to use would be: we know we have to
somehow apply these axioms. But this,
axiom two is the most important one, but this only applies
when they’re disjoint. So somehow we have to make
things disjoint, right? So that’s the strategy. So I’m gonna write P(A union B) in
a different way. Right here I can’t apply axiom two, because they’re not disjoint,
so I’m gonna write it in a disjoint way. I call this disjointification. And I think I coined that word,
but I don’t know. So we’re going to do disjointification. And I’ll write the word, and I’ll probably
regret this because it’s a long word. I’ll try to write it fast. So we’re gonna do disjointification. We’re gonna take a union b and
we’re gonna try to write it disjointly. So how do we do that? Well, we take A and then we take everything
in B that we did not already have in A. So it’s A union
(B intersect A complement). All right, so that looks similar to that. The only difference is that there we were
assuming that A is contained in B, and here this is completely general, A and
B are just any two events, okay? So that just says we take A, and
then everything in B, we need that, but we don’t need the stuff
that we already had from A. So we’re excluding it. All right, now this and this are
disjoint, so we can immediately apply axiom two. And we get P(A) + P(B intersect A complement). Now it’s still not yet
clear: why is that the same as this? So here we use another proof strategy,
that I call wishful thinking. So-
>>[LAUGH]
>>I wish. I’m gonna put a question mark here. It is very, very bad practice to just
start writing things you wish as equalities without justification, but
it’s okay if you put a question mark. Okay?
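
The two expressions for P(A union B) in play here, the disjoint one P(A) + P(B intersect A complement) and the claimed result P(A) + P(B) - P(A intersect B), can be compared on a small example before the proof is finished. This is a minimal sketch in Python under the naive definition; `S`, `A`, `B`, and `prob` are hypothetical names chosen for illustration, and agreement on one example is a sanity check, not a proof.

```python
from fractions import Fraction

# Hypothetical sample space: 20 equally likely outcomes
S = set(range(1, 21))
A = {n for n in S if n % 2 == 0}   # even numbers
B = {n for n in S if n % 3 == 0}   # multiples of 3 (overlaps A)


def prob(event):
    """Naive definition: number of favorable outcomes over total outcomes."""
    return Fraction(len(event), len(S))


b_not_a = B - A                     # B intersect A-complement

# Disjoint way of writing the union: A, plus whatever B adds beyond A
disjoint_form = prob(A) + prob(b_not_a)
# Claimed result: add both, then subtract the double-counted intersection
claimed_form = prob(A) + prob(B) - prob(A & B)

print("P(A union B) directly:", prob(A | B))
print("disjoint form:        ", disjoint_form)
print("claimed form:         ", claimed_form)
```
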
So I’m gonna put a question mark here. P of A plus P of B minus
P of A intersect B. I wish this were true. Now I know this is true, but
I’m pretending I don’t yet know this and I want to see whether it’s true. So that’s my wish. Now I’m going to try to compare this and
this. Well, let’s see, P of A is the same
as P of A, that’s okay. This will be true if and only if this
term is the same as this difference. So this is equivalent to,
I’ll just rewrite that as: this equals this minus this, but it’s going to be easier
to write it without the minus signs. So I’m going to move the P(A
intersect B) over there. So that’s the equivalent of saying that P(A intersect B) + P(B intersect A complement)=P(B). I just said, for this to be true, what we
want is for this to equal this difference. And then I just moved that to
the other side of the equation. Okay, now we just have to check, is this
true, well, yes it is, this is true. It’s immediate by axiom two. This is true because A intersect B and
B intersect A complement, and I’ll write it this way, A intersect B
and A complement intersect B, are disjoint,
because if you had an element in both of these sets, it would be in both
A and A complement, that’s impossible. So they’re disjoint, and their union is B. Because what we did was chopped B up into
two parts, we split B into two cases. We have the part of B that’s in A, and
we have the part of B that’s not in A. So there’s two disjoint cases,
so that’s the end of the proof. So this rule is a simple case of
what’s called inclusion-exclusion. And let’s do a more general version
of inclusion-exclusion, and then I’ll talk a little bit
about why it’s called that. And then we’ll do an example
of inclusion-exclusion, okay? So here’s the general inclusion-exclusion
formula, it looks ugly, but the idea is pretty simple. So I just want to extend that to
the case where we have more than two events, so
let’s do a case with three of them first. P(A union B union C) equals, and
I’ll draw another little diagram. Now we have three events, A, B, and C, draw a linking rings kind of thing. And I want the probability of the union, okay, so we’re gonna start
out just like we did there. By adding the individual probabilities,
and then we’re gonna adjust for double counting, things like that. So we’re gonna do P(A) + P(B) + P(C). But it’s clear just from thinking about
the picture that we’ve added too much, right, that we’ve over-counted stuff. So what we have to do is
subtract off the intersections. So we’re gonna subtract off A intersect B,
and we’re gonna subtract off A intersect C,
and we’re gonna subtract off B intersect C. Okay, well, now we’ve subtracted too much. If you think of this,
the triple intersection, any stuff in the triple intersection,
we added it three times, then we subtracted it three times,
now we haven’t counted it at all. Everything that’s not in the triple
intersection is correctly accounted for. Stuff that’s in the triple intersection,
we have not counted yet. So we fix that just by adding this back,
A intersect B intersect C. Okay, so that’s inclusion-exclusion, and you can prove this in
a similar way to that. It’s more tedious, and
I’m not gonna write out the whole thing. There’s also some more clever ways to do
it, that we might come to another time. It’s completely analogous,
just a more tedious version of that, and in general, you can use induction. And I’m not gonna write
out the whole induction, because the idea is exactly
the same as this idea. The general case,
when we have the union of n events, here’s the general form of inclusion-exclusion.
If you remember this, then it’s really easy to write this down,
it looks ugly, but it’s the same idea. First we add up the probabilities of the individual events,
then we subtract the intersections. So I’ll just write it this way, subtract the sum over
i less than j of P(Ai intersect Aj). I wrote i less than j because then
we don’t have both A1 intersect A2 and
A2 intersect A1, we only want to have that once. And then you add the triples, so it’s alternating,
just like this is plus minus plus. We then have triple intersections,
let’s say i less than j less than k, P(Ai intersect Aj intersect Ak). Minus, dot dot dot, and then the last
term is the intersection of everything, and so it’s either gonna be a plus or
a minus. And I think it should be (-1) to the n+1, because if we think about the case when
n is 2, then we stop by subtracting. And in the case n equals 3,
we stop by adding. So it’s (-1) to the n+1, times
the probability of the intersection of everything. Okay, so hopefully you can see kinda what
the intuition is for this, and why is it called inclusion-exclusion. Because, just think of this case with
three of them, we started by including all of these, but then we’ve overcounted,
and so we start excluding stuff. But then have to adjust the other way,
so we start including and excluding. And it keeps alternating, including and
excluding, and it all works out. Okay, here, this one does not have
a summation because there’s all n of them. So this is all of them,
there’s only one of these, the rest of them will have summations. Okay, so let me do one famous
interesting example of inclusion-exclusion, how you
can apply this to a problem. This is a very old problem,
it goes back to, what’s the guy’s name,
de Montmort’s problem. Which was from 17 something, I think 1713, I wrote it down somewhere, 1713, yeah. And this goes by different
names in the literature, sometimes this is called
the matching problem. Usually I’ll just call it
the matching problem, or I’ll call it de Montmort’s problem. I don’t actually care if you
know the name of the problem, I care that you know the concept. It goes by other names because it’s
a pretty natural problem that comes up in different contexts. There are many different ways
we could phrase this problem. Different forms of the same thing
in different disguises, but I’ll tell you the original one. Not surprisingly, this originated
from considering a gambling game. de Montmort was interested in a lot
of different gambling problems. In 1713,
probability was still in its infancy, and he was studying some of
these gambling problems. Which had real practical consequences for
gamblers, there’s lot of money at stake in this,
that was the motivation. So here’s the problem, so
consider the following card game. We have a deck of cards, now, rather
than thinking like ace of spades and 7 of clubs, it’s simpler to just assume
that the cards are labeled 1 through n, so imagine that. You have a deck of n cards, and each card
just has a number from 1 to n on it, okay? One number per card, so
we have n cards labeled 1 through n. Here’s the problem, so
I’ll describe the game. And I think they actually
played this game back in the 1700s. Shuffle the cards, okay. Now basically I’m gonna flip
over cards one at a time, okay, and
you’re gonna count from one to n, okay. So I’ll flip over
the first card, you say one, I flip over the second card, you say two,
flip over the third card, you say three. We’ll continue that way. Now if it ever happens that
I flipped over a number and you said that same number, you win. Otherwise, you lose. That’s the game. That’s why I call it matching,
because it’s like, you win if for example the seventh card
is card number seven. That is, the card that’s seventh in the
deck has the number seven written on it. Okay, does everyone
understand what the game is? So I want to know the probability
that that will happen, that there will be at least one
card whose position in the deck is the same as the number
written on the card. So I’m not going to write out that
whole description just now cause hopefully you’ll understand
what the game is now. So let’s find the probability. Well, to find the probability,
we need to define some events. Right? By the way you can try, there’s more
than one way to solve this problem. And you can try to do direct methods, but it’s pretty hard to find a way to do
this without inclusion-exclusion. Not impossible. But inclusion-exclusion is the easiest
way to solve this, I think. And it’s also a good example of it. So, we need to define our events. So, let’s say,
let Aj be the event that, I’ll just say it in words,
the event is that the jth card matches. In other words, the jth card in
the deck is the card labeled j. Okay. So what we are interested in is
the probability of the union, P(A1 union A2 union ... union An). That is just a mathematical expression for the probability that at
least one card matches. That’s the probability that you would win the
game by having at least one match, okay. So it’s pretty hard to do directly, but
inclusion-exclusion, you know, it looks kinda ugly. So I usually try to find,
when I’m trying to solve, I usually try to avoid
inclusion-exclusion at first. But if I don’t think of another method
then I’m going to try this because it’s just a very powerful general technique for
finding the probability of a union. Well, we need to do
inclusion-exclusion, right? So let’s just start writing this out. Luckily, in this problem,
we have a lot of symmetry. And we should take advantage
of symmetry whenever we can. So initially,
I would have to write out this sum. But actually, the probability of Aj, well, let’s just do some
kind of scratch calculations. Do some scratch work over here. P(Aj) is the probability
that the jth card in the deck is card j. Now you can do this using
the naive definition of probability. There are
two ways to think of this. One way would be to have n factorial
as the denominator because we’re using the naive definition, we’re assuming all permutations
of the deck are equally likely. So that would be fine, but an easier way to think of it
is just that it’s one over n. It’s one over n because all
positions are equally likely. All positions are equally likely for
the jth card. I’ll say for the card labeled j. Right, so imagine I have 52 cards and
I’m looking for the ace of spades. The ace of spades is equally
likely to be anywhere, so there’s a one in 52 chance
that it’s in any specific position. So, that’s immediately one over n. You can also derive this using the naive
definition with an n factorial: fix card j in position j, the other n minus one cards can be in any order, so it’s n minus one factorial over n
factorial, you get the same thing. Notice this does not depend on j, okay? That’s what saves us here. If we had like some complicated
thing involving j and n here then we’re gonna
have to do a summation. But there’s no j here, so that means
we’re just gonna multiply this by n. All right,
now let’s do P of A1 intersect A2. So I’m using symmetry. I could have written Ai intersect Aj where
i is not equal to j, but I may as well for concreteness pick one and two. Let’s just think about what this is. Here I will use the naive definition. There are n factorial possible
permutations of the deck of cards. Now what does this event say?
It says that the card on top of the deck has a one on it, and
the card that is second has a two on it. The other n minus two cards can be
in any order whatsoever, so for the rest of them it’s n
minus two factorial. So that’s (n- 2)! over n!. And we could either leave it this way,
or we can cancel some stuff out and write this as one over n(n- 1), because everything else cancels, okay? And continuing in this way, if
we want P(A1 intersect blah, blah, blah, intersect, let’s say, Ak), well, what that says is that the first
k cards are exactly one up to k, in order. The remaining (n- k) cards can
be in any order whatsoever. So the denominator is still n factorial,
the numerator is n minus k factorial. All right? So that looks kind of messy, but
the nice thing is that we have symmetry, so this will work whatever
choices you put there, okay? So now let’s just quickly apply
inclusion-exclusion. We start by adding up these ones,
and there’s n of them, so it’s n times one over n, which is one. Now, we do the next ones. How many of these are there? Well, there’s n choose two of these terms. Except we’re subtracting now,
because we alternate. And there’s n choose two, but I’m gonna
write n choose two as n(n- 1)/ 2, times that thing, one / n(n- 1),
so it’s one-half. And well, let’s just do one more term,
and then you’ll get the idea. Plus n choose three,
which I can also write this way. n choose three is n(n- 1)(n-
2) over three factorial. And I can think of that two from before as
two factorial if I want. Times whatever this thing is. In this case, well,
it’s one over n(n- 1)(n- 2). So you actually see,
something really nice happens. Which is you have n and n- 1. n and n cancel. n- 1 and n- 1 cancel. n, n- 1, n- 2,
everything is basically cancelling here. The only thing that’s left is these
numbers, this minus dot, dot, dot. So what this then works out to,
it’s gonna look like this alternating sum. I’ll just write out what
it is. It’s one minus one over two factorial,
plus one over three factorial, minus one over four factorial plus blah,
blah, blah. This is actually equal at this point, not an approximation. And then the last term
is whatever it is, -1 to the something times one over n factorial. Whatever the pattern is for the last one. Let me make sure I got
the last term right. And so yeah, this is plus negative
one to the n+1, times one over n factorial. I’m just following the same pattern here. Okay.
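For reference, the calculation just described can be written out in symbols. This is a reconstruction from the spoken description, not a copy of the board, and the summation index k is introduced here only for bookkeeping. By symmetry, every k-fold intersection has the same probability as P(A1 intersect ... intersect Ak), so each level of inclusion-exclusion is one binomial coefficient times one probability:

```latex
\begin{align*}
P\Big(\bigcup_{j=1}^{n} A_j\Big)
  &= \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k}\, P(A_1 \cap \cdots \cap A_k) \\
  &= \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \frac{(n-k)!}{n!} \\
  &= \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k!}
   = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!}.
\end{align*}
```

As a quick check on the sign of the last term: n = 2 gives 1 - 1/2, which stops by subtracting, and n = 3 gives 1 - 1/2 + 1/6, which stops by adding.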
This looks messy. That’s the exact answer. I know we’re out of time, but this is the end, because we can
immediately say, this looks familiar. This looks ugly, but
looks familiar, right? This should remind you of
the Taylor series for e to the x, evaluated at x equals -1. So, in fact, for large n this is approximately
one minus one over e. We’re going to see this number
one over e a lot in this class. Seems like this problem has
nothing to do with e, but here e comes up just
’cause we have this sum. Okay, so have a good weekend.
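
The matching-problem answer can be sanity-checked numerically. The sketch below is not from the lecture: the function names `match_probability_exact` and `match_probability_brute_force` are made up for illustration, and the brute-force check assumes the naive definition (all n! orderings of the deck equally likely), so it is only practical for small n. It compares the alternating sum against a direct count of orderings with at least one match, then prints 1 - 1/e for comparison.

```python
from fractions import Fraction
from itertools import permutations
from math import e, factorial


def match_probability_exact(n):
    """Inclusion-exclusion answer: sum over k = 1..n of (-1)^(k+1) / k!."""
    return sum(Fraction((-1) ** (k + 1), factorial(k)) for k in range(1, n + 1))


def match_probability_brute_force(n):
    """Naive definition: count orderings of cards 1..n with at least one match."""
    wins = sum(
        # A match means the card in position j is the card labeled j.
        any(card == position for position, card in enumerate(deck, start=1))
        for deck in permutations(range(1, n + 1))
    )
    return Fraction(wins, factorial(n))


if __name__ == "__main__":
    for n in range(1, 8):
        exact = match_probability_exact(n)
        # The formula and the direct count should agree exactly for every n.
        assert exact == match_probability_brute_force(n)
        print(n, exact, float(exact))
    print("1 - 1/e =", 1 - 1 / e)
```

Exact fractions are used so the comparison between the formula and the count does not depend on floating-point rounding; the conversion to float is only for reading the values against 1 - 1/e.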