So I wanted to do one of the most famous

problems in probability to start with, and then we’ll go back into

the non-naive definition. At the very end of last time,

we started the non-naive definition, so we’ll get back to that in a little bit. But first, this is a problem that

everyone who studies probability should be familiar with,

that’s called the Birthday Problem. It’s a very natural problem,

some of you have seen it, I’m sure. But it’s very, very surprising to most people the first

time that they see this result. Even if you’ve seen it before, I wanted to talk more about how

to think about the problem. So the Birthday Problem

is very simple to state. The problem is just, you have a group

of people at a party, or something. And you wanna know how likely is it

that two people have the same birthday. Maybe three people will

have the same birthday, maybe there are several pairs

that have the same birthday. But how likely is it that you can

find at least one pair of people who share the same birthday? Because, of course, if it’s a large party,

then no one would be surprised. But if it’s a small number of people, most people would think that

that’s a pretty big coincidence. All right, okay, so we wanna know,

for example, we might want to ask: how many people do you need in order to

have at least a 50/50 chance that two people will have the same birthday? We want to answer questions like that,

okay? So that’s called the Birthday Problem. So just to state it in general,

let’s say we have K people, and we wanna find the probability

that two have the same birthday. Not a specific two, of course, but in that group that you can find

two with the same birthday. Well, we need to make

a few assumptions about birthdays before we can

actually solve this. Was anyone here in this room

born on a February 29th? Okay, you were? Yeah, okay, well, will you be

offended if I exclude February 29th?>>[LAUGH]

>>Okay, so we have permission to exclude

February 29th, it’s a hassle. We can do this problem dealing with

leap years if we really want, but it’s a hassle. February 29th is less likely

than any other day, so I don’t want to treat it

as one of 366 equally likely days in a year. So we’ll use 365; it’s not

gonna make that much of a difference, it’s pretty rare,

we have a large enough class. At this point, you know how to compute the

probability that at least one person in this room has February 29th. Now we know the probability is one,

but before that, we didn’t know. You can compute that probability if

you know how many people are here, it’s a large class. Okay, well, anyway, but

I got permission to exclude February 29th, so we’re gonna assume

there’s 365 days in a year. Just for simplicity, and

we’re gonna assume that the other 365 days are equally likely. So, I mean,

that seems like a plausible assumption, it sounds plausible to assume

the other 365 days are equally likely. That’s an empirical question,

not a mathematical question: is that true or not? If you look at data on this,

it’s actually not exactly true. There are seasonal effects,

which kind of strangely differ, and different countries have

different seasonal effects. There are more babies born nine

months after a holiday, and you can try to figure out why. Not a huge difference, but

there are some small differences. But anyway, we’re gonna assume

they’re equally likely, okay? And we’re also gonna assume independence of births. Later we’ll talk more about

the formal definition of independence, but right now I just mean that

in the intuitive sense. Like, for example,

if we knew that this group of K people included twins,

then that’s going to change things. But we assume that one person’s birthday has no

effect on anyone else’s birthday. So I’m using the word independent

kind of in an intuitive way here. So those are our assumptions, and

now we want to find the probability. Well, a lot of the challenge of this

course is recognizing patterns and structure, right? And this problem sounds like

something we haven’t done, but it should start reminding you of

some of the homework stuff, right? The courses, the robberies, this is

pretty similar to the robberies problem. Just a different application, different

setting, but it has a similar structure. So, therefore, you’re all kind of

experts at this kind of problem now, and I can do it pretty quickly. Okay, so, first of all, if K is greater than 365, then the probability is 1. To visualize that, we can think of this as

like a balls and boxes kind of thing, but, remember, I was talking

last time about labeling? We should think of these

people as labeled from 1 to K, we’re not going to treat people as

indistinguishable particles, right? They’re people,

they have individuality, okay? So if I want to draw a picture, I’m gonna

draw 365 boxes, but I’ll put a dot, dot, dot, so

I don’t actually have to draw 365 boxes. This is the January 1st box, or

bucket, or whatever you wanna call it, this is the January 2nd one,

this is the December 31st. So imagine I have these 365 boxes,

and then each person, I’m assuming that the people are labeled. But I’m just gonna put dots for now. So this would be the case where three

people are born on January 1st. One on January 2nd, two on December 31st,

and however many are in between there. So we’d imagine a situation like this,

okay. Now if you have more than 365 people but

you only have 365 boxes, then clearly there must be a box

with more than one dot in it, right? Does that seem intuitively obvious? There are more people than

boxes in that case. That’s called the pigeonhole principle

in math, it’s a very simple fact, right? If you have more dots than boxes then you

have to have a box with multiple dots, but it’s a useful principle,

simple but useful. And okay, so that’s the answer

when K is bigger than 365. And I also wanted to mention, by the way, that this kind of problem is

extremely important in computer science. Because often you have problems

where you have to have clever ways of storing information in

the different data structures. And problems happen when two things

get stored in the same place; that’s called a collision. And you wanna know what’s the probability

of collisions like that. So it’s a fun problem to talk about, but

it also has a lot of application in CS and engineering and elsewhere. Okay, well, that was an easy

problem when K is bigger than 365, but now let’s do K less than or equal to 365. Now over the years,

I’ve asked a lot of people who have

never seen this problem to just guess. Just to guess intuitively, how many

people would you need in order to have a 50/50 chance that there

is a birthday match? A typical guess is like, 150 or 180, maybe take 365 divided by two,

definitely well over 100. Everyone I’ve ever asked this question

to, if they hadn’t seen it before, says something over 100. Okay, so

that’s the intuition of almost everyone on this before they’ve studied it. The answer is 23. So that says if you have 23 people,

here we have hundreds in this room. Just take a couple rows here. I’m not gonna spend 20 minutes

surveying all your birthdays. But if we just took a few rows here, it’s extremely likely that there’s

gonna be a birthday match. So, first of all, we want to see

mathematically, how do you compute that? And then secondly, what’s the intuition? How can it be 23 when there

are 365 days in a year? But if you have 23 people,

which is a small fraction of 365, why is it already 50/50? It’s slightly over 50/50, I think it’s

like 50.7% chance if you have 23 people. Okay, so

let’s find the probability of a match. Now, it’s a little bit easier to

work with the complement first, so let’s find the probability of no match. Then we can always do 1 minus that to get

the probability that there’s at least one match. So that’s often a useful strategy. You want to think about, is it easier

to find the probability of the event or the complement of the event? In this case,

it’s easier to do the complement. Now this is really similar to homework,

so I’ll just do this quickly. We’re just gonna use the naive

definition of probability, which is justified because I

assumed all birthdays are equally likely, okay? So the denominator, we immediately

know the denominator is 365 to the k. It’s just immediate from

the multiplication rule. Numerator is just, imagine the people coming in, assume that

we’ve given each person an ID number, and imagine that they’re coming to the party

one by one, in order of their ID number. That’s the easiest way to think about it. The first person could have

any of the 365 birthdays. The second person could have any birthday

that’s not the same as the first person, so that’s 364. Multiplication rule again. So this is very, very straightforward

from what we’ve done already. The next person could be any

birthday except the first two. So 363 and so on, so you just multiply. And then the only thing you have to be

careful about is to make sure we have the correct number of terms here. If k equals 1,

the probability of no match should be one, because there’s only one person,

so there’s no one to match with. So the last factor is going to be 365 − k + 1. It would be really messed

up if we forget the + 1. Being off by one is one of the most common

mistakes in programming, and it comes from not being careful

about the number of terms. So I wanna have k terms here,

because there are k people. So the probability of no match is 365 times 364 times ... times (365 − k + 1), divided by 365 to the k. Okay, so the probability

of a match is 1 minus that. So we’re already done with the problem. So this is easy. I did it quickly because you’ve

done similar stuff at this point. Well, there are ways

to approximate this in terms of an exponential, and we’ll talk about some of

the approximations in the later lecture. But for now, this is something

obviously you wouldn’t do by hand, but something you could do very easily

on a computer or a calculator. And if you do the computation, here’s what you

get for the probability of a match in a few cases. Well, as I mentioned, it’s a little above 50%: it’s 50.7% if k = 23. So that’s the first case

where it exceeds 50%. Now, let’s just look at

a couple other cases. Say k = 50, so

a little more than double 23; obviously a match is gonna be more likely. The question now is how much more likely? Well, if you do the computation,

you’ll find that with 50 people, there’s a 97% chance. So it goes way up. Again, most people’s intuition was that with 150

people it’s still not even gonna be 50 50. This says that with only 50,

not 150, it’s 97% likely. And suppose that we had 100 people. Well, 100 is a fairly large number, but

it’s still less than a third of 365. It’s not an enormous number. With 100 people, it’s 99.999 something; I’ll just say greater than 99.999%. In fact it works out to about 99.99997%.

It’s at least 99.999% likely,

with 100 people. Okay, so

that’s the result of the birthday problem. But, mathematically, this is basically

something that’s hard to argue with. If anyone wants to argue, feel free,

but your argument will be futile. This is-
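The figures quoted above (50.7% at k = 23, 97% at k = 50, and more than 99.999% at k = 100) can be reproduced by evaluating the product directly. Here is a minimal Python sketch under the same assumptions as the lecture: 365 equally likely days, independent birthdays, February 29th excluded. The function names `p_no_match`, `p_match`, and `smallest_k` are illustrative choices, not from the lecture.

```python
def p_no_match(k: int, days: int = 365) -> float:
    """P(all k people have different birthdays), assuming `days`
    equally likely days and independent birthdays."""
    if k > days:
        return 0.0  # pigeonhole principle: some box must get two dots
    p = 1.0
    for i in range(k):
        # Person number i+1 must avoid the i birthdays already taken.
        p *= (days - i) / days
    return p


def p_match(k: int, days: int = 365) -> float:
    """P(at least two of the k people share a birthday), via the complement."""
    return 1.0 - p_no_match(k, days)


def smallest_k(threshold: float, days: int = 365) -> int:
    """Smallest group size whose match probability reaches `threshold`."""
    return next(k for k in range(1, days + 2) if p_match(k, days) >= threshold)


if __name__ == "__main__":
    for k in (23, 50, 100):
        print(f"k = {k:3d}: P(match) = {p_match(k):.7f}")
    print("smallest k with P(match) >= 0.5:", smallest_k(0.5))
```

Running it should give roughly 0.507, 0.970, and 0.9999997, and report 23 as the smallest group size. Multiplying one factor at a time keeps every intermediate value between 0 and 1, instead of forming 365 to the k as a float, which would overflow for large k.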

>>[LAUGH]>>This is unobjectionable, but it’s not that intuitive yet. This is just correct, and

you’ve done similar problems now, but how could it be that with 100

people it’s 99.999% likely? Well, how could that be? So I want to talk a little

more about the intuition. The basic intuition is that

looking at K feels natural; it’s just how many people. But that’s actually not the most

relevant quantity here. The more relevant quantity is not K,

but K choose 2. There are K people, but

there are K choose 2 pairs of people. So that’s the more relevant

thing to be looking at. K choose 2 is K(K − 1)/2. For example, take 23 choose 2. We said that it’s just about

50% if you have 23 people. 23 choose 2 is 23 times 22 divided by 2,

23 times 11. Does everyone know how to

easily multiply by 11? Add the 2 and the 3 and

then put it in the middle: 253. Okay, so with 23 people,

there are 253 pairs of people and now it’s sounding a little more plausible,

I think. 23 is a fairly small number but 253,

that’s a lot of possible pairs. Any one of those pairs,

they may have the same birthday, right? So now this is getting

closer to the order of 365, where it seems more

plausible at that point. Later in the course, we’ll come back to

some more complicated versions of this. This one is straightforward

at this point. But suppose we asked another question,

because we’re talking about coincidences and if two people have

birthdays that are one day apart, they also might find that

a bit of a coincidence. It’s not as surprising as

having the same birthday but it’s like,

my birthday is the day after yours. That’s a little bit surprising. So, if we asked the same question,

where instead of saying they have to have the same birthday,

it’s the same or off by one. Then the 23 reduces to 14. And that again surprises most people,

I think. With 14 people,

there’s a better than 50/50 chance that two of them have the same birthday or

just one day apart. Now if you wanna prove that,

that result about the 14, it’s much more complicated than this. You can try to do this, and say, well,

the second factor should be 362 instead of 364, or something, cuz I’m avoiding that day and

the day before and the day after, so I guess you’d try to subtract 3

each time, but then it doesn’t work. Because as you’re filling in the year,

you get these gaps, and the excluded days around different birthdays can overlap. You can’t just multiply all this out.

That’s a much harder problem, and later in the course, we’ll come back to ways to approximate

more complicated problems like that. So the basic idea here is coincidences,

there’s a saying that the biggest coincidence of all would

be if there were no coincidences, right. If you think of all the possible

coincidences that there could ever be in the world. There’s a mind-boggling number

of possible coincidences, so some of them are gonna happen. And what this is saying is,

even with only 23 people, there are 253 pairs, so 253 coincidences that

could have occurred, right? So it’s not at all surprising, all right,

so that’s the birthday problem. And I also wanted to mention,

there’s a very nice applet. There’s a link on the course website

with the Probability by Surprise applets developed at Stanford, which you may

want to play around with later. It kind of helps to build intuition, I

think, by running a little simulation, or you can easily find birthday

problem applets online. Try it for yourself,

run some little simulations, and just see how often matches come up. Cuz it still seems surprising with only 23 people,

but you can start to build some intuition that way, but

that’s one way to think of it, all right? So that’s the birthday problem, so now let’s come back to the non-naive

definition of probability. And I’ll just remind you, cuz we only did

it quickly, and it’s easy to state again. We only need two axioms, one, so we’re defining this, remember,

we have the sample space. And we have a function P, P stands for

probabilities, pretty good notation, okay? And there are only two axioms, one is that

probability of the empty set is 0, and the probability of the full

sample space equals 1. Notice that

this says that the empty set is an event. If you’re not yet

clear on how to think of events as sets, then you really wanna master

that as soon as possible. The empty set is an event, but it’s an event that can never occur,

we give it probability 0. The full sample space is an event

that always occurs, it’s guaranteed. So, just by convention, we want

impossible events to have probability 0. And events that are certain, we want to

have probability 1, by convention, okay? And then there’s only one other axiom, which says that

the probability of the union of A1, A2, and so on (this could be a finite union where I go

from n=1 up to m, or up to infinity; either way, I’m just going to write it

with infinity because it’s more general, but you should know that this also

applies in the finite union case) equals the sum of P(An),

n=1 to infinity. In symbols, P(A1 ∪ A2 ∪ ⋯) = P(A1) + P(A2) + ⋯. And there’s a very, very important condition: the axiom

applies when A1, A2, and so on are disjoint events,

that is, they don’t overlap. So that’s the key condition and

this should be pretty intuitive. If you think about a Venn diagram,

this is our sample space s. And I’m not going to try

to draw infinitely many, but let’s just draw three events. So here we have A1, A2, A3,

just three blobs or three ovals. As a first intuition of probability,

we can think of probability as area. Later, we’ll see more sophisticated ways

to think about it, but that’s fine for now to just think of it as area. So suppose that this rectangle, it’s

a little too curved to be a rectangle, but it doesn’t matter. Suppose that the area of this region is 1,

think of probability as area. Then all it says is that

the union is this, this and this. The area is this area plus this area,

plus this area, that’s it. So that’s what the whole thing is saying,

and it says we also wanna extend that to the case where we have infinitely many of them,

but it’s the same idea, okay? So these are the axioms;

I’ll write the word axioms here. These are the two

conditions that P has to satisfy, and this was a big breakthrough. I mentioned before that a big breakthrough

in probability was to think of it in terms of sets and events and unions and

intersections, and things like that. And the other big breakthrough

mathematically speaking was just writing down these two axioms. Kolmogorov,

a famous mathematician, is one of the people most responsible for this. Because before we had these axioms,

it was harder to say what’s a correct argument in probability,

what’s an incorrect argument. There were a lot of philosophical debates

about, what is the meaning of probability? And those debates continue to this day, those debates in philosophy have

continued for hundreds of years. And I think, in statistics,

it’s important to think about these foundational issues, right,

what does probability really mean? That’s important in the real world,

but that’s not the main subject for

this course. And those debates didn’t lead to as much

progress as just having these rules. Because basically,

from a mathematical point of view, as long as you have a sample space, and

you have a function P satisfying this and this, where P is a function

that takes events as input and returns a number between 0 and

1. As long as that’s true,

then we consider P a probability, and every theorem about probability

that we do then is applicable, and we don’t have to worry

what does it really mean. Okay, there are different interpretations

of probability that people can debate and people can use. But as long as the interpretation

satisfies this and this, then we’ll be okay. So, just from these two simple rules, we can derive every theorem that’s

ever been derived about probability. Okay, so

let’s start with something simple: let’s just develop some simple

consequences of these axioms. So let’s start with something that

should also be very intuitive, so I’ll just call this properties. Some properties of probability that we

can develop quickly using these axioms, okay, so

our first one is very very simple. The probability of A complement

equals 1 minus the probability of A. Okay, so we’ve already used this

fact before because this fact is something you can see immediately in

the case of the naive definition. Because we are looking at favorable and

unfavorable. So the total number of outcomes

equals number of favorable, plus number of unfavorable. This is always true,

whether using the naive definition or not. In the picture, it should be intuitively

clear because, let A1 equal A. A complement is everything that’s

outside of A1, or outside of A, okay? If the total area is 1, then if

this is 0.3, outside has to be 0.7. So that’s intuitively obvious, once you look at the Venn diagram,

but how do we prove it? Well, the proof is very short as well, we know that 1 equals

the probability of the whole space. That’s axiom one, okay, we also know that we can write

s as A union A complement, because A is

everything that’s inside A, and A complement is everything that’s outside A. Put those together,

and that’s everything, okay? Now, P of A union A complement is P

of A plus P of A complement, because A and A complement are disjoint. In set notation, we’d write that as just

saying that their intersection is empty. There’s nothing that could be in A and

its complement, by definition. So they’re disjoint,

so I’m using axiom two. That equals that plus that, and that’s

the same thing we were trying to show. So the proof is immediate: use

axiom one, use axiom two and we’re done. All right, so that’s property one,

I’m calling it, no one else calls it property one,

but property one for today. Property two, another useful fact if a is contained in b so a and b are advanced but

this event is inside this event. In words we would say that what

this relationship means is that If A occurs, then B occurs, okay? So this is a larger event. Then, P of A is less than or

equal to P of B. Again, this is obvious, intuitively in

terms of the Venn diagram because B is like a bigger oval

containing the smaller one. It has a bigger area. That’s obvious from the picture. But that’s not a formal proof and

we want to prove this fact. Okay so let’s just quickly

prove this using the axioms. So the proof would go, I’m still

going to draw a little picture for some intuition. Here’s A and here’s B. B is this bigger thing

that contains A, okay? Now, I’m going to think of B

as being split into two parts. There’s A, and then there’s a ring of

stuff that’s in B, that’s not in A. Okay?
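To make the decomposition concrete, here is a small illustration on a hypothetical example that is not from the lecture: one roll of a fair die, with the naive definition playing the role of P. Checking one example is not a proof; it only shows B splitting into A and the ring, with the probabilities adding up as axiom two says they should.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
S = frozenset(range(1, 7))  # sample space


def P(event: frozenset) -> Fraction:
    """Naive definition: (number of favorable outcomes) / (number of outcomes)."""
    return Fraction(len(event), len(S))


A = frozenset({6})        # "the roll is a 6"
B = frozenset({2, 4, 6})  # "the roll is even"; A is contained in B
A_c = S - A               # A complement
ring = B & A_c            # B intersect A complement: in B but not in A

assert A <= B                # if A occurs, then B occurs
assert A.isdisjoint(ring)    # the two pieces don't overlap
assert A | ring == B         # and together they make up B

# Disjoint pieces, so the probabilities add; the ring adds something non-negative.
print(P(B), P(A) + P(ring), P(A) <= P(B))  # prints: 1/2 1/2 True
# The same setup also illustrates property one: P(A complement) = 1 - P(A).
print(P(A_c), 1 - P(A))  # prints: 5/6 5/6
```

Using `Fraction` rather than floats keeps the probabilities exact, so the equalities can be checked with `==`.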

So, I’m going to decompose B into two parts. I’m gonna think of B as A,

union with this ring. This ring is the stuff that’s

in B that’s not in A, so I’d write that as B

intersect A complement. The stuff that’s in B that’s not in A,

okay? And this is a disjoint union; that’s just saying that A is

disjoint from this ring. So since these are disjoint,

we know that P(B) = P(A) + P(B intersect A complement). That’s just immediate from axiom two.

And by definition, probability

has to be non-negative, so you’re adding something non-negative

to P of A, so this is at least P of A. So that’s the end of the proof. So it’s just almost immediate

from axiom two. Question?>>Where did you find

the fact that [INAUDIBLE]?>>Probability’s not

supposed to be negative. Well, I didn’t write that today;

I wrote it yesterday. P is between 0 and 1;

you could call that an axiom if you want. In that case, call it axiom zero. I stated that last time, but I didn’t write it again this time. The reason I didn’t write it as one of the

axioms is that if you’re writing it out in full, you would say P is a function

that takes events as input and returns values between zero and

one, such that this and this. But if you wanna consider that an axiom,

you can. Either way, probability is always

between zero and one. Okay, so those are two simple properties. Let’s do one that’s a little

more complicated, and also, very very very useful. And that’s, the question is,

how do we get the probability of a union? Sorry, the boards are pretty squeaky. So this is property three: how do

we get the probability of a union? Now we know that if A and B are disjoint,

we can just do P(A) + P(B), okay. But if you think about the Venn

diagram again, here’s A, here’s B, and here’s the intersection. If we want the area, what we should do is

add the area of A plus the area of B. But then we’ve double-counted the stuff

in the intersection, so we subtract that off again. So we’re gonna prove that that is in

fact the correct way to do it, whether or not the Venn diagram picture

is applicable. The picture suggests the intuition,

and let’s prove the result. And then I’ll do an example

of how you use this. We’ll be using this kind of thing a lot. Because we often want the probability

of a union and we’re not necessarily so lucky as to have disjoint sets. So we wanna know what happens

if they’re not disjoint. Okay, and the result is that we

can add up the probabilities, but we’ve double counted the intersection, so

we subtract the probability of the intersection: P(A ∪ B) = P(A) + P(B) − P(A ∩ B), okay? So let’s prove that. Well again, let’s think of it,

kind of the strategy here, you know when you’re confronted

with a problem like this and you don’t know how to prove

it, a strategy to use would be: we know we have to

somehow apply these axioms. And

axiom two is the most important one, but it only applies

when they’re disjoint. So somehow we have to make

things disjoint, right? So that’s the strategy. So I’m gonna write P(A union B) in

a different way. As it stands I can’t apply axiom two, because they’re not disjoint,

so I’m gonna write it in a disjoint way. I call this disjointification. And I think I coined that word,

but I don’t know. So we’re going to do disjointification. And I’ll write the word, and I’ll probably

regret this because it’s a long word. I’ll try to write it fast. So we’re gonna do disjointification. We’re gonna take A union B and

we’re gonna try to write it disjointly. So how do we do that? Well, we take A and then we take everything

in B that we did not already have in A. So it’s A union (B

intersect A complement). All right, so that looks similar to what we did for property two. The only difference is that there we were

assuming that A is contained in B, and here this is completely general, A and

B are just any two events, okay? So that just says we take A, and

then everything in B we need that, but we don’t need the stuff

that we already had from A. So we’re excluding it. All right, now this and this are

disjoint, so we can immediately apply axiom two. And we get P(A) + P(B intersect A complement). Now it’s still not yet

clear why is that the same as this? So here we use another proof strategy,

that I call wishful thinking. So-

>>[LAUGH]>>I wish. I’m gonna put a question mark here. It is very very bad practice to just

start writing things you wish were equal as equalities without justification, but

it’s okay if you put a question mark. Okay?
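In the spirit of wishful thinking, one can also test the two-event formula stated above, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), numerically before proving it. The sketch below does this on a hypothetical fair-die example (not from the lecture). Its helper `inclusion_exclusion`, an illustrative name, computes the alternating sum (add singles, subtract pairs, add triples, and so on) for any number of events, so the same check extends to three events. A numerical check on one example is evidence, not a proof.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example: one roll of a fair six-sided die.
S = frozenset(range(1, 7))


def P(event: frozenset) -> Fraction:
    """Naive definition of probability on the die."""
    return Fraction(len(event), len(S))


def inclusion_exclusion(events: list[frozenset]) -> Fraction:
    """Alternating sum over all nonempty groups of events:
    singles get +, pairs get -, triples get +, and so on."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = 1 if r % 2 == 1 else -1  # (-1)^(r+1)
        for group in combinations(events, r):
            total += sign * P(frozenset.intersection(*group))
    return total


A = frozenset({1, 2, 3})  # "at most 3"
B = frozenset({2, 4, 6})  # "even"
C = frozenset({3, 6})     # "divisible by 3"

# Two events: compare P(A ∪ B) with P(A) + P(B) - P(A ∩ B).
print(P(A | B), P(A) + P(B) - P(A & B))               # prints: 5/6 5/6
# Three events, using the general alternating sum.
print(P(A | B | C), inclusion_exclusion([A, B, C]))  # prints: 5/6 5/6
```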

So I’m gonna put a question mark here. P of A plus P of B minus

P of A intersect B. I wish this were true. Now, I know this is true, but

I’m pretending I don’t know it yet, and I want to see whether it’s true. So that’s my wish. Now I’m going to try to compare this and

this. Well, let’s see, P of A is the same

as P of A, that’s okay. This will be true if and only if this

term is the same as this difference. So this is equivalent to,

I’ll just rewrite that as: this equals this minus this. But it’s going to be easier

to write it without the minus sign, so I’m going to move the P(A

intersect B) over there. So that’s equivalent to saying that P(A intersect B) + P(B intersect A complement) = P(B). I just said, for this to be true, what we

want is for this to equal this difference. And then I just moved that to

the other side of the equation. Okay, now we just have to check, is this

true, well, yes it is, this is true. It’s immediate by axiom two:

A intersect B

and A complement intersect B are disjoint,

because if you had an element in both of these sets, it would be in both

A and A complement, that’s impossible. So they’re disjoint, and their union is B. Because what we did was chopped B up into

two parts, we split B into two cases. We have the part of B that’s in A, and

we have the part of B that’s not in A. So there’s two disjoint cases,

so that’s the end of the proof. So this rule is a simple case of

what’s called inclusion-exclusion. And let’s do a more general version

of inclusion-exclusion, and then I’ll talk a little bit

about why it’s called that. And then we’ll do an example

of inclusion-exclusion, okay? So here’s the general inclusion-exclusion

formula, it looks ugly, but the idea is pretty simple. So I just want to extend that to

the case where we have more than two events, so

let’s do a case with three of them first. P(A union B union C) equals, and

I’ll draw another little diagram. Now we have three events, A, B, and C, drawn as linked rings. And I want the probability of the union, okay, so we’re gonna start

out just like we did there. By adding the individual probabilities,

and then we’re gonna adjust for double counting, things like that. So we’re gonna do P(A) + P(B) + P(C). But it’s clear just from thinking about

the picture that we’ve added too much, right, that we’ve over-counted stuff. So what we have to do is

subtract off the intersections. So we’re gonna subtract off A intersect B,

and we’re gonna subtract off A intersect C,

and we’re gonna subtract B intersect C. Okay, well, now we’ve subtracted too much. If you think of this,

think carefully about this picture, what happened to the stuff

in the triple intersection? Any stuff in the triple intersection,

we added it three times, then we subtracted it three times,

now we haven’t counted it at all. Everything that’s not in the triple

intersection is correctly accounted for. Stuff that’s in the triple intersection,

we have not counted yet. So we fix that just by adding this back,

A intersect B intersect C. So altogether, P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C). Okay, so that’s inclusion-exclusion, and you can prove this in

a similar way to that. It’s more tedious, and

I’m not gonna write out the whole thing. There are also some more clever ways to do

it, that we might come to another time. It’s completely analogous,

just a more tedious version of that, and in general, you can use induction. And I’m not gonna write

out the whole induction, because the idea is exactly

the same as this idea. The general case,

when we have the union of n events, is the general form of inclusion-exclusion.

If you remember this idea, then it’s really easy to write this down;

it looks ugly, but it’s the same idea. First we add up the probabilities of the individual events,

then we subtract the pairwise intersections. So I’ll just write it this way: subtract the sum over

i less than j of P(Ai intersect Aj). I wrote i less than j because then

we don’t have A1 intersect A2, and A2 intersect A1,

we only want to have that once. And then you add the triples, so it’s alternating,

just like this is plus minus plus. We then have triple intersections,

summing over i less than j less than k, Ai intersect Aj intersect Ak. Minus, dot dot dot, and then the last

term is the intersection of everything, and so it’s either gonna be a plus or

a minus. And I think it should be (-1) to the n+1, because if we think about the case when

n is 2, then we stop by subtracting. And in the case n equals 3,

we stop by adding. So it’s (-1) to the n+1, and

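then the intersection of everything. So altogether, P(A1 ∪ A2 ∪ ⋯ ∪ An) equals the sum of the P(Ai), minus the sum over i < j of P(Ai ∩ Aj), plus the sum over i < j < k of P(Ai ∩ Aj ∩ Ak), minus ⋯, plus (−1)^(n+1) P(A1 ∩ A2 ∩ ⋯ ∩ An). Okay, so hopefully you can see kinda what

the intuition is for this, and why it’s called inclusion-exclusion. Because, just think of this case with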

three of them, we started by including all of these, but then we’ve overcounted,

and so we start excluding stuff. But then we have to adjust the other way,

so we start including and excluding. And it keeps alternating, including and

excluding, and it all works out. Okay, here, this last term does not have

a summation because it involves all n of them. So this is all of them;

there’s only one of these, the rest of them will have summations. Okay, so let me do one famous

interesting example of inclusion-exclusion, how you

can apply this to a problem. This is a very old problem,

it goes back to 1713,

and it’s called de Montmort’s problem. And this goes by different

names in the literature, sometimes this is called

the matching problem. Usually I’ll just call it

the matching problem, or I’ll call it de Montmort’s problem. I don’t actually care if you

know the name of the problem, I care that you know the concept. It goes by other names because it’s

a pretty natural problem that comes up in different contexts. There are many different ways

we could phrase this problem. Different forms of the same thing

in different disguises, but I’ll tell you the original one. Not surprisingly, this originated

from considering a gambling game. de Montmort was interested in a lot

of different gambling problems. In 1713,

probability was still in its infancy, and he was studying some of

these gambling problems, which had real practical consequences for

gamblers. There was a lot of money at stake, and

that was the motivation. So here’s the problem, so

consider the following card game. We have a deck of cards, now, rather

than thinking like ace of spades and 7 of clubs, it’s simpler to just assume

that the cards are labeled 1 through n, so imagine that. You have a deck of n cards, and each card

just has a number from 1 to n on it, okay? One number per card, so

we have n cards labeled 1 through n. Here’s the problem, so

I’ll describe the game. And I think they actually

played this game back in the 1700s. Shuffle the cards, okay. Now basically I’m gonna flip

over cards one at a time, okay, and

you’re gonna count from one to n, okay. So I’ll flip over

the first card and you say one, I flip over the second card and you say two,

I flip over the third card and you say three. We’ll continue that way. Now if it ever happens that

I flip over a number and you say that same number, you win. Otherwise, you lose. That’s the game. That’s why I call it matching,

because it’s like, you win if for example the seventh card

is card number seven. That is, the card that’s seventh in the

deck has the number seven written on it. Okay, does everyone

understand what the game is? So I want to know the probability

that that will happen, that there will be at least one

card whose position in the deck is the same as the number

written on the card. So I’m not going to write out that

whole description just now, because hopefully you understand

what the game is now. So let’s find the probability. Well, to find the probability,

we need to define some events. Right? By the way, there’s more

than one way to solve this problem. And you can try to do direct methods, but it’s pretty hard to find a way to do

this without inclusion-exclusion. Not impossible. But inclusion-exclusion is the easiest

way to solve this, I think. And it’s also a good example of it. So, we need to define our events. So, let’s say,

let Aj be the event that, I’ll just say it in words,

the jth card matches. In other words, the jth card in

the deck is card number j. Okay. So what we are interested in is

the probability of the union of A1 through An. That is just a mathematical expression for the probability that at

least one card matches. That’s the probability that you would win the

game by having at least one match, okay. It’s pretty hard to do directly, and

inclusion-exclusion, you know, looks kinda ugly. So usually,

when I’m trying to solve a problem, I try to avoid

inclusion-exclusion at first. But if I don’t think of another method

then I’m going to try this because it’s just a very powerful general technique for

finding the probability of a union. Well, we need to do inclusion and

exclusion, right? So let’s just start writing this out. Luckily, in this problem,

we have a lot of symmetry. And we should take advantage

of symmetry whenever we can. So initially,

I would have to write out this sum. But actually, look at the probability of Aj. Let’s just do some

scratch calculations over here. P(Aj) is the probability

that the jth card in the deck is card j. Now you can do this using

the naive definition of probability, because we’re assuming... there are

two ways to think of this. One way would be to have n factorial

as the denominator because we’re using a naive definition, we’re assuming all permutations

of the deck are equally likely. So that would be fine, but an easier way to think of it

is just that it’s one over n. It’s one over n because all

positions are equally likely. All positions are equally likely for

the jth card. I’ll say for the card labeled j. Right, so imagine I’m looking for

a particular card: I have 52 cards and I’m looking for the ace of spades. The ace of spades is equally

likely to be anywhere, so there’s a one in 52 chance

that it’s in any specific position. So that’s immediately one over n. You can also derive this using the naive

definition with an n factorial: (n minus one) factorial over n

factorial gives the same thing. Notice this does not depend on j, okay? That’s what saves us here. If we had some complicated

thing involving j and n here then we’re gonna

have to do a summation. But there’s no j here, so that means

we’re just gonna multiply this by n. All right,

now let’s do P of A1 intersect A2. So I’m using symmetry. I could have written Ai intersect Aj where

i is not equal to j, but I may as well for concreteness pick one and two. Let’s just think about what this is. Here I will use the naive definition. There are n factorial possible

permutations of the deck of cards. Now what does this event say?

It says that the card on top of the deck has a one on it,

and the card that is second has a two on it. The other n minus two cards can be

in any order whatsoever, so for the rest of them it’s n

minus two factorial, and the probability is (n - 2)! over n!. And we could either leave it this way,

or we can cancel some stuff out and write this as one over n(n - 1), because everything else cancels, okay? And continuing in this way, if

we want P(A1 intersect blah, blah, blah intersect, let’s say, Ak), well, what that says is that the first

k cards are exactly one up to k. The remaining n - k cards can

be in any order whatsoever. So the denominator is still n factorial,

and the numerator is (n - k) factorial, so the probability is (n - k)! over n!. All right? So that looks kind of messy, but

the nice thing is that we have symmetry, so this will work whatever

choices of events you put there, okay? So now let’s just quickly apply inclusion-exclusion.

We start by adding up these ones,

and there are n of them, so it’s n times one over n, which is 1. Then come the pairwise intersections. How many of these are there? Well, there are n choose two of these terms, and we’re subtracting now,

because we alternate. So there’s n choose two, but I’m gonna

write n choose two as n(n - 1)/2, times that probability, 1/(n(n - 1)),

so it’s one-half. Let’s just do one more term,

and then you’ll get the idea. Next we add n choose three,

which I can also write this way: n choose three is n(n - 1)(n - 2)

over three factorial. And the 2 in the previous term I can think of as

two factorial if I want. We multiply n choose three by the probability of a triple intersection, which in this case

is one over n(n - 1)(n - 2). So you actually see

something really nice happens: the n(n - 1)(n - 2) on top cancels the n(n - 1)(n - 2) on the bottom. In general, n choose k times (n - k)! over n! is exactly one over k factorial, so

everything cancels except the factorials. The only things left are these

one over k factorial terms, with alternating signs. So what this then works out to,

it’s this alternating sum, and it’s exact, not an approximation:

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \cdots + (-1)^{n+1}\frac{1}{n!}.$$

That’s one minus one over two factorial, plus one over three factorial, minus one over four factorial, and so on. Let me make sure I got

the last term right: the kth term is (-1) to the k+1 over k factorial, so the last one is (-1) to the n+1 times one over n factorial, the same sign as the last term of inclusion-exclusion. I’m just following the same pattern here. Okay.
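As a sanity check on the algebra, here is a minimal Python sketch that computes the alternating sum exactly (the kth term has sign (-1)^(k+1) and size 1/k!) using `fractions.Fraction`, and compares it with brute-force enumeration of all n! orderings of the deck, which the naive definition treats as equally likely. The function names are hypothetical helpers, not anything from the lecture, and the brute force is only practical for small n because it visits every permutation.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def match_prob_formula(n: int) -> Fraction:
    """Exact P(at least one match) from the alternating sum
    1 - 1/2! + 1/3! - ... + (-1)**(n+1) / n!  (kth term has sign (-1)**(k+1))."""
    return sum((Fraction((-1) ** (k + 1), factorial(k)) for k in range(1, n + 1)), Fraction(0))


def match_prob_brute_force(n: int) -> Fraction:
    """Exact P(at least one match) by counting over all n! orderings,
    each treated as equally likely (naive definition)."""
    wins = 0
    for deck in permutations(range(1, n + 1)):
        # The card in position pos (1-based) matches if its label equals pos.
        if any(card == pos for pos, card in enumerate(deck, start=1)):
            wins += 1
    return Fraction(wins, factorial(n))


# Compare the two for small decks (the brute force grows like n!).
for n in range(1, 8):
    exact = match_prob_formula(n)
    assert exact == match_prob_brute_force(n)
    print(n, exact)
```

For example, both functions give 1/2 for n = 2, 2/3 for n = 3, and 5/8 for n = 4.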

This looks messy, but that’s the exact answer. I know we’re out of time, but this is the end, because we can

immediately say this looks familiar. It looks ugly, but

it looks familiar, right? This should remind you of

the Taylor series for e to the x, which is the sum of x to the k over k factorial. Plugging in x = -1 gives e to the -1 = 1 - 1 + 1/2! - 1/3! + ..., so 1 - 1/e = 1 - 1/2! + 1/3! - ..., which is our sum continued forever. So, in fact, for large n this is approximately

one minus one over e, about 0.63. We’re going to see this number

one over e a lot in this class. It seems like this problem has

nothing to do with e, but e comes up here just

because we have this sum. Okay, so have a good weekend.
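The limit 1 - 1/e can also be seen numerically. Below is a minimal Python sketch (the helper names are hypothetical, not from the lecture) that prints the exact alternating sum for a few deck sizes next to 1 - 1/e, and runs a seeded Monte Carlo simulation of the game for a 52-card deck. Because the series alternates with terms shrinking like 1/k!, the exact value for n cards differs from 1 - 1/e by at most 1/(n+1)!, so by n = 10 the gap is under about 2.5 × 10^-8.

```python
import math
import random


def match_prob_exact(n: int) -> float:
    """The alternating sum 1 - 1/2! + 1/3! - ... + (-1)**(n+1) / n!, as a float."""
    return sum((-1) ** (k + 1) / math.factorial(k) for k in range(1, n + 1))


def simulate_matching_game(n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(at least one match): shuffle cards 1..n,
    count 1, 2, ..., n while flipping, and win if a count equals the card shown."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    deck = list(range(1, n + 1))
    wins = 0
    for _ in range(trials):
        rng.shuffle(deck)  # uniformly random ordering of the deck
        if any(card == pos for pos, card in enumerate(deck, start=1)):
            wins += 1
    return wins / trials


limit = 1 - 1 / math.e  # about 0.632
for n in (2, 3, 5, 10, 52):
    print(f"n={n:2d}  exact={match_prob_exact(n):.6f}  1-1/e={limit:.6f}")
print("simulated, n=52:", simulate_matching_game(52, trials=20_000))
```

The simulated frequency should land near 0.632, though its exact digits depend on the seed and the number of trials.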