We collate the data in Excel, which students access on their laptops. I ask them what questions they have about the data. Most want to compare the mean and standard deviation of the first and second attempts, to see if there is an improvement. A couple of students want to find out who 'won' (drew the most accurate triangle), and who made the most improvement.
The mean and standard deviation for the first triangle attempt are 9.1 and 3.2 respectively, and 9.7 and 0.5 for the second attempt. The students notice the slight improvement in the mean, and the large reduction in standard deviation.
We upload the data into SPSS. I invite them to create histograms of the data, which look like this:
The main aim of this session is a re-introduction to continuous random variables, focusing eventually on the normal distribution. I have chosen to start with the uniform distribution, to demonstrate how the area under the probability density function relates to probability.
After a brief introduction to the uniform distribution, I suggest that we might model the side lengths of the first attempt (but not the second) as a uniform random variable. We might have expected the lengths to be clustered around 10cm, but they are not; they are spread across the range 4.9 to 15, so X ~ U(5,15) could be appropriate (using 5 rather than 4.9, for simplicity).
What is the probability density function for this variable? It is clear to them that f(x) = 1/10 for 5<x<15, and 0 otherwise. I invite the learners to sketch the pdf, which they do. I ask: what questions do you have? They would like to know the mean and variance.
One of the students thinks the mean is 1/10. I am surprised. I say "1/10 is the value of the function... the mean is not 1/10". It is unusual for me to directly state that a learner is not correct, but for clarity I feel it is appropriate. I write "mean = ?" on the board.
Another student, David, then says: "To get the mean are we going to have to multiply by that 1/10, which I think represents the probability of drawing a side of length 5, 6, 7, 8, ... times those by 1/10, like when we did the mean when it was discrete... but obviously in this case it is not discrete, it's continuous..."
There is some nodding. I am aware that these students almost certainly share some misconceptions around probability density functions. It is moments like these that are the gateway to discussion.
"Hold on a second!" I say, as I start to write this out, like a discrete random variable (which we did last week) on the board, and start writing the integer values each multiplied by 1/10. David then says: "But now you can have, like, 6.4, and all values in between...". "Yeah, nice, nice" I say. I recognise the urge to explain, but instead I pause and say: "So, carry on talking." He continues: "So, how would we go about, then, finding the mean?" I reply, to the whole room: "Yeah, what's the equivalent of this? [pointing to the 'discrete' calculation] We can't do 9.3 times a tenth, and 9.4, and 9.5... "
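As an aside, the continuous equivalent of the discrete calculation can be sketched numerically. This is a minimal Python illustration, not something we used in the session. It assumes the model X ~ U(5,15) from above, and the names `pdf` and `expectation` are invented for the sketch. The idea is to slice the interval into narrow strips and add up (value) x (density x strip width), which approximates the integral of x f(x).

```python
def pdf(x, a=5.0, b=15.0):
    """Density of U(a, b): constant 1/(b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a < x < b else 0.0


def expectation(g, n=1000, a=5.0, b=15.0):
    """Approximate E[g(X)] = integral of g(x) * f(x) dx using n midpoint strips."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width            # midpoint of the i-th strip
        total += g(x) * pdf(x, a, b) * width  # g(value) * (density * width)
    return total


mean = expectation(lambda x: x)
variance = expectation(lambda x: (x - mean) ** 2)
print(mean, variance)  # close to 10 and to 100/12 (about 8.33)
```

Each term is a value multiplied by the probability of its strip (density times width), which mirrors the discrete sum on the board. The density 1/10 on its own is a height, not a probability.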
I'm becoming animated. I recognise that this is the perfect opportunity for reaching a deeper understanding of probability density functions. "This is really important, this is the whole essence of the topic!" I exclaim. I could not have planned for this to happen; these are the moments of grace in teaching.
David continues: "Our distribution is telling us that 9.3, 9.4, 9.5, there's a probability of one over ten of somebody picking that..." I interrupt: "Pause!" A few moments of silence. "Is that right?" he asks. "That's a statement," I say. More silence. I walk over to the board. "This is a statement we need to think about carefully.... say that statement again."
David repeats: "That one over ten, that represents the probability of somebody drawing a side of length say 5.1, and also the probability of 5.2... " I say again: "Pause!" A few more moments of silence. I turn to everyone and say: "Do you agree with that statement?" No one agrees or disagrees.
I write David's statement on the board, and then invite the students to discuss it in groups. I sit amongst them while they talk about it. Their discussions touch on ideas such as area, and I hear the word "integration" mentioned. After a while I bring the group back together.
David starts the discussion by talking about the area underneath the whole pdf being 1, and is standing by his statement. I reply: "So if I said to you what's the probability of drawing a side length of 7.8, you'd tell me the answer to that was...?" I write "P(X=7.8) = " on the board.
Edward, who is on the same table as David, unexpectedly interjects with "Zero!"
"You would say zero?" I say to Edward, and write "P(X=7.8)=0" on the board. David interrupts: "It'd be one over ten..." I write "P(X=7.8)=1/10" underneath. He continues: "If you mark 7.8 on the x-axis [I am drawing on the board as he is speaking] and drew a line up to where the [pdf] is, that is one over ten. But if we wanted a probability of, say, between 7.8 and say 9, then we could do that with a region, that area." I draw this region on the diagram.
"There are two things going on," I say. "There's disagreement here... [pointing back and forth between the two contradictory probability statements]... David and Edward can't both be right... can they? And, everyone seems to be agreeing with using the area to find the probability of a range like that [pointing at the region we have just drawn], there's lots of nodding of heads..."
It's gone past the time for the mid-session break. My urge is to continue, to thrash it out, but an alternative becomes available: I can leave things here, to provide the opportunity for the students to do some (possibly subconscious) work over the interval.
Most of the students go off to get cake. I start chatting to two of the students, Fiona and Jenny. First Fiona, and then Jenny, come to the board to demonstrate what they have been thinking about. Their argument seems to me to be partly red herring (if there is indeed such a thing) and partly significant. The other students start filtering back in, drinking and eating cake in the room. A couple of them start to get involved in the discussion.
It is nearly time to re-start the lesson. Most people are eating. Jenny is still at the board. She is now having a discussion with Edward. After many attempts to articulate her thinking, she says: "We're saying that we have multitude of possibilities, all of them with a tenth of the chance of happening... if they all have a tenth of a chance of happening, then you'll have a probability higher than one." I start clicking my fingers. I interrupt, and ask Jenny to repeat her statement. I don't want it to get lost amongst the cake-eating. She re-articulates: "If you have a multitude of possibilities, they can't have a probability of a tenth because the probability will be higher than one." Everyone else is still finishing their cake, only Edward has heard it. "Hold that thought!" I say.
Break finishes, and we continue. "Let's clarify then, who's right, David or Edward? Is the probability of drawing a side of length 7.8 zero, or a tenth?" Almost simultaneously, Jenny says "A tenth," and Edward says, "Zero." I am surprised by Jenny's answer, and exclaim: "Do you still think it's a tenth?" Edward says "But you just proved that it can't be a tenth!" Jenny pauses: "But... then how could it be zero?"
Edward says: "Because it has no area." I confirm to Jenny that she has just proved that it can't be a tenth. "Yes... ," says Jenny, "... but it's not zero. I don't think it's zero because there must be a probability of it happening given that it's in the range 5 to 15. But a tenth doesn't make sense, because if you have a multitude of possibilities all equal to a tenth, you'd have a probability higher than one."
After some discussion, everyone agrees that the probability of drawing a side length of exactly 7.8 is zero. Fiona says, "So wait, what you're saying is that you can't pinpoint an exact probability for something, but you can work out the probability of a range of values by the area." Me: "Boom...! That feels significant... you can use area to find ranges of probabilities... Which is what Edward was saying, so he was right." All of the students are spontaneously taking notes.
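The conclusion the group reached can be checked with a few lines of Python. This is only a sketch under the assumed model X ~ U(5,15), and the helper name `uniform_prob` is made up for the illustration. It computes the probability of a range as a rectangle's area, shrinks an interval around 7.8 down to a single point, and replays Jenny's argument about a tenth per value.

```python
def uniform_prob(lo, hi, a=5.0, b=15.0):
    """P(lo < X < hi) for X ~ U(a, b): overlap width times the height 1/(b - a)."""
    lo, hi = max(lo, a), min(hi, b)   # clip the range to where the pdf is non-zero
    return max(hi - lo, 0.0) / (b - a)


# The region drawn on the board: between 7.8 and 9, about 0.12
print(uniform_prob(7.8, 9))

# Shrinking an interval around 7.8: the area, and so the probability, goes to zero
for half_width in (0.5, 0.05, 0.005, 0.0):
    print(half_width, uniform_prob(7.8 - half_width, 7.8 + half_width))

# Jenny's argument: if each of 5.0, 5.1, ..., 15.0 had probability 1/10,
# those 101 values alone would already total far more than 1
jenny_total = 101 * (1 / 10)
print(jenny_total)
```

A single point is a strip of zero width, so its area is zero even though the height of the pdf there is 1/10.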
The lesson moves on to a re-introduction to the normal distribution. The above discussion seems to have provided a good basis for what follows, using area under a pdf to find probabilities.
After some 'abstract' work on the normal distribution, we return to the triangle data again, this time supposing each triangle drawing attempt can be modelled by a normal distribution, so we have X ~ N(9.1, 3.2^2) and Y ~ N(9.7, 0.5^2). Again, I ask what questions the students have. One of them asks what the probability is of drawing a side within 1cm of the desired length, i.e. P(9<X<11) and P(9<Y<11). This is a nice example, giving a meaning to finding probabilities. It turns out that P(9<X<11)=0.23 and P(9<Y<11)=0.91, a good indication of an improvement from the first to second attempt.
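For readers who want to reproduce these probabilities, here is a minimal sketch using Python's standard library rather than the software we used in class. It assumes the rounded parameters quoted above, and the helper name `prob_between` is invented for the example. The probability of a range is again an area, found as the difference of two values of the cumulative distribution function.

```python
from statistics import NormalDist

# Assumed models, using the rounded means and standard deviations quoted above
first = NormalDist(mu=9.1, sigma=3.2)    # X: first attempt
second = NormalDist(mu=9.7, sigma=0.5)   # Y: second attempt


def prob_between(dist, lo, hi):
    """P(lo < V < hi) as the area under the pdf between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)


p_first = prob_between(first, 9, 11)
p_second = prob_between(second, 9, 11)
print(round(p_first, 3), round(p_second, 3))
```

With these rounded parameters the first probability comes out at roughly 0.24 and the second at roughly 0.91. The small difference from the 0.23 quoted for the first attempt may be due to rounding of the mean and standard deviation, though that is a guess on my part.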
We will return to this data when looking at paired t-tests in coming weeks. Thank you to Peter Gates for reminding me that statistics lessons make most sense when based on data, rather than a series of meaningless calculations.