P-values and the probability of being wrong
I must admit, most of the time I read formal definitions in Mathematics I wonder how anyone can actually learn from them. For example, when learning about p-values, this definition is used to explain what they are:
The p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.
Generally, the goal of Mathematics is to be general, abstract and all-encompassing, but this can lead to definitions that are verbose or unintuitive precisely because they try to compress the application of an idea to every possibility (the generalization) into a single sentence or a few words that stand for that generalization or abstraction. I don't feel this is useful for teaching or learning.
There's also the age-old problem of not wanting to learn a Mathematical concept because it doesn't appear immediately useful, which is often the case: if you've got no interest in right-angled triangles and you're told you need to learn Trigonometry, you're less likely to learn it well.
Coupled with this, when Math concepts are taught they are often explained unintuitively, precisely because the explanation starts with the abstract, general end-product of the mathematics (like the definition above) instead of working towards that abstraction from a place of ignorance, which I'd argue is where most people who want to learn actually start. I feel this harms Mathematics.
It's almost like teaching someone how to become a World Cup-capable rugby player (or expecting them to be one) by handing them the Rugby World Cup trophy. That's just silly.
So what about the p-value? Well, that is what I spent most of today trying to understand.
Ultimately, my finding is that it is the likelihood that a claim (or conjecture, or hypothesis) you make about something is actually wrong.
Weird, you might think: why would you want to be so formal about that? The reason is that you want to provide a degree of confidence in your claim (a low likelihood of being wrong), and you want to show that this likelihood is based on evidence from the real underlying thing, not just hearsay about it.
This likelihood of being wrong is based on testing your claim against the actual observations, instead of merely making the claim with no testing to back it up. Put this way, the idea is easier to grasp, even if it can still seem a strange thing to want to quantify.
You want the likelihood of your claim being wrong to be low, so a low p-value is good. So how do you calculate this likelihood of being wrong?
You construct a test that checks whether each actual observation matches your claim, then run all the observations through it. If every test result shows your claim is correct for that observation, the likelihood of your claim being false overall is 0%, i.e. the p-value is 0.
This might seem strange, but it's basically a way to substantiate your claim by showing that you based it on testing against the actual data.
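To make that counting procedure concrete, here is a minimal Python sketch of the idea just described: count how many observations contradict the claim and divide by the total. The function name and its arguments are my own illustration for this post, not anything standard.

```python
def fraction_wrong(claim_holds, observations):
    """Fraction of observations for which the claim turns out to be wrong.

    claim_holds: a function that returns True when the claim is actually
    correct for a single observation, and False otherwise.
    """
    wrong = sum(1 for obs in observations if not claim_holds(obs))
    return wrong / len(observations)
```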
Here is an example:
If I have collected 5 apples and, from those apples, I've somehow determined and claimed that they are [all OK to eat], then that's fine. But if I can say they are all OK to eat because I tested every single one, and that's how I came up with the claim, then the claim is now based on testing the observations, which is evidence that my claim is unlikely to be wrong. This is better than just making the claim based on nothing.
I can derive a p-value that indicates this likelihood that my claim is wrong, based on actual testing:
My actual testing on the observed apples:
- Apple 1 has no discolouration and smells ok, so it's [OK to eat], and it actually is [OK to eat]
- Apple 2 has no discolouration and smells ok, so it's [OK to eat], and it actually is [OK to eat]
- Apple 3 has no discolouration and smells ok, so it's [OK to eat], and it actually is [OK to eat]
- Apple 4 has no discolouration and smells ok, so it's [OK to eat], and it actually is [OK to eat]
- Apple 5 has no discolouration and smells ok, so it's [OK to eat], and it actually is [OK to eat]
That's 5/5 of the tests showing all the apples are OK to eat, meaning there's a 0% likelihood that my claim that all the apples are [OK to eat] is wrong. My p-value is 0. Furthermore, because my test, which is [if the apple has no discolouration and smells ok then it is OK to eat], was actually correct about every apple being OK to eat, it's also a good test: it matches the actual reality of being OK to eat.
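Using the sketch from earlier, this first scenario works out to 0.0. The apple dictionaries and field names here are just my own made-up representation of the observations:

```python
# Scenario 1: every apple passes the look-and-smell check and really is OK to eat.
apples = [{"no_discolouration": True, "smells_ok": True, "actually_ok": True}
          for _ in range(5)]

p = fraction_wrong(lambda apple: apple["actually_ok"], apples)
print(p)  # 0.0 -- no observation contradicted the claim
```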
However, if my testing results were different:
- Apple 1 has no discolouration and smells ok, so it's [OK to eat], and it actually is [OK to eat]
- Apple 2 has no discolouration and smells ok, so it's [OK to eat], and it actually is [OK to eat]
- Apple 3 has no discolouration and smells ok, so it's [OK to eat], and it actually is NOT [OK to eat]
- Apple 4 has no discolouration and smells ok, so it's [OK to eat], and it actually is [OK to eat]
- Apple 5 has no discolouration and smells ok, so it's [OK to eat], and it actually is NOT [OK to eat]
That's 3/5 of the tests showing my claim to be correct while 2/5 show it to be wrong, meaning there's a 40% chance that my overall claim is wrong, based on testing. My p-value is 0.4 (40%), and that test of mine doesn't seem to be very good. Either way, my claim is now based on testing the data, not a claim with no evidence behind it.
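And the second scenario, run through the same hypothetical sketch, gives 0.4:

```python
# Scenario 2: apples 3 and 5 pass the look-and-smell check but are NOT OK to eat.
actually_ok = [True, True, False, True, False]  # apples 1 to 5
apples = [{"no_discolouration": True, "smells_ok": True, "actually_ok": ok}
          for ok in actually_ok]

p = fraction_wrong(lambda apple: apple["actually_ok"], apples)
print(p)  # 0.4 -- 2 of the 5 observations contradicted the claim
```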
So we now have a level of confidence that the claim we made, that all apples are [OK to eat], is correct. That confidence level is p = 0.4 in the last example, meaning we aren't 100% confident, based on testing that data, that the claim holds across all the apples, because it didn't hold in our testing.
So ultimately we have a claim (all apples are OK to eat), and we have a level of confidence in that claim (p-value), derived from testing the apples/data. This is really it.
That p-value can also be seen as the likelihood of our tests producing false positives when testing our claim to be true; above, there were 2 false positives (Apple 3 and Apple 5). It can also be seen as the rate of failure, e.g. 2/5, of the claim being correct against all the data tested.