Since Thoughts on Bayesian Networks, I've been thinking about how they actually work and why they work. I'm going to walk through the process behind the theory I presented previously.
From a learning perspective, my research suggests that they rely on statistics gathered from a growing number of observations. As observations accumulate, each new one shifts the average frequency of a particular situation, depending on whether that situation recurs or not.
For example, if you're designing a spam filter that learns and then decides which mail is spam and which is legitimate, you might start by collecting samples over time and counting which aspects or conditions of a message appear in known spam. As more email comes in, your data grows, and so does the number of recorded occurrences of the conditions that mark a message as spam.
Another example I gave earlier is predicting the weather. I'll try to detail the process that is used with Bayesian Networks:
If we are going to make predictions about whether it's going to rain or not, we need to collect information about what happens when it does rain and what happens when it does not, i.e., we need to collect daily weather observations. For example:
| Pattern | Observation # | Cloudy | Humid | Rain |
| A | 1 | Y (1) | H (1) | Y (1) |
| A | 2 | Y (1) | H (1) | Y (1) |
| B | 3 | Y | L (0) | N |
| C | 4 | N | H | Y |
| A | 5 | Y | H | N |
| D | 6 | N | L | N |
| A | 7 | Y | H | Y |
| C | 8 | N | H | N |
| A | 9 | Y | H | Y |
| D | 10 | N | L | N |
| B | 11 | Y | L | N |
| C | 12 | N | H | N |
This shows 12 observations, each recording whether it was cloudy, whether it was humid, and whether it rained. There are 4 unique combinations of the Cloudy and Humid conditions, and these patterns are labelled A, B, C and D.
These are boolean conditions, so if the condition occurred, e.g., it was humid, we use the value 1, otherwise 0. In the table, Y and H correspond to 1, and N and L correspond to 0.
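As a minimal sketch of this encoding, the 12 observations can be written down as tuples of 0s and 1s. The names `observations`, `PATTERNS` and `labels` are my own choices for illustration, not part of any library.

```python
# The 12 observations from the table, encoded as (cloudy, humid, rain).
# Y and H become 1; N and L become 0.
observations = [
    (1, 1, 1),  # 1
    (1, 1, 1),  # 2
    (1, 0, 0),  # 3
    (0, 1, 1),  # 4
    (1, 1, 0),  # 5
    (0, 0, 0),  # 6
    (1, 1, 1),  # 7
    (0, 1, 0),  # 8
    (1, 1, 1),  # 9
    (0, 0, 0),  # 10
    (1, 0, 0),  # 11
    (0, 1, 0),  # 12
]

# Each (cloudy, humid) combination is one of the four labelled patterns.
PATTERNS = {(1, 1): "A", (1, 0): "B", (0, 1): "C", (0, 0): "D"}

# Recover the pattern label of every observation, in order.
labels = [PATTERNS[(cloudy, humid)] for cloudy, humid, _ in observations]
print(labels)
```

The printed labels should line up with the first column of the table, which is a quick way to check the encoding.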
We will then work out which combinations of conditions, or patterns, within that data seem to correlate with rain occurring on those days. For each pattern, we count how many times it occurred over time and how often it coincided with rain.
For example, we might find that a particular pattern in the observations almost always coincides with rain, so that of all the observations showing that pattern, it rained 90% of the time.
From this historical data (observations), we can now start answering probability questions about rain. That is, we can make an inference:
What is the historical probability that it rains, given that it is cloudy and humid (Pattern A)?
We can tell this from the historical data (the 12 past observations): there are 4 occasions where it was cloudy and humid and it rained, and 1 occasion where it was cloudy and humid but did not rain. So of the 5 occurrences of cloudy and humid conditions, it rained on 4, i.e., there is a 4/5 chance, or 80% probability, of rain when it's cloudy and humid:
P(Rain=1|Cloudy=1, Humid=1) = 4/5 or 0.8 or 80% (Pattern A)
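The same count can be sketched in a few lines of Python. This assumes the observations are encoded as `(cloudy, humid, rain)` tuples of 0s and 1s; the variable names are illustrative only.

```python
# (cloudy, humid, rain) for the 12 observations, 1 = condition occurred.
observations = [
    (1, 1, 1), (1, 1, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 0),
    (1, 1, 1), (0, 1, 0), (1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0),
]

# Keep the rain outcome of every day that was both cloudy and humid (Pattern A).
pattern_a = [rain for cloudy, humid, rain in observations
             if cloudy == 1 and humid == 1]

# Fraction of Pattern A days on which it rained: 4 of 5.
p_rain_given_a = sum(pattern_a) / len(pattern_a)
print(p_rain_given_a)  # 0.8
```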
But this is only part of the story. Other situations/patterns/conditions also coincide with rain:
| Pattern | Cloudy | Humid | Rained = Yes (1) | Rained = No (0) | Occurrences of pattern | P(Rained=1\|C,H) | P(Rained=0\|C,H) |
| A | 1 | 1 | 4 | 1 | 5 | 0.8 (4/5) | 0.2 (1/5) |
| B | 1 | 0 | 0 | 2 | 2 | 0.00 (0/2) | 1.0 (2/2) |
| C | 0 | 1 | 1 | 2 | 3 | 0.33 (1/3) | 0.67 (2/3) |
| D | 0 | 0 | 0 | 2 | 2 | 0.00 (0/2) | 1.0 (2/2) |
For example, there is a 33% probability of rain when it is humid, but not cloudy. The other combinations (situations) show there is a 0% probability of rain in those situations (again only based on historical data).
To clarify what you are seeing in the above table: the second-to-last column shows the number of times it rained (Rain=1) under a particular pattern of cloud and humidity, divided by how many times that pattern occurred in all the historical data (whether it rained or not). The very last column shows the negative case, i.e., the number of times it did not rain under that pattern, divided by the same total. So we now have a full picture of which situations coincide with rain. This is called a Conditional Probability Table (CPT), and the last two columns are conditional probabilities: the probability of rain (or of no rain), given the pattern of conditions.
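Building the whole CPT is the same counting exercise repeated for every pattern. Here is a rough sketch using `collections.Counter` from the Python standard library; the tuple encoding and the names `pattern_counts`, `rain_counts` and `cpt` are assumptions made for this example.

```python
from collections import Counter

# (cloudy, humid, rain) for the 12 observations, 1 = condition occurred.
observations = [
    (1, 1, 1), (1, 1, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 0),
    (1, 1, 1), (0, 1, 0), (1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0),
]

# How often each (cloudy, humid) pattern occurred, rain or not.
pattern_counts = Counter((c, h) for c, h, _ in observations)
# How often each pattern occurred together with rain.
rain_counts = Counter((c, h) for c, h, r in observations if r == 1)

# P(Rain=1 | C, H) = rainy days with the pattern / all days with the pattern.
# A Counter returns 0 for patterns that never coincided with rain.
cpt = {pattern: rain_counts[pattern] / total
       for pattern, total in pattern_counts.items()}

# Print in the order A, B, C, D.
for (c, h), p_rain in sorted(cpt.items(), reverse=True):
    print(f"C={c} H={h}  P(Rain=1|C,H)={p_rain:.2f}  P(Rain=0|C,H)={1 - p_rain:.2f}")
```

Only P(Rain=1|C,H) is stored, since P(Rain=0|C,H) is always one minus that value.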
The idea is that we now know the historical probability of rain for every combination of cloudy and humid conditions. This is all based on historical data, however, and says nothing by itself about what will happen today. But if we know, with some degree of probability, whether it is cloudy or humid today, we can weight the CPT entries by those probabilities. The CPT entries themselves describe the cases where each condition is certain, 100% (1) or 0% (0), because that is how we built the CPT from the historical data.
For example, if we know the probability of cloud and humid conditions today:
| P(Cloudy=1) | P(Cloudy=0) | P(Humid=1) | P(Humid=0) |
| 0.6 | 0.4 | 0.5 | 0.5 |
What this is telling us is that we know that there is a 60% chance that it IS cloudy today, 40% chance that it isn't, and also that there is a 50% chance it is humid. So with today's values, we can infer the probability that it rains today by marginalising over the conditions (H,C):
P(Rain=1) = Σ P(Rain=1|C,H) x P(C) x P(H), summed over every combination of C and H (this treats today's cloud and humidity as independent of each other). Laid out as a table:
| C | H | P(Rain=1\|C,H) | P(C) | P(H) | Contribution | Note |
| 1 | 1 | 0.8 | 0.6 | 0.5 | 0.8 x 0.6 x 0.5 = 0.24 | 24% chance that it is cloudy and humid today and it rains |
| 1 | 0 | 0 | 0.6 | 0.5 | 0 x 0.6 x 0.5 = 0 | 0% chance that it is cloudy but not humid today and it rains |
| 0 | 1 | 0.33 | 0.4 | 0.5 | 0.33 x 0.4 x 0.5 = 0.066 | 6.6% chance that it is humid but not cloudy today and it rains |
| 0 | 0 | 0 | 0.4 | 0.5 | 0 x 0.4 x 0.5 = 0 | 0% chance that it is neither humid nor cloudy today and it rains |
| Sum total: | | | | | 0.306 | |
The sum of 0.306 says that of all the possible outcomes (scenario combinations), there's a 30.6% chance of rain today, given the possible probabilities for cloud and humid today!
P(Rain=1) = 30.6%
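The marginalisation can be sketched in Python as a sum over the four combinations of cloudy and humid. The CPT values are copied from the table (with 1/3 rounded to 0.33, as in the text), and the sketch multiplies P(C) by P(H), so it assumes today's cloud and humidity are independent. The dictionary names are illustrative.

```python
from itertools import product

# P(Rain=1 | C, H) keyed by (cloudy, humid), as listed in the CPT.
cpt = {(1, 1): 0.8, (1, 0): 0.0, (0, 1): 0.33, (0, 0): 0.0}

# Today's probabilities for each condition being true (1) or false (0).
p_cloudy = {1: 0.6, 0: 0.4}
p_humid = {1: 0.5, 0: 0.5}

# Weight every CPT entry by how likely its conditions are today, then add up.
p_rain = sum(cpt[(c, h)] * p_cloudy[c] * p_humid[h]
             for c, h in product((1, 0), repeat=2))
print(round(p_rain, 3))  # 0.306
```

Using the exact value 1/3 instead of the rounded 0.33 would give roughly 0.307, so the small difference is only a rounding effect.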
We can also make the above inference more precise if we know for certain that a condition is true, instead of using a probability such as 0.6 for cloudy:
If you know that it is cloudy today, i.e., P(C=1) = 1, you have evidence rather than a probability, and you can be more sure about the probability of rain:
P(R=1|C=1) = Σ P(R=1|C=1,H) x P(H), summed over H=1 and H=0:
- 0.8 [P(R=1|C=1,H=1)] x 0.5 [P(H=1)]
- 0 [P(R=1|C=1,H=0)] x 0.5 [P(H=0)]
Into a table:
| 0.8 [P(R=1\|C=1,H=1)] x 0.5 [P(H=1)] | 0 [P(R=1\|C=1,H=0)] x 0.5 [P(H=0)] |
| 0.4 | 0 |
| Total (0.4 + 0) = 0.4 | |
Meaning that if we know that it is cloudy, we can say that there is a higher chance of rain today, 40% instead of 30.6%, given the historical data.
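With cloudy fixed as evidence, the sum only runs over the one remaining unknown, humidity. A small sketch, again with CPT values taken from the table and illustrative names:

```python
# P(Rain=1 | C, H) keyed by (cloudy, humid), as listed in the CPT.
cpt = {(1, 1): 0.8, (1, 0): 0.0, (0, 1): 0.33, (0, 0): 0.0}

# Humidity is still uncertain today; cloudiness is known.
p_humid = {1: 0.5, 0: 0.5}
cloudy_today = 1  # evidence: we observed that it is cloudy

# Look up only the CPT rows with C=1 and marginalise over humidity.
p_rain_given_cloudy = sum(cpt[(cloudy_today, h)] * p_humid[h] for h in (1, 0))
print(p_rain_given_cloudy)  # 0.4
```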
So in summary, we can consolidate this process into fixed steps:
- Work out the conditional probability of rain from past observations of rain and its conditions (e.g., humid and cloudy).
- To infer the probability of rain today, weight each conditional probability of rain by the probabilities of today's conditions, then sum over all combinations of conditions to obtain today's probability of rain. This is also known as marginalising over today's conditions.
- If you know what a condition is today, e.g., it's cloudy today, use its value to look up the conditional probability in the conditional probability table (CPT) and marginalise over the remaining conditions.
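Put together, the steps could look something like the sketch below. The function names `learn_cpt` and `infer_rain` are hypothetical, chosen for this example. A known condition is passed as a probability of 1.0 (or 0.0), which makes the unused CPT rows drop out of the sum, and cloud and humidity are again assumed to be independent.

```python
from collections import Counter
from itertools import product


def learn_cpt(observations):
    """Estimate P(Rain=1 | cloudy, humid) by counting past observations."""
    totals = Counter((c, h) for c, h, _ in observations)
    rained = Counter((c, h) for c, h, r in observations if r == 1)
    return {pattern: rained[pattern] / n for pattern, n in totals.items()}


def infer_rain(cpt, p_cloudy, p_humid):
    """Marginalise over today's conditions.

    p_cloudy and p_humid are today's P(condition=1).
    Pass 1.0 or 0.0 when a condition is known for certain.
    """
    total = 0.0
    for c, h in product((1, 0), repeat=2):
        weight_c = p_cloudy if c == 1 else 1.0 - p_cloudy
        weight_h = p_humid if h == 1 else 1.0 - p_humid
        total += cpt[(c, h)] * weight_c * weight_h
    return total


# (cloudy, humid, rain) for the 12 observations, 1 = condition occurred.
observations = [
    (1, 1, 1), (1, 1, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 0),
    (1, 1, 1), (0, 1, 0), (1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0),
]

cpt = learn_cpt(observations)
print(round(infer_rain(cpt, 0.6, 0.5), 3))  # both conditions uncertain: 0.307
print(round(infer_rain(cpt, 1.0, 0.5), 3))  # known to be cloudy: 0.4
```

The first result is 0.307 rather than 0.306 because this version keeps the exact 1/3 for Pattern C instead of the rounded 0.33.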