Uncertainty

uncertainty: a situation involving imperfect or unknown information

probability: a numerical description of how likely an event is to occur or how likely it is that a proposition is true

possible world: one possible outcome in a given situation, e.g., getting a ‘1’ when rolling a die; notated with the letter:

ω

set of all possible worlds: every possible world taken together; their probabilities sum to one; e.g., getting a ‘1, 2, 3, 4, 5 or 6’ when rolling a die; notated with the letter:

Ω
∑ P(ω) = 1 (summing over all ω ∈ Ω)

range of possibilities: ‘0’ means an event is certain not to happen, whereas ‘1’ means an event is absolutely certain to happen, notated as:

0 ≤ P(ω) ≤ 1

unconditional probability: the degree of belief in a proposition in the absence of any other evidence

conditional probability: the degree of belief in a proposition given some evidence that has already been revealed; the probability of ‘rain today’ given ‘rain yesterday’:

P(a|b) (probability of a given b)
P(rain today|rain yesterday)
P(a|b) = [P(a ∧ b)] / P(b)
P(a ∧ b) = P(b) P(a|b)
P(a ∧ b) = P(a) P(b|a)
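
A minimal numerical sketch of these identities in Python; the values of P(a ∧ b) and P(b) are invented for illustration:

# invented joint and marginal probabilities
p_a_and_b = 0.08   # P(a ∧ b)
p_b = 0.2          # P(b)

# conditional probability: P(a|b) = P(a ∧ b) / P(b)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)        # ≈ 0.4

# rearranged: P(a ∧ b) = P(b) P(a|b)
print(p_b * p_a_given_b)  # ≈ 0.08, recovering the joint probability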

random variable: a variable in probability theory with a domain of possible values it can take on, for example:

Weather
{sun, cloud, rain, wind, snow}

probability distribution: a mathematical function that provides the probabilities of occurrence of different possible outcomes, for example:

P(Flight = on time) = 0.6 
P(Flight = delayed) = 0.3
P(Flight = cancelled) = 0.1
or: P(Flight) = ⟨0.6, 0.3, 0.1⟩

independence: the property that knowing one event has occurred does not affect the probability of the other event

P(a ∧ b) = P(a)P(b|a), which under independence reduces to
P(a ∧ b) = P(a)P(b)
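
A quick sketch of independence with two fair dice, using Python’s fractions module for exact arithmetic (the dice example is an assumption for illustration):

from fractions import Fraction

# two fair dice: one roll tells us nothing about the other
p_first_is_six = Fraction(1, 6)
p_second_is_six = Fraction(1, 6)

# under independence, the joint probability is simply the product
print(p_first_is_six * p_second_is_six)  # 1/36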

Bayes’ rule: (or Bayes’ theorem) one of probability theory’s most important rules, describing the probability of an event based on prior knowledge of conditions that might be related to it:

P(b|a) = [P(b) P(a|b)] / P(a)

Thus, knowing…

P(cloudy morning | rainy afternoon)

… we can calculate:

P(rainy afternoon | cloudy morning)
P(rain|clouds) = [ P(clouds|rain)P(rain) ] / P(clouds)
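As a hedged sketch, here is the clouds/rain calculation in Python; all three input probabilities are invented for illustration:

# invented probabilities
p_clouds_given_rain = 0.8  # P(clouds|rain)
p_rain = 0.1               # P(rain), the prior
p_clouds = 0.4             # P(clouds)

# Bayes' rule: P(rain|clouds) = P(clouds|rain) P(rain) / P(clouds)
p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds
print(p_rain_given_clouds)  # ≈ 0.2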

joint probability: the likelihood that two events will happen at the same time

P(a, b) = P(a) P(b|a)

probability rules: a number of algebraic manipulations useful for calculating different probabilities, including negation, inclusion-exclusion, marginalization, and conditioning; a combined sketch follows the four rules below

negation: a handy probability rule to figure out the probability of an event not happening, for example:

P(¬cloud) = 1 − P(cloud)

inclusion-exclusion: another probability rule, which excludes double-counts to calculate the probability of event a or b:

P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

marginalization: a probability rule for obtaining the probability of an event by summing its joint probabilities over the possible values of another variable (see Jonny Brooks-Bartlett’s article for much more detail):

P(a) = P(a, b) + P(a, ¬b)

conditioning: our final probability rule, used when we have access to two events’ conditional probabilities rather than their joint probabilities:

P(a) = P(a|b)P(b) + P(a|¬b)P(¬b)
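
A small Python sketch tying the four rules together on one invented joint distribution over two binary events (all numbers are assumptions):

# invented joint distribution over binary events a and b
joint = {
    (True, True): 0.12,    # P(a ∧ b)
    (True, False): 0.18,   # P(a ∧ ¬b)
    (False, True): 0.28,   # P(¬a ∧ b)
    (False, False): 0.42,  # P(¬a ∧ ¬b)
}

# marginalization: P(a) = P(a, b) + P(a, ¬b)
p_a = joint[(True, True)] + joint[(True, False)]  # 0.30
p_b = joint[(True, True)] + joint[(False, True)]  # 0.40

# negation: P(¬a) = 1 − P(a)
p_not_a = 1 - p_a  # 0.70

# inclusion-exclusion: P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
p_a_or_b = p_a + p_b - joint[(True, True)]  # 0.58

# conditioning: P(a) = P(a|b)P(b) + P(a|¬b)P(¬b)
p_a_given_b = joint[(True, True)] / p_b
p_a_given_not_b = joint[(True, False)] / (1 - p_b)
print(p_a_given_b * p_b + p_a_given_not_b * (1 - p_b))  # ≈ 0.30, matching P(a)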

Bayesian network: a data structure that represents the dependencies among random variables

inference: the process of using data analysis to deduce properties of an underlying distribution of probability

query: the variable for which to compute the distribution

evidence variable: an observed variable, whose observed values make up the evidence e

hidden variable: non-evidence, non-query variable

inference by enumeration: a process for solving inference queries given a joint distribution and conditional probabilities
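
A hedged sketch of inference by enumeration in Python: sum the joint probabilities of every world consistent with the evidence, then normalize (the two-variable joint table and its numbers are invented):

# invented joint distribution over (Rain, Maintenance)
joint = {
    ("rain", "yes"): 0.08,
    ("rain", "no"): 0.12,
    ("no rain", "yes"): 0.32,
    ("no rain", "no"): 0.48,
}

def enumerate_query(query_index, evidence_index, evidence_value):
    """Compute P(query | evidence) by summing consistent worlds, then normalizing."""
    totals = {}
    for world, p in joint.items():
        if world[evidence_index] == evidence_value:
            totals[world[query_index]] = totals.get(world[query_index], 0.0) + p
    norm = sum(totals.values())
    return {value: p / norm for value, p in totals.items()}

# P(Rain | Maintenance = yes)
print(enumerate_query(0, 1, "yes"))  # ≈ {'rain': 0.2, 'no rain': 0.8}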

approximate inference: a systematic iterative method to estimate solutions, such as Monte Carlo simulation

sampling: a technique in which samples from a larger population are chosen using various probability methods

rejection sampling: (or acceptance-rejection method) a basic technique used to generate observations from a given distribution
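
A minimal rejection-sampling sketch in Python: draw samples from the model, discard any that contradict the evidence, and estimate the query from the rest (the two-variable model and its numbers are invented):

import random

# invented model: P(cloudy) and P(rain | cloudy)
P_CLOUDY = 0.4
P_RAIN_GIVEN = {True: 0.5, False: 0.1}

def sample():
    cloudy = random.random() < P_CLOUDY
    rain = random.random() < P_RAIN_GIVEN[cloudy]
    return cloudy, rain

# estimate P(cloudy | rain) by rejecting samples where it did not rain
accepted, cloudy_count = 0, 0
for _ in range(100_000):
    cloudy, rain = sample()
    if not rain:
        continue  # reject: the sample disagrees with the evidence "rain"
    accepted += 1
    cloudy_count += cloudy

print(cloudy_count / accepted)  # ≈ 0.77 (exact answer: 0.20 / 0.26)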

likelihood weighting: a form of importance sampling where various variables are sampled in a predefined order and where evidence is used to update the weights
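
A hedged likelihood-weighting sketch on the same invented model: fix the evidence variable instead of rejecting samples, and weight each sample by the likelihood of that evidence given the sampled values:

import random

# invented model, as in the rejection-sampling sketch above
P_CLOUDY = 0.4
P_RAIN_GIVEN = {True: 0.5, False: 0.1}

weight_total = 0.0
weight_cloudy = 0.0
for _ in range(100_000):
    cloudy = random.random() < P_CLOUDY  # sample the non-evidence variable
    w = P_RAIN_GIVEN[cloudy]             # evidence "rain" is fixed; weight by its likelihood
    weight_total += w
    if cloudy:
        weight_cloudy += w

print(weight_cloudy / weight_total)  # ≈ 0.77 again, with no samples wasted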

Markov assumption: the assumption that the current state depends on only a finite fixed number of previous states

Markov chain: a sequence of random variables where the distribution of each variable follows the Markov assumption
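
A short Python sketch of sampling from a Markov chain, where tomorrow’s weather depends only on today’s (the transition probabilities are invented):

import random

# invented transition model: P(tomorrow | today)
TRANSITIONS = {
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def sample_chain(start, length):
    state, chain = start, [start]
    for _ in range(length - 1):
        nxt = TRANSITIONS[state]
        state = random.choices(list(nxt), weights=list(nxt.values()))[0]
        chain.append(state)
    return chain

print(sample_chain("sun", 10))  # e.g. ['sun', 'sun', 'sun', 'rain', ...]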

hidden Markov model: a Markov model for a system with hidden states that generate some observed event

sensor Markov assumption: the assumption that the evidence variable depends only on the corresponding state

filtering: a practical application of probability information: given observations from start until now, calculate a distribution for the current state

prediction: a practical application of probability information: given observations from start until now, calculate a distribution for a future state

smoothing: a practical application of probability information: given observations from start until now, calculate a distribution for a past state

most likely explanation: a practical application of probability information: given observations from start until now, calculate the most likely sequence of states
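
All four tasks operate on a hidden Markov model; as one hedged example, here is a minimal filtering step (the forward algorithm) in Python, with invented transition and sensor probabilities:

# invented HMM: hidden weather states, observed umbrella sightings
TRANSITION = {"sun": {"sun": 0.8, "rain": 0.2},
              "rain": {"sun": 0.3, "rain": 0.7}}
SENSOR = {"sun": {"umbrella": 0.1, "no umbrella": 0.9},
          "rain": {"umbrella": 0.8, "no umbrella": 0.2}}

def filter_step(belief, observation):
    """One forward step: predict with the transition model, update with the sensor model."""
    predicted = {s: sum(belief[prev] * TRANSITION[prev][s] for prev in belief)
                 for s in belief}
    updated = {s: SENSOR[s][observation] * predicted[s] for s in belief}
    norm = sum(updated.values())
    return {s: p / norm for s, p in updated.items()}

belief = {"sun": 0.5, "rain": 0.5}  # uniform prior over the hidden state
for obs in ["umbrella", "umbrella", "no umbrella"]:
    belief = filter_step(belief, obs)
print(belief)  # distribution over today's hidden state given all observations so far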