Probability Axioms

The Kolmogorov axioms are the foundations of probability theory introduced by Andrey Kolmogorov in 1933.[1] These axioms remain central and contribute directly to mathematics, the physical sciences, and real-world probability cases.[2] An alternative approach to formalising probability, favoured by some Bayesians, is given by Cox's theorem.[3]

Axioms

The assumptions needed to set up the axioms can be summarised as follows: Let ${\displaystyle (\Omega ,F,P)}$ be a measure space with ${\displaystyle P(E)}$ being the probability of some event ${\displaystyle E}$, and ${\displaystyle P(\Omega )=1}$. Then ${\displaystyle (\Omega ,F,P)}$ is a probability space, with sample space ${\displaystyle \Omega }$, event space ${\displaystyle F}$ and probability measure ${\displaystyle P}$.[1]

First axiom

The probability of an event is a non-negative real number:

${\displaystyle P(E)\in \mathbb {R} ,P(E)\geq 0\qquad \forall E\in F}$

where ${\displaystyle F}$ is the event space. It follows that ${\displaystyle P(E)}$ is always finite, in contrast with more general measure theory. Theories which assign negative probability relax the first axiom.

Second axiom

This is the assumption of unit measure, namely that the probability that at least one of the elementary events in the entire sample space will occur is 1:

${\displaystyle P(\Omega )=1.}$

Third axiom

This is the assumption of σ-additivity:

Any countable sequence of disjoint sets (synonymous with mutually exclusive events) ${\displaystyle E_{1},E_{2},\ldots }$ satisfies
${\displaystyle P\left(\bigcup _{i=1}^{\infty }E_{i}\right)=\sum _{i=1}^{\infty }P(E_{i}).}$

Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra.[4] Quasiprobability distributions in general relax the third axiom.
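On a finite sample space the three axioms can be checked mechanically. The following is a minimal Python sketch, not a general-purpose implementation: the helper names (`powerset`, `make_measure`, `satisfies_axioms`) and the fair-die weights are illustrative assumptions, the event space is taken to be the full power set, and the third axiom is tested only in its finite, pairwise form (which is sufficient when the sample space is finite, since any disjoint sequence then contains only finitely many non-empty sets).

```python
from fractions import Fraction
from itertools import combinations


def powerset(omega):
    """Return every subset of omega as a frozenset (the event space F)."""
    items = list(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make_measure(weights):
    """Build P from per-outcome weights: P(E) is the sum of the weights in E."""
    def P(event):
        return sum((weights[w] for w in event), Fraction(0))
    return P


def satisfies_axioms(omega, P):
    """Check the three axioms for P on the power set of a finite omega."""
    events = powerset(omega)
    non_negative = all(P(E) >= 0 for E in events)      # first axiom
    unit_measure = P(frozenset(omega)) == 1            # second axiom
    additive = all(P(A | B) == P(A) + P(B)             # third axiom, finite form
                   for A in events for B in events
                   if not (A & B))                     # only disjoint pairs
    return non_negative and unit_measure and additive


# Hypothetical example: a fair six-sided die.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(satisfies_axioms(fair_die.keys(), make_measure(fair_die)))
```

Exact rational arithmetic (`Fraction`) is used so that the equality tests are not affected by floating-point rounding.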

Consequences

From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. The proofs[5][6][7] of these rules illustrate the power of the third axiom and its interaction with the other two axioms. Four immediate corollaries and their proofs are shown below:

Monotonicity

${\displaystyle \quad {\text{if}}\quad A\subseteq B\quad {\text{then}}\quad P(A)\leq P(B).}$

If A is a subset of, or equal to B, then the probability of A is less than, or equal to the probability of B.

Proof of monotonicity[5]

In order to verify the monotonicity property, we set ${\displaystyle E_{1}=A}$ and ${\displaystyle E_{2}=B\setminus A}$, where ${\displaystyle A\subseteq B}$ and ${\displaystyle E_{i}=\varnothing }$ for ${\displaystyle i\geq 3}$. It is easy to see that the sets ${\displaystyle E_{i}}$ are pairwise disjoint and ${\displaystyle E_{1}\cup E_{2}\cup \cdots =B}$. Hence, we obtain from the third axiom that

${\displaystyle P(A)+P(B\setminus A)+\sum _{i=3}^{\infty }P(E_{i})=P(B).}$

Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to ${\displaystyle P(B)}$ which is finite, we obtain both ${\displaystyle P(A)\leq P(B)}$ and ${\displaystyle P(\varnothing )=0}$.
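The decomposition used in this proof, ${\displaystyle P(B)=P(A)+P(B\setminus A)}$ for ${\displaystyle A\subseteq B}$, can be checked exhaustively on a small example. The sketch below assumes a hypothetical loaded die; the weights are illustrative only and were chosen so that they sum to 1.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical loaded die: face 1 is favoured, the weights sum to 1.
weights = {1: Fraction(1, 2), 2: Fraction(1, 10), 3: Fraction(1, 10),
           4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 10)}


def P(event):
    """Probability of an event: the sum of the weights of its outcomes."""
    return sum((weights[w] for w in event), Fraction(0))


# All 64 events (every subset of the six faces).
events = [frozenset(c) for r in range(len(weights) + 1)
          for c in combinations(weights, r)]


def monotone_everywhere():
    """Check P(B) = P(A) + P(B \\ A) and P(A) <= P(B) for every A subset of B."""
    for B in events:
        for A in events:
            if A <= B:  # A is a subset of (or equal to) B
                if P(B) != P(A) + P(B - A):
                    return False
                if P(A) > P(B):
                    return False
    return True


print(monotone_everywhere())
```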

The probability of the empty set

${\displaystyle P(\varnothing )=0.}$

In some cases, ${\displaystyle \varnothing }$ is not the only event with probability 0.

Proof of probability of the empty set

As shown in the previous proof, ${\displaystyle P(\varnothing )=0}$. This can be seen by contradiction: if ${\displaystyle P(\varnothing )=a}$, then the tail of the left-hand side ${\displaystyle P(A)+P(B\setminus A)+\sum _{i=3}^{\infty }P(E_{i})}$ is ${\displaystyle \sum _{i=3}^{\infty }P(E_{i})=\sum _{i=3}^{\infty }P(\varnothing )=\sum _{i=3}^{\infty }a={\begin{cases}0&{\text{if }}a=0,\\\infty &{\text{if }}a>0.\end{cases}}}$

If ${\displaystyle a>0}$ then we obtain a contradiction, because the sum does not exceed ${\displaystyle P(B)}$, which is finite. Thus, ${\displaystyle a=0}$. This shows, as a byproduct of the proof of monotonicity, that ${\displaystyle P(\varnothing )=0}$.

The complement rule

${\displaystyle P\left(A^{c}\right)=P(\Omega \setminus A)=1-P(A)}$

Proof of the complement rule

Given that ${\displaystyle A}$ and ${\displaystyle A^{c}}$ are mutually exclusive and that ${\displaystyle A\cup A^{c}=\Omega }$:

${\displaystyle P(A\cup A^{c})=P(A)+P(A^{c})}$ ... (by axiom 3)

and, ${\displaystyle P(A\cup A^{c})=P(\Omega )=1}$ ... (by axiom 2)

${\displaystyle \Rightarrow P(A)+P(A^{c})=1}$

${\displaystyle \therefore P(A^{c})=1-P(A)}$

The numeric bound

It immediately follows from the monotonicity property (since ${\displaystyle E\subseteq \Omega }$ and ${\displaystyle P(\Omega )=1}$) that

${\displaystyle 0\leq P(E)\leq 1\qquad \forall E\in F.}$

Proof of the numeric bound

Given the complement rule ${\displaystyle P(E^{c})=1-P(E)}$ and axiom 1 ${\displaystyle P(E^{c})\geq 0}$:

${\displaystyle 1-P(E)\geq 0}$

${\displaystyle \Rightarrow 1\geq P(E)}$

${\displaystyle \therefore 0\leq P(E)\leq 1}$
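The complement rule and the numeric bound can also be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the outcome labels, the helper names and the randomly drawn weights are all hypothetical, and the random generator is seeded so that the run is reproducible.

```python
import random
from fractions import Fraction
from itertools import combinations


def random_measure(outcomes, rng):
    """Draw positive integer weights and normalise them so they sum to 1."""
    raw = [rng.randint(1, 100) for _ in outcomes]
    total = sum(raw)
    return {w: Fraction(r, total) for w, r in zip(outcomes, raw)}


def check_complement_and_bounds(outcomes, weights):
    """Check P(E^c) = 1 - P(E) and 0 <= P(E) <= 1 for every event E."""
    omega = frozenset(outcomes)

    def P(event):
        return sum((weights[w] for w in event), Fraction(0))

    for r in range(len(outcomes) + 1):
        for c in combinations(outcomes, r):
            E = frozenset(c)
            if P(omega - E) != 1 - P(E):   # complement rule
                return False
            if not (0 <= P(E) <= 1):       # numeric bound
                return False
    return True


rng = random.Random(0)                     # fixed seed, illustrative only
outcomes = ["a", "b", "c", "d", "e"]
weights = random_measure(outcomes, rng)
print(check_complement_and_bounds(outcomes, weights))
```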

Further consequences

Another important property is:

${\displaystyle P(A\cup B)=P(A)+P(B)-P(A\cap B).}$

This is called the addition law of probability, or the sum rule. That is, the probability that A or B will happen is the sum of the probabilities that A will happen and that B will happen, minus the probability that both A and B will happen. The proof of this is as follows:

Firstly,

${\displaystyle P(A\cup B)=P(A)+P(B\setminus A)}$ ... (by Axiom 3)

So,

${\displaystyle P(A\cup B)=P(A)+P(B\setminus (A\cap B))}$ (by ${\displaystyle B\setminus A=B\setminus (A\cap B)}$).

Also,

${\displaystyle P(B)=P(B\setminus (A\cap B))+P(A\cap B)}$

and eliminating ${\displaystyle P(B\setminus (A\cap B))}$ from both equations gives us the desired result.

An extension of the addition law to any number of sets is the inclusion-exclusion principle.
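As a worked illustration of the addition law and its extension, the sketch below assumes a fair six-sided die and three hypothetical events ${\displaystyle A}$ (even face), ${\displaystyle B}$ (face greater than 3) and ${\displaystyle C}$ (face at most 2). The function `inclusion_exclusion` is an illustrative helper that sums signed probabilities of intersections; it is not taken from any library.

```python
from fractions import Fraction
from itertools import combinations


def P(event):
    """Fair six-sided die: each face has probability 1/6."""
    return Fraction(len(event), 6)


def inclusion_exclusion(sets):
    """Probability of the union, computed from probabilities of intersections."""
    total = Fraction(0)
    for k in range(1, len(sets) + 1):
        sign = (-1) ** (k + 1)             # + for odd k, - for even k
        for group in combinations(sets, k):
            total += sign * P(frozenset.intersection(*group))
    return total


A = frozenset({2, 4, 6})   # even face
B = frozenset({4, 5, 6})   # face greater than 3
C = frozenset({1, 2})      # face at most 2

# Two sets: P(A or B) = P(A) + P(B) - P(A and B) = 1/2 + 1/2 - 1/3 = 2/3
print(P(A | B), P(A) + P(B) - P(A & B))

# Three sets: both sides equal 5/6
print(P(A | B | C), inclusion_exclusion([A, B, C]))
```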

Setting ${\displaystyle B}$ to the complement ${\displaystyle A^{c}}$ of ${\displaystyle A}$ in the addition law gives

${\displaystyle P\left(A^{c}\right)=P(\Omega \setminus A)=1-P(A)}$

That is, the probability that any event will not happen (or the event's complement) is 1 minus the probability that it will.

Simple example: coin toss

Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair.

We may define:

${\displaystyle \Omega =\{H,T\}}$
${\displaystyle F=\{\varnothing ,\{H\},\{T\},\{H,T\}\}}$

Kolmogorov's axioms imply that:

${\displaystyle P(\varnothing )=0}$

The probability of neither heads nor tails is 0.

${\displaystyle P(\{H,T\})=1}$

The probability of either heads or tails is 1.

${\displaystyle P(\{H\})+P(\{T\})=1}$

The sum of the probability of heads and the probability of tails is 1.
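The coin-toss example can be written out explicitly. The sketch below is a minimal illustration: `coin_measure` is a hypothetical helper, and the bias of 3/10 for heads is an arbitrary assumption, since the example makes no claim about whether the coin is fair.

```python
from fractions import Fraction


def coin_measure(p_heads):
    """Return P as a dict over the four events of a single coin toss."""
    if not 0 <= p_heads <= 1:
        raise ValueError("p_heads must lie in [0, 1]")
    return {
        frozenset(): Fraction(0),              # neither heads nor tails
        frozenset({"H"}): p_heads,
        frozenset({"T"}): 1 - p_heads,
        frozenset({"H", "T"}): Fraction(1),    # either heads or tails
    }


# Hypothetical bias: heads with probability 3/10.
P = coin_measure(Fraction(3, 10))

print(P[frozenset()])                            # probability of the empty event
print(P[frozenset({"H", "T"})])                  # probability of the whole sample space
print(P[frozenset({"H"})] + P[frozenset({"T"})]) # heads plus tails
```

Any value of `p_heads` between 0 and 1 yields a valid probability measure on this event space; the axioms constrain the measure but do not determine the bias.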

References

1. Kolmogorov, Andrey (1950) [1933]. Foundations of the theory of probability. New York, USA: Chelsea Publishing Company.
2. Aldous, David. "What is the significance of the Kolmogorov axioms?". David Aldous. Retrieved 2019.
3. Terenin, Alexander; Draper, David (2015). "Cox's Theorem and the Jaynesian Interpretation of Probability". arXiv:1507.06597. Bibcode:2015arXiv150706597T.
4. Hájek, Alan (August 28, 2019). "Interpretations of Probability". Stanford Encyclopedia of Philosophy. Retrieved 2019.
5. Ross, Sheldon M. (2014). A first course in probability (Ninth ed.). Upper Saddle River, New Jersey. pp. 27, 28. ISBN 978-0-321-79477-2. OCLC 827003384.
6. Gerard, David (December 9, 2017). "Proofs from axioms" (PDF). Retrieved 2019.
7. Jackson, Bill (2010). "Probability (Lecture Notes - Week 3)" (PDF). School of Mathematics, Queen Mary University of London. Retrieved 2019.