This blog post will serve as a quick tutorial to basic probability and random variables, and encoding them in Racket. It assumes basic knowledge with sets and programming.

# Basic probability

*Definition 1*: A **sample space** is defined as the set of all possible events of an experiment.

For example, in rolling 6-sided dice, the set would be .

*Definition 2*: An **event** is a subset of the sample space.

For example, in rolling 6-sided dice where the number is even, the set would be .

*Definition 3*: The **probability** of a given event in a sample space is a number between 0 and 1 which represents the chance of that event happening, and is denoted . The number is calculated as the quotient between the length of the event and the length of the sample space.

For example, . This is the chance of rolling an even number in a 6-sided dice.

*Definition 4*: The **homogeneous probability distribution** is the probability distribution that assigns to each region of the space a probability proportional to the volume of the region.

We will rely on Definition 4 for simplicity. This means that the probability of rolling any number in is simply .

We can now encode these definitions in Scheme as follows:

(define (calculate-probability omega) (/ 1 (length omega))) (define (calculate-events omega pred) (filter pred omega)) (define (calculate-probability-events omega events) (/ (length events) (length omega)))

We have our `omega`

and `events`

sets defined as:

; Rolling a dice (define omega '(1 2 3 4 5 6)) (define events (calculate-events omega even?))

Now we can calculate these probabilities as follows:

; Probability of rolling a single number in a 6-sided dice (calculate-probability omega) ; Probability of rolling an even number in a 6-sided dice (calculate-probability-events omega events)

As expected, we get and respectively.

# Random variables

Let us now consider rolling two dices. This is simply :

; Rolling two dice (define omega-2 (cartesian-product omega omega))

That set is a little bigger, it contains 36 elements: . The probability of rolling any pair is . For example, in rolling two dice, the chance to roll a 1 and then roll a 2 is .

Now, what if we want to calculate the chance of some specific event upon these elements? E.g. that the sum of the numbers is even. One way to do that is by using the previous way with events: . But, this is a bit inconvenient.

*Definition 5*: A **random variable** is a function from events to a set . Usually, , i.e. we’re dealing with real numbers.

Why are random variables useful? They allow us to map events to some number, potentially different from probability. The code is simply:

(define (calculate-random-variable omega transform) (map transform omega))

Now, we define our `omega-random-var`

as follows:

(define omega-random-var (calculate-random-variable omega-2 (lambda (x) (foldr + 0 x))))

This set will simply contain the sums for the pairs . We will make it a bit convenient so that the connection between a pair and a specific sum is apparent:

(define omega-2-transformed (map cons omega-2 omega-random-var)) (sort omega-2-transformed >= #:key cdr)

This will list the following pairs:

'(((6 6) . 12) ((6 5) . 11) ((5 6) . 11) ((6 4) . 10) ((5 5) . 10) ((4 6) . 10) ((6 3) . 9) ((5 4) . 9) ((4 5) . 9) ((3 6) . 9) ((6 2) . 8) ((5 3) . 8) ((4 4) . 8) ((3 5) . 8) ((2 6) . 8) ((6 1) . 7) ((5 2) . 7) ((4 3) . 7) ((3 4) . 7) ((2 5) . 7) ((1 6) . 7) ((5 1) . 6) ((4 2) . 6) ((3 3) . 6) ((2 4) . 6) ((1 5) . 6) ((4 1) . 5) ((3 2) . 5) ((2 3) . 5) ((1 4) . 5) ((3 1) . 4) ((2 2) . 4) ((1 3) . 4) ((2 1) . 3) ((1 2) . 3) ((1 1) . 2))

The probability of rolling two dices that produce the sum 12 is , for sum 11 it is , etc.

Now, the **distribution** will show how “scattered” the probabilities are. As we can see, in the previous list, the biggest chance is to roll a sum of 7 since that happens 6 times, i.e. .

# Conclusion

It was interesting to represent calculation of probability and random variables through the use of `map`

and `filter`

.

I am sure there are other (and probably better and more flexible) encodings of probability, but it was a good learning experience nevertheless.