Abstractions with Set Theory

Like in programming, building abstractions in mathematics is of equal importance.

We will start with the most basic object (the unordered collection) and work our way up to defining functions.


A set is an unordered collection of objects. This might sound too abstract but that’s what it is. The objects can be anything. It is usually denoted by comma separating the list of objects and enclosing them using curly braces.

For example, one set of fruits is \{ apple, banana \}.

Since it is an unordered collection we have that \{ apple, banana \} = \{ banana, apple \}.

For membership, we denote apple \in \{ apple, banana \} to say that apple is in that collection.


An n-tuple is an ordered collection of n objects. As with sets, the objects can be anything. It is usually denoted by comma separating the list of objects and enclosing them using parenthesis.

We can represent it using unordered collections as follows: Consider the tuple a = (a_1, a_2, ..., a_n). We can use the set A = \{ \{1, \{a_1\}\}, \{2, \{a_2\}\}, ..., \{n, \{a_n\}\} \}. Note that here I’m using natural numbers and I assume them to be unique atoms. Otherwise, if we represented naturals in terms of set theory, there would be issues with this definition. For that, we’d have to use Kuratowski’s definition, but we can skip that for simplicity’s sake.

Now to extract the k-th element of the tuple, we can pick x s.t. \{k, \{x\}\} \in A.

So now we have that (a, b) = (c, d) \leftrightarrow a = b \land c = d, that is two tuples are equal if their first and second elements respectively are equal. This is what makes tuples ordered.

One example of a tuple is (1 pm, 2 pm, 3 pm) which represents 3 hours of a day sequentially.


An n-ary relation is just a set of n-tuples with different values. We are mostly interested in binary relations.

One example of such a relation is the “is bigger than”. So we have the following set: \{ (cat, mouse), (mouse, cheese), (cat, cheese) \}.


Now we can define functions in terms of relations.

To do that we first have to discuss subset. A is a subset of B if all elements of A are found in B (but not necessarily vice-versa). We denote it as such: A \subseteq B. So for example: \{ 1, 2 \} \subseteq \{ 1, 2, 3 \} and \{ 1, 2, 3 \} \subseteq \{ 1, 2, 3 \}. But this doesn’t hold: \{ 1, 2, 3 \} \subseteq \{ 1, 2 \}.

So with this, a function is a mapping from a set A to a set B, or in other words it is a subset of all combinations of ordered pairs whose first element is an element of A and second element is an element of B.

For example, if A = \{ a, b \} \land B = \{ 1, 2, 3 \} then the combinations are: F = \{ (a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3) \}. A function f from A to B is denoted f : A \rightarrow B and is a subset of F: f \subseteq F.

This combination of pairs is called a Cartessian product, and is defined as the set: \{ (a, b) \mid a \in A \land b \in B \}.

We have one more constraint to add to a function, namely that it cannot produce 2 or more different values for a single input. So using either of f(a) = 1 or f(a) = 2 or f(a) = 3 is okay, but not all three of them. That means that we only have to use one of \{ (a, 1), (a, 2), (a, 3) \} from our example set above. The same reasoning goes for b. So f = \{ (a, 1), (b, 2) \} with f(a) = 1 \land f(b) = 2 is one valid example.


Matrices (tuples of tuples), lists (sets or tuples), graphs (set of sets), digraphs (set of tuples), trees (graphs) can also be derived similarly.

This is what makes the set a powerful and interesting idea.

Soft skills

This month my professional career as a Software Engineer sums up to 10 years (although I’ve been programming longer than that). Here is a summary of what I think is really important for a good career.

Technical skills are important.
As you might have noticed, my blog is mostly technical.

But in this article I’d like to praise soft skills.

Being good with everybody, communicative, adaptable, helpful, truthful, modest, and being a good listener, observer, and learner is just some of the most important skills.

I would also like to quote our creed here at Automattic:

I will never stop learning. I won’t just work on things that are assigned to me. I know there’s no such thing as a status quo. I will build our business sustainably through passionate and loyal customers. I will never pass up an opportunity to help out a colleague, and I’ll remember the days before I knew everything. I am more motivated by impact than money, and I know that Open Source is one of the most powerful ideas of our generation. I will communicate as much as possible, because it’s the oxygen of a distributed company. I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.

If you try to look 10 years back, you will mostly see memories of events and people and the time spent with them, not how you optimized that loop or that DB query (not that the latter is not important, thus the reason I said “mostly”).

Find inspiration and motivation in your successes, yourself, your family, nature, and other people. Maintain inner peace, and most of the soft skills will come naturally.

Predicting values with linear regression

I had some fun time reading http://onlinestatbook.com/2/regression/intro.html today. It includes formulas for calculating linear regression of a data set.

Linear regression is used for predicting a value of a variable from a list of known values.

For example if a and b are related variables, then linear regression can predict the value of the one given the value for the other.

Here’s an implementation in Racket:

#lang racket
(require plot)

(define (sum l) (apply + l))

(define (average l) (/ (sum l) (length l)))

(define (square x) (* x x))

(define (variance l)
  (let ((avg (average l)))
     (sum (map (lambda (x) (square (- x avg))) l))
     (- (length l) 1))))
(define (standard-deviation l) (sqrt (variance l)))

(define (correlation l)
      ((X (map car l))
       (Y (map cadr l))
       (avgX (average X))
       (avgY (average Y))
       (x (map (lambda (x) (- x avgX)) X))
       (y (map (lambda (y) (- y avgY)) Y))
       (xy (map (lambda (x) (apply * x)) (map list x y)))
       (x-squared (map square x))
       (y-squared (map square y)))
    (/ (sum xy) (sqrt (* (sum x-squared) (sum y-squared))))))

(define (linear-regression l)
      ((X (map car l))
       (Y (map cadr l))
       (avgX (average X))
       (avgY (average Y))
       (sX (standard-deviation X))
       (sY (standard-deviation Y))
       (r (correlation l))
       (b (* r (/ sY sX)))
       (A (- avgY (* b avgX))))
    (lambda (x) (+ (* x b) A))))

(define (plot-points-and-linear-regression the-points)
  (plot (list
         (points the-points #:color 'red)
         (function (linear-regression the-points) 0 10 #:label "y = linear-regression(x)"))))

So, for example if we call it with this data set:

(define the-points '(
                 ( 1.00 1.00 )
                 ( 2.00 2.00 )
                 ( 3.00 1.30 )
                 ( 4.00 3.75 )
                 ( 5.00 2.25 )))

(plot-points-and-linear-regression the-points)

This is the graph that we get:

Cool, right?