Researching automated theorem provers

Lately I’ve been messing around with automated theorem provers, specifically Prover9.

The moment you open it in GUI mode, you will see two big text boxes: Assumptions and Goals. Naturally, one starts playing with simple stuff.

We leave the assumptions blank and set P -> (Q -> P). as our goal (-> is Prover9's implication). If we click Start, we see that it proves it immediately.

% -------- Comments from original proof --------
% Proof 1 at 0.00 (+ 0.00) seconds.
% Length of proof is 2.
% Level of proof is 1.
% Maximum clause weight is 0.
% Given clauses 0.

1 P -> (Q -> P) # label(non_clause) # label(goal). [goal].
2 $F. [deny(1)].

============================== end of proof ==========================

That was fun, but what stood out for me is that in mathematics we often take axioms for granted, i.e. we rarely think about the logic behind them. With Prover9, we need to express all of it explicitly in logic.

Prover9 has a general naming rule: symbols that start with a lowercase letter from u through z are variables. Everything else (like A or Zero) is treated as a constant (an atom).

To start, let’s define set membership. In mathematics, \in denotes set membership, and that’s all there is to it; membership is primitive, like an atom. So in Prover9, let’s say that member(x, y) denotes that x is a member of y.

Now let’s define our sets A = {1}, B = {1, 2}, and C = {1}.

all x ((x = 1) <-> member(x, A)).
all x ((x = 1 | x = 2) <-> member(x, B)).
all x ((x = 1) <-> member(x, C)).

Now if we try to prove member(1, A), it will succeed. But it will not succeed for member(2, A).
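Put together, a minimal input file for this looks as follows (the set definitions go in the Assumptions box, the statement to prove in the Goals box):

```
% Assumptions
all x ((x = 1) <-> member(x, A)).
all x ((x = 1 | x = 2) <-> member(x, B)).
all x ((x = 1) <-> member(x, C)).

% Goals
member(1, A).
```

Swapping the goal for member(2, A) gives an input for which Prover9 finds no proof.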

What do we know about the equality of two sets? They are equal if there is no element that is in the first but not in the second, or in the second but not in the first.

In other words:

set_equal(x, y) <-> - (exists z ((member(z, x) & - member(z, y)) | (- member(z, x) & member(z, y)))).

So, Prover9 can prove set_equal(A, C), but not set_equal(A, B).
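As a sketch, with the definitions of A, B, and C from before, the Assumptions and Goals boxes would contain:

```
% Assumptions (in addition to the definitions of A, B, and C)
set_equal(x, y) <-> - (exists z ((member(z, x) & - member(z, y)) | (- member(z, x) & member(z, y)))).

% Goals
set_equal(A, C).
```

Note that the free variables x and y in the assumption are implicitly universally quantified by Prover9.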

Another example: we can define the set of natural numbers with just:

member(Zero, Nat).
member(x, Nat) -> member(Suc(x), Nat).

So, it can easily prove:
0: member(Zero, Nat)
1: member(Suc(Zero), Nat)
2: member(Suc(Suc(Zero)), Nat), etc.
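As a complete input, this is just the two assumptions above plus a goal, for example:

```
% Assumptions
member(Zero, Nat).
member(x, Nat) -> member(Suc(x), Nat).

% Goals
member(Suc(Suc(Zero)), Nat).
```

Prover9 proves the goal by applying the second assumption twice to the first.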


Relation between strict and non-strict identities

First, we’ll start with some definitions:

== is a value check. When a and b have the same value, we will write V(a, b). Thus a == b <=> V(a, b).

=== is a value-and-type check. When a and b have the same type, we will write T(a, b). Thus a === b <=> V(a, b) and T(a, b).

Now, to prove a === b => a == b, suppose that a === b. By the definitions, we have V(a, b) and T(a, b) as givens. In particular V(a, b) holds, i.e. a == b.

Taking the contrapositive gives a != b => a !== b, which therefore also holds.

However, note that the converse a == b => a === b doesn’t necessarily hold. To see why, suppose a == b, that is, V(a, b). Now we would need to prove both V(a, b) and T(a, b). We have V(a, b) as a given, but nothing gives us T(a, b), i.e. the types may not match.

So, whenever you see a === b, you can safely assume that a == b is also true. Likewise, whenever you see a != b, you can safely assume that a !== b 🙂
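A quick sketch of these definitions in JavaScript (the values 1 and "1" are just illustrative; any pair with equal value but different type behaves the same):

```javascript
const a = 1;    // a number
const b = "1";  // a string that coerces to the same value

console.log(a == b);  // true:  V(a, b) holds after type coercion
console.log(a === b); // false: T(a, b) fails, the types differ

// The forward direction a === b => a == b:
const c = 1, d = 1;
console.log(c === d); // true
console.log(c == d);  // true, as the implication predicts
```

The reason the converse can fail is exactly that == performs type coercion before comparing, while === does not.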

Predicting values with linear regression

I had some fun reading today; the material includes formulas for calculating the linear regression of a data set.

Linear regression is used for predicting the value of one variable from known values of another.

For example, if a and b are related variables, then given a value for one of them, linear regression can predict a value for the other.

Here’s an implementation in Racket:

#lang racket
(require plot)

(define (sum l) (apply + l))

(define (average l) (/ (sum l) (length l)))

(define (square x) (* x x))

(define (variance l)
  (let ((avg (average l)))
    (/ (sum (map (lambda (x) (square (- x avg))) l))
       (- (length l) 1))))
(define (standard-deviation l) (sqrt (variance l)))

(define (correlation l)
  (let* ((X (map car l))
         (Y (map cadr l))
         (avgX (average X))
         (avgY (average Y))
         (x (map (lambda (x) (- x avgX)) X))
         (y (map (lambda (y) (- y avgY)) Y))
         (xy (map * x y))
         (x-squared (map square x))
         (y-squared (map square y)))
    (/ (sum xy) (sqrt (* (sum x-squared) (sum y-squared))))))

(define (linear-regression l)
  (let* ((X (map car l))
         (Y (map cadr l))
         (avgX (average X))
         (avgY (average Y))
         (sX (standard-deviation X))
         (sY (standard-deviation Y))
         (r (correlation l))
         (b (* r (/ sY sX)))
         (A (- avgY (* b avgX))))
    (lambda (x) (+ (* x b) A))))

(define (plot-points-and-linear-regression the-points)
  (plot (list
         (points the-points #:color 'red)
         (function (linear-regression the-points) 0 10 #:label "y = linear-regression(x)"))))

So, for example if we call it with this data set:

(define the-points '(
                 ( 1.00 1.00 )
                 ( 2.00 2.00 )
                 ( 3.00 1.30 )
                 ( 4.00 3.75 )
                 ( 5.00 2.25 )))

(plot-points-and-linear-regression the-points)
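For this data set the coefficients can be checked by hand: avgX = 3 and avgY = 10.3/5 = 2.06. In b = r * (sY / sX) the (n - 1) factors cancel, so the slope reduces to b = Σ(x - avgX)(y - avgY) / Σ(x - avgX)² = 4.25/10 = 0.425, and the intercept is A = avgY - b * avgX = 2.06 - 1.275 = 0.785, giving the fitted line y = 0.425x + 0.785.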

This is the graph that we get:

Cool, right?