Predicting values with linear regression

I had some fun reading today. The material includes formulas for calculating the linear regression of a data set.

Linear regression is used to predict the value of a variable from a list of known values.

For example, if a and b are related variables, linear regression can predict the value of one given the value of the other.
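
For reference, the quantities involved can be written out as follows. These are the same formulas the Racket implementation below computes; the symbols are my own shorthand, with s_X and s_Y standing for the sample standard deviations and r for the correlation coefficient.

```latex
\begin{aligned}
r &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} \\
b &= r \cdot \frac{s_Y}{s_X} \\
A &= \bar{y} - b\,\bar{x} \\
\hat{y}(x) &= b\,x + A
\end{aligned}
```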

Here’s an implementation in Racket:

#lang racket
(require plot)

(define (sum l) (apply + l))

(define (average l) (/ (sum l) (length l)))

(define (square x) (* x x))

; sample variance (divides by n - 1)
(define (variance l)
  (let ((avg (average l)))
    (/ (sum (map (lambda (x) (square (- x avg))) l))
       (- (length l) 1))))
(define (standard-deviation l) (sqrt (variance l)))

(define (correlation l)
  (let* ((X (map car l))
       (Y (map cadr l))
       (avgX (average X))
       (avgY (average Y))
       (x (map (lambda (x) (- x avgX)) X))
       (y (map (lambda (y) (- y avgY)) Y))
       (xy (map (lambda (x) (apply * x)) (map list x y)))
       (x-squared (map square x))
       (y-squared (map square y)))
    (/ (sum xy) (sqrt (* (sum x-squared) (sum y-squared))))))

(define (linear-regression l)
  (let* ((X (map car l))
       (Y (map cadr l))
       (avgX (average X))
       (avgY (average Y))
       (sX (standard-deviation X))
       (sY (standard-deviation Y))
       (r (correlation l))
       (b (* r (/ sY sX)))
       (A (- avgY (* b avgX))))
    (lambda (x) (+ (* x b) A))))

(define (plot-points-and-linear-regression the-points)
  (plot (list
         (points the-points #:color 'red)
         (function (linear-regression the-points) 0 10 #:label "y = linear-regression(x)"))))

So, for example, if we call it with this data set:

(define the-points '(
                 ( 1.00 1.00 )
                 ( 2.00 2.00 )
                 ( 3.00 1.30 )
                 ( 4.00 3.75 )
                 ( 5.00 2.25 )))

(plot-points-and-linear-regression the-points)

This is the graph that we get:

Cool, right?
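
As a quick cross-check of the numbers behind the plot, here is a minimal Python sketch of the same calculation. It is a port I wrote for illustration, not part of the Racket program; the function names are my own, and it returns the slope and intercept instead of a closure.

```python
from math import sqrt


def average(values):
    return sum(values) / len(values)


def linear_regression(points):
    """Return (slope, intercept), following the same steps as the Racket code."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    avg_x, avg_y = average(xs), average(ys)

    # Deviations from the mean
    dx = [x - avg_x for x in xs]
    dy = [y - avg_y for y in ys]
    sum_xx = sum(d * d for d in dx)
    sum_yy = sum(d * d for d in dy)
    sum_xy = sum(a * b for a, b in zip(dx, dy))

    # Sample standard deviations (divide by n - 1) and correlation
    n = len(points)
    s_x = sqrt(sum_xx / (n - 1))
    s_y = sqrt(sum_yy / (n - 1))
    r = sum_xy / sqrt(sum_xx * sum_yy)

    slope = r * (s_y / s_x)
    intercept = avg_y - slope * avg_x
    return slope, intercept


the_points = [(1.00, 1.00), (2.00, 2.00), (3.00, 1.30), (4.00, 3.75), (5.00, 2.25)]
slope, intercept = linear_regression(the_points)
print(slope, intercept)
```

For this data set the slope comes out at roughly 0.425 and the intercept at roughly 0.785, so the fitted line is approximately y = 0.425x + 0.785.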

Correctness on iterative and recursive processes

Iterative processes are usually proven correct using loop invariants, and recursive processes using induction. Finding a good loop invariant can be tricky, whereas proving a recursive process often comes down to following the definition of the process itself.

Consider the following recursive definition:

maxList [x] = x
maxList (x:xs) = max x (maxList xs)

We can prove its correctness using induction:

– Base case: The maximum element of a list of size 1 is the element itself.

– Inductive step: Assume that maxList xs is the maximum element of xs.

Then for maxList (x:xs) we have 2 cases:
1. maxList of xs is >= x, in which case we select maxList xs
2. x is >= maxList xs, in which case we select x
In either case, we pick the larger element which will be the maximum.
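
The same recursive definition can be sketched in Python to make the two cases of the proof easy to try out. This is an illustrative translation with a name of my own (max_list), and like the definition above it assumes a non-empty list.

```python
def max_list(xs):
    """Recursive maximum of a non-empty list, mirroring maxList."""
    # Base case: a one-element list is its own maximum.
    if len(xs) == 1:
        return xs[0]
    # Inductive step: compare the head with the maximum of the tail.
    head, tail = xs[0], xs[1:]
    return max(head, max_list(tail))


print(max_list([1, 2, 1.3, 3.75, 2.25]))
```

Each recursive call works on a strictly shorter list, so the recursion always reaches the base case.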

Now consider the following iterative definition:

var max = x[0], i;

for (i = 0; i < x.length; i++) {
     if (x[i] >= max) max = x[i];
}

In this case we need to find a loop invariant that holds before, during, and after the execution of that code block.

We can use the following loop invariant: after the iteration for index i, max is the biggest element in the subarray x(0, i), bounds inclusive.

– Before loop: max is x[0], which is trivially the biggest element of the one-element subarray x(0, 0). So the loop invariant holds.

– Within the loop, we have two cases:
1. x[i] >= max, in which case we set max to be x[i]
2. x[i] < max, in which case we don't change max
In either case, the loop invariant holds.

– After loop: max is the biggest element in the subarray x(0, x.length - 1), which is just x.
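
One way to get a feel for the invariant is to check it at runtime. The Python sketch below is my own illustration (the name max_iterative is hypothetical): it runs the same loop and asserts the invariant after every iteration, using the built-in max only as an oracle for the check.

```python
def max_iterative(x):
    """Iterative maximum of a non-empty list, asserting the loop invariant."""
    current_max = x[0]
    for i in range(len(x)):
        if x[i] >= current_max:
            current_max = x[i]
        # Invariant: current_max is the biggest element of x[0..i] inclusive.
        assert current_max == max(x[: i + 1])
    # After the loop the invariant covers the whole list.
    return current_max


print(max_iterative([1, 2, 1.3, 3.75, 2.25]))
```

A passing assertion is not a proof, but a failing one would be a quick counterexample to a badly chosen invariant.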

Capturing abstractions in PHP

I came across Yay today, which allows us to use macros in PHP. Cool, right?

Start with

composer require yay/yay:dev-master

Then you can use

vendor/bin/yay

to pre-process your code with macros.

Now we can implement lazy evaluation (kinda) in PHP!

We start with:

<?php
macro {
    __delay( ···expression )
} >> {
    function() { return ···expression; }
}

function force( $x ) {
    return $x();
}

function not_a_delay( $x ) {
    return function() use ( $x ) {
        return is_callable( $x ) ? $x() : "Not a callable function";
    };
}

echo "__delay start\n";
$x = __delay( printf( "The time function returns: %d\n", time() ) );
echo "__delay end\n";

echo "\n----------\n\n";

echo "force start\n";
echo 'Retval: ' . force( $x ) . "\n"; // print here!
echo "force end\n";

echo "\n----------\n\n";

echo "not_a_delay start\n";
$x = not_a_delay( printf( "The time function returns: %d\n", time() ) ); // print here!
echo "not_a_delay end\n";

echo "force start\n";
echo 'Retval: ' . force( $x ) . "\n";
echo "force end\n";

So if we look at the output:

boro@bor0:~/Desktop/test$ vendor/bin/yay test.php | php
__delay start
__delay end

----------

force start
The time function returns: 1494937031
Retval: 38
force end

----------

not_a_delay start
The time function returns: 1494937031
not_a_delay end
force start
Retval: Not a callable function
force end

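The two lines to compare are the assignments $x = __delay( ... ) and $x = not_a_delay( ... ). As we can see from the output, the printf wrapped in __delay got “delayed” until force was called, while the printf passed to not_a_delay was evaluated eagerly, before not_a_delay itself ran. Retval is 38 in the first case because printf returns the length of the string it printed; in the second case that integer is what reaches not_a_delay, so it is not callable.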
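
The same contrast can be sketched without macros. In the Python illustration below the lambda is written by hand, which is roughly what the __delay macro expands to; all the names (expensive, run_delayed, run_eager) are hypothetical, and a log list stands in for the printed output.

```python
def force(thunk):
    return thunk()


def not_a_delay(value):
    # By the time we get here, the argument has already been evaluated.
    return lambda: value() if callable(value) else "Not a callable function"


def run_delayed():
    log = []

    def expensive():
        log.append("expensive ran")
        return 38

    log.append("delay start")
    thunk = lambda: expensive()  # hand-written stand-in for __delay( expensive() )
    log.append("delay end")
    value = force(thunk)         # expensive() runs only now
    log.append("force end")
    return log, value


def run_eager():
    log = []

    def expensive():
        log.append("expensive ran")
        return 38

    log.append("not_a_delay start")
    thunk = not_a_delay(expensive())  # expensive() runs before not_a_delay is entered
    log.append("not_a_delay end")
    value = force(thunk)
    return log, value


print(run_delayed())
print(run_eager())
```

In run_delayed the "expensive ran" entry appears only after "delay end", while in run_eager it appears between the start and end markers, matching the order seen in the PHP output.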