Deriving derivative

To get here, we first had to work out the point-slope formula and then figure out limits. Derivatives are very powerful. This post was inspired by doing gradient descent on artificial neural networks, but I won’t cover that here. Instead, we will focus on the definition of the derivative itself.

So let’s get started. A secant is a line that passes through two points of a curve. In the graph below, the points are A = (x, f(x)) and A' = (x + dx, f(x + dx)).

To derive a formula for this, we can use the point-slope form of an equation of a line: y - y_0 = \frac {y_1 - y_0} {x_1 - x_0} (x - x_0).

Taking A' as the point (x_0, y_0) and plugging in A = (x, f(x)), we get: f(x) - f(x + dx) = - \frac {f(x + dx) - f(x)} {dx} (dx).

What is interesting about this secant formula is that, as we will see, it gives us a neat approximation of f(x + dx) using only values at x.
Let’s define f_{new}(x, dx) = \frac {f(x + dx) - f(x)} {dx}. Rearranging, we now have: f(x + dx) = f(x) + f_{new}(x, dx) (dx).
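To get a feel for f_{new}, here is a minimal Python sketch. It assumes f(x) = x^2 and x = 1, both illustrative choices. It evaluates the secant slope for shrinking dx. For this f, f_{new}(1, dx) = 2 + dx, so the values drift toward 2. Code can only sample a few values of dx, so this previews the limit rather than computing it.

```python
def f(x: float) -> float:
    """Example function f(x) = x^2 (an illustrative choice)."""
    return x * x


def f_new(x: float, dx: float) -> float:
    """Slope of the secant through (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx


# Shrink dx and watch the secant slope approach the tangent slope at x = 1.
for dx in [0.5, 0.1, 0.01, 0.001]:
    print(f"dx = {dx:<6} f_new(1, dx) = {f_new(1.0, dx):.6f}")
```

The secant slope is computed directly from its definition, so any function can be swapped in for `f`.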

The limit of f_{new} as dx approaches 0, when it exists, gives us the actual slope of the tangent line to f at x.

So, let’s define f'(x) = \lim_{dx \to 0} f_{new}(x, dx). This slope is our definition of the derivative, and it lies at the heart of calculus.

The image below (taken from Wikipedia) illustrates this, with h playing the role of dx.

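Back to the secant approximation: replacing f_{new}(x, dx) with its limit f'(x), we now have f(x + dx) \approx f(x) + f'(x) (dx). This is an approximation rather than an equality because we took the limit of the slope term while keeping dx itself nonzero. The error is exactly (f_{new}(x, dx) - f'(x)) (dx). As dx \to 0, this error goes to 0 faster than dx does, so the approximation becomes exact in the limit.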

For example, to calculate the square of 1.5, we let x = 1 and dx = 0.5. If f(x) = x^2, then f'(x) = 2x. So f(1 + 0.5) = f(x + dx) \approx f(1) + f'(1) (0.5) = 1 + 2 \cdot 1 \cdot 0.5 = 2. The exact value is 2.25, so that’s an error of just 0.25 for dx = 0.5. For this particular f, algebra shows the error to be exactly dx^2, since (x + dx)^2 = x^2 + 2x (dx) + dx^2. For dx = 0.1, the error is just 0.01.
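The same arithmetic fits in a short Python sketch, assuming f(x) = x^2 as in the example. The helper names `linear_approx` and `square_approx_error` are hypothetical. The sketch compares the tangent-line estimate with the exact square and shows that the error tracks dx^2, up to floating-point rounding.

```python
def linear_approx(f_x: float, df_x: float, dx: float) -> float:
    """Tangent-line approximation: f(x + dx) ~ f(x) + f'(x) * dx."""
    return f_x + df_x * dx


def square_approx_error(x: float, dx: float) -> float:
    """Exact minus approximate value for f(x) = x^2, using f'(x) = 2x."""
    exact = (x + dx) ** 2
    approx = linear_approx(x ** 2, 2 * x, dx)
    return exact - approx


# The algebra predicts an error of exactly dx^2 for this f.
for dx in [0.5, 0.1, 0.01]:
    print(f"dx = {dx}: error = {square_approx_error(1.0, dx):.6g}, dx^2 = {dx ** 2:.6g}")
```

For dx = 0.5 the estimate is 2.0 against the exact 2.25, matching the hand calculation above.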

Pretty cool, right?

Here are a few of the many applications that show why derivatives are useful:

  • We can use the slope to find local minima with gradient descent (or maxima with gradient ascent)
  • The slope tells us the instantaneous rate of change
  • We can find the intervals where a function is increasing or decreasing (monotonicity)
  • We can do neat approximations, as shown
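As an illustration of the first application, here is a minimal gradient descent sketch in Python. The function f(x) = (x - 3)^2, its derivative f'(x) = 2(x - 3), the learning rate and the step count are all hypothetical choices, not taken from this post. Each step moves x against the slope, so x drifts toward the minimum at x = 3.

```python
from typing import Callable


def gradient_descent(
    df: Callable[[float], float],
    x0: float,
    learning_rate: float = 0.1,
    steps: int = 100,
) -> float:
    """Repeatedly step against the derivative to approach a local minimum."""
    x = x0
    for _ in range(steps):
        # Positive slope: f grows to the right, so step left (and vice versa).
        x -= learning_rate * df(x)
    return x


def df(x: float) -> float:
    """Derivative of the example function f(x) = (x - 3)^2."""
    return 2 * (x - 3)


print(gradient_descent(df, x0=0.0))  # a value very close to 3
```

The step size matters. For this f, each step shrinks the distance to 3 by a factor of |1 - 2 * learning_rate|, so a learning rate of 1 or more makes the iterates oscillate or diverge instead of converging.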

Definition of the mathematical limit

This post assumes knowledge of basic mathematical logic and algebra.
We will stick to sequences for simplicity, but the same reasoning extends to functions. For sequences, there is only one direction to go: n \to \infty.

Informally, taking the limit of a sequence as n approaches infinity means finding the value its terms eventually settle toward, even though they may never actually reach it.

As an example, consider the sequence a_n = \frac {1} {n}. The first few elements are: 1, 0.5, 0.(3), 0.25, and so on.

Note that \frac {1} {\infty} is undefined, since infinity is not a real number. This is where limits come to the rescue.

If we look at its graph, it might look something like this:

We can clearly see that as n \to \infty, 1/n gets ever closer to the horizontal axis, i.e. it approaches 0 without ever reaching it. We write this as: \lim_{n \to \infty} \frac {1} {n} = 0.

Formally, \lim_{n \to \infty} a_n = a means: (\forall \epsilon > 0) (\exists N \in \mathbb {N}) (\forall n \geq N) (|a_n - a| < \epsilon). Woot.

It looks scary, but it’s actually quite simple. \epsilon is the width of a band around a, N is an index from which the terms stay inside that band, and the absolute-value part says that every term a_n with n \geq N lies within \epsilon of a.

So, for every band, no matter how narrow, we can find an index N such that all terms from a_N onward lie inside it.

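Why does this definition capture convergence? Because it has to hold for every \epsilon, including arbitrarily small ones. However narrow the band around a, the sequence eventually enters it and never leaves. In other words, the terms get, and stay, arbitrarily close to a.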
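The quantifiers can be mirrored in code, with one big caveat: a program can only inspect finitely many terms. The Python sketch below therefore illustrates the definition for one given \epsilon and N and proves nothing. The helper name `eventually_within` and its `horizon` parameter are hypothetical.

```python
from typing import Callable


def eventually_within(
    a: Callable[[int], float],
    limit: float,
    eps: float,
    N: int,
    horizon: int = 10_000,
) -> bool:
    """Check |a(n) - limit| < eps for n = N, ..., N + horizon - 1.

    Only a finite window is inspected, so True is evidence, not a proof.
    """
    return all(abs(a(n) - limit) < eps for n in range(N, N + horizon))


def a(n: int) -> float:
    """The example sequence a_n = 1/n."""
    return 1 / n


print(eventually_within(a, 0.0, eps=0.1, N=11))  # True: 1/11, 1/12, ... are below 0.1
print(eventually_within(a, 0.0, eps=0.1, N=5))   # False: a_5 = 0.2 is outside the band
```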

As an example, let’s prove that \lim_{n \to \infty} \frac {1} {n} = 0. All we need is a rule that gives a suitable N for any \epsilon, and we are done.

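Let \epsilon > 0 be arbitrary. Since N must be a natural number, we cannot simply take N = \frac {1} {\epsilon}. Instead, pick N = \lfloor \frac {1} {\epsilon} \rfloor + 1, the smallest natural number greater than \frac {1} {\epsilon}.

Then for every n \geq N we have |a_n - 0| = \frac {1} {n} \leq \frac {1} {N} < \epsilon. For instance, with \epsilon = \frac {1} {2} we get N = 3, and indeed 1/3, 1/4, 1/5, and so on are all less than 1/2. Since \epsilon was arbitrary, this proves the limit.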
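N has to be a natural number strictly greater than 1/\epsilon, so one concrete choice is N = \lfloor 1/\epsilon \rfloor + 1. The Python sketch below computes that N with the hypothetical helper `choose_N`. It uses exact fractions to avoid floating-point rounding at the boundary and spot-checks a stretch of terms. Like any finite check, it illustrates the proof rather than replacing it.

```python
import math
from fractions import Fraction


def choose_N(eps: Fraction) -> int:
    """Smallest natural number strictly greater than 1/eps (eps > 0)."""
    return math.floor(1 / eps) + 1


for eps in [Fraction(1, 2), Fraction(1, 10), Fraction(3, 10)]:
    N = choose_N(eps)
    # Every term from a_N onward should satisfy 1/n < eps; check the next 1000.
    assert all(Fraction(1, n) < eps for n in range(N, N + 1000))
    print(f"eps = {eps}: N = {N}")
```

This prints N = 3, 11 and 4 for the three values of \epsilon.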

Combined with the point-slope formula, this definition gives us the derivative of a function, which will be covered in the next post.

Deriving point-slope from slope-intercept form

In this post we’ll derive the equation of the line through two points, knowing only one thing:

A point A = (x, y) lies on the line y = mx + d exactly when its coordinates satisfy that equation.

The inspiration for this post is deriving the derivative.

So, let’s get started. We want to find an equation of the line that passes through A = (a, f(a)) and B = (b, f(b)), where a \neq b.

So we plug the points into the equation of a line to obtain the system:

\begin{cases} f(a) = ma + d \\ f(b) = mb + d \end{cases}

Subtracting the first equation from the second eliminates d: f(b) - f(a) = m(b - a). Solving for m and d:

\begin{cases} f(a) - ma = d \\ (f(b) - f(a))/(b - a) = m \end{cases}

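Substituting d = f(a) - ma back into y = mx + d gives y = mx + f(a) - ma, which rearranges to the point-slope form y - f(a) = m(x - a), with m = \frac {f(b) - f(a)} {b - a}.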
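Here is a small Python sketch of the same computation, assuming a \neq b. The helper name `line_through` is hypothetical. It solves the system for m and d exactly, using fractions, and then confirms that the slope-intercept and point-slope forms agree. The example points come from f(x) = x^2 at a = 1 and b = 3, an illustrative choice.

```python
from fractions import Fraction


def line_through(a, fa, b, fb) -> tuple[Fraction, Fraction]:
    """Return (m, d) for the line y = m*x + d through (a, fa) and (b, fb)."""
    a, fa, b, fb = map(Fraction, (a, fa, b, fb))
    if a == b:
        raise ValueError("a and b must differ, otherwise the slope is undefined")
    m = (fb - fa) / (b - a)  # from subtracting the two equations
    d = fa - m * a           # from the first equation
    return m, d


def f(x):
    return x * x


a, b = 1, 3
m, d = line_through(a, f(a), b, f(b))
print(m, d)  # 4 -3
for x in range(-2, 6):
    # Slope-intercept and point-slope forms describe the same line.
    assert m * x + d == f(a) + m * (x - a)
```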

In a future post, we will derive the formula for the derivative of a function.