Unit Symbols and the Chain Rule (2)


In the previous post, we explored symbolic representation and the recursive nature of the Chain Rule. We also looked briefly at the identity unit\mathbf{unit}.

In this post I discuss in more details these concepts by analyzing the Chain Rule for functions of single input.

Abstraction

Let’s look at a concrete example in the following function:

F(x)=sin2(x)=hwhereg=sin(x)h=g2\begin{align*} \mathbf{F}(x) &= sin^2(x)\\ &= h &\textbf{where}\quad g &= sin(x)\\ & & h &= g^2\\ \end{align*}

Using the Chain Rule, we can derive F\mathbf{F'} with respect to xx:

F(x)=hg=(2g)cos(x)=(2sin(x))cos(x)\begin{align*} \mathbf{F'}(x) &=h' &\cdot\quad& g'\\ &=(2\cdot g) &\cdot\quad& cos(x)\\ &=(2\cdot sin(x)) &\cdot\quad& cos(x) \end{align*}

Now consider this: symbolically, xx is just a placeholder for some constant value that we will plugin later. What if we also apply this concept to the symbol hh?

If we think about it, any engineering process starts with a tabulated set of data, which is the result of a series of high-ordered actions of measurement. These actions are eventually abstracted into numerical values of some predefined units. For example, distances can be measured in units of the foot or its multipliers such as the yard and mile (or the inverse multiplier inch). However, what actually happens is that the unit foot has to be defined and standardized, and then someone has to physically walk the distance, mark and count the units. All of these are high-ordered actions that are abstracted under a symbolic unit\mathbf{unit}. Other units of measurement (SI or Imperial) are defined in a similar manner.

A measurement is a series of high-ordered actions collapsed into a value.

Given this premise, let’s assume that hh is also a placeholder for some constant value that we will plugin later. Then:

F(h)=hF(h)=unithF(g)=(2g)unitgF(x)=(2sin(x))cos(x)unitxwhere:F(x)=hgx\begin{align*} \mathbf{F}(h) &=&h\\\\\\ \mathbf{F}'(h) &=&\mathbf{\color{lightgray}unit_h}\\\\ \mathbf{F}'(g) &=&(2\cdot g) &\quad\cdot& \mathbf{\color{lightgray}unit_g}\\\\ \mathbf{F}'(x) &=&(2\cdot sin(x)) &\quad\cdot& cos(x) &\quad\cdot& \mathbf{\color{lightgray}unit_x}\\\\ \text{where}:\quad \mathbf{F}'(x) &=&h' &\quad\cdot& g' &\quad\cdot& x' \end{align*}

We notice that each unit can be expanded only as needed, one symbol at a time. In other words, a unit can be used as is, or decomposed into its constituents.

Symbolic Unit

Thus, the derivative of some functional F\mathbf{F} can be defined recursively as follows:

DzF=unitzDyF=FyunityDxF=FyyxunitxDaF=Fyyxbaunita\begin{align*} D_z\mathbf{F} &=\mathbf{\color{lightgray}unit_z}\\\\ D_y\mathbf{F} &= \frac{\partial\mathbf{F}}{\partial y} \cdot\mathbf{\color{lightgray}unit_{y}}\\\\ D_x\mathbf{F} &= \frac{\partial\mathbf{F}}{\partial y} \cdot\frac{\partial y}{\partial x} \cdot\mathbf{\color{lightgray}unit_{x}}\\\\ \vdots\\\\ D_a\mathbf{F} &= \frac{\partial\mathbf{F}}{\partial y} \cdot\frac{\partial y}{\partial x} \cdot\quad\dots\quad \cdot\frac{\partial b}{\partial a} \cdot\mathbf{\color{lightgray}unit_{a}}\\\\ \end{align*}

Again, remember that each partial derivative is a quantity that can be evaluated locally according to its respective unit. For example, given a first-ordered function yy of some input xx, the derivative of yy is simply: the unitx\mathbf{unit_x} scaled by the quantity y/x\partial{y}/\partial{x}. It is entirely possible that xx can be decomposed into smaller constituents. However, we don’t care about that since our function ff is defined in terms of xx at the base. Therefore, we solidify the chain at unitx\mathbf{unit_x} and call it the total derivative with respect to xx.

y(x)=yxunitx=dydxy'(x) = \frac{\partial y}{\partial x}\cdot\mathbf{\color{lightgray}unit_{x}} = \frac{dy}{dx}

Example

We will evaluate at x=2x = 2 the following function and its derivative:

f(x)=ln(x2)wherex=2f(x) = ln(x^2)\quad\quad\textbf{where}\quad x = 2

1. Chain Rule: standard form

let:f(y)=ln(y)y=x2f(z)=zz=ln(y)\begin{align*} \text{let}:\quad f(y) &= ln(y) &\leftarrow& \quad y=x^2\\\\ f(z) &= z &\leftarrow& \quad z=ln(y)\\\\ \end{align*} then:f(z)=unitzf(y)=(zy)unityf(x)=(zy)(yx)unitx=(1/y)(2x)unitx=(1/x2)(2x)unitx=(1/22)(22)unitx\begin{align*} \text{then}:\quad f'(z) &=\mathbf{\color{lightgray}unit_z} \\\\ f'(y) &= \left(\frac{\partial z}{\partial y}\right) &\cdot&\mathbf{\color{lightgray}unit_y}\\\\ f'(x) &= \left(\frac{\partial z}{\partial y}\right) &\cdot& \left(\frac{\partial y}{\partial x}\right) &\cdot&\mathbf{\color{lightgray}unit_x}\\\\ &= \left({1}/{y}\right) &\cdot&(2\cdot x) &\cdot&\mathbf{\color{lightgray}unit_x}\\\\ &= \left({1}/{x^2}\right) &\cdot&(2\cdot x) &\cdot&\mathbf{\color{lightgray}unit_x}\\\\ &= \left({1}/{2^2}\right) &\cdot&(2\cdot 2) &\cdot&\mathbf{\color{lightgray}unit_x}\\\\ \end{align*}

Every partial derivative has to be expanded and evaluated individually. Also note the two sets of parentheses aligned vertically. They start branching out at f(y)f'(y).

2. Chain Rule: modified form with eager evaluation

We compute the function’s symbols and its respective partial derivatives at x=2x = 2:

we define the symbol:y=x24y=2x4f(y)=ln(y)ln(4)f(y)=y10.25\begin{align*} \text{we define the symbol}:\quad {\color{red}y} &= x^2 &\rightarrow &\quad 4\\ y' &= 2\cdot x &\rightarrow &\quad {\color{blue}4}\\ f(y) &= ln(y) &\rightarrow &\quad ln(4)\\ f'(y) &= y^{-1} &\rightarrow &\quad {\color{blue}0.25}\\\\ \end{align*} we define the symbol:z=ln(y)ln(4)z=unitz1f(z)=zln(4)f(z)=unitz1\begin{align*} \text{we define the symbol}:\quad {\color{red}z} &= ln(y) &\rightarrow &\quad ln(4)\\ z' &= \mathbf{\color{lightgray}unit_z} &\rightarrow &\quad 1\\ f(z) &= z &\rightarrow &\quad ln(4)\\ f'(z) &= \mathbf{\color{lightgray}unit_z} &\rightarrow &\quad 1\\ \end{align*} then:f(z)=unitzf(y)=(z/y)unity0.25f(x)=0.25(y/x)unitx4=1unitx\begin{align*} \\\text{then}:\quad f'(z) &=\mathbf{\color{lightgray}unit_z}\\\\ f'(y) &= \underbrace{ ({\partial{z}}/{\partial{y}}) \cdot\mathbf{\color{lightgray}unit_y} }_{\color{blue}0.25}\\\\ f'(x) &={\color{blue}0.25}\cdot \underbrace{ ({\partial{y}}/{\partial{x}}) \cdot\mathbf{\color{lightgray}unit_x} }_{\color{blue}4} = 1\cdot{\color{lightgray}\mathbf{unit_x}} \end{align*}

3. Observations and Notes

There are a couple things to note here in the modified form:

  1. we evaluate all partial derivatives as soon as possible.
  2. notice how it takes fewer steps (starting from then) to evaluate f(x)f'(x).

Because we eagerly evaluated all partial derivatives, we ended up having to do less work computing the total derivative at the end.

Essentially, in the standard chain rule, the evaluation of the total derivative is deferred to the very end. Note that every line (starting from then) represents an expansion step. We recursively expand each symbol, plugging the input xx in only at the end to compute the total derivative. Because of this deferment, we ended up having to perform two branches of computations. This can easily be traced with a computation graph (left as an exercise for the reader — you can check out the previous post for example graphs).

On the other hand, in the modified version, rather than deferring, we use each input right away. Whenever we evalute a function ff, we also evaluate its partial derivative ff' at the respective symbolic unit. Therefore, by the time we get to evaluating the total derivative, we will have already passed through all partial derivative quantities. The total derivative is collected at the final computation node.

Summary

In this post, I have shown that symbolic representation helps us frame the Chain Rule recursively. By folding or unfolding symbols, we can simplify and localize the complexity of a problem. In the process, we also noticed a pattern of computation and optimized it.

I find the concept of recursive unit\mathbf{unit} to be very useful. It can be applied more generally to other aspects of programming.

For example, we know that a running program is treated as a process by the OS and has resources allocated accordingly. The unit\mathbf{unit} here is process, and the load on the system is the amount of processes running concurrently. That is one way of framing the measurement. On the other hand, we can also go deeper into the program’s source code and map out the function call graph, each function as sub-unit and its resource allocation. In this case, we may frame our measurements in terms of the more detailed resource allocation.

In the next post, we will continue from the single-node graph. We will work through a multi-node graph — it is larger but still a very simple example. We will use it as a scaffold to bind the abstract concepts together.

Until next time.