Matrix Multiplication

A couple of years ago I bought a nice introduction to ML, Programming Machine Learning, a book that caters to developers like me who want to try things first-hand. I recently went back to it, and this time I decided to get to the bottom of it.

Matrix operations, especially multiplication, took me some time to digest, especially when used with NumPy. I studied these things too many years ago (at school and university), and I realized I had lost my intuitive understanding of them. Let’s try to build that again.

When it comes to ML, one of the most common operations is a weighted sum like this:

\[y = x_0 w_0 + x_1 w_1 + x_2 w_2 + ... + x_n w_n\]
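Before reaching for any library, this is what the weighted sum looks like as a plain Python loop (the numbers are made up for illustration):

```python
# Weighted sum y = x_0*w_0 + x_1*w_1 + ... computed by hand.
x = [1, 2, 3]
w = [10, 20, 30]
y = sum(x_i * w_i for x_i, w_i in zip(x, w))
print(y)  # 1*10 + 2*20 + 3*30 = 140
```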

In neural networks we sum the inputs multiplied by their weights all the time. This operation corresponds exactly to the dot product of the \(x\) and \(w\) vectors.

\[\begin{bmatrix} x_0 & x_1 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = x_0 w_0 + x_1 w_1\]

We can use NumPy’s dot function to get this:

>>> import numpy as np
>>> x = np.array([1,2])
>>> y = np.array([5,6])
>>> np.dot(x,y)
np.int64(17)

Matrix multiplication essentially extends the dot product to matrices.

>>> np.matmul(x,y)
np.int64(17)

You might notice that NumPy’s matmul works with both matrices and vectors.
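It also mixes the two freely: you can pass matmul (or its `@` operator shorthand) a 2-D matrix and a 1-D vector. A quick check with made-up numbers:

```python
import numpy as np

# matmul accepts a 2-D matrix and a 1-D vector together;
# each row of `a` gets dotted with `v`.
a = np.array([[1, 2],
              [3, 4]])
v = np.array([5, 6])
print(a @ v)  # [1*5+2*6, 3*5+4*6] = [17 39]
```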

The main condition

To multiply the matrices \(a\) and \(b\), the number of columns of \(a\) must match the number of rows of \(b\). This comes naturally if you look at the operation as a weighted sum: every input needs its own weight.

A pair of matrices with these \((rows, columns)\) shapes, \((1,2)\) and \((3,1)\), would not work:

\[\begin{bmatrix} x_0 & x_1 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix}\]
>>> x = np.array([[1,2]])
>>> y = np.array([[1],[2],[3]])
>>> np.matmul(x,y)
Traceback (most recent call last):
  File "<python-input-9>", line 1, in <module>
    np.matmul(x,y)
    ~~~~~~~~~^^^^^
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0

While this pair \((1,2) (2,3)\) would:

\[\begin{bmatrix} x_0 & x_1 \end{bmatrix} \begin{bmatrix} w_0 & w_1 & w_2 \\ w_3 & w_4 & w_5 \end{bmatrix}\]
>>> x = np.array([[1,2]])
>>> y = np.array([[1,2,3],[4,5,6]])
>>> np.matmul(x,y)
array([[ 9, 12, 15]])

Visualizing multiplication

Let’s start with the simplest example: we have a bunch of inputs and we want to multiply each row by a set of weights:

\[\begin{bmatrix} A_0 & A_1 & A_2 & A_3 \\ B_0 & B_1 & B_2 & B_3 \\ C_0 & C_1 & C_2 & C_3 \\ \end{bmatrix} \begin{bmatrix} W_0 \\ W_1 \\ W_2 \\ W_3 \end{bmatrix} = \begin{bmatrix} AW \\ BW \\ CW \end{bmatrix}\]
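In NumPy this looks like the following sketch, with made-up numbers. Note that since the input matrix has 4 columns, the weight column needs 4 rows:

```python
import numpy as np

# Three rows of inputs, each multiplied by the same column of weights.
inputs = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])       # shape (3, 4)
weights = np.array([[1], [0], [1], [0]])   # shape (4, 1)

# Result has shape (3, 1): one weighted sum per input row.
print(np.matmul(inputs, weights))  # [[4], [12], [20]]
```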

Now let’s add another column to the weights matrix:

\[\begin{bmatrix} A_0 & A_1 & A_2 & A_3 \\ B_0 & B_1 & B_2 & B_3 \\ C_0 & C_1 & C_2 & C_3 \\ \end{bmatrix} \begin{bmatrix} W_0 & X_0 \\ W_1 & X_1 \\ W_2 & X_2 \\ W_3 & X_3 \end{bmatrix} = \begin{bmatrix} AW & AX \\ BW & BX \\ CW & CX \end{bmatrix}\]
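Each weight column produces its own column of weighted sums in the result. A sketch with made-up numbers:

```python
import numpy as np

inputs = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])   # shape (3, 4)
wx = np.array([[1, 0],
               [0, 1],
               [1, 0],
               [0, 1]])                # shape (4, 2): two weight columns

# Result has shape (3, 2): column 0 uses weights [1,0,1,0],
# column 1 uses weights [0,1,0,1].
print(np.matmul(inputs, wx))  # [[4, 6], [12, 14], [20, 22]]
```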

It’s important to visualize this, because with NumPy’s matmul and broadcasting it is easy to lose track of the shapes of the matrices you’re working with.
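One easy way to get surprised: a 1-D vector and a one-column matrix are not the same thing to matmul. A small sketch of the difference, with made-up numbers:

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])  # shape (3, 4)
v = np.array([1, 0, 1, 0])       # 1-D, shape (4,)
col = v.reshape(4, 1)            # 2-D, shape (4, 1)

# With a 1-D operand, matmul drops the extra dimension from the result.
print((a @ v).shape)    # (3,)   -- a flat vector
print((a @ col).shape)  # (3, 1) -- a proper column matrix
```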