Wednesday, May 22, 2013

Fun with Hopfield and Numpy

Hopfield networks are fun to play with and are very easily implemented in Python using the Numpy library. Simple as they are, they are the basis of modern machine learning techniques such as Deep Learning and programming models for quantum computers such as Adiabatic quantum computation.

We are going to use a Hopfield network for optical character recognition. This is obviously a toy example and there are much more better machine learning techniques for this kind of problem (e.g. deep learning) but it is a nice demonstration.

Among other things Hopfield networks work as autoassociative memories, allowing to store and to recall patterns, e.g. images of letters. We want to store two letters and then try to recognize noisy versions of those letters. In the following we first create images of the letters "A" and "Z" from a string representation:

A = """

Z = """

Hopfield networks operate on binary vectors and we therefore define a function to_pattern() that takes a letter image and transforms it into a bipolar vector by gluing all image rows together, and converting "X" to +1 and "." to -1.

def to_pattern(letter):
    from numpy import array
    return array([+1 if c=='X' else -1 for c in letter.replace('\n','')])

For instance, calling to_pattern(A) returns then

[-1  1  1  1 -1  1 -1 -1 -1  1  1  1  1  1  1  1 -1 -1 -1  1  1 -1 -1 -1  1]

To visualize a pattern as an image we write a function display(pattern) that reshapes the pattern vector to a matrix and displays it as an image using the pylab function imshow().

def display(pattern):
    from pylab import imshow, cm, show
    imshow(pattern.reshape((5,5)),cmap=cm.binary, interpolation='nearest')

Calling display(to_pattern(A)); display(to_pattern(Z)) produces the following pictures:

Having all the basic elements in place let's get serious. First thing to do is to create the Hopfield network from the given patterns. Training a Hopfield network is extremely simple and requires only computation of the outer products of all pattern vectors with each others and summation of the resulting matrices.

$$W = ∑↙{k}↖n \bo p_{k} \bo p^T_{k} $$

The resulting matrix W specifies the weights of the edges between the nodes of the network and the nodes represent the features of the patterns (= pixels in an image). There are variants but typically the weight matrix is normalized (division by the number of patterns) and diagonal elements are set to zero (subtraction of the identity matrix I), which leads to the following expression:

$$W = 1/n ∑↙{k}↖n \bo p_{k} \bo p^T_{k} - I$$

Let us organize all our patterns in a matrix (rows are patterns)

patterns = array([to_pattern(A), to_pattern(Z)])
and the implementation of the training formula is straight forward:

def train(patterns):
    from numpy import zeros, outer, diag_indices 
    r,c = patterns.shape
    W = zeros((c,c))
    for p in patterns:
        W = W + outer(p,p)
    W[diag_indices(c)] = 0
    return W/r

To recall a stored pattern from a Hopfield network the pattern itself (or a partial or nosy version of it) the dot product of the pattern vector and the weight matrix is computed. This produces a new pattern vector that is binarized and then fed back into the system:

$$\bo p^{(t+1)} = sgn( \bo W \bo p^{(t)})$$

The operation is repeated until the pattern vector becomes stable or after a specified number of iteration has been performed. We keep the implementation simple and stop after 5 steps, which is typically sufficient.

def recall(W, patterns, steps=5):
    from numpy import vectorize, dot
    sgn = vectorize(lambda x: -1 if x<0 else +1)
    for _ in xrange(steps):        
        patterns = sgn(dot(patterns,W))
    return patterns

Note that in contrast to the recall formula the code performs the recall on all pattern within the pattern matrix at once, while the formula describes the recall for an individual pattern vector. Also the pattern matrix contains the patterns a rows and therefore the dot product W*p in the formula becomes dot(patterns,W) in the code (Since W is symmetric, transposing W is not required).

To test the network I created noisy version of the training patterns by randomly flipping 10 bits. The image sequence below shows the resulting input patterns and the output patterns after the second and third iteration. As you see the original patterns have been recalled perfectly after three iterations (from left to right).

Patterns are stored as low energy states (attractors) within the system and the recall procedure describes the trajectory of an input pattern to its attractor. Each step reduces the energy and the energy for a pattern is defined as follows:

$$E(\bo p) = \bo p \bo W \bo p^T$$

Here is the corresponding code to compute the energy for the pattern in the pattern matrix.

def hopfield_energy(W, patterns):
    from numpy import array, dot
    return array([-0.5*dot(dot(p.T,W),p) for p in patterns])

A few final comments. The Hopfield network does not only store the original patterns but also their inverse versions and possibly linear combinations of patterns. Furthermore the capacity of the network is limited. For random patterns the network can store roughly 0.14*N pattern without errors (N = Number of network nodes). There are many variations of the Hopfield network with different training and recall procedures that give better performance but the beauty of the original Hopfield network is in its simple implementation but complex behavior.

Last but not least cudos to the developers of jqmath, the JavaScript module used here to display the mathematical expressions.

more reading

Hopfield network
Hopfield network
Weave a neural net with Python
Associative recall of memory without errors
Minimum Probability Flow Learning
Finite memory loading in hairy neurons
Error tolerant associative memory