Inspecting gradients in Chainer

Chainer is my framework of choice for implementing neural networks. It makes building and troubleshooting deep learning models easy.

Printing out the gradients during back propagation is sometimes useful in deep learning: it lets you check that your gradients are as expected and are neither exploding (growing too large) nor vanishing (shrinking toward zero). Fortunately, this is easy to do in Chainer.
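As a rough, NumPy-only sketch of what "too large" and "too small" might mean in practice, the helper below classifies a gradient array by its L2 norm. The function name gradient_status and both thresholds are hypothetical choices for illustration, not Chainer settings; sensible values depend on your model and data.

```python
import numpy as np

# Illustrative thresholds (assumptions, not Chainer defaults); tune them for your model.
EXPLODING_THRESHOLD = 1e3
VANISHING_THRESHOLD = 1e-7


def gradient_status(grad, high=EXPLODING_THRESHOLD, low=VANISHING_THRESHOLD):
    """Classify a gradient array by its L2 norm.

    Returns 'missing' if there is no gradient, 'non-finite' if it contains
    NaN or inf, 'exploding' if the norm exceeds `high`, 'vanishing' if the
    norm is below `low`, and 'ok' otherwise.
    """
    if grad is None:
        return "missing"
    grad = np.asarray(grad)
    if not np.all(np.isfinite(grad)):
        return "non-finite"
    # With ord=None and axis=None, np.linalg.norm is the 2-norm of the flattened array.
    norm = float(np.linalg.norm(grad))
    if norm > high:
        return "exploding"
    if norm < low:
        return "vanishing"
    return "ok"
```

Inside an updater loop, printing gradient_status(param.grad) next to each parameter name gives a one-word summary instead of a full array. On GPU, param.grad is a CuPy array, so convert it with chainer.backends.cuda.to_cpu(param.grad) before passing it in.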

Chainer gives you access to the parameters of your model, and after the back propagation step each parameter holds its gradient in its grad attribute. You can reach the parameters through the optimizer (such as SGD or Adam), whose target is the model it was set up with. To print the gradients during training, extend Chainer's StandardUpdater so that it additionally outputs them, like so:

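from chainer import training

class CustomStandardUpdater(training.updaters.StandardUpdater):
    """StandardUpdater that prints each parameter's gradient after every update."""

    def __init__(self, train_iter, optimizer, device):
        super(CustomStandardUpdater, self).__init__(
            train_iter, optimizer, device=device)

    def update_core(self):
        # Run the usual forward pass, backward pass and parameter update.
        super(CustomStandardUpdater, self).update_core()
        optimizer = self.get_optimizer('main')
        # optimizer.target is the model passed to optimizer.setup();
        # each param.grad still holds the gradient from this iteration.
        for name, param in optimizer.target.namedparams():
            print(name, param.grad)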

In the last two lines of update_core, the parameters (weights and biases) of your neural network are accessed through the optimizer's target, and the name and gradient of each parameter are printed. This custom updater can be attached to your trainer as follows:

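import chainer
from chainer import training

model = MyChainerModel()  # your own chainer.Chain subclass
optimizer = chainer.optimizers.Adam()
optimizer.setup(model)  # register the model as the optimizer's target
train_iter = chainer.iterators.SerialIterator(train_dataset, batch_size=32, shuffle=True)
updater = CustomStandardUpdater(train_iter, optimizer, gpu)  # gpu: device ID, e.g. 0, or -1 for CPU
trainer = training.Trainer(updater, stop_trigger=(100, 'epoch'))
trainer.run()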

Changing floating point size in Chainer

The default floating point size in Chainer is 32 bits. That means Chainer expects numpy.float32 arrays on the CPU, or cupy.float32 arrays on the GPU, under the hood, and it will typically raise an error if your data uses a different size.

However, there may be times when you want more than 32 bits, for example when your training routine produces NaNs or infs and you want to troubleshoot.
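For intuition about why extra bits can help, here is a NumPy-only illustration (no Chainer involved) of two float32 limits that often sit behind NaNs and infs: a smaller range, so large values overflow to inf (and inf - inf becomes NaN), and fewer significant digits, so small contributions can disappear entirely. Switching to float64 pushes both limits out, but it will not by itself fix a numerically unstable model.

```python
import numpy as np

# float32 overflows to inf much sooner than float64: the largest float32 is about 3.4e38.
with np.errstate(over="ignore"):
    exp32 = np.exp(np.float32(100.0))  # e**100 is about 2.7e43, too large for float32
exp64 = np.exp(np.float64(100.0))      # still representable in float64

# inf - inf gives NaN, a common way an overflow turns into a NaN loss.
with np.errstate(invalid="ignore"):
    nan32 = exp32 - exp32

# float32 keeps roughly 7 significant digits versus roughly 16 for float64,
# so adding 1e-8 to 1.0 is lost in float32 but not in float64.
lost32 = np.float32(1.0) + np.float32(1e-8) == np.float32(1.0)
lost64 = np.float64(1.0) + np.float64(1e-8) == np.float64(1.0)

print(exp32, exp64, nan32, lost32, lost64)
```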

Changing Chainer to use float64 is simple:

import chainer
import numpy as np
chainer.global_config.dtype = np.float64

Call this at the beginning of your program, before you construct your model, so that its parameters are initialized as float64. And of course, make sure that the ndarray dtypes of your data are set to float64 (as in np.array(…).astype(np.float64)) before they are passed to Chainer.
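A minimal sketch of that conversion step, assuming your data is a set of NumPy arrays: the hypothetical helper to_float64 casts only floating-point arrays and leaves integer arrays alone, because Chainer's classification losses such as F.softmax_cross_entropy expect integer labels rather than floats. The toy feature and label arrays are made up for illustration.

```python
import numpy as np


def to_float64(*arrays):
    """Cast floating-point arrays to float64 and leave other dtypes untouched.

    Integer class labels are kept as-is, since losses such as
    softmax_cross_entropy expect integer (e.g. int32) labels.
    """
    return tuple(
        a.astype(np.float64) if np.issubdtype(a.dtype, np.floating) else a
        for a in map(np.asarray, arrays)
    )


# Hypothetical toy data: float32 features and int32 labels.
rng = np.random.default_rng(0)
x = rng.random((8, 3), dtype=np.float32)
t = rng.integers(0, 2, size=8, dtype=np.int32)

x64, t64 = to_float64(x, t)
print(x64.dtype, t64.dtype)
```

The converted arrays can then be passed to Chainer as usual, for example wrapped in chainer.datasets.TupleDataset(x64, t64).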