CSCI 315 Assignment #4

Due 11:59PM Friday 11 March, via Sakai

Goal

The goal of this assignment is to become familiar with Theano, the state-of-the-art package for Deep Learning. We will also step up our game, moving from Mozer's simplified 14x14-pixel MNIST digit set to the full-sized 28x28 set.

The good news is, the code is mostly already written for you and accessible online.

Getting Started

Download the pickled / gzipped MNIST digits dataset from Bengio's lab at U. Montreal. You should end up with a file mnist.pkl.gz.
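To sanity-check the download, you can un-pickle it by hand. The sketch below assumes Python 3 (hence the encoding='latin1' argument, since the file was pickled under Python 2); each of the three sets is an (images, labels) pair of NumPy arrays.

    # Quick sanity check of mnist.pkl.gz (a sketch; assumes Python 3)
    import gzip
    import pickle

    with gzip.open('mnist.pkl.gz', 'rb') as f:
        # The file was pickled under Python 2, so latin1 decoding is required
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

    train_x, train_y = train_set
    print(train_x.shape)   # (50000, 784): 50,000 flattened 28x28 images
    print(train_y[:10])    # labels of the first ten training digits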

Part 1: Logistic Regression in Theano

As of today's date (25 Feb 2016), Chapter 3 of the Buduma textbook has a very clear exposition of logistic regression in Theano, with accompanying code that is unfortunately incomplete.* Fortunately, there is very similar, working code online that you can access to run Buduma's example.

Create a file logistic_sgd.py. You can copy the entire contents of this file from the online tutorial at deeplearning.net. I suggest reading through the whole example, then copying and pasting the code from the link. Assuming you've saved this code in the same directory as the MNIST dataset, hitting F5 in IDLE3 should run the training and show you some impressive results for this one-layer network!

You will also see a new file, best_model.pkl, containing the pickled representation of the model you just trained. Modify the code so that this file gets saved as best_logistic_model.pkl. Once you've reached this point, comment out the call to sgd_optimization at the very bottom of the script and replace it with a call to predict. You should see the predictions for the first 10 digits in the test set.
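For reference, the bottom of your modified logistic_sgd.py might end up looking roughly like this (a sketch; use whatever function names your copy of the tutorial code actually defines):

    # Bottom of logistic_sgd.py once training is done (a sketch)
    if __name__ == '__main__':
        # sgd_optimization_mnist()   # the training call, now commented out
        predict()                    # prints predictions for the first 10 test digits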

Theano is pretty incredible, but it's hard to learn much from just copying and pasting someone else's code. So to complete our exercise in logistic regression, we'll write a new script based on the predict function we just ran. Copy your logistic_sgd.py into a new file logistic_confusion.py, and modify the latter to show a confusion matrix for the entire test set. Your logistic_confusion.py should have just enough code to do this: you don't need any of the training code or the definition of the LogisticRegression class, which you can instead import from logistic_sgd so that your pickled network loads correctly.

Hint: Looking at the load_data function, you'll see that it returns the training, testing, and validation sets as Theano shared variables; hence the need for the test_set_x = test_set_x.get_value() trick in the prediction code. So you can simply use the “raw” test data (no shared variables) to test the model and build your confusion matrix. I.e., get rid of most of the load_data code, and simply use the data loaded directly from the pickled MNIST digits.

If you do this right, your whole logistic_confusion.py should be around 50 lines of code, including header comments and imports.
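Here is one possible shape for the script, offered only as a sketch; it assumes the attribute names (input, y_pred) defined in the tutorial's LogisticRegression class and the file names used in this assignment:

    # logistic_confusion.py -- a minimal sketch
    import gzip
    import pickle
    import numpy as np
    import theano

    # Importing the class definition lets the pickled model be restored
    from logistic_sgd import LogisticRegression

    # Load the trained classifier saved at the end of training
    with open('best_logistic_model.pkl', 'rb') as f:
        classifier = pickle.load(f)

    # Compile a Theano function mapping raw inputs to predicted labels
    predict_model = theano.function(
        inputs=[classifier.input],
        outputs=classifier.y_pred)

    # Use the raw test data straight from the pickled digits (no shared variables)
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
    test_x, test_y = test_set

    # Tally a 10x10 confusion matrix: rows are actual digits, columns predicted
    predictions = predict_model(test_x)
    confusion = np.zeros((10, 10), dtype=int)
    for actual, predicted in zip(test_y, predictions):
        confusion[actual, predicted] += 1
    print(confusion)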

Part 2: Back-propagation in Theano

Create another file mlp.py. MLP stands for Multi-Layer Perceptron, also known as a feed-forward network, typically trained with back-propagation. You already implemented one of these in Assignment #3; now we'll do it in Theano.

As with logistic regression in Part 1, deeplearning.net has a nice tutorial with working code at the bottom. (This code will use your logistic_sgd module from Part 1.) Following the same procedure as in Part 1, you should be able to run your mlp.py and see how amazingly well this network does on the MNIST data. If you get tired of waiting for it to complete all 1000 training epochs (the default), feel free to change the number of epochs to something that'll finish in a reasonable amount of time.
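Shortening the run is a one-line change at the bottom of mlp.py; a sketch, with the function and keyword names assumed from the tutorial code (check your own copy):

    # Bottom of mlp.py -- train for fewer epochs than the default 1000
    if __name__ == '__main__':
        test_mlp(n_epochs=100)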

Once you've got your MLP network running, have it conclude its training by pickling the model to a file best_mlp_model.pkl, as the logistic-regression code does. Then copy your logistic_confusion.py into a new script mlp_confusion.py, modifying it to produce the confusion matrix for the MLP model.
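The save itself can mirror what logistic_sgd.py does; a minimal sketch, assuming the MLP object in your mlp.py is called classifier:

    # At the point in mlp.py where a new best validation score is recorded,
    # save the whole model (a sketch; 'classifier' is the MLP instance)
    import pickle

    with open('best_mlp_model.pkl', 'wb') as f:
        pickle.dump(classifier, f)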

Part 3: Logistic Sigmoid vs Hyperbolic Tangent

If you look carefully at the HiddenLayer class in the MLP code, you'll see a nice discussion of the difference between using a logistic sigmoid function (as we did in Assignment #3) and the tanh function that is used in the actual code. To see why the latter has become more common in deep learning, copy/paste/modify mlp.py into a new script mlp_sigmoid.py, replacing the original tanh with sigmoid (a tiny change in the code) and keeping the other parameters (training iterations, etc.) the same. Now repeat your experiment from Part 2, this time pickling a network best_mlp_sigmoid.pkl. Then write a new script mlp_tanh_vs_sigmoid.py to load and test the two pickled MLP networks and report their success rates (not the confusion matrix) on the whole test set.
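The change itself is just the activation argument where the HiddenLayer is constructed inside the MLP class; a sketch, with names as in the tutorial code (T.nnet.sigmoid is Theano's built-in logistic sigmoid):

    # In mlp_sigmoid.py, inside the MLP constructor (names follow the tutorial;
    # theano.tensor is already imported as T at the top of the file)
    self.hiddenLayer = HiddenLayer(
        rng=rng,
        input=input,
        n_in=n_in,
        n_out=n_hidden,
        activation=T.nnet.sigmoid   # was: activation=T.tanh
    )

In mlp_tanh_vs_sigmoid.py, the success rate for each unpickled network is just the fraction of test digits whose predicted label matches the true label, e.g. np.mean(predict_model(test_x) == test_y).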

What to Submit to Sakai

Extra Credit Suggestions


* Buduma shared with me a new manuscript of the textbook that remedies this problem, but the new version uses Google's TensorFlow instead of Theano. Can you guess what other company might be offering its own Deep Learning package?