CSCI 315 Assignment #4
Due 11:59PM Friday 11 March, via Sakai
The goal of this assignment is to become familiar with Theano,
the state-of-the-art package for Deep Learning.
We will also step up our game, moving from Mozer's simplified 14x14-pixel MNIST digit set to the full-sized 28x28 set.
The good news is, the code is mostly already written for you and accessible online.
Download the pickled / gzipped MNIST digits dataset
from Bengio's lab at U. Montreal. You should end up with a file mnist.pkl.gz.
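If you want to poke at the data before diving into Theano, the file can be loaded directly with gzip and pickle. A minimal sketch (the encoding argument is needed under Python 3, since this file was pickled under Python 2):

```python
import gzip
import os
import pickle

def load_mnist(path="mnist.pkl.gz"):
    """Return (train, valid, test), each a pair of (images, labels).
    Images are flat 784-long float vectors (28x28, row-major); labels are ints 0-9."""
    with gzip.open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read this Python 2 pickle
        return pickle.load(f, encoding="latin1")

if os.path.exists("mnist.pkl.gz"):
    train_set, valid_set, test_set = load_mnist()
    print(train_set[0].shape)  # training split is 50,000 images
```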
Part 1: Logistic Regression in Theano
As of today's date (25 Feb 2016), Chapter 3 of the Buduma textbook has a very clear exposition of logistic regression in Theano,
with accompanying code that is unfortunately incomplete.*
Fortunately, there is very similar, working code online that you can access to run Buduma's example.
Create a file logistic_sgd.py. You can copy its entire contents from the logistic-regression tutorial
at deeplearning.net. I suggest reading through the whole example, then copying and pasting the code from the link.
Assuming you've saved this code in the same directory as the MNIST dataset,
hitting F5 in IDLE3 should run the training and show you some impressive results for this one-layer network!
You will also see a new file, best_model.pkl, containing the pickled representation of the model you just trained.
Modify the code so that this file gets saved as best_logistic_model.pkl.
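The change amounts to swapping the filename where the tutorial pickles the best model. A sketch of the idea (save_model is a hypothetical helper, not a name from the tutorial; and if the predict function loads the model by filename, update it there too):

```python
import pickle

def save_model(classifier, path="best_logistic_model.pkl"):
    """Save the trained model under the assignment's required filename,
    instead of the tutorial's default best_model.pkl."""
    with open(path, "wb") as f:
        pickle.dump(classifier, f)

def load_model(path="best_logistic_model.pkl"):
    """Reload a previously pickled model."""
    with open(path, "rb") as f:
        return pickle.load(f)
```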
Once you've reached this point, comment-out the call to sgd_optimization at the very bottom of the script,
and replace it with a call to predict. You should see the predictions for the first 10 digits in the test set.
Theano is pretty incredible, but it's hard to learn much from just copying and pasting someone else's code. So to complete our
exercise in logistic regression, we'll write a new script, based on the predict function we just ran. Copy
your logistic_sgd.py into a new file logistic_confusion.py, and modify the latter to show a confusion matrix
for the entire test set. Your logistic_confusion.py should have just enough code to do this: you don't need any
of the training code, and instead of copying the definition of the LogisticRegression class, import it from logistic_sgd
so that your pickled network can be loaded correctly.
Hint: Looking at the load_data function, you'll see that it returns the training, testing, and validation sets
as Theano shared variables; hence the need for the test_set_x = test_set_x.get_value() trick in the prediction code.
So you can simply use the “raw” test data (no shared variables) to test the model and build your confusion matrix.
I.e., get rid of most of the load_data code, and simply use the data loaded from the pickled MNIST digits.
If you do this right, your whole logistic_confusion.py should be around 50 lines of code, including header comments.
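The heart of the script is just tallying predictions against true labels. A sketch of that core (y_true and y_pred are placeholders: you'd get the true labels from the pickled test set and the predictions from the model):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    """cm[i, j] counts test digits whose true label is i and whose prediction is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

Correct classifications land on the diagonal; for example, a nonzero entry at row 9, column 4 means some 9s were misread as 4s.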
Part 2: Back-propagation in Theano
Create another file mlp.py. MLP stands for Multi-Layer Perceptron, also
known as a feed-forward network, typically using back-prop. You already implemented one of these in Assignment #3;
now we'll do it in Theano.
As with logistic regression in Part 1, deeplearning.net has a nice tutorial with working code
at the bottom. (This code will use your logistic_sgd module from Part 1.)
Following the same procedure as in Part 1, you should be able to run your mlp.py and see how
amazingly well this network does on the MNIST data. If you get tired of waiting for it to complete all 1000 training
epochs (the default), feel free to change the number of epochs to something that'll finish in a reasonable amount of time.
Once you've got your MLP network running, have it conclude its training by pickling the model to a file best_mlp_model.pkl,
as the logistic-regression code does. Then copy your logistic_confusion.py into a new script mlp_confusion.py, modifying
it to produce the confusion matrix for the MLP model.
Part 3: Logistic Sigmoid vs Hyperbolic Tangent
If you look carefully at the HiddenLayer class in the MLP code, you'll see a nice discussion of the difference between
using a logistic sigmoid function (as we did in Assignment #3) and the
tanh function that is used in the actual code. To see why
the latter has become more common in deep learning, let's copy, paste, and modify mlp.py into a new script
mlp_sigmoid.py, replacing the original tanh with sigmoid (a tiny change in the code) and keeping the other
parameters (training iterations, etc.) the same. Now repeat your experiment
from Part 2, this time pickling a network best_mlp_sigmoid.pkl, and writing a new script mlp_tanh_vs_sigmoid.py
to load and test the two pickled MLP networks and report their success rates (not the confusion matrix) on the whole test set.
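One way to see the relationship the HiddenLayer discussion is getting at: tanh is just a rescaled, recentered logistic sigmoid, but its outputs are zero-centered and its gradient at the origin is four times steeper, which tends to help gradient-based training. A quick NumPy check of both facts:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 13)

# tanh(x) = 2*sigmoid(2x) - 1: same S-shape, but output in (-1, 1) rather than (0, 1)
assert np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)

# gradients at the origin: tanh'(0) = 1 - tanh(0)^2 = 1, while
# sigmoid'(0) = sigmoid(0) * (1 - sigmoid(0)) = 0.25
print(1.0 - np.tanh(0.0) ** 2)               # 1.0
print(sigmoid(0.0) * (1.0 - sigmoid(0.0)))   # 0.25
```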
What to submit to Sakai
- logistic_sgd.py from Part 1
- best_logistic_model.pkl from Part 1
- logistic_confusion.py from Part 1
- mlp.py from Part 2
- best_mlp_model.pkl from Part 2
- mlp_confusion.py from Part 2
- mlp_sigmoid.py from Part 3
- best_mlp_sigmoid.pkl from Part 3
- mlp_tanh_vs_sigmoid.py from Part 3
Extra Credit Suggestions
If you're up for a real challenge, you might try out the latest thing in Deep Learning: Google's
TensorFlow, which appears to be making a bid to
replace Theano (see the footnote below). So a great extra-credit task would
be to replicate your Theano results from Part 1 and/or Part 2, producing a confusion matrix using
TensorFlow models instead. You can probably install TensorFlow on your laptop, and I have asked Steve Goryl to install
it on the machines in Parmly 404 as well.
- Lasagne: Because of the complexity of Theano, some clever Theano users created
Lasagne (deep layers, get it?), a package that supposedly provides
the same functionality with a simpler API. If you can replicate any or all of the Theano work above
using Lasagne instead, that would be worth some extra-credit points! You can probably install Lasagne on your laptop,
but if you prefer to work on one of our Linux boxes, let Steve and me know, and we'll look into installing it.
* Buduma shared with me a new manuscript of the textbook that remedies this problem,
but the new version uses Google's TensorFlow instead of Theano.
Can you guess what other company might be offering their own Deep Learning package?