Computer Science 252
Neural Networks

Assignment 6: Latent Semantic Analysis

Due Friday 18 November

Objectives

  1. Enhance our understanding of LSA by applying it to the real-world problem of document retrieval.

  2. Replicate precisely the work of other scientists, a crucial and under-appreciated aspect of scientific research.

Methods

Because of the popularity of LSA, there are already some great examples on the web. This step-by-step exercise shows the use of LSA for document retrieval, using a very small corpus of three documents, each only a few words long. The output of my lsa.py script closely matches these results, except that I avoid repeating the results from a previous step.
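For orientation, the retrieval math in an exercise of this kind can be sketched roughly as follows. This is only an illustration under assumptions not stated above: a rank-2 truncated SVD of the term-document matrix, a query folded into the latent space as q^T U_k S_k^-1, and cosine similarity for ranking. The function name lsa_rank and its parameters are hypothetical and not part of the assignment.

```python
import numpy as np

def lsa_rank(A, q, k=2):
    """Score each document (column of A) against query vector q.

    A is a terms-by-documents count matrix, q is a term-count vector
    with one entry per row of A. Assumes the top k singular values
    are nonzero. Hypothetical helper, for illustration only.
    """
    # Thin SVD: A = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Keep only the k largest singular values and their vectors
    Uk, sk = U[:, :k], s[:k]
    Vk = Vt[:k, :].T          # row j holds document j's latent coordinates

    # Fold the query into the same k-dimensional space
    qk = (q @ Uk) / sk

    # Cosine similarity between the query and every document
    norms = np.linalg.norm(Vk, axis=1) * np.linalg.norm(qk)
    return (Vk @ qk) / norms
```

Sorting the returned scores in descending order gives one possible document ranking. A query identical to one of the documents should score 1 against that document, which makes a handy sanity check.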

To get started, write a main that looks like this:

if __name__ == '__main__':

    docs = ['shipment of gold damaged in a fire',
            'delivery of silver arrived in a silver truck',
            'shipment of gold arrived in a truck']

    show_lsa(docs)

So all the actual work will go into your show_lsa() function. Use your Pythonic toolkit, like .split(), set() and array slicing, to minimize the amount of code you need to write.
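As a minimal sketch of how .split() and set() might be combined, the snippet below builds a term-document count matrix from a list of documents. The helper name term_document_matrix, the alphabetical term ordering, and the use of raw counts are all assumptions made for illustration; your show_lsa() may organize this differently.

```python
import numpy as np

def term_document_matrix(docs):
    """Return (terms, A), where A[i, j] counts terms[i] in docs[j].

    Hypothetical helper: alphabetical term order and raw counts
    are assumptions, not requirements of the assignment.
    """
    # Split each document into its words
    tokenized = [doc.split() for doc in docs]

    # set() removes duplicates; sorted() gives a stable row order
    terms = sorted(set(word for words in tokenized for word in words))

    # One row per term, one column per document
    A = np.array([[words.count(term) for words in tokenized]
                  for term in terms])
    return terms, A
```

Array slicing then picks out pieces of the result, for example A[:, 0] for the first document's column.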

Indeed, thanks to the power of NumPy, the first step – printing out a nicely-formatted table – was actually the most time-consuming. For that step, I experimented with printing blank spaces and tabs until I got a nice-looking table.
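One way to avoid trial and error with spaces and tabs is to pad every cell to a fixed width. The sketch below is only a suggestion: the helper name format_table, the d1, d2, ... column labels, and the default width are all hypothetical, and the layout is not meant to reproduce my output exactly.

```python
def format_table(terms, A, width=4):
    """Return term-document counts as an aligned plain-text table.

    Hypothetical helper: terms is a list of strings, A is a list of
    rows (or a 2-D array) with one count per document.
    """
    # Leave room for the longest term plus one space
    label_width = max(len(term) for term in terms) + 1
    n_docs = len(A[0])

    # Header row: blank label column, then d1, d2, ... right-aligned
    header = ' ' * label_width + ''.join(
        f'd{j + 1}'.rjust(width) for j in range(n_docs))

    # One line per term: left-aligned label, right-aligned counts
    rows = [term.ljust(label_width) +
            ''.join(str(count).rjust(width) for count in row)
            for term, row in zip(terms, A)]
    return '\n'.join([header] + rows)
```

Calling print(format_table(terms, A)) displays the table; because every cell is padded to the same width, the columns line up regardless of term length.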

Here are some tips to help you with the mathematical part:

What to turn in to Sakai

All you need to turn in for this assignment is your lsa.py script, which should produce an output like mine.