# Computer Science 252 Neural Networks Assignment 6: Latent Semantic Analysis

## Objectives

1. Enhance our understanding of LSA by applying it to the real-world problem of document retrieval.

2. Replicate precisely the work of other scientists, a crucial and under-appreciated aspect of scientific research.

## Methods

Because of the popularity of LSA, there are already some great examples on the web. This step-by-step exercise shows the use of LSA for document retrieval, using a very small corpus of documents (three) of a few words each. The output of my lsa.py script closely matches these results, except that I avoid repeating the results from a previous step.

To get started, write a main that looks like this:

```python
if __name__ == '__main__':

    docs = ['shipment of gold damaged in a fire',
            'delivery of silver arrived in a silver truck',
            'shipment of gold arrived in a truck']

    show_lsa(docs)
```
So all the actual work will go into your show_lsa() function. Use your Pythonic toolkit (.split(), set(), and array slicing) to minimize the amount of code you need to write.
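As one hedged illustration of that toolkit (the variable names here are my own, not required by the assignment), a sorted vocabulary and raw term counts can be built in a couple of lines:

```python
docs = ['shipment of gold damaged in a fire',
        'delivery of silver arrived in a silver truck',
        'shipment of gold arrived in a truck']

# Build a sorted vocabulary of the unique words across all documents.
vocab = sorted(set(word for doc in docs for word in doc.split()))

# Count occurrences of each vocabulary word in each document;
# each row of counts corresponds to one term, each column to one document.
counts = [[doc.split().count(word) for doc in docs] for word in vocab]
```

Note that 'silver' appears twice in the second document, so its row of counts is [0, 2, 0].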

Indeed, thanks to the power of NumPy, the first step (printing out a nicely formatted table) was actually the most time-consuming. For that step, I experimented with printing blank spaces and tabs until I got a nice-looking table.

• numpy.set_printoptions(precision=4) will allow you to check your results against the ones in the PDF.

• To build the query matrix (really, a vector) in Step 1, I wrote a function build_query that took a list of document words and a list of query words, and returned a vector with a 1 wherever a document word appeared in the query, and a 0 otherwise.

• I also wrote a cosine function for vector cosine, and a magnitude function to support it, exactly as we did in our first exam.

• For a tiny text corpus like this one, the co-occurrence values are small enough that taking the logarithm creates more problems than it solves. So you do not need to do the add-one / take-logarithm part of LSA here. It is still worth revisiting the simple LSA program from slide #15 of the lecture slides, however.

• As you will see, numpy.linalg.svd returns the matrix Vt, the transpose of V. So when I needed the actual matrix V, I wrote V = Vt.transpose().

• For the matrix inverse in Step 5, I used numpy.linalg.inv().

## What to turn in to Sakai

All you need to turn in for this assignment is your lsa.py script, which should produce output like mine.