Getting Started with Python

Thursday was our first of two classes on Python with Rahul Dave.  Before we got started, Rahul shared with us an IPython notebook containing the examples we would cover in class.  We saved the file to our computers and uploaded it to our own IPython notebooks so that we could play with the code on our own while he talked us through it.  Given the fast pace of the class, having the examples preloaded into a notebook made for a much easier, more interactive experience than if we’d had to choose between copying code or simply watching the screen.

Most of what was covered in class is contained in the notebook and in our collaborative class notes (PDF, 92 KB).  If you have IPython installed on your own computer, you should be able to save and upload the notebook as we did in class.  If not, you can view it in read-only mode in the IPython Notebook Viewer by entering its URL (https://dl.dropbox.com/u/75194/Explain2.ipynb) on the Viewer’s main page.

After a brief introduction to the IPython interface, we learned about object types, importing modules, and working with files, lists, and strings.  A few participants have programming experience, and some have tested the waters in Python through workshops or online tutorials (such as the excellent Learn Python the Hard Way), but the majority are new to programming.  Thursday’s session was meant to give us a basic understanding of programming concepts while also getting us working with text files right away.

Rahul used the text of Hamlet, obtained from Project Gutenberg, as a sample text for demonstrating a variety of methods and functions.  We divided the text roughly into words by splitting it on whitespace (using .split()), and roughly into lines by splitting it on the carriage-return/line-feed pair that ends each line in the file (using .split("\r\n")).  To find the number of unique “words” in our rough word list, we converted every item in the list to lowercase (using .lower()), built a set from the list (using set()), and called the len() function on the set.  Of course, dividing the text on whitespace didn’t account for punctuation, so the resulting set of unique words included, for example, “?” and “&”, as well as “matter”, “matter,”, “matter.”, “matter:”, “matter;”, and “matter?”.  Refining how we split texts into words is part of our homework for this Thursday: clean up as many of these glitches as possible with the help of our book’s section on regular expressions, then use the resulting word list to compute a word frequency graph.
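The steps from class can be sketched in a few lines of Python. This is a minimal sketch, not the code from Rahul’s notebook: the short sample string stands in for the Hamlet text, and the function names, the file name “hamlet.txt”, and the regular expression (runs of letters, optionally joined by inner apostrophes) are hypothetical choices of our own for the homework’s cleanup step. One caveat if you try this with a real file in Python 3: opening a file in text mode translates “\r\n” to “\n” by default, so pass newline="" to open() if you want .split("\r\n") to find the original line endings.

```python
import re
from collections import Counter

# Hypothetical homework pattern: runs of letters, optionally joined by inner
# apostrophes ("what's"), applied to text that has already been lowercased.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def rough_lines(text):
    """Split into lines on the carriage-return/line-feed pair, as in class."""
    return text.split("\r\n")


def rough_unique_words(text):
    """Split on whitespace, lowercase each token, and count the distinct ones."""
    words = text.split()
    return len(set(word.lower() for word in words))


def regex_words(text):
    """Lowercase the text and keep only the tokens that match WORD_RE."""
    return WORD_RE.findall(text.lower())


def regex_unique_words(text):
    """Count distinct words after the regex cleanup."""
    return len(set(regex_words(text)))


def word_frequencies(text):
    """Map each cleaned word to its number of occurrences (data for a frequency graph)."""
    return Counter(regex_words(text))


# Stand-in for the Hamlet text. With a real file in Python 3, something like
# open("hamlet.txt", newline="").read() (hypothetical file name) keeps "\r\n" intact.
sample = "What's the matter?\r\nThe matter, my lord. & matter;"

print(rough_lines(sample))
print(rough_unique_words(sample))
print(regex_unique_words(sample))
print(word_frequencies(sample).most_common(3))
```

On this sample, the rough whitespace method counts 8 unique “words” (including “matter?”, “matter,”, “matter;” and “&”), while the regex version counts 5, with “matter” occurring 3 times.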
