Information access made easy with Python

This week’s post is contributed by class participant Alex Holachek, who presented the following tutorial on using the Requests module at the start of Thursday’s class.  Alex works as a Technical Information Specialist for the Astrophysics Data System.

API Queries and Web Scraping with the Requests Module

Even if you are only a beginner programmer, you can use Python to access a lot of information on the web that might previously have been difficult or impossible for you to obtain. Here we will use the requests module to fetch web pages and API results over the internet, and the lxml module to extract text from the HTML pages that come back. Below are three basic functions that you can examine; you can also copy and paste them into your own code from GitHub.

Before you proceed: start off by making sure Python reads your file as UTF-8 and has imported the relevant modules, by putting the following at the top of your file.

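A minimal version of that setup might look like the following (it assumes the requests and lxml packages are already installed, for example with pip):

    # -*- coding: utf-8 -*-
    import requests
    from lxml import html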

1. CHALLENGE ONE: Using Python to interface with a basic API

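As one possible sketch, the function below queries GitHub's public repository API, which replies with JSON. The particular API, repository, and fields are only stand-ins; the same pattern of requests.get() followed by .json() works for most JSON APIs.

    import requests

    def query_api(owner, repo):
        """Return a few basic facts about a GitHub repository."""
        url = 'https://api.github.com/repos/{0}/{1}'.format(owner, repo)
        response = requests.get(url)
        response.raise_for_status()            # stop here if the request failed
        data = response.json()                 # parse the JSON reply into a dict
        return {'name': data['full_name'],
                'description': data['description'],
                'stars': data['stargazers_count']}

    print(query_api('psf', 'requests'))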

2. CHALLENGE TWO: Using Python to get the title and text from a blog post

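A sketch of this kind of scraper is below. It fetches a page with requests, parses it with lxml, and pulls out text with XPath expressions; the URL and the expressions for the title (an h1 tag) and the body (p tags) are placeholders that you would adapt to the blog you are scraping.

    import requests
    from lxml import html

    def scrape_post(url):
        """Return the title and body text of a blog post at the given URL."""
        page = requests.get(url)
        page.raise_for_status()
        tree = html.fromstring(page.content)
        # Many blog themes put the post title in an <h1> tag and the body in
        # <p> tags; adjust these XPaths to match the site you are scraping.
        title = ' '.join(tree.xpath('//h1//text()')).strip()
        body = '\n'.join(p.text_content().strip() for p in tree.xpath('//p'))
        return title, body

    title, body = scrape_post('https://example.com/some-post')  # placeholder URL
    print(title)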

3. CHALLENGE THREE: Using Python to make a spider that not only gets text from a website, but also collects its links, follows them, and gets text from the new pages it finds

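A sketch of a small spider is below. It keeps a queue of pages to visit, records the text of each page it fetches, adds every link it finds to the queue, and stops after a handful of pages; the starting URL and the page limit are placeholders.

    import requests
    from lxml import html

    def spider(start_url, max_pages=5):
        """Follow links outward from start_url, collecting the text of each page."""
        to_visit = [start_url]
        seen = set()
        pages = {}
        while to_visit and len(pages) < max_pages:
            url = to_visit.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
            except requests.RequestException:
                continue                        # skip pages that fail to load
            tree = html.fromstring(response.content)
            tree.make_links_absolute(url)       # turn relative links into full URLs
            pages[url] = tree.text_content()
            to_visit.extend(tree.xpath('//a/@href'))  # queue outgoing links
        return pages

    for url in spider('https://example.com/'):  # placeholder starting point
        print(url)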

The code examples here were written in the Sublime Text 2 editor, which is highly recommended.
