Open Access
From HTML to List of Words (part 1)
Author(s) - William J. Turkel, Adam Crymble
Publication year - 2012
Publication title - The Programming Historian
Language(s) - English
Resource type - Journals
ISSN - 2397-2068
DOI - 10.46430/phen0006
Subject(s) - python (programming language), computer science, upload, world wide web, string (physics), web page, web content, markup language, information retrieval, programming language, mathematics, xml, mathematical physics
In this two-part lesson, we will build on what you learned in "Downloading Web Pages with Python" and learn how to remove the HTML markup from the web page containing the transcript of Benjamin Bowsey’s 1780 criminal trial. We will do this using a variety of string operators and string methods, together with close reading skills. We introduce looping and branching so that programs can repeat tasks and test for certain conditions, which makes it possible to separate the content from the HTML tags. Finally, we convert the content from one long string into a list of words that can later be sorted, indexed, and counted.
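The two steps summarized above (stripping tags with a loop and a branch, then turning the remaining string into a list of words) can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the lesson's code: the function names `strip_tags` and `to_word_list` and the sample HTML fragment are hypothetical, and the technique shown (a character-by-character loop that flips a flag on `<` and `>`) is just one simple way to combine looping and branching for this task.

```python
def strip_tags(page_contents: str) -> str:
    """Return page_contents with everything from '<' to the next '>' removed.

    Hypothetical helper: scans one character at a time and uses a flag
    to remember whether the scan is currently inside an HTML tag.
    """
    inside_tag = False
    kept = []
    for char in page_contents:
        if char == "<":
            inside_tag = True        # start of a tag: stop copying
        elif char == ">" and inside_tag:
            inside_tag = False       # end of a tag: resume copying
        elif not inside_tag:
            kept.append(char)        # ordinary content: keep it
    return "".join(kept)


def to_word_list(text: str) -> list[str]:
    """Hypothetical helper: split a long string on runs of whitespace."""
    return text.split()


if __name__ == "__main__":
    # Invented HTML fragment for illustration, not taken from the trial page.
    sample = "<p>The prisoner <b>was</b> indicted for breaking</p>"
    print(to_word_list(strip_tags(sample)))
    # → ['The', 'prisoner', 'was', 'indicted', 'for', 'breaking']
```

Two limitations of this simple approach are worth noting. A tag that sits between two words with no surrounding whitespace (for example `indicted<br/>for`) leaves the words fused as `indictedfor`, and splitting on whitespace keeps punctuation attached to words. Real pages may also contain comments, scripts, and character entities such as `&amp;` that need separate handling.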
