Computing with Text

For this project you will work with the text files of a few classic books obtained from the Project Gutenberg website:

Project

Part 1. For each of these texts, perform the following tasks:

  • Create a list of the 20 or so words that appear most frequently in the text.

  • Plot the frequencies of all words appearing in the text against their ranks in order of frequency.
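One possible way to approach both tasks is sketched below. It is only a starting point: the definition of a "word" (a run of letters, optionally with internal apostrophes, compared without regard to case), the function names, and the choice of logarithmic axes are assumptions made for this sketch and are not requirements of the project. The sample string stands in for the full text of a book.

```python
import re
from collections import Counter


def word_counts(text):
    """Count the words in a text, ignoring case and punctuation."""
    # Assumption: a word is a run of letters, optionally with internal apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


def rank_frequency(counts):
    """Return (ranks, frequencies), with frequencies in decreasing order."""
    freqs = sorted(counts.values(), reverse=True)
    ranks = list(range(1, len(freqs) + 1))
    return ranks, freqs


def plot_rank_frequency(counts, title):
    """Plot word frequency against rank (log-log axes are a choice made here)."""
    # Imported inside the function so that counting works without matplotlib.
    import matplotlib.pyplot as plt

    ranks, freqs = rank_frequency(counts)
    plt.loglog(ranks, freqs, marker=".", linestyle="none")
    plt.xlabel("rank")
    plt.ylabel("frequency")
    plt.title(title)
    plt.show()


# A short sample string used in place of a whole book.
sample = "The cat and the dog. The dog's bone, and the cat's toy."
counts = word_counts(sample)
print(counts.most_common(3))  # [('the', 4), ('and', 2), ('cat', 1)]
```

For a real book, the text could be read with something like open("book.txt", encoding="utf-8").read(), where book.txt is a placeholder file name, and the list of frequent words obtained with counts.most_common(20).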

Part 2. Analyze the differences and similarities between the lists of most frequent words and the plots of word frequencies for all of these texts. Describe any patterns you see. Can you make inferences about a book based on its list of most frequent words?
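As one possible starting point for the comparison, the sketch below finds the words shared by all of the lists and the words that appear in only one of them. The function name and the tiny word lists are made up for illustration; they are not results for any actual book.

```python
def shared_and_distinct(top_lists):
    """Compare lists of frequent words.

    top_lists maps a title to its list of most frequent words.
    Returns (words common to every list, {title: words found only in that list}).
    """
    sets = {title: set(words) for title, words in top_lists.items()}
    common = set.intersection(*sets.values())
    distinct = {}
    for title, words in sets.items():
        # Union of the word sets of all the other titles.
        others = set().union(*(s for t, s in sets.items() if t != title))
        distinct[title] = words - others
    return common, distinct


# Hypothetical toy data, not taken from real texts.
tops = {
    "book_a": ["the", "and", "of", "whale"],
    "book_b": ["the", "and", "to", "alice"],
}
common, distinct = shared_and_distinct(tops)
print(sorted(common))
print({title: sorted(words) for title, words in distinct.items()})
```

Words that appear in only one list are often the ones that say something about that particular book, so they may be a useful place to begin the discussion.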

If you want to expand your research to additional texts, you can use Project Gutenberg files or other online resources. Use requests to retrieve these additional files so that they are accessible when you submit the report.
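A minimal sketch of retrieving a file with requests is given below. The function names, the folder name texts, and the fallback file name are assumptions made for this sketch; the address of the file has to be supplied by you.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_name(url):
    """Derive a local file name from the last path component of a URL."""
    name = Path(urlparse(url).path).name
    return name or "download.txt"  # fallback name is an arbitrary choice


def download_text(url, folder="texts", timeout=30):
    """Download url with requests, save it under folder, and return the saved path."""
    # Third-party library (pip install requests); imported here so that
    # local_name can be used even where requests is not installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    target = Path(folder) / local_name(url)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(response.content)  # keep the bytes exactly as served
    return target
```

Calling download_text with the address of a plain-text file, for example a placeholder such as "https://example.org/files/book.txt", would save the file as texts/book.txt. Keeping the download step in the report means the files can be retrieved again whenever the report is run.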