Tools in Data Science

Wikipedia Data with Python

You’ll learn how to scrape data from Wikipedia using the wikipedia Python library, covering:

Installing and Importing: Run pip install wikipedia to install the wikipedia library, then import it with import wikipedia as wk.
Keyword Search: Use the search function to find Wikipedia pages containing a specific keyword, limiting results with the results argument.
Fetching Summaries: Use the summary function to get a concise summary of a Wikipedia page, limiting sentences with the sentences argument.
Retrieving Full Pages: Use the page function to obtain the full content of a Wikipedia page, including sections and references.
Accessing URLs: Retrieve the URL of a Wikipedia page using the url attribute of the page object.
Extracting References: Use the references attribute to get all reference links from a Wikipedia page.
Fetching Images: Access all images on a Wikipedia page via the images attribute, which returns a list of image URLs.
Extracting Tables: Use the pandas.read_html function to extract tables from the HTML content of a Wikipedia page. It returns a list of DataFrames, one per table found, so be mindful of which index holds the table you want.
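The search, summary, and page steps above can be combined into a minimal sketch like the one below. The function name wikipedia_overview and its parameters max_results and max_sentences are hypothetical, chosen only for illustration. Passing auto_suggest=False is an assumption about the desired behaviour: it asks the library for the exact title returned by the search. The import sits inside the function so the definition loads even where the package is not yet installed.

```python
def wikipedia_overview(keyword, max_results=5, max_sentences=2):
    """Search for `keyword` and collect basic data from the top-ranked page.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    # Lazy import: requires `pip install wikipedia` before the first call.
    import wikipedia as wk

    # Keyword search, limited with the `results` argument.
    titles = wk.search(keyword, results=max_results)
    if not titles:
        return None
    title = titles[0]

    # Short summary, limited with the `sentences` argument.
    # auto_suggest=False (an assumption here) looks up the exact title.
    summary = wk.summary(title, sentences=max_sentences, auto_suggest=False)

    # Full page object: content, URL, references, and images are attributes.
    page = wk.page(title, auto_suggest=False)

    return {
        "titles": titles,               # all matching page titles
        "summary": summary,             # summary text of the first match
        "url": page.url,                # URL of the page
        "content": page.content,        # full plain-text content
        "references": page.references,  # list of reference links
        "images": page.images,          # list of image URLs
    }
```

Calling wikipedia_overview("Data science") needs network access, and the result depends on the current state of Wikipedia. Ambiguous titles can raise the library's DisambiguationError, which this sketch does not handle.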

Here are links and references:

NOTE: Wikipedia is constantly edited, so a page may differ now from when the video was recorded. Expect details such as table indices, image lists, and references to change, and adjust your code accordingly.
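Because a table's position in the list returned by pandas.read_html can shift when a page is edited, one hedged alternative to a fixed index is to pick the table by its column names. The helpers load_tables and find_table below are hypothetical names for illustration, not part of either library. The sketch assumes pandas and an HTML parser such as lxml are installed.

```python
from io import StringIO


def load_tables(page_title):
    """Return every HTML table on a Wikipedia page as a list of DataFrames."""
    # Lazy imports: needs `pip install wikipedia pandas lxml`.
    import pandas as pd
    import wikipedia as wk

    # html() returns the full page HTML; it can be slow on long pages.
    html = wk.page(page_title, auto_suggest=False).html()
    # Wrap the string in StringIO, since newer pandas versions
    # deprecate passing literal HTML directly to read_html.
    return pd.read_html(StringIO(html))


def find_table(tables, required_columns):
    """Return the first table whose columns include all required names.

    Column labels are compared as strings; returns None if no table matches.
    """
    wanted = set(required_columns)
    for table in tables:
        if wanted.issubset({str(col) for col in table.columns}):
            return table
    return None
```

A typical call would be find_table(load_tables(title), ["Year", "Winner"]), where the title and column names are placeholders for whatever the page actually contains. Matching on columns is a design choice that trades a little extra code for resilience when tables are added or removed above the one you need.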