Wikipedia Data with Python#

Wikipedia data with Wikimedia Python library

You’ll learn how to scrape data from Wikipedia using the wikipedia Python library, covering:

  • Installing and Importing: Use pip install to get the Wikipedia library and import it with import wikipedia as wk.
  • Keyword Search: Use the search function to find Wikipedia pages containing a specific keyword, limiting results with the results argument.
  • Fetching Summaries: Use the summary function to get a concise summary of a Wikipedia page, limiting sentences with the sentences argument.
  • Retrieving Full Pages: Use the page function to obtain the full content of a Wikipedia page, including sections and references.
  • Accessing URLs: Retrieve the URL of a Wikipedia page using the url attribute of the page object.
  • Extracting References: Use the references attribute to get all reference links from a Wikipedia page.
  • Fetching Images: Access all images on a Wikipedia page via the images attribute, which returns a list of image URLs.
  • Extracting Tables: Use the pandas.read_html function to extract tables from the HTML content of a Wikipedia page, being mindful of table indices.

Here are links and references:

NOTE: Wikipedia is constantly edited. The page may be different now from when the video was recorded. Handle accordingly.