Wikipedia Data with Python#
You’ll learn how to scrape data from Wikipedia using the wikipedia Python library, covering:
- Installing and Importing: Use pip install to get the Wikipedia library and import it with import wikipedia as wk.
- Keyword Search: Use the search function to find Wikipedia pages containing a specific keyword, limiting results with the results argument.
- Fetching Summaries: Use the summary function to get a concise summary of a Wikipedia page, limiting sentences with the sentences argument.
- Retrieving Full Pages: Use the page function to obtain the full content of a Wikipedia page, including sections and references.
- Accessing URLs: Retrieve the URL of a Wikipedia page using the url attribute of the page object.
- Extracting References: Use the references attribute to get all reference links from a Wikipedia page.
- Fetching Images: Access all images on a Wikipedia page via the images attribute, which returns a list of image URLs.
- Extracting Tables: Use the pandas.read_html function to extract tables from the HTML content of a Wikipedia page, being mindful of table indices.
Here are links and references:
- Wikipedia Library - Notebook
- Learn about the
wikipediapackage
NOTE: Wikipedia is constantly edited. The page may be different now from when the video was recorded. Handle accordingly.
