Tools in Data Science

Scraping emarketer

In this live scraping session, we walk through a real-life scenario in which Straive had to scrape data from emarketer.com for a demo. It is a fairly realistic, representative example of how one might go about scraping a website.

You’ll learn:

Scraping: How to extract data from web pages, including constructing URLs, fetching page content, and parsing HTML using packages like lxml and httpx.
Caching: Implementing a caching strategy to avoid redundant data fetching for efficiency and reliability.
Error Handling and Debugging: Practical tips for troubleshooting, such as using liberal print statements, breakpoints for in-depth debugging, and the concept of “rubber duck debugging” to clarify problems.
LLMs: Benefits of Gemini / ChatGPT for code suggestions and troubleshooting.
Real-World Application: How quick proofs of concept can showcase capabilities to clients, emphasizing practice over theory.