Lab: Scheduled Scraper with GitHub Actions

Objective

Set up an automated scraping job that runs daily without manual intervention.

Write a Python script that fetches the top 10 stories from the Hacker News API.
Save the data to a Parquet file.
Configure a GitHub Actions workflow (.github/workflows/scrape.yml) with a cron trigger to run daily.
The workflow must commit the updated Parquet file back to the repository.
Provide a separate script using DuckDB to query the Parquet file and print the most common words in titles.
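
As a starting point, the fetch, save, and query steps above could be sketched roughly as follows. This is only one possible shape under stated assumptions, not a reference solution: the function names, the `hn_top.parquet` file name, the selection of fields, and the choice of `pyarrow` for writing are all illustrative. The endpoints are those of the public Hacker News Firebase API. The lab asks for the DuckDB query in a separate script; the functions are shown together here only for brevity.

```python
import json
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"
# Assumed subset of item fields to keep; adjust to what the analysis needs.
FIELDS = ("id", "title", "url", "score", "by", "time")


def get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_top_stories(n=10, fetch=get_json):
    """Return up to n top stories as dicts restricted to FIELDS.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    ids = fetch(f"{HN_API}/topstories.json")[:n]
    stories = []
    for story_id in ids:
        item = fetch(f"{HN_API}/item/{story_id}.json")
        if item:  # the API returns null for missing items; skip those
            stories.append({field: item.get(field) for field in FIELDS})
    return stories


def save_parquet(stories, path="hn_top.parquet"):
    """Write the stories to a Parquet file (assumes pyarrow is installed)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Explicit schema so a column that is entirely None still gets a real type.
    schema = pa.schema([
        ("id", pa.int64()), ("title", pa.string()), ("url", pa.string()),
        ("score", pa.int64()), ("by", pa.string()), ("time", pa.int64()),
    ])
    pq.write_table(pa.Table.from_pylist(stories, schema=schema), path)


def top_words(path="hn_top.parquet", limit=10):
    """Return (word, count) rows for the most common title words (assumes duckdb)."""
    import duckdb

    query = """
        SELECT word, count(*) AS n
        FROM (
            SELECT unnest(regexp_extract_all(lower(title), '[a-z0-9]+')) AS word
            FROM read_parquet(?)
        )
        GROUP BY word
        ORDER BY n DESC, word
        LIMIT ?
    """
    return duckdb.execute(query, [path, limit]).fetchall()
```

In the scraper script, calling `save_parquet(fetch_top_stories())` would produce the file that the workflow commits; the query script would then print the rows returned by `top_words()`. With only 10 titles per run the word counts will be small, and common stop words are not filtered in this sketch.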