Lab — Context-Based Extraction Tool
Objective
Build a unified tool that an agent can call to extract plain text from a source URL, whether it points to a PDF, a website, or a YouTube video.
Instructions
Create a Python module that exposes a single function, `extract_context(url: str)`, which determines the source type of the URL and returns the extracted text as a string.
- If the URL is a PDF: download the file and extract its text with `PyMuPDF`.
- If the URL is a website: use `Playwright` to render the page's JavaScript, then extract the main article text with `BeautifulSoup`.
- If the URL is a YouTube video: use `yt-dlp` to download the auto-generated VTT subtitles and parse them into raw text.
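The following is a minimal sketch of such a module, not a reference solution. It makes several assumptions:

- The URL alone is enough to pick the source type: a `.pdf` path suffix means a PDF, and a YouTube hostname means a video.
- The third-party libraries are imported lazily inside each extractor, so the routing and VTT-parsing logic can be tried without them installed.
- The helper names (`classify_url`, `vtt_to_text`, and the `_extract_*` functions) and the default subtitle language `"en"` are illustrative choices, not part of the lab spec.

```python
"""Sketch of a context-extraction module; helper names and URL heuristics are assumptions."""
import glob
import html
import os
import re
import tempfile
import urllib.request
from urllib.parse import urlparse

# Assumption: a URL on one of these hosts is a YouTube video.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_url(url: str) -> str:
    """Return 'pdf', 'youtube', or 'website' from the URL alone (no network access)."""
    parsed = urlparse(url)
    if (parsed.hostname or "") in YOUTUBE_HOSTS:
        return "youtube"
    # Heuristic: a .pdf path suffix; checking the Content-Type header would be sturdier.
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    return "website"


def vtt_to_text(vtt: str) -> str:
    """Flatten WebVTT captions into one string of plain text."""
    lines = []
    for raw in vtt.splitlines():
        line = raw.strip()
        # Skip blanks, the header and its metadata lines, and cue timing lines.
        if not line or "-->" in line or line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue
        # Drop inline tags such as <00:00:01.500> and <c>, then decode HTML entities.
        text = html.unescape(re.sub(r"<[^>]+>", "", line)).strip()
        # Auto-generated captions roll over: each cue repeats the previous line.
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return " ".join(lines)


def _extract_pdf(url: str) -> str:
    try:
        import pymupdf  # PyMuPDF >= 1.24.3
    except ImportError:
        import fitz as pymupdf  # older PyMuPDF releases
    # PyMuPDF opens files or bytes, not URLs, so download the PDF first.
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = resp.read()
    with pymupdf.open(stream=data, filetype="pdf") as doc:
        return "\n".join(page.get_text() for page in doc)


def _extract_website(url: str) -> str:
    from bs4 import BeautifulSoup
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")  # wait for client-side JS to settle
            page_html = page.content()
        finally:
            browser.close()
    soup = BeautifulSoup(page_html, "html.parser")
    for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
        tag.decompose()
    # Heuristic for "main article": <article>, else <main>, else the whole <body>.
    main = soup.find("article") or soup.find("main") or soup.body or soup
    return main.get_text(separator="\n", strip=True)


def _extract_youtube(url: str, lang: str = "en") -> str:
    import yt_dlp

    with tempfile.TemporaryDirectory() as tmp:
        opts = {
            "skip_download": True,  # fetch subtitles only, not the video
            "writeautomaticsub": True,  # auto-generated captions
            "subtitleslangs": [lang],
            "subtitlesformat": "vtt",
            "outtmpl": os.path.join(tmp, "%(id)s.%(ext)s"),
            "quiet": True,
        }
        with yt_dlp.YoutubeDL(opts) as ydl:
            ydl.download([url])
        files = glob.glob(os.path.join(tmp, "*.vtt"))
        if not files:
            raise RuntimeError(f"no '{lang}' auto-generated subtitles found for {url}")
        with open(files[0], encoding="utf-8") as fh:
            return vtt_to_text(fh.read())


def extract_context(url: str) -> str:
    """Extract plain text from a PDF, website, or YouTube URL."""
    extractors = {"pdf": _extract_pdf, "website": _extract_website, "youtube": _extract_youtube}
    return extractors[classify_url(url)](url)
```

The `.pdf`-suffix rule misses PDFs served from extension-less URLs. Sending a HEAD request and checking for `application/pdf` in the `Content-Type` header would cover those. Likewise, `wait_until="networkidle"` is a simple way to let JavaScript finish, but it can stall on pages that keep connections open. Waiting for a page-specific selector is an alternative. Playwright also needs its browser binaries installed once (for example with `playwright install chromium`).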
Testing
Provide a script that calls your function with three different URLs, one for each source type, and prints the first 500 characters of each result.
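One way to keep the test script independent of the module is to pass the extractor in as a function. In this sketch, the `print_previews` name and its choice to record errors instead of stopping are assumptions, not part of the lab spec. It prints the first `n` characters (500 by default) for each URL and keeps going if one source fails.

```python
from typing import Callable, Dict, Iterable


def print_previews(extract: Callable[[str], str], urls: Iterable[str], n: int = 500) -> Dict[str, str]:
    """Run `extract` on each URL, print the first `n` characters, and return them by URL."""
    previews: Dict[str, str] = {}
    for url in urls:
        try:
            previews[url] = extract(url)[:n]
        except Exception as exc:  # record the failure but still try the remaining URLs
            previews[url] = f"[error] {type(exc).__name__}: {exc}"
        print(f"=== {url} ===\n{previews[url]}\n")
    return previews
```

With the module in place, the lab script reduces to a call such as `print_previews(extract_context, [pdf_url, site_url, video_url])`. The three URL variables are placeholders for one URL per source type.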