Playwright & Selenium#

Modern web scraping often requires executing JavaScript. Tools like Playwright and Selenium automate real browsers.

Playwright#

Playwright is generally faster and more modern than Selenium. It supports async execution natively in Python.

import asyncio
from playwright.async_api import async_playwright

async def scrape():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://example.com")
        
        # Wait for an element and extract text
        await page.wait_for_selector("h1")
        title = await page.inner_text("h1")
        print(title)
        
        await browser.close()

asyncio.run(scrape())

Selectors#

Mastering CSS selectors and XPath is critical. Prefer data attributes (like data-testid) when available, as class names change frequently in modern React/Vue apps.