Specialized Agents

General agents are jacks-of-all-trades. Specialized agents trade that breadth for depth: each is built around one modality, such as web browsing, code execution, or image analysis.

Browser Agents (Playwright)

Browser agents navigate the web, click buttons, and read the DOM. To understand a page, they rely on either visual bounding boxes around on-screen elements or a simplified HTML representation.
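To make the "simplified HTML representation" idea concrete, here is a minimal sketch that reduces a page to a numbered list of interactive elements. It uses only Python's standard `html.parser`, so it runs without a browser. The names `InteractiveElementCollector`, `simplify_html`, `INTERACTIVE_TAGS`, and `KEPT_ATTRS` are hypothetical, and the choice of which tags and attributes to keep is an assumption, not a standard.

```python
from html.parser import HTMLParser

# Assumption for this sketch: these are the tags an agent can act on.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Assumption: these attributes are enough for the model to tell elements apart.
KEPT_ATTRS = ("id", "name", "type", "href", "placeholder")


class InteractiveElementCollector(HTMLParser):
    """Collects interactive elements and the text inside them."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one dict per interactive element, in page order
        self._open = None   # element currently collecting inner text, if any

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        kept = {k: v for k, v in attrs if k in KEPT_ATTRS and v}
        entry = {"tag": tag, "attrs": kept, "text": ""}
        self.elements.append(entry)
        if tag != "input":  # <input> is a void element with no inner text
            self._open = entry

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data


def simplify_html(html: str) -> list[str]:
    """Return one short line per interactive element, prefixed with an index."""
    parser = InteractiveElementCollector()
    parser.feed(html)
    parser.close()
    lines = []
    for index, element in enumerate(parser.elements):
        parts = [element["tag"]]
        parts += [f'{key}="{value}"' for key, value in element["attrs"].items()]
        text = " ".join(element["text"].split())  # collapse whitespace
        lines.append(f"[{index}] <{' '.join(parts)}> {text}".rstrip())
    return lines


if __name__ == "__main__":
    page = (
        '<div><h1>Login</h1>'
        '<input type="text" name="user" placeholder="Username">'
        '<button id="go"> Sign  in </button>'
        '<a href="/help">Help</a></div>'
    )
    print("\n".join(simplify_html(page)))
```

In a Playwright-based agent, the input string would typically come from `page.content()`, and the bracketed index gives the model a compact handle for saying which element to click or fill. Mapping that index back to a live element is left out of this sketch.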

Code Execution Agents

These agents write Python or Bash code and execute it in a sandbox. They rely on REPL environments to iterate: run the code, read the output or error, and revise until it works.
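The run-and-revise loop can be sketched in a few lines. This is an illustration under stated assumptions, not a production design: `run_python` and `iterate_until_ok` are hypothetical names, `revise` stands in for the model call that proposes a fix, and a plain subprocess with a timeout is NOT a real sandbox. A real deployment would add isolation such as a container or VM.

```python
import subprocess
import sys
from typing import Callable


def run_python(code: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Run code in a fresh interpreter and return (succeeded, stdout or stderr)."""
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    if result.returncode == 0:
        return True, result.stdout
    return False, result.stderr


def iterate_until_ok(
    code: str,
    revise: Callable[[str, str], str],
    max_attempts: int = 3,
) -> tuple[str, str]:
    """Run code; on failure, hand the code and error text to `revise` and retry."""
    for _ in range(max_attempts):
        ok, output = run_python(code)
        if ok:
            return code, output
        # `revise` is a placeholder for the model: it sees the failing code
        # and the error text, and returns a new candidate.
        code = revise(code, output)
    raise RuntimeError("no working version within the attempt limit")


if __name__ == "__main__":
    def stub_revise(code: str, error: str) -> str:
        # Stand-in for a model: fixes one known typo when it sees a NameError.
        return code.replace("totl", "total") if "NameError" in error else code

    final_code, output = iterate_until_ok("total = 3\nprint(totl)", stub_revise)
    print(output)
```

Capping the number of attempts matters in practice: without `max_attempts`, a model that keeps proposing broken code would loop indefinitely.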

Image Analysis Agents

These agents use multimodal models such as GPT-4o or Claude 3.5 Sonnet to inspect charts, UI screenshots, or physical documents and extract structured data from them.
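The "extract structured data" step usually comes down to asking the model for JSON and validating what comes back. The sketch below shows that half of the pipeline only. The model call is abstracted as `ask_model`, a hypothetical callable that takes image bytes and a prompt and returns the model's text reply; `ChartData`, `PROMPT`, `parse_chart_reply`, and `extract_chart` are likewise illustrative names, and the JSON shape is an assumption chosen for this example.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Assumed prompt and JSON shape for this sketch; adjust to the task at hand.
PROMPT = (
    "Read the chart in the image. Reply with JSON only, in the form "
    '{"title": "...", "points": [{"label": "...", "value": 0}]}.'
)


@dataclass
class ChartData:
    title: str
    points: list[tuple[str, float]]


def parse_chart_reply(reply: str) -> ChartData:
    """Pull the JSON object out of a model reply and validate its fields."""
    # The reply may wrap the JSON in prose or a code fence, so keep only
    # the span from the first "{" to the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    raw = json.loads(reply[start:end + 1])
    points = [(str(p["label"]), float(p["value"])) for p in raw["points"]]
    return ChartData(title=str(raw["title"]), points=points)


def extract_chart(image: bytes, ask_model: Callable[[bytes, str], str]) -> ChartData:
    """Send the image and prompt to the model, then parse its reply."""
    return parse_chart_reply(ask_model(image, PROMPT))


if __name__ == "__main__":
    def fake_model(image: bytes, prompt: str) -> str:
        # Stand-in for a real multimodal model call.
        return (
            'Here is the data: {"title": "Revenue", "points": '
            '[{"label": "Q1", "value": 1.5}, {"label": "Q2", "value": 2}]}'
        )

    print(extract_chart(b"placeholder image bytes", fake_model))
```

Keeping the model behind a plain callable makes the parsing logic testable without network access, and lets the same validation code sit behind whichever provider SDK is actually in use.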