Specialized Agents#
General agents are jacks-of-all-trades. Specialized agents excel at one specific modality.
Browser Agents (Playwright)#
Browser agents navigate the web, click buttons, and read the DOM. To understand a page, they work from either screenshots annotated with bounding boxes or a simplified HTML representation that keeps only the elements worth acting on.
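As a rough illustration of the "simplified HTML representation" idea, the sketch below reduces a page to a numbered list of interactive elements using only the standard library. The names `PageSimplifier`, `simplify`, `INTERACTIVE_TAGS`, and `KEPT_ATTRS` are hypothetical, and the choice of which tags and attributes to keep is an assumption rather than a fixed convention. In a real browser agent the HTML string would typically come from the browser (for example Playwright's `page.content()`).

```python
from html.parser import HTMLParser

# Tags an agent can typically act on; this list is an illustrative assumption.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Void elements never get a closing tag, so they are emitted as soon as they open.
VOID_TAGS = {"input"}
# Attributes kept in the simplified view (also an assumption).
KEPT_ATTRS = ("id", "name", "type", "href", "placeholder", "aria-label")


class PageSimplifier(HTMLParser):
    """Collects interactive elements as short, numbered lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._open = None  # (tag, attr_text, text_parts) for the element being read

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_text = " ".join(f'{k}="{v}"' for k, v in attrs if k in KEPT_ATTRS and v)
        if tag in VOID_TAGS:
            self._emit(tag, attr_text, "")
        else:
            self._open = (tag, attr_text, [])

    def handle_data(self, data):
        # Only text inside an open interactive element is kept.
        if self._open is not None:
            self._open[2].append(data.strip())

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            _, attr_text, parts = self._open
            self._emit(tag, attr_text, " ".join(p for p in parts if p))
            self._open = None

    def _emit(self, tag, attr_text, text):
        index = len(self.lines)
        head = f"{tag} {attr_text}".strip()
        self.lines.append(f"[{index}] <{head}>{text}")


def simplify(html):
    """Return one line per interactive element, each prefixed with an index."""
    parser = PageSimplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

The numeric index gives the model a compact handle ("click element 2") that the agent can map back to a real selector. Nested interactive elements are not handled here; a fuller version would track a stack.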
Code Execution Agents#
These agents write Python or Bash and execute it in a sandbox. They rely on REPL-style environments, reading the output or error from each run and revising the code until it works.
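The execute-and-revise loop described above can be sketched as follows. This is a minimal illustration, not a real sandbox: a child interpreter with a timeout provides no isolation, and production systems would run the code in a container or similar. The functions `run_snippet` and `iterate_until_success` are hypothetical names, and `revise` is a placeholder for whatever model call produces the next attempt.

```python
import subprocess
import sys


def run_snippet(code, timeout=5.0):
    """Run Python source in a child interpreter; return (ok, stdout, stderr)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "", f"timed out after {timeout} seconds"
    return proc.returncode == 0, proc.stdout, proc.stderr


def iterate_until_success(first_attempt, revise, max_attempts=3):
    """Run code, feeding stderr back to `revise` until a run succeeds.

    Returns (attempt_number, stdout) for the first successful run.
    """
    code = first_attempt
    for attempt in range(1, max_attempts + 1):
        ok, stdout, stderr = run_snippet(code)
        if ok:
            return attempt, stdout
        # Hypothetical stand-in for a model call that sees the error text.
        code = revise(code, stderr)
    raise RuntimeError(f"no working code after {max_attempts} attempts")
```

Capping the number of attempts keeps a model that cannot fix its own error from looping indefinitely.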
Image Analysis Agents#
These agents use multimodal models such as GPT-4o or Claude 3.5 Sonnet to inspect charts, UI screenshots, or physical documents and extract structured data from them.
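The "structured data" step usually means turning the model's free-text reply into typed records. The sketch below assumes the model was asked to answer with a JSON object of the form `{"points": [{"label": ..., "value": ...}]}`; that schema, the `ChartPoint` type, and `parse_chart_reply` are all hypothetical, and the model call itself is omitted.

```python
import json
from dataclasses import dataclass


@dataclass
class ChartPoint:
    label: str
    value: float


def parse_chart_reply(reply):
    """Pull the JSON object out of a model reply and validate its shape."""
    # Models often wrap JSON in prose, so slice from the first "{" to the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("reply contains no JSON object")
    payload = json.loads(reply[start : end + 1])
    points = payload.get("points")
    if not isinstance(points, list):
        raise ValueError("expected a 'points' list")
    # Coerce types so downstream code can rely on str labels and float values.
    return [ChartPoint(str(p["label"]), float(p["value"])) for p in points]
```

Validating the reply before using it matters because a vision model can misread a chart or return malformed output; a failed parse is a natural signal to retry the request.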