
Pranjal Aggarwal
I am a second-year Ph.D. student in Language Technologies at the School of Computer Science, Carnegie Mellon University, advised by Prof. Sean Welleck. I was a research intern at FAIR (Meta) in 2025, where I worked with Swarnadeep Saha and Jason Weston. Previously, I did my undergrad at IIT Delhi, where I was advised by Prof. Mausam, and also spent time at Princeton working with Prof. Karthik Narasimhan.
My research focuses on computer-use agents, test-time compute for reasoning, and formal verification for code. I also invented Generative Engine Optimization (GEO). I am a recipient of the SoftBank Group-Arm Fellowship at CMU.
Improving computer-use agents by scaling environments and efficient architectures
Gym-Anything: Turn Any Software into an Agent Environment
Pranjal Aggarwal, Graham Neubig, Sean Welleck
We introduce Gym-Anything, a framework that turns any software into a computer-use agent environment with a simple make() call. The system includes 200+ real-world applications across 22 occupational categories, generating 10,000+ tasks with checklist-based verification. We also introduce CUA-World-Long, a benchmark of 200 long-horizon tasks that evaluates agents on interactions of 500+ steps.
Programming with Pixels: Computer-Use Meets Software Engineering
Pranjal Aggarwal, Sean Welleck
Computer-use agents (CUAs) hold the promise of performing a wide variety of general tasks, but current evaluations have primarily focused on simple scenarios. We introduce Programming with Pixels (PwP), the first comprehensive computer-use environment for software engineering, where agents visually control an IDE to perform diverse software engineering tasks. We introduce PwP-Bench, a benchmark of 15 existing and new software-engineering tasks spanning multiple modalities, programming languages, and skillsets. We find that when interacting purely visually, CUAs perform significantly worse than specialized coding agents. However, when given direct access to just two APIs -- file editing and bash operations -- performance jumps, often reaching the levels of specialized agents despite having a task-agnostic design.
Image Editing Software as a Tool: MLLM Agents for Precise Image Editing via API-Driven Control
Maxwell Jones, Pranjal Aggarwal, Lawrence Keunho Jang, Trung Bui, Franck Dernoncourt, Gang Wu, Jun-Yan Zhu, Ruslan Salakhutdinov
We explore the use of multimodal large language model (MLLM) agents for precise image editing through API-driven control of professional image editing software, enabling complex editing operations that go beyond the capabilities of end-to-end generative models.
News
Recent updates and announcements
New preprint: "Gym-Anything: Turn Any Software into an Agent Environment" is out!
New preprint: "Reasoning Over Mathematical Objects" (Principia) is now on arXiv!
We are organizing the Frontiers of Flows for Generative AI workshop at CMU!
Two papers accepted to ICLR 2026: OptimalThinkingBench and Programming with Pixels!
Selected as a SoftBank Group-Arm Fellow!