Pranjal Aggarwal

PhD Student @ Carnegie Mellon University

Language Technologies Institute

I am a second-year Ph.D. student in Language Technologies at the School of Computer Science, Carnegie Mellon University, advised by Prof. Sean Welleck. I was a research intern at FAIR (Meta) in 2025, where I worked with Swarnadeep Saha and Jason Weston. Previously, I did my undergrad at IIT Delhi, where I was advised by Prof. Mausam, and I also spent time at Princeton working with Prof. Karthik Narasimhan.

My research focuses on computer-use agents, test-time compute for reasoning, and formal verification for code. I also invented Generative Engine Optimization (GEO). I am a recipient of the SoftBank Group-Arm Fellowship at CMU.

Research

Improving computer-use agents by scaling environments and efficient architectures

Gym-Anything: Turn Any Software into an Agent Environment

Pranjal Aggarwal, Graham Neubig, Sean Welleck

arXiv 2026

We introduce Gym-Anything, a framework that turns any software into a computer-use agent environment with a simple make() call. The system includes 200+ real-world applications across 22 occupational categories, generating 10,000+ tasks with checklist-based verification. We also introduce CUA-World-Long, a benchmark of 200 long-horizon tasks evaluating agents on 500+ step interactions.

Programming with Pixels: Computer-Use Meets Software Engineering

Pranjal Aggarwal, Sean Welleck

ICLR 2026

Computer-use agents (CUAs) hold the promise of performing a wide variety of general tasks, but current evaluations have primarily focused on simple scenarios. We introduce Programming with Pixels (PwP), the first comprehensive computer-use environment for software engineering, where agents visually control an IDE to perform diverse software engineering tasks. We introduce PwP-Bench, a benchmark of 15 existing and new software-engineering tasks spanning multiple modalities, programming languages, and skillsets. We find that when interacting purely visually, CUAs perform significantly worse than specialized coding agents. However, when given direct access to just two APIs -- file editing and bash operations -- performance jumps, often reaching the levels of specialized agents despite having a task-agnostic design.

Image Editing Software as a Tool: MLLM Agents for Precise Image Editing via API-Driven Control

Maxwell Jones, Pranjal Aggarwal, Lawrence Keunho Jang, Trung Bui, Franck Dernoncourt, Gang Wu, Jun-Yan Zhu, Ruslan Salakhutdinov

Under Review, 2026

We explore the use of multimodal large language model (MLLM) agents for precise image editing through API-driven control of professional image editing software, enabling complex editing operations that go beyond the capabilities of end-to-end generative models.

News

Recent updates and announcements

New preprint: Gym-Anything: Turn Any Software into an Agent Environment is out!

Apr 8, 2026

New preprint: Reasoning Over Mathematical Objects (Principia) is now on arXiv!

Mar 25, 2026

We are organizing the Frontiers of Flows for Generative AI workshop at CMU!

Mar 1, 2026

Two papers accepted to ICLR 2026: OptimalThinkingBench and Programming with Pixels!

Jan 22, 2026

Selected as a SoftBank Group-Arm Fellow!

Aug 25, 2025

My work has been featured in

The New York Times, Wired, a16z, VentureBeat, The Observer, Inc., Entrepreneur, Digiday, Search Engine Land, MarkTechPost, Interconnects
