Rohan Arora

rohan-arora

AI & ML interests

None yet

Recent Activity

published an article about 11 hours ago

Article

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

ibm-research • about 11 hours ago • 9

upvoted a paper 15 days ago

MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

Paper • 2605.09131 • Published 19 days ago • 55

updated a dataset about 1 month ago

ibm-research/ITBench-Lite

Updated Apr 21 • 947 • 5

New activity in ibm-research/ITBench-Trajectories 2 months ago

Task description defined twice in the input

#2 opened 2 months ago by kyzor

Scenario indices discrepancy

#1 opened 2 months ago by kyzor

New activity in ibm-research/ITBench-Lite 3 months ago

Add CISO scenarios

#3 opened 3 months ago by yana1205dev

test

#2 opened 3 months ago by yana1205dev

published an article 3 months ago

Article

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

ibm-research • Feb 18 • 19

New activity in ibm-research/ITBench-Lite 4 months ago

finops new scenarios

#1 opened 4 months ago by bturkkan

updated a Space 4 months ago

ITBench-Lite-Space

🚀

Develop and run interactive code notebooks with JupyterLab

updated a dataset 4 months ago

ibm-research/ITBench-Trajectories

Updated Jan 19 • 566 • 3

New activity in ibm-research/ITBench-Lite 4 months ago

activities

#1 opened 4 months ago by bhavya24

New activity in rohan-arora/ITBench-Lite 4 months ago

test-pr

#1 opened 4 months ago by rohan-arora

published a dataset 4 months ago

ibm-research/ITBench-Trajectories

Updated Jan 19 • 566 • 3

published a Space 4 months ago

ITBench-Lite-Space

🚀

Develop and run interactive code notebooks with JupyterLab

published a dataset 4 months ago

ibm-research/ITBench-Lite

Updated Apr 21 • 947 • 5
