New Benchmark Shows AI Agents Perform Poorly When Automating Real Jobs
A new paper from the Center for AI Safety and Scale AI has introduced the Remote Labor Index (RLI), the first benchmark designed to measure how well AI agents can perform paid, remote jobs.