A benchmark that evaluates whether AI agents can complete long-horizon, cross-application professional tasks in investment banking, consulting, and corporate law.
APEX-Agents evaluates agentic AI systems on realistic, multi-step professional workflows rather than single-turn prompts. It comprises 33 data-rich simulated work environments with 480 tasks that require agents to navigate complex file systems and work across documents, spreadsheets, PDFs, chat, email, and calendar applications. The benchmark measures Pass@1, the probability of success on the first attempt. The entire dataset is open-sourced on Hugging Face, along with the Archipelago Docker-based evaluation harness on GitHub.
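As a rough illustration of the Pass@1 metric mentioned above, here is a minimal sketch, not the official Archipelago harness, of computing Pass@1 as the fraction of tasks an agent solves on its first attempt. The task outcomes below are hypothetical.

```python
def pass_at_1(first_attempt_results):
    """Return the fraction of tasks solved on the first attempt.

    first_attempt_results: list of booleans, one per task,
    True if the agent's single (first) attempt succeeded.
    """
    if not first_attempt_results:
        raise ValueError("no results provided")
    return sum(1 for solved in first_attempt_results if solved) / len(first_attempt_results)

# Hypothetical first-attempt outcomes for four tasks:
outcomes = [True, False, True, True]
print(pass_at_1(outcomes))  # 0.75
```

With a single attempt per task, Pass@1 reduces to a simple success rate; benchmarks that sample multiple attempts per task typically average the per-task success probabilities instead.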
Evaluating enterprise AI agent readiness
Benchmarking agentic systems on long-horizon workflows
Testing agent reliability under realistic conditions
Guiding RL training strategies
Standardized agent evaluation before deployment
Identification of failure modes
Improved agent training