The first question should not be “where can we use agents?” It should be “where will an agent earn its keep without creating a new supervision job for the team?” That difference saves a lot of expensive pilots.
Asana’s new Agentic Work Management push, Meta’s business-agent direction, and Microsoft’s steady enterprise-agent drumbeat all point to the same next phase: agents are moving into the systems where work is assigned, tracked, and coordinated. The tooling is getting better. The harder question is still operational: which work deserves autonomy first?
Start where the work is repeatable
A good first workflow is high-volume, rule-bound, and annoying. It has clear inputs, a known destination, and a human escalation path. Nobody needs the agent to invent a strategy from scratch. They need it to turn a pile of messy inputs into the next obvious action.
Strong candidates look like this:
- Document intake. Classify the file, extract the fields, compare it against a checklist, and route exceptions to the right person.
- CRM hygiene. Normalize notes, enrich missing fields, flag stale opportunities, and prepare follow-up drafts for review.
- Operations triage. Read inbound requests, detect urgency, attach context, and create the right task with the right owner.
- Reporting prep. Pull numbers from known systems, explain deltas, and prepare a manager-ready summary without changing the underlying data.
These workflows are not glamorous. That is the point. Agents earn trust by doing useful, bounded work repeatedly before they are allowed near higher-consequence decisions.
Autonomy is not a maturity badge. It is an operating cost. Spend it first where the rules are clear, the volume is real, and failure can be contained.
Do not start with judgment-heavy work
Some workflows look attractive because they are expensive. That does not make them good first agent projects. Pricing strategy, legal interpretation, creative direction, hiring decisions, customer escalations, and anything involving emotional nuance should not be the opening move.
Agents can help around that work. They can summarize the file, gather precedent, prepare options, check a policy, or draft a response for a person to edit. But if the real value of the work is judgment, context, taste, negotiation, or accountability, the first version should make the human better instead of trying to remove the human.
This is where many projects overreach. The agent demos well because the examples are tidy. Then production brings edge cases, missing fields, angry customers, broken integrations, and unclear ownership. The result is not automation. It is a nervous team babysitting a tool they no longer trust.
The four filters we use
Before we build, we score candidate workflows against four filters. A workflow that fails more than one of them is usually not the right place to start.
- Frequency. Does this happen often enough that automation compounds, or is it a rare headache with a loud sponsor?
- Clarity. Can a skilled employee describe the happy path, the exceptions, and the escalation rules without hand-waving?
- System access. Can the agent read and write through supported APIs with scoped credentials, or would it depend on brittle screen-scraping workarounds?
- Recoverability. If the agent is wrong, can the business detect it quickly and undo or correct the action without a customer-facing incident?
A workflow with high frequency, clear rules, real APIs, and recoverable failure is a strong candidate. A workflow with vague rules, missing system access, and irreversible outcomes belongs later, after the organization has the control plane to manage it.
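To make the scoring concrete, here is a minimal sketch of the four filters as a pass/fail checklist. The `Candidate` class, the boolean scoring, and the sample workflows are illustrative assumptions, not a fixed rubric; a real scorecard would likely use graded scores and notes from the people who do the work.

```python
from dataclasses import dataclass

# The four filters described above, in the order they are asked.
FILTERS = ("frequency", "clarity", "system_access", "recoverability")


@dataclass(frozen=True)
class Candidate:
    """A candidate workflow scored pass/fail on each filter (hypothetical model)."""
    name: str
    frequency: bool
    clarity: bool
    system_access: bool
    recoverability: bool

    def failed_filters(self) -> list[str]:
        return [f for f in FILTERS if not getattr(self, f)]


def rank_candidates(candidates: list[Candidate], max_failures: int = 1) -> list[Candidate]:
    """Drop workflows that fail more than max_failures filters; fewest failures first."""
    eligible = [c for c in candidates if len(c.failed_filters()) <= max_failures]
    return sorted(eligible, key=lambda c: len(c.failed_filters()))


# Illustrative scores only; each team has to answer these for its own workflows.
candidates = [
    Candidate("pricing strategy", False, False, True, False),
    Candidate("CRM hygiene", True, True, False, True),
    Candidate("document intake", True, True, True, True),
]
shortlist = [c.name for c in rank_candidates(candidates)]
```

With these sample scores, document intake ranks first, CRM hygiene stays on the list with one gap to close, and pricing strategy drops out until later.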
Shape the first 30 days like a production pilot
A useful pilot should not pretend to transform the company in a month. It should prove one workflow can run safely with measurable value.
Week one maps the workflow and defines the scorecard: cycle time saved, error rate, escalation rate, cost per run, and user satisfaction. Week two builds the agent with read-only or draft-only access. Week three runs it beside the human process and compares output. Week four grants narrow write access only where the evidence supports it, with a kill switch and audit log in place.
That cadence keeps the pilot honest. The question at the end is not “did the demo impress everyone?” It is “did this workflow get faster, cheaper, cleaner, or more reliable without hiding new risk?”
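The week-four decision can be written down as a gate rather than left to a meeting. The sketch below is one way to do that under stated assumptions: the `Scorecard` fields mirror the metrics named above (user satisfaction is left out because it is usually gathered by survey), and the thresholds are placeholders that each team would set for itself.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    DRAFT_ONLY = "draft_only"      # weeks two and three: read-only or draft-only
    NARROW_WRITE = "narrow_write"  # week four: scoped write access


@dataclass(frozen=True)
class Scorecard:
    """Pilot metrics collected while the agent runs beside the human process."""
    runs: int
    errors: int
    escalations: int
    minutes_saved: float
    cost: float  # total spend for the pilot runs

    @property
    def error_rate(self) -> float:
        # With no runs there is no evidence, so treat the rate as worst case.
        return self.errors / self.runs if self.runs else 1.0

    @property
    def escalation_rate(self) -> float:
        return self.escalations / self.runs if self.runs else 1.0

    @property
    def cost_per_run(self) -> float:
        return self.cost / self.runs if self.runs else 0.0


def decide_access(
    card: Scorecard,
    kill_switch_ready: bool,
    audit_log_enabled: bool,
    min_runs: int = 100,                # assumed threshold
    max_error_rate: float = 0.02,       # assumed threshold
    max_escalation_rate: float = 0.15,  # assumed threshold
) -> Access:
    """Grant narrow write access only when controls exist and the evidence supports it."""
    controls_in_place = kill_switch_ready and audit_log_enabled
    evidence_supports = (
        card.runs >= min_runs
        and card.error_rate <= max_error_rate
        and card.escalation_rate <= max_escalation_rate
    )
    return Access.NARROW_WRITE if controls_in_place and evidence_supports else Access.DRAFT_ONLY
```

The point of the design is that the default answer is draft-only. Write access has to be earned by both the numbers and the controls, and a missing audit log blocks promotion no matter how good the numbers look.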
What failure usually looks like
The failures are boring and predictable. The agent cannot reach the source system. Nobody owns the exception queue. The process was never actually standardized. The prompt changed but nobody ran evals. A manager wants full autonomy before the team has audit logs. A vendor demo skipped the messy handoff where the real work happens.
We covered the production side of this in “From Demo to Durable.” The short version: agents are software. They need authentication, observability, idempotency, rollback, and evals before they deserve real authority.
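Two of those requirements, idempotency and an audit trail, are small enough to sketch. The example below is an in-memory illustration only: `IdempotentExecutor` and the `create_task` handler are hypothetical names, and a production version would persist keys and log entries in a durable store and sit behind real authentication.

```python
import hashlib
import json
from typing import Any, Callable


class IdempotentExecutor:
    """Runs each distinct action at most once and records every attempt."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}
        self.audit_log: list[dict[str, Any]] = []

    @staticmethod
    def key_for(action: str, payload: dict[str, Any]) -> str:
        # Canonical JSON so the same action and payload always hash to the same key.
        canonical = json.dumps({"action": action, "payload": payload}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(
        self,
        action: str,
        payload: dict[str, Any],
        handler: Callable[[dict[str, Any]], Any],
    ) -> Any:
        key = self.key_for(action, payload)
        if key in self._results:
            # A retry or duplicate request: return the stored result, do not act again.
            self.audit_log.append({"key": key, "action": action, "status": "skipped_duplicate"})
            return self._results[key]
        result = handler(payload)
        self._results[key] = result
        self.audit_log.append({"key": key, "action": action, "status": "executed"})
        return result


# Usage: the agent retries the same request, but only one task is created.
created: list[dict[str, Any]] = []


def create_task(payload: dict[str, Any]) -> str:
    created.append(payload)
    return f"task-{len(created)}"


executor = IdempotentExecutor()
first = executor.run("create_task", {"title": "Review invoice"}, create_task)
second = executor.run("create_task", {"title": "Review invoice"}, create_task)
```

Here the second call returns the stored result instead of creating a duplicate task, and the audit log shows both the execution and the skipped retry.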
Foundation AI helps teams choose the first workflow with discipline, then ships the agent that can survive production. If you have ten possible use cases and want to pick the one that will actually pay back, start here.
