
AI as an Operating Layer.
Let it run the workflow.

Stage Three of Three
2–6 months to fluency

AI stops being something you ask and becomes something that runs. You become the supervisor of a small fleet of agents. Don't try this without Stage 2 maturity. The intuition is the same; the stakes are higher.

A · Landscape

The 2026 toolkit

Agents you can actually deploy
Pick the surface that matches the work
Tool · Best for
  • Claude Code: Terminal-native agent. Writes, runs, ships code. Best for eng/data/repos.
  • Claude in Chrome: Browser agent. Navigates web apps, fills forms, takes actions in SaaS.
  • Cowork: Desktop agent for non-devs. Manages files and tasks across apps.
  • ChatGPT Agent: Browser actions, scheduled runs, deep research over 5–30 min.
  • Gemini Agents: Acts inside Gmail, Docs, Sheets. Pulls Drive context natively.
The Playbook
Build & pilot in 5 steps
01 · PICK NARROW: One workflow, <10 steps. Runs 5+ times a week. Owner can describe it in 1 page. Low stakes if it goes wrong.
02 · MAP THE STEPS: Shadow the human. Document every input, output, judgment call. Note what's automatable vs. what's not.
03 · SCOPE TIGHT: Permissions matter. Read-only first. Write actions need explicit approval. Dedicated credentials, never admin.
04 · PILOT 2 WEEKS: One owner, daily standup. Measure cycle time, error rate, sentiment. Compare to baseline.
05 · DECIDE: Scale, kill, or harden. Day 14: written decision. Document either way. Share the post-mortem.
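Step 04 asks you to compare the pilot to a baseline. A minimal sketch of that comparison follows, assuming you record one entry per run. The `RunRecord` fields and function names are illustrative, not part of any tool named above, and sentiment is left out because it needs a human survey rather than a log.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One execution of the workflow (field names are hypothetical)."""
    cycle_minutes: float  # time from trigger to delivered output
    had_error: bool       # a reviewer found at least one error in the output


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Average cycle time and share of runs that contained an error."""
    return {
        "avg_cycle_minutes": mean(r.cycle_minutes for r in runs),
        "error_rate": sum(r.had_error for r in runs) / len(runs),
    }


def compare_to_baseline(baseline: list[RunRecord], pilot: list[RunRecord]) -> dict[str, float]:
    """Pilot minus baseline for each metric. Negative numbers mean the pilot improved."""
    b, p = summarize(baseline), summarize(pilot)
    return {key: p[key] - b[key] for key in b}
```

Capture the baseline before the pilot starts (the human doing the work by hand), so the day-14 decision rests on numbers rather than impressions.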
B · Prescriptive

See a real agent. Pilot your first.

14-day commitment
A real workflow, end-to-end
SEE Pre-Meeting Brief Agent

A starter agent that actually works.

Trigger: 60 minutes before any external meeting on the calendar.

What it does:

Workflow:

  1. Read calendar invite. Identify external attendees.
  2. For each, web search for recent news (last 90 days).
  3. Check LinkedIn for role/company changes.
  4. Pull last 3 email threads from inbox.
  5. Synthesize: who, why they matter, talking points, 2 questions to ask.
  6. Deliver as a 1-page brief in Slack DM.

What it can't touch: No outbound communications. No CRM writes. No scheduling. Read-only across the board.
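The trigger and the first workflow step can be sketched in a few lines. This is an illustration under stated assumptions, not how any of the listed tools implement it: `INTERNAL_DOMAIN` is a placeholder for your own email domain, and "external" is assumed to mean "any attendee whose address is on a different domain".

```python
from datetime import datetime, timedelta

INTERNAL_DOMAIN = "example.com"    # hypothetical: replace with your own domain
LEAD_TIME = timedelta(minutes=60)  # the 60-minute trigger described above


def external_attendees(attendee_emails, internal_domain=INTERNAL_DOMAIN):
    """Return the attendees whose email domain differs from the internal one."""
    return [
        email for email in attendee_emails
        if email.lower().rsplit("@", 1)[-1] != internal_domain
    ]


def should_run(now, meeting_start, attendee_emails,
               internal_domain=INTERNAL_DOMAIN, lead_time=LEAD_TIME):
    """True when the meeting has external attendees and starts within the lead time."""
    if not external_attendees(attendee_emails, internal_domain):
        return False
    return timedelta(0) <= meeting_start - now <= lead_time
```

Nothing here writes anywhere. The check only reads the invite, which matches the read-only scope.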

What it saves: 30 min of manual prep per external meeting. At about 12 external meetings a week, that is 6 hours/week back.

DO Pilot one in 14 days

Your 2-week pilot plan.

  1. Day 1: Pick the workflow. Write the 1-page scope (trigger, inputs, steps, outputs, what it can't touch).
  2. Days 2-3: Build it. Use Cowork, Claude in Chrome, or ChatGPT Agent. Start read-only.
  3. Days 4-10: Run it daily. Review every output. Note errors, log fixes.
  4. Days 11-13: Tighten. Add the write actions you trust. Keep the rest read-only.
  5. Day 14: Decide. Scale to the team, kill it, or harden for production. Write the post-mortem either way.

The point isn't the agent. The point is the muscle of piloting one. Once you've done one, the next ten are 10x faster.

C · Recipes

6 workflows worth piloting

Ranked from easiest to hardest. Start at the top. Don't skip ahead until each one is real.

Easy · Read-only

Pre-Meeting Brief

SCOPE Trigger: 60 min before external meetings. Pull attendee context (news, LinkedIn, email history). Deliver 1-page brief via Slack DM. Read-only, no outbound actions.
ROI
6 hours/week back. Better-prepared meetings.
Easy · Read-only

Competitive Intel

SCOPE Weekly: scan named competitors' news, blog posts, hiring, pricing pages, podcast appearances. Summarize material moves. Deliver as Monday-morning digest. Read-only.
ROI
Strategic awareness without a full-time analyst.
Medium

Lead Enrichment

SCOPE New CRM leads: enrich with company size, funding, tech stack, role context. Score against ICP. Write enrichment to CRM. Surface top 10 daily.
ROI
Sales focuses on top 10, not all 200.
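Scoring against an ICP and surfacing the top 10 can be as simple as the sketch below. The `Lead` fields, thresholds, and weights are all hypothetical placeholders. Your real ICP definition belongs to sales, not to this example.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    """Enriched lead (field names are illustrative)."""
    name: str
    employees: int
    funding_musd: float       # total funding, in millions of USD
    uses_target_stack: bool   # tech stack matches what you sell into


def icp_score(lead: Lead) -> int:
    """Score 0-100 against a made-up ICP: mid-size, funded, right stack."""
    score = 0
    if 50 <= lead.employees <= 1000:
        score += 40
    if lead.funding_musd >= 10:
        score += 30
    if lead.uses_target_stack:
        score += 30
    return score


def top_leads(leads: list[Lead], n: int = 10) -> list[Lead]:
    """Highest-scoring leads first, limited to n."""
    return sorted(leads, key=icp_score, reverse=True)[:n]
```

Keeping the score a plain, readable function makes it easy for the owner to audit why a lead made the daily list.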
Medium

Month-End Close Prep

SCOPE Day -3 (three days before close): pull subledgers, run reconciliation checks, flag variances, draft month-end narrative. Human approves before close. No journal entries.
ROI
Close cycle 2 days faster. Fewer Friday-night closes.
Hard

Customer Health Monitor

SCOPE Daily scan of usage, support tickets, NPS, exec communication. Flag accounts trending down. Draft proactive outreach (human sends). Never sends directly.
ROI
Catches churn risks 2-3 weeks earlier.
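"Trending down" needs a definition before an agent can flag it. One possible definition, sketched here for the usage signal only, compares the most recent weeks against the weeks before them. The window size and the 20% threshold are assumptions to tune, and tickets, NPS, and exec communication would each need their own rule.

```python
from statistics import mean


def trending_down(weekly_usage, window=4, drop_threshold=0.2):
    """True when average usage over the last `window` weeks fell by at least
    `drop_threshold` (as a fraction) versus the `window` weeks before that."""
    if len(weekly_usage) < 2 * window:
        return False  # not enough history to compare
    prior = mean(weekly_usage[-2 * window:-window])
    recent = mean(weekly_usage[-window:])
    if prior == 0:
        return False  # no prior usage, nothing to decline from
    return (prior - recent) / prior >= drop_threshold
```

The function only flags. Drafting outreach is a separate step, and sending stays with a human.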
Hard

Contract Review

SCOPE Incoming MSAs/NDAs: compare to our playbook. Flag non-standard clauses. Score risk. Draft redlines. Human approves all changes. Legal owns final review.
ROI
Cycle time on standard contracts: days to hours.
D · Discipline

Watch-outs & what stays human

Watch-outs
Where pilots go wrong

The trap: Agent works well. Team wants it to do more. You expand scope without re-scoping permissions.

The fix: Every new capability requires explicit permission expansion, signed off by the owner. Dedicated service account. Never admin. Audit log on.
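A minimal sketch of that rule in code: reads are allowed from the start, and a write action works only after the owner has approved it by name. The `Scope` class and the action names are hypothetical. A real deployment would enforce this in the credentials themselves, not only in application code.

```python
class PermissionDenied(Exception):
    """Raised when the agent attempts something outside its approved scope."""


class Scope:
    def __init__(self, owner, read_actions):
        self.owner = owner
        self.read_actions = set(read_actions)
        self.approved_writes = {}  # action name -> who approved it

    def approve_write(self, action, approver):
        """Expand scope by one write action. Only the owner may sign off."""
        if approver != self.owner:
            raise PermissionDenied(f"{approver!r} is not the owner of this agent")
        self.approved_writes[action] = approver

    def check(self, action):
        """Return quietly if the action is in scope; raise otherwise."""
        if action in self.read_actions or action in self.approved_writes:
            return
        raise PermissionDenied(f"{action!r} is outside the approved scope")
```

Call `scope.check(...)` before every action the agent takes, so a scope expansion is always a recorded decision and never a side effect.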

The trap: Agent ran. Output happened. No one knows what it touched.

The fix: Log every action, every input, every output. Daily review for the first month. Weekly thereafter.
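One low-effort way to get there is an append-only log with one JSON object per line. This is a sketch, assuming the inputs and outputs are JSON-serializable. The field names and the `agent_audit.jsonl` filename in the note below are illustrative.

```python
import json
from datetime import datetime, timezone


def log_action(stream, action, inputs, output):
    """Append one audit record to `stream` as a single JSON line."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "inputs": inputs,
        "output": output,
    }
    stream.write(json.dumps(entry) + "\n")
    return entry
```

In practice you would open the file in append mode, for example `open("agent_audit.jsonl", "a")`, and pass it as `stream`. The daily review then becomes a matter of reading that day's lines.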

The trap: Agent fails partially. Output looks plausible. No alarm goes off.

The fix: Build explicit success criteria into every run. If the criteria are not met, the agent fails loudly. Human review for the first 30 days regardless.
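For the Pre-Meeting Brief, "explicit success criteria" could look like the check below. It is a sketch that assumes the brief arrives as a dictionary with the four sections the workflow promises. The key names are hypothetical.

```python
class RunFailed(Exception):
    """Raised when a run does not meet its success criteria."""


REQUIRED_SECTIONS = ("who", "why_they_matter", "talking_points", "questions")


def check_brief(brief: dict) -> None:
    """Raise RunFailed listing every unmet criterion; return None if all pass."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if not brief.get(section):
            problems.append(f"missing or empty section: {section}")
    if len(brief.get("questions") or []) < 2:
        problems.append("fewer than 2 questions to ask")
    if problems:
        raise RunFailed("; ".join(problems))
```

Raising an exception instead of delivering a half-finished brief is the point: a plausible-looking partial output is the failure mode that nobody catches.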

The trap: Worked in month 1. Quality slipped by month 4. No one noticed.

The fix: Quarterly recalibration. Sample outputs. Compare to baseline. The teammate from Stage 2 needed Friday iteration; the agent needs quarterly review.
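The quarterly review can be sketched as two small steps: draw a random sample of outputs for a human to grade, then compare the grades to the baseline. The 0-to-1 quality scores, the sample size of 20, and the 0.1 tolerance are all assumptions to adjust.

```python
import random
from statistics import mean


def sample_outputs(outputs, k=20, seed=None):
    """Pick up to k outputs at random for human grading."""
    rng = random.Random(seed)
    return rng.sample(outputs, min(k, len(outputs)))


def has_drifted(baseline_scores, current_scores, tolerance=0.1):
    """True when the current average quality is more than `tolerance`
    below the baseline average."""
    return mean(current_scores) < mean(baseline_scores) - tolerance
```

Random sampling matters here. Reviewing only the outputs someone complained about overstates the problem, and reviewing only the recent ones hides slow decline.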

Non-negotiable
What stays human
Always. No exceptions, no pilots.
  • Pricing: any change to what you charge
  • People: hiring, firing, comp, performance reviews
  • Legal: final contracts, settlements, regulatory filings
  • Customer escalations: anything labeled "urgent"
  • Public statements: press, social, investor relations
  • Financial transactions: anything that moves money
  • Irreversible deletes: data, accounts, records
  • Security responses: any incident, any breach

The principle: Agents handle the boring, repeated, low-stakes work. Humans handle the consequential, novel, or irreversible. Don't blur the line because the agent is "ready."

E · Self-check

Are you ready for Stage 3?

Answer honestly. If you can't say yes to most of these, go back to Stage 2 and build a few more teammates first.

If no: build more teammates first. The intuition for what AI can and can't do reliably comes from doing Stage 2 work. Skipping it is the #1 reason pilots fail.

If no: agents are bad investments for one-off work. The ROI comes from frequency. If the workflow doesn't recur, a Stage 2 teammate is the right answer.

If no: don't start. Pilots without owners drift. Owners without time produce theater, not learning.

If no: write it down before you build. The list of "never" is more important than the list of "can." Put it in the spec. Reference it in the audit.

If no: pick a different workflow. Pilots should be embarrassing if they fail, not catastrophic. Build the muscle on something boring before you try something consequential.

If no: don't start. Pilots that run indefinitely are zombie projects. The decision discipline is half the value of the exercise.

Do this

Try this quarter

  1. Find your 4-hour workflow. One thing the team does 5+ times a week that costs at least 4 hours total. Boring beats glamorous.
  2. Build a 2-week pilot. Read-only first. One owner. Daily standup. Written success criteria.
  3. Decide on day 14. Scale, kill, or harden. Write the post-mortem. Tell your peers what you learned.
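The screen in step 1 is simple arithmetic: runs per week times minutes per run, converted to hours. A small sketch, with a hypothetical `Workflow` record and the two thresholds from the list above as defaults:

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """A candidate workflow (field names are illustrative)."""
    name: str
    runs_per_week: int
    minutes_per_run: float

    @property
    def hours_per_week(self) -> float:
        return self.runs_per_week * self.minutes_per_run / 60


def qualifies(wf: Workflow, min_runs: int = 5, min_hours: float = 4.0) -> bool:
    """True when the workflow runs often enough and costs enough time to pilot."""
    return wf.runs_per_week >= min_runs and wf.hours_per_week >= min_hours
```

For example, a brief prepared 12 times a week at 30 minutes each comes to 6 hours and qualifies. A 5-hour report done once a week does not, because the frequency is missing.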
Bonus reference

The Shortcuts

Keyboard moves, prompt patterns, and the small techniques that make every stage faster. Worth bookmarking for reference.
