
TECHNICAL WHITE PAPER

Marshal Agents

A technical white paper describing the design of Marshal Agents that autonomously execute defined SMB workflows across a customer's own tools, governed by human approvals, gateways, and audit-ready control.

Audience: Founder-led SMBs
Scope: Managed AI Ops
Design center: Existing stack, human judgment

01

Introduction: From chat to work

The first useful question is not whether AI can write about work. It is whether it can execute a bounded business workflow and prove what happened.

The first wave of business AI taught everyone to ask questions. The next wave executes work. For SMBs, that means noticing a new lead, enriching the record, classifying urgency, drafting the right response, routing it to the right human, updating the CRM, and leaving an audit trail.

Marshal Agents are workflow-level systems for founder-led businesses that already run on CRM, email, calendar, Slack, spreadsheets, project management tools, billing systems, and a heroic amount of copy-paste.

  • Production lines: 3. Lead capture, revenue generation, and operational throughput.
  • Agent types: 9. Specialized agents mapped to high-friction SMB workflows.
  • Deployment target: 6-12 days. A narrow first workflow shipped before the system expands.

02

What a Marshal Agent is

A Marshal Agent is a managed AI workflow executor that converts a defined business goal into controlled tool calls, approvals, and recorded outcomes.

  1. Goal: A specific job to be done, such as responding to a qualified inbound lead inside five minutes.
  2. Context: Relevant data from the customer's tools, permissions, business rules, history, and current state.
  3. Workflow: A constrained process map with branching, looping, routing, and stopping conditions.
  4. Tools: Read, write, analysis, communication, scheduling, enrichment, and notification interfaces.
  5. Control: Policy checks, approval gates, exception queues, audit logs, monitoring, and kill switches.
  6. Learning: Evaluation, feedback, corrections, and workflow improvements over repeated runs.

The best agent architectures stay simple, composable, transparent, and measurable before they get clever.

03

The SMB design center

Small businesses do not need an agent zoo. They need recurring operational load removed from real teams without replacing the stack.

Marshal starts with narrow workflows, clear owners, measurable outcomes, and existing tools. The goal is not to replace the business system. The goal is to make the current system behave like someone competent is watching it all day.

Existing stack first

No rip-and-replace. The agent works across the CRM, inbox, calendar, project tools, docs, and finance systems already in use.

Human judgment stays human

Approvals, exceptions, and high-impact decisions route to people before action.

Workflow before autonomy

Agents execute defined process maps and adapt only inside approved boundaries.

Measured in outcomes

The useful metrics are response time, meetings booked, records updated, cycle time reduced, and human review rate.

04

The Agent Factory architecture

Marshal's architecture is organized around the workflows founder-led teams feel first: inbound demand, outbound revenue work, and operational handoff after the sale.

Lead Capture System

Make inbound demand survivable

Speed-to-Lead, Qualification & Routing, and Booking & Follow-Up.

Revenue Generation System

Turn prospecting into a system

Prospect Identification, Account & Personalization, and Outbound Execution Support.

Operational Throughput System

Stop using humans as API glue

Client Intake & Onboarding, Data Sync & Admin Relay, and Reporting & Decision Support.

05

Reference architecture

Marshal Agents sit above the customer tools and below the human decision layer. The agent is not the system of record. It is the controlled execution layer.

  1. Human owners define accountability.
  2. Approval gates review, approve, reject, or edit sensitive actions.
  3. Audit console records runs, traces, exceptions, and metrics.
  4. Execution engine routes triggers, builds context, plans workflows, checks policy, executes tools, and evaluates outcomes.
  5. Customer tools remain systems of record: CRM, inbox, calendar, comms, docs, and operations systems.

Every side effect is governed by scope, credential permissions, workflow rules, and human approval requirements. This is how an agent gets useful without becoming an unsupervised intern with API keys.

06

Execution engine

A Marshal run is a bounded transaction: detect, plan, check, act, verify, log, and learn.

  1. Trigger: event, schedule, manual request, webhook, inbox pattern, or CRM state change.
  2. Assess: classify workflow, owner, priority, confidence, and safety.
  3. Retrieve: pull approved context while respecting access and business rules.
  4. Plan: generate a step plan from the workflow spec, including branches and gates.
  5. Execute: call tools with fixed schemas and record inputs, outputs, and decisions.
  6. Gate: pause for human review when required by policy, confidence, or impact.
  7. Commit: write approved changes with idempotent operations.
  8. Evaluate: score outcome, capture feedback, flag drift, and update operating memory.
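
The eight steps above can be read as a small resumable state machine. The sketch below is one minimal way to express that in Python. The names `Run`, `execute_run`, and the `needs_gate` callback are hypothetical, chosen for illustration. This is not Marshal's engine, only a picture of how a run might pause at the gate and resume without repeating earlier steps.

```python
from dataclasses import dataclass, field
from typing import Callable

# Step names mirror the list above; everything else is a hypothetical illustration.
STEPS = ["trigger", "assess", "retrieve", "plan", "execute", "gate", "commit", "evaluate"]

@dataclass
class Run:
    run_id: str
    log: list[str] = field(default_factory=list)  # ordered record of completed steps
    status: str = "running"                       # running | awaiting_approval | done

def execute_run(run: Run, needs_gate: Callable[[Run], bool], approved: bool = False) -> Run:
    """Advance a run through the steps, pausing at the gate when review is required."""
    for step in STEPS:
        if step in run.log:
            continue  # completed in an earlier call, so resuming does not repeat work
        if step == "gate" and needs_gate(run) and not approved:
            run.status = "awaiting_approval"
            return run
        run.log.append(step)
    run.status = "done"
    return run
```

Calling `execute_run` once with a gate that fires leaves the run in `awaiting_approval`. Calling it again with `approved=True` finishes the remaining steps.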

07

Workflow specifications

A workflow specification is the executable contract between the business process and the model.

A workflow spec names the trigger, required context, allowed tools, branching rules, approval gates, side effects, retry policy, exception routes, and success metrics. This turns a vague instruction like "handle inbound leads" into a process that can be tested, monitored, and improved.

Example workflow spec
workflow_id: lead_capture.speed_to_lead.v2
owner: revenue_ops
trigger: inbound.demo_request.created
read_scopes:
  - crm.leads.read
  - enrichment.company.read
  - inbox.thread.read
write_scopes:
  - crm.leads.update
  - slack.notify
  - calendar.link.create
human_gates:
  - external_message.send when confidence < 0.92
  - route_change when account_value = high
idempotency_key: lead_id + workflow_version
success_metrics:
  - first_response_time
  - meeting_booked
  - approval_rate
  - exception_rate

Prompt chaining, routing, parallel checks, evaluator loops, and orchestrator-worker patterns are useful because they make complex tasks inspectable. They also create places to put gates.
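
The `human_gates` and `idempotency_key` lines in the example spec can be made concrete with a few lines of Python. The sketch below hand-translates the two gate rules into predicates and derives a stable key from `lead_id` and `workflow_version`. The function names and the choice of SHA-256 hashing are assumptions for illustration. The spec itself only says the key combines those two fields.

```python
import hashlib

# Gate rules from the example spec, rewritten by hand as predicates over run facts.
HUMAN_GATES = {
    "external_message.send": lambda facts: facts["confidence"] < 0.92,
    "route_change": lambda facts: facts["account_value"] == "high",
}

def requires_human(action: str, facts: dict) -> bool:
    """True when the spec's gate rule for this action fires; ungated actions pass."""
    rule = HUMAN_GATES.get(action)
    return bool(rule and rule(facts))

def idempotency_key(lead_id: str, workflow_version: str) -> str:
    """Stable key from lead_id + workflow_version (hashing is an assumed detail)."""
    raw = f"{lead_id}:{workflow_version}".encode()
    return hashlib.sha256(raw).hexdigest()[:16]
```

Because the key depends only on the lead and the workflow version, a retried run for the same lead produces the same key and can be recognized as a duplicate.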

08

Tools and connectors

Models reason. Tools do. The tool layer turns language into controlled interaction with the customer's stack.

Read tools

Retrieve CRM records, inbox threads, calendar state, form submissions, documents, tickets, tasks, call notes, spreadsheets, and billing data.

Write tools

Create or update CRM objects, draft messages, assign tasks, create project records, notify Slack, schedule meetings, and move workflow state.

Analysis tools

Score fit, dedupe records, summarize threads, compare fields, generate KPI tables, calculate deltas, and produce decision briefs.

Gateway tools

Request approval, route exceptions, lock a record, log a run, issue rollback instructions, and escalate to a human owner.

Tool contract

  • Name and purpose
  • Input schema
  • Output schema
  • Allowed scopes
  • Credential owner
  • Side-effect level
  • Timeout and retries
  • Idempotency key
  • Rollback path
  • Audit fields
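
One way to keep the contract fields above honest is to encode them as a typed record that can be checked before a tool is registered. The sketch below is an assumption about shape, not Marshal's schema. The `SideEffect` levels and the `problems` check are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class SideEffect(Enum):
    NONE = "none"                  # pure read
    REVERSIBLE = "reversible"      # write that can be undone
    IRREVERSIBLE = "irreversible"  # e.g. an external message

@dataclass(frozen=True)
class ToolContract:
    name: str
    purpose: str
    input_schema: dict
    output_schema: dict
    allowed_scopes: frozenset
    credential_owner: str
    side_effect: SideEffect
    timeout_s: float
    max_retries: int
    idempotency_key_fields: tuple
    rollback_path: str
    audit_fields: tuple

    def problems(self) -> list[str]:
        """List contract gaps; tools with side effects need a key and a rollback path."""
        issues = []
        if self.side_effect is not SideEffect.NONE:
            if not self.idempotency_key_fields:
                issues.append("write tool needs an idempotency key")
            if not self.rollback_path:
                issues.append("write tool needs a rollback path")
        if self.timeout_s <= 0:
            issues.append("timeout must be positive")
        return issues
```

A registry could refuse any tool whose `problems()` list is non-empty, which turns the checklist into an enforced precondition.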

09

Context, memory, and business graph

SMBs need their CRM, inbox, docs, calendar, and project tools to stop acting like divorced parents.

Marshal builds task-specific context from the customer's existing systems. The agent retrieves only what the workflow needs: account history, recent messages, deal stage, meeting availability, onboarding checklist, project ownership, support state, and business rules attached to the workflow.

Memory is structured feedback from runs: approved drafts, rejected routes, recurring exceptions, owner preferences, tool failures, and performance deltas. Retrieval-augmented generation keeps decisions grounded in current customer data instead of model folklore.

  1. Workflow run
  2. Person
  3. Account
  4. Deal
  5. Project
  6. Message
  7. Document
  8. Task
  9. Metric

10

Governance and human approvals

Marshal governance is built around three questions: what can the agent see, what can it do, and when must a human sign off?

Permissions

Inherited access, least privilege, credential scopes, and owner-controlled revocation.

Policy gateway

Checks workflow scope, tool scope, user access, data class, and action class before execution.

Human approval

Required for external messages, irreversible writes, high-value routing, ambiguous records, or low confidence.

Exception queue

Routes anomalies to the responsible person with context, recommendation, and next action.
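
The policy gateway described above can be sketched as a single function that runs before any tool call. The field names, the three decision values, and the specific rules below are assumptions for illustration. A real gateway would load its rules from the workflow spec rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    workflow_scopes: frozenset  # scopes granted by the workflow spec
    user_scopes: frozenset      # scopes the credential owner actually holds
    tool_scope: str             # scope this tool call needs
    data_class: str             # e.g. "public", "internal", "sensitive"
    action_class: str           # e.g. "read", "write", "external_message"

def check(req: Request) -> tuple[str, str]:
    """Return (decision, reason), where decision is deny, needs_approval, or allow."""
    if req.tool_scope not in req.workflow_scopes:
        return "deny", "tool scope outside workflow scope"
    if req.tool_scope not in req.user_scopes:
        return "deny", "credential owner lacks this scope"
    if req.data_class == "sensitive" or req.action_class == "external_message":
        return "needs_approval", "sensitive data or external message"
    return "allow", "within scope"
```

Denials come first so that an out-of-scope request never reaches a human approver. Approval is reserved for actions that are permitted but risky.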

Autonomy ladder

  • 0 Observe only
  • 1 Draft for review
  • 2 Execute after approval
  • 3 Execute low-risk actions
  • 4 Execute with sampling review
  • 5 Full autonomy inside scope
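
The ladder maps naturally onto an ordered enumeration. The sketch below assumes one reading of the levels: 0 through 2 never act without approval, 3 acts alone only on low-risk actions, and 4 and 5 act alone inside scope. The names and the `low_risk` flag are hypothetical.

```python
from enum import IntEnum

class Autonomy(IntEnum):
    OBSERVE = 0
    DRAFT = 1
    EXECUTE_AFTER_APPROVAL = 2
    EXECUTE_LOW_RISK = 3
    SAMPLING_REVIEW = 4
    FULL_IN_SCOPE = 5

def may_act_without_approval(level: Autonomy, low_risk: bool) -> bool:
    """Decide whether an in-scope action can run before a human signs off."""
    if level <= Autonomy.EXECUTE_AFTER_APPROVAL:
        return False
    if level == Autonomy.EXECUTE_LOW_RISK:
        return low_risk
    return True  # levels 4 and 5: review is sampled after the fact or bounded by scope
```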

11

Gates and gateways

A gate is a deliberate pause. A gateway is an enforceable control. Marshal uses both.

Human control points for agent side effects

| Control | When it triggers | Human action | System record |
| --- | --- | --- | --- |
| Draft approval | External message, proposal note, client recap, or outbound sequence. | Approve, edit, reject, or request rewrite. | Prompt, source context, draft, reviewer, and final text. |
| Write approval | CRM stage movement, task creation, owner change, finance sync, or data overwrite. | Approve or reject write. | Old value, new value, tool call, and idempotency key. |
| Risk gateway | Low confidence, missing field, policy conflict, high-value account, or sensitive data. | Resolve exception. | Reason code, confidence, and recommended next step. |
| Commit gateway | All required approvals are satisfied. | No manual action unless policy requires it. | Final write, timestamp, result, and rollback note. |

The pattern is simple: let the machine handle repetition, search, drafting, routing, and synchronization. Keep humans on judgment, exceptions, high-impact changes, and final accountability.

12

The nine agent types

The Agent Factory turns three production lines into nine concrete agent types with known triggers, reads, writes, and gates.

Lead Capture

Speed-to-Lead

Watches forms, chat, demo requests, inbound emails, or partner referrals. It enriches the lead, checks fit, drafts or sends a response, notifies the owner, creates CRM activity, and prepares meeting context.

Lead Capture

Qualification & Routing

Scores fit and intent, applies routing rules, detects ownership conflicts, assigns the next human, and packages the handoff.

Lead Capture

Booking & Follow-Up

Coordinates scheduling, reminders, reschedules, no-show recovery, and post-booking CRM hygiene.

Revenue Generation

Prospect Identification

Builds candidate lists from ICP rules, enriches contacts and companies, suppresses existing records, and creates reviewable prospect pools.

Revenue Generation

Account & Personalization

Turns accounts into briefs: firmographic context, trigger events, likely pain, relationships, and draft personalization hooks.

Revenue Generation

Outbound Execution Support

Drafts email and LinkedIn copy, tracks touches, categorizes replies, logs actions, and escalates interest, objections, and meeting-ready responses.

Operational Throughput

Client Intake & Onboarding

Collects intake data, checks completeness, creates onboarding tasks, prepares internal handoff, and tracks missing assets.

Operational Throughput

Data Sync & Admin Relay

Keeps records aligned across CRM, sheets, project tools, finance tools, and dashboards.

Operational Throughput

Reporting & Decision Support

Assembles KPI rollups, trend reads, variance explanations, client recap drafts, and decision briefs.

13

Agent-to-agent orchestration

The goal is not one giant genius agent. Marshal composes specialized agents with clear ownership.

Sub-agents allow specialization. A lead routing agent should not also invent onboarding checklists and reconcile finance fields. Small agents with explicit contracts are easier to test, monitor, replace, and improve.

  1. Event
  2. Orchestrator
  3. Lead Agent
  4. Revenue Agent
  5. Ops Agent
  6. Reporting Agent
  7. Human owner
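
The flow above, from event through orchestrator to a specialized agent or a human owner, can be sketched as a routing table. The agent functions below are hypothetical stand-ins, and the event names other than `inbound.demo_request.created` are invented for illustration.

```python
from typing import Callable

Handler = Callable[[dict], str]

def lead_agent(event: dict) -> str:
    return f"lead agent handled {event['type']}"

def ops_agent(event: dict) -> str:
    return f"ops agent handled {event['type']}"

def reporting_agent(event: dict) -> str:
    return f"reporting agent handled {event['type']}"

# Each event type has exactly one owning agent; hypothetical names except the first.
ROUTES: dict[str, Handler] = {
    "inbound.demo_request.created": lead_agent,
    "deal.closed_won": ops_agent,
    "report.weekly_due": reporting_agent,
}

def orchestrate(event: dict) -> str:
    """Dispatch to the owning agent, or escalate when no agent owns the event."""
    handler = ROUTES.get(event["type"])
    if handler is None:
        return f"escalated to human owner: no agent owns {event['type']}"
    return handler(event)
```

Keeping ownership in one table means an unknown event lands with a person instead of being guessed at by whichever agent happens to be running.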

14

Side effects and safe writes

Reading is reversible. Writing is where the bill comes due.

  1. Draft
  2. Propose
  3. Approve
  4. Commit
  5. Verify
  6. Compensate
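
The six steps above can be sketched against an in-memory record. The sketch assumes a dict-like `store` standing in for a CRM record, a `committed` set standing in for durable idempotency state, and an `approve` callback standing in for the human gate. All of these names are hypothetical.

```python
from typing import Callable

def safe_write(store: dict, committed: set, key: str, field: str,
               new_value, approve: Callable[[dict], bool]) -> str:
    """Propose a change, commit it once if approved, verify it, compensate on mismatch."""
    if key in committed:
        return "skipped: already committed"      # a retry with the same key is a no-op
    old_value = store.get(field)
    proposal = {"field": field, "old": old_value, "new": new_value}  # draft + propose
    if not approve(proposal):                    # approve
        return "rejected"
    store[field] = new_value                     # commit
    committed.add(key)
    if store.get(field) != new_value:            # verify by reading the value back
        store[field] = old_value                 # compensate: restore the old value
        committed.discard(key)
        return "compensated"
    return "committed"
```

The proposal carries both the old and the new value, which is what lets the compensate step restore state and gives the reviewer something concrete to approve.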

15

Model routing and LLM choice

Model choice is an execution decision, not a personality quiz.

Model routing by workflow step

| Workflow step | Model route | Purpose |
| --- | --- | --- |
| Classification | Fast model | Cheap, low-latency, repeatable labels. |
| Retrieval synthesis | General model | Summarize grounded context with citations and constraints. |
| Complex reasoning | Strong model | Plan branches, resolve ambiguity, and draft high-stakes text. |
| Evaluation | Judge model | Score correctness, completeness, grounding, and policy adherence. |

Marshal designs for model portability. Workflows should not be fused to one provider, one prompt style, or one benchmark moment. Models change. Business processes should not panic every time a leaderboard updates.
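
Model portability is easier when workflows name a tier and only one table names a provider. The sketch below assumes that two-level mapping. The model identifiers are placeholders, not real model names.

```python
# Workflow steps map to tiers, following the routing table above.
STEP_TIER = {
    "classification": "fast",
    "retrieval_synthesis": "general",
    "complex_reasoning": "strong",
    "evaluation": "judge",
}

# Provider-specific names live in one place (placeholder identifiers).
MODEL_NAMES = {
    "fast": "provider-a/small",
    "general": "provider-a/medium",
    "strong": "provider-a/large",
    "judge": "provider-a/judge",
}

def route(step: str, model_names: dict = MODEL_NAMES) -> str:
    """Resolve a workflow step to a concrete model name via its tier."""
    if step not in STEP_TIER:
        raise ValueError(f"no model route for step: {step}")
    return model_names[STEP_TIER[step]]
```

Swapping a provider then means editing `MODEL_NAMES`, or passing a different mapping, while workflow specs stay untouched.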

16

Evaluation

An agent can sound competent while quietly taking the wrong step. Evaluate the work, not the vibes.

  1. Golden tasks
  2. Sandbox runs
  3. Step graders
  4. Outcome graders
  5. Regression suite
  6. Production monitoring

Metrics to inspect

  • Tool selection accuracy
  • Approval rate
  • Exception rate
  • Latency
  • Cost per run
  • First response time
  • Meeting booked rate
  • Record sync accuracy
  • Human edit distance
  • Rollback rate
  • Source coverage
  • Grounding score
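
A few of these metrics can be computed directly from run records. The sketch below assumes each run is a dict with `decision` (or `None` when no gate fired), `exception`, and `rolled_back` fields, and it defines approval rate as the share of gated runs that were approved. Both are assumptions. The edit-distance proxy uses the standard library's `difflib.SequenceMatcher` similarity ratio rather than a true Levenshtein distance.

```python
from difflib import SequenceMatcher

def edit_distance_ratio(draft: str, final: str) -> float:
    """0.0 means the reviewer changed nothing; 1.0 means nothing survived."""
    return 1.0 - SequenceMatcher(None, draft, final).ratio()

def summarize(runs: list[dict]) -> dict:
    """Aggregate approval, exception, and rollback rates over a batch of runs."""
    n = len(runs)
    gated = [r for r in runs if r["decision"] is not None]
    approved = [r for r in gated if r["decision"] in ("approved", "approved_with_edits")]
    return {
        "approval_rate": len(approved) / len(gated) if gated else None,
        "exception_rate": sum(bool(r["exception"]) for r in runs) / n if n else None,
        "rollback_rate": sum(bool(r["rolled_back"]) for r in runs) / n if n else None,
    }
```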

17

Observability

When an agent touches business systems, "the model decided" is not an acceptable postmortem.

Example run trace
run_id: 2026-06-06T12:00:33Z-lead-8432
workflow_id: lead_capture.speed_to_lead.v2
trigger: inbound.demo_request.created
context_sources:
  - hubspot.contact: read ok
  - enrichment.company: read ok
  - gmail.thread: no prior thread
plan_steps: [enrich, score, draft, gate, notify, update]
tool_calls:
  - enrich_company: success, 821 ms
  - crm_update: pending approval
human_gate:
  reviewer: sales_owner
  decision: approved_with_edits
outcome:
  first_response_time: 00:03:41
  meeting_status: pending
  eval_score: 0.91

Observability covers traces, metrics, logs, transcripts, tool calls, policy decisions, approvals, errors, and outcome metrics. It lets Marshal find the exact step that failed instead of staring at the final output.
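
Finding "the exact step that failed" is simple once tool calls are recorded as structured data. The sketch below assumes a trace shaped like the `tool_calls` section of the example run, with hypothetical `tool`, `status`, and `ms` keys. The `timeout` entry is an invented failure for illustration.

```python
from typing import Optional

OK_STATUSES = {"success", "pending approval"}  # statuses seen in the example trace

def first_failure(tool_calls: list[dict]) -> Optional[dict]:
    """Return the first tool call whose status is not OK, or None if all are fine."""
    return next((c for c in tool_calls if c["status"] not in OK_STATUSES), None)

trace = [
    {"tool": "enrich_company", "status": "success", "ms": 821},
    {"tool": "crm_update", "status": "timeout", "ms": 30000},  # hypothetical failure
    {"tool": "slack_notify", "status": "success", "ms": 140},
]
```

Here `first_failure(trace)` returns the `crm_update` entry, which points the postmortem at a tool timeout rather than at the final output.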

18

Security and risk management

SMBs need practical controls that map to the actual risk of agents reading and writing across business systems.

Least privilege

Each workflow gets only the read and write scopes it needs. Credentials remain owner-controlled and revocable.

Permission checks

Every request is checked against user access, workflow scope, tool scope, and data class before action.

Prompt injection defense

Untrusted content is treated as data, not instruction. Tool calls are gated by policy.

Output handling

Generated text and structured outputs are validated before downstream use.

Human review

Required for sensitive data, financial actions, public messages, high-value accounts, and low-confidence decisions.

Run controls

Audit logs, alerting, exception queues, rollback notes, and kill switches support managed operation.
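
Output handling, as described above, can be as plain as refusing anything that does not match an expected shape. The sketch below validates a model's JSON output before it reaches a write tool. The field names and the allowed stage values are hypothetical.

```python
import json

ALLOWED_STAGES = {"new", "qualified", "disqualified"}  # hypothetical CRM stages
EXPECTED_FIELDS = {"lead_id", "stage", "confidence"}

def parse_model_output(raw: str) -> dict:
    """Parse and validate model output; raise ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != EXPECTED_FIELDS:
        raise ValueError("unexpected fields")
    if not isinstance(data["lead_id"], str):
        raise ValueError("lead_id must be a string")
    if data["stage"] not in ALLOWED_STAGES:
        raise ValueError("stage not allowed")
    conf = data["confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        raise ValueError("confidence out of range")
    return data
```

Rejecting extra fields matters as much as checking required ones: text injected through an inbound email cannot smuggle a new instruction into a payload whose shape is fixed.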

19

Deployment lifecycle

Marshal ships one workflow first because production systems should earn their territory.

  1. Scope
  2. Map
  3. Build
  4. Test
  5. Pilot
  6. Operate

Fit criteria matter: clear workflow, operational pain, a real growth model, practical tool readiness, measurable outcome, and an internal owner. Without those, the agent becomes theater.

20

SMB use case: Speed-to-Lead

A founder-led services firm gets a demo request from a qualified buyer while the team is in meetings. The agent handles the first response path without pretending it is the sales team.

Inputs

Lead form, CRM, enrichment, routing rules, calendar availability, and response templates.

Human checkpoints

Strategic account, unusual request, uncertain fit, or external message below confidence threshold.

Business metrics

First response time, meeting booked rate, lead acceptance, approval rate, and edit distance.

  1. Form submitted
  2. Enrich company
  3. Check CRM
  4. Score fit
  5. Draft response
  6. Approval gate
  7. Send or queue
  8. Log outcome

21

SMB use case: Client Intake & Onboarding

A new client signs. The handoff from sales to delivery can either be a system or a Slack archaeology expedition.

Inputs

Closed-won deal, agreement, kickoff notes, project template, and intake questionnaire.

Outputs

Project created, onboarding checklist, client request, internal brief, and missing-data queue.

Metrics

Time to kickoff, checklist completeness, missing asset age, owner response time, and exception rate.

  1. Deal closed
  2. Collect context
  3. Create project
  4. Request intake
  5. Check gaps
  6. Approval gate
  7. Notify team
  8. Monitor status

22

What SMBs should measure

A good agent program measures whether work moved, not how many tokens got incinerated in the name of progress.

Outcome and control metrics by system

| System | Primary outcomes | Control metrics | Failure signals |
| --- | --- | --- | --- |
| Lead Capture | First response time, meetings booked, lead acceptance, conversion to opportunity. | Approval rate, edit distance, route accuracy, duplicate rate. | Missed SLA, wrong owner, low-quality response, orphan lead. |
| Revenue Generation | Approved prospect volume, reply rate, positive reply rate, qualified opportunities. | Suppression accuracy, source coverage, personalization quality, bounce rate. | Bad fit list, unsupported claim, sequence fatigue, CRM conflict. |
| Operational Throughput | Cycle time, records synchronized, reports delivered, onboarding completion. | Exception rate, write accuracy, rollback rate, missing-field age. | Conflicting source data, stale dashboard, unapproved write, unowned task. |

Outcome-based operations need business results and control health. If the result improves while control metrics degrade, the system is borrowing risk from the future.
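
The "borrowing risk" condition can be written as a comparison between two reporting periods. The sketch below assumes hypothetical metric names and treats any rise in rollback or exception rate as degradation. A real check would likely use thresholds rather than strict inequality.

```python
def borrowing_risk(prev: dict, curr: dict) -> bool:
    """True when the outcome improved while at least one control metric got worse."""
    outcome_better = curr["meetings_booked"] > prev["meetings_booked"]
    control_worse = (
        curr["rollback_rate"] > prev["rollback_rate"]
        or curr["exception_rate"] > prev["exception_rate"]
    )
    return outcome_better and control_worse
```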

23

Best-practice architecture patterns

The strongest systems make workflows explicit, tools narrow, side effects gated, and autonomy earned.

Start with workflows, not personalities

Define the job, trigger, state, allowed actions, and stop conditions before choosing model behavior.

Keep tools narrow and explicit

Give every tool one job, a fixed schema, boundaries, examples, and failure behavior.

Gate before side effects

Use policy checks and human approvals before writes, public communications, destructive actions, or sensitive decisions.

Use routing over one giant prompt

Specialized sub-agents and model routing reduce cost, latency, and mystery.

Make retries safe

Use idempotency keys and durable run state so failures do not create duplicate actions.

Evaluate trajectories

Judge final outcomes, intermediate tool choices, approval behavior, and full run transcripts.

Instrument everything

Logs, traces, metrics, source context, tool calls, exceptions, and approvals should be visible.

Promote autonomy gradually

Move from observe to draft to approved execution to limited autonomy only after the evidence earns it.

24

Conclusion: The SMB operating layer

The technical answer is not a chatbot, a no-code toy, or enterprise AI cosplay. It is a managed agentic operating layer.

Marshal Agents exist because founder-led companies can build revenue machines and then feed them manual coordination until everyone looks tired. The agent does the repeatable work, surfaces the ambiguous work, and makes the business more legible every time it runs.

  • Ground in the stack: use the tools and records the business already runs on.
  • Constrain the workflow: autonomy lives inside defined process boundaries.
  • Preserve human authority: humans approve, override, and own judgment.
  • Evaluate continuously: every run teaches the system, or it should not have run.
