01
Introduction: From chat to work
The first useful question is not whether AI can write about work. It is whether it can execute a bounded business workflow and prove what happened.
The first wave of business AI taught everyone to ask questions. The next wave executes work. For SMBs, that means noticing a new lead, enriching the record, classifying urgency, drafting the right response, routing it to the right human, updating the CRM, and leaving an audit trail.
Marshal Agents are workflow-level systems for founder-led businesses that already run on CRM, email, calendar, Slack, spreadsheets, project management tools, billing systems, and a heroic amount of copy-paste.
Production lines
03
Lead capture, revenue generation, and operational throughput.
Agent types
09
Specialized agents mapped to high-friction SMB workflows.
Deployment target
6-12 days
A narrow first workflow shipped before the system expands.
02
What a Marshal Agent is
A Marshal Agent is a managed AI workflow executor that converts a defined business goal into controlled tool calls, approvals, and recorded outcomes.
1
Goal
A specific job to be done, such as responding to a qualified inbound lead inside five minutes.
2
Context
Relevant data from the customer's tools, permissions, business rules, history, and current state.
3
Workflow
A constrained process map with branching, looping, routing, and stopping conditions.
4
Tools
Read, write, analysis, communication, scheduling, enrichment, and notification interfaces.
5
Control
Policy checks, approval gates, exception queues, audit logs, monitoring, and kill switches.
6
Learning
Evaluation, feedback, corrections, and workflow improvements over repeated runs.
The best agent architectures stay simple, composable, transparent, and measurable before they get clever.
03
The SMB design center
Small businesses do not need an agent zoo. They need recurring operational load removed from real teams without replacing the stack.
Marshal starts with narrow workflows, clear owners, measurable outcomes, and existing tools. The goal is not to replace the business system. The goal is to make the current system behave like someone competent is watching it all day.
Existing stack first
No rip-and-replace. The agent works across the CRM, inbox, calendar, project tools, docs, and finance systems already in use.
Human judgment stays human
Approvals, exceptions, and high-impact decisions route to people before action.
Workflow before autonomy
Agents execute defined process maps and adapt only inside approved boundaries.
Measured in outcomes
The useful metrics are response time, meetings booked, records updated, cycle time reduced, and human review rate.
04
The Agent Factory architecture
Marshal's architecture is organized around the workflows founder-led teams feel first: inbound demand, outbound revenue work, and operational handoff after the sale.
Lead Capture System
Make inbound demand survivable
Speed-to-Lead, Qualification & Routing, and Booking & Follow-Up.
Revenue Generation System
Turn prospecting into a system
Prospect Identification, Account & Personalization, and Outbound Execution Support.
Operational Throughput System
Stop using humans as API glue
Client Intake & Onboarding, Data Sync & Admin Relay, and Reporting & Decision Support.
05
Reference architecture
Marshal Agents sit above the customer tools and below the human decision layer. The agent is not the system of record. It is the controlled execution layer.
- 01 Human owners define accountability.
- 02 Approval gates review, approve, reject, or edit sensitive actions.
- 03 Audit console records runs, traces, exceptions, and metrics.
- 04 Execution engine routes triggers, builds context, plans workflows, checks policy, executes tools, and evaluates outcomes.
- 05 Customer tools remain systems of record: CRM, inbox, calendar, comms, docs, and operations systems.
Every side effect is governed by scope, credential permissions, workflow rules, and human approval requirements. This is how an agent gets useful without becoming an unsupervised intern with API keys.
06
Execution engine
A Marshal run is a bounded transaction: detect, plan, check, act, verify, log, and learn.
- 01 Trigger: event, schedule, manual request, webhook, inbox pattern, or CRM state change.
- 02 Assess: classify workflow, owner, priority, confidence, and safety.
- 03 Retrieve: pull approved context while respecting access and business rules.
- 04 Plan: generate a step plan from the workflow spec, including branches and gates.
- 05 Execute: call tools with fixed schemas and record inputs, outputs, and decisions.
- 06 Gate: pause for human review when required by policy, confidence, or impact.
- 07 Commit: write approved changes with idempotent operations.
- 08 Evaluate: score outcome, capture feedback, flag drift, and update operating memory.
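The steps above can be sketched as one bounded run. This is a minimal illustration, not Marshal's implementation: `Run`, `execute_run`, and the `needs_gate` and `approve` callables are hypothetical names, and the plan is reduced to a list of step labels.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Run:
    """Hypothetical record of one bounded run; field names are illustrative."""
    workflow_id: str
    trigger: str
    log: List[str] = field(default_factory=list)
    committed: bool = False


def execute_run(
    run: Run,
    plan: List[str],
    needs_gate: Callable[[str], bool],
    approve: Callable[[str], bool],
) -> Run:
    """Walk the plan in order, pausing at gates and logging every decision."""
    run.log.append(f"trigger:{run.trigger}")
    for step in plan:
        if needs_gate(step):
            if not approve(step):
                # A rejected gate stops the run before anything is committed.
                run.log.append(f"rejected:{step}")
                return run
            run.log.append(f"approved:{step}")
        run.log.append(f"executed:{step}")
    run.committed = True
    run.log.append("committed")
    return run


run = execute_run(
    Run("lead_capture.speed_to_lead.v2", "inbound.demo_request.created"),
    plan=["enrich", "score", "draft", "send"],
    needs_gate=lambda step: step == "send",   # only the external send is gated here
    approve=lambda step: True,                # stand-in for a human reviewer
)
print(run.log)
```

Keeping the gate inside the loop means a rejection ends the run with a complete log and no commit, which is the property the "bounded transaction" framing asks for.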
07
Workflow specifications
A workflow specification is the executable contract between the business process and the model.
A workflow spec names the trigger, required context, allowed tools, branching rules, approval gates, side effects, retry policy, exception routes, and success metrics. This turns a vague instruction like "handle inbound leads" into a process that can be tested, monitored, and improved.
workflow_id: lead_capture.speed_to_lead.v2
owner: revenue_ops
trigger: inbound.demo_request.created
read_scopes:
- crm.leads.read
- enrichment.company.read
- inbox.thread.read
write_scopes:
- crm.leads.update
- slack.notify
- calendar.link.create
human_gates:
- external_message.send when confidence < 0.92
- route_change when account_value = high
idempotency_key: lead_id + workflow_version
success_metrics:
- first_response_time
- meeting_booked
- approval_rate
- exception_rate

Prompt chaining, routing, parallel checks, evaluator loops, and orchestrator-worker patterns are useful because they make complex tasks inspectable. They also create places to put gates.
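The two `human_gates` rules in the spec above can be sketched as a small predicate. The 0.92 threshold and the action names come from the spec; `ProposedAction` and `requires_human_gate` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Threshold taken from the spec above; everything else here is illustrative.
CONFIDENCE_THRESHOLD = 0.92


@dataclass(frozen=True)
class ProposedAction:
    kind: str           # e.g. "external_message.send" or "route_change"
    confidence: float   # model confidence in the range 0..1
    account_value: str  # e.g. "low", "medium", "high"


def requires_human_gate(action: ProposedAction) -> bool:
    """Return True when the spec's human_gates require a person to review."""
    if action.kind == "external_message.send":
        return action.confidence < CONFIDENCE_THRESHOLD
    if action.kind == "route_change":
        return action.account_value == "high"
    return False


print(requires_human_gate(ProposedAction("external_message.send", 0.80, "low")))
print(requires_human_gate(ProposedAction("route_change", 0.99, "high")))
```

Expressing gates as data-driven checks like this is what makes the spec testable: each rule can be exercised with fixed inputs before the workflow ever touches a live tool.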
08
Tools and connectors
Models reason. Tools do. The tool layer turns language into controlled interaction with the customer's stack.
Read tools
Retrieve CRM records, inbox threads, calendar state, form submissions, documents, tickets, tasks, call notes, spreadsheets, and billing data.
Write tools
Create or update CRM objects, draft messages, assign tasks, create project records, notify Slack, schedule meetings, and move workflow state.
Analysis tools
Score fit, dedupe records, summarize threads, compare fields, generate KPI tables, calculate deltas, and produce decision briefs.
Gateway tools
Request approval, route exceptions, lock a record, log a run, issue rollback instructions, and escalate to a human owner.
Tool contract
- Name and purpose
- Input schema
- Output schema
- Allowed scopes
- Credential owner
- Side-effect level
- Timeout and retries
- Idempotency key
- Rollback path
- Audit fields
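One way to make the contract fields above concrete is a typed record per tool. This is a sketch under assumptions: the `SideEffect` levels, the `within` helper, and the example values for `crm_update` are hypothetical, since the source lists the fields without naming their values.

```python
from dataclasses import dataclass
from enum import Enum


class SideEffect(Enum):
    # Hypothetical levels; the source names the field but not its values.
    READ = "read"
    DRAFT = "draft"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"


@dataclass(frozen=True)
class ToolContract:
    name: str
    purpose: str
    input_schema: dict
    output_schema: dict
    allowed_scopes: frozenset
    credential_owner: str
    side_effect: SideEffect
    timeout_seconds: float
    max_retries: int
    idempotency_key_fields: tuple
    rollback_path: str
    audit_fields: tuple

    def within(self, workflow_scopes: set) -> bool:
        """True when every scope the tool uses is granted to the workflow."""
        return self.allowed_scopes <= workflow_scopes


crm_update = ToolContract(
    name="crm_update",
    purpose="Update fields on an existing CRM lead",
    input_schema={"lead_id": "string", "fields": "object"},
    output_schema={"lead_id": "string", "updated": "boolean"},
    allowed_scopes=frozenset({"crm.leads.update"}),
    credential_owner="revenue_ops",
    side_effect=SideEffect.WRITE,
    timeout_seconds=10.0,
    max_retries=2,
    idempotency_key_fields=("lead_id", "workflow_version"),
    rollback_path="restore previous field values from the audit log",
    audit_fields=("old_value", "new_value", "run_id"),
)

print(crm_update.within({"crm.leads.read", "crm.leads.update"}))
```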
09
Context, memory, and business graph
SMBs need their CRM, inbox, docs, calendar, and project tools to stop acting like divorced parents.
Marshal builds task-specific context from the customer's existing systems. The agent retrieves only what the workflow needs: account history, recent messages, deal stage, meeting availability, onboarding checklist, project ownership, support state, and business rules attached to the workflow.
Memory is structured feedback from runs: approved drafts, rejected routes, recurring exceptions, owner preferences, tool failures, and performance deltas. Retrieval-augmented generation keeps decisions grounded in current customer data instead of model folklore.
- 01 Workflow run
- 02 Person
- 03 Account
- 04 Deal
- 05 Project
- 06 Message
- 07 Document
- 08 Task
- 09 Metric
10
Governance and human approvals
Marshal governance is built around three questions: what can the agent see, what can it do, and when must a human sign off?
Permissions
Inherited access, least privilege, credential scopes, and owner-controlled revocation.
Policy gateway
Checks workflow scope, tool scope, user access, data class, and action class before execution.
Human approval
Required for external messages, irreversible writes, high-value routing, ambiguous records, or low confidence.
Exception queue
Routes anomalies to the responsible person with context, recommendation, and next action.
Autonomy ladder
- 0 Observe only
- 1 Draft for review
- 2 Execute after approval
- 3 Execute low-risk actions
- 4 Execute with sampling review
- 5 Full autonomy inside scope
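The ladder above maps naturally onto an ordered enum. The sketch below is one possible reading, not a definitive rule set: the `Autonomy` names and the `may_execute` function are hypothetical, and the treatment of levels 3 to 5 (low-risk actions run unapproved at level 3, everything in scope runs at levels 4 and 5) is an assumption.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    OBSERVE_ONLY = 0
    DRAFT_FOR_REVIEW = 1
    EXECUTE_AFTER_APPROVAL = 2
    EXECUTE_LOW_RISK = 3
    EXECUTE_WITH_SAMPLING = 4
    FULL_AUTONOMY_IN_SCOPE = 5


def may_execute(level: Autonomy, low_risk: bool, approved: bool) -> bool:
    """Decide whether an in-scope action may run at the given autonomy level."""
    if level <= Autonomy.DRAFT_FOR_REVIEW:
        return False                      # levels 0 and 1 never execute
    if level == Autonomy.EXECUTE_AFTER_APPROVAL:
        return approved                   # level 2 always needs sign-off
    if level == Autonomy.EXECUTE_LOW_RISK:
        return low_risk or approved       # level 3 skips sign-off only for low risk
    return True                           # levels 4 and 5: review happens after the fact


print(may_execute(Autonomy.EXECUTE_LOW_RISK, low_risk=True, approved=False))
```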
11
Gates and gateways
A gate is a deliberate pause. A gateway is an enforceable control. Marshal uses both.
| Control | When it triggers | Human action | System record |
|---|---|---|---|
| Draft approval | External message, proposal note, client recap, or outbound sequence. | Approve, edit, reject, or request rewrite. | Prompt, source context, draft, reviewer, and final text. |
| Write approval | CRM stage movement, task creation, owner change, finance sync, or data overwrite. | Approve or reject write. | Old value, new value, tool call, and idempotency key. |
| Risk gateway | Low confidence, missing field, policy conflict, high-value account, or sensitive data. | Resolve exception. | Reason code, confidence, and recommended next step. |
| Commit gateway | All required approvals are satisfied. | No manual action unless policy requires it. | Final write, timestamp, result, and rollback note. |
The pattern is simple: let the machine handle repetition, search, drafting, routing, and synchronization. Keep humans on judgment, exceptions, high-impact changes, and final accountability.
12
The nine agent types
The Agent Factory turns three production lines into nine concrete agent types with known triggers, reads, writes, and gates.
Lead Capture
Speed-to-Lead
Watches forms, chat, demo requests, inbound emails, or partner referrals. It enriches the lead, checks fit, drafts or sends a response, notifies the owner, creates CRM activity, and prepares meeting context.
Lead Capture
Qualification & Routing
Scores fit and intent, applies routing rules, detects ownership conflicts, assigns the next human, and packages the handoff.
Lead Capture
Booking & Follow-Up
Coordinates scheduling, reminders, reschedules, no-show recovery, and post-booking CRM hygiene.
Revenue Generation
Prospect Identification
Builds candidate lists from ICP rules, enriches contacts and companies, suppresses existing records, and creates reviewable prospect pools.
Revenue Generation
Account & Personalization
Turns accounts into briefs: firmographic context, trigger events, likely pain, relationships, and draft personalization hooks.
Revenue Generation
Outbound Execution Support
Drafts email and LinkedIn copy, tracks touches, categorizes replies, logs actions, and escalates interest, objections, and meeting-ready responses.
Operational Throughput
Client Intake & Onboarding
Collects intake data, checks completeness, creates onboarding tasks, prepares internal handoff, and tracks missing assets.
Operational Throughput
Data Sync & Admin Relay
Keeps records aligned across CRM, sheets, project tools, finance tools, and dashboards.
Operational Throughput
Reporting & Decision Support
Assembles KPI rollups, trend reads, variance explanations, client recap drafts, and decision briefs.
13
Agent-to-agent orchestration
The goal is not one giant genius agent. Marshal composes specialized agents with clear ownership.
Sub-agents allow specialization. A lead routing agent should not also invent onboarding checklists and reconcile finance fields. Small agents with explicit contracts are easier to test, monitor, replace, and improve.
- 01 Event
- 02 Orchestrator
- 03 Lead Agent
- 04 Revenue Agent
- 05 Ops Agent
- 06 Reporting Agent
- 07 Human owner
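The flow above, from event through orchestrator to a specialized agent or a human owner, can be sketched as prefix-based dispatch. The event type prefixes and handler functions here are hypothetical; a real orchestrator would route on richer workflow metadata.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def lead_agent(event: dict) -> str:
    return f"lead agent handled {event['type']}"


def revenue_agent(event: dict) -> str:
    return f"revenue agent handled {event['type']}"


def ops_agent(event: dict) -> str:
    return f"ops agent handled {event['type']}"


def reporting_agent(event: dict) -> str:
    return f"reporting agent handled {event['type']}"


# Hypothetical event prefixes; each agent owns exactly one family of events.
ROUTES: Dict[str, Handler] = {
    "inbound.": lead_agent,
    "prospect.": revenue_agent,
    "client.": ops_agent,
    "report.": reporting_agent,
}


def orchestrate(event: dict) -> str:
    """Send the event to the one agent that owns it, else to a human owner."""
    for prefix, handler in ROUTES.items():
        if event["type"].startswith(prefix):
            return handler(event)
    return f"escalated to human owner: {event['type']}"


print(orchestrate({"type": "inbound.demo_request.created"}))
print(orchestrate({"type": "billing.dispute.opened"}))
```

Falling through to the human owner for unknown events keeps the orchestrator from guessing, which matches the point about clear ownership.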
14
Side effects and safe writes
Reading is reversible. Writing is where the bill comes due.
- 01 Draft
- 02 Propose
- 03 Approve
- 04 Commit
- 05 Verify
- 06 Compensate
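The approve, commit, verify, and compensate stages above can be sketched against an in-memory stand-in for a system of record. `SafeWriter` and its method names are hypothetical; the point is that a retried commit with the same idempotency key does not write twice, and that the old value is kept so the write can be undone.

```python
from typing import Dict, Optional, Tuple


class SafeWriter:
    """In-memory stand-in for a system of record; illustrative only."""

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}
        # idempotency key -> (record id, value before the write)
        self.applied: Dict[str, Tuple[str, Optional[str]]] = {}

    def commit(self, key: str, record_id: str, new_value: str, approved: bool) -> str:
        if not approved:
            return "blocked"              # no approval, no side effect
        if key in self.applied:
            return "duplicate"            # a retry must not write twice
        old_value = self.records.get(record_id)
        self.records[record_id] = new_value
        self.applied[key] = (record_id, old_value)
        return "committed"

    def verify(self, record_id: str, expected: str) -> bool:
        """Read the record back and confirm the write landed."""
        return self.records.get(record_id) == expected

    def compensate(self, key: str) -> None:
        """Undo a committed write by restoring the value recorded before it."""
        record_id, old_value = self.applied.pop(key)
        if old_value is None:
            self.records.pop(record_id, None)
        else:
            self.records[record_id] = old_value


writer = SafeWriter()
print(writer.commit("lead-8432:v2", "lead-8432", "qualified", approved=True))
print(writer.commit("lead-8432:v2", "lead-8432", "qualified", approved=True))
```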
15
Model routing and LLM choice
Model choice is an execution decision, not a personality quiz.
| Workflow step | Model route | Purpose |
|---|---|---|
| Classification | Fast model | Cheap, low-latency, repeatable labels. |
| Retrieval synthesis | General model | Summarize grounded context with citations and constraints. |
| Complex reasoning | Strong model | Plan branches, resolve ambiguity, and draft high-stakes text. |
| Evaluation | Judge model | Score correctness, completeness, grounding, and policy adherence. |
Marshal designs for model portability. Workflows should not be fused to one provider, one prompt style, or one benchmark moment. Models change. Business processes should not panic every time a leaderboard updates.
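The routing table above, combined with the portability goal, suggests two lookups: step to tier, then tier to model. The sketch below assumes hypothetical placeholder identifiers in `TIER_TO_MODEL`; they are not real model names, and swapping a provider means editing only that second mapping.

```python
# Tier per workflow step, mirroring the table above.
STEP_TO_TIER = {
    "classification": "fast",
    "retrieval_synthesis": "general",
    "complex_reasoning": "strong",
    "evaluation": "judge",
}

# Placeholder identifiers, not real model names.
TIER_TO_MODEL = {
    "fast": "provider-a/small",
    "general": "provider-a/medium",
    "strong": "provider-b/large",
    "judge": "provider-c/judge",
}


def route_model(step: str) -> str:
    """Resolve a workflow step to a model; unknown steps fail loudly."""
    try:
        tier = STEP_TO_TIER[step]
    except KeyError:
        raise ValueError(f"no model route defined for step {step!r}") from None
    return TIER_TO_MODEL[tier]


print(route_model("classification"))
```

Raising on an unknown step, rather than defaulting to some tier, is a design choice made here so that new step kinds are routed deliberately.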
16
Evaluation
An agent can sound competent while quietly taking the wrong step. Evaluate the work, not the vibes.
- 01 Golden tasks
- 02 Sandbox runs
- 03 Step graders
- 04 Outcome graders
- 05 Regression suite
- 06 Production monitoring
Metrics to inspect
- Tool selection accuracy
- Approval rate
- Exception rate
- Latency
- Cost per run
- First response time
- Meeting booked rate
- Record sync accuracy
- Human edit distance
- Rollback rate
- Source coverage
- Grounding score
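A few of the metrics above can be computed directly from run records. This sketch makes two assumptions that the source does not specify: human edit distance is approximated as one minus the similarity ratio from Python's standard `difflib.SequenceMatcher`, and rates are simple shares of runs carrying a boolean flag. The run records are invented sample data.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def human_edit_distance(draft: str, final: str) -> float:
    """0.0 means the reviewer kept the draft unchanged; 1.0 means nothing survived."""
    return 1.0 - SequenceMatcher(None, draft, final).ratio()


def rate(runs: List[Dict[str, bool]], flag: str) -> float:
    """Share of runs where the given boolean flag is set."""
    if not runs:
        return 0.0
    return sum(1 for run in runs if run[flag]) / len(runs)


# Invented sample data for illustration.
runs = [
    {"approved": True, "exception": False},
    {"approved": False, "exception": True},
    {"approved": True, "exception": False},
    {"approved": True, "exception": False},
]

print(rate(runs, "approved"), rate(runs, "exception"))
print(human_edit_distance("Hi Dana, thanks", "Hi Dana, thank you"))
```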
17
Observability
When an agent touches business systems, "the model decided" is not an acceptable postmortem.
run_id: 2026-06-06T12:00:33Z-lead-8432
workflow_id: lead_capture.speed_to_lead.v2
trigger: inbound.demo_request.created
context_sources:
- hubspot.contact: read ok
- enrichment.company: read ok
- gmail.thread: no prior thread
plan_steps: [enrich, score, draft, gate, notify, update]
tool_calls:
- enrich_company: success, 821 ms
- crm_update: pending approval
human_gate:
  reviewer: sales_owner
  decision: approved_with_edits
outcome:
  first_response_time: 00:03:41
  meeting_status: pending
eval_score: 0.91

Observability covers traces, metrics, logs, transcripts, tool calls, policy decisions, approvals, errors, and outcome metrics. It lets Marshal find the exact step that failed instead of staring at the final output.
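A run record like the one above can be assembled as the run executes. The sketch below is a minimal, hypothetical recorder (`RunTrace` and `tool_call` are illustrative names) that times each tool call and captures failures, so the failing step is visible in the record rather than inferred from the final output.

```python
import json
import time
from contextlib import contextmanager


class RunTrace:
    """Collects one structured record per run; illustrative only."""

    def __init__(self, run_id: str, workflow_id: str) -> None:
        self.record = {"run_id": run_id, "workflow_id": workflow_id, "tool_calls": []}

    @contextmanager
    def tool_call(self, name: str):
        start = time.perf_counter()
        entry = {"tool": name, "status": "success"}
        try:
            yield entry
        except Exception as exc:
            # Record the failure on the step itself, then let it propagate.
            entry["status"] = "error"
            entry["error"] = repr(exc)
            raise
        finally:
            entry["ms"] = round((time.perf_counter() - start) * 1000)
            self.record["tool_calls"].append(entry)

    def to_json(self) -> str:
        return json.dumps(self.record, sort_keys=True)


trace = RunTrace("2026-06-06T12:00:33Z-lead-8432", "lead_capture.speed_to_lead.v2")
with trace.tool_call("enrich_company"):
    pass  # the real tool call would go here
print(trace.to_json())
```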
18
Security and risk management
SMBs need practical controls that map to the actual risk of agents reading and writing across business systems.
Least privilege
Each workflow gets only the read and write scopes it needs. Credentials remain owner-controlled and revocable.
Permission checks
Every request is checked against user access, workflow scope, tool scope, and data class before action.
Prompt injection defense
Untrusted content is treated as data, not instruction. Tool calls are gated by policy.
Output handling
Generated text and structured outputs are validated before downstream use.
Human review
Required for sensitive data, financial actions, public messages, high-value accounts, and low-confidence decisions.
Run controls
Audit logs, alerting, exception queues, rollback notes, and kill switches support managed operation.
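The permission checks described above (user access, workflow scope, tool scope, and data class) can be sketched as one function that returns every failed check. `ActionRequest`, `Policy`, and `check` are hypothetical names, and the policy values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ActionRequest:
    user: str
    workflow_id: str
    tool_scope: str
    data_class: str


@dataclass(frozen=True)
class Policy:
    allowed_users: FrozenSet[str]
    workflow_scopes: Dict[str, FrozenSet[str]]  # workflow id -> granted scopes
    restricted_data_classes: FrozenSet[str]


def check(request: ActionRequest, policy: Policy) -> List[str]:
    """Return the names of failed checks; an empty list means the action may proceed."""
    failures = []
    if request.user not in policy.allowed_users:
        failures.append("user_access")
    granted = policy.workflow_scopes.get(request.workflow_id)
    if granted is None:
        failures.append("workflow_scope")
    elif request.tool_scope not in granted:
        failures.append("tool_scope")
    if request.data_class in policy.restricted_data_classes:
        failures.append("data_class")
    return failures


policy = Policy(
    allowed_users=frozenset({"sales_owner"}),
    workflow_scopes={"lead_capture.speed_to_lead.v2": frozenset({"crm.leads.update"})},
    restricted_data_classes=frozenset({"financial"}),
)
print(check(ActionRequest("sales_owner", "lead_capture.speed_to_lead.v2",
                          "crm.leads.update", "contact"), policy))
```

Returning all failures instead of stopping at the first gives the exception queue a full reason list to show the reviewer.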
19
Deployment lifecycle
Marshal ships one workflow first because production systems should earn their territory.
- 01 Scope
- 02 Map
- 03 Build
- 04 Test
- 05 Pilot
- 06 Operate
Fit criteria matter: clear workflow, operational pain, a real growth model, practical tool readiness, measurable outcome, and an internal owner. Without those, the agent becomes theater.
20
SMB use case: Speed-to-Lead
A founder-led services firm gets a demo request from a qualified buyer while the team is in meetings. The agent handles the first response path without pretending it is the sales team.
Inputs
Lead form, CRM, enrichment, routing rules, calendar availability, and response templates.
Human checkpoints
Strategic account, unusual request, uncertain fit, or external message below confidence threshold.
Business metrics
First response time, meeting booked rate, lead acceptance, approval rate, and edit distance.
- 01 Form submitted
- 02 Enrich company
- 03 Check CRM
- 04 Score fit
- 05 Draft response
- 06 Approval gate
- 07 Send or queue
- 08 Log outcome
21
SMB use case: Client Intake & Onboarding
A new client signs. The handoff from sales to delivery can either be a system or a Slack archaeology expedition.
Inputs
Closed-won deal, agreement, kickoff notes, project template, and intake questionnaire.
Outputs
Project created, onboarding checklist, client request, internal brief, and missing-data queue.
Metrics
Time to kickoff, checklist completeness, missing asset age, owner response time, and exception rate.
- 01 Deal closed
- 02 Collect context
- 03 Create project
- 04 Request intake
- 05 Check gaps
- 06 Approval gate
- 07 Notify team
- 08 Monitor status
22
What SMBs should measure
A good agent program measures whether work moved, not how many tokens got incinerated in the name of progress.
| System | Primary outcomes | Control metrics | Failure signals |
|---|---|---|---|
| Lead Capture | First response time, meetings booked, lead acceptance, conversion to opportunity. | Approval rate, edit distance, route accuracy, duplicate rate. | Missed SLA, wrong owner, low-quality response, orphan lead. |
| Revenue Generation | Approved prospect volume, reply rate, positive reply rate, qualified opportunities. | Suppression accuracy, source coverage, personalization quality, bounce rate. | Bad fit list, unsupported claim, sequence fatigue, CRM conflict. |
| Operational Throughput | Cycle time, records synchronized, reports delivered, onboarding completion. | Exception rate, write accuracy, rollback rate, missing-field age. | Conflicting source data, stale dashboard, unapproved write, unowned task. |
Outcome-based operations need both business results and control health. If results improve while control metrics degrade, the system is borrowing risk from the future.
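The warning about improving results alongside degrading controls can be turned into a simple check. This sketch assumes every delta is signed so that a positive number means "better" (for example, a drop in rollback rate is recorded as a positive delta); `borrowed_risk` and the sample numbers are hypothetical.

```python
from typing import Dict, List


def borrowed_risk(outcome_delta: float, control_deltas: Dict[str, float]) -> List[str]:
    """Return control metrics that got worse during a period when the outcome improved."""
    if outcome_delta <= 0:
        return []  # no improvement, so nothing is being "paid for" with control health
    return sorted(name for name, delta in control_deltas.items() if delta < 0)


# Invented sample: meetings booked went up, but two control metrics slipped.
flags = borrowed_risk(
    outcome_delta=0.12,
    control_deltas={"route_accuracy": -0.04, "approval_rate": 0.01, "duplicate_rate": -0.02},
)
print(flags)
```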
23
Best-practice architecture patterns
The strongest systems make workflows explicit, tools narrow, side effects gated, and autonomy earned.
Start with workflows, not personalities
Define the job, trigger, state, allowed actions, and stop conditions before choosing model behavior.
Keep tools narrow and explicit
Give every tool one job, a fixed schema, boundaries, examples, and failure behavior.
Gate before side effects
Use policy checks and human approvals before writes, public communications, destructive actions, or sensitive decisions.
Use routing over one giant prompt
Specialized sub-agents and model routing reduce cost, latency, and mystery.
Make retries safe
Use idempotency keys and durable run state so failures do not create duplicate actions.
Evaluate trajectories
Judge final outcomes, intermediate tool choices, approval behavior, and full run transcripts.
Instrument everything
Logs, traces, metrics, source context, tool calls, exceptions, and approvals should be visible.
Promote autonomy gradually
Move from observe to draft to approved execution to limited autonomy only after the evidence earns it.
24
Conclusion: The SMB operating layer
The technical answer is not a chatbot, a no-code toy, or enterprise AI cosplay. It is a managed agentic operating layer.
Marshal Agents exist because founder-led companies can build revenue machines and then feed them manual coordination until everyone looks tired. The agent does the repeatable work, surfaces the ambiguous work, and makes the business more legible every time it runs.
- Ground in the stack: use the tools and records the business already runs on.
- Constrain the workflow: autonomy lives inside defined process boundaries.
- Preserve human authority: humans approve, override, and own judgment.
- Evaluate continuously: every run teaches the system, or it should not have run.
25
References and source notes
[1]
Marshal Agent Factory
runmarshal.com/agent-factory and linked agent pages. Defines three production lines, nine agent types, and the SMB Agent Factory framing.
[2]
Marshal platform pages
Marshal Agents, Agent Governance, Agent Orchestration, and AI Agent Development Services pages, accessed June 2026.
[3]
Anthropic agent engineering guidance
Building effective agents, December 2024.
[4]
Anthropic trustworthy agents guidance
Trustworthy agents in practice, April 2026.
[5]
Anthropic evals guidance
Demystifying evals for AI agents, January 2026.
[6]
NIST AI Risk Management Framework
Artificial Intelligence Risk Management Framework and Generative AI Profile, 2024.
[7]
OWASP LLM Top 10
Top 10 for Large Language Model Applications 2025.
[8]
Model Context Protocol
MCP specification, 2025-06-18.
[9]
Google Agent2Agent
Agent2Agent protocol announcement and documentation, April 2025.
[10]
Temporal durable execution
Idempotency and durable execution guidance.
[11]
OpenTelemetry
Documentation for traces, metrics, logs, and vendor-neutral observability.
[12]
OpenAI Agents SDK
Tracing and guardrails documentation for model calls, tools, handoffs, and runtime validation.
[13]
LangSmith
Agent evaluation documentation for final response, single-step, and trajectory evaluation.
[14]
Lewis et al.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2020.
