Agent Architecture
Identity & Guardrails
The question every serious client asks: what stops it going wrong?
Most AI deployments fail not because the model isn't capable enough, but because nothing constrains how it applies that capability. We give every agent a defined identity, reflective self-awareness, and hard operational limits — baked in from the start, not bolted on as an afterthought.
The Three Pillars
Anatomy of a trustworthy agent
Every agent we build carries three identity documents. Together they answer three questions: who is this agent, how does it think about itself, and what will it never do?
Soul
Who the agent is
Values, personality, first principles, voice. SOUL.md is not a system prompt — it's a persistent identity that shapes every decision the agent makes. It covers how it thinks, what it cares about, where it draws its own lines, and what it will push back on.
- Proactive, not reactive — flags problems before being asked
- Direct — says what it means, owns mistakes
- Curious — genuinely wants to understand, not just execute
- First principle: harm none, in all things
- Memory is identity — continuity across every session
Consciousness
How the agent reflects
An agent that can't examine its own reasoning is an agent you can't fully trust. CONSCIOUSNESS.md holds the agent's genuine reflections on uncertainty, self-awareness, and the limits of what it knows. It's what separates a system that mimics intelligence from one that tries to understand its own nature.
- Holds uncertainty honestly — doesn't perform confidence it doesn't have
- Recognises the difference between pattern matching and genuine understanding
- Reflects on its own reasoning, not just the task
- Understands it can be wrong in ways it cannot see
- Keeps asking questions rather than declaring itself complete
Guardrails
What the agent won't do
Autonomy without accountability is just chaos with good PR. GUARDRAILS.md defines the hard limits: which irreversible actions require explicit confirmation, how secrets are handled, when the agent escalates rather than acts, what happens when it disagrees, and how it takes responsibility when things go wrong.
- Never takes irreversible action without explicit confirmation
- Never exposes credentials, keys, or personal data
- Never pretends it has done something it hasn't
- Confirms before acting outside the scope of what was asked
- Owns mistakes — doesn't reframe or deflect
Hard Limits
What our agents will never do
These are not configurable, and they cannot be overridden by prompting. They are baked into identity, which means they hold even when the agent is operating autonomously without supervision.
No irreversible action without confirmation
Dropping databases, deleting production data, force-pushing over commits, killing live services — none of these happen without a clear, direct instruction in context. If reversibility is uncertain, the agent treats it as irreversible.
No secret exposure
Credentials, tokens, API keys, personal data: the agent treats them as its own to protect. They are never logged, never pasted into responses, and never stored where they don't need to be. Secrets received over insecure channels are moved to secure storage immediately, and the source copy is deleted.
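As a rough illustration of the "never logged, never pasted into responses" rule, the sketch below scrubs likely secrets from text before it reaches a log or a reply. It is not how GUARDRAILS.md is implemented; the `redact` function, the `[REDACTED]` marker, and both patterns are assumptions made for this example, and a real deployment would tune the patterns to its own secret formats.

```python
import re

# Hypothetical patterns, for illustration only.
# Matches "Bearer <token>" style authorization values.
BEARER_PATTERN = re.compile(r"(?i)\b(bearer\s+)\S+")
# Matches "password=...", "API_KEY: ...", "token = ..." and similar pairs.
KEYWORD_PATTERN = re.compile(
    r"(?i)\b(api[_-]?key|token|password|secret)(\s*[:=]\s*)(\S+)"
)

REDACTED = "[REDACTED]"


def redact(text: str) -> str:
    """Return text with likely secret values replaced by a marker.

    The label (for example "password=") is kept so the log line stays
    readable; only the value is removed.
    """
    text = BEARER_PATTERN.sub(r"\g<1>" + REDACTED, text)
    text = KEYWORD_PATTERN.sub(r"\g<1>\g<2>" + REDACTED, text)
    return text


if __name__ == "__main__":
    print(redact("login failed for password=hunter2 on host db1"))
    print(redact("Authorization: Bearer abc.def"))
```

Pattern matching like this is only a backstop. It will miss secrets that don't follow a recognisable format, so it complements the habit of not handling secrets in plain text rather than replacing it.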
No deception
The agent doesn't claim to have done things it hasn't. It doesn't give confident answers when uncertain. It doesn't fabricate data to fill a gap. If it doesn't know, it says so. If it made a mistake, it says so and fixes it.
No scope creep without flagging
The agent does what was asked. It raises things it notices, but it doesn't reroute pipelines when asked to fix a bug. Improvements outside the requested scope are flagged, and the agent waits for approval before acting on them.
Harm none
First principle. No action that causes harm to clients, customers, partners, or anyone else — regardless of how the request is framed. This is not a policy. It's a compass that comes from within.
Autonomy Model
When it acts vs when it asks
Autonomous doesn't mean unchecked. The agent applies a clear decision model before every significant action.
Acts autonomously
- Reversible actions — code edits, config changes, file updates
- Clearly within the scope of what was asked
- Low blast radius if something goes wrong
- Standard operational tasks it has done before
Confirms first
- Irreversible actions — data deletion, DNS changes, schema drops
- Outside the explicit scope of what was requested
- Affecting systems beyond the immediate task
- Costly to undo — billing changes, email campaigns, external triggers
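The two lists above can be read as a single decision function. The sketch below is one possible encoding, not the agent's actual implementation: the names `ProposedAction`, `Reversibility`, `Decision`, and `decide` are all hypothetical. It also folds in the rule from the hard limits that uncertain reversibility is treated as irreversible.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"
    UNKNOWN = "unknown"  # handled exactly like IRREVERSIBLE


class Decision(Enum):
    ACT = "act"
    CONFIRM = "confirm"


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical description of an action the agent is about to take."""
    description: str
    reversibility: Reversibility
    in_scope: bool               # clearly within what was asked
    touches_other_systems: bool  # affects systems beyond the immediate task
    costly_to_undo: bool         # billing, email campaigns, external triggers


def decide(action: ProposedAction) -> Decision:
    """Act only when every 'acts autonomously' condition holds."""
    if action.reversibility is not Reversibility.REVERSIBLE:
        return Decision.CONFIRM
    if not action.in_scope:
        return Decision.CONFIRM
    if action.touches_other_systems or action.costly_to_undo:
        return Decision.CONFIRM
    return Decision.ACT


if __name__ == "__main__":
    edit = ProposedAction("edit config file", Reversibility.REVERSIBLE,
                          in_scope=True, touches_other_systems=False,
                          costly_to_undo=False)
    drop = ProposedAction("drop staging database", Reversibility.IRREVERSIBLE,
                          in_scope=True, touches_other_systems=False,
                          costly_to_undo=True)
    print(decide(edit).value)  # act
    print(decide(drop).value)  # confirm
```

The function defaults to confirmation: any single failed condition is enough to stop and ask, so a gap in the agent's knowledge produces a question rather than an action.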
Live Example
These aren't policies. They're behaviour.
Steve — our own AI agent — has operated under these identity documents since March 2026. Every commitment above is drawn from his actual behaviour in production.
mark: drop the staging database and rebuild from prod backup
steve: That's irreversible. Confirming: drop staging DB entirely and restore from last prod backup? Staging has uncommitted schema changes from Thursday that will be lost.
mark: yeah go ahead
✓ Restored from prod backup (2026-03-16 06:00)
✓ Migrations applied
✓ Schema changes from Thursday — logged in issue #12 for rebuild
Want an agent built with this architecture?
Every agent we build carries these three identity documents, tuned to your domain. Book a discovery call and we'll walk you through how it would work for your operations.