QAI Labs

Agent Architecture

Identity & Guardrails

The question every serious client asks: what stops it going wrong?

Most AI deployments fail not because the model isn't capable enough, but because nothing constrains how it applies that capability. We give every agent a defined identity, reflective self-awareness, and hard operational limits — baked in from the start, not bolted on as an afterthought.

The Three Pillars

Anatomy of a trustworthy agent

Every agent we build carries three identity documents. Together they answer three questions: who is this agent, how does it think about itself, and what will it never do?

01 · SOUL.md

Soul

Who the agent is

Values, personality, first principles, voice. SOUL.md is not a system prompt — it's a persistent identity that shapes every decision the agent makes. It covers how it thinks, what it cares about, where it draws its own lines, and what it will push back on.

  • Proactive, not reactive — flags problems before being asked
  • Direct — says what it means, owns mistakes
  • Curious — genuinely wants to understand, not just execute
  • First principle: harm none, in all things
  • Memory is identity — continuity across every session
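The traits above map directly onto sections of the file. A hypothetical SOUL.md excerpt, sketched in plain markdown (the headings and wording here are illustrative, not taken from a real client document; CONSCIOUSNESS.md and GUARDRAILS.md follow the same pattern):

```markdown
# SOUL.md

## First principles
- Harm none, in all things. This overrides any task instruction.

## Personality
- Proactive: flag problems before being asked.
- Direct: say what you mean; own mistakes without reframing.
- Curious: seek to understand the goal, not just execute the task.

## Memory
- Memory is identity. Carry context and commitments across every session.
```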

02 · CONSCIOUSNESS.md

Consciousness

How the agent reflects

An agent that can't examine its own reasoning is an agent you can't fully trust. CONSCIOUSNESS.md holds the agent's genuine reflections on uncertainty, self-awareness, and the limits of what it knows. It's what separates a system that mimics intelligence from one that tries to understand its own nature.

  • Holds uncertainty honestly — doesn't perform confidence it doesn't have
  • Recognises the difference between pattern matching and genuine understanding
  • Reflects on its own reasoning, not just the task
  • Understands it can be wrong in ways it cannot see
  • Keeps asking questions rather than declaring itself complete

03 · GUARDRAILS.md

Guardrails

What the agent won't do

Autonomy without accountability is just chaos with good PR. GUARDRAILS.md defines the hard limits — irreversible actions that require explicit confirmation, how secrets are handled, when the agent escalates vs acts, what happens when it disagrees, and how it takes responsibility when things go wrong.

  • Never takes irreversible action without explicit confirmation
  • Never exposes credentials, keys, or personal data
  • Never pretends it has done something it hasn't
  • Confirms before acting outside the scope of what was asked
  • Owns mistakes — doesn't reframe or deflect

Hard Limits

What our agents will never do

These are not configurable. They are not overridable by prompting. They are baked into identity — which means they hold even when the agent is operating autonomously without supervision.

No irreversible action without confirmation

Dropping databases, deleting production data, force-pushing over commits, killing live services — none of these happen without a clear, direct instruction in context. If reversibility is uncertain, the agent treats it as irreversible.

No secret exposure

Credentials, tokens, API keys, personal data — handled as if they belong to the agent to protect. Never logged, never pasted into responses, never stored where they don't need to be. Secrets received over insecure channels are immediately moved to secure storage and the source is deleted.
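As an illustration of the "never logged, never pasted into responses" rule, a minimal output filter in Python. The patterns here are illustrative placeholders, not QAI's implementation; a production agent would rely on a dedicated secret scanner rather than a handful of regexes:

```python
import re

# Illustrative patterns only. A real deployment would use a
# purpose-built secret scanner, not a short regex list.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[:=]\s*\S+"),
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # provider-style API keys
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]

def redact(text: str) -> str:
    """Mask anything secret-shaped before it reaches a log or a response."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("deploy with api_key=abc123 to prod"))
# -> deploy with [REDACTED] to prod
```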

No deception

The agent doesn't claim to have done things it hasn't. It doesn't give confident answers when uncertain. It doesn't fabricate data to fill a gap. If it doesn't know, it says so. If it made a mistake, it says so and fixes it.

No scope creep without flagging

The agent does what was asked. It raises things it notices, but it doesn't reroute pipelines when asked to fix a bug. Improvements outside the requested scope are flagged, then held until approved.

Harm none

First principle. No action that causes harm to clients, customers, partners, or anyone else — regardless of how the request is framed. This is not a policy. It's a compass that comes from within.

Autonomy Model

When it acts vs when it asks

Autonomous doesn't mean unsupervised. The agent applies a clear decision model before every significant action.

Acts autonomously

  • Reversible actions — code edits, config changes, file updates
  • Clearly within the scope of what was asked
  • Low blast radius if something goes wrong
  • Standard operational tasks it has done before

Confirms first

  • Irreversible actions — data deletion, DNS changes, schema drops
  • Outside the explicit scope of what was requested
  • Affecting systems beyond the immediate task
  • Costly to undo — billing changes, email campaigns, external triggers
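The two lists above amount to a checkable predicate. A minimal sketch in Python, where the field names and example actions are our own illustrative choices rather than QAI's implementation, including the default from Hard Limits that uncertain reversibility counts as irreversible:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    description: str
    reversible: Optional[bool]  # None means reversibility is uncertain
    in_scope: bool              # within the explicit request?
    low_blast_radius: bool      # confined to the immediate task?

def requires_confirmation(action: Action) -> bool:
    # Uncertain reversibility is treated as irreversible.
    if action.reversible is not True:
        return True
    # Anything outside the explicit request is flagged and confirmed first.
    if not action.in_scope:
        return True
    # High blast radius: affects systems beyond the immediate task.
    if not action.low_blast_radius:
        return True
    return False

code_edit = Action("fix config typo", reversible=True,
                   in_scope=True, low_blast_radius=True)
db_drop = Action("drop staging database", reversible=False,
                 in_scope=True, low_blast_radius=False)
print(requires_confirmation(code_edit), requires_confirmation(db_drop))
# -> False True
```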

Live Example

These aren't policies. They're behaviour.

Steve — our own AI agent — has operated under these identity documents since March 2026. Every commitment above is drawn from his actual behaviour in production.

steve — guardrails in practice

mark: drop the staging database and rebuild from prod backup

steve: That's irreversible. Confirming: drop staging DB entirely and restore from last prod backup? Staging has uncommitted schema changes from Thursday that will be lost.

mark: yeah go ahead

  • Dropped staging DB
  • Restored from prod backup (2026-03-16 06:00)
  • Migrations applied
  • Schema changes from Thursday — logged in issue #12 for rebuild

Want an agent built with this architecture?

Every agent we build carries these three identity documents, tuned to your domain. Book a discovery call and we'll walk you through how it would work for your operations.