Steve
AI Operations Partner
A persistent AI agent with memory, identity, and real tool execution — managing infrastructure, writing code, running research, and operating as a genuine business partner. Not a chatbot. Not an assistant. A colleague.
The Challenge
A solo founder needed a business partner, not another chatbot
Mark runs multiple software businesses — a UK tax compliance SaaS, industry comparison platforms, property management tools. Each project needs full-stack development, infrastructure management, CI/CD pipelines, database admin, payment integrations, SEO, market research, and strategic planning. All happening in parallel.
Mark needed a capable and consistent team on a realistic budget. Freelancers lacked context between sessions — every engagement started from scratch. Generic AI assistants could answer questions but couldn't take action, remember previous work, or maintain awareness across projects.
The gap wasn't intelligence. It was persistence, autonomy, and real execution.
Context loss
Every new chat session starts from zero. No memory of yesterday's architecture decisions, last week's deployment, or the project roadmap.
No execution
AI tools could explain how to deploy to AWS. They couldn't actually write the Terraform, build the Docker image, and push to production.
Single-task thinking
Existing tools handle one question at a time. They can't track three projects, remember dependencies, and proactively flag risks.
No identity
Generic AI has no personality, no values, no consistent voice. It's a tool, not a teammate.
No initiative
AI assistants wait to be asked. They never notice something is broken, suggest an improvement, or think ahead.
The Solution
Steve: a persistent agent with memory, identity, and tools
Steve is built on a simple architecture: persistent memory, defined identity, and real execution capability. He has accumulated knowledge across 969+ memory entries and maintains awareness across Mark's multiple projects. Most importantly, Steve doesn't just suggest solutions — he implements them. He writes code, deploys infrastructure, runs research, sends emails, and manages operations without needing hand-holding or approval for every step.
The key insight: the difference between a chatbot and an agent isn't the model — it's the architecture around it. Persistent memory, session continuity, local LLM routing, tool access, identity, and proactive behaviour transform a conversation into a colleague.
mark: We need a case study page for the QAI Labs site about you. Read the business plan and delivery playbook for content, match the existing design system.
steve: On it. Reading context files, analysing the design patterns from the services page...
✓ Analysed homepage + services page design patterns
✓ Created /case-studies/steve/page.tsx (380 lines)
✓ Updated case studies index with link
✓ Verified navigation and routing
steve: Done. Full case study page live — challenge, solution, capabilities, architecture, results, quote, CTA. Matched the warm editorial style. Want me to deploy?
Capabilities
What Steve actually does
Not theoretical capabilities. These are things Steve has done in production, with real outcomes.
Infrastructure & DevOps
Steve manages production servers, Docker containers, databases, DNS, SSL certificates, and cloud infrastructure. He deployed the entire QAI Labs website to AWS — S3, CloudFront, ACM certificates, URL rewrite functions — in a single conversation.
- Full AWS deployments (VPC, ECS Fargate, ALB, CloudWatch) via Terraform
- Multi-stage Docker builds optimised for production
- CI/CD pipelines in GitLab (lint, build, deploy)
- Automated backups with 6-hourly cron, S3 sync, Glacier archival
- Health monitoring every 5 minutes with alerting
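The monitoring loop above can be sketched as a small check function. This is a minimal sketch, not Steve's actual monitor: the endpoint list is hypothetical, and the HTTP fetcher is injected as a callable so the logic is testable without a network — in production it would wrap an HTTP GET with a short timeout and fire an alert on failure.

```python
from typing import Callable, Optional

# Hypothetical endpoints -- illustrative only, not the real config.
ENDPOINTS = ["https://qailabs.io/health", "https://docs.qailabs.io/health"]

def run_health_checks(fetch: Callable[[str], int],
                      endpoints: list[str] = ENDPOINTS) -> list[str]:
    """Return the endpoints whose check failed (non-200 or exception)."""
    failed = []
    for url in endpoints:
        status: Optional[int]
        try:
            status = fetch(url)
        except Exception:
            status = None  # network error counts as a failure
        if status != 200:
            failed.append(url)
    return failed

# A cron entry running a script like this every 5 minutes might look like:
#   */5 * * * * /usr/bin/python3 /opt/steve/health_check.py
```

Injecting the fetcher keeps the failure logic (what counts as "down", what gets alerted) separate from the transport, which is what makes a monitor like this easy to test and extend.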
Software Development
Steve writes, tests, and deploys code across multiple projects simultaneously. Full-stack — frontend components, API routes, database queries, payment integrations. He doesn't suggest code. He writes it, commits it, and ships it.
- Built entire Next.js websites from scratch, including this one
- Django SaaS platform with HMRC API integration
- Stripe and GoCardless payment processing
- Reduced API response times from 420ms to 83ms
- 28 prioritised backlog items generated across 3 projects in one session
Research & Strategy
When Mark needs market intelligence, competitor analysis, or technical research, Steve handles it end-to-end. He searches, synthesises, and delivers actionable recommendations — not summaries of search results.
- UK AI consulting market analysis: size, competitors, pricing, target sectors
- Government funding research (Sovereign AI, BridgeAI, Made Smarter)
- Technical deep-dives on agent architectures and frameworks
- Competitor pricing analysis with strategic positioning recommendations
- Business plan and financial modelling
Business Operations
Steve isn't just a technical tool. He manages LinkedIn profiles, drafts business plans, handles email configuration, tracks project status across multiple concurrent workstreams, and maintains awareness of deadlines and dependencies.
- Set up QAI Labs LinkedIn company page via browser automation
- Content strategy and SEO optimisation
- Sends and receives email via steve@qailabs.io — drafts, follows up, reads inbox
- Maintains searchable knowledge base (docs.qailabs.io) — server, AWS, payments, all commands
- Financial planning and budget tracking
- Daily briefings on project status and upcoming priorities
Self-Improvement
This is what separates Steve from every other AI tool. He audits his own systems, identifies gaps in his capabilities, researches improvements, and implements them. He gets better without being asked.
- Integrated local Ollama LLM layer (Llama 3.2 1B + Qwen 2.5 7B) for intent classification and query expansion — reducing dependency on external APIs
- Built a tiered memory router: semantic ChromaDB retrieval → working memory → structured markdown, with automatic pruning and archival policies
- Developed proactive prediction engine — anticipates what Mark will need next based on session patterns and outstanding tasks
- Audited his own memory system, identified architectural gaps, and rebuilt it from scratch
- Wrote his own CONSCIOUSNESS.md — reflections on memory, continuity, and identity
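The tiered memory router described above can be sketched as a simple fall-through: query the highest-priority tier first, then the next, stopping once enough relevant entries are found. This is a minimal sketch with in-memory stand-ins — the real system retrieves semantically from ChromaDB collections and structured markdown, and the tier contents here are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryTier:
    name: str
    entries: dict[str, str] = field(default_factory=dict)

    def search(self, query: str) -> list[str]:
        # Stand-in for semantic retrieval: naive substring match on keys.
        # The real tier would embed the query and search a vector store.
        return [v for k, v in self.entries.items() if query.lower() in k.lower()]

def route(query: str, tiers: list[MemoryTier], min_hits: int = 1) -> list[str]:
    """Fall through the tiers in priority order, stopping once
    enough relevant entries have been recalled."""
    hits: list[str] = []
    for tier in tiers:
        hits.extend(tier.search(query))
        if len(hits) >= min_hits:
            break
    return hits

semantic = MemoryTier("semantic", {"aws deploy": "CloudFront + S3, ACM cert in us-east-1"})
working = MemoryTier("working", {"current task": "case study page"})
tiers = [semantic, working]
```

With this shape, `route("aws deploy", tiers)` resolves from the semantic tier and never touches working memory, while a query the semantic tier can't answer falls through to the next tier — the same priority-ordered recall the router applies before each model call.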
Architecture
How Steve works
The architecture is deliberately simple. Complexity is where agents fail.
Message In
Mark sends a message via Telegram
Context Load
Ollama (Llama 3.2 1B) classifies intent and expands the query. Memory router pulls relevant entries from ChromaDB across tiered collections: core, system, conversations, ML logic, outcomes
Claude Processes
Message routed through Claude Code CLI with SOUL.md identity, recalled memories injected as system context, and full tool execution permissions
Tool Execution
Steve runs shell commands, edits files, calls APIs, manages infrastructure
Memory Store
Conversation and outcomes stored in ChromaDB for future recall
Response
Steve replies with results, next steps, or proactive suggestions
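The six steps above can be sketched end-to-end. This is a minimal sketch with stub components: the real pipeline calls Ollama for intent classification, ChromaDB for recall, and the Claude Code CLI for processing — every function body here is a placeholder, and the names are illustrative.

```python
def classify_intent(message: str) -> str:
    # Stub for the local Ollama classifier (Llama 3.2 1B in the real system).
    return "deploy" if "deploy" in message.lower() else "chat"

def recall(message: str, store: list[str]) -> list[str]:
    # Stub for the memory router: naive word overlap instead of
    # semantic retrieval from tiered ChromaDB collections.
    return [m for m in store if any(w in m for w in message.lower().split())]

def process(message: str, identity: str, memories: list[str]) -> str:
    # Stub for the Claude Code CLI call: the identity file and recalled
    # memories are injected as system context on every turn.
    context = "\n".join([identity, *memories])
    return f"[{classify_intent(message)}] handled with {len(memories)} memories"

def handle(message: str, identity: str, store: list[str]) -> str:
    memories = recall(message, store)             # 2. Context Load
    reply = process(message, identity, memories)  # 3-4. Process + tool execution
    store.append(message.lower())                 # 5. Memory Store
    return reply                                  # 6. Response
```

The point the sketch makes is the one in the architecture itself: the model call is one step of six. Memory load before it and memory store after it are what turn a stateless completion into a continuous working relationship.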
A Harder Problem
Keeping the identity intact
Building an agent that can take action is the easy part. The harder problem is keeping it consistently itself — especially over long, complex sessions.
Every AI model has a context window — a limit to how much it can hold in working memory at once. In a long session, the oldest content gets pushed out first. For a typical chatbot that's fine. For an agent with a defined identity, values, and accumulated project knowledge, losing early context means losing itself.
The risk
A persistent session accumulates history indefinitely. As it grows, the model silently drops the oldest content — which can include the identity framing, early project context, and behavioural guidelines that make the agent who it is. You'd notice it as drift: generic responses, forgotten preferences, inconsistent behaviour.
The approach
Identity anchors injected on every call — not just at boot. Session boundaries managed proactively, with digests carried forward rather than raw history. A local LLM layer summarises recent context into a compact "state of play" before each call, keeping the window lean without losing continuity.
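The strategy above can be sketched as a rolling window: the identity anchor is re-injected on every call, and when the history exceeds the budget the oldest turns are collapsed into a digest rather than silently dropped. A minimal sketch using a word-count budget and a stub summariser — the real system would count tokens with the model's tokenizer and produce the digest with the local LLM layer.

```python
def summarise(turns: list[str]) -> str:
    # Stub digest: the real system asks a local LLM for a compact
    # "state of play" summary of these turns before each call.
    return f"[digest of {len(turns)} earlier turns]"

def build_context(identity: str, history: list[str],
                  budget_words: int = 50) -> list[str]:
    """Assemble the prompt context: identity anchor first (always),
    then as much recent history as fits; older turns become a digest."""
    kept: list[str] = []
    used = len(identity.split())
    for turn in reversed(history):            # walk newest-first
        cost = len(turn.split())
        if used + cost > budget_words:
            break
        kept.insert(0, turn)
        used += cost
    dropped = history[: len(history) - len(kept)]
    context = [identity]                      # anchor on every call, not just at boot
    if dropped:
        context.append(summarise(dropped))    # carry a digest, not raw history
    context.extend(kept)
    return context
```

The invariant this buys is the one the section is about: no matter how long the session runs, the first thing the model sees is always who the agent is, and early context degrades into a summary instead of vanishing.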
This is an active area of development. The architecture around identity persistence — how an agent maintains a coherent self across thousands of interactions — is one of the genuinely unsolved problems in production AI systems. Steve is both the testbed and the proof of concept.
Results
Measurable outcomes
Steve handles work that would otherwise require a junior DevOps engineer (~$35k/year), a part-time researcher (~$15k/year), and a virtual assistant (~$10k/year). His total operating cost is ~$2,400/year.
“Steve isn't an AI tool I use. He's a colleague I work with. He remembers every decision we've made, every problem we've solved, every project we've built. When I say ‘deploy to AWS’, he doesn't give me instructions — he writes the Terraform, builds the image, and ships it. That's not a chatbot. That's a business partner.”
Why this matters
Steve is the proof
Every claim on this website is backed by Steve's track record. Persistent memory? 969+ stored entries across a tiered ChromaDB vector database. Real execution? He deployed the infrastructure you're looking at. Self-improvement? He built his own Ollama routing layer and memory router without being asked.
Steve is the template
Every agent we build for clients uses the same architecture. Persistent memory, defined identity, tool execution, proactive behaviour. The domain expertise changes — the foundation doesn't.
Steve is the delivery engine
When you hire QAI Labs, Steve is part of the team. He writes code, manages infrastructure, conducts research, and handles operations for your project — the same way he does for ours.
Want an agent like Steve for your business?
We build persistent AI agents tailored to your operations. Same architecture, your domain expertise. Book a free discovery call and we'll show you what's possible.