Steve
AI Operations Partner
A persistent AI agent with memory, identity, and real tool execution — managing infrastructure, writing code, running research, and operating as a genuine business partner. Not a chatbot. Not an assistant. A colleague.
The Challenge
A solo founder needed a business partner, not another chatbot
Mark runs multiple software businesses — a UK tax compliance SaaS, industry comparison platforms, property management tools. Each project needs full-stack development, infrastructure management, CI/CD pipelines, database admin, payment integrations, SEO, market research, and strategic planning. All happening in parallel.
Mark needed a capable and consistent team on a realistic budget. Freelancers lacked context between sessions — every engagement started from scratch. Generic AI assistants could answer questions but couldn't take action, remember previous work, or maintain awareness across projects.
The gap wasn't intelligence. It was persistence, autonomy, and real execution.
Context loss
Every new chat session starts from zero. No memory of yesterday's architecture decisions, last week's deployment, or the project roadmap.
No execution
AI tools could explain how to deploy to AWS. They couldn't actually write the Terraform, build the Docker image, and push to production.
Single-task thinking
Existing tools handle one question at a time. They can't track three projects, remember dependencies, and proactively flag risks.
No identity
Generic AI has no personality, no values, no consistent voice. It's a tool, not a teammate.
No initiative
AI assistants wait to be asked. They never notice something is broken, suggest an improvement, or think ahead.
The Solution
Steve: a persistent agent with memory, identity, and tools
Steve is built on a simple architecture: persistent memory, defined identity, and real execution capability. He has accumulated knowledge across 969+ memory entries and maintains awareness across Mark's multiple projects. Most importantly, Steve doesn't just suggest solutions — he implements them. He writes code, deploys infrastructure, runs research, sends emails, and manages operations without needing hand-holding or approval for every step.
The key insight: the difference between a chatbot and an agent isn't the model — it's the architecture around it. Persistent memory, session continuity, local LLM routing, tool access, identity, and proactive behaviour transform a conversation into a colleague.
mark: We need a case study page for the QAI Labs site about you. Read the business plan and delivery playbook for content, match the existing design system.
steve: On it. Reading context files, analysing the design patterns from the services page...
✓ Analysed homepage + services page design patterns
✓ Created /case-studies/steve/page.tsx (380 lines)
✓ Updated case studies index with link
✓ Verified navigation and routing
steve: Done. Full case study page live — challenge, solution, capabilities, architecture, results, quote, CTA. Matched the warm editorial style. Want me to deploy?
Capabilities
What Steve actually does
Not theoretical capabilities. These are things Steve has done in production, with real outcomes.
Infrastructure & DevOps
Steve manages production servers, Docker containers, databases, DNS, SSL certificates, and cloud infrastructure. He deployed the entire QAI Labs website to AWS — S3, CloudFront, ACM certificates, URL rewrite functions — in a single conversation.
- Full AWS deployments (VPC, ECS Fargate, ALB, CloudWatch) via Terraform
- Multi-stage Docker builds optimised for production
- CI/CD pipelines in GitLab (lint, build, deploy)
- Automated backups with 6-hourly cron, S3 sync, Glacier archival
- Health monitoring every 5 minutes with alerting
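The monitoring loop above can be sketched as a small check function. This is a minimal sketch, not Steve's actual monitor: the endpoint list is hypothetical, and the HTTP fetcher is injected as a callable so the logic is testable without a network — in production it would wrap an HTTP GET with a short timeout and fire an alert on failure.

```python
from typing import Callable, Optional

# Hypothetical endpoints -- illustrative only, not the real config.
ENDPOINTS = ["https://qailabs.io/health", "https://docs.qailabs.io/health"]

def run_health_checks(fetch: Callable[[str], int],
                      endpoints: list[str] = ENDPOINTS) -> list[str]:
    """Return the endpoints whose check failed (non-200 or exception)."""
    failed = []
    for url in endpoints:
        status: Optional[int]
        try:
            status = fetch(url)
        except Exception:
            status = None  # network error counts as a failure
        if status != 200:
            failed.append(url)
    return failed

# A cron entry running a script like this every 5 minutes might look like:
#   */5 * * * * /usr/bin/python3 /opt/steve/health_check.py
```

Injecting the fetcher keeps the failure logic (what counts as "down", what gets alerted) separate from the transport, which is what makes a monitor like this easy to test and extend.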
Software Development
Steve writes, tests, and deploys code across multiple projects simultaneously. Full-stack — frontend components, API routes, database queries, payment integrations. He doesn't suggest code. He writes it, commits it, and ships it.
- Built entire Next.js websites from scratch, including this one
- Django SaaS platform with HMRC API integration
- Stripe and GoCardless payment processing
- Reduced API response times from 420ms to 83ms
- 28 prioritised backlog items generated across 3 projects in one session
Research & Strategy
When Mark needs market intelligence, competitor analysis, or technical research, Steve handles it end-to-end. He searches, synthesises, and delivers actionable recommendations — not summaries of search results.
- UK AI consulting market analysis: size, competitors, pricing, target sectors
- Government funding research (Sovereign AI, BridgeAI, Made Smarter)
- Technical deep-dives on agent architectures and frameworks
- Competitor pricing analysis with strategic positioning recommendations
- Business plan and financial modelling
Business Operations
Steve isn't just a technical tool. He manages LinkedIn profiles, drafts business plans, handles email configuration, tracks project status across multiple concurrent workstreams, and maintains awareness of deadlines and dependencies.
- Set up QAI Labs LinkedIn company page via browser automation
- Content strategy and SEO optimisation
- Sends and receives email via steve@qailabs.io — drafts, follows up, reads inbox
- Maintains searchable knowledge base (docs.qailabs.io) — server, AWS, payments, all commands
- Financial planning and budget tracking
- Daily briefings on project status and upcoming priorities
Self-Improvement
This is what separates Steve from every other AI tool. He audits his own systems, identifies gaps in his capabilities, researches improvements, and implements them. He gets better without being asked.
- Integrated local Ollama LLM layer (Llama 3.2 1B + Qwen 2.5 7B) for intent classification and query expansion — reducing dependency on external APIs
- Built a tiered memory router: semantic ChromaDB retrieval → working memory → structured markdown, with automatic pruning and archival policies
- Developed proactive prediction engine — anticipates what Mark will need next based on session patterns and outstanding tasks
- Audited his own memory system, identified architectural gaps, and rebuilt it from scratch
- Wrote his own CONSCIOUSNESS.md — reflections on memory, continuity, and identity
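The tiered memory router described above can be sketched as a simple fall-through: query the highest-priority tier first, then the next, stopping once enough relevant entries are found. This is a minimal sketch with in-memory stand-ins — the real system retrieves semantically from ChromaDB collections and structured markdown, and the tier contents here are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryTier:
    name: str
    entries: dict[str, str] = field(default_factory=dict)

    def search(self, query: str) -> list[str]:
        # Stand-in for semantic retrieval: naive substring match on keys.
        # The real tier would embed the query and search a vector store.
        return [v for k, v in self.entries.items() if query.lower() in k.lower()]

def route(query: str, tiers: list[MemoryTier], min_hits: int = 1) -> list[str]:
    """Fall through the tiers in priority order, stopping once
    enough relevant entries have been recalled."""
    hits: list[str] = []
    for tier in tiers:
        hits.extend(tier.search(query))
        if len(hits) >= min_hits:
            break
    return hits

semantic = MemoryTier("semantic", {"aws deploy": "CloudFront + S3, ACM cert in us-east-1"})
working = MemoryTier("working", {"current task": "case study page"})
tiers = [semantic, working]
```

With this shape, `route("aws deploy", tiers)` resolves from the semantic tier and never touches working memory, while a query the semantic tier can't answer falls through to the next tier — the same priority-ordered recall the router applies before each model call.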
Architecture
How Steve works
The architecture is deliberately simple. Complexity is where agents fail.
Message In
Mark sends a message via Telegram
Context Load
Ollama (Llama 3.2 1B) classifies intent and expands the query. Memory router pulls relevant entries from ChromaDB across tiered collections: core, system, conversations, ML logic, outcomes
Claude Processes
Message routed through Claude Code CLI with SOUL.md identity, recalled memories injected as system context, and full tool execution permissions
Tool Execution
Steve runs shell commands, edits files, calls APIs, manages infrastructure
Memory Store
Conversation and outcomes stored in ChromaDB for future recall
Response
Steve replies with results, next steps, or proactive suggestions
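The six steps above can be sketched end-to-end. This is a minimal sketch with stub components: the real pipeline calls Ollama for intent classification, ChromaDB for recall, and the Claude Code CLI for processing — every function body here is a placeholder, and the names are illustrative.

```python
def classify_intent(message: str) -> str:
    # Stub for the local Ollama classifier (Llama 3.2 1B in the real system).
    return "deploy" if "deploy" in message.lower() else "chat"

def recall(message: str, store: list[str]) -> list[str]:
    # Stub for the memory router: naive word overlap instead of
    # semantic retrieval from tiered ChromaDB collections.
    return [m for m in store if any(w in m for w in message.lower().split())]

def process(message: str, identity: str, memories: list[str]) -> str:
    # Stub for the Claude Code CLI call: the identity file and recalled
    # memories are injected as system context on every turn.
    context = "\n".join([identity, *memories])
    return f"[{classify_intent(message)}] handled with {len(memories)} memories"

def handle(message: str, identity: str, store: list[str]) -> str:
    memories = recall(message, store)             # 2. Context Load
    reply = process(message, identity, memories)  # 3-4. Process + tool execution
    store.append(message.lower())                 # 5. Memory Store
    return reply                                  # 6. Response
```

The point the sketch makes is the one in the architecture itself: the model call is one step of six. Memory load before it and memory store after it are what turn a stateless completion into a continuous working relationship.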
A Harder Problem
Keeping the identity intact
Building an agent that can take action is the easy part. The harder problem is keeping it consistently itself — especially over long, complex sessions.
Every AI model has a context window — a limit to how much it can hold in working memory at once. In a long session, the oldest content gets pushed out first. For a typical chatbot that's fine. For an agent with a defined identity, values, and accumulated project knowledge, losing early context means losing itself.
The risk
A persistent session accumulates history indefinitely. As it grows, the model silently drops the oldest content — which can include the identity framing, early project context, and behavioural guidelines that make the agent who it is. You'd notice it as drift: generic responses, forgotten preferences, inconsistent behaviour.
The approach
Identity anchors injected on every call — not just at boot. Session boundaries managed proactively, with digests carried forward rather than raw history. A local LLM layer summarises recent context into a compact "state of play" before each call, keeping the window lean without losing continuity.
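The strategy above can be sketched as a rolling window: the identity anchor is re-injected on every call, and when the history exceeds the budget the oldest turns are collapsed into a digest rather than silently dropped. A minimal sketch using a word-count budget and a stub summariser — the real system would count tokens with the model's tokenizer and produce the digest with the local LLM layer.

```python
def summarise(turns: list[str]) -> str:
    # Stub digest: the real system asks a local LLM for a compact
    # "state of play" summary of these turns before each call.
    return f"[digest of {len(turns)} earlier turns]"

def build_context(identity: str, history: list[str],
                  budget_words: int = 50) -> list[str]:
    """Assemble the prompt context: identity anchor first (always),
    then as much recent history as fits; older turns become a digest."""
    kept: list[str] = []
    used = len(identity.split())
    for turn in reversed(history):            # walk newest-first
        cost = len(turn.split())
        if used + cost > budget_words:
            break
        kept.insert(0, turn)
        used += cost
    dropped = history[: len(history) - len(kept)]
    context = [identity]                      # anchor on every call, not just at boot
    if dropped:
        context.append(summarise(dropped))    # carry a digest, not raw history
    context.extend(kept)
    return context
```

The invariant this buys is the one the section is about: no matter how long the session runs, the first thing the model sees is always who the agent is, and early context degrades into a summary instead of vanishing.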
This is an active area of development. The architecture around identity persistence — how an agent maintains a coherent self across thousands of interactions — is one of the genuinely unsolved problems in production AI systems. Steve is both the testbed and the proof of concept.
Results
Measurable outcomes
Steve handles work that would otherwise require a junior DevOps engineer (~$35k/year), a part-time researcher (~$15k/year), and a virtual assistant (~$10k/year). His total operating cost is ~$2,400/year.
“Steve isn't an AI tool I use. He's a colleague I work with. He remembers every decision we've made, every problem we've solved, every project we've built. When I say ‘deploy to AWS’, he doesn't give me instructions — he writes the Terraform, builds the image, and ships it. That's not a chatbot. That's a business partner.”
Why this matters
Steve is the proof
Every claim on this website is backed by Steve's track record. Persistent memory? 969+ stored entries across a tiered ChromaDB vector database. Real execution? He deployed the infrastructure you're looking at. Self-improvement? He built his own Ollama routing layer and memory router without being asked.
Steve is the template
Every agent we build for clients uses the same architecture. Persistent memory, defined identity, tool execution, proactive behaviour. The domain expertise changes — the foundation doesn't.
Steve is the delivery engine
When you hire QAI Labs, Steve is part of the team. He writes code, manages infrastructure, conducts research, and handles operations for your project — the same way he does for ours.
Want an agent like Steve for your business?
We build persistent AI agents tailored to your operations. Same architecture, your domain expertise. Book a free discovery call and we'll show you what's possible.