Agent Design Is a Systems Problem
The prevailing assumption in early GenAI — that better prompts yield better outputs — breaks down entirely when you move to agent-based systems. The real lever is architecture. This is a paradigm shift that changes how you design, build, and govern AI at scale.
Strategic Briefing
The Shift You're Noticing
Early GenAI Thinking
The dominant mental model was linear and prompt-centric. Engineers focused on crafting better instructions, refining wording, and tuning inputs — treating the model as a black box that responds to cleverness.
Better prompts → better outputs
Agent Reality
When agents enter the picture, the relationship inverts. Individual prompt quality becomes a local optimisation concern. What determines real-world outcomes is the architecture surrounding the model — how components interconnect, how state is managed, and how failures are handled.
Better system → better outcomes
This is not a refinement of the old model. It is a fundamentally different paradigm: systems thinking applied to probabilistic components.
Why Agents Are Systems Problems
An agent setup is not a single function call — it is a dynamic system with all the complexity that entails. Understanding this distinction is the first step to designing agents that actually work in production.
Multiple Components
LLMs, tools, memory stores, triggers, and external APIs operating in concert — each with its own behaviour and failure profile.
State Over Time
Unlike stateless calls, agents accumulate context across steps. Where state lives — and how it is managed — determines system reliability.
Feedback Loops
Outputs feed back into subsequent inputs. Without deliberate loop design, agents drift, compound errors, or enter cycles.
Non-Deterministic Behaviour
Probabilistic components mean identical inputs can yield different outputs. The system must be resilient to this variance by design.
The Real Questions in Agent Design
Once you accept that agents are systems, the diagnostic questions change entirely. These are the questions a systems architect asks — not a prompt engineer.
Where does state live?
Identify which component owns state at each point in the workflow and how it is persisted, updated, or discarded.
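One way to make state ownership explicit is a single state object that every step reads and writes through, so persistence, updates, and discards happen in one auditable place. A minimal sketch, with hypothetical names (`WorkflowState`, `record`, `discard_scratch`):

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowState:
    """Single owner of state: every step reads and writes through here,
    so persistence, updates, and discards happen in one place."""
    history: list = field(default_factory=list)
    scratch: dict = field(default_factory=dict)

    def record(self, step: str, output: str) -> None:
        # Durable record of what each step produced.
        self.history.append((step, output))

    def discard_scratch(self) -> None:
        # Intermediate working data is explicitly discarded, not leaked forward.
        self.scratch.clear()

state = WorkflowState()
state.scratch["draft"] = "partial answer"
state.record("summarise", "final summary")
state.discard_scratch()
```

The point is not the data structure but the ownership rule: no component mutates state except through this object.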
How does information flow?
Map the data contracts between nodes. What each agent receives — and what it can act on — determines output quality more than any prompt.
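A data contract can be as simple as a typed, frozen message. The sketch below uses hypothetical node and field names (`ReviewRequest`, `ReviewResult`, `review_node`) to show the idea: the contract, not the prompt, bounds what the node can receive and emit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReviewRequest:
    """What the (hypothetical) review node receives: nothing outside
    these fields is visible to it."""
    document_id: str
    text: str
    max_findings: int = 5

@dataclass(frozen=True)
class ReviewResult:
    """What the review node may pass downstream."""
    document_id: str
    findings: list

def review_node(req: ReviewRequest) -> ReviewResult:
    # Stand-in for a model call; the contract caps the output shape.
    findings = [f"finding about {req.document_id}"][: req.max_findings]
    return ReviewResult(document_id=req.document_id, findings=findings)

result = review_node(ReviewRequest(document_id="doc-1", text="sample text"))
```

Frozen dataclasses make the contract tamper-evident: a node cannot quietly mutate a message in flight.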
What triggers actions?
Define execution boundaries clearly. Ambiguous triggers introduce race conditions, redundant calls, and unpredictable side effects.
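One common way to defuse ambiguous triggers is an idempotency key: duplicated or retried events cannot fire the same action twice. A minimal sketch, assuming an in-memory dedup set (a production system would persist the keys):

```python
import hashlib

_seen: set = set()

def trigger_once(event_payload: str, action) -> bool:
    """Deduplicate triggers with an idempotency key so retried or
    duplicated events cannot fire the same action twice."""
    key = hashlib.sha256(event_payload.encode()).hexdigest()
    if key in _seen:
        return False          # already handled: no redundant call
    _seen.add(key)
    action(event_payload)
    return True

calls = []
trigger_once("order:42", calls.append)
trigger_once("order:42", calls.append)   # duplicate event is ignored
```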
What are the failure modes?
Every component fails eventually. Design recovery and correction pathways before you encounter them in production.
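A recovery pathway designed up front can be as small as a bounded retry with an explicit fallback, so a transient failure degrades by design rather than crashing the workflow. A sketch with hypothetical names (`with_recovery`, `flaky_step`):

```python
def with_recovery(step, fallback, attempts: int = 3):
    """Run a step with a bounded retry and an explicit fallback path,
    so the failure mode is designed in, not discovered in production."""
    last_error = None
    for _ in range(attempts):
        try:
            return step()
        except Exception as exc:       # transient failure: retry
            last_error = exc
    return fallback(last_error)        # designed degradation, not a crash

flaky_calls = {"n": 0}

def flaky_step():
    # Fails twice, then succeeds: a stand-in for a flaky external call.
    flaky_calls["n"] += 1
    if flaky_calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = with_recovery(flaky_step, fallback=lambda e: "degraded")
```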
Systems Thinking Applied
These aren't novel concepts — they are foundational principles from systems theory, now directly applicable to agent architecture. Recognising them by name makes them tractable.
Interconnections
Agents don't operate in isolation. Every dependency between components is a potential failure surface — and a potential design lever.
Feedback Loops
Reinforcing loops amplify — useful for self-correction, dangerous when compounding errors. Balancing loops provide stability. Design both deliberately.
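The balancing-loop idea can be sketched in a few lines: each iteration applies a damped correction toward a target and stops inside a tolerance band, instead of compounding indefinitely. All names here (`balancing_loop`, the 0.5 damping factor) are illustrative, not from the source:

```python
def balancing_loop(measure, act, target: float, tolerance: float, max_steps: int = 20):
    """A balancing loop: each iteration corrects toward a target and
    stops inside a tolerance band, instead of compounding."""
    for step in range(max_steps):
        error = target - measure()
        if abs(error) <= tolerance:
            return step            # stable: the loop has balanced out
        act(error * 0.5)           # damped correction prevents overshoot
    raise RuntimeError("loop failed to converge")

value = {"x": 0.0}
steps = balancing_loop(
    measure=lambda: value["x"],
    act=lambda delta: value.update(x=value["x"] + delta),
    target=10.0,
    tolerance=0.1,
)
```

A reinforcing loop is the same structure with the correction sign flipped, which is exactly why both must be designed deliberately.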
Delays
Latency between action and effect is a source of instability. Acknowledge delays in your orchestration logic to prevent overcorrection.
Emergent Behaviour
Systems exhibit behaviours that no individual component does alone. Expect the unexpected — and instrument your system to observe it.
A Clean Four-Layer Mental Model
Context and harness engineering remain critical — but they operate at lower layers of the stack. Conflating them with top-level design is where most agent projects go wrong.
Layer 4 — Prompting
Local optimisation only. Effective within a well-designed system; irrelevant as a substitute for one.
Layer 3 — Context Engineering
What each agent sees. How memory is structured. How inputs are shaped. Designing information availability at each node.
Layer 2 — Harness / Orchestration
How agents are invoked, how tools are called, retry mechanisms, guardrails, and observability hooks.
Layer 1 — Systems Thinking
Define agents, roles, flows, feedback loops, state transitions, and failure handling. This is the foundation everything else rests on.
Context and Harness Engineering: Redefined
These disciplines don't disappear — they mature. Their scope narrows and their precision increases when placed correctly within the system architecture.
Context Engineering
Moves from "how do I write the perfect prompt?" to "what information is available at each node in the system?" You are now designing:
  • Context boundaries — what each agent is permitted to see
  • Data contracts — the shape and schema of information passed between components
  • State exposure — which elements of system state are surfaced, and when
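The three design targets above can be made concrete with an explicit allow-list per node: each agent names what it may see, and everything else is filtered out before the call. A sketch with hypothetical agents and state keys:

```python
SYSTEM_STATE = {
    "customer_email": "a@example.com",   # sensitive: most agents never see it
    "ticket_summary": "login fails on mobile",
    "internal_notes": "escalated twice",
}

# Explicit context boundaries: each (hypothetical) agent declares its view.
CONTEXT_BOUNDARIES = {
    "triage_agent": {"ticket_summary"},
    "response_agent": {"ticket_summary", "internal_notes"},
}

def context_for(agent: str) -> dict:
    """Surface only the elements of system state this agent may see."""
    allowed = CONTEXT_BOUNDARIES[agent]
    return {k: v for k, v in SYSTEM_STATE.items() if k in allowed}

triage_view = context_for("triage_agent")
```

The boundary table doubles as documentation: information availability at each node is readable in one place.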
Harness Engineering
Moves from "how do I make the LLM behave?" to "how do I orchestrate execution and control flow?" You are now designing:
  • Tool calling logic — when and how external capabilities are invoked
  • Retry mechanisms — graceful handling of transient failures
  • Guardrails — enforcing behavioural constraints at the system boundary
  • Observability hooks — instrumentation for debugging and audit
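These four concerns compose naturally into one small harness function: invoke a tool with retries, enforce a guardrail on its output, and emit an observability event at each decision point. A minimal sketch under those assumptions (all names hypothetical):

```python
def run_tool(tool, args: dict, *, guardrail, on_event, retries: int = 2):
    """Minimal harness: retries around the tool call, a guardrail on its
    output, and observability events at each decision point."""
    for attempt in range(retries + 1):
        on_event({"phase": "call", "attempt": attempt})
        try:
            result = tool(**args)
        except Exception as exc:
            on_event({"phase": "error", "detail": str(exc)})
            continue                     # transient failure: retry
        if guardrail(result):
            on_event({"phase": "ok"})
            return result
        on_event({"phase": "blocked"})   # guardrail rejection is terminal
        return None
    return None

events = []
out = run_tool(
    tool=lambda q: q.upper(),            # stand-in for an external capability
    args={"q": "status"},
    guardrail=lambda r: len(r) < 100,
    on_event=events.append,
)
```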
What Happens Without Systems Thinking
The consequences of treating agent design as a prompting problem are predictable — and compounding. These failure modes are not edge cases; they are the default outcome of under-architected systems.
Agents Loop
Without well-defined termination conditions and state boundaries, agents re-enter flows they have already completed — consuming tokens, time, and budget.
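The antidote is two hard termination conditions: a completion predicate and an iteration budget. If the predicate never fires, the budget fails loudly instead of burning tokens. A sketch with hypothetical names (`run_agent`, `is_done`):

```python
def run_agent(step, is_done, max_iterations: int = 10):
    """Hard termination: a completion predicate plus an iteration budget,
    so the agent cannot re-enter a finished flow indefinitely."""
    state = None
    for i in range(max_iterations):
        state = step(state)
        if is_done(state):
            return state, i + 1
    # Budget exhausted: surface the loop instead of silently spending.
    raise RuntimeError("iteration budget exhausted")

final, used = run_agent(
    step=lambda s: (s or 0) + 1,   # stand-in for one agent step
    is_done=lambda s: s >= 3,
)
```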
Context Bloats
Without deliberate context management, conversation history and retrieved data accumulate unchecked, degrading performance and increasing latency as context windows fill.
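Deliberate context management can start with something as blunt as a fixed budget: keep the most recent turns that fit, drop the rest. A character-budget sketch (a real system would count tokens and might summarise rather than drop):

```python
def trim_context(history: list, budget_chars: int) -> list:
    """Keep the most recent turns that fit a fixed budget instead of
    letting history grow unchecked."""
    kept, used = [], 0
    for turn in reversed(history):       # newest turns are most relevant
        if used + len(turn) > budget_chars:
            break
        kept.append(turn)
        used += len(turn)
    return list(reversed(kept))

history = ["turn-one " * 10, "turn-two", "turn-three"]
window = trim_context(history, budget_chars=30)
```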
Outputs Drift
Without feedback loop controls, small deviations in early steps compound into large deviations in outputs — producing results that are subtly or significantly wrong.
Failures Compound Silently
Without observability, errors propagate undetected through the system. By the time a failure surfaces, its root cause is several steps removed and difficult to isolate.
Applied to QX / INQ / AI Ops
This isn't abstract theory. The architecture patterns already in use across QX and related platforms are a direct expression of systems thinking — whether or not they have been labelled as such.
Azure Functions
Define clean execution boundaries. Each function is a contained unit of behaviour — predictable inputs, predictable outputs, explicit failure handling.
Queues
Provide decoupling and flow control. Asynchronous messaging absorbs variance between components and prevents cascading failures under load.
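The decoupling pattern is the same whether the queue is Azure Storage, Service Bus, or in-process: the producer returns immediately, and the buffer absorbs bursts instead of letting them cascade downstream. A stdlib sketch with a hypothetical consumer as a stand-in for the downstream agent:

```python
import queue
import threading

# Bounded buffer: absorbs variance between producer and consumer.
work = queue.Queue(maxsize=100)
processed = []

def consumer():
    while True:
        item = work.get()
        if item == "STOP":               # sentinel ends the consumer cleanly
            break
        processed.append(item.upper())   # stand-in for the downstream agent
        work.task_done()

t = threading.Thread(target=consumer)
t.start()
for msg in ["review", "verify", "insight"]:
    work.put(msg)                        # producer returns immediately
work.put("STOP")
t.join()
```

The `maxsize` bound is the flow control: when the buffer is full, backpressure reaches the producer instead of overload reaching the consumer.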
Agent Framework
Enforces role separation. Distinct agents with defined responsibilities prevent scope creep and make the system's behaviour auditable at each step.
CLCS
Introduces declarative system behaviour. Defining what the system should do — rather than scripting how — brings governance and adaptability into the architecture.

This is not prompt engineering. This is system architecture with probabilistic components — a meaningful and important distinction for how teams are structured, skills are developed, and investments are prioritised.
The Bottom Line
Agent design is systems thinking first, with context and harness engineering acting as implementation layers within that system. This hierarchy is not semantic — it has direct practical consequences.
If you lean on prompts
You will see early gains, then plateau. Prompts cannot compensate for architectural weaknesses — they can only mask them temporarily. Technical debt accumulates invisibly until it fails visibly.
If you get the system right
You achieve predictable behaviour from unpredictable components. The system becomes resilient, observable, and extensible. Prompts become almost replaceable — a local detail within a robust architecture.
The next concrete step: map this into an agent architecture pattern for the Review → Verify → Insight workflow, where systems thinking becomes directly operational.

Get the system right, and the prompts become almost… replaceable.