Last August, Cognizant announced it was hiring 1,000 "context engineers" to industrialise agentic AI (Cognizant, Aug 2025). Meanwhile, Gartner expects a third of enterprise software to include agentic AI by 2028, up from less than 1% in 2024. But Gartner also warns that more than 40% of agentic projects will be cancelled by the end of 2027, undone by cost, unclear value and inadequate controls (Gartner, 2025).
So what separates the organisations that make agentic AI work from the ones cancelling projects in 2027? Increasingly it is not the technology. It is whether the operating model around it – the skills, processes and knowledge management – has been reshaped to capture the benefits.
Many leaders will have experienced projects where the system was ready, but the team using it, and the knowledge needed to tailor the outcome to their services, were not. Each new wave of technical innovation, such as the cloud transformation, changes the job description of teams. Yet many organisations deploy new technology without changing who does what. AI is no longer something just for technology teams: it is now central to how every knowledge worker interacts with technology.
I’ve seen three themes emerge in the skills every team now needs to harness the benefits of agentic AI.
The first is framing the right problem. When you can build at machine speed, the expensive mistake is now building the wrong thing quickly. So the skill we all need to deepen is user-centred design: the empathy and judgement to look past what people ask for to what they actually need, how they behave in real conditions, and the root cause underneath a surface-level pain point. Only then can you judge how to really solve a problem, and what a good outcome looks like. This matters more when building is cheap, because nothing exposes a badly understood problem faster than automating it at scale. AI also helps here, by supporting early prototyping and user-journey simulations.
The hidden bottleneck: evaluation.
Framing the right problem also means deciding how you will know the agent got it right, and that is where most teams come unstuck. A senior data engineer described it perfectly to me recently. With AI, he said, it is like having seven experienced engineers working for him. The pace and quality of code coming out is extraordinary. And yet the team is not yet shipping seven times faster. All that brilliant output now piles up at the door marked "to review": how will it work in practice with the firewall policies we have? How do we know it won’t bring down another system? A study of more than 10,000 developers across 1,255 teams found that AI-assisted engineers complete 21% more tasks and merge 98% more pull requests, but review time rises 91% and organisational throughput barely moves (Faros AI, 2025). The bottleneck moved downstream, from writing to checking, and got worse.
Moving beyond human-in-the-loop.
This is a critical lesson of agentic AI, and it reaches well beyond code. The default is to keep a human in the loop, checking the agent's work. Yet that doesn’t scale. An agent works at machine speed, so a person signing off every action just rebuilds the bottleneck. And the better the model gets, the harder its mistakes are to catch: a weak model makes errors anyone spots in seconds, while a strong one produces work that is convincing yet wrong, and hands it over with confidence. A tired reviewer approving that is not providing oversight.
What replaces blanket human review is evaluation, and the old disciplines still apply, just at machine scale. Good QA was always about checking the work against a known standard. Start with the cheap, deterministic checks a machine can do well: does the output fit the schema, follow the current rules, stay within policy, and cite a source that actually exists? Then, for the things rules cannot capture, use a model to judge another model's work: an "LLM as judge" that scores reasoning and task completion. And because an agent can be right about the answer yet wrong about how it got there, grade the whole trajectory, not just the final output: which tools it called, whether the arguments were sound, whether it recovered from a failed step. A query can run without error and still pull from the wrong column; only trajectory-level checking catches that.
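The layering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the schema fields, the source register, the allowed-column list and every name in it (`AgentRun`, `ToolCall`, `evaluate` and so on) are hypothetical, and the LLM-as-judge stage is deliberately left out because it depends on whichever model provider you use.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tool: str   # name of the tool the agent called
    args: dict  # arguments it passed
    ok: bool    # whether the step succeeded


@dataclass
class AgentRun:
    output: dict                                   # the agent's final answer
    trajectory: list = field(default_factory=list)  # ordered ToolCall steps


# Hypothetical standards; in practice these come from your own rules and registers.
REQUIRED_FIELDS = {"answer", "source"}
KNOWN_SOURCES = {"policy-2025-v3", "kb-0142"}
ALLOWED_COLUMNS = {"query_db": {"claim_amount", "claim_date"}}


def deterministic_checks(run):
    """Cheap checks on the final output only: schema and citation."""
    failures = []
    missing = REQUIRED_FIELDS - set(run.output)
    if missing:
        failures.append(f"missing fields: {sorted(missing)}")
    if run.output.get("source") not in KNOWN_SOURCES:
        failures.append("cited source not found")
    return failures


def trajectory_checks(run):
    """Checks on how the agent got there: arguments and recovery."""
    failures = []
    for i, call in enumerate(run.trajectory):
        allowed = ALLOWED_COLUMNS.get(call.tool)
        column = call.args.get("column")
        if allowed is not None and column not in allowed:
            failures.append(f"step {i}: unexpected column {column!r}")
        if not call.ok:
            # A failed step counts as recovered only if the same tool
            # succeeds later in the run.
            later = run.trajectory[i + 1:]
            if not any(c.tool == call.tool and c.ok for c in later):
                failures.append(f"step {i}: {call.tool} failed and was not retried")
    return failures


def evaluate(run):
    """Run the cheap output checks first, then the trajectory checks."""
    return deterministic_checks(run) + trajectory_checks(run)


# The output looks fine, but the query read a column it should not have.
run = AgentRun(
    output={"answer": "42", "source": "kb-0142"},
    trajectory=[ToolCall("query_db", {"column": "claim_id"}, ok=True)],
)
print(deterministic_checks(run))  # no failures at the output level
print(evaluate(run))              # the trajectory check flags step 0
```

The example run passes every output-level check and is caught only when the trajectory is graded, which is the wrong-column case in miniature.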
So what is left for people? More than ever, but different work. The machine cannot tell you whether your definition of "good" is the right one. Humans write the principles that judges score against, and calibrate those judges against a human-marked sample until the two agree. An uncalibrated judge simply automates its own bias at speed, rather than applying the standards your organisation cares about. Humans deeply review the genuinely hard cases: the unusual one that does not fit the pattern, and the decision someone must defend to a citizen or a committee. And humans turn every failure into a new test, so the evaluation set grows from real mistakes and "good" stays in step with a changing world. The skill is designing the evaluation principles and system, and knowing which answers must still reach a person.
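Calibrating a judge against a human-marked sample comes down to measuring agreement. One common way to do that, sketched below, is the raw agreement rate plus Cohen's kappa, which discounts the agreement two markers would reach by chance. The labels and function names are illustrative assumptions, and the level of kappa that counts as "calibrated" is a choice for each organisation to make, not something this sketch decides.

```python
from collections import Counter


def agreement_rate(human, judge):
    """Share of cases where the judge gave the same label as the human."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(h == j for h, j in zip(human, judge)) / len(human)


def cohens_kappa(human, judge):
    """Agreement corrected for chance: 1.0 is perfect, 0.0 is chance level."""
    n = len(human)
    observed = agreement_rate(human, judge)
    h_counts, j_counts = Counter(human), Counter(judge)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(h_counts[label] * j_counts[label] for label in h_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both markers used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical human-marked sample and the judge's labels for the same cases.
human = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass"]
judge = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]

print(agreement_rate(human, judge))  # 0.75
print(round(cohens_kappa(human, judge), 3))  # 0.467
```

Here the judge agrees with the human on six cases out of eight, but once chance agreement is removed the score drops to under 0.5, a sign that the rubric or the judge prompt needs more work before the judge is trusted at scale.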
This is also being recognised in the law. Under the reformed Article 22 of the UK GDPR (Data (Use and Access) Act 2025), and the ICO's 2026 draft guidance, a human rubber-stamp does not count as meaningful review: to take a decision out of "solely automated" territory, the human involvement must come after the output, bear on the actual outcome, and carry the authority to change it (ICO, 2026). The ICO's own Recruitment Rewired study found most employers believed they had a human in the loop when, in practice, the tool was deciding (ICO, 2026). The person who designs real assurance looks more like a test engineer than a caseworker, and yet many teams have limited capability in this space.
The second is actively managing knowledge, which every team must do. An agent can read existing documents, but it cannot tell which parts are current or authoritative, or where the gaps are that only experienced staff can fill from what is in their heads. The result is confident but unreliable answers. “Context engineers”, the hot title of the moment, seem to be following in the footsteps of prompt engineers. But framing this as a new role misses the point. Prompting was never really a separate profession: it became part of everyone's job, and most of the work of building context will go the same way.
Cognizant can hire a thousand context engineers (Cognizant to Deploy 1,000 Context Engineers) to build the plumbing, but the knowledge that matters (how a benefit is assessed, why an exception is allowed, what "good" actually looks like) lives with the domain experts who do the work, and maintaining it must remain part of the knowledge worker's role. Based on LinkedIn and Glassdoor searches for context engineer titles at Cognizant, it remains unclear whether a large volume of these roles has been created nearly one year on from the announcement. In many organisations, experts still need to identify, or even create, the knowledge before an agent can work with it. I have seen this directly in AIOps: an agent resolving incidents is only as good as the knowledge articles behind it, and where those are stale or missing, it recommends the wrong fix with total confidence. Organisations that treat knowledge as an asset will see compounding returns from AI, whilst those that treat it as an afterthought may see compounding errors.
Knowledge is becoming a governance requirement.
In government, this is no longer a nice-to-have. The new GDS data asset management policy now requires every department and arm's-length body to identify their critical data assets, record metadata describing them, assign a named data owner, and maintain a Data Quality Action Plan reviewed at least annually (Data asset management policy in government - GOV.UK). Strip away the compliance language and it is a mandate to treat your organisational context as the value-driver it is: to know what you hold, who owns it, and whether it is good enough to trust. That is a key part of what agents will reason over. The policy is ahead of most delivery teams in practice.
The third is managing a suite of agents, which becomes part of everyone’s job to differing degrees. We each take on a fleet of agents to support our work: deciding what to delegate, spotting when one is drifting, and knowing which decisions to pull back into human hands. Managing them well starts with understanding their limits. Context windows are finite, so you cannot build one all-knowing agent to do everything. Unless the work is split across agents, each scoped to the context its task needs, machine learning becomes machine forgetting: the agent is crammed with so much information that it summarises to the point where critical details are lost. Deciding how to carve up the work, how to work with agents day to day, and which agent gets which slice of context is a real skill in its own right, and deserves focus in all development plans.
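Scoping each agent to the context its task needs can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Document` fields, the topic tags, the agent names, and especially the token estimate, which crudely counts whitespace-separated words where a real system would use the tokenizer of the model in question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    name: str
    topic: str  # hypothetical tag saying which task the document serves
    text: str


def estimate_tokens(text):
    """Crude stand-in for a tokenizer: one token per word (an assumption)."""
    return len(text.split())


def scope_context(documents, agent_topics, budget):
    """Give each agent only the documents for its topics.

    Raises ValueError if any agent's slice exceeds the token budget,
    which is the signal to split that agent's task further.
    """
    scoped = {}
    for agent, topics in agent_topics.items():
        docs = [d for d in documents if d.topic in topics]
        used = sum(estimate_tokens(d.text) for d in docs)
        if used > budget:
            raise ValueError(
                f"{agent} needs {used} tokens, over the budget of {budget}"
            )
        scoped[agent] = docs
    return scoped


documents = [
    Document("eligibility-rules", "benefits", "how a benefit is assessed"),
    Document("exceptions", "benefits", "why an exception is allowed"),
    Document("restart-runbook", "incidents", "steps to restart the service"),
]
agent_topics = {
    "benefits_agent": {"benefits"},
    "incident_agent": {"incidents"},
}

scoped = scope_context(documents, agent_topics, budget=50)
for agent, docs in scoped.items():
    print(agent, [d.name for d in docs])
```

The design choice worth noting is that going over budget fails loudly instead of silently truncating or summarising, so the decision about how to carve up the work stays with a person.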
Build your own context layer.
So if you take one thing from this and act on it today, make it this. Build three assets for yourself, and ask an AI to interview you to draw them out. You can realistically make a good start in less than an hour. First, your style: how you communicate and what your output should sound like, so AI can produce work that reads as yours. Second, your goals: your role, what you are accountable for, what you are aiming to achieve, and the values behind your decisions, so an agent knows what you are working towards and when to defer to you. Third, your skills: ask your AI which skills or reusable workflows would best help you achieve your goals, then ask it to build them for you. These three are the beginning of your own context layer. They tell you what knowledge is worth capturing and keeping current, what "good" looks like when you evaluate an agent's work, and which agents or reusable workflows are worth building in the first place. And the same three questions scale all the way up, from an individual to a team to a department.
The organisations that lead on agentic AI will be the ones whose people learned to frame the right problem, manage their knowledge in a way AI and humans can understand, and manage their agents.