The Alchemy of Prompts: Why Human Judgment Cannot Be Distilled into Machine Instruction

Friday 20 February 2026
In the fevered atmosphere of the contemporary artificial intelligence boom, an entire sub-industry has arisen dedicated to a curious ambition: to extract human expertise from living minds and transmute it into training data for large language models. Professors, lawyers, physicians, military analysts and financiers are invited to compose prompts and to score outputs according to elaborate rubrics. Their knowledge, it is said, can thus be formalised, scaled and embedded into systems such as OpenAI’s ChatGPT or Google’s Gemini.
The premise is disarmingly simple. A domain expert is asked to produce high-quality prompts — questions, scenarios, hypotheticals — that probe a model’s reasoning. The same expert, or another, evaluates the model’s responses against a rubric: criteria for correctness, depth, nuance, tone and coherence. Those evaluations are then fed back into the training loop. Through reinforcement learning from human feedback, the machine supposedly absorbs the standards of the expert community. Judgment becomes data; data becomes optimisation; optimisation becomes apparent intelligence.
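The mechanics can be made concrete with a deliberately simple sketch in Python. Everything in it is assumed for illustration: the Rubric class, the weights and the score_response function are inventions of this article, not any laboratory's actual pipeline. It shows only the reduction at issue, namely per-criterion expert ratings collapsed into a single scalar that a training loop can optimise.

```python
from dataclasses import dataclass

# Illustrative only: these names and weights are invented for this sketch
# and do not correspond to any particular vendor's training pipeline.

@dataclass
class Rubric:
    """Explicit criteria an expert is asked to apply, each with a weight."""
    criteria: dict[str, float]  # e.g. {"correctness": 0.4, "depth": 0.25, ...}

def score_response(rubric: Rubric, ratings: dict[str, float]) -> float:
    """Collapse per-criterion ratings (0 to 1) into one scalar reward.

    This single number is what the feedback loop sees; anything the expert
    could not express through the fixed criteria is not represented at all.
    """
    return sum(weight * ratings.get(name, 0.0)
               for name, weight in rubric.criteria.items())

# Example: an expert rates one model answer against the rubric.
rubric = Rubric(criteria={"correctness": 0.4, "depth": 0.25,
                          "nuance": 0.2, "tone": 0.15})
ratings = {"correctness": 0.9, "depth": 0.6, "nuance": 0.5, "tone": 0.8}
reward = score_response(rubric, ratings)
print(f"scalar reward fed to the training loop: {reward:.2f}")  # 0.73
```

Whatever the expert hesitated over, doubted or wished to reframe has already vanished by the time that number is computed; the argument that follows turns on precisely that loss.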
It is a seductive narrative. It suggests that knowledge can be atomised into labelled examples. It implies that expertise is a matter of correct pattern production. It promises scale — the replication of rare intellectual capacities at near-zero marginal cost. Billions of dollars in venture capital, sovereign wealth and corporate investment now circulate on this premise.
Yet the premise is conceptually unstable.
What is being asked of experts in this process is not merely to provide information. It is to formalise the criteria by which information is assessed. The rubric must define what counts as a good answer. But expertise, in its mature form, is not reducible to checklists. A constitutional lawyer does not decide a case by mechanically applying enumerated factors; he or she weighs competing principles in light of context, history and institutional prudence. A battlefield commander does not choose manoeuvres by ticking boxes; he or she reads terrain, morale and the adversary’s likely psychology. A diplomat does not craft communiqués by scoring phrases on a matrix; he or she senses how tone will resonate within cultures and amongst personalities.
Such judgment is tacit, situated and often irreducibly narrative. It is learned through apprenticeship and failure. It involves the capacity to perceive which rule should be bent, which precedent is analogous and which analogy is misleading. It is precisely the art of deciding when not to follow the rubric.
The paradox is acute. To train a model by rubric, one must assume that expertise can be rendered as explicit criteria. But the very individuals most capable of exercising judgment know that their craft depends upon the limits of explicitness. The expert’s knowledge is not merely that certain answers are correct; it is that the frame of the question may itself be flawed. A prompt may presuppose a false dichotomy; the wisest response may be to reject the premise. How does one encode, in advance, the conditions under which the question itself must be challenged?
The defenders of the enterprise respond that sufficient scale will compensate for conceptual imperfection. With enough prompts, enough feedback and enough parameters, the model will approximate the distribution of expert outputs. Yet approximation is not understanding. A language model does not possess stakes in the world; it does not bear responsibility for the consequences of its outputs. It predicts tokens according to statistical regularities within its training data. It does not judge — it imitates the surface patterns of judgment.
This distinction matters. Judgment involves accountability. When a surgeon elects to operate, he or she binds himself or herself to the outcome. When a judge pronounces sentence, he or she stands within an institutional order that can praise or condemn his or her reasoning. A model, by contrast, cannot be morally or institutionally answerable. It cannot truly deliberate because it cannot be called to account.
The industrial effort to replace human judgment with machine learning rests therefore upon a conflation. It confuses the expression of reasoning with the possession of responsibility. It mistakes the outward form of expertise for its inward structure.
None of this is to deny the utility of large language models. They can summarise, draft and retrieve with extraordinary efficiency. They can assist human practitioners by accelerating the mundane. But assistance is not substitution. The leap from tool to surrogate is where the conceptual incoherence emerges.
And yet it is precisely this leap that the current investment climate presumes. Vast sums are being spent on hiring domain specialists to feed prompts into systems in the hope that their insight can be liquefied into model weights. Universities restructure curricula around artificial intelligence. Corporations reorganise workflows upon the assumption that automated reasoning will soon replace salaried professionals. Governments announce strategies premised on productivity gains that may prove illusory.
History offers cautionary analogies. Financial bubbles often rest upon a kernel of technological truth inflated by speculative expectation. Railways transformed transport — but railway shares nonetheless collapsed in the nineteenth century when projections outran profitability. The internet reshaped commerce — but the dot-com bubble burst when investors realised that traffic did not guarantee revenue. In each case, innovation endured; valuations did not.
The present artificial intelligence boom bears similar hallmarks. There is genuine technical achievement in scaling neural networks and in refining reinforcement learning. But the extrapolation from impressive linguistic mimicry to the replacement of human judgment is a category error. If that error becomes widely recognised — if institutions discover that they cannot, in fact, entrust ultimate decisions to probabilistic pattern generators — then capital will retreat.
The bubble may not burst because the models cease to function. It may burst because the promise made for them was metaphysically impossible. Expertise is not a dataset. Judgment is not a function that can be exhaustively specified. The human capacity to weigh, to hesitate, to reinterpret and to bear responsibility cannot be distilled into prompts and rubrics without losing the very qualities that make it valuable.
The ambition to encode all judgment into machine learning may reveal less about the machines than about ourselves. It reflects a managerial desire for formalisation, measurability and control. It expresses a hope that uncertainty can be engineered away. But human affairs — law, war, diplomacy, medicine, politics — are domains of irreducible contingency. They demand persons who can answer not only for what is optimal in theory, but for what is right in context.
When that truth reasserts itself — as it eventually does — the market may rediscover that judgment is scarce precisely because it cannot be automated. And the vast capital currently devoted to attempting its mechanisation may come to resemble, in retrospect, another chapter in the long history of technological exuberance outrunning philosophical coherence.

