Machines, Minds, and Judgment: Can Artificial Systems Outperform Human Reason?
- Matthew Parish
- Aug 30

The astonishing progress of artificial intelligence over the past decade has prompted many to ask whether computational systems might one day rival or even surpass human beings in all aspects of decision-making. In narrow domains—chess, Go, protein folding, or the recognition of tumorous lesions in radiology—artificial intelligence has already demonstrated superiority. Yet there remains a persistent suspicion that something qualitatively different is involved in the uniquely human act of forming judgments in environments not reducible to explicit rules. Judgment implies not merely calculation but discretion, interpretation, and the ability to respond to novel circumstances for which no algorithm has yet been prescribed. Philosophers, mathematicians, and cognitive theorists have grappled with this problem from different angles, and their contributions illuminate both the power and the limitations of computational models.
Gödel’s Incompleteness and the Limits of Formalisation
Kurt Gödel’s incompleteness theorems, first published in 1931, are often invoked in discussions about the boundaries of artificial intelligence. Gödel proved that any consistent formal system rich enough to express elementary arithmetic is incapable of proving all truths within its domain: there will always be true statements that cannot be derived by the system’s rules. If human reasoning were entirely formal, then it too should be bound by incompleteness; yet Gödel himself argued that human mathematicians can see the truth of certain propositions that lie outside formal derivability. As Douglas Hofstadter noted in Gödel, Escher, Bach (1979), this paradox hints at a mysterious quality in human thought: we appear able to transcend the mechanical formalism we ourselves devise.
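For readers who want the claim in more formal dress, a simplified rendering of the first theorem runs as follows (my gloss, not Gödel’s original wording), where T is the formal system in question and G_T its “Gödel sentence”, a statement that in effect asserts its own unprovability in T:

```latex
% Simplified statement of Gödel's first incompleteness theorem.
% Assumptions: T is consistent, effectively axiomatised, and strong
% enough to express elementary arithmetic.
T \nvdash G_T
\qquad \text{and yet} \qquad
\mathbb{N} \models G_T
% i.e. T cannot prove its own Gödel sentence, although that sentence
% is true of the natural numbers.
```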
While Gödel’s work does not by itself disprove the possibility of machine judgment, it highlights a crucial point: no set of pre-established rules can ever be complete. The world constantly generates situations for which no finite rulebook suffices. Human judgment flourishes precisely in this indeterminate space. If artificial systems are to approach the human level, they must do more than calculate; they must approximate the capacity to transcend a given formalism when necessary.
Davidson and the Problem of Radical Interpretation
Donald Davidson’s philosophy of language and action offers a further challenge. In essays such as “Radical Interpretation” (1973), later collected in his seminal Inquiries into Truth and Interpretation (1984), Davidson argued that understanding another agent’s utterances requires not merely applying a dictionary of meanings but interpreting behaviour holistically, against a background of beliefs, desires and intentions. Meaning arises not from formal rules but from an interpretative practice embedded in human life.
This resonates deeply with the problem of machine judgment. Artificial intelligence systems rely on vast training datasets and probabilistic inference. They excel at pattern recognition, but they lack the lived background that makes human interpretation possible. When a judge weighs a case, when a diplomat reads a gesture at the negotiating table, or when a physician infers a diagnosis from ambiguous symptoms, each is drawing upon layers of tacit understanding that cannot be reduced to codified rules. Davidson’s point is that this background is constitutive of meaning itself; without it, one cannot speak of genuine understanding. Machines may simulate aspects of interpretation, but absent embeddedness in human practices, their judgments risk being superficial analogues rather than authentic exercises of discretion.
Hofstadter and the Fluidity of Analogy
Douglas R. Hofstadter, in Gödel, Escher, Bach and later in Fluid Concepts and Creative Analogies (1995), stressed the centrality of analogy in human cognition. He argued that much of our reasoning is not about applying rigid rules but about mapping structures from one domain onto another—finding resonances, correspondences and metaphors. Analogy is what allows us to make sense of unfamiliar situations by comparison with familiar ones. It is also inherently fluid: the “rules” of analogy are themselves variable, shifting with context.
Computational models have made progress in analogy-making—witness large language models generating metaphors or analogical problem-solvers in cognitive AI research—but Hofstadter’s key insight remains: analogy cannot be reduced to algorithm without losing its open-ended flexibility. The very act of deciding which features of a situation are salient for comparison already presupposes a kind of judgment that is not rule-governed. Thus Hofstadter’s work underscores the difficulty of imagining an artificial system fully mastering analogy in the rich, spontaneous way humans do.
The Rule-Free Contexts of Judgment
Judgment differs from rule-application in that it requires navigating indeterminate, norm-laden, or morally fraught contexts. Aristotle in the Nicomachean Ethics called this capacity phronesis, or practical wisdom: the ability to deliberate well about matters that cannot be reduced to universal rules.
Consider three examples:
Legal Judgment: A judge must apply the law, but statutes are often vague, precedents conflicting, and facts unique. The act of “doing justice” lies not in mechanical rule-following but in weighing equities, anticipating consequences, and articulating reasons that balance fidelity to law with responsiveness to circumstance.
Moral Judgment: Ethical dilemmas rarely yield to algorithmic clarity. To decide whether to tell a painful truth or withhold it for kindness requires sensitivity to context, relationships, and unquantifiable values.
Political Judgment: Statesmen must decide under uncertainty, often with incomplete information and competing principles. The art of politics is precisely that it is not reducible to technical expertise but involves prudence, timing, and a sense of the possible.
In all three cases, rule-based systems can inform decisions—through precedential databases, moral calculus, or policy simulations—but the final act of judgment is irreducibly human.
Advances in Machine Learning: A Challenge to the Sceptics
Nevertheless, one must not underestimate the remarkable advances of recent years. Machine learning systems increasingly exhibit behaviours that seem uncannily judgment-like. They can synthesise vast corpora, detect subtle correlations, and adjust dynamically to feedback. Large language models approximate interpretative flexibility; reinforcement learning systems acquire strategies that look like prudence. Moreover, machines can be trained to mimic discretionary processes by ingesting thousands of past human judgments, whether in law, medicine or business.
This raises the disquieting possibility that the difference between human and machine judgment may be one of degree rather than kind. Perhaps what appears to us as irreducibly human—the weighing of equities, the feel for context—is simply a higher-order statistical mapping that sufficiently advanced systems could replicate.
Prediction and Judgment: A Crucial Distinction
At this juncture, it is helpful to separate two related but distinct activities: prediction and judgment. Artificial intelligence excels at prediction. By analysing vast amounts of data, it can estimate with remarkable accuracy the likely next word in a sentence, the probability of credit default, or the most effective treatment for a medical condition. Prediction is essentially probabilistic extrapolation from patterns in data.
Judgment, however, involves more than forecasting outcomes. It entails evaluating them in light of values, principles, or ends that are not themselves derivable from data. A machine may predict that telling the truth will cause offence and concealment will preserve harmony; but deciding whether truth or harmony is the higher good is an act of judgment, not prediction. Similarly, a weather model may forecast rain, but the judgment to cancel or persist with a major outdoor commemoration involves prudence, symbolism, and moral imagination.
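A toy sketch may make the distinction concrete (purely illustrative: the lending scenario, the figures and the function names are assumptions of mine, not a description of any real system). The statistical model supplies a probability; the rule that acts on it encodes values the data cannot supply.

```python
# Illustrative sketch: prediction yields a probability; judgment supplies
# the values that turn that probability into a decision.

def predicted_default_probability(applicant_features: dict) -> float:
    """Stand-in for a trained model: maps applicant data to P(default)."""
    # A real system would be fitted to historical data; this constant
    # simply represents the model's output for one applicant.
    return 0.18

def lending_decision(p_default: float, max_acceptable_risk: float) -> str:
    """The threshold is not learned from data: it encodes a view about
    how much risk of loss and harm is tolerable."""
    return "approve" if p_default <= max_acceptable_risk else "decline"

p = predicted_default_probability({"income": 42_000, "history": "thin"})
# Two institutions with identical predictions can reach opposite decisions,
# because they weigh the competing goods (access to credit vs. exposure
# to loss) differently.
print(lending_decision(p, max_acceptable_risk=0.25))  # approve
print(lending_decision(p, max_acceptable_risk=0.10))  # decline
```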
This distinction clarifies why machines, however advanced, risk conflating statistical success with wisdom. As John Searle observed in his famous “Chinese Room” argument (1980), the manipulation of symbols according to rules may replicate outputs without generating understanding. The thought experiment is designed to show that a computer program, however sophisticated, cannot on that basis possess genuine understanding or consciousness: a person who speaks no Chinese follows a rulebook for manipulating Chinese characters and passes as a Chinese speaker to those outside the room, yet understands nothing of what is said. Searle’s conclusion is that computers, like the person in the room, manipulate symbols (syntax) without grasping their meaning (semantics), and that this refutes the idea of strong artificial intelligence.
In any event, prediction without judgment is powerful but incomplete. To conflate the two is to mistake accuracy for discernment. The human act of judgment incorporates predictive insight but adds an evaluative and interpretative layer that resists reduction to statistics.
The Epistemic Gap: Simulation versus Understanding
Nevertheless, the crucial question remains: do machines understand the judgments they simulate? Searle’s thought experiment makes the point starkly: symbol manipulation can produce outputs indistinguishable from understanding, yet without genuine grasp of meaning. Similarly, machines may one day outperform judges, doctors, or diplomats on average outcomes. But whether this constitutes “judgment” in the human sense is contested.
The answer may hinge upon whether one defines judgment behaviourally (good outcomes suffice) or phenomenologically (the act of understanding is essential). From the standpoint of social utility, behavioural sufficiency might be enough; from the standpoint of philosophy, the absence of authentic understanding preserves a categorical distinction.
The Persistent Human Margin
Gödel reminds us of the limits of formal systems; Davidson shows us that meaning is embedded in human practices; Hofstadter underscores the fluidity of analogy at the core of thought; Aristotle provides the classical account of practical wisdom; and Searle warns us against confusing simulation with comprehension. Together, they suggest that while computational systems will increasingly rival and even surpass human beings in many decision domains, the full exercise of judgment—as a context-sensitive, interpretative, and meaning-laden act—may always retain a human margin.
Machines can advise, calculate, simulate, and optimise. They may even outperform human averages in certain judgment-like tasks. But as long as judgment involves navigating contexts that are not reducible to rules, and that require interpretation against the background of lived human experience, it seems unlikely that computation will ever completely eclipse the human capacity for judgment. The question is less whether machines will replace us than whether they will force us to reflect more deeply upon what makes human judgment distinct, and why it remains indispensable in a world awash with algorithms.




