Seeing What Matters: How Judgement Went Missing From AI
- owenwhite
- Sep 5
- 14 min read
Updated: Sep 7

Part I — Sarah’s Room
Sarah’s job title says “case worker,” but that hardly captures the skills she needs to exercise each day. She works in child protection, which means most days she walks into other people’s crises: kitchens where the fridge hums louder than the conversation; living rooms with plastic toys underfoot and court letters on the table; front steps where the weather and the mood both look changeable. The manuals say what to record; the database says what to tick; the policy says what to aim for. None of that tells her how to see.
Seeing is the point. Under noise and narrative and understandable self-deception, Sarah tries to perceive what is really going on and what might happen next. Is the warmth between mother and child the everyday warmth of care, or the brittle warmth of performance under inspection? Is the new boyfriend anchoring the situation or quietly destabilising it? Is this household on the cusp of safety or the cusp of harm? She is not “following a process” so much as reading a room—listening for the signal, weighing words against gestures, history against possibility. She is, to use old terms that have fallen out of fashion, making a judgement.
Judgement makes managers nervous, so the system has surrounded Sarah with a hedgerow of safeguards: thresholds, scoring rubrics, dashboards, peer reviews, audit trails. Each has its place. But the odd effect is to turn judgement from the central act into the unsayable one, something you do half-apologetically. The paperwork speaks of “risk categories” and “interventions,” not of courage or character. In official language, Sarah's job is about compliance; in reality, it is about discernment.
That gap—the dashboard world of means and the human world of ends—is where our modern confusion lives. We have become excellent at what can be counted and uneasy about what must be seen. We are sure-footed on procedures and strangely tongue-tied about wisdom. The public face of competence is the politician at a podium, a graph behind him, speaking of targets met and percentages improved. The private face of competence is Sarah, trying to notice what matters in time to make a difference. The two faces ought to belong to the same person. Increasingly, they don’t.
It’s tempting to say this is merely a bureaucratic quirk, the price of scale. But it isn’t. The separation of means and ends is baked into our dominant picture of rationality. “Instrumental reason,” as philosophers call it, takes the ends as given and concentrates genius on the means: efficiency, optimisation, risk management, “what works.” But ends—safety, dignity, flourishing, justice—are not blank tokens to be plugged in. They must be seen, argued over, refined, and held to account. When we treat ends as settled, we outsource the most human work to silence. When we refuse to deliberate about ends, we hollow out the world.
Artificial intelligence has brought that hollowing to a head. Because the most celebrated technology of our age is, in its bones, allergic to the very skills Sarah relies on: a cultivated perceptiveness that fuses context, experience, emotion, and moral attention into timely judgement. That kind of seeing—Aristotle called it phronesis or practical wisdom—is not a design flaw in human beings to be compensated for by machines. It is a fundamental aspect of intelligence to be honoured and extended.
But not everyone sees this. Spend a day listening to the AI boosters and you will hear the refrain. Intelligence is pattern recognition at scale. Reasoning is a computation over representations. Values are preferences to be learned and optimised. Moral controversy signals the need for clearer objectives or better data. From this vantage, judgement is either a bias to be scrubbed out or a latent function to be approximated by more parameters and a larger corpus. With enough examples and a well-crafted objective, the system will “learn” what to do. The fear, if you are Sarah, is that the real hinge of her practice—the skill of seeing—will be written out of the story; and with it, the living conversation between means and ends.
If you are tempted to dismiss this as sentimentality, consider how decisions actually happen. First, they are thick with values: fairness, dignity, safety, mercy, trust. Second, those values are not free-floating abstractions; they show up as saliences—features that stand out—because of who is present, what is at stake, how the past echoes, and which futures are live. Third, grasping those saliences is not a stepwise calculation; it is a cultivated perceptual skill. We don’t first see neutral facts and then bolt on values. We see facts under a description already value-soaked. Iris Murdoch, explicitly following Plato, called the work “attention”: the slow cleaning of the inner lens so that what matters can be seen without distortion. To attend is to deliberate about ends in the very act of perceiving means.
In Sarah’s world, the scoreboard is never the story. The story is whether a child sleeps safely tonight. That is not a problem of means alone. It is a problem of ends seen clearly enough that the right means can be chosen.
Part II — The Guillotine, the Toolkit, and the Rise of Instrumental Reason
How did judgement come to be exiled from our picture of reason, and how did means come to stand without ends? One familiar chapter begins in the Enlightenment. David Hume warned that you cannot derive an “ought” from an “is.” Facts alone will not tell you what to value; values are not delivered by the same post as observations. As a political settlement, the separation had obvious benefits: it protected science from clerical interference and plural societies from moral overreach. But as a cultural habit, it had a subtler consequence. “Truth” migrated to the factual side of the ledger; “value” drifted towards taste, preference, and politics. We did not stop making evaluative judgements; we just stopped feeling we could name them as such.
Enter instrumental reason as the dominant way of thinking about rationality. If ends are subjective or, at best, contested, stay out of that minefield. Take the ends as given—“reduce crime,” “raise test scores,” “boost GDP,” “lower risk”—and unleash ingenuity on the means. Optimise. Manage. Nudge. Measure. Reward improvement. A new civic architecture emerged: indicators, audits, “what works” toolkits, A/B tests, dashboards. Goodhart’s Law—that when a measure becomes a target, it ceases to be a good measure—became the ghost at the feast, invoked and ignored in equal measure. We wanted the clarity of numbers even when the numbers offered clarity about the wrong things.
This style was not born from malice; it was a hedge against moral hubris. But its unintended effect has been to remove ends from public deliberation and relocate them in the realm of preference and PR. You don’t argue about the good; you select key performance indicators. You don’t cultivate public judgement; you publish a scorecard. The means become the moral universe. The graph that slopes nicely becomes the thing itself.
Artificial intelligence is instrumental reason’s favoured child. It is a machine for modelling regularities in whatever we can render as data: clicks, tokens, pixels, past decisions, proxies for preferences. In research labs, brilliant teams push two deep ideas: representation (how to encode the world) and optimisation (how to improve a chosen objective). In industry, the gravitational pull is toward prediction under constraint: Who will churn? Which ad will be clicked? Which claim is likely fraudulent? The intellectual style—call it the calculus of correlation—bets that reality can be captured in the right representation, and guidance delivered by the right objective function.
Where does judgement live in that picture? Notice the Enlightenment’s downstream effect on the design brief. If values are preferences, the system can learn them from revealed behaviour or explicit ratings. If fairness is controversial, pick a fairness definition that fits the regulatory scheme and bake it into the loss function. If people disagree, model a distribution over preferences and choose a policy that maximises expected satisfaction under constraints. The hard moral substance of judgement dissolves into the soft mathematics of aggregation. Ends are not deliberated; they are inferred or stipulated. Means do the heavy lifting.
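To see how literal that dissolution is, here is a minimal sketch (in Python, with invented names and data) of the standard move: a contested value such as fairness is reduced to one penalty term in a loss function, and the weight it carries becomes a hyperparameter rather than an argument.

```python
import numpy as np

def demographic_parity_gap(scores, group):
    """One fairness notion among many: the gap in mean predicted score
    between two groups. Choosing this definition is itself a value judgement."""
    return abs(scores[group == 0].mean() - scores[group == 1].mean())

def penalised_loss(y_true, scores, group, lam=1.0):
    """Prediction error plus a weighted fairness penalty. The weight `lam`
    decides how much fairness 'costs' relative to accuracy; nothing in the
    maths says what it should be."""
    eps = 1e-9
    log_loss = -np.mean(y_true * np.log(scores + eps)
                        + (1 - y_true) * np.log(1 - scores + eps))
    return log_loss + lam * demographic_parity_gap(scores, group)
```

Everything contestable (which definition of fairness, which groups, which weight) has been settled before the optimiser ever runs.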
The AI boosters—our Prometheans—do not deny this. They reframe it. Judgement becomes “meta-optimisation” (learning the loss), “preference learning” (from feedback and comparisons), or “alignment” (steering systems toward human goals). Pick your flavour: RLHF (reinforcement learning from human feedback), constitutional training, RLAIF (from AI feedback), inverse reinforcement learning. Each is ingenious; each makes things better on important margins. But they share a framing that keeps human-style judgement offstage: values are to be elicited and stabilised before the system acts, not exercised as part of what intelligence is. Ends are exogenous. The only problem is the means.
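The reward-modelling step that sits underneath several of these techniques is, stripped down, a small piece of statistics: gather pairwise comparisons from people, then fit a model so that preferred outputs score higher than rejected ones. A minimal sketch of that pairwise objective, written illustratively rather than taken from any lab's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def preference_loss(reward_preferred, reward_rejected):
    """Bradley-Terry style objective used in preference learning: nudge a
    reward model to score the human-preferred response above the rejected
    one. The comparisons arrive as frozen labels; whatever seeing produced
    them is not part of the gradient."""
    return -np.mean(np.log(sigmoid(reward_preferred - reward_rejected)))

# e.g. rewards the model currently assigns to three preferred/rejected pairs
print(preference_loss(np.array([2.1, 0.3, 1.5]), np.array([1.8, 0.9, -0.2])))
```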
Yet human practice keeps smuggling ends back in. Murdoch’s “attention” is not an optional extra after the data is digested; it is the disciplined work by which the world becomes legible in the first place. Aristotle’s phronesis is not cleverness at tactics; it is the virtue of selecting and shaping means in a dialogue with ends that are partly discovered and partly formed as we go. Plato’s cave is not a parable about accumulating facts; it is a story about the soul turning toward a light that discloses what to love. In all three, the separation of means and ends is a mistake about how perception itself functions. We never really take ends as given; we see them—or fail to—through the qualities of attention we bring to the world.
That is why instrumental reason, left to itself, hollows out life. Treat ends as fixed or unmentionable and the only remaining hero is efficiency. But efficiency at what? The most dangerous bureaucratic sentences are the most confident: “We have optimised the system.” The meaningful question is missing: “For the sake of what?” If the ends are not deliberated, they are smuggled in—by the powerful, by the convenient, by the metric that happens to be measurable. Neutrality is not compassion; it is abdication dressed as fairness.
The remedy is not to abandon means, but to restore the conversation between means and ends as the heart of rational life. That conversation is called judgement.
Part III — What Judgement Knows That Models Don’t: Reuniting Means and Ends
To bring judgement back into view—and to reunite means with ends—we need three clarifying moves: distinguish skill from rule, salience from signal, and formation from fine-tuning.
First, skill and rule. Hubert Dreyfus, drawing on phenomenology, argued that expertise is not primarily the application of stored rules. Mastery looks like ease—an absorbed responsiveness that collapses deliberation into action. The consultant surgeon who senses, from a tissue’s resistance, that something is off; the teacher who spots the sideways glance that spells trouble three minutes before it happens; the negotiator who adjusts her cadence because the silence on the line changed in a way a transcript will never show. These are not feats of explicit rule-following. They are sketches of embodied skill—skill that is already a dialogue between means and ends. The surgeon’s touch aims at healing; the teacher’s glance aims at learning; the negotiator’s cadence aims at dignity and safety. The end is not appended after the move; it infuses the move.
AI researchers accept this in one domain: perception. No one thinks we can hand-code all the rules for recognising cats; we train systems on vast data. But when the target is moral salience rather than visual salience, the fantasy of rules—or its cousin, the fantasy of aggregation—returns. We imagine that if we collect enough labels, the system will inherit the moral capacities of the labellers. That is a category mistake. Labels are the residue of judgement, not its substance. They tell you how the expert answered; they do not transmit how the expert saw, nor the end she sought in seeing.
Second, salience and signal. Machine learning treats a feature as salient if it helps predict the labelled outcome. Moral life reverses the arrow: an outcome is interesting because the features are salient in a way that matters. In a loan dataset, “postcode” may be a predictive signal. In a just society, “postcode” must not be a salient reason for refusing someone access to credit. The ethical work is not merely to down-weight the feature; it is to notice that the feature ought not to count. That requires a point of view on human ends: what people are, what they deserve, what a life is for. Data will never supply that point of view. It must be reasoned about, narrated, and then lived. (A toy illustration of the reversed arrow follows the third point below.)
Third, formation and fine-tuning. When we speak of cultivating judgement, we speak of forming a person: training perception (Murdoch’s attention), shaping character (the virtues), deepening discernment through experience and mentoring (Aristotle’s phronimos, the person of practical wisdom). The modern mirror is the “fine-tuning” of a model on curated datasets to reduce errors on a metric. The analogy is tempting but misleading. Fine-tuning aligns outputs to preferences; formation aligns a self to the good. One optimises behaviour; the other transforms the seer. One takes ends as inputs; the other learns to hold ends up to scrutiny.
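To return to the second point, the reversal of the arrow between salience and signal can be shown with a toy example (synthetic data, invented numbers): the statistics will happily report that postcode predicts repayment; nothing in the arrays can say whether it ought to count.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic loan data: postcode tracks repayment only because it proxies
# for historical disadvantage, not because of anything a postcode "does".
disadvantage = rng.binomial(1, 0.3, n)
postcode_deprived = disadvantage                      # crude proxy
income = rng.normal(30 - 8 * disadvantage, 5, n)      # in thousands, say
repaid = (income + rng.normal(0, 5, n)) > 24

# The statistical question: is the feature a useful signal?
print("corr(postcode, repaid):", round(np.corrcoef(postcode_deprived, repaid)[0, 1], 2))

# The ethical question is not answerable from these arrays: should postcode
# be allowed to count as a reason at all? Even dropping the column would not
# settle it, since income still carries much of the same information.
```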
A sceptic might reply: you are mystifying. Surely judgements are patterns that can be learned. Surely values can be approximated by the right training signal. Surely the difference is one of degree (data, compute, cleverness), not of kind. Four facts of practice answer otherwise.
Underspecification: many real problems do not have a unique, label-able “right answer.” They have better or worse ways of seeing and acting. Sarah can write a report that passes review but still fail the child. The review measures words; the world measures lives.
Reverberation: practical effects loop back. When a predictive system flags a neighbourhood as risky, patrols intensify, recorded incidents rise, the prediction looks validated, resources shift, opportunities shrink—until the model has helped make the world it measured. In such systems, judgement is not bias; it is the only tool for refusing self-fulfilling cycles and naming what ought not to be. (A stylised sketch of such a loop follows these four points.)
Thick concepts: dignity, humiliation, respect, betrayal are not just descriptors; they are calls to respond. They draw on shared forms of life and cannot be cashed out in neutral features without losing their grip. You can count interruptions in a meeting; you cannot count “belittling” in a way that does not rely on background judgement about ends—what a meeting is for, what people owe one another.
Asymmetric stakes: in medicine, education, justice, and social care, the heaviest costs fall on the vulnerable. That asymmetry is precisely why Aristotle centred virtue: not because rules don’t matter, but because rules are brittle at the edges, and the edges are where real lives fray. If we build systems that cannot recognise edges, we will automate our indifference.
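The reverberation point can be made painfully concrete with a deliberately stylised loop (toy numbers, not a model of any real deployment): two neighbourhoods with identical underlying risk, a model that begins with a hunch, and records that only accumulate where patrols are sent.

```python
import numpy as np

true_rate = np.array([0.05, 0.05])   # two neighbourhoods with identical underlying risk
belief = np.array([0.75, 0.25])      # the model starts out convinced that A is riskier

for year in range(10):
    patrols = belief / belief.sum()            # send patrols where the model points
    recorded = 1000 * true_rate * patrols      # incidents only get recorded where someone is looking
    belief = recorded / recorded.sum()         # next year's "risk" is last year's records

print(belief)   # still [0.75, 0.25]: the hunch has been laundered into data,
                # although the two neighbourhoods were never different
```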
Where does this leave AI? Not on the bench, but in a different role. In the complex domain, we need tools for sensemaking under uncertainty, for discovering regularities we haven’t seen, for checking our blind spots, for rehearsing options and stress-testing plans. But help is not replacement, and steering is not seeing. If our headline is “AI does the means so that people can worry about the ends,” we have misunderstood both. The dialogue between means and ends is not an afterthought; it is the atmosphere in which intelligent action breathes. Systems that ignore that dialogue will look competent on dashboards and clumsy in the world.
There is a better headline: AI can widen a good practitioner’s horizon of attention if we design for attention and deliberation about ends. That requires a different engineering imagination—one that treats the interface as a place where reasons and values are made legible, contestable, and revisable in use.
Part IV — Beyond the Promethean Pitch: A Design Brief for Judgement-Centred Intelligence
The loudest voices in our AI moment like to tell a simple story: exponential curves, scaling laws, emergent abilities, a straight line to a generality that will “do most economically valuable work.” There is energy in these visions, and there is achievement. But notice the picture of human agency they presuppose. Values are preferences to be surfaced and stabilised; reason is a calculus over representations; progress is more pattern, more power, more speed; danger is misalignment between the system’s objective and ours. In this frame, judgement is either the messy human part to route around or a capability that larger models will converge upon once they imbibe enough text about moral life. Ends are inputs; means are the magic.
The counter-tradition—Aristotle’s phronesis, Murdoch’s attention, Heidegger’s disclosure, Polanyi’s tacit knowledge, Schön’s reflection-in-action—does not reject intelligence at scale. It asks a prior question: intelligence for what? The “for what” cannot be answered by counting clicks, parsing sentiment, or even maximising a carefully weighted social objective. It must be answered by people, in institutions, with habits of practice, traditions of reasoning, and forms of life that make meaning thick enough to live by. That prior question is the conversation between means and ends. If we don’t hold it open, technique will quietly swallow purpose.
Translated into a design brief, that yields five principles.
Build for salience, not just signal. A good tool for judgement does more than highlight what is statistically predictive; it foregrounds what might matter—and says why. Imagine clinical support that does not only estimate risk but also visualises plausible causal pathways and trade-offs between goods (pain relief vs. dependency risk; speed vs. thoroughness), inviting the practitioner’s experience to complete the picture. The aim is not to choose for the clinician but to widen her attentive field in time to save a life. Means are shown in their relation to ends.
Expose the “oughts.” Every model smuggles in values via labels, features, and objectives. Name them. Make them contestable. Show what changes when we trade one fairness notion for another, when we privilege speed over accuracy, when we rank by predicted compliance rather than greatest need. Treat the interface as a civic space where the ends are visible enough to argue about. When a system proposes a course of action, let it show the live disagreements, not merely the central estimate.
Evaluate consequences, not only benchmarks. Benchmarks are useful; they also seduce. In practice, we care about how systems reverberate through institutions across time. That requires slow feedback loops and narrative evidence. Did an education recommender nudge teachers toward short-term test gains while eroding curiosity? Did a policing model reduce reported crime while corroding trust? Pair quantitative evaluation with ethnography. Hold yourself answerable for ends, not just for means performed to spec.
Put formation on the roadmap. If we want judgement in the loop, we must train people to judge. That is not a one-off onboarding but a craft apprenticeship: case conferences that reward attention; reflective practice sessions where decisions are revisited for what they revealed; mentoring that transmits styles of seeing. Vendors love to promise “human in the loop.” Insist on “human formation in the loop,” or the loop will be decorative. The conversation between means and ends requires adults who can hold it.
Design with complexity, not only complication. Dave Snowden’s Cynefin framework drew a line too many organisations blur: complicated problems yield to expertise and can be planned; complex problems require probing, sensing, and adapting. AI’s sweet spot has been the complicated. We need tools that respect the complex: dashboards that show live tensions between goods; simulation sandboxes for exploring second-order effects; alerting systems that flag when the world has shifted out of the model’s distribution rather than marching on as if nothing has changed. In complex worlds, means evolve because ends are clarified in action.
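One small, concrete instance of that last principle is drift alerting: rather than letting a model march on as if nothing has changed, keep checking whether the data it now sees still resembles the data it was trained on. A minimal sketch using a population stability index; the names, numbers, and thresholds here are illustrative rules of thumb, not a standard.

```python
import numpy as np

def population_stability_index(reference, live, bins=10):
    """Crude drift score comparing the live feature distribution against the
    distribution the model was trained on. A common rule of thumb treats
    values above roughly 0.25 as 'the world has moved; stop trusting the
    model quietly and tell a human'."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    ref_frac = np.histogram(reference, edges)[0] / len(reference) + 1e-6
    live_frac = np.histogram(live, edges)[0] / len(live) + 1e-6
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

rng = np.random.default_rng(0)
training_world = rng.normal(0.0, 1.0, 10_000)   # the world the model learned
present_world = rng.normal(0.8, 1.3, 10_000)    # the world it now acts in
print(population_stability_index(training_world, present_world))  # well above 0.25: raise the flag
```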
Return, finally, to Plato’s cave. The usual telling says we must leave the shadows for the sun—trade illusion for truth. A neglected moment comes just before the ascent, when the prisoner first sees the fire that casts the shadows. It is dazzling in the wrong way: bright but deceptive, a heat that blinds as much as it illuminates. In Murdoch’s update, the fire is the ego—our fantasies, fears, and vanities. In our institutional update, the fire is the dashboard glow: the metrics that hypnotise us into confusing improvement in the means with clarity about the ends. Turning toward a truer light is not a rejection of tools; it is the refusal to take their illumination as the last word.
Will the boosters protest that judgement, so understood, is ineffable and unprogrammable? Some will. Others will hear the invitation. The most interesting AI work in the next decade may not be the last leap in model size but the first serious attempt to collaborate with human practical wisdom without trying to subsume it. That will feel less like automation and more like apprenticeship; less like handover and more like co-deliberation; less like “move fast” and more like “move fittingly.”
And Sarah? She will still be in rooms with families. There will still be dashboards. But perhaps the language around her will have changed. Perhaps her reports will make explicit what she is trained to see. Perhaps oversight will reward her for refusing a clean metric when it conflicts with a messy good. Perhaps the tools on her laptop will help her notice not only what is likely to work, but what is worth working for.
That future will be slower, more dialogical, and less triumphalist than the AI Prometheans promised. It will also be more human—and, paradoxically, more intelligent. Because the intelligence we most need to scale is not only the power to predict. It is the power to keep means answerable to ends, to perceive the good in time to choose it, and to keep choosing it together.


