Smart, experienced practitioners regularly identify the evaluative challenge AI creates, then neutralize their own insight. They see the problem clearly, describe it precisely, and treat the description as having dealt with it. This pattern, recognition without reflexivity, runs through nearly every piece of practitioner discourse in the evidence base.
When organizations engage with AI, the engagement produces metrics that are easy to see and celebrate (speed, volume, surface quality). These metrics displace the harder-to-measure criteria that actually matter. Over time, the people who would notice the substitution lose the capacity to detect it, because the AI is reshaping the very practices through which they developed their judgment. The traps catalogued here are the ways practitioners talk themselves into believing everything is fine, through sincere belief rather than strategic evasion.
The traps fall into four categories based on how they were discovered. The original 17 traps all have empirical support; the four cross-domain traps (18 through 21) are predicted but not yet confirmed.
The tabs organize the traps by the level at which the reasoning operates, not by how they were discovered:
Individual Self-Framing (traps 2, 4, 6, 9, 14) captures how practitioners position themselves as stable, untransformed subjects. "I have the skills, I make the choices, I can see the problem."
Temporal Reasoning (traps 1, 5, 7, 8, 13) captures how practitioners borrow authority from time. Past cycles, present evidence, enduring fundamentals, navigable transitions, and reversible experiments.
Structural Reasoning (traps 3, 10, 11, 12, 19, 20, 21) captures how practitioners reason about organizational systems. Market value, metrics, access, feedback loops, regulatory legitimacy, workforce shortage, and accuracy-as-outcome.
Field-Level Dynamics (traps 15, 16, 17, 18) captures how traps gain authority, spread, and resist correction across actors. Legitimation circuits, performative naming, expertise dismissal, and moral urgency.
Each trap card carries an origin tag showing how it was found. The eight inductive traps were observed in practitioner discourse before the framework existed. The six deductive traps were predicted by PSF and later confirmed in independent evidence. The three analytical traps emerged from re-examining accumulated evidence. The four cross-domain traps were predicted by applying PSF to AI diagnostics in India.
Traps do not appear in isolation. A single LinkedIn post or conference talk routinely deploys three, four, or five traps at once, and the combinations are not random. The Co-occurrence tab tracks this with a matrix (which trap pairs appear together) and a cluster log (full multi-trap instances with source context).
Every new piece of evidence (case study, LinkedIn post, academic paper, interview excerpt, field observation) gets run through this protocol:
When an interviewee deploys one of these moves, use the PSF counter-question to probe the assumption without leading. The traps often appear as sincere beliefs, not defensive rhetoric, so the follow-up should be genuinely curious rather than confrontational.
Code practitioner discourse (LinkedIn posts, conference talks, Slack threads, blog posts) against all 21 traps. Track frequency, co-occurrence, and which traps cluster together. Note the origin category of each instance (inductive, deductive, analytical, or cross-domain), since confirmation of predicted traps is itself evidence that PSF works.
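A minimal sketch of what this coding step might look like in practice, assuming each piece of evidence is stored as a simple record; the source strings, trap numbers, and the CodedEvidence structure are illustrative placeholders rather than part of the protocol itself:

```python
from collections import Counter
from dataclasses import dataclass, field

# Origin categories used on the trap cards.
ORIGINS = {"inductive", "deductive", "analytical", "cross-domain"}

@dataclass
class CodedEvidence:
    """One piece of evidence (post, talk, interview excerpt) coded against the 21 traps."""
    source: str                                             # e.g. "LinkedIn post (hypothetical)"
    traps: set[int] = field(default_factory=set)             # trap numbers 1-21 observed in this text
    origins: dict[int, str] = field(default_factory=dict)    # trap number -> origin category

    def add(self, trap: int, origin: str) -> None:
        """Record one trap instance and its origin category."""
        assert 1 <= trap <= 21 and origin in ORIGINS
        self.traps.add(trap)
        self.origins[trap] = origin

def trap_frequency(corpus: list[CodedEvidence]) -> Counter:
    """How often each trap appears across the coded corpus."""
    return Counter(trap for item in corpus for trap in item.traps)

# Usage: a hypothetical talk deploying traps 2, 9, and 14 together.
talk = CodedEvidence(source="conference talk (hypothetical)")
talk.add(2, "inductive")
talk.add(9, "inductive")
talk.add(14, "analytical")
print(trap_frequency([talk]).most_common())
```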
These traps are not mistakes by careless thinkers. They are sophisticated practitioners making structurally incomplete arguments because they sincerely believe what they are saying. The goal is not to dismiss the discourse but to identify what each move conceals about how judgment is sustained or eroded.
These traps govern how practitioners position themselves relative to AI. Each one preserves the self as the stable, untransformed subject of the interaction. The practitioner acknowledges that AI creates challenges, then frames those challenges as affecting others, requiring skills they already possess, or operating within choices they control. The common structure: The evaluator is exempt from the process being described.
The practitioner acknowledges AI output can deceive, but locates the vulnerability in others (juniors, non-specialists, clients). Expertise is treated as a stable shield rather than a depreciating capacity that requires ongoing replenishment.
What happens to the pipeline of experienced practitioners over time? If AI handles the work through which juniors develop judgment, that pipeline may thin. The current stock of experienced practitioners can mask a growing deficit in the conditions required to produce more of them. Who develops the judgment that the "less experienced" currently lack, and through what activities?
Third-person vulnerability framing ("they," "someone less experienced," "clients"). Implicit self-exemption from the problem being described. No discussion of how expertise was acquired or how current conditions affect its transmission.
Radiologists and pathologists locate AI risk in less experienced clinicians while trusting their own interpretive capacity. The professional licensing system reinforces the trap by conferring formal credentials that practitioners may treat as evidence of evaluative immunity. Predicted to appear in any domain with expertise hierarchies, potentially amplified in licensed professions.
The practitioner lists competencies required for effective AI use (asking right questions, validating responses, spotting signals, discriminating quality, recognizing patterns). Each competency is presented as a prerequisite the practitioner brings to the engagement, framing the human as the stable element in the human-AI system.
Are these competencies fixed endowments, or capacities sustained through effortful exercise? Each "if" in the conditional stack names a skill that was developed and maintained through the very activities AI engagement may now replace or restructure. The conditions assume what needs to be explained: that practitioners will continue to possess precisely the capacities whose developmental substrate may be eroding.
Lists of preconditions for effective AI use, each treated as a static attribute. No analysis of how the preconditions were developed or whether current conditions sustain their development. The longer the conditional stack, the more the speaker inadvertently catalogues the evaluative capacities at risk.
Diagnostic practitioners list prerequisites for effective AI use (clinical correlation, awareness of false positives, population-specific calibration) as fixed competencies rather than capacities sustained through effortful practice. Each "if" names a skill whose developmental substrate AI engagement may restructure. Predicted in any domain where practitioners can enumerate conditions for effective use.
The practitioner identifies the evaluative challenge with clarity, often eloquently. The act of identification is implicitly treated as sufficient to confer immunity. Naming the trap is confused with escaping it. The post or comment itself becomes evidence of the practitioner's exemption.
Does recognizing the mechanism of proxy seduction protect you from it? PSF operates through sincere belief, not strategic evasion. The practitioner who articulates the problem most clearly may still be subject to organizational pressures, shifting norms, and gradual recalibration of standards. Individual awareness need not insulate against structural erosion. It may even create false confidence that delays collective response. What organizational or institutional conditions would be needed beyond individual recognition?
Accurate description of the evaluative challenge followed by no structural or institutional prescription. Agreement threads where multiple practitioners affirm each other's insight without discussing mechanisms for collective protection. The diagnostic accuracy of the post is high, but the post functions as a performance of competence rather than a call to structural action.
Diagnostics discourse includes sophisticated recognition of AI limitations (dataset bias, automation bias, need for clinical correlation). These recognitions function the same way they do in the software domain: naming the risk is treated as having addressed it. SAHI recommends "human-in-the-loop" without verifying that the loop is substantive. Predicted wherever practitioners and regulators can articulate risks.
The practitioner frames the human-AI relationship as additive. AI is a tool the practitioner uses, and the practitioner remains the stable subject of the interaction. The framing positions the human as the locus of agency and judgment while AI handles execution or acceleration. Because the word "augmentation" carries positive connotations (enhancement, amplification), the frame discourages further inquiry into what exactly is being changed.
PSF proposes that AI changes the evaluator, not just the production process. The augmentation frame rules out this possibility by definition. If the tool is reshaping the practitioner's standards, expectations, and sense of what "good" looks like, "augmentation" misdescribes the relationship. After sustained engagement, does the practitioner judge quality the same way they would have without the tool? If not, the relationship is transformative, not supplementary, regardless of how the practitioner describes it.
Language of tools and use ("I use AI to...," "AI helps me...," "it's in my toolkit"). The practitioner as grammatical subject, AI as instrument. Resistance to framing that positions AI as having changed the practitioner's own judgment or standards. May co-occur with Expertise Immunity (trap 2), since both preserve the practitioner as the stable, untransformed agent.
"AI assists, humans decide" pervades diagnostic AI discourse. SAHI mandates "human-in-the-loop." The framing positions the clinician as the stable agent. PSF asks: after sustained AI-pre-read exposure, does the clinician evaluate the same way? If not, the relationship is constitutive, not supplementary. Predicted wherever the augmentation/replacement binary organizes discourse.
The practitioner locates control at the individual level and treats engagement as a series of discrete, bounded decisions. Each interaction with AI is presented as a conscious choice, and the practitioner's autonomy is preserved through the framing. The implication: structural effects cannot accumulate because each engagement is independently chosen and can be independently declined.
Organizational norms, competitive pressures, pace expectations, and shifted baseline standards can progressively erode the space for individual opt-out, even as practitioners continue to believe they are choosing freely. The "choice" frame also conceals a deeper question: whether prior engagement has already transformed the chooser. If sustained AI engagement has shaped the practitioner's standards, expectations, and sense of what "good work" looks like, the choice may be made by a subject the tool has already constituted. Individual agency is real. It operates within a field of constraints that the agency frame renders invisible.
First-person choice language ("I decide," "I choose," "I use it for X but not for Y"). Emphasis on deliberate, conscious selection. No acknowledgment of organizational pressure, peer norms, or competitive dynamics that constrain the choice set. May co-occur with the Augmentation Frame (trap 9), since both position the practitioner as the autonomous, untransformed agent. Could surface across all interview types.
PREDICTED DOMAIN-BOUND. In software development, individual developers choose which AI tools to use. In diagnostic centers, AI deployment is an institutional decision. The radiologist does not choose whether the AI pre-read appears on their workstation. The "I choose when to use AI" framing may not arise in diagnostics, replaced by institutional deployment decisions that practitioners accommodate. If this trap is absent in diagnostics, it confirms the distinction between mechanism-level and domain-level traps.
These traps derive reassurance from temporal frames: past cycles that resolved (Historical Normalization), present evidence that looks good (Present-Tense Projection), fundamentals that persist across eras (Fundamentals Endurance), navigable futures with legible destinations (Transition Naturalization), or reversibility that preserves optionality (The Reversibility Assumption). PSF asks the same question of each: Does the temporal frame account for how AI engagement changes the conditions it describes?
A previous technology democratized production, quality initially suffered, experienced practitioners remained essential, equilibrium was restored. The current moment is framed as a familiar cycle with a known resolution. The analogy is persuasive because the surface pattern genuinely matches: democratized tooling did flood the market with low-quality output, and the field did sort itself out. The trap lies not in the pattern recognition but in the unexamined assumption about how self-correction worked.
The earlier ecosystems self-corrected through visible failure. VB applications crashed, lost data, and presented interfaces users could not navigate. Bad output was self-announcing, and the unambiguous feedback it produced was the mechanism through which practitioners either improved or exited. AI-generated output occupies a structurally different position: it clears a surface quality threshold that earlier tools never did. The outputs are "professionally competent and completely forgettable," not broken. The badness tends to be self-concealing. The historical analogy is not wrong (the democratization pattern is real), but the feedback architecture that drove correction in earlier cases may no longer operate when output passes a legibility threshold while missing on accountable criteria.
Analogies to specific historical tools. Cyclical framing ("we've been here before"). Confidence that the discipline will survive because it survived last time. The critical diagnostic marker: absence of any account of the feedback mechanism that drove self-correction in the earlier case, and whether that mechanism still operates. The speaker treats the historical outcome (quality norms emerged) as transferable without examining the process (visible failure produced unambiguous learning signals) that generated it. The core PSF distinction applies directly: failure is self-announcing, degradation is self-concealing.
Diagnostics practitioners deploy historical analogies with the same structure: "We integrated PACS, we integrated digital X-ray, we will integrate AI." The move treats each technology transition as equivalent, concealing that AI engagement transforms the evaluator, not just the production process. Predicted to appear in any domain where practitioners have prior experience with non-transformative technology adoption.
Current evidence (hiring trends, client demand, revenue) is cited as proof that professional judgment remains valued. The present is projected forward as a stable state rather than read as a point on a trajectory.
Is the current state an equilibrium or an inflection point? If PSF is right, the early phase of proxy seduction looks like success: organizations are productive, outputs look good, metrics improve. Degradation becomes visible only after the judgment needed to detect the substitution has already eroded. Present-tense evidence may be unreliable as a signal of long-term trajectory precisely when proxy seduction is in its early phase.
Use of "now," "currently," "more than ever" as evidence. Snapshot data treated as trend confirmation. Absence of any mechanism account for how the observed state might change, or any consideration that early-phase trajectories can look like stable improvements.
Current deployment metrics (2,600 sites, 600,000 patients screened, 10,000 scans daily) treated as evidence that the technology works. The present-tense success is genuine and measurable. What it projects (these numbers will continue to translate to patient benefit) is assumed. Predicted wherever current performance metrics substitute for longitudinal outcome tracking.
A timeless skill or principle (design thinking, user empathy, strategic judgment) is invoked as an anchor of continuity. Technology is framed as surface change while the deep structure of professional practice remains stable. The fundamental is treated as self-sustaining regardless of the conditions under which it is practiced.
Do the fundamentals persist automatically, or only through specific transmission mechanisms that the current shift may be disrupting? The claim that fundamentals endure treats them as properties of the discipline rather than as capacities sustained through particular forms of practice. If the activities through which fundamentals are learned and reinforced (shipping products, watching them fail, iterating under constraint) are restructured by AI engagement, the fundamentals themselves may attenuate even as practitioners continue to invoke them.
Invocation of timeless principles. Framing of technology as ephemeral and skill as permanent. No account of how the fundamental skill is transmitted across practitioner generations or maintained within current practice. The fundamental is named but its maintenance mechanism is not examined.
UNCERTAIN. Professional licensing and structured medical education may genuinely sustain fundamentals in ways that informal software credentialing does not. A radiologist must pass board exams testing independent interpretive capacity. Whether those exams will continue to test skills that matter in an AI-augmented workflow is an open question. This trap may operate differently in licensed versus unlicensed professions.
Unlike Historical Normalization (which looks backward to a previous cycle), Transition Naturalization looks forward. A structural discontinuity is repackaged as a professional evolution with a legible destination: the designer becomes an Editor, the SDLC becomes Context Engineering, the spec becomes the product. The disruption is acknowledged, sometimes with striking clarity, but the framing provides a navigable endpoint that makes the transition feel managed. New vocabulary (catalogs, middle loops, context engineering) gives the destination conceptual solidity.
Does the destination role depend on capacities formed at the origin? The first generation of "Editors" developed their judgment through years of being "Operators," through direct practice of the work they now oversee. The transition risks consuming accumulated judgment without specifying how to replenish it. What happens to the second generation, trained as Editors from the start, who must evaluate AI-generated output against criteria they never developed through direct practice? The clean narrative (from here to there) conceals a generational question. The endpoint may be viable only for practitioners who carry forward capacities the new path no longer develops.
Evolutionary language ("shift," "transition," "rewrite," "move to"). Clear origin and destination states for professional identity. New terminology that gives the destination conceptual weight (suggesting the endpoint is already understood well enough to name). Absence of any account of how competencies developed in the origin state are reproduced in the destination state. The practitioner may accurately describe what the new role looks like while leaving unexamined whether it can sustain itself without the old role's developmental substrate.
The shift from "interpret image" to "review AI output" is presented as a natural professional evolution, analogous to the shift from film to digital. Vendor language frames it as enhancement. 5C Network describes its workflow as "AI-first, expert-verified." The transition is given a clean narrative arc. Predicted wherever a workflow change can be framed as professional development.
AI engagement is framed as an experiment that can be reversed if results are unsatisfactory. The organization retains optionality: if quality drops, workflows can revert. The reversibility frame reduces the perceived stakes of engagement and makes broad deployment feel low-risk.
The erosion may not be reversible in the way the frame implies. Once the activities through which judgment was built are restructured (junior roles redesigned, apprenticeship pathways eliminated, practice replaced by AI-assisted shortcuts), restoring them requires rebuilding institutional infrastructure, not simply reverting a policy. The people who developed judgment through the prior system may have left, retired, or had their roles redefined. The reversibility assumption treats AI engagement as a tool you can put down rather than a process that changes the people using it. What would "going back" actually require, and how long would it take to rebuild what was lost?
Experimental framing ("pilot," "trial," "test and learn"). References to optionality and fallback positions. Implicit assumption that the pre-engagement state can be recovered on demand. No analysis of what has been restructured (hiring pipelines, training programs, role definitions, team compositions) during the engagement period. Likely to appear in leadership and management contexts rather than individual practitioner discourse.
Implicit in discussions of AI as "one more tool in the diagnostic toolkit." The assumption: if the tool proves unhelpful, clinicians can revert to pre-AI practice. The CAD precedent refutes this directly. After years of CAD-shaped workflows, radiologists could not simply return to pre-CAD interpretive practices. Deskilling evidence (colonoscopy, mammography) confirms capacity loss. Predicted wherever engagement transforms the practitioner.
These traps govern how practitioners reason about systems, metrics, markets, and feedback mechanisms. Each one makes a claim about how organizational structures function under AI engagement: judgment becomes a valued bottleneck (3), metrics confirm improvement (10), access expands democratically (11), feedback loops catch problems (12), regulatory approval signals benefit (19), workforce shortage authorizes deployment (20), or accuracy stands in for outcomes (21). PSF asks whether the organizational structures being described have themselves been transformed by the process they are supposed to govern.
AI makes generation cheap, so judgment becomes the scarce resource. Scarcity increases value. Experienced practitioners are therefore more important, not less. The market validates the claim through current hiring patterns.
Current demand for experienced judgment tells you nothing about whether the conditions for producing more of it remain intact. Present-tense hiring data may represent the early phase of a J-curve: the existing stock of experienced practitioners being consumed at an accelerating rate while the pipeline that produced them is being restructured. What is the depreciation schedule for organizational judgment when the hands-on activities that sustained it are progressively automated?
Market-validation reasoning. Present-tense evidence treated as equilibrium rather than trajectory. Framing of judgment as a fixed asset with stable supply rather than a flow capacity requiring continuous investment.
Diagnostic AI vendors explicitly market the radiologist shortage as a bottleneck that AI resolves by allowing senior clinicians to focus on complex cases. The assumption: freed-up judgment will be applied to higher-value decisions. Unasked: does the freed-up practitioner still exercise judgment the same way after sustained AI exposure? Predicted wherever AI is framed as removing cognitive load.
Improving metrics (velocity, throughput, coverage, output volume, cycle time) are cited as evidence that AI engagement is working. The legible, measurable indicators move in the right direction, and the speaker treats the improvement as proof of success. This may be proxy seduction in its purest form: the easy-to-measure numbers that AI produces are the same ones being cited as evidence that everything is fine.
Are the metrics that improved the same metrics that matter? AI engagement produces metrics (speed, volume, surface quality) that are easy to measure, while the things that actually matter (depth of judgment, appropriateness in context, long-term quality) may erode without signal. The improving numbers may be precisely the proxies that have displaced the criteria that count. Which metrics are not being tracked, and would you know if they were declining?
Quantitative evidence of improvement (percentages, multiples, before/after comparisons). Metrics that are legible and fast-moving (velocity, volume, time-to-delivery). Absence of any corresponding measure for harder-to-quantify criteria (quality of reasoning, contextual appropriateness, organizational learning). Likely to surface in boundary activity interviews where organizational-level patterns are visible.
TAT reduction, throughput gains, and revenue growth are the dominant success metrics in Indian diagnostic AI. 5C Network markets 30-minute scan-to-report. DeepTek advertises 22% revenue growth. Government reports cite consultation volumes. All technically accurate, all functioning as proxies for patient outcomes that go unmeasured. Predicted wherever operational metrics are easier to track than outcome metrics.
AI is celebrated for expanding access to production capabilities previously restricted to trained specialists. More people can now build, design, write, or analyze. The frame activates a progressive political valence (democratization is good, gatekeeping is bad), which makes it socially costly to challenge. The expansion of access is treated as equivalent to the expansion of competence.
What exactly is being democratized: the ability to produce output, or the judgment to know whether the output is good? AI lowers the barrier to generating work, but it does not necessarily transfer the judgment required to tell good output from bad. More people producing work that clears a surface quality bar, while fewer people can evaluate whether it meets the criteria that actually matter, is the condition under which proxy seduction accelerates.
Language of access and inclusion ("anyone can now...," "no longer need years of training to...," "levels the playing field"). Conflation of production capability with judgment capability. The progressive valence may make this trap particularly resistant to challenge in interview settings, requiring careful framing of the counter-question to avoid appearing anti-democratic.
AI-powered screening in underserved areas is framed as democratizing access to specialist-quality diagnosis. The framing is morally compelling and partially accurate. What it conceals: whether the diagnosis delivered at scale has the same quality as the specialist diagnosis it claims to replicate, and whether the infrastructure to act on AI findings exists in underserved settings. Predicted wherever AI enables broader access to previously scarce capabilities.
The practitioner acknowledges quality risk but expresses confidence that external feedback (user complaints, market signals, review processes) will catch problems. The implicit logic: we do not need to monitor judgment internally because the external environment will alert us to degradation. Quality assurance is delegated to downstream signals.
PSF predicts a double-sided erosion. As organizational judgment degrades, the capacity to detect and interpret quality signals degrades alongside it. If the people who process user feedback are themselves subject to proxy seduction, the feedback loop the speaker relies on may already be compromised. Users habituated to AI-generated output may also lower their quality expectations without realizing it, weakening the very signal the organization is counting on. Who evaluates the feedback, and are they themselves subject to the same shift?
Reference to external validation mechanisms (user testing, market response, customer feedback, review cycles). Confidence that the organization's monitoring infrastructure will surface problems. No consideration of whether the monitoring infrastructure itself has been affected by the same process. Likely to surface in boundary activity interviews with people responsible for quality or operations.
UNCERTAIN. Diagnostics has externally verifiable ground truth (biopsy, culture, patient outcomes at follow-up) that software lacks. This could make the feedback confidence trap weaker (because you can actually check) or could make it stronger (because the existence of ground truth creates an illusion of closed-loop quality assurance that may not be operationalized). Whether diagnostic centers actually track outcomes against AI predictions is an empirical question.
Regulatory approval or institutional endorsement is treated as evidence of clinical benefit. FDA clearance, WHO recommendation, or CDSCO classification functions as a proxy for "this tool works for patients," when in fact the regulatory pathway (510(k) substantial equivalence, analytical validity testing) requires demonstration of technical accuracy, not patient outcomes. The regulatory imprimatur confers legitimacy that forecloses further inquiry into whether the tool actually improves health.
FDA clearance of an AI diagnostic device means the device performs technically as claimed. It does not mean the device improves patient outcomes. 97% of FDA-cleared AI devices enter via 510(k), requiring no outcome evidence. The CAD precedent demonstrates that a technology can be FDA-cleared, Medicare-reimbursed, and adopted by 80%+ of US screening facilities while providing no patient benefit. What would it take for regulatory approval to function as a genuine quality signal rather than a proxy for one?
Citation of regulatory clearances or endorsements as sufficient evidence of value. "FDA-cleared" or "WHO-recommended" used as conversation-stoppers when clinical benefit is questioned. Vendor marketing that leads with regulatory credentials rather than outcome data. Procurement decisions where regulatory status substitutes for independent clinical evaluation.
UNCERTAIN. This trap may be specific to regulated domains (healthcare, pharmaceuticals, aviation) where formal approval processes exist. Software development has no equivalent regulatory gatekeeper conferring legitimacy. If this trap is absent in unregulated domains and present in regulated ones, it reveals a domain-specific amplifier of proxy substitution: the regulatory apparatus itself becomes a site where proxies are institutionally constituted.
The absence of human capacity is presented as both the rationale and the authorization for AI deployment. The workforce shortage makes deployment not just inevitable but morally required, simultaneously eliminating the option of not deploying and making quality questions seem secondary to access questions. The discourse move: access to an AI-assisted diagnosis is better than no diagnosis at all. The unstated assumption: that the comparison set is "AI diagnosis vs. no diagnosis" rather than "AI diagnosis vs. alternative resource allocation."
The shortage is genuine and the access argument has moral weight. The counter-question is not whether AI should be deployed (it should, where evidence supports it) but whether the urgency of deployment forecloses the quality question. When necessity-based deployment creates conditions where the "human-in-the-loop" is already overwhelmed, the quality assurance model faces maximum stress. Is an AI-assisted diagnosis with contaminated feedback loops and automation bias actually better than no diagnosis, or does it create different kinds of harm that the urgency framing makes harder to name?
Workforce shortage statistics cited as sufficient justification for deployment. Framing that positions any questioning of AI quality as implicitly arguing against access for underserved populations. The moral weight of the access argument used to shut down inquiry into whether the access being provided delivers the clinical benefit being promised. Language of "better than nothing" or "some diagnosis is better than no diagnosis."
UNCERTAIN. This trap may be specific to domains with severe workforce shortages and life-critical stakes (healthcare, emergency services, certain infrastructure domains). Software development does not face an equivalent moral urgency, which may explain why this trap was not observed in the software evidence base. If confirmed as domain-specific, it identifies a structural amplifier of proxy seduction: moral urgency makes proxies harder to question, not because the urgency is false, but because questioning feels ethically irresponsible.
Diagnostic accuracy metrics (sensitivity, specificity, AUC, concordance) are presented as if they were patient outcomes. The entire Fryback-Thornbury hierarchy (six levels from technical efficacy to societal effectiveness) is collapsed into Level 2, diagnostic accuracy. A published sensitivity figure functions as a conversation-stopper: the number carries epistemic authority that forecloses the question of whether accurate detection translates to clinical benefit through the chain of treatment decisions, patient behavior, and system capacity that lies between diagnosis and health outcome.
If AI detects a condition with 97% sensitivity, what happens next? Does detection lead to timely treatment? Does the health system have capacity to act on the finding? Does the patient have access to the indicated intervention? The colonoscopy evidence is instructive: AI increased adenoma detection by 20% while producing no change in advanced neoplasia detection, because the additional findings were clinically insignificant. Accuracy is a necessary condition for clinical benefit but emphatically not a sufficient one. What institutional mechanisms connect detection to outcome?
Published accuracy figures cited without outcome data. Vendor marketing organized around AUC or sensitivity numbers. Procurement and deployment decisions based on accuracy benchmarks. Research publications that measure accuracy as their primary endpoint and discuss clinical impact only speculatively. The absence of outcome data treated as a gap to be filled later rather than as evidence that the value proposition remains undemonstrated.
PREDICTED MECHANISM-LEVEL, though the specific form varies by domain. In software development, the equivalent is "test coverage" or "lines of code reviewed" functioning as proxies for code quality. In customer service, it is "resolution rate" or "handle time" substituting for customer satisfaction. In diagnostics, accuracy metrics carry more epistemic authority because they are quantified, published, and peer-reviewed, but the structural move (surrogate metric treated as the thing it surrogates for) is the same PSF mechanism. The diagnostics version is more seductive because the numbers look more scientific.
These traps govern how discourse patterns gain authority, spread, and resist correction across actors. Unlike traps 1 through 14, which operate within a single actor's discourse, these four operate between actors. The Legitimation Circuit amplifies traps through mutual validation across academic and practitioner registers. Performative Constitution creates new roles and criteria through institutionally authoritative naming. Domain Expertise Dismissal protects trap clusters from the practitioners most likely to see through them. Moral Urgency invokes genuine life-and-death stakes to make questioning proxy metrics feel ethically irresponsible. These are the mechanisms by which individual-level traps become field-level norms.
Unlike the other traps, the Legitimation Circuit is not an individual discursive move. It is a trap-amplification mechanism that operates between actors across different registers (academic, practitioner, executive). A practitioner retrofits their existing role or practice to an academic concept. The academic endorses the retrofit (by liking, citing, or acknowledging). The practitioner gains scholarly authority for their claim. The academic gains empirical validation for their concept. Every trap embedded in the original claim now carries both experiential and scholarly endorsement, making it harder to challenge from either register.
Does the mutual endorsement constitute evidence, or does it create the appearance of evidence through social validation? The circuit is closed: the academic concept is validated by a practitioner who benefits from the validation, and the practitioner is endorsed by an academic who benefits from the endorsement. Neither party has independent reason to test whether the underlying claim (that the role, skill, or complementarity is stable under AI engagement) actually holds. What empirical test could break the circuit? If none exists within the exchange, the circuit is performative rather than evidential.
Cross-register endorsement (academic liking a practitioner post, practitioner citing an academic at a conference, executive quoting research in a strategy deck). Mutual benefit visible in the exchange. The claim gains authority from both experiential ("I was already doing this") and scholarly ("the research confirms") directions simultaneously. The endorsement can substitute for empirical testing.
WHO endorsement of Qure.ai, ICMR guidelines citing AI accuracy studies, vendor marketing citing WHO endorsement. Academic validation and institutional authority create reinforcing loops that insulate embedded proxy substitutions from challenge. Predicted wherever institutional actors and commercial actors can cite each other as evidence of legitimacy.
Unlike Transition Naturalization (which packages a discontinuity as a journey between existing states), Performative Constitution creates the destination through the act of naming it. The speaker does not describe a role that already exists with established quality criteria. The speaker brings the role, skill category, or professional identity into being by declaring it. Because the declaration comes from a position of authority (Chief AI Officer, VP of Engineering, research director), the naming creates reality rather than describing it. Quality criteria for the new role are born already looking legitimate, before anyone can assess whether they predict actual performance.
Do the evaluation criteria for the new role exist, or does the confident naming create the illusion that they do? The post performs the existence of evaluative standards that have not been developed. Questions framed as job criteria ("Can you design evals? Can you curate context windows?") read as though the assessment framework is already in place, when in fact nobody yet knows what "good" looks like for these activities. By what criteria would you judge whether someone is a good Context Engineer versus a bad one, and who developed those criteria through what practice?
Novel role names declared with confidence from institutionally authoritative positions. Criteria-shaped questions that lack measurable referents. Language of inevitability ("the future belongs to..."). The role name begins appearing in job descriptions, curriculum proposals, and self-assessments before any evidence base for its validity exists. The speed of adoption is itself a marker: legitimate evaluation criteria develop slowly through practice, while performatively constituted ones spread through discourse.
BODH benchmarking platform creates evaluation criteria for AI health solutions. The criteria themselves may constitute what counts as "good" AI diagnostics before independent evidence of patient benefit exists. The act of benchmarking brings evaluative standards into being. Predicted wherever new roles, metrics, or standards are created to evaluate a transformative technology.
This trap operates as a maintenance mechanism: it protects other traps from challenge. When a practitioner with hands-on domain experience pushes back against a trap-laden analysis, the response is to delegitimize the challenger rather than engage the challenge. The dismissal can be epistemic ("you didn't read the article"), credentialist ("you don't have the right background for this conversation"), or, most revealingly, ontological ("your comment was generated by AI"). The effect is to keep the trap cluster intact by removing the one voice positioned to see through it.
Is the dismissal responding to the quality of the challenge, or to the threat it poses to the speaker's framing? Practitioners with direct experience in the domain being discussed may be better positioned to detect proxy substitution, because they developed their judgment through the practice activities in question. When their challenges are dismissed rather than engaged, the space for detecting the problem shrinks. What would it mean if the people most likely to spot the problem are systematically excluded from the conversation?
Ad hominem responses to substantive challenges. Escalation from engagement to dismissal when pushback persists. The dismissed challenger has more domain-specific experience than the original speaker. The accusation that the challenge was "AI-generated" is a particularly diagnostic marker: it uses AI itself as the delegitimizing instrument, implying that only a machine would question the speaker's framing of AI's effects.
Clinicians who raise concerns about AI diagnostic quality may be dismissed as technophobic, resistant to change, or protecting professional turf. The dismissal protects the proxy substitution from the practitioners best positioned to detect it. Predicted wherever domain experts question technology claims and can be reframed as self-interested resisters.
The genuine life-and-death stakes of the deployment context are invoked to foreclose questioning of proxy metrics. Because the need is real and urgent (TB kills, diabetic retinopathy causes preventable blindness, cancer screening saves lives), asking "are we measuring the right thing?" feels morally irresponsible. The discourse move is not to deny that measurement matters but to position it as a luxury that the urgency of the situation does not permit. The moral weight of the access argument silences the quality question.
The urgency is real. The moral weight is genuine. The counter-question is whether urgency makes proxy metrics more reliable or less. PSF predicts the latter: moral urgency makes proxy substitution harder to detect because questioning feels ethically costly. The practitioner or policymaker who asks "but does AI-assisted diagnosis actually improve patient outcomes?" when thousands lack access to any diagnosis bears a social cost for the asking. The moral urgency does not change whether throughput metrics track patient benefit. It changes whether anyone is positioned to notice that they do not.
Workforce shortage statistics paired with deployment celebration without outcome data. Framing that treats "access to AI diagnosis" and "access to quality diagnosis" as equivalent. Language that positions skeptics as implicitly arguing against access for underserved populations. The moral authority of the deployment context used to insulate operational metrics from scrutiny. May co-occur with Shortage-as-Authorization (20) and Metrics Improvement (10).
UNCERTAIN. This trap may be specific to life-critical or high-social-impact domains where deployment need carries moral weight. Software development does not have an equivalent: no one dies from slower code deployment. If confirmed as domain-specific, it identifies what may be PSF's most important boundary condition: the mechanism operates differently (is harder to detect, harder to question) in domains where the deployment need is morally urgent. Alternatively, software domains may have weaker analogues (competitive urgency, employment urgency) that perform similar discursive work at lower intensity. This distinction requires empirical testing.
Traps do not appear in isolation. A single passage of practitioner discourse routinely deploys three, four, or five traps simultaneously, and the clustering is itself analytically significant. The matrix below tracks which trap pairs have been observed co-occurring in the same text (solid) and which PSF predicts should co-occur (dashed). Cells populate from the cluster log below: every multi-trap instance updates the relevant cells.
Each entry records a field instance where multiple traps appeared together. The cluster, not the individual trap, is often the primary unit of analysis. A practitioner deploying five traps in a single post tells you something different from five practitioners each deploying one.
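Populating the matrix from the cluster log reduces to a simple counting operation. The sketch below assumes the log is kept as a list of coded instances; the entries, the predicted_pairs set, and the solid/dashed labels are hypothetical illustrations, not data from the evidence base:

```python
from collections import Counter
from itertools import combinations

# Each cluster-log entry lists the traps observed together in one text.
# These entries are hypothetical placeholders, not real coded instances.
cluster_log = [
    {"source": "LinkedIn post A", "traps": [2, 9, 14]},
    {"source": "conference talk B", "traps": [1, 5, 10]},
    {"source": "LinkedIn post C", "traps": [2, 9]},
]

# Pairs PSF predicts should co-occur but that have not yet been observed (dashed cells).
predicted_pairs = {(2, 9), (10, 20)}  # illustrative only

def cooccurrence(log):
    """Count how often each trap pair appears together in the same text (solid cells)."""
    counts = Counter()
    for entry in log:
        for a, b in combinations(sorted(set(entry["traps"])), 2):
            counts[(a, b)] += 1
    return counts

observed = cooccurrence(cluster_log)
for pair, n in observed.most_common():
    print(pair, n, "solid")    # observed co-occurrence
for pair in sorted(predicted_pairs - set(observed)):
    print(pair, 0, "dashed")   # predicted by PSF, not yet observed
```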