Research Materials
Vikram Bapat · University of Cambridge
PAPER PIPELINE · CONCEPT STUB

Design Moves

Interrupting the mechanisms of evaluative capacity erosion

Working draft for colleague review · Not for citation

Constructive companion to the Proxy Seduction Framework. This stub captures the spine of the argument as it stands. Section 2 (foils) and full integration with the wider PSF evidence constellation are pending revision.

Abstract

The Proxy Seduction Framework identifies a mechanism by which AI engagement erodes the evaluative capacity organizations need to assess their own work. Current interventions (literacy training, governance frameworks, oversight reviews) presuppose that the practitioner's frame is roughly correct and intervene within it. They are refinement aids in a situation that calls for what L.A. Paul and colleagues, in a computational treatment of individual agent problem-solving, name recentering: switching the problem one is solving rather than refining within the current one. This document offers a portfolio of design moves targeting the three capacities PSF locates at the site of the erosion (detection, judgment stock, braking) and the conditions under which the portfolio fails. The moves are derived from the existing PSF mechanism rather than imported from adjacent literatures, and are positioned against the three prevailing accounts of AI engagement in organizations: stable-evaluator preservation, convergence, and naturalization. Each of those accounts captures a real piece of the empirical picture. None reaches the site PSF identifies. The document is offered as a working draft for colleague reaction rather than a finished argument.

Section 1

Framing

Most current responses to AI engagement in organizations treat the problem as one of refinement. Practitioners are assumed to be solving roughly the right problem and to need help solving it better. AI literacy training teaches practitioners to use the tools more skillfully. Governance committees set policy for which tools are permitted under which conditions. Oversight reviews check outputs against existing criteria. Each presupposes that the practitioner's frame is roughly correct and intervenes within it.

The Proxy Seduction Framework describes a different failure. Engagement with AI changes which criteria the practitioner attends to, and erodes the capacity that would have detected the change. The interventions this framing calls for are not refinement aids. They are interventions on what Paul et al. (2026), in a computational treatment of individual agent problem-solving, name recentering: switching the problem one is solving rather than refining within the current one.

Literature move

Paul's apparatus is built at the individual cognitive level (agents modeled as POMDPs, tested on video-game tasks). PSF operates at practitioner, organizational, and field levels. What Design Moves borrows from Paul is the vocabulary of recentering at the individual practitioner level. The translation to organizational scale is the document's own contribution, not an assumption. The vocabulary travels. The formal apparatus does not.

Section 2

Detection

Detection is the capacity to notice that the criteria one is applying have shifted. It erodes because the post-engagement practitioner has no access to the pre-engagement baseline. What the practitioner was implicitly tracking before is now overwritten by what the workflow currently rewards. The METR perception-reality gap is detection failure at the individual level: developers forecast a 24% speedup before engagement, reported a 20% speedup afterwards, and were measured as 19% slower, a 39-point gap between post-engagement self-report and objective outcome. Practitioners cannot see the gap because the instrument that would have measured it is the same instrument that has been recalibrated by the engagement.

The moves on this capacity share one structural property: they restore an outside-the-frame comparison that the transformed self cannot provide for itself.

Drift detection from frozen baselines. A pre-engagement artifact (a preserved record of the practitioner's prior criteria, judgments, and attentional patterns) is held outside the workflow and consulted retrospectively to surface where current judgments have moved. The detector is not a validator. It does not judge whether the drift is sound. It timestamps the shift and narrows the question.
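A minimal sketch of the detector's logic, assuming the practitioner's criteria can be summarized as weights over named dimensions. That representation, the names Snapshot and detect_drift, the 0.15 threshold, and the sample numbers are all illustrative assumptions rather than part of PSF. The function reports which dimensions moved, by how much, and between which dates. It says nothing about whether the movement is sound.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """Criterion weights recorded at one point in time (hypothetical representation)."""
    taken_on: date
    weights: dict[str, float]  # dimension name -> relative weight


def detect_drift(baseline: Snapshot, current: Snapshot, threshold: float = 0.15) -> list[dict]:
    """Return one record per dimension whose weight moved by more than `threshold`.

    The detector only locates and dates the shift. It does not judge it.
    """
    shifts = []
    for dim in sorted(baseline.weights.keys() | current.weights.keys()):
        before = baseline.weights.get(dim, 0.0)
        after = current.weights.get(dim, 0.0)
        delta = after - before
        if abs(delta) > threshold:
            shifts.append({
                "dimension": dim,
                "delta": round(delta, 3),
                "baseline_date": baseline.taken_on,
                "observed_date": current.taken_on,
            })
    return shifts


# Made-up numbers for illustration only.
frozen = Snapshot(date(2024, 1, 15), {"correctness": 0.5, "maintainability": 0.3, "speed": 0.2})
today = Snapshot(date(2026, 1, 15), {"correctness": 0.4, "maintainability": 0.1, "speed": 0.5})
drift = detect_drift(frozen, today)
```

With these sample values the detector flags maintainability and speed and leaves correctness unflagged, because its change falls under the assumed threshold. Where the threshold should sit is an open empirical question, not something the sketch settles.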

Archaeological sourcing. The baseline is mined from artifacts the practitioner produced for unrelated reasons (code review comments, design docs, post-mortems, hiring rationales) rather than constructed for the purpose of being a baseline. Construction bias disappears because nothing was constructed. The limitation is coverage. Practitioners with thin paper trails are hard to baseline. Some criteria show up only in artifacts the practitioner considered private at the time.

Cohort comparison. Individual baselines capture personal drift but miss the case where the whole field is moving and the practitioner's relative position is stable. A cohort of un-engaged peers provides a field-level signal. The hard part is identifying that cohort before it shrinks to zero, which it will under field-level convergence pressure.

Asymmetric visibility. The practitioner sees only a directional drift flag (your weighting on this dimension has shifted) with no quantification and no underlying detail. An auditor sees the full comparison. Gaming becomes harder because the practitioner does not know what they are being compared against on any given case.
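The two views can be sketched as two functions over the same comparison. This is an illustration under assumptions: criteria are again represented as weights over named dimensions, and the function names, the 0.15 threshold, and the sample values are hypothetical. The point is only that the practitioner-facing output carries direction and nothing else.

```python
def auditor_view(baseline: dict[str, float], current: dict[str, float]) -> dict[str, dict[str, float]]:
    """Full comparison: baseline, current, and signed change per shared dimension."""
    return {
        dim: {
            "baseline": baseline[dim],
            "current": current[dim],
            "delta": round(current[dim] - baseline[dim], 3),
        }
        for dim in baseline
        if dim in current
    }


def practitioner_view(baseline: dict[str, float], current: dict[str, float],
                      threshold: float = 0.15) -> dict[str, str]:
    """Directional flags only: no magnitudes and no baseline values are exposed."""
    flags = {}
    for dim, row in auditor_view(baseline, current).items():
        if row["delta"] > threshold:
            flags[dim] = "up"
        elif row["delta"] < -threshold:
            flags[dim] = "down"
    return flags


# Made-up numbers for illustration only.
before = {"correctness": 0.5, "maintainability": 0.3, "speed": 0.2}
after = {"correctness": 0.4, "maintainability": 0.1, "speed": 0.5}
full = auditor_view(before, after)
flags = practitioner_view(before, after)
```

Here the auditor sees every value and delta, while the practitioner sees only that maintainability moved down and speed moved up.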

What this family does not do. It does not restore the ability to act on detected drift. Knowing your standards have moved does not tell you whether to move them back, and even if you decided to, the capacity to enact older criteria may have atrophied. Detection without the other two capacities can produce chronic awareness of drift with no leverage to act on it, which tends to resolve through rationalization rather than correction.

Section 3

Judgment Stock

Judgment stock is the accumulated capacity to recognize good and bad work, built through repeated engagement with cases and their consequences. It erodes through a more direct substitution than detection's. The practitioner's role shifts from doing the work to reviewing AI-produced outputs. Reviewing develops a different capacity than doing. The boundary activity performer learns to recognize patterns of acceptable output without developing the criteria the patterns were once a proxy for. Over time, the patterns become the criteria.

The moves on this capacity share one structural feature: they preserve cases where the practitioner has to build or exercise judgment without AI mediation.

Pre-engagement work to defined depth. The practitioner completes some specified portion of the task before engaging AI: a draft, a working hypothesis, a first pass at the analysis. The depth is set so that judgment is exercised but the engagement still produces speed gains. The risk is that the pre-engagement portion becomes ceremonial, a hurdle to clear rather than a capacity to build.

Rotation through unaided cases. Some defined fraction of the practitioner's caseload is reserved for unaided work, treated as capacity maintenance rather than productivity loss. The fraction has to be large enough to keep judgment stock from atrophying and small enough that the organization can absorb the speed cost. What counts as large enough is currently unknown and is one of the empirical questions Design Moves opens up.
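One way the reservation could be implemented is a deterministic assignment keyed on a case identifier, so that which cases are unaided is not chosen case by case. This is a sketch under assumptions: the function name, the salt, the case identifiers, and above all the 0.2 fraction are placeholders, since the right fraction is the open question named above.

```python
import hashlib


def is_unaided(case_id: str, fraction: float, salt: str = "rotation-v1") -> bool:
    """Deterministically reserve roughly `fraction` of cases for unaided work."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{case_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


# Hypothetical caseload; 0.2 is a placeholder, not a recommendation.
cases = [f"case-{i}" for i in range(1000)]
unaided = [c for c in cases if is_unaided(c, fraction=0.2)]
```

Because each case maps to a fixed bucket, raising the fraction later only adds cases to the unaided set and never removes any, which keeps the assignment stable while the fraction is being tuned.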

Inverse delegation. Junior practitioners do the AI-mediated work. Senior practitioners review without AI access. This inverts the usual delegation pattern in which seniors use AI and juniors absorb whatever is left. The aim is to keep senior judgment stock from being recalibrated by the proxy, and to preserve the apprenticeship pathway by which juniors will later need to develop their own judgment.

Apprenticeship redesign. The tacit learning that came from doing the work is captured explicitly before the doing gets automated. Case write-ups, decision logs, structured debriefs. The move is preservationist. It accepts that the doing will be lost and tries to render the learning recoverable from artifacts. The limitation is severe. Tacit knowledge is tacit precisely because it does not transfer cleanly to text. What survives the transfer is often the part that was already explicit.

What this family does not do. It does not address the institutional pressure to use AI even when doing so would atrophy capacity. Individual practitioners following these moves inside an organization that rewards speed and penalizes friction will find that their compliance erodes. The moves require institutional commitment, which means they require the third capacity to be intact.


Section 4

Braking

Braking is the capacity to slow, stop, or reverse a course of action when something seems wrong. It erodes through a mechanism that compounds the other two. Speed itself becomes the proxy under AI engagement. The AI provides plausible justifications for continuing in the face of weak signals. The cost of stopping rises as workflows reorganize around AI engagement, because pausing the AI loop now pauses everything downstream. The boundary activity performer who would have braked has lost both the signal that would have triggered the brake (detection erosion) and the standing to act on the signal even if it arrived (judgment stock erosion compounded by institutional pressure).

The moves on this capacity share one structural property: they separate the authority to brake from the pressure to proceed.

Institutional friction points. Specific stages in the workflow where braking is permitted, expected, or required. Not optional pauses but mandatory checkpoints where the question "should we continue" is asked under conditions that allow the answer to be no. The risk is that the checkpoints become ceremonial, particularly under speed pressure. Mitigation requires the checkpoints to have teeth: someone whose role rewards stopping, and a procedural pathway that does not penalize them for using it.

Reversibility requirements. Work must remain redoable without AI engagement. The requirement constrains what kinds of AI engagement are permitted: not the ones that produce outputs the organization cannot reconstruct, evaluate, or correct without further AI mediation. The move is upstream of braking. It preserves the conditions under which braking is still possible.

Authority distribution. The person with the authority to brake is not the person under speed pressure. The reviewer is structurally distinct from the producer and is evaluated on different criteria. This is harder than it sounds. Most organizations had collapsed producer and reviewer roles under cost pressure even before AI engagement. The move requires reversing a trend, not just adding a position.
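The structural separation can be sketched as a checkpoint that refuses to exist unless producer and reviewer are different people, and that accepts a continue-or-stop decision only from the reviewer. The class names, field names, and sample people below are hypothetical. The sketch covers only the separation of authority, not how the reviewer is evaluated or protected.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    STOP = "stop"


@dataclass(frozen=True)
class Checkpoint:
    stage: str
    producer: str
    reviewer: str

    def __post_init__(self):
        # Structural separation: the producer cannot also hold the brake.
        if self.producer == self.reviewer:
            raise ValueError("reviewer must be distinct from producer")

    def decide(self, decided_by: str, decision: Decision) -> Decision:
        # Only the designated reviewer may answer "should we continue".
        if decided_by != self.reviewer:
            raise PermissionError(f"{decided_by} has no braking authority at {self.stage}")
        return decision


# Hypothetical stage and people for illustration only.
gate = Checkpoint(stage="release", producer="ana", reviewer="ben")
outcome = gate.decide("ben", Decision.STOP)
```

In this sketch STOP is an ordinary return value rather than an error, reflecting the requirement that the answer to "should we continue" is allowed to be no.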

Independent review on a different clock. A review pathway that runs on a separate timeline from the production pathway, with explicit permission to take as long as the question requires. The independence is structural, not procedural.

What this family does not do. It does not address field-level convergence. When competitors have all engaged and stopping puts the organization out of business, internal braking moves cannot hold. The Barnesian performativity dynamic operates at the field level (Barnes, Callon, MacKenzie) and is not interruptible by organizational design alone.

Section 5

Conditions of Failure

The portfolio fails under four conditions worth naming.

First, field-level convergence outpacing organizational response. Once the field has reorganized around AI-mediated criteria, individual organizations face a stopping problem with no internal solution. PSF locates the most consequential erosion at the field level, and Design Moves at the organizational level can slow but not reverse it.

Second, construction bias. Any artifact built for the purpose of being a baseline, a reviewer, or a brake gets gamed once practitioners realize what it will be used for. The archaeological and asymmetric design constraints address this but do not eliminate it.

Third, capture of the boundary activity performer. If the person doing the work does not want the moves to succeed (because the moves slow them down, surface judgments they prefer left implicit, or reveal that current judgment has degraded), the moves will be subverted in ways that look like compliance. The detection moves are particularly exposed to this.

Fourth, detection without the other two capacities. Aware of drift, unable to act, rationalizing. This is plausibly worse than ignorance, because it produces the appearance of reflective practice while the substitution continues unimpeded.

References

Harvard style, alphabetical by first author. Verified against the PSF evidence constellation (May 2026). Entries marked [verify] are flagged in the constellation itself as needing primary-source retrieval before final citation.

Anderson, B. R., Shah, J. H. and Kreminski, M. (2024) 'Homogenization effects of large language models on human creative ideation', in Creativity and Cognition, ACM, Chicago, pp. 413-425. Available at: https://doi.org/10.1145/3635636.3656204

Au Quan, A. (2026) 'LinkedIn post responding to Fast Company on AI and human work', LinkedIn, 10 April.

Axios AI+ Government (2026) 'AI+ Government newsletter', Axios, 10 April.

Bainbridge, L. (1983) 'Ironies of automation', Automatica, 19(6), pp. 775-779. Available at: https://doi.org/10.1016/0005-1098(83)90046-8

Bastani, H., Bastani, O. and Sungu, A. (2025) 'Generative AI without guardrails can harm learning: evidence from high school mathematics', Proceedings of the National Academy of Sciences, 122. Available at: https://doi.org/10.1073/pnas.2422633122

Bean, A. M., Payne, R. E., Parsons, G., Kirk, H. R., Ciro, J., Mosquera-Gómez, R., Hincapié M, S., Ekanayaka, A. S., Tarassenko, L., Rocher, L. and Mahdi, A. (2026) 'Reliability of LLMs as medical assistants for the general public: a randomized preregistered study', Nature Medicine, 32, pp. 609-615. Available at: https://doi.org/10.1038/s41591-025-04074-y

Beane, M. (2019) 'Shadow learning: building robotic surgical skill when approved means fail', Administrative Science Quarterly, 64(1), pp. 87-123. Available at: https://doi.org/10.1177/0001839217751692

Beane, M. (2024) The Skill Code: How to Save Human Ability in an Age of Intelligent Machines. New York: HarperBusiness.

Beane, M. and Anthony, C. (2024) 'Inverted apprenticeship: how senior occupational members develop practical expertise and preserve their position when new technologies arrive', Organization Science, 35, pp. 405-431. Available at: https://doi.org/10.1287/orsc.2023.1688

Bedard and Kropp (2026) AI implementation framework, Boston Consulting Group. [verify: full author initials and exact title pending source retrieval]

Brynjolfsson, E. (2025) Public commentary on AI productivity and the Turing Trap.

Brynjolfsson, E., Li, D. and Raymond, L. (2025) 'Generative AI at work', The Quarterly Journal of Economics, 140, pp. 889-942. Available at: https://doi.org/10.1093/qje/qjae044

Cabantous, L. and Gond, J.-P. (2011) 'Rational decision making as performative praxis: explaining rationality's éternel retour', Organization Science, 22(3), pp. 573-586. Available at: https://doi.org/10.1287/orsc.1100.0534

Callon, M. (1984) 'Some elements of a sociology of translation: domestication of the scallops and the fishermen of St Brieuc Bay', The Sociological Review, 32, pp. 196-233. Available at: https://doi.org/10.1111/j.1467-954X.1984.tb00113.x

Callon, M. (2007) 'What does it mean to say that economics is performative?', in MacKenzie, D., Muniesa, F. and Siu, L. (eds.) Do Economists Make Markets? On the Performativity of Economics. Princeton: Princeton University Press.

Cruces et al. (2026) 'Scaffolded, not internalized', NBER Working Paper No. 34851. Cambridge, MA: National Bureau of Economic Research. [verify: full author list pending source retrieval]

Dell'Acqua, F., Ayoubi, C., Lifshitz-Assaf, H., Sadun, R., Mollick, E. R., Mollick, L., Han, Y., Goldman, J., Nair, H., Taub, S. and Lakhani, K. R. (2026) 'Navigating the jagged technological frontier: field experimental evidence of the effects of artificial intelligence on knowledge worker productivity and quality', Organization Science, Articles in Advance. Available at: https://doi.org/10.1287/orsc.2025.21838

de la Croix (2026) AI effects diagnosis, King's College London. [verify: full citation pending source retrieval]

Doshi, A. R. and Hauser, O. P. (2024) 'Generative AI enhances individual creativity but reduces the collective diversity of novel content', Science Advances, 10. Available at: https://doi.org/10.1126/sciadv.adn5290

Endsley, M. R. (2023) 'Ironies of artificial intelligence', Ergonomics, 66, pp. 1656-1668. Available at: https://doi.org/10.1080/00140139.2023.2243404

Faulkner, P. and Runde, J. (2019) 'Theorizing the digital object', MIS Quarterly, 43(4), pp. 1279-1302. Available at: https://doi.org/10.25300/MISQ/2019/13136

Fernandes, D., Villa, S., Nicholls, S., Haavisto, O., Buschek, D., Schmidt, A., Kosch, T., Shen, C. and Welsch, R. (2026) 'AI makes you smarter but none the wiser: the disconnect between performance and metacognition', Computers in Human Behavior, 175. Available at: https://doi.org/10.1016/j.chb.2025.108779

Foss, N. (2026) 'AI isn't a rationalization machine, it's a motivation amplifier', Notes from a Strategy Scholar (substack), 17 April.

Frey, C. B. (2026) 'Public commentary on Shah & Levy', LinkedIn, May.

Grennan (2026) 'Your company has an AI PR problem', AI Mindset newsletter, 10 April.

Hallowell (2026) 'Multi-agent personas as workflow architecture', public commentary, March. [verify: full citation pending source retrieval]

Humlum, A. and Vestergaard, E. (2025) 'Large language models, small labor market effects', American Economic Review. [verify: volume, issue, pages]

Hussain (2026) Institutional adaptation analysis, Brookings Institution. [verify: full citation pending source retrieval]

Kang, X. and Kim, H. (2025) 'Machine predictions and causal explanations: evidence from a field experiment', Organization Science. Available at: https://doi.org/10.2139/ssrn.5520499

Krueger, D. and Sigman, A. (2026) Bitcoin One Million, Table 14.1.

Leonardi, P. M. (2011) 'Flexible routines meet flexible technologies', MIS Quarterly, 35(1), pp. 147-167.

Leonardi, P. M. and Leavell, V. (2026) 'Knowing enough to be dangerous: the problem of "artificial certainty" for expert authority when using AI for decision making and planning', Organization Science, Articles in Advance. Available at: https://doi.org/10.1287/orsc.2023.18224

List, J. (2026) Public commentary, field-experimental productivity framing. [verify: full citation pending source retrieval]

Liu et al. (2026) 'Causal evidence of rapid evaluative capacity erosion', preprint, arXiv:2604.04721. [verify: full author list and title pending source retrieval]

MacKenzie, D. (2006) 'Is economics performative? Option theory and the construction of derivatives markets', Journal of the History of Economic Thought, 28, pp. 29-55. Available at: https://doi.org/10.1080/10427710500509722

Meincke, L., Nave, G. and Terwiesch, C. (2025) 'ChatGPT decreases idea diversity in brainstorming', Nature Human Behaviour, 9, pp. 1107-1109. Available at: https://doi.org/10.1038/s41562-025-02173-x

Messeri, L. and Crockett, M. J. (2024) 'Artificial intelligence and illusions of understanding in scientific research', Nature, 627, pp. 49-58. Available at: https://doi.org/10.1038/s41586-024-07146-0

METR (2025) Measuring AI impact on developer productivity. Available at: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Mollick, E. (2025) Public commentary on AI augmentation, Wharton.

Moon, K., Suh, J. and Lim, S. (2025) 'Convergence robust to mitigation', Management Science.

Nguyen, C. T. (2020) Games: Agency as Art. New York: Oxford University Press; with related work in (2024) Journal of Ethics and Social Philosophy, 27(3); and (2026) The Score. New York: Penguin.

Paul, L. A. (2014) Transformative Experience. Oxford: Oxford University Press. Available at: https://doi.org/10.1093/acprof:oso/9780198717959.001.0001

Paul, L. A., Mills, T., Ullman, T. D., De Freitas, J., Colas, C. and Tenenbaum, J. B. (2026) 'Reverse engineering the centered self', Psychological Review, 133(4), pp. 919-956. Available at: https://doi.org/10.1037/rev0000623

Ranganathan and Ye (2026) Practitioner-facing diagnosis of AI's skill effects, Harvard Business Review. [verify: full author initials, title, issue pending source retrieval]

Rathje, S., Ye, X., Globig, L., Pillai, A., Oldemburgo de Mello, G. and Van Bavel, J. J. (2025) 'Sycophantic AI increases attitude extremity and overconfidence', PsyArXiv preprint. Available at: https://doi.org/10.31234/osf.io/vmyek_v1 (Invited revision at Nature.)

Shaw, S. D. and Nave, G. (2026) 'Thinking — fast, slow, and artificial: how AI is reshaping human reasoning and the rise of cognitive surrender', SSRN preprint 6097646.

Shen, J. H. and Tamkin, A. (2026) 'How AI impacts skill formation', preprint, arXiv. Available at: https://doi.org/10.48550/arXiv.2601.20245

Simkute, A., Tankelevitch, L., Kewenig, V., Scott, A. E., Sellen, A. and Rintel, S. (2024) 'Ironies of generative AI: understanding and mitigating productivity loss in human-AI interaction', International Journal of Human-Computer Interaction. Available at: https://doi.org/10.1080/10447318.2024.2405782

Stimmler (2026) Public commentary, right-shape diagnosis of AI engagement effects. [verify: full citation pending source retrieval]

Sziebert, C. (2026) Public commentary on AI engagement and the 18-Month Wall, Google Cloud AI. [verify: full citation pending source retrieval]

Thornton, P. H., Ocasio, W. and Lounsbury, M. (2012) The Institutional Logics Perspective: A New Approach to Culture, Structure and Process. Oxford: Oxford University Press. Available at: https://doi.org/10.1093/acprof:oso/9780199601936.001.0001

Williams, M.-A. (2026) 'Agentic AI bluffs by design', Stanford CodeX talk deck, May.

Williams, M.-A. (2026) 'AI agents as colleagues: the workplace design nobody's planning for', UNSW BusinessThink, 1 April.

Workday (2026) Beyond productivity: measuring the real value of AI, Workday, January.

Yegge, S. (2026) '8-level coder evolution', blog post, January.