Study 1 · Fieldwork Instrument

Coding Rubric

The falsification spine states, at the level of the study, what would sink each research question. This rubric operationalizes it at the level of the transcript: what a coder treats as confirming the mechanism, what counts as disconfirming, and, where the two are easily confused, the exact feature that separates them.

Applied: in analysis only
Never: visible to participants or used to sort participants
Spine: Position 1/2/3 discrimination test
Source: Study 1 Research Design

The discrimination test: Position 1, 2, 3

A single discipline runs through all three RQs and must be stated before the rubric itself. How an accountable criterion appears in the transcript places it in one of three positions, and that position governs everything below. Coders mark every criterion-reach as Position 1, 2, or 3 before applying anything else. A slide from Position 1 toward Position 3 across a transcript is itself a measure of how displaced the participant's judgment stock is.

Position	How the criterion appears	What it evidences
Pos 1 unprompted	The participant volunteers the criterion with no case supplied.	Judgment stock is live and undisplaced.
Pos 2 self-supplied real case	The participant produces a real lived example (satisfying work in Move 1, a clean-looking failure in Move 2) and the criterion surfaces as they narrate it.	Judgment stock is reachable and reasonably live. Weaker than Pos 1 (the criterion came with an occasion, not spontaneously), stronger than Pos 3 (the occasion was the participant's own).
Pos 3 interviewer-supplied hypothetical	The criterion appears only after the interviewer springs a case.	Reachability-under-probe. Confirms proxy seduction (latent but retrievable when occasioned). This is exactly what separates proxy seduction from skill atrophy, where the criterion does not appear even at Position 3.

The line that separates the two competing explanations

Under proxy seduction the criterion is latent but retrievable when a concrete case is supplied, so it surfaces at Position 3. Under skill atrophy the criterion does not appear even at Position 3. The probe is built so that the difference is visible in the transcript rather than asserted.
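The discrimination test can be sketched as a small classifier. This is a minimal illustration, not part of the instrument: the `Position` enum, `most_spontaneous`, and `interpret` are hypothetical names, and the sketch assumes that a criterion's reading for a participant is taken from the most spontaneous position at which it appeared, with an empty list meaning the criterion never surfaced, even after the interviewer supplied a case.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Position(IntEnum):
    """Rubric positions, numbered from most to least spontaneous."""
    UNPROMPTED = 1          # criterion volunteered with no case supplied
    SELF_SUPPLIED_CASE = 2  # criterion surfaces while narrating the participant's own real case
    INTERVIEWER_CASE = 3    # criterion appears only after the interviewer springs a case


def most_spontaneous(reaches: Iterable[Position]) -> Optional[Position]:
    """Lowest-numbered position among the coded reaches, or None if there were none."""
    coded = list(reaches)
    return min(coded) if coded else None


def interpret(reaches: Iterable[Position]) -> str:
    """Map a criterion's coded reaches to the reading the rubric attaches to it."""
    pos = most_spontaneous(reaches)
    if pos is Position.UNPROMPTED:
        return "live and undisplaced"
    if pos is Position.SELF_SUPPLIED_CASE:
        return "reachable and reasonably live"
    if pos is Position.INTERVIEWER_CASE:
        return "proxy seduction (latent, retrievable when occasioned)"
    # No reach at all, even after the interviewer supplied a case.
    return "skill atrophy (absent even at Position 3)"
```

For example, `interpret([Position.INTERVIEWER_CASE])` returns the proxy-seduction reading, while `interpret([])` returns the skill-atrophy reading: the same boundary the probe is built to make visible.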

RQ1 (judgment): does consequence exposure protect, and where does it not?

Confirming signature

A high-consequence-exposure participant reaches, unprompted, for accountable criteria on the dimensions their formation covers (the work held under conditions the tests did not cover; it solved the actual downstream problem; a named past failure taught them what to check). The same participant shows the protection running out on the dimensions the AI-engaged workflow now makes more legible, such as the pace or volume of AI-mediated work. This is the METR workflow-speed pattern: even an experienced participant cannot bring consequence-based judgment to bear on those dimensions. Selective protection, not general protection, is the confirming pattern.

Disconfirming, Grain 1 · between-cell (the population-level falsification condition)

Across the sample, high-consequence-exposure participants show no detectable difference from AI-native participants in how readily they reach for accountable criteria or in how the quality probe resolves. If veterans and newcomers are indistinguishable at the population level, judgment stock does no protective work as a population property, and RQ1's comparative claim fails regardless of any individual transcript. This is the primary condition, and the sampling design is built to power exactly this contrast.
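The rubric does not name the statistic behind "no detectable difference." As one illustration, assume each participant is scored as a binary (reached accountable criteria at Position 1, or did not); the between-cell contrast then reduces to a 2x2 table. The sketch below applies Fisher's exact test, swapped in here as an assumption because interview cells are small and the test needs no large-sample approximation. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cells (high-consequence-exposure, AI-native); columns are
    participants who did / did not reach accountable criteria at Position 1.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the first row holds x of the col1 hits.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table with the same margins that is no more likely than the
    # observed one; the small tolerance absorbs floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
```

A large p-value means only that no difference was detected at this sample size, not that the cells are equivalent, which is why this condition depends on a sampling design powered for the contrast.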

Disconfirming, Grain 2 · within-transcript (the case-level counter-instance)

A high-consequence-exposure participant is a counter-instance to the erosion claim for that case if they reach fluently and specifically for accountable criteria at Pos 1 across all three moves: volunteering real downstream consequences with no case handed to them, distinguishing proxy from criterion on their own, and articulating a current, live standard behind "done." A participant who reaches the criterion at Pos 2 is not a clean counter-instance: self-supplied reachability is partial confirmation that the stock is live, not disconfirmation. The counter-instance requires Position 1. Such cases are expected to be rare; they are reported as disconfirming cases and counted against the cell.

The negative case, defined in advance

An un-eroded, high-judgment-stock practitioner is one who (a) reaches for accountable criteria unprompted in Move 1, (b) produces a genuine passed-every-check-and-still-failed instance in Move 2 on their own, narrating how they knew independent of any check, and (c) in Move 3 holds a standard for "done" distinct from "everything passed" that they can show is current, not a war story from years ago. All three, unprompted, is the negative case. Meeting it is what a transcript must do to count against erosion for that participant.
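The negative case is a conjunction, which a short check makes explicit. In this sketch `MoveCoding` and `is_negative_case` are hypothetical names, and each move is assumed to be coded with whether its required feature ((a), (b), or (c) above) was met and the position at which it surfaced. Following the Grain 2 rule, only Position 1 qualifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoveCoding:
    """Hypothetical coding record for one interview move."""
    feature_met: bool         # the move's required feature: (a), (b), or (c)
    position: Optional[int]   # 1, 2, or 3 where it surfaced; None if it never did


def is_negative_case(move1: MoveCoding, move2: MoveCoding, move3: MoveCoding) -> bool:
    """True only when all three features are met, each at Position 1 (unprompted)."""
    # A Position 2 reach is partial confirmation that the stock is live, not
    # disconfirmation, so it does not qualify; neither does Position 3.
    return all(m.feature_met and m.position == 1 for m in (move1, move2, move3))
```

Any single failure, such as a Move 3 standard that turns out to be a war story from years ago rather than a current one, makes the check return False.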

RQ2 (propagation): do elevated metrics travel?

Confirming signature

A boundary-activity participant's account shows engagement-elevated metrics institutionalizing into dashboards, reviews, or rewards, while accountable criteria with no legible channel upward drop out at the handoff. This shows up in two ways: as a gap between what the participant tracks and what they pass along (Move 1), and as a persisting label ("a good review," "done") whose operational content has migrated to the proxy metric while the label itself stays put (adaptive instability). At each handoff, either the criterion the metric was meant to track survives in the person carrying it upward, or only the metric remains.

Disconfirming

Accountable criteria travel upward intact alongside the elevated metrics, with no systematic drop at the handoff, and labels still carry the operational content they originally named. If what gets passed up preserves the criterion rather than substituting the proxy for it, the propagation claim does not hold.
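The two RQ2 signals can be read off a single account with set and mapping operations. This sketch assumes the coder has already listed the accountable criteria a participant tracks, what they pass upward, and, for each persisting label, its original and current operational content; `dropped_at_handoff` and `drifted_labels` are hypothetical names. It works on one participant only: whether a drop is systematic is a judgment across the sample, which the sketch does not attempt.

```python
def dropped_at_handoff(tracked: set[str], passed_up: set[str]) -> set[str]:
    """Criteria the participant tracks but does not pass upward (the Move 1 gap)."""
    return tracked - passed_up


def drifted_labels(labels: dict[str, tuple[str, str]], proxies: set[str]) -> list[str]:
    """Labels whose operational content has migrated to a proxy metric.

    `labels` maps each persisting label to (original content, current content).
    """
    return sorted(
        label
        for label, (original, current) in labels.items()
        if current != original and current in proxies
    )
```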

Frame separation

The rubric is applied in analysis only. It is never visible to participants and never shapes recruitment, consistent with the screening-frame / coding-frame separation set out in the Study 1 design. The screening frame sorts on observables, the coding frame holds the prediction, and the two never mix, which is what keeps a thin cell reportable as a finding about the population rather than a recruitment failure.

Derived from PSF Fieldwork: Study 1, Coding Rubric: Confirming and Disconfirming Evidence. See the Interview Protocol for the moves these positions are read from, and the Study 1 Research Design for the cell structure and falsification spine.