Ben, VP of UX, Zendesk
Coffee with Ben opened on the moving definition of "good" under AI-driven productivity claims. Ben offered a Jevons framing on capacity and described a hybrid pricing shift toward outcome measures. When asked how teams still know what is good, his answer came as an aha followed by a give in the same breath, which reads as PSF unfolding in real time in conversation.
Ben framed the AI choice organizations face as "do more with less" or "do more with more," with Zendesk firmly on the latter. AI streamlines work that previously took longer, which frees existing talent to take on new initiatives rather than shrinking headcount. Efficiency unlocks new potential, so demand expands faster than automation alone would reduce it.
Zendesk is moving away from pure seat-based pricing toward a hybrid model. Outcomes (ticket resolution volume, resolution quality) form one component. Seat-based allocation persists for tier-one, high-net-worth customers. The direction of travel is not CSR replacement by AI but a blend of seat value and resolution value, which Ben said many customers are actively asking for. He flagged the pricing shift as the biggest takeaway of the meeting.
Ben cited 10x productivity jumps across multiple areas inside the company.
Asked how teams know their output is any good under 10x conditions, Ben answered that the quality bar is itself in motion.
The goalposts move on both the outcomes being measured and the definition of quality relative to prior norms.
Ben delivered the moving-goalposts recognition as an aha followed by a give in the same breath. He saw the phenomenon and granted it as the new condition without friction. Recognition came paired with normalization rather than followed by it.
The phrasing is close to a textbook PSF description of evaluative capacity in flux, delivered in the normative register (progress rather than loss). On this evidence Ben reads as a participant-narrator of the phenomenon rather than a foil. This reading is worth retesting in future contact.
One dimension (elegance) is downgraded because another (measurable, shippable value) is foregrounded. The PSF question is whether the capacity to judge elegance still exists in the team, or whether "matters less" has quietly slid into "cannot be judged."
Productivity proxies are legible and rising while the evaluative ground is described as moving. The word "moving" is doing rhetorical work that "dropped" could not. It is worth probing whether the fluidity is genuine recalibration or a cover term for capacity loss.
The move from seat-based to outcome-keyed (resolution volume, resolution quality) hardens the proxy into the commercial contract. Once pricing is outcome-keyed, the definition of resolution quality becomes materially consequential and field-forming. This is a thread for the institutional logics stream.
The recognition-without-resistance sequence is the mechanism in real time. The aha is the flux acknowledgement. The give is the absorption into the new normal. It is worth watching for this exact two-beat pattern in the interview phase, since it names the transition from registering evaluative drift to naturalizing it.
The J-curve measurement-timing frame and Ben's "good in motion" frame share an assumption PSF problematizes: productivity is up, and quality will catch up in time. Sending Brynjolfsson first and the PSF paper after does some of the positioning work on its own.
Email sent on the morning of 22 April with the Brynjolfsson, Li, and Raymond paper (QJE 2025), emphasizing the findings that sit beneath the headline result: the expert-agent decline in resolution rate and CSAT, adherence rising even as AI suggestions marginally degraded conversation quality, and the outage analysis showing agents could not revert to pre-engagement performance. The full text is preserved below.
The email raised a specific question about telemetry around overrides and declines, and which signal QA treats as authoritative when handle time, CSAT, and adherence disagree. The probe is framed as a collegial question rather than a critique. Ben's answer is worth tracking if one comes.
The email lists two asks in descending order of lift. First, if Ben hears any internal reaction to the Brynjolfsson paper, a read on how it lands with a product audience would be useful. Second, an informal introduction to someone in his UX research orbit who talks with agents or team leads regularly would help the empirical phase. The email was explicit that no Zendesk-official customer access is being sought.
Preprint share to follow once the journal provisionally accepts the paper. The framework was introduced verbally at the coffee, so the paper would fill in the detail and set up the next conversation.
Ben was explicit about keeping the thread open. Propose a cadence (monthly or quarterly) after the Brynjolfsson exchange lands.
Explore a Google alumni reunion of folks now placed in significant industry roles, as a way to convene like-minded people on these questions.
The Cambridge invitation stands if Ben visits the UK. His Pune plans remain an open thread on his side, to coordinate once dates firm up.
Outgoing correspondence: Brynjolfsson share (22 Apr 2026)
Thank you for the coffee conversation and for the offer to keep the thread going.
Sharing the customer service study I mentioned: Brynjolfsson, Li, and Raymond, "Generative AI at Work," Quarterly Journal of Economics, 140(2), 889–942 (2025). The headline finding most people pick up is the 15% average productivity gain and the novice uplift. The parts I find more interesting sit underneath that. Expert agents showed small but statistically significant declines in resolution rate and customer satisfaction, and their adherence to AI suggestions continued to climb even as those suggestions marginally degraded the quality of their conversations. None of that surfaced through the firm's own evaluation processes, managers, or the agents themselves. It came out of the researchers' econometric analysis after the fact. The outage analysis is the other piece worth reading closely: when the system went down, agents could not revert to pre-engagement performance.
For UX specifically, the adherence finding raises a question I keep coming back to. When an agent accepts a suggested reply, the interface reasonably treats acceptance as a positive signal, but the Brynjolfsson data suggests acceptance can rise while quality falls, with the acceptance metric doing some of the work of concealing the shift. I would be curious how you think about the telemetry around overrides and declines, and which signal QA treats as authoritative when handle time, CSAT, and adherence disagree.
Two things would be genuinely useful, in descending order of lift, no pressure on either. First, if you come across the paper in any internal discussion and hear something worth me knowing about how it lands with a product audience, I would love the read. Second, if there is ever a natural moment to point me toward someone in your UX research orbit who talks with agents or team leads regularly, that would help the empirical phase of the PhD considerably. Not looking for Zendesk customer access or anything that would need legal or comms approval, just the informal kind of introduction where someone might be open to sharing observations about how the work has changed.
Either way, I'm happy to keep sending things as they come up, and don't hesitate to send me questions or comments. Once my journal paper is provisionally accepted, I'll also be sure to share that along as a preprint. And if you're planning a visit or stopover in the UK, you must come to Cambridge. Also, definitely let me know when your plans to go to Pune become concrete. I'm also cc'ing my Cambridge email address so you have my official uni email address as well, in case it's better, I can switch to that for future comms, let me know.