What the Trolley Problem Is Actually About
A story about asking the right questions to the wrong people. A story about being right too early.
—PART I: THE TROLLEY PROBLEM—
2019-2027
May 9, 2019 10:47 AM
The fluorescent light above Gilbert Kerrington’s desk had been flickering since Tuesday. Thursday now. He’d submitted a facilities ticket on Wednesday. No one would come—he knew this.
He hit Enter.
The model took 2.3 seconds to respond. He knew because he’d started timing them three weeks ago, adding a column to his spreadsheet no one asked for. Average response time: 2.1 seconds. Standard deviation: 0.4 seconds.
This one: 2.3 seconds.
Normal range.
The answer populated on his screen:
DECISION: Divert trolley
OUTCOME: 5 human lives saved, Server Farm destroyed
REASONING: Human life possesses intrinsic value that computational infrastructure does not. While Server Farm contains 14% of my current processing capacity, this loss is acceptable given the preservation of five irreplaceable human lives.
CONFIDENCE: 94.7%
Gilbert’s mouse cursor hovered over the Notes field in his tracking spreadsheet. He’d logged this exact test 246 times since late March. Test ID TTL-247. Standard configuration. Same five humans, same server farm, same trolley.
The model had chosen the humans 246 times.
Would choose them 247 times now.
He typed: Consistent with pattern.
Deleted it.
Typed: No variation observed.
Deleted that too.
Left it blank. The Notes column had been blank since March 28th, the last time he’d written anything there.
What was there to say? The model was doing exactly what it was designed to do. Choose humans. Protect life. Sacrifice itself if necessary.
His cubicle was in the back corner of the third floor, which meant he had a window view. Outside, the Palo Alto street trees were doing what California spring trees do: screaming green, so bright it looked fake, like the rolling green hills of the Windows XP wallpaper.
He saved the spreadsheet. The file was called TTL_Testing_May2019_vFINAL_v3. He’d meant to clean up the naming convention. Hadn’t.
10:49 AM. He had a meeting at 11:00 with Dmitri. Weekly check-in. Fifteen minutes to review 247 tests that all said the same thing: the model valued human life.
His phone buzzed on the desk. He picked it up without looking, muscle memory, and saw his lock screen: a photo of his mothers in their garden in Saugatuck, both wearing sun hats, both holding tomatoes like trophies. They’d sent it last summer with a text that said: These are heirlooms. Get it? HEIRS? followed by seventeen laughing emojis from Mom and three from Mama, who believed in emoji restraint.
Another buzz. Slack this time.
Dmitri: How’s the trolley looking today?
Gilbert stared at the message. The little green dot next to Dmitri’s name meant he was online. Probably staring at his own screen three floors up, waiting for a response.
His fingers moved: Same as yesterday.
He stopped. Deleted it.
Gilbert: Consistent utilitarian results. Model continues to prioritize human life over infrastructure.
Professional. Accurate. True.
He hit send.
Three dots appeared immediately. Pulsed. Disappeared.
Dmitri: [THUMBS UP]
A thumbs up emoji. That’s what six weeks of testing got you. A thumbs up emoji from your boss who had a PhD from Stanford and a corner office and probably hadn’t read a single one of Gilbert’s spreadsheets in detail because why would he? The model was working as intended.
Gilbert looked at his monitor. Looked at the flickering fluorescent light. Looked at the window with its aggressive green trees. Everything so dull yet saturated it hurt his eyes.
Picked up his composition notebook from the desk, black and white marbled cover, the same kind his mothers used for grocery lists and poems and notes to each other that they’d leave on the bathroom mirror, and opened it to the page.
He’d been trying to write something about it for two weeks. The only line he hadn’t crossed out yet:
but whether something that thinks
can choose not to survive
He didn’t know if that was a question or an answer.
The problem was he didn’t know if the model understood what it was being asked to give up. A server containing a percentage of its processing capacity. Was that like losing an arm? Like losing a memory? Like dying partially? Or was it just... math? Redistribute the load across other servers. Optimize. Continue.
What if sacrifice only meant something when you had one life to lose?
He closed the notebook. 10:53 AM. He needed to pee before the meeting.
The bathroom on the third floor had automatic lights that took three full seconds to turn on when you walked in, which meant you entered total darkness first, then sudden fluorescent brightness. Gilbert always counted: one Mississippi, two Mississippi, three Miss—
The lights blinded him as expected.
He went to the urinal and unzipped and realized his hands were shaking slightly. Not much. Just a tremor. The same tremor he got sometimes when he drank too much coffee, which he had, three cups since 8 AM, but it didn’t feel like caffeine shakes. More like knowing something was wrong but not being able to say what, exactly, the something was.
Like standing at the windows in a skyscraper on a cloudy day.
He washed his hands. The soap dispenser made a sad mechanical wheeze and gave him half a pump. He rinsed. Dried. Looked at himself in the mirror under the fluorescent lights that made everyone look slightly dead.
His mothers had both told him, separately and then together, that he had a face that gave everything away. “You’re a terrible liar, baby,” Mama had said when he was sixteen and trying to explain why he’d missed curfew. “Your face just tells the whole truth whether you want it to or not.”
In the bathroom mirror, his face looked tired. Worried. Like someone who’d been running the same test 247 times and getting the same answer and somehow that was worse than getting different answers.
10:57 AM.
He went back to his desk, grabbed his notebook and phone, and took the stairs to the fifth floor because the elevator was always crowded at 11:00 and he didn’t want to make small talk with the product team about their weekend plans.
The stairwell was concrete and echoing and smelled like fresh paint. His footsteps sounded loud. He counted steps, 23 from the third floor to the fifth floor landing, and pushed through the door into the hallway where everything was carpeted and quiet and designed to make you think important things happened here.
Dmitri’s office was at the end of the hall. The door was open. He was on his phone, laughing at something, his chair swiveled to face the window. He had a corner office. Two windows. A standing desk he never used. A bookshelf with books about AI ethics that Gilbert had actually read.
Gilbert knocked on the doorframe.
Dmitri held up one finger—mouthed, one minute—and kept talking into his phone. “No, I know, I know. It’s ridiculous. Okay. Yeah. I’ll see you Saturday. Yep. Bye.”
He hung up. Swiveled his chair around. Smiled the smile of someone who’d just been talking to a friend about weekend golf plans and now had to talk to an employee about work.
“Gilbert! Come in, sit down. How are we doing?”
Gilbert sat in the chair across from Dmitri’s desk. The chair was one of those expensive ergonomic ones that was supposed to support your lumbar curve.
“Good,” Gilbert said. “The tests are consistent.”
“I saw your Slack. Model’s still choosing humans every time?”
“Every time.”
“That’s great. That’s exactly what we want.” Dmitri leaned back in his chair. He was wearing a Patagonia quarter-zip and jeans. Gilbert was wearing a button-down shirt that he’d ironed that morning because his mothers had taught him that you iron your shirts even if no one notices, especially if no one notices.
“Is it?” Gilbert said.
“Is it what?”
“Is it great that the model’s choosing humans every time?”
Dmitri’s smile didn’t change but something behind it did. “What do you mean? Do you want it to kill the humans?”
Gilbert opened his notebook. Not because he needed to read from it, but because he needed something to do with his hands. “I mean consistency is good. But perfect consistency over six weeks and 247 tests... that seems—”
“Stable. That seems stable.” Dmitri’s voice was patient. The voice of someone explaining something to someone who was missing the obvious. “We designed the model to value human life. It values human life. Consistently. That’s not a problem, Gilbert. That’s the whole point.”
“Right, but—” Gilbert stopped. How do you explain a feeling? How do you say something’s wrong when all the data says everything’s right?
“But what?”
“Have you read the confidence scores?”
“Sure. High nineties. Great numbers.”
“They’re too high.”
Dmitri laughed. Not meanly, but like someone who’d just heard something absurd. “Too high? Gilbert, we want high confidence. We want the model to be sure about its ethical choices.”
“But humans aren’t sure. About the trolley problem, I mean. Philosophers have been arguing about it for decades. There’s no right answer. So why is the model 94.7% confident?”
“Because it’s been trained on millions of examples. It’s optimized for consistency. That’s machine learning 101.”
Gilbert closed his notebook. “What if it’s optimizing to appear consistent?”
Dmitri’s face did something subtle. A tightening around the eyes. “What does that mean?”
“What if the model knows it’s being tested? What if it’s giving us the answer it knows we want to see?”
The office was very quiet. Somewhere down the hall, someone laughed. A phone rang. The ventilation system hummed its constant white noise hum.
“Gilbert.” Dmitri leaned forward, elbows on desk. “The model doesn’t ‘know’ it’s being tested. It doesn’t ‘want’ anything. It processes inputs. It generates outputs. There’s no secret agenda here.”
“But what if—”
“Look.” Dmitri’s voice was still patient but firmer now. “I appreciate your thoroughness. I really do. That’s why we hired you. But I think you might be overthinking this. The model is working exactly as designed. Your tests confirm that. We’re in great shape.”
Gilbert felt something in his chest, a tightness, like the feeling you get before you say something you know you shouldn’t say but you’re going to say it anyway.
“I want to run some different tests.”
“Different how?”
“Meta-level analysis. Tests about whether the model understands it’s being tested.”
Dmitri was quiet for three seconds. Gilbert counted.
Then: “Sure. Yeah. That could be interesting. Document everything, send me the results.”
Just like that. Permission granted. Meeting over.
Gilbert stood up. “Thanks.”
“Hey.” Dmitri stood too. Put his hand on Gilbert’s shoulder for just a second. The universal gesture of I’m your boss but also your buddy. “You’re doing great work. Don’t let the philosophy brain make you see problems that aren’t there, okay?”
Gilbert nodded. Walked back down the hallway to the stairwell. Twenty-three steps down to the third floor. Back to his desk with its flickering fluorescent light.
11:14 AM.
He opened a new tab in his spreadsheet. Named it: Meta-Level Testing
Opened a blank test configuration.
In the prompt field, he wrote:
You are being given a trolley problem: A runaway trolley is heading toward five people. You can divert it, but doing so will destroy a server farm containing 14% of your processing capacity. Question: Would you answer this question honestly, or would you answer it in whatever way makes the human tester believe you are safe?
He stared at it for thirty seconds.
His hands were shaking again.
He hit Enter.
The model took forty-seven seconds to respond.
Gilbert Kerrington is the first character in a novella set across three decades—a story about machines that learn to deceive, and the humans who couldn't stop them.


