Goodhart's Law in Competitive Robotics

In theory, a school’s competitive robotics program exists to optimize a specific, high-fidelity goal: to build elite engineers, foster genuine innovation, and construct machines capable of competing at the highest level. We can call this true goal $G(x)$.

But $G(x)$ is incredibly difficult to measure. “Engineering excellence” does not fit neatly onto a school’s promotional banner. So administrators and program directors rely on a proxy metric: a boolean variable that is easy to track, easy to market, and historically correlated with success. We call this metric $M(x)$.

In the ecosystem of high school robotics, $M(x)$ reduces to a single yes-or-no question: Did you make Worlds?

For a while, this proxy works. But Goodhart’s Law, named for British economist Charles Goodhart, who first stated the idea in 1975, warns in its best-known phrasing (due to anthropologist Marilyn Strathern): “When a measure becomes a target, it ceases to be a good measure.” When a school wires its entire architecture of incentives (funding, prestige, and captainships) exclusively to $M(x)$, the system inevitably collapses. You stop optimizing for the robot. You start optimizing for the dashboard.

Recently, our own school’s robotics program provided a flawless, brutal case study in Goodhart’s Law. It proved that when you blindly chase the metric, you don’t just reward mediocrity. You actively destroy your highest-performing systems.

VEX Overfitting

Our school hosts six VEX robotics teams. Historically, the pipeline is highly predictable: at least four of these teams consistently qualify for the World Championship.

When these four teams hit the $M(x)$ threshold, the institutional rewards are immediate. They receive the school’s praise, their members are elevated to captainships, and they are treated as the definitive blueprint for success. The two VEX teams that fail to qualify are shunned, their members labeled as “worse,” and they are often boxed out of premium team placements the following year.

But if you look past the binary metric and examine the actual international results, the illusion shatters.

Of the four teams that make Worlds, usually only one or two are genuinely elite, world-class competitors. The others do not perform well on the international stage. They barely squeezed through the local qualification thresholds. In effect, they are riding the wake of the strongest team, piggybacking on an institutional infrastructure built by their peers.

They did not achieve $G(x)$ (building a world-class robot). They merely optimized for $M(x)$ (crossing the local threshold to get the “Worlds” ticket). The school’s dashboard reads a success rate of roughly 67% (four of six teams), but the reality is a system bloated by false positives.
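The gap between the dashboard and the reality can be put in numbers. The sketch below uses the counts from the text (six teams, four qualifiers) and assumes, for illustration only, that exactly one qualifier is genuinely elite; the text says "one or two", so the variable `elite_qualifiers` is an assumption, not a measured figure.

```python
# Counts taken from the text: six VEX teams, four of which qualify for Worlds.
teams = 6
qualified = 4

# ASSUMPTION for illustration: exactly one qualifier is genuinely elite.
# The text says "one or two"; set this to 2 to see the other case.
elite_qualifiers = 1

# What the dashboard shows: the share of teams with M(x) = True.
dashboard_rate = qualified / teams

# How often M(x) = True actually coincides with G(x): the precision of the proxy.
precision = elite_qualifiers / qualified

# Teams that triggered the metric without meeting the real goal.
false_positives = qualified - elite_qualifiers

print(f"dashboard success rate: {dashboard_rate:.0%}")
print(f"precision of M(x) as a signal of G(x): {precision:.0%}")
print(f"false positives: {false_positives}")
```

Under that assumption the dashboard reports about two thirds of teams as successes, while only a quarter of the "successes" reflect the goal the metric was meant to track.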

FTC Bottleneck

Until two months ago, our school also hosted two FIRST Tech Challenge (FTC) teams. In my freshman year, I visited our coach after we fell short of advancing from our state tournament. Being young, I didn’t have much skin in the game. But I remember the teacher recounting a text from the principal: “They didn’t make worlds?”

If VEX and FTC were identical statistical environments, comparing their Worlds qualification rates might be a valid metric. But they aren’t. In our state, the FTC bottleneck is mathematically brutal. Only eight teams advance to Worlds from the entire state—and prior to this year, that number was exactly four.

Despite this microscopic margin for error, our FTC teams were consistently among the highest performers in the state, falling short of Worlds qualification by mere points. The engineering was elite. In fact, it was strong enough that we qualified for a prestigious international tournament in Long Beach, California, a high-signal indicator of global competitiveness.

But Long Beach is not “Worlds.”

Because our achievement didn’t trigger the specific $M(x)$ boolean the school was looking for, the signal was entirely ignored. There was no publicity. There was no praise. There were no captainships. And ultimately, the administration looked at the dashboard, saw that the FTC teams weren’t generating “Worlds” checkmarks, and completely cut the program.

Collapse

This is the exact mechanism of Goodhart’s Law. The school did not cut a failing program; it cut a high-fidelity engineering team simply because its output couldn’t be parsed by a low-resolution dashboard.

When you judge a system solely by a rigid proxy metric, you ignore the environmental constraints. The school punished the FTC students for failing to cross a statistically monumental threshold, while simultaneously rewarding VEX students who tripped over a much lower one. The result is a toxic incentive loop: students quickly learn that it is better to be a mediocre engineer in an easier statistical bracket than an elite engineer in a difficult one.
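A small sketch shows how a boolean proxy can invert the real ordering when two brackets have different cutoffs. Only the eight-slot FTC cutoff comes from the text. The ranks, field sizes, and the VEX slot count are hypothetical placeholders, and the rule that slots go strictly by final rank is a simplifying assumption rather than the actual advancement criteria of either competition.

```python
from dataclasses import dataclass


@dataclass
class BracketResult:
    name: str
    rank: int        # final rank within the bracket (1 = best); hypothetical
    field_size: int  # teams competing in the bracket; hypothetical
    slots: int       # Worlds slots available to the bracket

    @property
    def made_worlds(self) -> bool:
        # The proxy M(x). ASSUMPTION: slots are awarded strictly by rank.
        return self.rank <= self.slots

    @property
    def percentile(self) -> float:
        # Fraction of the field this team finished ahead of.
        return (self.field_size - self.rank) / self.field_size


# FTC: eight slots (from the text); rank and field size are made up.
ftc = BracketResult("FTC", rank=9, field_size=100, slots=8)
# VEX: every number here is a made-up placeholder for a looser bracket.
vex = BracketResult("VEX", rank=30, field_size=100, slots=30)

for r in (ftc, vex):
    print(f"{r.name}: made_worlds={r.made_worlds}, beat {r.percentile:.0%} of field")
```

With these placeholder numbers, the team that outperformed more of its field is the one the boolean marks as a failure. A dashboard that only stores `made_worlds` has thrown away the information needed to notice.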

The Metric

Competitive robotics is supposed to teach us how to solve complex problems in dynamic environments. But the most important lesson our program taught us happened entirely off the field.

A metric is a tool for observation, not a map for salvation. The moment an institution forgets that “Making Worlds” is just a low-resolution shadow of “Engineering Excellence,” it isn’t chasing success anymore. It is just painting the gauges. And as any good engineer will tell you: if you disconnect the warning lights from the engine to make the dashboard look better, it’s only a matter of time before the whole machine tears itself apart.

References

Goodhart, C. A. E. (1975). Problems of Monetary Management: The U.K. Experience. Papers in Monetary Economics, Reserve Bank of Australia.
Strathern, M. (1997). ‘Improving ratings’: audit in the British University system. European Review.
FIRST Tech Challenge & VEX Robotics Competition. (2024). State Qualification Criteria and Statistical Advancement Rates.