When Numbers Lie

Author

Yahya Rahhawi

Published

March 22, 2025

When Numbers Lie: Rethinking Fairness in an Age of Algorithmic Authority

I came to machine learning because I believed in its elegance: the way a few lines of code could make predictions, optimize decisions, and maybe even eliminate human bias. Studying philosophy alongside computer science, I have always wondered whether machines, backed by rigorous math, could eliminate some of the human tendencies that perpetuate injustice and unfairness. But somewhere along the way, this class made me ask: Whose data is this model trained on? What decisions does it reinforce? And who gets left behind when we define “fairness” with math?

These questions became even more urgent as I read Arvind Narayanan’s 2022 James Baldwin Lecture, The Limits of the Quantitative Approach to Discrimination (Narayanan 2022). Narayanan argues that current quantitative methods often serve to justify the status quo more than they challenge it. That’s a bold statement, especially for a field that prides itself on objectivity and precision. But as I sat with this claim, I realized: it resonates. At a time when the U.S. government is championing over $500 billion in AI infrastructure investment (Reuters Staff 2025), much of it flowing into systems that affect millions of lives, it’s critical to examine what ethical foundation, if any, these models are built on. This matters all the more at a moment when the government is showing a tendency toward radical, and often problematic, interpretations of justice, pursued through means that I consider unethical and beyond what a state should be permitted to do. Is machine learning just another tool to justify those actions?

This isn’t a theoretical concern. The U.S. government has reportedly begun employing AI-powered agents to monitor the social media of international students, scanning for signs of “support” for terrorist organizations, but also for criticism of the government’s complicity in war crimes unfolding in the Middle East (Fried 2025; Weiss and Parker 2025). This process, which is opaque, unauditable, and unchallengeable, has led to the flagging and deportation of students based on ambiguous indicators and culturally uncontextualized speech. What’s happening here is exactly what Narayanan warns about: using mathematical tools to assess fairness based on the same logics that have historically produced unfairness. These models are embedded with political and cultural assumptions, yet shielded from scrutiny by a veil of statistical legitimacy.

The Illusion of Objectivity

Narayanan’s core critique is this: quantitative methods, especially when wielded without critical awareness, give us the illusion of objectivity while silently reproducing injustice (Narayanan 2022). Researchers make countless subjective decisions when designing models—what features to include, what fairness metric to optimize, what dataset to trust. These choices are often framed as neutral, yet they reflect particular worldviews and assumptions.

What’s worse is that these methods rarely question the framing of the problem. Narayanan argues that if we assess reality with the same lens that led it to become unfair, we can’t expect much change to happen. This means that there’s a structural trap in the way we approach fairness: instead of shifting the paradigm, we tweak parameters.

For example, if a hiring algorithm discriminates against people with non-white-sounding names, we might try to retrain it with “blind” features or adjust the threshold. But the question remains: why does the system prioritize certain qualifications or communication styles in the first place? What histories of exclusion built the resume formats we consider “professional”? These deeper layers are not captured by confusion matrices or calibration curves.
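To make that critique concrete, here is a minimal, hypothetical sketch in Python of the kind of “fix” described above: the scoring model is left untouched, and a group-specific threshold is simply nudged until selection rates match. All the numbers are synthetic, invented purely for illustration; no real hiring system is being modeled.

```python
# A hypothetical sketch of the "tweak the parameters" fix: adjust a
# decision threshold so selection rates match across groups, while
# leaving the scoring model itself untouched. Data is synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic screening scores for two applicant groups (hypothetical).
scores_a = rng.normal(0.60, 0.15, 1_000)  # group A
scores_b = rng.normal(0.52, 0.15, 1_000)  # group B, scored lower (e.g., via biased features)

threshold = 0.65
rate_a = (scores_a >= threshold).mean()
rate_b = (scores_b >= threshold).mean()
print(f"selection rate A: {rate_a:.2f}, B: {rate_b:.2f}")

# Lower group B's threshold until its selection rate matches group A's.
threshold_b = np.quantile(scores_b, 1 - rate_a)
print(f"adjusted threshold for B: {threshold_b:.2f}")
print(f"new selection rate B: {(scores_b >= threshold_b).mean():.2f}")
```

Nothing in this adjustment asks why the scores differ in the first place; the paradigm stays, only the parameter moves.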

When Quantitative Methods Work (and Don’t)

To be clear, quantitative methods aren’t useless. In Fairness and Machine Learning (Barocas, Hardt, and Narayanan 2023), Barocas, Hardt, and Narayanan describe how algorithms can expose patterns of inequality at a scale that human intuition cannot. They point out that data-driven systems can, in some cases, be more transparent than human decision-making—because at least we can audit the code and track the outcomes.

One powerful example is the use of audit studies in hiring discrimination. Bertrand and Mullainathan’s seminal resume experiment, where identical resumes were sent out with “white-sounding” and “Black-sounding” names, revealed stark differences in callback rates. Technically, this study was probing the fairness notion of demographic parity or equal opportunity, depending on how you interpret the outcome variable. Morally, it revealed a clear case of allocative harm: job opportunities were being withheld purely on the basis of racial signifiers.
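As a rough illustration of what “probing demographic parity” looks like in practice, the sketch below computes the callback-rate gap between the two name groups and asks whether a gap that large could plausibly be due to chance. The counts are placeholders I invented for the example, not Bertrand and Mullainathan’s actual figures.

```python
# A minimal sketch of how audit-study callback counts map onto the
# demographic-parity notion. Counts below are illustrative placeholders.
from math import sqrt

callbacks_sent = {
    "white-sounding": (97, 1000),   # (callbacks, resumes sent) - hypothetical
    "Black-sounding": (64, 1000),
}

rates = {name: c / n for name, (c, n) in callbacks_sent.items()}
gap = rates["white-sounding"] - rates["Black-sounding"]
print("callback rates:", {k: round(v, 3) for k, v in rates.items()})
print("demographic-parity gap:", round(gap, 3))

# Two-proportion z-test: is a gap this large plausibly due to chance?
(c1, n1), (c2, n2) = callbacks_sent.values()
pooled = (c1 + c2) / (n1 + n2)
z = gap / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
print("z statistic:", round(z, 2))  # |z| > ~1.96 -> significant at the 5% level
```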

The strength of this study lies in its simplicity and clarity. It shows that discrimination exists, not as an abstract theory but as a measurable, reproducible reality. It forced a conversation in policy and public discourse, and rightly so. But fairness does not simply arise when we remove human decision-makers and replace them with machines trained to observe, through data, our picture of the status quo. In fact, that is the last thing we want if we are trying to reimagine a society that is more fair and radically different from the unfair one that already exists.

But even this kind of analysis has limits. As Fairness and Machine Learning points out, the use of names as proxies for race is itself fraught. What if employers are responding not to perceived race, but to assumptions about class, region, or education? The interpretation is never clean. And that’s part of Narayanan’s argument: the idea that we can simply “measure” fairness assumes a tidy, apolitical world that does not exist.

When Algorithms Backfire

A more disturbing example of quantitative methods going wrong is found in the use of risk assessment tools in criminal justice. Tools like COMPAS assign scores to defendants predicting their likelihood of reoffending, often based on data from law enforcement records. These tools are “calibrated,” meaning that for each risk score, the actual rate of recidivism is about the same across racial groups.

Technically, calibration sounds like a fairness win. But morally? It’s a disaster. As ProPublica’s Julia Angwin and researchers like Virginia Eubanks have shown, the data itself is biased: arrest records reflect over-policing in Black neighborhoods, not an intrinsic difference in behavior. So even if the algorithm is mathematically “fair,” its predictions reinforce biased policing and sentencing.
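The tension is easier to see with a toy simulation. In the sketch below (synthetic data, not COMPAS records), scores are calibrated by construction, so the observed reoffense rate within each score bin is roughly the same for both groups, yet the group whose scores run higher, say because arrest-heavy features inflate them, ends up with a much higher false positive rate at any fixed threshold.

```python
# Synthetic illustration (not COMPAS data): calibration within groups can
# coexist with very different false positive rates when the score
# distributions differ across groups.
import numpy as np

rng = np.random.default_rng(1)

def simulate(mean_score, n=50_000):
    # Calibrated by construction: P(reoffend | score) = score.
    scores = np.clip(rng.normal(mean_score, 0.15, n), 0.01, 0.99)
    outcomes = rng.random(n) < scores
    return scores, outcomes

groups = {"group A": simulate(0.35), "group B": simulate(0.55)}  # B's scores inflated

threshold = 0.5
for name, (scores, outcomes) in groups.items():
    bins = np.digitize(scores, [0.2, 0.4, 0.6, 0.8])
    calib = [outcomes[bins == b].mean() for b in range(5) if (bins == b).any()]
    fpr = (scores[~outcomes] >= threshold).mean()  # flagged "high risk" among non-reoffenders
    print(name, "per-bin outcome rates:", np.round(calib, 2), "FPR:", round(fpr, 2))
```

Calibration holds, and the harm persists: the metric is satisfied while far more people who will not reoffend in one group are labeled “high risk.”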

This is what D’Ignazio and Klein, in Data Feminism, call the “privilege hazard” (D’Ignazio and Klein 2023): the people designing these systems often cannot see the assumptions baked into their models because they’ve never had to. Their lives aren’t algorithmically surveilled, flagged, or punished. And so they optimize for clean metrics rather than complex realities.

Their framework emphasizes that fairness is not just about inputs and outputs—it’s about power. Who collects the data? Who defines the labels? Who decides which outcomes matter? Data Feminism argues that without answering these questions, we are not doing fairness—we are doing statistical performance art.

Redefining Fairness

To me, fairness goes far beyond meritocracy—the belief that the most “qualified” should always win. In practice, meritocracy often just repackages privilege. Fairness isn’t about pretending we all start at the same line; it’s about acknowledging that we don’t—and building systems that reflect that truth.

Fairness also goes beyond what an algorithm can or can’t do. It’s a social commitment: a way of seeing others as equals, of including their experiences and voices in shaping the systems they live under. We can’t fix injustice with math alone. We need historical awareness, community input, qualitative insights, and above all, humility.

Right now, too many people put too much trust in numbers without understanding what those numbers mean—or what they erase. In a world where “data-driven” is a synonym for “truth,” we have to ask: whose truths are missing from the dataset?

This is especially urgent for international students like myself. The idea that an AI agent could monitor my posts—stripping words from context, translating emotion into probability scores, and potentially flagging me for deportation—isn’t just dystopian. It’s happening. It’s real. It forces me to ask whether the systems we’re building even have a place for someone like me.

A Qualified Agreement

So, where do I stand on Narayanan’s claim that quantitative methods “do more harm than good”? I agree—but with qualifications.

Yes, these tools often uphold the status quo. Yes, they obscure rather than reveal injustice. Yes, they can even be dangerous when placed in the hands of people who lack understanding of the cultural, political, and historical narratives that shaped the data.

But that’s not a reason to give up on the tools. It’s a reason to change who uses them—and how.

As Barocas et al. suggest, quantitative methods can serve transparency, accountability, and insight—if wielded with care (Barocas, Hardt, and Narayanan 2023). But that care has to be built into every step of the process: from data collection to metric choice to outcome interpretation. It requires interdisciplinary work, community engagement, and ongoing critique. It requires treating fairness not as a mathematical constraint, but as a moral imperative.

I still believe in machine learning. But I no longer believe that fairness can be computed. Fairness has to be created—with intention, with reflection, and with people, not just models, at the center.

References

Barocas, Solon, Moritz Hardt, and Arvind Narayanan. 2023. Fairness and Machine Learning: Limitations and Opportunities. Cambridge, Massachusetts: The MIT Press.
D’Ignazio, Catherine, and Lauren F. Klein. 2023. Data Feminism. First MIT Press paperback edition. Cambridge, Massachusetts: The MIT Press.
Fried, Ina. 2025. “State Department Deploys AI to Monitor Foreign Students’ Social Media.” Axios, March 6, 2025. https://www.axios.com/2025/03/06/state-department-ai-revoke-foreign-student-visas-hamas.
Narayanan, Arvind. 2022. “The Limits of the Quantitative Approach to Discrimination.” James Baldwin Lecture.
Reuters Staff. 2025. “Trump to Announce Private Sector AI Infrastructure Investment - CBS Reports.” Reuters, January 21, 2025. https://www.reuters.com/technology/artificial-intelligence/trump-announce-private-sector-ai-infrastructure-investment-cbs-reports-2025-01-21/.
Weiss, Bari, and Olivia Reingold Parker. 2025. “The ICE Detention of a Columbia Student.” The Free Press. https://www.thefp.com/p/the-ice-detention-of-a-columbia-student.