2023-02-20

Using the scientific method takes us beyond the endless debates about whether journalism is “fair” or “objective”

By Julia Angwin

I regret to inform you that this is my last message to you. After founding The Markup five years ago, I am leaving this newsroom to pursue other projects. It has been an honor to correspond with all of you, dear readers, and I am deeply grateful to everyone who has supported my vision. Please stay in touch with me via Twitter, Mastodon, or my personal newsletter.

Before I go, I would like to share the lessons I learned when building a newsroom that integrated engineers and journalists and sought to use a new model of responsible journalism: the scientific method.

I founded The Markup with the idea that striving for vague concepts like “objectivity” or “fairness” can lead to false equivalence. A better approach, I believe, is for journalists to form a hypothesis and gather evidence to test it.

At The Markup, we pioneered a series of scientifically inspired methods that used automation and computational power to supercharge our journalism. Reflecting on our work, I came up with 10 of the most important lessons I’ve learned using this approach.

Important is not the same as secret

In a world of limited resources, choosing a topic to investigate is the most important decision a newsroom makes.

At The Markup, we developed an investigative checklist that reporters completed before embarking on a project. The top item on the checklist was not novelty but scale: how many people were affected by the problem we were investigating. In other words, we chose to tackle things that were important but not secret.

For example, anyone who uses Google has probably noticed that Google devotes a large portion of the search results page to its own properties. Even so, we decided to spend nearly a year quantifying how much Google promoted its own products over links straight to the source material, because the quality of Google’s search results affects almost everyone in the world.

This type of work has an impact. The European Union has already passed a law that prohibits tech platforms from this kind of self-preferencing, and legislation is pending in the US Congress to do the same.

Hypothesis first, data later

It’s extremely tempting for data-driven journalists to jump into a dataset in search of a story, but that’s rarely a good way to do accountability reporting. Instead, it usually produces what I like to call “huh, this is interesting” stories.

The best accountability stories, data-driven or not, start with a tip or a hunch, which you report and develop into a testable hypothesis.

Hypotheses must be carefully crafted. The statement “Facebook is hopelessly bad” is not a testable hypothesis. It is a “hot take” (an opinion stated without evidence to support it). A hypothesis is something you can prove or disprove, such as: Facebook did not keep its promise to stop recommending political groups during the US presidential election. (Spoiler: we checked; it didn’t.)

Data is political

Data is powerful. Those who collect it have the power to decide what gets noticed and what gets ignored. People and institutions with the money to build large datasets rarely have any incentive to gather information that can be used to challenge their power.

This is why we journalists often need to collect our own data, and why I built a newsroom that had the engineering talent and social science expertise necessary to collect original data at scale.

Choose a sample size

The days when journalists could interview 3 people in a diner and declare a trend are happily over. The public is demanding more persuasive evidence from the media.

At the same time, however, not all proof requires big data. It took only one secret court document, provided by whistleblower Edward Snowden, to prove that US intelligence agencies were secretly collecting the call logs of all Americans.

The beauty of statistics is that even when you’re examining a large system, you usually need only a relatively small sample. When we wanted to investigate Facebook’s recommendation algorithms, reporter Surya Mattu assembled a panel of more than 1,000 people who shared their Facebook data with us. Although that was a drop in the bucket compared with Facebook’s more than 2 billion users, it was still a large enough sample to test some hypotheses.
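To show why roughly 1,000 participants can be enough, here is a minimal sketch of the textbook 95% margin of error for an estimated proportion. It assumes a simple random sample, which a volunteer panel is not. Treat the result as a best case that illustrates the principle, not as a description of The Markup’s actual panel or analysis.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes a simple random sample. p=0.5 gives the widest (most conservative)
    interval, and z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Compare how the margin of error shrinks as the sample grows.
for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: ±{margin_of_error(n) * 100:.1f} percentage points")
```

The population size (here, Facebook’s 2 billion users) does not appear in the formula. For a population that large, the finite-population correction is negligible, so precision depends on the size of the sample rather than the size of the system.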

Embrace the odds

If you’re lucky, sometimes a dataset reveals its truths without you having to do any hard math. But for large datasets, statistics are often the best way to extract meaning. That means you may have to embrace some confusing probabilistic findings.

Consider our investigation into whether Amazon put its own brands at the top of its search results. We reviewed thousands of searches and found that Amazon disproportionately gave its own brands the #1 spot: Amazon brands and exclusives made up only 5.8% of the products in our sample but took the #1 spot 19.5% of the time.
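As a back-of-the-envelope check, the two percentages can be compared directly. This ratio is simple arithmetic on the two figures quoted above, not a statistic reported in the original investigation. On its own, it says nothing about why Amazon’s brands won the top spot.

```python
# Figures quoted in the text above.
share_of_products = 0.058   # Amazon brands and exclusives as a share of sampled products
share_of_top_spots = 0.195  # share of #1 spots held by those products

# How many times more often they won #1 than their share of products alone would suggest.
ratio = share_of_top_spots / share_of_products
print(f"Amazon brands won the #1 spot {ratio:.1f}x as often as their share of products predicts")
```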

The problem was that ratios alone don’t tell you whether Amazon won that spot fairly. Maybe its products really were better than everyone else’s? To dig deeper, we wanted to see how Amazon brands fared against products with high star ratings or large numbers of reviews.

To do this, investigative data journalist Leon Yin used a statistical technique called random forest analysis. It allowed him to identify that being an Amazon brand was the single most important predictor of whether a product would win the #1 spot, far more important than any of the other potential factors. The probabilities, although a bit tricky to explain, made our finding much more robust.

You need a narrative

Data is necessary but not sufficient to persuade readers. Human beings are hardwired to tell, share, and remember stories.

The statistical finding is what journalists call the “nut graf” of the story. You still need a human voice to be the spine. This is where the old-school reporting skills of knocking on doors and interviewing lots of people are still incredibly valuable. This is where word choice and talented editors make all the difference in crafting a compelling article.

Specialization is important

Journalists are generalists. Even specialists like me, who have covered a single topic (technology) for decades, have to delve into new subjects every day.

That’s why I believe in seeking expert analysis of statistical work. Over the years, I developed a process similar to academic peer review, in which I shared my methodologies with statisticians and domain experts in whatever field I was writing about.

I never share the narrative article before publication, which would be a fireable offense in most newsrooms. However, sharing the statistical methodology allows me to strengthen my work and catch errors.

No one is more motivated to find errors than the subject of an investigation. At The Markup, we therefore share data, code, and analysis with subjects before publication, in a process I call “adversarial review.” This gives them the opportunity to engage with the work in a meaningful way and provide a thoughtful response.

Objectivity is dead. Long live limitations

One of the best parts of using the scientific method as a guide is that it takes us beyond the endless debates about whether journalism is “fair” or “objective.”

Rather than focusing on fairness, it’s better to focus on what you know and what you don’t know. When reporting on hidden bias in mortgage-approval algorithms, reporter Emmanuel Martinez could not obtain applicants’ credit scores because the government does not release them. He noted their absence in the limitations section of his methodology.

His analysis was still robust enough to be cited by three federal agencies when they announced a new plan to fight mortgage discrimination.

Show your work

Journalists have a trust problem. Now that everyone in the world can publish, journalists must work harder to prove that their version of the truth is the most credible.

I’ve found that showing my work – sharing entire datasets, the code used to analyze the data, and extensive methodology – builds trust with readers. As an added bonus, methodologies often get more website traffic over time than narrative articles.

Never give up

Journalists are at a disadvantage. There are 6 public relations professionals for every journalist in the United States, according to the Bureau of Labor Statistics.

This means that we have to use every possible tool at our disposal to hold power accountable. One way to do this is to build tools that allow our work to continue beyond the day the article was published.

Consider Blacklight, the real-time forensic privacy scanner that Surya Mattu built at The Markup. It runs a series of real-time privacy tests on any website.

Reporters can use Blacklight whenever there is a privacy-related news story. ProPublica, for example, recently used Blacklight to reveal that online pharmacies selling abortion pills were sharing sensitive data with Google and other third parties.

I will continue to pursue these principles in my future projects. Thanks for sharing the journey with me. It was an honor.

Julia Angwin is the founder and managing director of the news publication The Markup. She holds a BA in Mathematics from the University of Chicago and an MBA from Columbia University. She was a finalist for the 2017 Pulitzer Prize for Explanatory Reporting. This article was republished from The Markup under a Creative Commons license.

