Google’s flawed flu tracker and the need to hold algorithms accountable
Something in all of us wants to believe that big tech companies, the ones we provide with reams of personal data every day, are doing something noble with that information. That's what made Google's flu tracker, Flu Trends, so appealing. Here's Google, taking time out of its busy day of selling our data for profit to apply those millions of Google searches and location signals to something useful for humanity: tracking the spread and severity of flu outbreaks across the United States.
There's only one problem: Flu Trends is wrong.
According to a new Science study, Google overestimated flu outbreaks by 50 percent in the 2011-2012 and 2012-2013 seasons. As TIME's Bryan Walsh writes, "If you wanted to project current flu prevalence, you would have done much better basing your models off of 3-week-old data on cases from the CDC than you would have been using GFT’s sophisticated big data methods."
Ah, there it is: "big data." So much hype has centered on that phrase that you can forgive journalists and researchers for taking this opportunity to knock "big data" down a peg. One of the study's authors, David Lazer, calls this "a 'Dewey beats Truman' moment for big data," referring to the famous newspaper headline announcing the wrong victor of the 1948 US presidential election. Multiple headlines have blamed big data itself for the Google Flu Trends debacle.
Indeed, big data isn't some magical oracle sitting on high -- but if it took you until now to realize that, then you haven't been paying attention. Data can tell us a lot, but the problem with "embiggening" it is that the more data you collect, the less information you get from each individual piece. And even when that information tells you something meaningful about the world, using it to predict a future occurrence is harder still.
So what went wrong with Google Flu Trends? Therein lies the bigger problem: we don't know exactly, because Google's algorithms are proprietary and secret, so researchers can't replicate them to determine why the scheme failed. We can make an educated guess: people are hypochondriacs, and searching for the flu or flu symptoms doesn't mean they actually have it. But without transparency, the only way to hold an algorithm accountable is to compare its results against what actually happened. And because Google prides itself on providing "real-time" analytics, by the time researchers figure out the insights are wrong, it's too late.
"Algorithmic accountability is one of the biggest problems of our time," Evan Selinger, a technology ethicist at Rochester Institute of Technology, tells New Scientist. "More and more decisions made about us are computed in processes we don't have access to."
Unfortunately, this is how data is often presented by corporations and individuals: "Here's a buzzword and a chart. Don't ask questions." These number-crunchers reveal little about the methodology they use, nor do they offer any proof that their algorithms work.
I suppose that's relatively harmless when it comes to, say, a big-data analysis of who will win the NCAA March Madness tournament. But when it comes to analyses of public health or safety, which people rely on to make real-life decisions, we need real accountability. "Trust us, we're Google" just won't cut it.