Terabytes of deafening noise

By Francisco Dao, written on March 19, 2013

I pride myself on calling out tech industry BS when I see it, but after reading Nate Silver’s fantastic book “The Signal and the Noise” and seeing the common mistakes people make when trying to make predictions and forecasts, I realized I haven’t been nearly harsh enough in pointing out the flawed thinking that permeates the Valley. Considering that the tech industry relies on reading signals to predict, and even create, the future, what Silicon Valley players classify as “signal” is almost laughable. Most of them are not only missing the signals and listening to nothing but noise, they’re actively contributing to the noise.

One of the biggest current trends is the hype of “big data.” Data is seen as the holy grail that will eliminate mistakes and show everyone the way. VCs have even started using data to determine what and who they should invest in.

But not all data is useful. More data also means more noise. If anything, it makes it harder to figure out what data matters and what data is worthless, or even what’s a confusing false signal. Who’s parsing the data and on what basis? How do the VCs, entrepreneurs, and various gurus know how to filter the data signals from this now gigantic mass of data noise? In most cases they don’t.

Another mistake the Valley commonly makes is being too confident in its predictions. One of the more fascinating parts of Silver’s book is when he explains how minuscule changes in initial data produced wildly different weather predictions. In one test case, simply rounding off barometric pressure to the third decimal place instead of the fourth resulted in a prediction of thunderstorms instead of clear skies.

This is the result of chaos theory, or what people sometimes call the butterfly effect. Dynamic systems, including the weather and our economy, are nonlinear and always changing, so tiny errors in the starting data grow as the forecast runs forward. Even with incredible advancements in weather-predicting technology and satellite imagery of worldwide weather patterns, a weather prediction beyond nine days becomes less reliable than a simple historical average.

Think about that for a moment. Once you look more than nine days out, an almanac will give you a more accurate prediction than the National Weather Service forecast.
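The rounding problem is easy to reproduce on a toy system. The sketch below is not a weather model. It uses the logistic map, a textbook example of chaotic behavior, and a starting value I made up (0.5168). It runs the same calculation twice, once with the value as given and once rounded to three decimal places, and prints how far apart the two runs drift.

```python
def logistic_map(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory.

    With r = 4.0 the map is chaotic: nearby starting points separate quickly.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


if __name__ == "__main__":
    full = 0.5168             # hypothetical starting value, four decimal places
    rounded = round(full, 3)  # the same value rounded to three decimal places

    a = logistic_map(full, 30)
    b = logistic_map(rounded, 30)

    # Print both runs every five steps along with the gap between them.
    for step in range(0, 31, 5):
        gap = abs(a[step] - b[step])
        print(f"step {step:2d}  full={a[step]:.4f}  rounded={b[step]:.4f}  gap={gap:.4f}")
```

The two runs start almost identically, yet within a few dozen steps they have little to do with each other. That is the same kind of sensitivity that limits how far ahead a forecast can see.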

I’m no statistician, and I could be making some unscientific assumptions here, but the weather example seems to imply that a signal that amplifies, or results from, another signal is more likely to be wrong beyond a short time window than a forecast derived independently through long-term analysis.

As a thought exercise, think about the current wave of incubators and accelerators. First Paul Graham saw the signal that startups could be launched quickly and cheaply, and Y Combinator was born. Then Techstars applied similar reasoning in Boulder and other cities. Then everybody and their mother piled in. Instead of correcting at each step and reanalyzing the landscape, perhaps looking at the “signal” that too many incubators were competing for too few quality entrepreneurs, incubator founders listened for reinforcing “noise,” which then doubled as a multiplier for still more incubators. In a very short time, we were off the charts.

As humans, we also have a tendency to put too much weight on immediate and “obvious” signals such as hot streaks. Proper application of Bayes’ theorem, which is generally accepted by statisticians including Silver, shows that we often overestimate the odds by trusting signals that produce plenty of false positives while discounting the prior odds, meaning how often the outcome happens in the first place.

In short, we overreact to data that is new or fresh even when it has little bearing on long-term outcomes. In the tech world, this manifests itself in things such as the overfunding of social media companies because Facebook and Instagram were successful. Without getting into the mathematical details, I am confident that an honest application of Bayes’ theorem (I say honest because Bayes’ theorem still requires us to make initial assumptions) would have shown Viddy to have a laughably small chance of returning investors’ money based on its $370 million valuation. But lucky for Viddy, VCs were hyped up on the latest news.
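For anyone who does want a taste of the math, here is a minimal sketch of Bayes’ theorem in Python. Every number in it is invented for illustration. None of it is real data about Viddy or any other company, and the function name and the three inputs are my own assumptions about how to frame the problem: a prior (how often companies like this pay off at all), a hit rate (how often winners show the exciting signal), and a false alarm rate (how often losers show it too).

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """Return P(success | signal) using Bayes' theorem.

    prior            -- P(success) before seeing the signal (the base rate)
    hit_rate         -- P(signal | success)
    false_alarm_rate -- P(signal | failure)
    """
    # Total probability of seeing the signal, from winners and losers alike.
    evidence = prior * hit_rate + (1.0 - prior) * false_alarm_rate
    return prior * hit_rate / evidence


if __name__ == "__main__":
    # All three values are hypothetical, chosen only to show the effect.
    prior = 0.02             # assume 2% of such companies return investors' money
    hit_rate = 0.80          # assume 80% of eventual winners show a hot streak
    false_alarm_rate = 0.30  # assume 30% of eventual losers show one too

    print(f"Chance of success given the hot streak: {posterior(prior, hit_rate, false_alarm_rate):.1%}")
```

With these made-up inputs, the hot streak lifts the odds from 2 percent to only about 5 percent. The signal feels decisive, but a low base rate keeps the answer small, and that is exactly the part we tend to ignore.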

Of course, the biggest noisemaker of all is social media itself, which has given everyone a platform to make noise and reinforce trends without anyone checking for validity. Instead of seeing social media for what it is, a room full of insular people screaming for attention, Silicon Valley has turned to social media to find its experts and look for the next big thing. Then we add our own voices to the mix, retweeting and reinforcing what everyone else is saying, and we label it big data.

When people talk about the terabytes of data being produced each day, there is an automatic assumption that all of this data is valuable. Not only is the vast majority of it not valuable, but most of it just clouds our prediction efforts.

Instead of the age of big data, we’re really stuck in an age of deafening noise.

[Illustration by Niv Bavarsky for PandoDaily]