Don’t Blame Big Data for Pollsters’ Failings (AAPL, GOOGL)

On Wednesday, Nov. 9, the New York Times’ Jim Rutenberg wrote, “All the dazzling technology, the big data and the sophisticated modeling that American newsrooms bring to the fundamentally human endeavor of presidential politics could not save American journalism from yet again being behind the story, behind the rest of the country.” But was miscalling the election really a failure of big data? Not really.

The polling that got this election so wrong wasn’t actually big data, or anywhere near it. The National Council on Public Polls says sample sizes of 1,000 are common. By contrast, big data calls for samples of at least 100,000, and by 2020 we will likely be looking at millions. “The sample sizes were certainly good enough for a poll, but maybe didn’t meet the definitions around volumes of data, variety of data, [and] historical depth contrasted against real-time immediacy, machine learning, and other advanced analytics,” says Nik Rouda, senior analyst at the Enterprise Strategy Group. “If anything, I’d argue that more application of big data techniques would have given a better forecast.”
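The sample-size contrast matters less for pure sampling error than it might seem, because the margin of error shrinks only with the square root of the sample size. A minimal Python sketch makes this concrete, using the standard margin-of-error formula for a simple random sample (the specific sample sizes are illustrative):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a simple random sample of size n,
    at the worst case p = 0.5."""
    return z * math.sqrt(p * (1 - p) / n)

# A typical media poll vs. big-data-scale samples.
for n in (1_000, 100_000, 1_000_000):
    print(f"n={n:>9,}: ±{margin_of_error(n):.2%}")
```

A 100-times larger sample shrinks the margin of error only tenfold, and no sample size corrects a systematically skewed sample, which is the kind of failure Rouda describes.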

While both presidential teams relied on voter profiles, more robust profiles and “analyzing cohorts by behavior would have shown a clear picture,” Rouda says. “This was a failure of the traditional approach, not a failure of math or failure of big data.”

As it turns out, listening to a wider set of data played a significant role in the Trump victory. In the week before the election, his campaign undertook a major exercise to re-weight all of its polling data, because it believed the samples pollsters were using did not correctly represent the electorate. So Matt Oczkowski, head of product at the London-based firm Cambridge Analytica and team leader on the Trump campaign, turned his attention to what the data was showing: the campaign should focus on a Brexit-style mentality and a demographic trend that others were not seeing.
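Re-weighting of the kind described above can be sketched as simple post-stratification: rescale each demographic group in the poll sample to the share you believe it will hold in the actual electorate. The groups, shares, and support rates below are hypothetical, purely for illustration, not the campaign’s actual figures:

```python
# Hypothetical two-group example: the poll over-represents one group
# relative to the assumed electorate.
sample_share = {"college": 0.45, "non_college": 0.55}      # poll composition
electorate_share = {"college": 0.35, "non_college": 0.65}  # assumed turnout
support_in_group = {"college": 0.55, "non_college": 0.40}  # candidate support

# Unweighted estimate: use the raw sample composition as-is.
unweighted = sum(sample_share[g] * support_in_group[g] for g in sample_share)

# Re-weighted estimate: rescale each group to its assumed electorate share.
reweighted = sum(electorate_share[g] * support_in_group[g]
                 for g in electorate_share)

print(f"unweighted: {unweighted:.1%}, re-weighted: {reweighted:.1%}")
```

With these made-up numbers, re-weighting moves the topline estimate by more than a typical poll’s margin of error, which is why the choice of electorate model matters so much.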

Clearly, it worked.

Looking ahead to 2020, what might President Trump face when up for re-election? Combining polling with social media and more subtle financial indicators will be the way of the future. “When you have data sets that are large enough, you can find signals for just about anything,” says Tony Baer, a big data analyst at Ovum. “So this places a premium on identifying the right data sets and asking the right questions, and relentlessly testing out your hypothesis with test cases extending to more or different data sets.”

What would this look like? For starters, it would involve much more than asking a single direct question; it means tapping into a richer vein of information. Facebook knows what pages we like, what articles we interact with, and the affiliations of our friends, along with those of our friends’ friends. In 2013, Indiana University sociologist Fabio Rojas found that Twitter (TWTR) mentions were a good predictor of outcomes in U.S. House of Representatives races. In 2016, Socialbakers saw a tight correlation between positive mentions on Twitter and performance in the presidential race. Examined as one data set, all of this is a fairly good approximation of the issues, policies and candidates we are likely to vote for. Clearly, polling organizations have fallen short by not fully capturing the significance of this increasingly available behavioral and intentional data.
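A correlation like the one Socialbakers reported can be checked with a plain Pearson coefficient over per-race mention shares and vote shares. The data below is invented for illustration; only the method is real:

```python
# Hypothetical races: (candidate's share of Twitter mentions, vote share).
races = [(0.62, 0.58), (0.45, 0.47), (0.71, 0.66), (0.38, 0.41), (0.55, 0.53)]

def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print(f"mention-share vs. vote-share correlation: {pearson(races):.2f}")
```

A coefficient near 1.0 on real data would support using mention share as a leading indicator, though correlation on a handful of races says nothing about manipulation or bot activity, the drawback discussed next.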

A drawback, of course, is that social media sites can be manipulated. Facebook (FB) CEO Mark Zuckerberg denies it’s a problem. But there have been numerous accounts of questionable news wending its way into the mainstream consciousness through social media. The Guardian recently reported on a steady stream of headlines on social media sites that have been churned out by pro-Trump news sites, including dozens of sites originating from a small town in Macedonia.

For social media sites, it’s apparent that whether a piece of content is shared, liked and monetized matters more than its accuracy. The ultimate result is a misinformed electorate: with insufficient internal checks and a roughly 13-hour lag before misinformation is caught, social media sites leave millions of readers unwittingly exposed to “fake” news.

Looking forward to the 2018 midterm and 2020 presidential elections, it’s clear that data analytics will aim to become more real-time. This was attempted in 2016 by VoteCastr, a start-up working with Slate and Vice News to publish vote projections hours before the polls closed, breaking with the historic practice of waiting until final votes are cast. VoteCastr was founded to bring to the public the kind of analytics that candidates, parties and super PACs already use internally: tracking turnout at pre-selected precincts to produce rolling projections of how many votes each candidate has won as ballots are cast, a method campaigns have found highly accurate in predicting the ultimate vote count.

Specifically, VoteCastr conducted large-sample surveys—targeting many more respondents than typical media polls. Each asked far fewer questions, assessing support for candidates at the individual precinct level. At the same time, VoteCastr deployed a trained army of turnout trackers equipped with smartphone apps, each assigned to pre-selected polling places across battleground states. By digesting real-time information from precincts across each state, VoteCastr’s statistical models tried to predict who, at any moment, was winning the state and by what margin.
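In outline, a rolling projection of this kind multiplies each precinct’s observed turnout by its surveyed support rate and sums across precincts. This is a minimal sketch with hypothetical precinct numbers, not VoteCastr’s actual model:

```python
# Hypothetical precincts: turnout observed so far by trackers, plus
# pre-election survey support for candidate A in that precinct.
precincts = [
    {"turnout": 412, "support_a": 0.61},
    {"turnout": 388, "support_a": 0.44},
    {"turnout": 501, "support_a": 0.52},
]

def rolling_projection(precincts):
    """Project candidate A's current vote total by applying each
    precinct's surveyed support rate to its observed turnout."""
    votes_a = sum(p["turnout"] * p["support_a"] for p in precincts)
    total = sum(p["turnout"] for p in precincts)
    return votes_a, total

votes_a, total = rolling_projection(precincts)
print(f"projected: {votes_a:.0f} of {total} votes ({votes_a / total:.1%})")
```

Re-running this as trackers report new turnout figures yields the “rolling” projection; the approach stands or falls on the accuracy of the per-precinct support estimates and the timeliness of the turnout feed, which is exactly where VoteCastr struggled.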

The VoteCastr team was made up of data veterans from the Obama and Bush campaigns, led by chief strategist Sasha Issenberg, author of a book on how data has transformed political campaigns and a former columnist for Slate. Despite that pedigree, VoteCastr fell short in its efforts to monitor Clinton and Trump supporter turnout in the battleground states of Florida, Iowa, New Hampshire, Nevada, Ohio, Pennsylvania and Wisconsin, partly due to technical issues that prevented timely website updates and partly due to data inaccuracies. Even so, the importance of this faulty first effort at real-time data analytics was underscored by the fact that stock market values fluctuated on Election Day as trading desks paid particularly close attention to VoteCastr’s data.

With significant economic consequences attached to political outcomes, it is clear that those companies with sufficient depth of real-time behavioral data will likely increase in value. Apple (AAPL), Alphabet (GOOGL), Amazon (AMZN), Facebook (FB) and Microsoft (MSFT) have taken a post-election hit as institutional portfolios have been reallocated towards expected winners (e.g. drugs, financials, industrials, infrastructure) in a Trump administration, but longer-term investors should consider this an opportunity to improve exposure to the dominant companies in the infrastructure of tomorrow.
