By Andy Flint, FICO
Analytics depends on data: the more, the merrier. If we’re trying to model, say, the behaviour of customers responding to marketing offers or clicking through a website, we can build a far stronger model with 10,000 samples than with 100.
You would think, then, that the rise of Big Data, with its seemingly inexhaustible supply of information, would be every analyst’s dream. But Big Data poses its own challenges for modelling. Much of it isn’t what we have historically thought of as “data” at all. By commonly cited estimates, about 80% of Big Data is raw, unstructured information, such as text, that doesn’t fit neatly into the rows and columns that feed most modelling programs.
Here’s how data scientists seeking to harness Big Data for predictive modelling have addressed the challenges presented by a mass of messy data.