This article originally appeared in French on JournalDuNet as a guest post from our very own Gaëlle Recourcé, a data scientist who participated in Salesforce’s Big Data round table organized by G9+ in July.
Portion of image courtesy of Richard Greenhill and Hugo Elias
Working with big data is not a new activity at all; it’s been commonplace for decades for major players in the finance and retail sectors. The novelty of big data these days is due to the impressive proliferation of mobile applications, connected devices and web experiences, which has increased the volume of data that can be collected, analyzed and monetized.
Along those lines, a strong contingent of those behind big data’s evolution into what can be called “smart data” (data…) are not the traditional players (finance & retail), but millions of smaller actors collecting data through the rapid adoption of their digital services and products. And contrary to common belief, a startup doesn’t need millions of users for big data to be effective: even with a hundred thousand, you can quickly get into very BIG big data, with billions of pieces of data within smaller populations (see our analysis of billions of emails and how to increase your productivity in that environment).
So nowadays the data is there, but in many cases, that’s not nearly enough to be valuable as…
1) The data needs to be relevant and structured
So you have lots of data. Good. Now the next question would be: is it uniform and regular, hence easily extracted and analyzed, or is there a significant amount of variation, or is it embedded in a mass of other irrelevant information?
In the case of our team at Evercontact, data mining emails to extract contact information certainly shows that to find value (phone numbers, company names, Twitter handles) you need to dig & sift through a ton of non-pertinent data (the email body and all its quirks). That being said, once enough rules & algorithms are in place, the quality of the data (this number is a mobile number) and the fact that it is ‘fresh’ (current at a certain date, and belonging to a certain person) make it possible to pick out what is relevant from the much larger mass that is not.
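To make the “dig & sift” idea concrete, here is a minimal sketch of rule-based extraction from an email signature. It is not Evercontact’s actual pipeline: the two regular expressions, the function name `extract_contact_fields` and the sample signature are all assumptions made up for illustration, and real signatures vary far more than this.

```python
import re

# Hypothetical patterns for illustration only; production rules would be far richer.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")       # digits with common separators
TWITTER_RE = re.compile(r"(?<!\w)@(\w{1,15})\b")     # "@handle" not preceded by a word character


def extract_contact_fields(signature: str) -> dict:
    """Return candidate phone numbers and Twitter handles found in a signature."""
    phones = [m.group(0) for m in PHONE_RE.finditer(signature)]
    handles = [m.group(1) for m in TWITTER_RE.finditer(signature)]
    return {"phones": phones, "twitter": handles}


# Invented sample signature.
signature = """Jane Doe
Acme Corp
Mobile: +33 6 12 34 56 78
jane.doe@example.com | @janedoe
"""

print(extract_contact_fields(signature))
```

On this sample the sketch keeps the phone number and the handle `janedoe` while skipping the email address, because the `@` in the address is preceded by a letter. Qualifying the result (is it a mobile number, whose is it, is it still current) would take additional rules on top.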
So before embarking on a Big Data project, the essential question to ask would be: do I have enough reliable and qualified data, and am I able to collect, clean or qualify a sufficient amount for my project?
2) Data interpretation should increasingly point to clear solutions
What is the benefit of interpreting your data today? Obviously, the collection and exploitation of data is only meaningful if we seek to optimize and automate the solutions to our current problems (i.e. becoming data-driven in decision making).
This might seem obvious today, but it certainly wasn’t 10, 20 or 30 years ago, when we were in that “fog of guess”: decision-making based on a mix of intuition, logic and previous experience. We simply couldn’t get enough feedback or data from the world, and consequently spent too much time trying to understand what little data we could gather and making non-data-based decisions.
Because of that, we were fairly blind to how a community responded to our newest change in a policy, product, service, etc. (and of course, coincidentally, marketing was very broadcast-oriented: think 80s TV commercials as opposed to 2012 social media contests).
Step forward to 2014 and we are now clearly in an age where data demonstrates how changing the color of a button on a web page can lead to higher conversions. So now the aim is not only to understand phenomena through our data, but to improve the performance of an existing process, hence the explosion of A/B tests and platforms like Optimizely and Leanplum.
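For readers who want to see what sits behind a button-color test, here is a minimal sketch of a two-proportion z-test, one common way to check whether a difference in conversion rates is likely to be more than noise. The function name and the visitor and conversion counts are invented for illustration, and we make no claim that any particular A/B testing platform computes its results this way.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: variant A (blue button) vs. variant B (green button).
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the gap between the two variants is unlikely to be chance alone. It does not say why the variant works, which is where the context discussed in the next section comes in.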
3) Not lonely big data, but each bit of BIG DATA with its own specific context, its unique history = data integration
Our data can only have value if it can be understood and interpreted in a specific context. For example, what is the value of information like “Client A clicked on the blue color button” if we don’t provide the context that precedes and follows it (i.e. Client A’s segmentation, what they did next, and how that compares to Client B)? In this sense, all the data that you gather needs a way of interconnecting its main elements; our Evercontact example of detecting a phone number in an email signature is only valuable if the number is associated with a person and can be clearly shown to be the ‘good and most up-to-date number’ to reach that person.
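As a rough illustration of the phone number example, the sketch below ties each detected number to a person and a date, then keeps only the most recent one per person. The `PhoneObservation` record, the function name and the sample data are hypothetical; a real system would also have to resolve identities and handle conflicting sources.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PhoneObservation:
    """One detected number, with the context that gives it value."""
    person: str
    number: str
    seen_on: date


def latest_number_per_person(observations):
    """Keep, for each person, the number from the most recent observation."""
    latest = {}
    for obs in observations:
        current = latest.get(obs.person)
        if current is None or obs.seen_on > current.seen_on:
            latest[obs.person] = obs
    return {person: obs.number for person, obs in latest.items()}


# Invented observations.
observations = [
    PhoneObservation("Jane Doe", "+33 6 00 00 00 01", date(2013, 5, 2)),
    PhoneObservation("Jane Doe", "+33 6 00 00 00 02", date(2014, 3, 18)),
    PhoneObservation("John Roe", "+1 555 0100", date(2014, 1, 9)),
]

print(latest_number_per_person(observations))
```

Without the `person` and `seen_on` fields, the three numbers would just be three strings; with them, we can answer the question that actually matters, namely which number reaches whom today.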
What you want to do with the data should be your guide for modeling and structuring it in your database. The best algorithms will only be relevant if the data model they are cutting through is well thought out and designed for a very specific result. Spurious correlations are a good example of what we get when quality data is not modeled with a specific goal in mind.
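To see how easily spurious correlations appear, consider this small sketch: it generates one random “target” series and a few hundred unrelated random series, then reports the strongest correlation found by searching blindly. Everything here (function names, series counts, lengths, the seed) is an assumption chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_series=200, n_points=10, seed=0):
    """Largest |correlation| between a random target and unrelated random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    candidates = [
        [rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)
    ]
    return max(abs(pearson(target, series)) for series in candidates)


print(f"strongest correlation found by chance: {strongest_chance_correlation():.2f}")
```

None of these series has anything to do with the target, yet with enough candidates and few data points, the best match will look impressively strong. Starting from a specific question, rather than from whatever correlates, is the safeguard.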
4) What’s Big Data’s REAL value?
The Big Data revolution shouldn’t be seen as a sudden new way to find, understand and extract value, but rather as a way of understanding how value is produced. Big Data algorithms are designed for many processes: to structure data (convert an email signature to a new contact), connect data (correlations between air quality and respiratory diseases), and visualize information (mapping out global purchases made on cell phones).
Big Data is actually a set of very sophisticated tools that implement human reasoning. Just as the first computers were designed to be super calculators and paved the way for major technological innovations in aviation, for example, Big Data is now enabling leaders in fields ranging from sociology and finance to medicine and energy to innovate in a myriad of new ways. All of this comes from melting down great quantities of complex and heterogeneous information, with the ever-present underlying goal of understanding, improving and producing value.
5) How Big data becomes Smart data
Data, data, data is the name of the game, but in the end it is most definitely about mixing quantitative proof with qualitative thinking (texts, ideas, opinions, creativity). In a typical startup example, yes, you need to measure how many people unsubscribe from your service/product, but reducing this number doesn’t necessarily mean that you will then have more “conversions” or more engaged or passionate users, which is the end game, right? It’s only by looking at this in the larger context and testing various creative solutions that you’ll see what takes you closer to the end game.
Analyzing the data qualitatively enables you to be data-driven, but most importantly it creates space for you to be creatively-driven. This is where big data can become “smart data”, and where there’s space for marketers, ethnographers and the more creative leaders on a team to join data scientists in their findings and decision-making.
In conclusion, we’d love to hear your examples of how big data has started driving your decisions in your endeavors. Share below and thanks for reading!