So what exactly is Big Data? 

In practical terms, Big Data is the accumulation of several years’ worth of data that your company has stored in its data warehouse, as instructed by its DBA, since, well, forever.  This data, archived in different locations for safekeeping and possible later use, is extremely valuable to marketing, sales and other decision makers in your organization.

The Wikipedia definition of Big Data is: “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.”  You will see this definition quoted in many places.

Big Data is not:

  • A table with huge fields
  • Data stored in a larger font
  • Just unstructured data
  • A software tool that you can acquire

How long has Big Data been around?

How old is your company?  If you’ve been storing data over the years, then your Big Data has likely been around and available for quite some time.  This is not a bad thing.  Think of it as a very nice gift to your BI team.

What do we do with our Big Data?

The day has arrived, and your company would now like to analyze this historical and current data to help drive the business into the future.  There have been huge advancements in technology over the past few years.  These advancements allow you to use the data you already have to determine buying patterns, trends and tendencies.

What is the opposite of Big Data?

Small Data, right?  Knowing the difference between the two tells you whether you need tools built for Big Data or whether the traditional tools developed over the years to handle Small Data will still do the job.  (As yet there is no marketing buzzword for Small Data, so we’ll just call it Data.)

How do we know if we have Big Data?

One key sign that you have crossed from Data into Big Data is that processes take substantially longer and query performance degrades even after you have exhausted all of your best practices.  To view trends and see different views of the data, you find yourself constantly indexing, loading deltas, creating archives and running long restores of historical data.

What other types of data are considered Big Data?

The answer to this question has changed as technology has advanced.  Large corporations have had huge amounts of data for many years; what has changed remarkably is where that data comes from.  In addition to data stored in traditional database tables, we can now siphon data from the following places (this is a partial list):

  • Website analytics
  • System logging information
  • Internet data via web crawlers
  • Email servers and networking usage
  • Social media data
  • Blogs
  • Surveys
  • Chat
  • Data from streaming devices, such as:
    • Security systems
    • Factory instrumentation
    • Robotics
    • Medical records
  • Photo and video archives
  • Mobile app tracking

Some of these sources generate data very quickly.  On social media, for example, millions of people are actively adding information at any given moment.

One useful way to look at this data is as “low value, high volume” information.  It typically does not have “dollars” associated with it.  In other words, an individual record is not very meaningful on its own, but at very high volume the data can reveal trends and important tracking information.
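To make the “low value, high volume” idea concrete, here is a minimal Python sketch. The click events, product names and the helper functions `daily_counts` and `is_rising` are all hypothetical and invented purely for illustration; a real Big Data workload would run this kind of aggregation on a platform such as Hadoop over far more records.

```python
from collections import Counter

# Hypothetical click events as (day, product) pairs. This sample data is
# invented for illustration; any single event on its own says very little.
events = [
    ("mon", "umbrella"), ("mon", "sunscreen"),
    ("tue", "umbrella"), ("tue", "umbrella"), ("tue", "sunscreen"),
    ("wed", "umbrella"), ("wed", "umbrella"), ("wed", "umbrella"),
]

def daily_counts(events, product):
    """Count how many events mention `product` on each day (hypothetical helper)."""
    return Counter(day for day, item in events if item == product)

def is_rising(counts, days):
    """Return True if the count strictly increases across the given days."""
    series = [counts[d] for d in days]  # Counter returns 0 for missing days
    return all(a < b for a, b in zip(series, series[1:]))

days = ["mon", "tue", "wed"]
umbrella = daily_counts(events, "umbrella")
print([umbrella[d] for d in days])
print(is_rising(umbrella, days))
```

No single click tells you much, but counting many of them per day exposes a trend (here, rising interest in umbrellas) that a decision maker could act on.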

What tools are available to access my Big Data?

Many tools are available to handle Big Data scenarios, and the list is growing.  Many of the new offerings sit on top of Hadoop, the very popular Big Data platform, and add functionality to optimize your Big Data experience, often at a much lower price than scaling a traditional data warehouse.  A few of the vendors active in this space include:

  • Cloudera
  • IBM BigInsights
  • MapR
  • Hortonworks
  • EMC
  • Hadapt
  • Zettaset
  • DataStax
  • SAP

What is the best source of Big Data expertise?

Although your Big Data may have been around the organization for some time, few companies have fully adopted Big Data methods and architectures.  As a result, there are not many “experts” out there.  Finding the right expertise can be challenging, and that is probably one of the reasons we are so busy as consultants right now.  Try local meetups, LinkedIn and user groups to find people who can help.

Who in particular would be interested in Big Data?

Because Big Data platforms do not tie you to relational data formats, you can analyze all kinds of data, both structured and unstructured.  This opens up opportunities that were previously unimagined.  Financial institutions, law enforcement agencies, telecom companies, logistics providers and government agencies are all turning to Big Data technology to set themselves apart from the competition and find important data “gold mines.”

Summary

Do your homework, evaluate your business needs, determine what data you currently possess and discover some of the new and exciting technologies surrounding Big Data platforms that can help your business.

There are many potential uses for Big Data analytics.  Just as companies evaluate data sources for a data warehouse, they will need to review data sources for Big Data analytics.  ROI calculations will be different, but the payoffs can be substantial.  In many cases, adding data sources to Hadoop costs much less than increasing the data in your data warehouse.