Is Big Data a Big Problem for Data Warehousing?

Gigabytes and terabytes are now small potatoes. We measure big data in petabytes.  Our smartphones, tablets, highway sensors, physician visits and social media interactions will drive the creation of more data in the next few years than has been created in the entire history of humanity.  Big data is becoming so disruptive that it is prompting some to ask, “Will big data be the death of the data warehouse?”

Stating that data is growing rapidly feels too quaint.  IDC predicts that in the next decade the amount of data will grow 44-fold.  Data is getting [insert your favorite expletive] huge!  Yeah, that feels better. Data is spilling out of IT departments, enticing the business to slice it, dice it and wrap it in a bow.  Although the possible insights this new data provides are exciting, its exponential proliferation is challenging traditional data management practices like data warehousing.

Data warehousing originated in the mid-1980s as a solution to increasing demands for business intelligence and the challenges of querying data across numerous operational systems to make decisions.  The goal of data warehousing is to create a single logical store of organizational information that can be used to answer questions, observe trends and make projections.  Early approaches to data warehousing involved consolidating data in a single relational database, but performance, quite frankly, was horrible.  Since the early 1990s, the leading strategy for data warehouse architecture has been a layered approach consisting of a centralized enterprise data warehouse (EDW) populated by operational systems and multiple data marts populated by the EDW.

This design typically results in multiple copies of data: the same data is duplicated across all three layers of the architecture (operational systems, the EDW and the data marts).  In a world of big data, this approach is proving to be problematic.
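
To make the duplication concrete, here is a minimal Python sketch of the layered flow, assuming a toy set of order records.  The rows, the region names and the `load_edw`, `build_mart` and `copies_of` helpers are all hypothetical, invented only for illustration; a real warehouse would use ETL tooling and databases rather than in-memory lists.

```python
# Layer 1: hypothetical operational rows (names and values are illustrative only).
operational = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 40.0},
]

def load_edw(rows):
    """Layer 2: copy every operational row into the central warehouse."""
    return [dict(row) for row in rows]

def build_mart(edw_rows, region):
    """Layer 3: copy the subset of warehouse rows one department needs."""
    return [dict(row) for row in edw_rows if row["region"] == region]

edw = load_edw(operational)
marts = {region: build_mart(edw, region) for region in ("east", "west")}

def copies_of(order_id):
    """Count how many physical copies of one order exist across all layers."""
    layers = [operational, edw, *marts.values()]
    return sum(1 for layer in layers for row in layer if row["order_id"] == order_id)

total_rows = len(operational) + len(edw) + sum(len(m) for m in marts.values())
print(copies_of(1), total_rows)
```

In this sketch every order ends up stored three times, once per layer, so three source rows become nine stored rows.  At petabyte scale, that multiplier is where the trouble starts.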

Speaking at O’Reilly’s recent Strata conference, Barry Devlin, originator of the term “data warehouse”, described some of the assumptions of analytics that must change due to big data:

  • It is no longer possible to route all data through the EDW.  Data volumes are just too large for this to occur efficiently.
  • It is not possible to convert and store all soft and unstructured information as hard data.
  • Due to the uncertified quality of externally sourced and repurposed data, there is reduced trust in information quality.
  • Information is too varied and business needs are too diverse; a “single version of the truth” is no longer achievable.

IBM has coined the term “4Vs” to refer to the challenges big data brings to data warehousing:

  • Volume – Terabytes per hour during peak operations is becoming quite common.
  • Velocity – Being able to perform analytics on thousands of transactions a second is becoming mission critical. Analytic modeling, continual scoring and efficient storage of this high-volume throughput have become critical.
  • Variety – Harnessing structured, semi-structured and unstructured information and correlating it to gain insight has become a key business requirement for many organizations.
  • Vitality – Neither problems nor opportunities are static. Big Data analysis and predictive models need to be updated as changes occur to seize opportunities as they come.

Although architecturally sound, the data warehouse model of the 1990s simply isn’t designed for data this big.  Does this mean the data warehouse is destined to wither away in the cold lonely shadows of new products with cool names?  I don’t think so.  It is far more likely that current data warehouse tools and architectures will evolve to meet the new demands; after all, a data warehouse was intended from its inception to be a logical store, not a physical entity.  The EDW is just an implementation choice.

The market has already taken notice and is beginning to offer new buzzwords like “data cloud,” “total data” and “business information resource layer” to describe the new strategies.  New commercial and open source products have emerged:

  • Proprietary relational data frameworks designed to handle big data, like VLDB from Oracle.
  • Data Warehouse Appliances like Oracle’s Exadata.
  • NoSQL and massively parallel (MPP) databases like open source CouchDB or commercial Greenplum, which support algorithms like Map/Reduce and can run on commodity hardware for massively parallel processing at astonishing speeds.
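
For readers who have not met Map/Reduce, the idea can be sketched in a few lines of Python.  This is a single-process illustration under stated assumptions, not how CouchDB, Greenplum or any other product implements it: the `chunks` data and the `map_phase`, `shuffle` and `reduce_phase` functions are hypothetical names invented for this example, and a real system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sales records, split into chunks as if stored on separate nodes.
chunks = [
    [{"region": "east", "amount": 120.0}, {"region": "west", "amount": 75.5}],
    [{"region": "east", "amount": 40.0}, {"region": "north", "amount": 10.0}],
]

def map_phase(chunk):
    """Map: turn each record in one chunk into a (key, value) pair.
    Each chunk is processed independently, so this step can be parallelized."""
    return [(record["region"], record["amount"]) for record in chunk]

def shuffle(mapped_chunks):
    """Shuffle: group the emitted values from all chunks by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's list of values into one result (here, a sum)."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

totals = reduce_phase(shuffle(map_phase(chunk) for chunk in chunks))
print(totals)
```

Because each chunk is mapped without reference to any other chunk, adding more commodity machines lets the work spread out, which is the property that makes the approach attractive for big data.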

Many of these solutions are very new, but they are an indication that organizations will not be left alone in trying to solve their big data problems.


Josette Rigsby
Josette Rigsby is a technology professional with more than 15 years of experience and a passion for software and process improvement. Josette has held a number of roles in IT, including consultant, developer, IT Director and Enterprise Architect. Josette also works as a Staff Writer for TechAxcess and as a freelancer with a focus on technology. In her spare time, she enjoys excessive caffeine consumption, telling made-up stories about when she was young to her 3 kids, and adventure travel. You can follow Josette on Twitter using her alter ego @techielicous.
