Tuesday, August 6, 2013

MAD Practices: Radical Departure from Traditional EDW & BI

"If you are looking for a career where your services will be in high demand, you should find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap.
So, what's getting ubiquitous and cheap? Data.
And what is complementary to data? Analysis."
                                                       - Prof. Hal Varian, UC Berkeley, Chief Economist at Google


Standard business practices for large-scale data analysis center on the notion of an "Enterprise Data Warehouse" (EDW) that is queried by "Business Intelligence" (BI) software. BI tools produce reports and interactive interfaces that summarize data via basic aggregation functions (e.g., counts and averages) over various hierarchical breakdowns of the data into groups.
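To make the BI pattern concrete, here is a minimal sketch of a rollup: counts and averages computed over a two-level hierarchy (region, then city). The records, the `SALES` name, and the `rollup` helper are all invented for illustration and do not come from any particular BI product.

```python
from collections import defaultdict

# Hypothetical sales records: (region, city, amount). Invented for illustration.
SALES = [
    ("West", "Seattle", 120.0),
    ("West", "Seattle", 80.0),
    ("West", "Portland", 50.0),
    ("East", "Boston", 200.0),
]


def rollup(rows, depth):
    """Count and average `amount`, grouped by the first `depth` hierarchy levels."""
    totals = defaultdict(lambda: [0, 0.0])  # group key -> [count, sum]
    for *levels, amount in rows:
        key = tuple(levels[:depth])
        totals[key][0] += 1
        totals[key][1] += amount
    return {key: (count, total / count) for key, (count, total) in totals.items()}


by_region = rollup(SALES, 1)  # coarse view: one group per region
by_city = rollup(SALES, 2)    # drill down one level: one group per (region, city)
```

Moving from `depth=1` to `depth=2` is the "drilldown" of traditional BI: the same basic aggregates, computed over a finer breakdown.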

The value of data analysis has entered common culture, with numerous companies showing how sophisticated data analysis leads to cost savings and even direct revenue. The result is a grassroots move to collect and leverage data in multiple organizational units. This new climate of widespread, large-scale data collection calls for MAD skills. The acronym arises from three aspects of this environment that differ from EDW orthodoxy:

  • Magnetic: Traditional EDW approaches "repel" new data sources, discouraging their incorporation until they are carefully cleansed and integrated. Given the ubiquity of data in modern organizations, a data warehouse can keep pace today only by being "magnetic": attracting all the data sources that crop up within an organization regardless of data quality niceties.
  • Agile: Data warehousing orthodoxy is based on long-range, careful design and planning. Given growing numbers of data sources and increasingly sophisticated and mission-critical data analyses, a modern warehouse must instead allow analysts to easily ingest, digest, produce and adapt data at a rapid pace. This requires a database whose physical and logical contents can be in continuous rapid evolution.
  • Deep: Modern data analyses involve increasingly sophisticated statistical methods that go well beyond the rollups and drilldowns of traditional BI. Moreover, analysts often need to see both the forest and the trees in running these algorithms - they want to study enormous datasets without resorting to samples and extracts. The modern data warehouse should serve both as a deep data repository and as a sophisticated algorithmic runtime engine.
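As one illustration of "Deep" analysis that still scans the full dataset rather than a sample, the sketch below fits a least-squares line using only running sums accumulated in a single pass. This is a standard textbook formulation chosen for illustration, not a method taken from any specific warehouse product; the `fit_line` name and the synthetic data are assumptions.

```python
def fit_line(pairs):
    """Least-squares slope and intercept from one pass of running sums.

    Only five accumulators are kept, so the input can be an arbitrarily
    long stream and never needs to be held in memory or sampled.
    """
    n = sx = sy = sxx = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Synthetic data lying exactly on y = 3x + 1, consumed as a stream.
points = ((float(x), 3.0 * x + 1.0) for x in range(1000))
slope, intercept = fit_line(points)
```

Because the model is computed from sums, it has the same shape as the counts and averages of a BI rollup, which is what lets a database engine run it over the whole table.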

The MAD approach requires support from the DBMS. First, getting data into a "Magnetic" database must be painless and efficient, so analysts will play with new data sources within the warehouse. Second, to encourage "Agility", the system has to make physical storage evolution easy and efficient. Finally, "Depth" of analysis - and really all aspects of MAD analytics - requires the database to be a powerful, flexible programming environment that welcomes developers of various stripes.
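A small sketch of the first two requirements, using Python's built-in `sqlite3` module as a stand-in for a real warehouse. The table, view, and field names are hypothetical, and the example shows only logical evolution (a redefinable view); it does not model physical storage changes.

```python
import sqlite3

# Hypothetical raw feed; field names and values are invented for illustration.
RAW_ROWS = [
    ("2013-08-01", "seattle", "120.0"),
    ("2013-08-01", "Seattle ", "n/a"),  # messy casing, missing amount
    ("2013-08-02", "BOSTON", "200"),
]

conn = sqlite3.connect(":memory:")

# Magnetic: accept every row as text first, with no cleansing gate.
conn.execute("CREATE TABLE staging_sales (day TEXT, city TEXT, amount TEXT)")
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", RAW_ROWS)

# Agile: layer a cleaned view on top; it can be dropped and redefined
# as the analysts' understanding of the data improves.
conn.execute(
    """
    CREATE VIEW clean_sales AS
    SELECT s.day AS day,
           lower(trim(s.city)) AS city,
           CAST(s.amount AS REAL) AS amount
    FROM staging_sales AS s
    WHERE s.amount GLOB '[0-9]*'  -- keep rows whose amount starts with a digit
    """
)

staged = conn.execute("SELECT count(*) FROM staging_sales").fetchone()[0]
rows = conn.execute(
    "SELECT city, count(*), sum(amount) FROM clean_sales "
    "GROUP BY city ORDER BY city"
).fetchall()
```

The raw rows stay available in `staging_sales` even though the view filters one of them out, so a later, better cleaning rule can still recover it.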

As a concluding remark, the economics of data are changing rapidly. The question is not whether to get MAD, but how and when. In nearly all environments, an evolutionary approach makes sense: traditionally conservative data warehousing functions are maintained even as new skills, practices and people are brought together to develop MAD approaches. This coexistence of multiple styles of analysis is itself an example of MAD design.



[2] Cohen J., Dolan B., Dunlap M., Hellerstein J.M., Welton C. (2009). "MAD Skills: New Analysis Practices for Big Data". VLDB Endowment Inc., France.

