“In God we trust, all others bring data.”

W. Edwards Deming

Drowning

A Harvard Business Review paper pointed out that a total of 2.5 quintillion bytes of data (that’s 25 followed by a staggering 17 zeros!) are generated every day, and it is estimated that as much data is now generated in two days as was created from the dawn of civilization. Furthermore, it is estimated that 90% of all the data in the world has been generated over just the last two years.

But how much of this data is really meaningful, useful or actionable? As the health care industry marches on from analog to digital, we are seeing a massive proliferation of data sources, often siloed, often not talking to each other, and almost always created to address just a defined set of use cases. My clinical colleagues often find themselves playing the role of a detective, navigating from one clinical information system to another, piecing together information about our patients. While we rejoice that the digital era is upon us, we should be distraught that we have not really been able to capitalize on the sheer power of the data that we have all around us. This article is a wake-up call: a cry for us as a health care community to comprehend the power of data and to actively seek the insights that can be garnered from the right approaches to big data technologies.

With health care’s focus on value-based care paradigms, on quality, and on efficiency coupled with effectiveness, data-driven, evidence-based decisions are of paramount importance. But they say that a frog in a well cannot conceive of the ocean. Intelligent decisions are best made with data that gives us rich context and a fuller view of all parameters and possibilities. However, how do we not drown in all this data we’re generating? How do we stay afloat, swim and surf, harnessing the tremendous power of this valuable resource?

Zettabytes of data 

In my institution alone, we currently have about 27 petabytes of data in real-time storage, and this amount is doubling almost every 18 months. The industry is talking about not just petabytes, but exabytes and zettabytes of data. The volume of data worldwide is projected to reach 35 zettabytes by 2020, a 44-fold increase from 2009. If you’re still keeping track, a zettabyte is one sextillion bytes, which is exactly 1 million petabytes. That’s a lot of data.
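To make the scale concrete, here is a minimal sketch of the arithmetic behind these figures. It assumes decimal (SI) units and a steady 18-month doubling time; the function and variable names are illustrative only, not part of any system mentioned here.

```python
import math

# Decimal (SI) storage units, expressed in bytes.
PETABYTE = 10**15
EXABYTE = 10**18
ZETTABYTE = 10**21

# One zettabyte expressed in petabytes.
petabytes_per_zettabyte = ZETTABYTE // PETABYTE


def projected_petabytes(current_pb, years, doubling_years=1.5):
    """Project storage volume, assuming it doubles every `doubling_years` years."""
    return current_pb * 2 ** (years / doubling_years)


# Starting from 27 PB (the figure quoted above), assuming the doubling rate holds.
in_three_years = projected_petabytes(27, 3)

# Years until 27 PB would reach one exabyte (1,000 PB) at the same rate.
years_to_exabyte = 1.5 * math.log2((EXABYTE / PETABYTE) / 27)

# Worldwide 2009 volume implied by "35 zettabytes, a 44-fold increase".
implied_2009_zettabytes = 35 / 44

print(petabytes_per_zettabyte)
print(in_three_years)
print(round(years_to_exabyte, 1))
print(round(implied_2009_zettabytes, 2))
```

Under these assumptions, a 27-petabyte store quadruples in three years and would pass the exabyte mark in under a decade, which is why planning for data growth cannot be linear.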

Big data technologies present a fresh opportunity in health care to bring previously unfathomable amounts of data to life, and to transform that data into valuable insights. It is an opportunity to put the data to work for us, because as valuable as data is, the real value comes from actionable insights derived from the data. Big data technologies are illustrating novel ways to measure improvements in quality of care and patient outcomes, and to drive efficiencies in clinical workflow, with new insights that we did not know were possible to attain.

Data liquidity

Over the past decade and more, we have made steady progress in moving from paper, analog and film to paperless, digital and filmless. However, we still find ourselves struggling with multiple disparate silos of data that do not do much more than just sit there and collect even more data. Often, these data silos have elements of proprietary standards that lock the data to the application layer of their respective vendors. The movement of the data is also largely unidirectional: data goes in, but it is hard to get back out. Often, the data is far from liquid, and that is a problem.

In a KLAS Research study on accountable care that looked at the information technology (IT) solutions needed for an accountable care organization (ACO), analytics was at the top of the list. Another study that delved further into “obstacles to widespread analytics adoption” saw the ability to get the data as the leading limitation. Data interoperability is not just a “nice to have” feature anymore; it is a competitive advantage. It is encouraging that the Office of the National Coordinator (ONC) has been pushing for wider adoption of data exchange standards as part of the Meaningful Use (MU) initiative. Achieving a level of data interoperability enables a number of key functions that are essentially performed “above the electronic health record (EHR) level,” such as enterprise analytics.

The reality, too, is that as much as we struggle with data liquidity across EHRs, real interoperability that leads to a more holistic view of patients and the communities they live in needs to take into consideration data that’s well outside the confines of these EHRs. This includes social determinants of health data such as economic indices, environmental dimensions, housing data sets and zip code data. Data from multiple disparate information systems needs to be woven together to derive meaningful operational and clinical insights that drive actionable workflow and behavior change.
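As a rough illustration of what weaving data sources together can mean in practice, the sketch below links a clinical extract to a community-level data set by zip code. Every record, field name and value here is invented for the example; real linkage would also have to handle data governance, privacy and data quality.

```python
# Hypothetical EHR extract: one row per patient (all values are made up).
patients = [
    {"patient_id": "p1", "zip": "00001", "a1c": 8.1},
    {"patient_id": "p2", "zip": "00002", "a1c": 6.4},
    {"patient_id": "p3", "zip": "00009", "a1c": 7.2},
]

# Hypothetical community data set keyed by zip code (also made up).
community = {
    "00001": {"median_income": 31000, "food_desert": True},
    "00002": {"median_income": 72000, "food_desert": False},
}


def link_by_zip(patient_rows, community_by_zip):
    """Attach community-level context to each patient row, keyed on zip code.

    Patients whose zip code has no community record are kept and flagged,
    so that gaps in the outside data stay visible instead of silently
    dropping people from the analysis.
    """
    linked_rows = []
    for patient in patient_rows:
        context = community_by_zip.get(patient["zip"])
        row = dict(patient)  # copy, so the source extract is not modified
        row["has_community_data"] = context is not None
        row.update(context or {})
        linked_rows.append(row)
    return linked_rows


linked = link_by_zip(patients, community)
for row in linked:
    print(row)
```

Keeping unmatched patients in the result, rather than discarding them, is a deliberate choice in this sketch: missing context is itself information that an analyst needs to see.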


Getting insights from data requires some level of groundwork. Much like how a gardener sows seeds, then cares for and nurtures the garden, managing data, especially at scale, requires some discipline and, arguably, a good deal of passion. Managing data entails having disciplined methodologies around data integration, data governance, data quality, security and information lifecycle management.

Like a gardener, data stewards may need to do some weeding and pruning before data analysts and data scientists can start harvesting the data farms. The crops of data may yield insightful ingredients that cooks, professional or otherwise, may then want to combine into an appetizing and nutritious meal. With the right set of tools, data analysts and data scientists can serve up the capabilities that clinicians, administrators and technologists need to glean insights that are truly meaningful. What’s perhaps even more interesting is the movement toward self-service capabilities, where front-line users such as clinicians can have direct access to simple tools that yield tasteful and actionable information visualizations with minimal effort.

What’s the future of big data?

I believe that the value of big data will rise exponentially, especially as we find better ways to address the veracity of big data, alongside the now-established methodologies for dealing with its volume, variety and velocity.

Big data calls for newer approaches and tools to perform actions such as predictive analytics. It also calls for a change to how we think about knowledge extraction and interpretation. There’s been much development in the space of artificial intelligence, machine learning, deep learning and what is now being called cognitive computing. Machine learning is to big data as human learning is to life experience: we interpolate and extrapolate from vast past experiences to deal with specific unfamiliar situations. Machine learning with big data will duplicate this behavior, at massive and replicable scale. Big data coupled with the ‘pattern recognition at scale’ capabilities of machine learning will allow us to program systems to automatically learn and to improve with experience, such as learning to predict which patients will respond best to which treatments, by analyzing multiple streams of data and experience at scale, often in real time. This will continue to allow us to develop algorithms that discover general conjectures and knowledge from specific data and experience, based on sound statistical and computational principles.

The future ahead could look nothing like the past we left behind.

Health Datapalooza

As a co-chair at this year’s Health Datapalooza, April 26-27 in Washington, D.C., I’m excited to learn about the newest, most innovative, and effective uses of health data and hear real stories of actionable knowledge being created from data, used to improve health care and health care delivery. I hope you’ll consider joining us!

Want to know more? Follow this blog for thoughts from AcademyHealth, the Health Datapalooza steering committee, and other thought leaders and experts, and share your thoughts with us on Twitter using #hdpalooza and follow @hdpalooza.

Rasu Shrestha
Committee Member

Rasu Shrestha, MD, MBA

CIO - University of Pittsburgh Medical Center (UPMC)

As Chief Innovation Officer, Dr. Shrestha is responsible for driving UPMC’s innovation strategy.