Big Data and the HyperScale Challenge

The emergence of mission-critical business functions that rely on machine-generated data requires enterprises to evaluate and adopt new storage architectures that can support the ever-accelerating velocity of data ingest, the volume of data, and the variety of object data types. At the same time, these architectures must scale storage capacity to exabytes of data behind the firewall while delivering the economics of a public storage cloud like those operated by Google, Facebook, and Amazon.

The term Big Data has become a categorical phrase encompassing a broad landscape of challenges, use cases, and requirements. Within the Big Data category, a number of unique requirements exist for different architectures and solutions. YottaStor is focused on machine-generated data, the fastest-growing segment of Big Data. Machine-generated data is overwhelming traditional, POSIX-based architectural designs and rendering them obsolete. Commercial and Federal enterprises are spending hundreds of millions, if not billions, of dollars deploying advanced sensor technologies that create and capture machine-generated data. The YottaDrive is a patented, purpose-built large data object storage service that economically stores this data and exploits it for business insight.

Defining the HyperScale Challenge

The traditional business enterprise is experiencing massive data growth. Traditional database architectures are giving way as new data types emerge: video, multi-spectral sensing, medical imaging, cyber packet capture, and geospatial data.

In early 2010, industry data organizations recognized a new category of data that accounts for much of this massive growth. This new category is called machine-generated data.

Machine-generated data is created by powerful sensor technologies. Examples include:

  1. smartphone cameras that range from 8 to 41 megapixels
  2. medical imaging devices used in diagnostic processes
  3. genome sequencing technologies that are the foundation of bioinformatics discovery
  4. gigapixel-class EO/IR sensors that support the DoD ISR mission, and
  5. IP packet capture probes attached to the WAN backbone.

According to a recent IDC report, 80% of future enterprise data growth will consist of machine generated data.

Another characteristic of machine-generated data is that each stored object is itself growing in size. The smartphone camera serves as an instructive example. As recently as three years ago, the most advanced smartphone cameras captured 1 or 2 megapixels. Today the Nokia Lumia 1020 smartphone has an integrated 41-megapixel camera. A single picture is still just one picture, but with the Nokia Lumia 1020 that picture now carries 20 to 40 times more data.

The business processes supporting these workloads are fundamentally different from traditional enterprise workloads like supply chain, manufacturing, and ERP that drive traditional POSIX-based storage architectures. The new workload requirements for machine-generated data include the following (a brief ingest sketch follows this list):

  • Data Ingest
  • Content Streaming
  • Content Collaboration
  • Content Dissemination
  • Analytical Frameworks
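
Data ingest is typically the first of these requirements to strain a traditional architecture. As a rough sketch only (the YottaDrive's actual interfaces are not described in this section), the Python fragment below shows a single large sensor object being written straight to a generic S3-compatible object store; the endpoint, bucket, and object key are hypothetical.

    # Minimal sketch: ingest one large sensor object into an S3-compatible
    # object store. Endpoint, bucket, and key names are hypothetical.
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

    # upload_file switches to multipart upload automatically for large files,
    # so a multi-gigabyte sensor frame streams in without special handling.
    s3.upload_file(
        Filename="frame_000123.ntf",        # e.g. one EO/IR sensor frame
        Bucket="sensor-archive",
        Key="mission-42/frame_000123.ntf",
    )

The point of the sketch is that ingest is a flat write of a whole object rather than a sequence of POSIX file operations, which is what allows the back end to scale horizontally.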

Psychology of the Hyperscale Curve

As machine-generated data volumes grow and the need to retain more of that data grows with them, most CIOs and data managers move through a series of stages we call the "Psychology of the Hyperscale Curve," from denial that a problem exists to confronting the real cost of maintaining such volumes of data. It is relatively easy to see that it is prudent to address these issues earlier rather than later. The chart below illustrates these stages:

Introducing HyperScale Technologies

HyperScale technologies date back to the early 2000s, when they emerged to support the workload requirements of the maturing internet market. This innovation was necessary because the amount of data stored was growing at such a rate that traditional enterprise technologies simply could not scale economically to meet the new requirements.

Market leaders like YouTube, Google, Facebook, and Amazon had the ability to employ hundreds of computer scientists to design and develop the first generation of technologies needed to handle the explosion of users and data to be stored. But over the past decade, as these operational concepts, architectures, and workload requirements have become operationally proven and broadly understood by a wider audience of IT professionals, a new class of entrepreneur has emerged to transfer these breakthrough technologies to the enterprise market.

Analytics at HyperScale

Analytics at HyperScale can be summed up in a single phrase: scale changes everything. The constantly accelerating ingest velocity creates a situation where the storage system must seamlessly expand without operational disruption. That same velocity makes traditional extract, transform, and load (ETL) architectures infeasible, because data can no longer be moved from primary storage into a separate analytical environment built on traditional enterprise analytic engines like SAS, Greenplum, and Hadoop.

Instead, at HyperScale, the storage system must deliver integrated analytical capabilities like MapReduce, which allow smaller, more precise data sets to be identified virtually, in place, and then moved to an analytic post-processing engine for exploitation.
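
To make the idea concrete (this is an illustrative sketch, not YottaStor's implementation), the Python fragment below mimics a map and reduce pass over a hypothetical index of stored objects: the map step emits only the keys that match an analyst's criteria, and the reduce step assembles that small working set, which is all that needs to be shipped to the downstream post-processing engine.

    from functools import reduce

    # Hypothetical index entries: (object_key, metadata) held by the storage tier.
    objects = [
        ("mission-42/frame_000123.ntf", {"sensor": "EO", "cloud_cover": 0.1}),
        ("mission-42/frame_000124.ntf", {"sensor": "IR", "cloud_cover": 0.8}),
        ("mission-42/frame_000125.ntf", {"sensor": "EO", "cloud_cover": 0.2}),
    ]

    # Map: emit a key only when the object matches the analyst's criteria,
    # so full objects never leave the storage tier during the search.
    def map_fn(record):
        key, meta = record
        if meta["sensor"] == "EO" and meta["cloud_cover"] < 0.3:
            return [key]
        return []

    mapped = [key for record in objects for key in map_fn(record)]

    # Reduce: fold the matching keys into the precise working set that will be
    # handed off to the analytic post-processing engine.
    working_set = reduce(lambda acc, key: acc + [key], mapped, [])
    print(working_set)   # only these objects are moved for exploitation

The raw objects stay where they were written; only the keys in the working set, and eventually the objects they name, move to the post-processing engine.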

Traditional approaches that extract all possible metadata at the point of ingest are also infeasible, because the ingest velocity and sheer scale of the data volumes quickly make this an unaffordable solution.

Finally, one of the industry lessons from building advanced analytics in the anti-terror and law enforcement domains is that our adversaries have learned to quickly change their tactics and procedures to avoid detection by our asymmetric information dominance capability. This means that the intelligence yield of a particular algorithm can have a very “short shelf life.”