Design Principles for Big Data

HyperScale Design Principles

In October 2011, YottaStor published a set of eight design tenets for developing HyperScale storage solutions. The tenets were developed from the experiences of multiple customers managing HyperScale data growth, and have been shared with and reviewed by multiple DoD and Intelligence Community CTOs, whose thoughts and inputs they incorporate.


YottaStor Design Tenets

1.  Capture and store data once

Write data to disk once during its lifecycle.  The accelerating velocity at which data is being created as well as the sheer magnitude of data being managed will overwhelm network capacity, rendering impractical any attempt to move data electronically.

2.  Process data at the edge

The only point in the architecture at which an organization can affordably process data is during ingest. This requires co-locating processing and storage, so that users can create and capture the metadata required to access the data in the future.
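The idea of capturing metadata during ingest, while the data is already in memory, can be sketched as follows. This is an illustrative sketch only; the `ingest` function and its field names are assumptions for illustration, not a YottaStor interface.

```python
import hashlib
import time


def ingest(blob: bytes, object_id: str) -> dict:
    """Extract metadata once, at ingest, while the bytes are in memory.

    Illustrative sketch: field names are assumptions, not a real API.
    """
    metadata = {
        "object_id": object_id,
        "size_bytes": len(blob),
        "sha256": hashlib.sha256(blob).hexdigest(),
        "ingested_at": time.time(),
    }
    # In a real system, the blob and its metadata would be written to
    # co-located storage here; the metadata then drives all future access,
    # so the raw bytes never need to be reprocessed or moved.
    return metadata


meta = ingest(b"sensor frame 0001", "mission42/frame0001")
```

Because the checksum and descriptive fields are computed in the same pass that writes the data, no later bulk re-read of the object is required.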

3.  Automate data movement to less expensive storage

The storage system must continually migrate data to less expensive storage, letting customers lower their overall storage cost. The key metric becomes cost/GB/month. Once this metric is established for a specific organization, the year-to-year planning process can focus on reducing it.
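The effect of tiered migration on this metric can be illustrated with a simple blended-cost calculation. The tier names, prices, and placement figures below are illustrative assumptions, not vendor quotes.

```python
# Blended cost/GB/month as data migrates to cheaper tiers.
# All figures are illustrative assumptions.
tier_price = {"hot": 0.10, "warm": 0.03, "cold": 0.01}   # $/GB/month
placement_gb = {"hot": 100, "warm": 400, "cold": 1500}   # GB on each tier

total_gb = sum(placement_gb.values())
monthly_cost = sum(tier_price[t] * gb for t, gb in placement_gb.items())
cost_per_gb_month = monthly_cost / total_gb

print(f"${cost_per_gb_month:.4f}/GB/month")  # prints "$0.0185/GB/month"
```

Shifting the placement mix toward the cold tier directly lowers the blended metric, which is exactly the year-to-year optimization the tenet describes.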

4.  Adopt self-healing, self-replicating technology

In order to reduce the cost/GB/month the technology must be self-healing and self-replicating. This capability will substantially reduce the number and cost of FTEs required to manage the storage system.

5.  Deploy a federated, global name-space

Adopting name-space technologies that support billions of objects in a single name-space reduces cost and support complexity.

6.  Access through web services (e.g., S3 and other REST APIs)

This level of application abstraction is key to allowing the operational optimization of the storage cloud without impacting the application layer.  One important benefit of this capability is the elimination of “location awareness” that applications must have in POSIX-compliant storage environments.
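The elimination of location awareness can be sketched with a toy object store: applications address objects only by bucket and key, while the storage cloud is free to move the underlying bytes between sites. The `ObjectStore` class and site names below are hypothetical, for illustration only.

```python
# Minimal sketch of location-independent, web-services-style object access.
# Class and site names are illustrative assumptions.
class ObjectStore:
    def __init__(self):
        self._site = {}   # (bucket, key) -> site currently holding the bytes
        self._data = {}   # (bucket, key) -> object bytes

    def put(self, bucket: str, key: str, blob: bytes, site: str = "site-a"):
        self._site[(bucket, key)] = site
        self._data[(bucket, key)] = blob

    def migrate(self, bucket: str, key: str, new_site: str):
        # Operational optimization: the storage layer relocates data
        # without notifying, or impacting, the application layer.
        self._site[(bucket, key)] = new_site

    def get(self, bucket: str, key: str) -> bytes:
        # The caller never supplies a host or file path -- no
        # "location awareness" as required in POSIX environments.
        return self._data[(bucket, key)]


store = ObjectStore()
store.put("imagery", "mission42/frame0001", b"frame bytes")
store.migrate("imagery", "mission42/frame0001", "site-b")
```

The application's `get` call is identical before and after the migration, which is the operational freedom the tenet is after.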

7.  Design for ever-increasing data variety, data volume and data velocity

The storage system must demonstrate the ability to scale in three dimensions: data types, which will evolve and extend over time; the daily ingest requirement, which will continue to increase; and the overall capacity of the storage system, which will expand at accelerating rates.

8.  Eliminate RAID

The data durability requirement is actually greater than in traditional storage environments. New approaches such as replication and erasure coding must be embraced to meet it.
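The storage-overhead case for erasure coding over plain replication can be shown with a short calculation. The specific parameters (3 copies, a 10+4 data-plus-parity layout) are illustrative assumptions, not a prescribed configuration.

```python
# Raw storage overhead: N-way replication vs a k+m erasure code.
# Parameters below are illustrative (e.g. a 10+4 layout).
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per user byte under N-way replication."""
    return float(copies)


def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per user byte with k data + m parity fragments."""
    return (k + m) / k


print(replication_overhead(3))    # prints 3.0  (3x raw storage)
print(erasure_overhead(10, 4))    # prints 1.4  (tolerates loss of any 4 fragments)
```

A 10+4 layout survives the loss of any four fragments at 1.4x raw overhead, whereas 3-way replication pays 3x to survive two losses, which is why erasure coding can deliver higher durability per dollar without RAID.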