The New World of Hyper-Scale Unstructured Data

The world is awash in rich media, and that is driving a different kind of Big Data business requirement: keeping the content safe, secure, uncorrupted and accessible for years. Every day, consumers create and view images and video from a wide variety of origins, whether self-created or shared online by friends or professional sources. Resolutions are increasing rapidly, with high-definition formats and even 3D becoming the norm. Individual photo files have grown from hundreds of kilobytes to tens of megabytes.

Such rich media represents a unique and rapidly proliferating type of Big Data best characterized as “Big Unstructured Data.” The term Big Data usually refers to analytics performed on relatively small structured data files. In contrast, Big Unstructured Data refers to massive files, typified by HD video and other video files that are each hundreds of gigabytes. In media and entertainment (M&E) specifically, the advent of High Frame Rate (HFR) movies at 48 frames per second, and soon at 60 frames per second and higher, will easily generate tens of terabytes of digital file content per hour. Imagine the countless hours of video that will eventually be captured and maintained across the world.
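To get a feel for where a figure like tens of terabytes per hour can come from, the arithmetic can be sketched in a few lines of Python. The frame size, bit depth and stream count below are illustrative assumptions (a 4096 x 2160 frame, three color channels, 16 bits per channel, uncompressed), not numbers taken from any particular production workflow.

```python
# Back-of-the-envelope estimate of uncompressed video volume per hour.
# Every format parameter here is an assumption chosen for illustration.

TB = 10**12  # decimal terabyte


def uncompressed_bytes_per_hour(width, height, channels, bytes_per_channel,
                                fps, streams=1):
    """Return the raw bytes produced by one hour of uncompressed capture."""
    bytes_per_frame = width * height * channels * bytes_per_channel
    return bytes_per_frame * fps * 3600 * streams


if __name__ == "__main__":
    # Compare conventional 24 fps with the HFR rates mentioned in the text,
    # for a single stream and for a stereo (3D) pair.
    for fps in (24, 48, 60):
        for streams in (1, 2):
            total = uncompressed_bytes_per_hour(4096, 2160, 3, 2, fps, streams)
            print(f"{fps} fps, {streams} stream(s): {total / TB:.1f} TB/hour")
```

Under these assumptions a single 48 fps stream comes to roughly 9 TB per hour, and a stereo pair or a multi-camera shoot pushes the total well into the tens of terabytes. Compressed formats would be smaller, so this is best read as an upper-bound sketch.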

Aside from rich media, a massive amount of Big Unstructured Data is also generated in healthcare, oil and gas, government, scientific applications and cloud services. The trend here goes beyond long-term archiving of this content for legal and compliance reasons; organizations also want to monetize the content and unlock its value.

Traditional enterprise storage arrays and tape were never designed for the scale of data that must now be stored. Managing systems with tens to hundreds of thousands of disk drives is very different from managing a few hundred. The challenge is to create unbreakable storage at this scale for the world’s largest portals, social networks, online applications, and enterprise and scientific repositories.
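One way to see why scale changes the management problem is to estimate how often a drive fails somewhere in the system. The sketch below assumes a 2 percent annual failure rate per drive and independent failures; both are simplifying assumptions for illustration, not measured values.

```python
# Rough estimate of how often drives fail as a system grows.
# ASSUMED_AFR is a hypothetical annual failure rate per drive.

ASSUMED_AFR = 0.02


def expected_failures_per_day(drive_count, annual_failure_rate=ASSUMED_AFR):
    """Expected number of drive failures per day across the whole system."""
    return drive_count * annual_failure_rate / 365


if __name__ == "__main__":
    for drives in (300, 10_000, 100_000, 500_000):
        per_day = expected_failures_per_day(drives)
        print(f"{drives:>7} drives: about {per_day:.2f} failures per day")
```

With these assumed numbers, a few hundred drives see a failure every couple of months, while hundreds of thousands of drives see several failures a day, so a failed drive stops being an incident and becomes a routine operating condition.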

Disk-based object storage systems with erasure-code data protection address this problem. Rather than being designed for localized application storage, bounded by a few users and isolated to a single corporation, these systems are designed for Internet-scale storage. They are purpose-built to address the scale, durability, management and cost requirements of the coming generation of hyperscale unstructured data. They share fundamental properties that make them suitable for these new deployments:

  • Systems can scale seamlessly, without interruption, from tens of petabytes to exabytes and billions to trillions of files
  • Extremely high levels of storage durability, not only to protect this precious data against loss but also to assure its integrity. Hardware will always fail, and the number of failures rises with the number of disk drives, so the ability to maintain durability in systems that see daily and even hourly component failures is key.
  • Performance that is matched to big data access patterns and increases as system capacity grows
  • A true, comprehensive reduction in cost of ownership: not just a low entry price but affordable overall costs, including administration and environmental costs
  • Organic scaling over newer generations of platform components, with automated data migration to keep systems running for decades while underlying hardware changes occur
  • Distributed access across geographically disparate locations
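
The durability and cost properties above are where erasure coding tends to matter. As a minimal sketch, assume an object is split into k data fragments plus m parity fragments on separate drives, that any k fragments are enough to rebuild it, and that each fragment is lost independently with probability p before repair. The 10+6 layout and the value of p below are hypothetical, chosen only to compare against three-way replication; they are not parameters of any specific product.

```python
from math import comb

# Simplified comparison of erasure coding and replication.
# Assumes independent fragment losses, which real systems only approximate.


def storage_overhead(k, m):
    """Raw capacity consumed per byte of user data for a k+m layout."""
    return (k + m) / k


def loss_probability(k, m, p):
    """Probability that more than m of the k+m fragments are lost."""
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(m + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical chance of losing one fragment before repair
    layouts = {"3 copies (k=1, m=2)": (1, 2), "10+6 erasure code": (10, 6)}
    for name, (k, m) in layouts.items():
        print(f"{name}: overhead {storage_overhead(k, m):.1f}x, "
              f"loss probability {loss_probability(k, m, p):.2e}")
```

Under these assumptions the erasure-coded layout survives more simultaneous failures than three copies while consuming roughly half the raw capacity, which is the trade-off behind the durability and cost claims in the list above.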

Content owners are starting to reap the value of their historical media assets to offset the cost of storing their Big Unstructured Data. For example, several major sports leagues have realized that they can monetize historical video clips of their athletes or key events by making them available online on object-based disk storage. The Montreux Jazz Festival is monetizing its 45 years of concert videos by converting them from an analog tape vault to a highly durable disk-based archive. Rather than collecting dust as they did for the last 45 years, the videos can now be delivered “live” via streaming to a series of jazz cafes.

With enterprises, research institutes and government organizations needing to retain their unstructured data online in order to repurpose and monetize it, the Big Unstructured Data problem extends well beyond the storage of media content. Organizations should thoughtfully consider the scalability, performance and durability demands of their unstructured data, and whether it is time for a sea change in their underlying storage architectures to turn that data from a challenge into an opportunity.

Wim De Wispelaere is chief technology officer and founder of Amplidata.




Edited by Stefania Viscusi