The cloud isn’t new and, realistically, neither are hybrid, multi-cloud environments. What is new is a pandemic-fueled awareness that hybrid multi-cloud is here and isn’t going away, ever. Several proof points from the past year alone come to mind. The first is Oracle’s deal with Zoom. Oracle had already been selected as TikTok’s secure cloud provider, so it is now hosting two of today’s most highly visible apps, which also makes Oracle the fourth public cloud vendor that matters. The second proof point is that large enterprises are starting to see strategic value in data repatriation: moving data from the cloud back on-prem.
The trends are clear. Data exists in more locations than ever before; strategic data movement, beyond mere copying, is on the rise; and, most crucially, the need to realize business value from data-driven strategy and operations is more urgent than ever. But these trends are in tension: the more data moves, and the more places it is needed, the harder it becomes to connect that data to drive insight.
From the data perspective, there’s no such thing as “the cloud” and hybrid multi-cloud is actually making the job of connecting data harder, not easier. At the infrastructure level, hybrid multi-cloud is, in some sense, ideal, giving unprecedented flexibility and economic advantage to large enterprises. Unfortunately, it also makes data fragmentation worse, creating even more impediments to achieving digital transformation.
Just as fish pay little attention to the water they swim in, data management practitioners have historically paid little attention to the main lever of data integration, because there weren’t really any alternatives. That lever is, of course, the location of data in the storage layer. With very few exceptions, conventional data management, including everything from databases to data warehouses to data lakes, has always worked by moving data to computation and by exploiting where that data sits in the storage layer.
The arrival of the cloud didn’t initially change data integration in any fundamental way. Databases, data warehouses, and data lakes have all moved to the cloud, but from the data perspective, the cloud is just someone else’s on-prem environment. Data is still integrated in the cloud by moving and copying it to new places. Snowflake, which is unquestionably successful, is still conventional data management in this sense: Snowflake customers move their data into Snowflake to integrate it there, or they get no value from the cloud-based data warehousing solution at all.
But the hybrid, multi-cloud world is here now, and the pandemic has accelerated awareness of this change. That also means it’s time to start considering alternatives. Is there another way to connect data such that the infrastructure and finance people can have their hybrid, multi-cloud cake while the data and digital transformation people get to eat it, too?
Data Fabrics: The Missing Stitch to Abstracting Away the Data Location Debate
Fortunately, there is an alternative in data virtualization (DV), an approach that has existed for many years but has been confined strictly to the subset of enterprise data that fits neatly into the relational data model. As a result, existing DV pure plays struggle with the breadth of today’s heterogeneous data landscape, including semi-structured and unstructured data. But data virtualization can be decoupled from the relational data model, and that points us to a solution.
The emerging space of data fabric vendors offers a way forward. The key alternative to leveraging data location in storage, especially in a world where storage locations are proliferating, is to leverage the meaning of data to the business. Rather than moving data to computation, data fabrics, and in particular those powered by knowledge graph technologies, offer the chance to move computation to data.
When data environments are proliferating, the best strategy is not to fight the trends but to co-opt them. Rather than moving data in order to integrate, query, and connect it, data fabric platforms use virtualization technology, graph data models, and metadata intelligence to connect and query data where it lives, without moving or copying it, based on what that data means to the business.
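The core idea of querying by meaning rather than by location can be illustrated with a deliberately tiny sketch. It is not Stardog’s API or any vendor’s product: `VirtualGraph`, `ConceptMapping`, and the two adapter functions are hypothetical names, and an in-memory SQLite table and a Python list merely stand in for an on-prem database and a cloud data store. A real platform would add a graph data model, a query language, and a metadata catalog; here, callers ask for business concepts such as `Customer` and `Order`, and each source evaluates its own filters in place instead of being copied into a central store first.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class ConceptMapping:
    """Hypothetical: binds a business concept to a function that queries its source in place."""
    concept: str
    fetch: Callable[[Dict[str, object]], Iterable[Row]]


class VirtualGraph:
    """Toy virtualization layer: callers ask for concepts, never for storage locations."""

    def __init__(self) -> None:
        self.mappings: Dict[str, ConceptMapping] = {}

    def register(self, mapping: ConceptMapping) -> None:
        self.mappings[mapping.concept] = mapping

    def query(self, concept: str, **filters: object) -> List[Row]:
        # Filters are handed to the source adapter, so selection runs where the data lives.
        return list(self.mappings[concept].fetch(filters))


def sql_customer_fetch(conn: sqlite3.Connection):
    """Adapter for a relational source; pushes filters down as a SQL WHERE clause."""
    columns = {"id": "cust_id", "name": "name", "region": "region"}  # concept property -> column

    def fetch(filters: Dict[str, object]) -> Iterable[Row]:
        where = " AND ".join(f"{columns[key]} = ?" for key in filters)
        sql = "SELECT cust_id, name, region FROM cust" + (f" WHERE {where}" if where else "")
        for cust_id, name, region in conn.execute(sql, list(filters.values())):
            yield {"id": cust_id, "name": name, "region": region}

    return fetch


def record_order_fetch(records: List[Dict[str, object]]):
    """Adapter for semi-structured records; filters them without loading them into a warehouse."""

    def fetch(filters: Dict[str, object]) -> Iterable[Row]:
        for rec in records:
            row = {"customer": rec["customer"], "total": rec["total"]}
            if all(row.get(key) == value for key, value in filters.items()):
                yield row

    return fetch


def build_graph() -> VirtualGraph:
    # Stand-in for an on-prem relational database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cust (cust_id TEXT, name TEXT, region TEXT)")
    conn.executemany("INSERT INTO cust VALUES (?, ?, ?)",
                     [("c1", "Acme", "EU"), ("c2", "Globex", "US")])
    # Stand-in for JSON-like order records held by a cloud service.
    orders = [{"customer": "c1", "total": 120.0},
              {"customer": "c2", "total": 75.5},
              {"customer": "c1", "total": 30.0}]

    graph = VirtualGraph()
    graph.register(ConceptMapping("Customer", sql_customer_fetch(conn)))
    graph.register(ConceptMapping("Order", record_order_fetch(orders)))
    return graph


def revenue_for_region(graph: VirtualGraph, region: str) -> float:
    # Business logic is written against concepts, not against where the data is stored.
    total = 0.0
    for customer in graph.query("Customer", region=region):
        for order in graph.query("Order", customer=customer["id"]):
            total += float(order["total"])
    return total


if __name__ == "__main__":
    graph = build_graph()
    print(revenue_for_region(graph, "EU"))  # 150.0
```

Because `revenue_for_region` names only concepts, relocating the orders to a different cloud or back on-prem would mean registering a new adapter for `Order`; the business logic would stay the same. That is the kind of decoupling of data location from data use described here, shown at toy scale.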
This approach makes it possible to rationalize data location, storage, and hybrid, multi-cloud strategy independently of how the business uses or consumes the data. As a result, economically driven decisions about spot prices, cloud vendor arbitrage, or long-term cost and operational efficiency can all be made without direct, massive second-order effects on the enterprise’s overall strategic goal of using data to make sense of the world. That’s a key insight. IT always advances by recognizing these opportunities to decouple.
Just as the cloud itself abstracted away the idea of a physical IT environment, data fabric technology offers a similar opportunity to abstract away the idea of data location itself. To prosper in a hybrid, multi-cloud world, we need to employ approaches that make data location increasingly irrelevant. Where data is stored is almost always irrelevant to what the data means and to how the business uses the data to derive insight, increase profits, and decrease costs.
About The Author: Kendall Clark is founder and CEO of Stardog, the leading Enterprise Knowledge Graph (EKG) platform provider. For more information visit www.stardog.com or follow them @StardogHQ.
Edited by Maurice Nagle