Google announced this week that it has removed the beta label from its Cloud Dataflow offering, making the service generally available to users of Google Cloud Platform. As a value-added service, Cloud Dataflow is designed to help streamline large-scale cluster management and optimization.
Google Cloud Platform enables the processing and analysis of thousands of terabytes of data, and Cloud Dataflow offers a unified programming model for both batch and streaming data processing. Google drew on a number of its existing technologies in developing the service, including MapReduce, FlumeJava and MillWheel. Cloud Dataflow works in tandem with the Cloud Pub/Sub and BigQuery services to help users process and analyze massive amounts of data efficiently, so they can focus on their applications rather than on managing data flows.
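To make the idea of a unified model concrete, here is a minimal Python sketch in which one transform serves both a bounded (batch) input and a windowed stream. It does not use the Cloud Dataflow SDK. The function names, the sample click events and the simple count-based windowing are all assumptions made for illustration, not Google's implementation.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Shared transform (hypothetical): count words in any iterable of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def fixed_windows(stream: Iterable[str], size: int) -> Iterator[List[str]]:
    """Cut a possibly unbounded stream into consecutive windows of `size` items."""
    iterator = iter(stream)
    while True:
        window = list(islice(iterator, size))
        if not window:
            return  # the stream ended; an unbounded source never reaches this
        yield window


def run_batch(lines: Iterable[str]) -> Counter:
    """Batch mode: apply the transform once to the whole bounded input."""
    return count_words(lines)


def run_streaming(stream: Iterable[str], window_size: int) -> Iterator[Counter]:
    """Streaming mode: apply the same transform to each window as it fills."""
    for window in fixed_windows(stream, window_size):
        yield count_words(window)


# Made-up click events used only to exercise both modes.
events = ["click home", "click cart", "view home"]
batch_result = run_batch(events)
stream_results = list(run_streaming(iter(events), window_size=2))
```

In this sketch the batch run yields one set of counts for the whole input, while the streaming run yields one set per window. The transform itself is identical in both cases, which is the property a unified model is meant to provide.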
“We are utilizing Cloud Dataflow to overcome elasticity challenges with our current Hadoop cluster,” said Sudhir Hasbe, director of software engineering at Zulily.com, as part of a post on the Google Cloud Platform Blog announcing general availability. “Starting with some basic ETL workflow for BigQuery ingestion, we transitioned into full blown clickstream processing and analysis. This has helped us significantly improve performance of our overall system and reduce cost.”
Cloud Dataflow offers fault tolerance and an SLA for batch and stream processing, and is designed to balance latency and correctness at a reasonable price point. Google claims the service is faster and cheaper than Hadoop and that it optimizes resource usage for performance gains.
“General availability is a key milestone, though hardly the end of the road,” wrote Eric Schmidt, product manager for Cloud Dataflow, and Rohit Khare, product manager for Cloud Pub/Sub, in the blog post. “We are continuing to innovate with the alpha release of the gcloud pubsub tool and today’s beta release of our new Identity and Access Management (IAM) APIs and Permissions Editor in the Google Developers Console. These improvements allow users to control access down to the level of particular operations on specific topics and subscriptions. IAM ACLs make it easier to connect multiple Cloud Platform projects, either within the same organization or to third-party services.”
Edited by Dominick Sorrentino