The Next Challenge for Big Data: Geo-Distributed Architectures

Written By
Arthur Cole
Oct 31, 2016

Few enterprises have the means to build their own scale-out infrastructure for Big Data collection and analytics. That means much of the workload will be ported to the cloud.

But since it is neither wise nor practical to push all that data to a single cloud facility, organizations will need to develop the skills and technology to manage Big Data operations across multiple data centers, which will most likely be spread across large geographic areas.

This isn’t as easy as it seems, however. As Morpheus Data’s Darren Perucci points out to DZone, bursting data onto the cloud is still more of a theory than a practice. It isn’t enough to simply push volumes onto third-party infrastructure; the process also has to include authentication, usage tracking, performance monitoring, and a host of other functions. At the same time, distributed cloud environments incorporate a vast array of platforms, formats, protocols and other elements, making it difficult to coordinate resource consumption and data flow effectively, even in open source environments. Emerging systems and services are smoothing out many of the rough edges, of course, but we are still a long way from an integrated cloud environment capable of functioning across multiple data centers.

One of the keys to such a scheme for Big Data workloads is to build cross-data center coordination directly into the streaming module of the cluster framework. Confluent recently took this step with the Confluent Enterprise version of Apache Kafka, providing critical tools like multi-data center (MDC) replication, automatic load balancing and cloud migration. The system allows the enterprise to establish secure cluster replication across geo-distributed infrastructure while maintaining centralized configuration management. It also takes care of the synchronization between clusters, as well as SSL encryption and SASL-based authentication against Kerberos and Active Directory.
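To make the security side of this concrete, here is a minimal sketch in Python of the client settings an encrypted, Kerberos-authenticated link between two Kafka clusters would typically carry. The property names are the standard Apache Kafka (Java client) ones; the host names, truststore path and helper functions are hypothetical, and the sketch does not represent Confluent's replication tooling itself, which has its own configuration.

```python
def secure_client_config(bootstrap_servers, truststore_path):
    """Build Kafka client properties for a TLS-encrypted, Kerberos-authenticated link.

    The keys are standard Kafka client property names. The values passed in
    (brokers, truststore path) are placeholders chosen for illustration.
    """
    return {
        "bootstrap.servers": ",".join(bootstrap_servers),
        # SASL authentication carried over an SSL/TLS-encrypted connection.
        "security.protocol": "SASL_SSL",
        # GSSAPI is the SASL mechanism Kafka uses for Kerberos.
        "sasl.mechanism": "GSSAPI",
        "sasl.kerberos.service.name": "kafka",
        "ssl.truststore.location": truststore_path,
    }


def to_properties(config):
    """Render the settings as key=value lines, as in a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


if __name__ == "__main__":
    # Hypothetical brokers in a remote data center.
    remote = secure_client_config(
        ["kafka1.eu.example.com:9093", "kafka2.eu.example.com:9093"],
        "/etc/kafka/secrets/truststore.jks",
    )
    print(to_properties(remote))
```

Keeping these settings in one generated file per remote cluster is one simple way to approximate the centralized configuration management described above.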

Since many Big Data processes must be automated by necessity, this same functionality must extend to multiple data centers, as well. Snowflake Computing recently added new resilience and fault-tolerance capabilities to its Elastic Data Warehouse platform to maintain the high-speed performance that users expect from single-entity data environments. The system now enables automatic scalability with no delays or operator intervention, as well as improved dashboarding and reporting for faster query management, plus data milestoning and continuous data access for improved replication and disaster recovery. In this way, the company says it can support advanced Hadoop-based workloads alongside normal business intelligence and reporting applications in the same warehouse.

Much of the Big Data analytics process is also expected to take place on in-memory infrastructure to provide faster transfer between storage and processing. This can become problematic in distributed environments due to the need to quickly compile data from diverse sources and deliver results to users who may be some distance away from the processing center. This is what GigaSpaces is hoping to address with the new XAP 12 platform that features an open and decoupled core to support high-performance in-memory data grids. The solution supports millisecond performance in cross-platform architectures utilizing both RAM and SSD storage. At the same time, it enables multi-center replication for both recovery operations and data localization requirements, as well as full session replication to cut latency across distributed clusters.

As mentioned above, however, none of these solutions addresses the full gamut of challenges involved in distributed Big Data environments. This will likely lead to a layered approach at most enterprises, with the most critical, time-sensitive applications hosted close to home while multiple regional and edge solutions provide results to users and other stakeholders wherever they reside.
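The layered approach can be pictured as a simple placement rule. The Python sketch below is purely illustrative: the `Workload` type, the site names and the `choose_site` function are hypothetical and do not correspond to any product mentioned here. It assumes that time-sensitive workloads stay in the core data center and that everything else runs in the regional site nearest its users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    time_sensitive: bool
    user_region: str


def choose_site(workload, regional_sites, core_site="core-dc"):
    """Pick a hosting site for a workload under the layered approach.

    Time-sensitive workloads stay in the core ("close to home") data center.
    Others go to the regional site for their users, falling back to the core
    site when no regional site covers that region.
    """
    if workload.time_sensitive:
        return core_site
    return regional_sites.get(workload.user_region, core_site)


if __name__ == "__main__":
    # Hypothetical regional and edge sites, keyed by user region.
    sites = {"eu": "eu-west-edge", "apac": "apac-regional"}
    jobs = [
        Workload("fraud-scoring", time_sensitive=True, user_region="eu"),
        Workload("weekly-report", time_sensitive=False, user_region="apac"),
        Workload("ad-hoc-query", time_sensitive=False, user_region="latam"),
    ]
    for job in jobs:
        print(job.name, "->", choose_site(job, sites))
```

A real placement policy would also weigh data residency rules, cost and replication lag, which this sketch leaves out.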

In time, as networking and automation tools become more sophisticated, this should evolve into an integrated data analytics presence that extends across the wide area, providing both the scale and resilience the enterprise needs to mine real value from all accumulated data.

Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata and Carpathia. Follow Art on Twitter @acole602.
