Though many enterprises have scaled the mountains of Big Data and are leading the way with management of data on an extreme scale, many others have yet to begin the ascent. The towering amounts of information that must be processed and managed seem overwhelming, leaving IT and data analysis staff unsure of where to start.
When Big Data also involves cloud computing, it can seem especially daunting. Various programming models and cloud deployment options are available, but how does an organization choose which one is best?
A recent book on the subject of Big Data provides background to help IT organizations make the right choices for their large-scale data management needs. Titled “Large Scale and Big Data: Processing and Management,” the book includes chapters developed by leaders in the data management community. The book provides insight on management and processing techniques and tools used with Big Data within a wide range of computing models.
Chapters in the book cover important topics such as:
- Distributed programming for the cloud
- Incremental MapReduce computations
- Overview of large-scale stream processing engines
- Virtualizing resources for the cloud
- Security in Big Data and cloud computing
In the preface, the authors explain the need for such a book within the realm of Big Data management:
“This book approaches the challenges associated with Big Data-processing techniques and tools on cloud computing environments from different but integrated perspectives; it connects the dots. The book is designed for studying various fundamental challenges of storing and processing Big Data. In addition, it discusses the applications of Big Data processing in various domains… In a nutshell, the book provides a comprehensive summary from both of the research and the applied perspectives. It will provide the reader with a better understanding of how Big Data-processing techniques and tools can be effectively utilized in different application domains.”
In our downloads area, ITBE readers can download an excerpt from this book, “Chapter 9: An Overview of the NoSQL World” by Liang Zhao, Sherif Sakr, and Anna Liu. This excerpt explores current advancements in the area of Web-scale Big Data management.
It explains how relational database management systems (RDBMS) such as MySQL, PostgreSQL and SQL Server have long been treated as “one-size-fits-all solutions,” but Big Data has created a need for greater scalability. The emergence of NoSQL (Not Only SQL) systems has helped relieve some of the scalability issues experienced with these RDBMS solutions.
The chapter goes on to explain NoSQL systems and how they have been used by “three of the key players in the Web-scale data management domain: Google, Yahoo and Amazon.” It includes a section on each of these enterprises and explains how each has used its own database system to support larger-scale data.
The chapter also describes open source NoSQL systems that are available for public use. Other segments in the chapter deal with database-as-a-service offerings and multitenancy within the data management scheme.