
How to Future-Proof Your Data Lake: Six Critical Considerations

Consider Ingest and Download Speeds

The ability to stream data into and out of the system is critical, regardless of the kind of analysis you ultimately want to perform. Today, teams experimenting with Hadoop typically build a separate, not especially reliable HDFS-based repository for the data under analysis; over time, that model is not sustainable. The data lakes of the future must be able to present themselves directly as sources for MapReduce-style analysis, and they need to deliver data at very high speed over parallel streams to the compute engine(s).
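To make the parallel-streaming point concrete, here is a minimal sketch of parallel ingest into an S3-compatible landing zone, using Python with boto3 and a thread pool. The bucket name, key prefix, and file paths are hypothetical placeholders rather than anything specified in the article, and the same pattern applies in reverse for parallel download.

# Parallel ingest sketch for an S3-compatible data lake landing zone.
# Assumptions (not from the article): boto3 is installed, credentials are
# configured, and the bucket/prefix/file paths below are placeholders.
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

import boto3

BUCKET = "data-lake-raw"        # hypothetical landing bucket
PREFIX = "ingest/batch-001/"    # hypothetical key prefix for this batch

s3 = boto3.client("s3")         # pass endpoint_url=... for a non-AWS store


def upload_one(path):
    """Upload a single file; the object key mirrors the local file name."""
    key = PREFIX + os.path.basename(path)
    s3.upload_file(path, BUCKET, key)
    return key


def parallel_ingest(paths, workers=8):
    """Push many files into the lake over parallel streams."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(upload_one, p): p for p in paths}
        for done in as_completed(futures):
            print("uploaded", futures[done], "->", done.result())


if __name__ == "__main__":
    parallel_ingest(["/data/events-00.json", "/data/events-01.json"])

The worker count bounds how many concurrent streams hit the store; swapping upload_file for download_file in the same executor pattern covers the download side.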

We have reached an inflection point in the rate of data creation: unless you are willing to start throwing huge quantities of data away, you simply cannot afford to keep using the same technologies and tools to store and analyze it. Existing data silos, impractical for many reasons beyond pure expense, must be consolidated, even if it is not yet clear exactly how the value of each piece of data will be maximized.

One option many businesses have chosen to pursue, in the hope of addressing current business concerns while maximizing future possibilities and minimizing future risks, is building a data lake. That choice, however, brings its own set of challenges and considerations.

Large data volumes drive the need for data lakes. In simple terms, a data lake is a repository for large quantities and varieties of data, both structured and unstructured. The data is placed in a single store, where it is retained for analysis throughout the organization. In this slideshow, Storiant, a cloud storage provider, identifies six tips on how a data lake can reconcile large volumes of data with the need to access it.