
What is a Data Lake?

This is the first article in my series on Building a Data Lake on AWS. In this article we will look at what a Data Lake is.

You can find the Wikipedia definition in the window below.

For me, a Data Lake is more an architectural decision than a technology. A Data Lake is a central repository where we can store data in a variety of formats. The repository itself is built on a scalable architecture that lets us load and access data using a wide range of tools.

In summary, any data lake platform should have the following properties.

  • All data in a single place.

  • Handles structured/semi-structured/unstructured/raw data.

  • Supports fast ingestion and consumption.

  • Supports both schema on read and schema on write.

  • Designed for low cost and multi-tiered storage.

  • Decouples storage and compute.

  • Supports protection and security rules.
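
The difference between schema on write and schema on read in the list above can be easier to see in code. The following is only a minimal sketch in plain Python, not tied to any AWS service: the sample events, the `SCHEMA` mapping, and the helper names (`apply_schema`, `write_with_schema`, `read_with_schema`) are all hypothetical and chosen for illustration.

```python
import json

# Raw events as they might land in the lake (made-up sample data).
RAW_EVENTS = [
    '{"user": "a", "amount": "12.50", "country": "IN"}',
    '{"user": "b", "amount": "7"}',
    '{"user": "c", "amount": "oops", "extra": true}',
]

# Hypothetical schema: field name -> (converter, default value).
SCHEMA = {
    "user": (str, None),
    "amount": (float, 0.0),
    "country": (str, "unknown"),
}


def apply_schema(record, schema):
    """Coerce one parsed record to the schema; raises ValueError on bad values."""
    row = {}
    for field, (convert, default) in schema.items():
        value = record.get(field, default)
        row[field] = convert(value) if value is not None else None
    return row


def write_with_schema(lines, schema):
    """Schema on write: validate before storing and reject what does not fit."""
    stored, rejected = [], []
    for line in lines:
        try:
            stored.append(apply_schema(json.loads(line), schema))
        except ValueError:
            rejected.append(line)
    return stored, rejected


def read_with_schema(lines, schema):
    """Schema on read: data stays raw; each reader applies its own schema."""
    for line in lines:
        try:
            yield apply_schema(json.loads(line), schema)
        except ValueError:
            continue  # this reader cannot interpret the record, so skip it


if __name__ == "__main__":
    stored, rejected = write_with_schema(RAW_EVENTS, SCHEMA)
    print(len(stored), "stored,", len(rejected), "rejected at write time")

    # A different reader only cares about the user field, so it can still
    # use the record that the stricter schema had to reject.
    users_only = {"user": (str, None)}
    print(list(read_with_schema(RAW_EVENTS, users_only)))
```

With schema on write, the malformed record is turned away before it is stored. With schema on read, all raw records are kept, and a later consumer with a looser schema can still make use of them, which is one reason a data lake usually keeps the raw data alongside any curated copies.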
