This is the first article in my series on Building a Data Lake on AWS. In this article, we will look at what a Data Lake is.
For me, a Data Lake is an architectural decision rather than a technology. A Data Lake is a central repository where we can store data in a variety of formats. The repository itself is built on a scalable architecture that allows data to be loaded and accessed using a wide range of tools.
In summary, any data lake platform should have the following properties:
All data in a single place.
Handles structured/semi-structured/unstructured/raw data.
Supports fast ingestion and consumption.
Supports both schema on read and schema on write.
Designed for low-cost, multi-tiered storage.
Decouples storage and compute.
Supports protection and security rules.
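To make the schema on read versus schema on write distinction more concrete, here is a minimal Python sketch. It is not tied to any AWS service: the in-memory `raw_zone` list simply stands in for raw files sitting in storage, and the names `raw_zone`, `read_with_schema`, `write_with_schema` and `sales_schema` are hypothetical, chosen only for illustration. With schema on read, records are stored exactly as they arrive and a schema is applied only when they are queried. With schema on write, records are validated before they are stored and non-conforming ones are rejected.

```python
import json

# Hypothetical "raw zone": records are kept exactly as they arrived,
# with no schema enforced at load time.
raw_zone = [
    '{"id": 1, "name": "alice", "amount": "12.5"}',
    '{"id": 2, "name": "bob"}',
    '{"id": 3, "amount": 7, "country": "DE"}',
]


def read_with_schema(raw_records, schema):
    """Schema on read: apply a schema only when the data is queried.

    Missing fields become None, present values are cast to the requested
    type, and fields not named in the schema are ignored.
    """
    rows = []
    for line in raw_records:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


def write_with_schema(record, schema, table):
    """Schema on write: validate before storing and reject non-conforming records."""
    if set(record) != set(schema):
        raise ValueError(
            f"fields {sorted(record)} do not match schema {sorted(schema)}"
        )
    table.append({field: cast(record[field]) for field, cast in schema.items()})


sales_schema = {"id": int, "amount": float}

# The same raw data can be read through different schemas.
print(read_with_schema(raw_zone, sales_schema))
# [{'id': 1, 'amount': 12.5}, {'id': 2, 'amount': None}, {'id': 3, 'amount': 7.0}]
print(read_with_schema(raw_zone, {"id": int, "country": str}))
# [{'id': 1, 'country': None}, {'id': 2, 'country': None}, {'id': 3, 'country': 'DE'}]

curated = []
write_with_schema({"id": 4, "amount": "3"}, sales_schema, curated)
print(curated)
# [{'id': 4, 'amount': 3.0}]
```

The trade-off shows up in where the work happens: schema on read keeps ingestion fast and lets each consumer choose its own view of the raw data, while schema on write pays the validation cost up front so that downstream consumers can rely on clean, consistent records.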