Azure Data Lake vs Azure Blob Storage


Azure Data Lake is a storage repository, but it is more than just that. Microsoft describes it as "an enterprise-wide hyper-scale repository for big data analytic workloads". The key capabilities below support that statement, compared against Azure Blob Storage.

First, Azure Data Lake Store is an Apache Hadoop file system compatible with HDFS, whereas Azure Blob Storage is not. You call Data Lake Store through a WebHDFS-compatible REST API, and Blob Storage through its own Azure Blob Storage REST API.
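To make the difference between the two REST APIs concrete, here is a minimal sketch that only builds request URLs and sends nothing. The account, container, and path names are made up for illustration, and the helper function names are my own. The host patterns and query parameters reflect my understanding of the two services, so check them against the official REST references before relying on them.

```python
from urllib.parse import urlencode, quote


def adls_webhdfs_url(account, path, op, **params):
    """Build a WebHDFS-style URL for a Data Lake Store account (hypothetical helper)."""
    query = urlencode({"op": op, **params})
    # quote() keeps "/" so nested folder paths stay readable
    clean_path = quote(path.lstrip("/"))
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1/{clean_path}?{query}"


def blob_list_url(account, container):
    """Build a Blob Storage 'list blobs in a container' URL (hypothetical helper)."""
    query = urlencode({"restype": "container", "comp": "list"})
    return f"https://{account}.blob.core.windows.net/{quote(container)}?{query}"


# "mylake", "myaccount" and "logs" are placeholder names, not real resources.
print(adls_webhdfs_url("mylake", "/logs/2017", "LISTSTATUS"))
print(blob_list_url("myaccount", "logs"))
```

The Data Lake Store URL uses the same op=... convention as any WebHDFS endpoint, which is why Hadoop tooling can talk to it, while the Blob Storage URL uses service-specific restype and comp parameters.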

Because Azure Data Lake Store is compatible with HDFS, data stored in it can be easily analyzed using Hadoop analytic frameworks. It also provides much higher throughput than Azure Blob Storage for querying and analyzing large amounts of data.

Second, Azure Blob Storage has limits on file size, storage accounts, and more. Azure Storage provides three types of blobs: block blobs, page blobs, and append blobs. A single block blob can contain up to 50,000 blocks of up to 100 MB each, for a total of around 4.75 TB. Append blobs have much smaller blocks, 4 MB each, for a total of about 195 GB. Page blobs can be up to 1 TB. Other limits are listed in the Azure subscription and service limits page under References. Azure Data Lake Store does not have these limits.
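The blob size totals above follow from simple multiplication, sketched below. It assumes binary units (1 MB = 1024 * 1024 bytes) and that append blobs share the 50,000-block cap, which is what the 195 GB figure implies. The function name is my own.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4


def max_blob_bytes(max_blocks, block_size_mib):
    """Largest blob size given a block count cap and a per-block size in MiB."""
    return max_blocks * block_size_mib * MIB


block_blob = max_blob_bytes(50_000, 100)  # block blob: 50,000 blocks of 100 MiB
append_blob = max_blob_bytes(50_000, 4)   # append blob: assumed 50,000 blocks of 4 MiB

print(f"block blob:  {block_blob / TIB:.2f} TiB")   # 4.77 TiB
print(f"append blob: {append_blob / GIB:.1f} GiB")  # 195.3 GiB
```

The block blob result is about 4.77 TiB, so the "around 4.75 TB" figure is a slight rounding of the same product.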

Third, security. Authentication and authorization for data operations in Azure Data Lake Store are based on Azure Active Directory (AAD) identities, and access control can be set at the file and folder level. In Azure Blob Storage, they are based on shared secrets: account access keys and shared access signature (SAS) keys. This is less secure than AAD.
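As a rough illustration of how the two models look on the wire, the sketch below attaches an Azure Active Directory token as an OAuth bearer header, and attaches a Shared Access Signature by appending it to the resource URL as a query string. The token and signature values are placeholders, the helper names are my own, and obtaining a real token or SAS is out of scope here.

```python
def bearer_headers(access_token):
    """Headers for a request authorized with an AAD OAuth 2.0 access token."""
    return {"Authorization": f"Bearer {access_token}"}


def with_sas(url, sas_token):
    """Append a SAS query string to a blob URL.

    Anyone who holds the resulting URL can use it until the SAS expires.
    """
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{sas_token.lstrip('?')}"


# Placeholder values only; these are not real credentials or resources.
print(bearer_headers("<aad-access-token>"))
print(with_sas("https://myaccount.blob.core.windows.net/logs/app.txt",
               "?sv=<version>&sig=<signature>"))
```

The bearer token is tied to an identity that can be granted or denied access per file and folder, whereas the SAS is a secret embedded in the URL itself.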

References:

Comparing Azure Data Lake Store and Azure Blob Storage
Overview of Azure Data Lake Store
WebHDFS FileSystem APIs
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-integrate-with-other-services
Azure subscription and service limits, quotas, and constraints



