Overview
Azure Blob Storage is Azure's managed object storage service for unstructured data such as images, documents, media, logs, backups, exports, and analytical files.
Its primary resource hierarchy is:
Storage account
-> Blob service
-> Container
-> Blob
A storage account defines the namespace, region, redundancy, performance class, networking, identity settings, encryption, and data-protection features. Containers group blobs and form authorization and policy boundaries. A blob is an individual object composed of content, system properties, and optional metadata or index tags.
In a standard flat namespace, folders are not independent resources. A name such as invoices/2026/06/123.pdf is one blob name containing slash characters. Tools can present those prefixes as virtual folders. With Azure Data Lake Storage Gen2 hierarchical namespace enabled, directories become real resources with directory-aware operations and access control semantics.
Access tiers optimize block-blob cost according to expected access frequency:
- Hot for frequently accessed data.
- Cool for infrequently accessed online data.
- Cold for rarely accessed online data.
- Archive for offline data that can tolerate rehydration.
- Smart for supported accounts where online access patterns are uncertain.
For interviews, candidates should be able to explain the resource hierarchy, distinguish virtual folders from hierarchical directories, design blob names, choose between metadata and blob index tags, select access tiers using total cost, and describe how organization choices affect authorization, lifecycle policies, listing performance, and operations.
Core Concepts
Storage Accounts
A storage account is the top-level Azure resource and globally unique namespace for Azure Storage data.
It controls:
- Azure region.
- Redundancy such as LRS, ZRS, GRS, RA-GRS, GZRS, or RA-GZRS.
- Standard or premium performance.
- Public and private networking.
- Microsoft Entra ID and Shared Key configuration.
- Encryption and customer-managed keys.
- Soft delete, versioning, and point-in-time restore.
- Default blob access tier.
- Diagnostics and governance.
A default Blob endpoint resembles:
https://examplestorage.blob.core.windows.net
The storage account is a substantial security, performance, billing, and lifecycle boundary. Do not place unrelated workloads in one account merely for convenience when they require different:
- Network exposure.
- Encryption keys.
- Identity boundaries.
- Redundancy.
- Lifecycle policies.
- Performance characteristics.
- Regulatory controls.
Conversely, creating a separate account for every small container adds operational overhead. Use accounts to separate meaningful policy and scale boundaries.
Storage Account Types
Common account choices include:
- Standard general-purpose v2: Recommended for most Blob, Queue, Table, and Azure Files scenarios.
- Premium block blob: Optimized for block and append blobs requiring high transaction rates or consistently low latency.
- Premium page blob: Designed for page blob workloads.
Legacy account types should generally not be selected for new solutions unless a specific compatibility requirement exists.
Standard access tiers and lifecycle tiering are primarily features of general-purpose v2 block blob storage. Premium accounts use a different price and performance model.
Redundancy Is Separate from Access Tier
Redundancy determines where copies of data are maintained. Access tier determines the capacity-versus-access cost model.
Examples:
- Hot data can use LRS or ZRS.
- Cool data can use geo-redundancy.
- Archive currently supports a narrower set of redundancy options than online tiers.
Ask two separate questions:
- How quickly and frequently will the data be accessed?
- Which infrastructure failures must the data survive?
Do not assume that archive means backup or that geo-replication protects against accidental deletion. Replication can reproduce deletion or corruption. Use versioning, soft delete, immutability, or independent copies according to the threat model.
Containers
A container groups blobs within a storage account. A container can be used as a boundary for:
- Azure RBAC scope.
- Anonymous access configuration.
- Stored access policies.
- Immutability policies.
- Lifecycle name prefixes.
- Operational ownership.
- Tenant or data-class separation.
Container names:
- Must be 3 through 63 characters.
- Must use lowercase letters, numbers, and hyphens.
- Must begin and end with a letter or number.
- Cannot contain consecutive hyphens.
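The naming rules above are easy to check before calling the service. The following Python sketch encodes all of them in one regular expression. `is_valid_container_name` is a hypothetical helper, and the service remains the authority on which names it accepts.

```python
import re

# First character is a letter or digit; each following unit is either a letter/digit
# or a hyphen that must be followed by a letter/digit. This enforces 3-63 characters,
# lowercase only, no leading/trailing hyphen, and no consecutive hyphens.
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")

def is_valid_container_name(name: str) -> bool:
    """Return True if `name` satisfies the container naming rules listed above."""
    return _CONTAINER_NAME.fullmatch(name) is not None
```

For example, `tenant-42-docs` passes, while `Invoices`, `my--container`, and `invoices-` are rejected.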
An example container URI is:
https://examplestorage.blob.core.windows.net/invoices
Containers are not nested. A container cannot contain another container. Apparent nesting happens within blob names.
Container Design
Create containers around policy and operational boundaries rather than visual folder preferences.
Good reasons for separate containers include:
- Public marketing assets versus private customer documents.
- Temporary upload quarantine versus approved content.
- Different retention or immutability policies.
- Different RBAC assignments.
- Different lifecycle schedules.
- Different owning teams.
Avoid one container per end user when the application expects millions of users unless the design has measured the operational and policy implications. A tenant prefix or storage-account partition can be more manageable.
Blobs
A blob is an individual object stored in a container. It has:
- Binary content.
- A case-sensitive name.
- System properties such as content type, content length, ETag, and last-modified time.
- Optional user-defined metadata.
- Optional blob index tags.
- Optional snapshots or versions.
- An access tier when supported.
The full identity of a blob includes its account, container, and blob name:
https://examplestorage.blob.core.windows.net/invoices/2026/06/123.pdf
Blob names can be up to 1,024 characters and may contain any combination of characters, but URL-reserved characters must be percent-encoded in request URLs. Avoid names that end in a dot or a slash, because clients and file-system tools can handle them inconsistently.
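A small Python sketch of building a blob URL with percent-encoding, reusing the `examplestorage` account from the earlier examples. The helper names are hypothetical, and the Azure SDKs normally perform this encoding themselves; the point is to show which characters change.

```python
from urllib.parse import quote

def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build a blob URL, percent-encoding reserved characters in the blob name."""
    # Keep "/" literal so virtual-folder prefixes stay readable; spaces, "#",
    # "?", "%", and non-ASCII characters are percent-encoded (UTF-8).
    encoded = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded}"

def has_risky_suffix(blob_name: str) -> bool:
    """Flag names ending in a dot or slash, which tools handle inconsistently."""
    return blob_name.endswith((".", "/"))
```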
Blob Names Are Case-Sensitive
These are different objects:
reports/June.csv
reports/june.csv
Case-sensitive names can surprise users and systems originating from case-insensitive file systems. Establish a naming convention, commonly lowercase for generated identifiers, and enforce it in one place.
Do not use a user-supplied filename as the only blob identity. Safer naming uses stable generated identifiers:
documents/{tenant-id}/{document-id}/{version-id}
Store the original filename as validated business metadata if users need it.
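A minimal sketch of the naming approach above, assuming the `documents/{tenant-id}/{document-id}/{version-id}` layout and random UUIDs as identifiers. The function name, the filename check, and the choice to percent-encode the original filename (so the metadata value stays ASCII) are illustrative assumptions.

```python
import uuid
from urllib.parse import quote

def new_document_blob(tenant_id: str, original_filename: str) -> tuple[str, dict[str, str]]:
    """Return a generated blob name plus metadata that records the user's filename."""
    if not original_filename or "/" in original_filename or "\\" in original_filename:
        raise ValueError("expected a bare filename")
    # Hex UUIDs are lowercase, and the tenant is lowercased, so names are case-consistent.
    document_id = uuid.uuid4().hex
    version_id = uuid.uuid4().hex
    blob_name = f"documents/{tenant_id.lower()}/{document_id}/{version_id}"
    # The user's filename is descriptive metadata, never the object's identity.
    metadata = {"original_filename": quote(original_filename, safe="")}
    return blob_name, metadata
```

Two uploads of `invoice.pdf` by the same tenant therefore produce two distinct blobs instead of one silently overwriting the other.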
Flat Namespace and Virtual Folders
By default, Blob Storage uses a flat namespace. A slash is simply part of the blob name:
photos/2026/06/banner.png
There is no separate photos, 2026, or 06 directory resource. A client lists blobs using a prefix and delimiter to display a hierarchy.
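To make the prefix-and-delimiter behavior concrete, this Python sketch emulates a delimiter listing over an in-memory list of names. It is not the service implementation, and `list_hierarchy` is a hypothetical name; in the Python SDK a comparable view is available through `ContainerClient.walk_blobs`, which yields blobs and `BlobPrefix` entries.

```python
def list_hierarchy(blob_names, prefix="", delimiter="/"):
    """Emulate one level of a delimiter listing over flat blob names.

    Returns (blobs, prefixes): names directly under `prefix`, and the distinct
    virtual-folder prefixes one level below it. `delimiter` must be non-empty.
    """
    blobs, prefixes, seen = [], [], set()
    for name in sorted(blob_names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            blobs.append(name)  # no further delimiter: a blob at this level
        else:
            # Everything up to and including the next delimiter is a virtual folder.
            folder = prefix + rest[: cut + len(delimiter)]
            if folder not in seen:
                seen.add(folder)
                prefixes.append(folder)
    return blobs, prefixes
```

Each returned prefix is only a string derived from existing blob names; nothing is stored for it, which is why an empty virtual folder does not normally exist.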
Consequences include:
- Creating a virtual folder does not require an API call.
- Renaming a virtual folder means copying or renaming all matching blobs through supported tooling.
- Deleting a prefix means enumerating and deleting matching blobs.
- A virtual folder cannot independently own metadata or RBAC.
- An empty virtual folder does not normally exist.
This model scales well for object storage but differs from a traditional file system.
Hierarchical Namespace
Azure Data Lake Storage Gen2 adds a hierarchical namespace. Directories become first-class resources with:
- Atomic directory rename and delete operations.
- POSIX-like access control lists.
- Directory-level organization.
- Better analytics filesystem semantics.
Enabling hierarchical namespace affects feature compatibility and cannot be treated as a cosmetic folder option. Choose it when analytics, filesystem operations, or directory ACLs require it.
For ordinary application documents accessed by object ID, flat Blob Storage is often simpler.
Prefix Design
Prefixes organize listing and lifecycle operations:
tenant-42/invoices/2026/06/123.pdf
tenant-42/exports/2026/06/15/report.csv
A useful prefix can encode stable partitioning dimensions:
- Tenant.
- Data class.
- Date.
- Processing state, if state changes do not require moving large objects.
Avoid embedding mutable business values, such as display names or status, when changing them would require copying objects. Put mutable classification in indexed metadata or a database.
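The following sketch builds names only from stable dimensions (tenant, data class, date, identifier) and derives a month-level prefix usable for listing or lifecycle filters. The helper names are hypothetical; mutable values such as status are deliberately absent and would live in index tags or a database.

```python
from datetime import date

def partitioned_blob_name(tenant: str, data_class: str, business_date: date, object_id: str) -> str:
    """Build a blob name from stable partitioning dimensions only."""
    # %Y/%m yields zero-padded year/month, so names sort chronologically.
    return f"{tenant}/{data_class}/{business_date:%Y/%m}/{object_id}"

def month_prefix(tenant: str, data_class: str, year: int, month: int) -> str:
    """Prefix selecting one tenant's data class for one month."""
    return f"{tenant}/{data_class}/{year:04d}/{month:02d}/"
```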
Modern Blob Storage automatically partitions data, so old advice about randomizing every prefix is often unnecessary. Still test high-scale workloads and avoid concentrating all operations on a single hot object.
System Properties
System properties are HTTP headers or storage-managed values such as:
- Content-Type.
- Content-Length.
- Content-Disposition.
- Content-Encoding.
- Cache-Control.
- ETag.
- Last-Modified.
Set the writable properties correctly when uploading; Content-Length, ETag, and Last-Modified are managed by the service. For example:
- Content-Type: application/pdf helps clients interpret a document.
- Content-Disposition: attachment; filename="invoice.pdf" encourages download.
- Cache-Control affects browser and CDN behavior.
Do not trust a user-provided content type. Validate the content and set server-controlled properties.
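One way to follow that advice is to inspect the uploaded bytes and derive the headers on the server. This Python sketch recognizes only PDF and PNG signatures, an assumption for illustration. Its dictionary keys are chosen to match keyword arguments of the Python SDK's `ContentSettings`, and the `Cache-Control` value is an example policy for private documents, not a general recommendation.

```python
# File signatures this hypothetical service accepts, mapped to server-chosen types.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]

def server_content_settings(content: bytes, download_name: str) -> dict[str, str]:
    """Derive content headers from the bytes, ignoring any client-declared type."""
    for magic, content_type in _SIGNATURES:
        if content.startswith(magic):
            break
    else:
        raise ValueError("unsupported or unrecognized content")
    # Keep the suggested filename ASCII and free of quotes or header-breaking characters.
    safe_name = "".join(
        c for c in download_name if (c.isascii() and c.isalnum()) or c in "._-"
    ) or "download"
    return {
        "content_type": content_type,
        "content_disposition": f'attachment; filename="{safe_name}"',
        "cache_control": "private, no-store",
    }
```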
ETags and Conditional Operations
Each committed blob version has an ETag. Use ETags for optimistic concurrency:
If-Match: "expected-etag"
The update succeeds only if the current blob still has that ETag. This prevents one writer from silently overwriting another writer's changes.
Useful conditions include:
- If-Match for update-if-unchanged.
- If-None-Match: * for create-only behavior.
- Lease IDs for exclusive write coordination.
- Blob index tag conditions for state-aware operations.
Avoid unconditional overwrite when concurrent writers are possible.
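The conditional semantics can be illustrated without a storage account. This Python sketch models only ETags on an in-memory store; `InMemoryBlobStore` and `PreconditionFailed` are hypothetical names. With the Python SDK, a comparable guarded update passes `etag=` together with `match_condition=MatchConditions.IfNotModified` from `azure.core`.

```python
import uuid
from typing import Optional

class PreconditionFailed(Exception):
    """Raised when a write's If-Match or If-None-Match condition does not hold."""

class InMemoryBlobStore:
    """Toy stand-in for a container that models only ETag preconditions."""

    def __init__(self) -> None:
        self._blobs: dict[str, tuple[str, bytes]] = {}  # name -> (etag, content)

    def get(self, name: str) -> tuple[str, bytes]:
        return self._blobs[name]

    def put(self, name: str, content: bytes,
            if_match: Optional[str] = None, if_none_match: Optional[str] = None) -> str:
        current = self._blobs.get(name)
        # If-None-Match: * means "create only": fail if the blob already exists.
        if if_none_match == "*" and current is not None:
            raise PreconditionFailed(f"{name} already exists")
        # If-Match means "update only if unchanged since I read it".
        if if_match is not None and (current is None or current[0] != if_match):
            raise PreconditionFailed(f"{name} changed since it was read")
        etag = f'"{uuid.uuid4().hex}"'  # every successful write gets a new ETag
        self._blobs[name] = (etag, content)
        return etag
```

If two writers both read the same ETag, the first conditional write succeeds and replaces the ETag, so the second writer's If-Match fails instead of silently overwriting.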
Metadata
Metadata consists of user-defined name-value pairs stored with a container or blob.
Metadata keys:
- Must begin with a letter or underscore.
- Can then contain letters, numbers, and underscores.
- Must be valid ASCII.
- Are case-insensitive when read or set, although original casing is preserved.
Metadata values must also be valid ASCII.
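A client-side check of these rules can catch mistakes before a request is sent. The following is a minimal Python sketch; `validate_metadata` is a hypothetical helper, and it also rejects keys that differ only by case, since the service treats them as the same key.

```python
import re

# Starts with an ASCII letter or underscore, then ASCII letters, digits, or underscores.
_METADATA_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def validate_metadata(metadata: dict[str, str]) -> dict[str, str]:
    """Raise ValueError if any key or value breaks the rules above; else return the dict."""
    seen = set()
    for key, value in metadata.items():
        if not _METADATA_KEY.fullmatch(key):
            raise ValueError(f"invalid metadata key: {key!r}")
        if not value.isascii():
            raise ValueError(f"metadata value for {key!r} is not ASCII")
        folded = key.lower()  # keys are case-insensitive in the service
        if folded in seen:
            raise ValueError(f"duplicate metadata key ignoring case: {key!r}")
        seen.add(folded)
    return metadata
```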
Use metadata for small descriptive values retrieved with the object, such as:
original_filename = invoice-june.pdf
source_system = billing
schema_version = 3
Metadata is not a general searchable index. Finding blobs by arbitrary metadata usually requires listing and inspecting objects or maintaining an external index.
Metadata Update Behavior
Many APIs, including the Set Blob Metadata operation, replace the entire metadata collection rather than patching a single field. A safe update reads the existing metadata, modifies the intended value, and writes the complete set with an ETag condition.
Concurrent metadata updates without ETags can lose fields.
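The read-modify-write pattern can be sketched with a toy store whose `set_metadata` replaces the whole collection and honors an If-Match ETag. `FakeMetadataStore`, `ConditionNotMet`, and `set_metadata_field` are hypothetical names, and the limit of five attempts is an arbitrary choice.

```python
import uuid

class ConditionNotMet(Exception):
    """Raised when the supplied ETag no longer matches the blob."""

class FakeMetadataStore:
    """Toy single-blob store: set_metadata replaces the whole collection."""

    def __init__(self, metadata: dict[str, str]) -> None:
        self._etag = uuid.uuid4().hex
        self._metadata = dict(metadata)

    def get_metadata(self) -> tuple[str, dict[str, str]]:
        return self._etag, dict(self._metadata)

    def set_metadata(self, metadata: dict[str, str], if_match: str) -> None:
        if if_match != self._etag:
            raise ConditionNotMet()
        self._metadata = dict(metadata)  # full replacement, not a patch
        self._etag = uuid.uuid4().hex

def set_metadata_field(store, key: str, value: str, max_attempts: int = 5) -> None:
    """Update one field without dropping fields written concurrently by others."""
    for _ in range(max_attempts):
        etag, metadata = store.get_metadata()
        metadata[key] = value  # merge into the complete collection
        try:
            store.set_metadata(metadata, if_match=etag)
            return
        except ConditionNotMet:
            continue  # another writer won: re-read and merge again
    raise RuntimeError(f"gave up updating {key!r} after {max_attempts} attempts")
```

Without the ETag condition, the last writer's full replacement would erase any field another writer added between the read and the write.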
Metadata is part of the blob's descriptive state. Avoid:
- Secrets.
- Access tokens.
- Sensitive personal data.
- Large JSON documents.
- High-cardinality query fields that need indexed search.
Blob Index Tags
Blob index tags are indexed key-value strings associated with a blob. They support:
- Finding blobs across containers.
- Conditional blob operations.
- Lifecycle policy filters.
- Dynamic classification without changing the blob name.
Example:
Tenant = tenant-42
Status = Approved
RetentionClass = FinanceSevenYears
A blob can currently have up to 10 index tags. Keys and values are case-sensitive strings. Queries compare values lexicographically, so numeric and date values should use sortable formats:
Priority = 00042
BusinessDate = 2026-06-15
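String comparison is why the padding matters: `'42' > '100'` is true lexicographically. This Python sketch formats sortable values and assembles a filter expression modeled on the tag query syntax, in which keys are double-quoted and values single-quoted. The helpers are hypothetical, the sketch does not escape quotes, and the exact grammar should be checked against the Find Blobs by Tags documentation.

```python
from datetime import date

def sortable_int(value: int, width: int = 5) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0 or len(str(value)) > width:
        raise ValueError(f"{value} does not fit in {width} sortable digits")
    return str(value).zfill(width)

def sortable_date(d: date) -> str:
    """ISO 8601 dates (YYYY-MM-DD) already sort chronologically as strings."""
    return d.isoformat()

def tag_filter(clauses: list[tuple[str, str, str]]) -> str:
    """Join (key, operator, value) triples into one AND-ed tag query string."""
    parts = []
    for key, op, value in clauses:
        if op not in {"=", ">", ">=", "<", "<="}:
            raise ValueError(f"unsupported operator {op!r}")
        if '"' in key or "'" in value:
            raise ValueError("this sketch does not escape quotes")
        parts.append(f"\"{key}\" {op} '{value}'")
    return " AND ".join(parts)
```

The resulting string could be passed to a tag search such as the Python SDK's `find_blobs_by_tags`, keeping in mind that recently changed tags may not yet be reflected.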
Index updates can take time to appear in search results. Do not use tag search as an immediate strongly consistent transaction lookup.
Metadata Versus Blob Index Tags
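Metadata is returned with the blob's properties but is not indexed, so it suits small descriptive values read alongside the object. Blob index tags are indexed, searchable across containers, and usable in lifecycle filters and conditional operations, but a blob can have at most 10 tags and index updates are not immediately visible to queries. Neither replaces a relational database for rich business queries, joins, transactions, and authorization rules.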
Business Metadata in a Database
Store rich business metadata separately when the application needs:
- Full-text or multi-field search.
- Joins.
- Tenant and ownership constraints.
- Workflow state transitions.
- Transactional updates.
- Audit history.
A common model is:
Azure SQL Database:
- DocumentId
- TenantId
- OwnerId
- BlobName
- OriginalFilename
- Size
- Checksum
- Status
- RetentionClass
Blob Storage:
- Binary content
- Technical metadata
- Index tags for storage lifecycle
The database record authorizes and locates the object. The blob name alone is not an authorization mechanism.
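The following Python sketch uses an in-memory SQLite database as a stand-in for Azure SQL Database, with a subset of the columns above. The table name, the rule that only the owner can resolve an approved document, and `resolve_blob_name` are hypothetical; the point is that the blob name is returned only after the database check succeeds.

```python
import sqlite3
from typing import Optional

def open_catalog() -> sqlite3.Connection:
    """Create an in-memory stand-in for the document catalog table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE documents (
               document_id       TEXT PRIMARY KEY,
               tenant_id         TEXT NOT NULL,
               owner_id          TEXT NOT NULL,
               blob_name         TEXT NOT NULL UNIQUE,
               original_filename TEXT NOT NULL,
               status            TEXT NOT NULL
           )"""
    )
    return db

def resolve_blob_name(db: sqlite3.Connection, document_id: str,
                      tenant_id: str, user_id: str) -> Optional[str]:
    """Return the blob name only if tenant, ownership, and status checks pass."""
    row = db.execute(
        "SELECT blob_name FROM documents "
        "WHERE document_id = ? AND tenant_id = ? AND owner_id = ? AND status = 'Approved'",
        (document_id, tenant_id, user_id),
    ).fetchone()
    return row[0] if row else None
```

Fetching the content still requires a data-plane credential, such as a managed identity on the server or a short-lived user delegation SAS handed to the client.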
Access Tiers
Access tiers apply to block blobs in supported standard accounts.
The main standard tiers are:
- Hot: Frequently accessed or modified.
- Cool: Infrequently accessed but immediately available.
- Cold: Rarely accessed but immediately available.
- Archive: Offline and requires rehydration before content access.
Current minimum recommended retention periods for general-purpose v2 accounts are:
- Cool: 30 days.
- Cold: 90 days.
- Archive: 180 days.
Deleting, overwriting, or moving data before the minimum period can incur prorated early-deletion charges.
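The prorated charge is simple arithmetic: a cool blob removed after 21 days still owes 30 - 21 = 9 days. This sketch computes that. The price is a placeholder, not an Azure rate, and deriving a daily rate from a 30-day month is an assumption about how proration is applied.

```python
MIN_RETENTION_DAYS = {"hot": 0, "cool": 30, "cold": 90, "archive": 180}

def early_deletion_days(tier: str, days_in_tier: int) -> int:
    """Days still owed when a blob leaves `tier` after `days_in_tier` days."""
    return max(0, MIN_RETENTION_DAYS[tier] - days_in_tier)

def early_deletion_charge(tier: str, days_in_tier: int, size_gb: float,
                          price_per_gb_month: float, days_per_month: int = 30) -> float:
    """Approximate prorated charge; the per-day derivation is an assumption."""
    owed = early_deletion_days(tier, days_in_tier)
    return size_gb * price_per_gb_month * owed / days_per_month
```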
Hot Tier
Hot has the highest capacity cost but the lowest access and transaction cost among standard online tiers.
Use it for:
- Active application content.
- Frequent reads and writes.
- New data before usage is known.
- Processing inputs.
There is no minimum recommended retention period.
Cool and Cold Tiers
Cool and cold remain online with millisecond retrieval. They trade lower capacity cost for:
- Higher transaction cost.
- Data retrieval charges.
- Slightly different availability targets.
- Minimum retention economics.
Use cool for infrequently accessed data expected to remain at least 30 days. Use cold for rarely accessed data expected to remain at least 90 days but that still needs immediate access.
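Choosing a tier by total cost means adding capacity, transaction, and retrieval charges for an expected access pattern while respecting the minimum retention periods. The prices below are invented placeholders chosen only to make the trade-off visible; they are not Azure prices, and the model omits write operations, early deletion, and egress.

```python
# Placeholder prices in arbitrary units. NOT Azure rates: substitute figures from
# the pricing page for your region and redundancy before drawing conclusions.
PRICES = {
    "hot":  {"gb_month": 0.020, "read_per_10k": 0.004, "retrieval_per_gb": 0.00},
    "cool": {"gb_month": 0.010, "read_per_10k": 0.010, "retrieval_per_gb": 0.01},
    "cold": {"gb_month": 0.004, "read_per_10k": 0.100, "retrieval_per_gb": 0.03},
}
MIN_RETENTION_DAYS = {"hot": 0, "cool": 30, "cold": 90}

def monthly_cost(tier: str, stored_gb: float, read_ops: int, read_gb: float) -> float:
    """Capacity + read transactions + data retrieval for one month."""
    p = PRICES[tier]
    return (stored_gb * p["gb_month"]
            + read_ops / 10_000 * p["read_per_10k"]
            + read_gb * p["retrieval_per_gb"])

def cheapest_online_tier(stored_gb: float, read_ops: int, read_gb: float,
                         retention_days: int) -> str:
    """Cheapest tier whose minimum retention the data is expected to satisfy."""
    eligible = [t for t in PRICES if MIN_RETENTION_DAYS[t] <= retention_days]
    return min(eligible, key=lambda t: monthly_cost(t, stored_gb, read_ops, read_gb))
```

With these placeholder prices, heavily read data favors hot, while the rarely read case picks cold only when the data is expected to stay at least 90 days.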
Archive Tier
Archive is offline. The blob's properties and metadata remain visible, but the content cannot be read or modified until rehydrated to an online tier.
Rehydration:
- Can take hours.
- Has standard and high-priority cost and latency trade-offs.
- Must be initiated explicitly.
- Cannot be performed by lifecycle management.
Archive supports only certain redundancy configurations. Verify compatibility before choosing account redundancy.
Smart Tier
Smart tier automatically moves eligible block blobs among hot, cool, and cold based on access patterns in supported accounts.
It is useful when:
- Data must remain online.
- Access frequency is uncertain.
- The account and redundancy configuration support it.
It does not move data to archive. It also has monitoring and eligibility considerations, so compare it with explicit lifecycle policies.
Default Account Tier and Explicit Tier
A storage account has a default online tier. A block blob without an explicitly assigned tier inherits it.
An explicit blob tier remains independent of later default-account changes. Changing the account default can affect many inferred-tier blobs and generate access or transaction charges.
Inventory affected data and estimate cost before changing the default.
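Before changing the default, an inventory pass can size the blobs that would follow it. This sketch assumes records produced from a blob listing or inventory export; `BlobRecord` and its `tier_inferred` field are hypothetical names mirroring the access-tier-inferred flag that blob listings report.

```python
from dataclasses import dataclass

@dataclass
class BlobRecord:
    name: str
    size_bytes: int
    tier: str
    tier_inferred: bool  # True when the blob has no explicitly assigned tier

def default_change_impact(blobs: list[BlobRecord]) -> tuple[int, int]:
    """Count and total size of blobs whose tier follows the account default."""
    affected = [b for b in blobs if b.tier_inferred]
    return len(affected), sum(b.size_bytes for b in affected)
```

The byte total can then feed a cost estimate for the write or read and retrieval charges the change would trigger.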
Lifecycle Management
Lifecycle policies can transition blobs based on:
- Age since creation or modification.
- Last access time when tracking is enabled.
- Blob type.
- Name prefix.
- Blob index tags.
Policies can also delete current blobs, versions, and snapshots. Execution is asynchronous, not an exact-time scheduler.
Example strategy:
uploads/quarantine/ -> delete abandoned data after 7 days
invoices/ -> cool after 90 days, archive after 365 days
temporary-exports/ -> delete after 30 days
Test filters against inventory before enabling deletion.
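The example strategy above could be expressed as a lifecycle policy document. This Python sketch builds the JSON and runs a local, prefix-only dry run against sample paths. It assumes the first path segment of each prefix is the container name (lifecycle `prefixMatch` values begin with the container name), uses modification age for every rule, and does not model age conditions or tag filters in the preview; the rule names are hypothetical.

```python
import json

def lifecycle_rule(name, prefixes, cool_after=None, archive_after=None, delete_after=None):
    """Build one block-blob rule keyed on days since last modification."""
    base_blob = {}
    if cool_after is not None:
        base_blob["tierToCool"] = {"daysAfterModificationGreaterThan": cool_after}
    if archive_after is not None:
        base_blob["tierToArchive"] = {"daysAfterModificationGreaterThan": archive_after}
    if delete_after is not None:
        base_blob["delete"] = {"daysAfterModificationGreaterThan": delete_after}
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": prefixes},
            "actions": {"baseBlob": base_blob},
        },
    }

policy = {
    "rules": [
        lifecycle_rule("quarantineCleanup", ["uploads/quarantine/"], delete_after=7),
        lifecycle_rule("invoiceTiering", ["invoices/"], cool_after=90, archive_after=365),
        lifecycle_rule("exportCleanup", ["temporary-exports/"], delete_after=30),
    ]
}
policy_json = json.dumps(policy, indent=2)  # review this body before applying it

def preview_matches(policy, paths):
    """Prefix-only dry run: which container/blob paths would each rule consider?"""
    preview = {}
    for rule in policy["rules"]:
        prefixes = rule["definition"]["filters"]["prefixMatch"]
        preview[rule["name"]] = [p for p in paths if any(p.startswith(x) for x in prefixes)]
    return preview
```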
Listing and Pagination
Blob listing is paginated. Production code should:
- Handle continuation tokens.
- Use a prefix to reduce the result set.
- Avoid loading all blob names into memory.
- Stop when the required result is found.
- Avoid using listing as an interactive business search engine.
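The loop below sketches continuation-token handling and early exit. `fetch_page` is a hypothetical stand-in for one List Blobs request over fake data, and its numeric marker is for illustration only; real continuation tokens are opaque. With the Python SDK, `ContainerClient.list_blobs(name_starts_with=...)` returns a pager whose `by_page(continuation_token=...)` provides the same structure.

```python
from typing import Callable, Iterator, Optional

# Fake container contents and a log of page requests, for illustration only.
_FAKE_CONTAINER = [f"exports/2026/06/{day:02d}/report.csv" for day in range(1, 31)]
_FAKE_CONTAINER.append("invoices/2026/06/123.pdf")
page_requests: list[Optional[str]] = []

def fetch_page(prefix: str, marker: Optional[str], page_size: int = 5):
    """Hypothetical single list call: returns (names, next_marker or None)."""
    page_requests.append(marker)
    matching = sorted(n for n in _FAKE_CONTAINER if n.startswith(prefix))
    start = int(marker) if marker else 0
    next_start = start + page_size
    next_marker = str(next_start) if next_start < len(matching) else None
    return matching[start:next_start], next_marker

def iter_blob_names(prefix: str) -> Iterator[str]:
    """Yield names page by page, holding only one page in memory."""
    marker = None
    while True:
        names, marker = fetch_page(prefix, marker)
        yield from names
        if marker is None:
            return

def find_first(prefix: str, predicate: Callable[[str], bool]) -> Optional[str]:
    """Stop at the first match, so later pages are never requested."""
    for name in iter_blob_names(prefix):
        if predicate(name):
            return name
    return None
```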
For user-facing search, maintain indexed business metadata in a database or search service.
Security Boundaries
Containers and blobs are data-plane resources. Secure access using:
- Microsoft Entra ID.
- Managed identities.
- Azure data-plane RBAC roles.
- User delegation SAS for temporary direct access.
- Private endpoints.
- Restricted public network access.
Do not confuse control-plane permissions, such as managing the storage account, with data-plane permission to read blob content.
Common Mistakes
Common mistakes include:
- Treating virtual folders as independent resources.
- Using user filenames as unique object identifiers.
- Ignoring case sensitivity.
- Storing searchable business state only in metadata.
- Putting secrets or personal data in tags.
- Assuming tag search is immediately consistent.
- Changing account default tier without estimating charges.
- Archiving data that requires immediate recovery.
- Using cooler tiers for frequently overwritten blobs.
- Designing containers only around visual folder structure.
- Listing an entire container to answer user-facing queries.
- Using blob names as authorization.
Best-Practice Design Checklist
A production design should normally:
- Use general-purpose v2 unless a measured premium need exists.
- Separate accounts and containers by meaningful policy boundaries.
- Generate stable, case-consistent blob names.
- Keep mutable business fields out of object paths where possible.
- Use metadata for small descriptive values.
- Use index tags for storage search and lifecycle classification.
- Use a database for rich business metadata and authorization.
- Apply ETag conditions for concurrent updates.
- Choose access tier from measured access and retention patterns.
- Model retrieval, transaction, early-deletion, and egress costs.
- Use paginated prefix listing.
- Test lifecycle policies and archive restore procedures.
- Secure data with identity, private networking, and least privilege.