Blob Storage, access tiers, lifecycle management, and immutability Interview Questions

Overview

Azure Blob Storage is an object storage service for large amounts of unstructured data such as images, documents, backups, logs, media, exports, and analytical datasets. Data is stored as blobs inside containers in a storage account.

The main design decisions include:

Storage account type and redundancy.
Blob type.
Naming, containers, metadata, and blob index tags.
Authentication and network access.
Hot, cool, cold, archive, or smart access tier.
Lifecycle transition and deletion rules.
Versioning, soft delete, and point-in-time recovery.
Time-based immutability or legal holds.
Monitoring, inventory, cost, and recovery procedures.

Access tiers trade storage price against access price and retrieval latency:

Hot: Highest storage cost and lowest access cost for frequently used data.
Cool: Lower storage cost with higher access cost for infrequently used data.
Cold: Lower storage cost than cool while remaining online for immediate access.
Archive: Lowest storage cost but offline retrieval measured in hours.
Smart: Automatically moves eligible blobs among hot, cool, and cold based on access patterns.

Lifecycle management automates tier changes and deletion based on age, access time, prefixes, and blob index tags. Immutable storage provides write-once, read-many, or WORM, protection by preventing modification and deletion during a retention period or legal hold.

For interviews, candidates should distinguish:

Online access tiers from the offline archive tier.
Lifecycle management from archive rehydration.
Soft delete and versioning from WORM immutability.
Time-based retention from legal holds.
Container-level from version-level immutability.
Durability from availability and disaster recovery.
Capacity cost from transaction, retrieval, early-deletion, and network costs.

Core Concepts

Storage Accounts, Containers, and Blobs

A storage account is the top-level Azure resource for Blob Storage. It defines:

Region.
Redundancy.
Performance class.
Networking.
Authentication and authorization.
Encryption.
Data-protection features.
Default blob access tier.

A container groups blobs and provides a scope for access policies, immutability, and organization. Blob names can contain slash characters that tools display as virtual folders, but standard blob namespaces are flat.

Blob Storage supports:

Block blobs: Files and objects uploaded as blocks. Access tiers apply to block blobs.
Append blobs: Optimized for append operations such as logging.
Page blobs: Random-access pages used by scenarios such as virtual hard disks.

Hot, cool, cold, archive, and smart tier behavior discussed here applies to block blobs. Access tier assignment is not supported for append and page blobs.

Standard and Premium Performance

Standard general-purpose v2 accounts are appropriate for most Blob Storage workloads and support the standard access tiers.

Premium block blob accounts provide low and consistent latency using high-performance storage. They do not use the hot, cool, cold, and archive tiering model in the same way. Moving data between premium and standard accounts generally requires copying it.

Choose premium based on measured low-latency and transaction-rate needs, not because data is important.

Storage Redundancy

Redundancy and access tier are separate decisions:

LRS: Replicates data within one datacenter region.
ZRS: Replicates across availability zones in a region.
GRS: Adds asynchronous replication to a paired secondary region.
RA-GRS: Adds read access to the GRS secondary.
GZRS: Combines zonal primary replication with geo-replication.
RA-GZRS: Adds read access to the GZRS secondary.

Hot, cool, and cold support all standard redundancy choices. Archive currently supports LRS, GRS, and RA-GRS, but not ZRS, GZRS, or RA-GZRS.

Geo-redundancy is not the same as backup or immutability. Replication can copy accidental deletion or corruption. Combine redundancy with versioning, soft delete, immutability, and tested recovery according to the threat model.

Hot Access Tier

Hot is an online tier optimized for frequent access and modification.

Use cases include:

Active application content.
Frequently viewed images and documents.
Processing inputs.
Newly ingested data before usage patterns are known.
Data with frequent reads or writes.

Hot has relatively high capacity cost but low read, write, and retrieval charges. It has no minimum recommended retention period.

Hot is not automatically cheapest for small frequently accessed objects because transaction and network costs also matter. Model the complete access pattern.

Cool Access Tier

Cool is an online tier for infrequently accessed data that still requires millisecond retrieval.

Typical use cases include:

Short-term backup.
Older documents with occasional access.
Infrequently downloaded media.
Data expected to remain for at least 30 days in a general-purpose v2 account.

Cool has lower capacity cost than hot but higher operation and retrieval costs. Deleting, overwriting, or moving a blob before the minimum period can trigger a prorated early-deletion charge.

Cold Access Tier

Cold is an online tier for rarely accessed data that must remain immediately available.

Typical use cases include:

Older backups with an immediate recovery requirement.
Historical datasets with rare interactive reads.
Long-lived records that cannot tolerate archive rehydration delay.
Data expected to remain for at least 90 days in a general-purpose v2 account.

Cold has lower capacity cost and higher access costs than cool. It retains online millisecond access, unlike archive.

Archive Access Tier

Archive is an offline tier for data that is rarely accessed and can tolerate retrieval latency measured in hours.

Typical use cases include:

Long-term backup.
Compliance archives.
Raw source data retained for future reprocessing.
Historical exports.
Secondary copies with low restore frequency.

While archived:

Blob data cannot be read or modified.
Metadata and properties remain available.
The blob must be rehydrated to hot, cool, or cold before data access.
Rehydration can take up to approximately 15 hours depending on priority and conditions.
A minimum recommended retention period of 180 days applies.

Archive is unsuitable when an RTO requires immediate access. A lower storage bill does not compensate for a recovery design that misses the business deadline.

Archive Rehydration

Two common rehydration approaches are:

Set Blob Tier: Changes the source blob from archive to an online tier.
Copy Blob: Creates an online copy while the source remains archived.

Copying is often useful because:

The archive source remains protected.
The restore can be isolated from the retained record.
It can avoid an early-deletion charge on the source when the minimum archive period has not elapsed.

The trade-off is temporary or ongoing storage for both source and destination.

Rehydration priority affects latency and cost. The application should:

Submit the restore request.
Track status asynchronously.
Avoid repeated restore submissions.
Notify users when data becomes available.
Validate restored data.
Clean up temporary online copies according to policy.

Lifecycle management cannot rehydrate archived blobs to an online tier. Rehydration is an explicit operational or application workflow.

Smart Tier

Smart tier automatically manages eligible block blobs across the hot, cool, and cold online capacity tiers based on access.

Current default behavior is:

New eligible data starts hot.
Data not accessed for 30 days moves to cool.
After 90 total days of inactivity, it moves to cold.
Access moves the object back to hot and restarts the cycle.

Smart tier:

Requires a supported standard general-purpose v2 account.
Requires zone-redundant storage such as ZRS, GZRS, or RA-GZRS.
Supports block blobs.
Does not use archive.
Is useful when online access patterns are unknown.
Charges monitoring for eligible managed objects while avoiding normal internal tier-transition and retrieval charges.

Small objects and account eligibility affect behavior and economics. Validate current regional support, SDK support, pricing, and object-size distribution before adoption.

Choose smart tier when automatic online optimization is valuable. Choose lifecycle rules when the organization needs custom ages, prefixes, tags, archive transitions, or explicit deletion schedules.

Default Account Tier and Explicit Blob Tier

A general-purpose v2 account has a default online access tier. A blob without an explicitly assigned tier inherits that default.

Important behavior:

The default can be hot, cool, cold, or smart where supported.
Archive cannot be the account default.
An upload can explicitly choose an eligible tier.
Changing the account default affects blobs whose tier is inferred.
Explicitly tiered blobs do not automatically follow later default changes.

Changing an account default can generate transaction and retrieval charges across many inferred-tier blobs. Inventory the affected objects and estimate cost before changing it.

Complete Cost Model

Blob cost is not only capacity. Include:

Data stored by tier.
Read, write, list, and other operations.
Retrieval per GB from cooler tiers.
Early-deletion charges.
Rehydration priority and restore operations.
Geo-replication transfer.
Internet or cross-region data transfer.
Versions and snapshots.
Soft-deleted data.
Lifecycle transactions.
Smart-tier monitoring.
Logging and inventory.

A cooler tier can cost more overall when data is read frequently, overwritten early, or stored as very large numbers of small objects.

Model scenarios with representative object sizes and operations:

Code

Monthly cost =
  capacity
  + write operations
  + read operations
  + data retrieval
  + replication
  + outbound transfer
  + retained versions and deleted data
  + management features

Early-Deletion Charges

In general-purpose v2 accounts, the minimum recommended durations are:

Cool: 30 days.
Cold: 90 days.
Archive: 180 days.

Deleting, overwriting, or moving a blob to another tier before its minimum period can trigger a prorated early-deletion charge.

An overwrite replaces the full object from a billing perspective. A workload that rewrites objects frequently is usually a poor candidate for a cooler tier.

Soft delete affects when data is considered deleted for early-deletion billing. Cost testing should include the configured retention period.

Metadata and Blob Index Tags

Metadata stores application-defined name-value information with a blob. Blob index tags provide indexed key-value properties that can be queried and used by lifecycle filters.

Use tags for policy and discovery attributes such as:

Data class.
Tenant.
Processing state.
Retention category.
Business date.

Do not put secrets or sensitive personal data in metadata or tags. They can have different authorization and indexing behavior than blob content.

Lifecycle rules should use stable business classifications rather than fragile filename conventions where possible.

Lifecycle Management

Lifecycle management is a free policy feature, although tiering and deletion operations incur normal transaction and storage charges.

Policies can:

Move current blob versions to cool, cold, or archive.
Move previous versions and snapshots.
Delete current versions, previous versions, or snapshots.
Filter by blob type.
Filter by name prefix.
Filter by blob index tags.
Use days since creation or modification.
Use days since last access when access-time tracking is enabled.

Lifecycle execution is asynchronous. It should not be treated as an exact-time job scheduler. Design compliance deadlines with sufficient margin and verify outcomes through inventory and monitoring.

Lifecycle Policy Example

The following example moves report blobs to cool after 30 days, archive after 180 days, and deletes them after seven years:

Code

{
  "rules": [
    {
      "enabled": true,
      "name": "manage-completed-reports",
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["reports/"],
          "blobIndexMatch": [
            {
              "name": "Status",
              "op": "==",
              "value": "Complete"
            }
          ]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": {
              "daysAfterModificationGreaterThan": 30
            },
            "tierToArchive": {
              "daysAfterModificationGreaterThan": 180
            },
            "delete": {
              "daysAfterModificationGreaterThan": 2555
            }
          }
        }
      }
    }
  ]
}

Before deployment, test filters against inventory data. A broad prefix or missing tag condition can tier or delete far more data than intended.

Last-Access-Based Lifecycle Rules

Last-access tracking can support policies based on inactivity rather than age.

Considerations include:

Tracking must be enabled.
Not every metadata operation counts as data access.
Tracking updates are not necessarily immediate.
Access-based transitions can create repeated tier movement.
Reads from an application, scanner, or analytics job can reset the clock.

Last-access logic can improve cost optimization, but it must be tested with real client behavior.

Lifecycle Policies and Versions

When blob versioning is enabled, overwriting a blob creates a previous version. Lifecycle policies should manage:

Current base blobs.
Previous versions.
Snapshots.

Ignoring previous versions can cause hidden storage growth. Deleting the current blob does not necessarily remove retained versions or snapshots.

Data-protection retention and lifecycle deletion must be coordinated so the policy does not delete recovery copies earlier than intended or retain them indefinitely.

Soft Delete

Soft delete retains deleted blobs or containers for a configured period so they can be restored.

It protects against:

Accidental deletion.
Some application bugs.
Unintended overwrite scenarios when combined with versioning.

Soft delete is reversible protection. Privileged users can change its configuration, and retained data eventually expires. It is not WORM compliance.

Blob Versioning

Versioning preserves previous versions when a block blob is created or modified.

Benefits include:

Recovery from overwrites.
Historical state.
Support for version-level immutability.

Costs and risks include:

Additional capacity.
Many versions for frequently updated blobs.
More complex lifecycle policy.
Restore and inventory complexity.

Versioning is not appropriate for every high-frequency write pattern. Estimate version creation and configure lifecycle cleanup.

Point-in-Time Restore

Point-in-time restore can return block blob data to an earlier state within a configured window when its prerequisites are enabled.

It is useful for broad accidental changes, but it has feature compatibility constraints. Immutable storage is incompatible with point-in-time restore in the same supported configuration. Select controls according to the recovery and compliance requirements rather than enabling every feature.

Immutable Storage

Immutable storage protects blobs in a WORM state. During protection, authorized users and administrators cannot modify or delete the protected data through normal operations.

It supports:

Time-based retention: Protect for a known interval.
Legal hold: Protect until explicitly cleared.

Both can apply at the same time. Data can be deleted only after all applicable protections have ended.

Use cases include:

Regulatory records.
Audit logs.
Financial records.
Evidence.
Security logs.
Backup protection.
Business-critical documents.

Immutability does not make content correct. Validate data before locking it.

Time-Based Retention

A time-based policy protects data for a configured number of days. The effective expiration is calculated from object creation and the current retention interval.

Policies begin unlocked for testing. An unlocked policy protects data but can be shortened, extended, or deleted. A locked policy:

Cannot be deleted.
Cannot have its retention period shortened.
Can have retention extended within supported limits.
Is the state required for relevant regulatory compliance claims.

Locking is intentionally difficult to reverse. Test uploads, reads, expected application writes, legal procedures, and deletion behavior before locking.

Legal Holds

A legal hold protects data for an unknown duration until an authorized process explicitly clears the hold.

Use legal holds when retention depends on an event such as:

Litigation.
Investigation.
Audit.
Security incident.
Business preservation order.

Container-level legal holds use descriptive tags such as a case or event identifier. Removing one hold must not accidentally release data still covered by another requirement.

Container-Level and Version-Level WORM

Container-level WORM applies a policy to every blob in a container. It is appropriate when:

Data can be grouped by a common retention period.
All records in the container share the same policy.
Blob versioning is not desired.
Hierarchical namespace compatibility is required.

Version-level WORM supports finer policies at account, container, or blob-version scope. It requires versioning and is useful when:

Individual records need different retention periods.
The workload overwrites blob names but must preserve protected versions.
Account-wide default retention is appropriate.

Version-level WORM creates extra versions and cost. It is not currently supported with every account capability, including hierarchical namespace scenarios. Verify feature compatibility before account creation because some storage features cannot be changed easily later.

Immutability and Application Writes

An application that overwrites the same blob name can fail under container-level immutability because the existing object cannot be modified.

Possible designs include:

Write each record with a unique immutable name.
Use append blobs with explicitly supported protected append behavior.
Enable versioning and version-level WORM where supported.
Separate mutable working data from final immutable records.

Do not apply immutability to active virtual machine page blobs or mutable working files without understanding the write pattern.

Immutability and Access Tiers

All blob access tiers can support immutable data in eligible configurations. Tier changes can still occur using supported operations because tiering changes storage placement rather than blob content.

However:

Retention prevents deletion even when lifecycle policy requests it.
Archived immutable data still requires rehydration to read.
Early-deletion economics still matter after retention ends.
A lifecycle rule should not conflict with legal retrieval deadlines.

Compliance retention and cost lifecycle should be designed together.

Security and Network Access

Secure Blob Storage usually includes:

Microsoft Entra ID and managed identities.
Least-privilege data-plane RBAC.
User delegation SAS for delegated temporary access.
Disabled shared-key access where compatible.
Short SAS lifetimes and narrow permissions.
Private endpoints for private-access requirements.
Disabled or restricted public network access.
Encryption at rest and in transit.
Key management aligned with compliance.
Diagnostic logs and threat detection.

A private endpoint does not automatically disable public access. Validate private DNS and every operational path, then restrict the public endpoint.

Upload and Download Safety

Applications handling user files should:

Validate authorization before issuing upload or download access.
Restrict file size and expected content type.
Generate server-controlled blob names.
Avoid trusting file extensions.
Scan uploaded content before publication.
Store untrusted files in a quarantine container.
Prevent path or tenant confusion in blob naming.
Set safe response headers when serving files.
Avoid exposing broad account keys or container SAS tokens.

Blob Storage safely stores bytes; it does not decide whether a document is safe to execute or render.

Monitoring and Inventory

Monitor:

Capacity by tier.
Transactions and retrieval volume.
Egress.
Authentication failures.
Throttling and latency.
Lifecycle policy outcomes.
Archived data and restore requests.
Version and snapshot growth.
Soft-delete capacity.
Immutability policy status.
Public-access configuration.

Blob inventory can report blobs, versions, snapshots, tiers, metadata, and immutability state. Use it to validate lifecycle and compliance at scale.

Common Mistakes

Common mistakes include:

Moving data to archive without an acceptable restore time.
Treating cool or cold storage as offline.
Assuming lifecycle policy can rehydrate archive data.
Ignoring retrieval, transaction, and early-deletion costs.
Setting lifecycle deletion rules without testing filters.
Managing current blobs but forgetting versions and snapshots.
Enabling versioning without cleanup policies.
Treating soft delete as regulatory immutability.
Locking a WORM policy before testing the application.
Applying one retention period to mixed data with different legal requirements.
Enabling private endpoints while leaving public access open.
Using account keys or broad long-lived SAS tokens in applications.

Best-Practice Design Process

A practical process is:

Classify data by access frequency, retrieval time, retention, and legal requirements.
Select redundancy independently from access tier.
Model capacity, operation, retrieval, egress, and retained-copy costs.
Keep immediately required data in online tiers.
Use archive only with an asynchronous restore workflow and compatible RTO.
Use smart tier for uncertain online access patterns where supported.
Test lifecycle rules on a narrow prefix or test account.
Manage base blobs, versions, and snapshots explicitly.
Separate recovery features from compliance immutability.
Test and approve WORM policies before locking them.
Restrict network and identity access using least privilege.
Verify policy outcomes using monitoring and inventory.

Blob Storage, access tiers, lifecycle management, and immutability

Overview

Core Concepts

Storage Accounts, Containers, and Blobs

Standard and Premium Performance

Storage Redundancy

Hot Access Tier

Cool Access Tier

Cold Access Tier

Archive Access Tier

Archive Rehydration

Smart Tier

Default Account Tier and Explicit Blob Tier

Complete Cost Model

Early-Deletion Charges

Metadata and Blob Index Tags

Lifecycle Management

Lifecycle Policy Example

Last-Access-Based Lifecycle Rules

Lifecycle Policies and Versions

Soft Delete

Blob Versioning

Point-in-Time Restore

Immutable Storage

Time-Based Retention

Legal Holds

Container-Level and Version-Level WORM

Immutability and Application Writes

Immutability and Access Tiers

Security and Network Access

Upload and Download Safety

Monitoring and Inventory

Common Mistakes

Best-Practice Design Process

Interview Practice

Beginner Interview Practice

Intermediate Interview Practice

Advanced Interview Practice