Block blobs, append blobs, and page blobs Interview Questions

Overview

Azure Blob Storage supports three blob types:

Block blobs: Optimized for uploading, downloading, streaming, and managing ordinary files and large objects.
Append blobs: Optimized for adding new blocks to the end of an object.
Page blobs: Optimized for random reads and writes to fixed 512-byte pages.

The blob type is selected when the blob is created and cannot be changed in place. Each type has a different API and update model:

Block blob  -> stage blocks, then commit a block list
Append blob -> append a new block to the end
Page blob   -> write aligned byte ranges in place

Most application files should be block blobs. Append blobs are specialized for append-only records such as logs. Page blobs primarily support virtual hard disks and other sparse random-access workloads.

For interviews, candidates should explain the operational differences, choose a type from the write pattern, discuss concurrency and integrity, and avoid using specialized blob types where ordinary block blobs are simpler.

Core Concepts

Blob Type Is Immutable

The blob type is set at creation. A block blob cannot later become an append or page blob through a metadata change.

To change type:

Create a new blob of the desired type.
Copy or transform the content.
Validate the result.
Update references.
Delete the old object when policy permits.

The selection should follow the mutation pattern, not the file extension.

Shared Blob Capabilities

All blob types support common object-storage concepts such as:

Unique names within a container.
HTTP properties and metadata.
ETags.
Conditional requests.
Leases.
Snapshots in supported scenarios.
Encryption.
Azure RBAC and SAS authorization.

The update operations and feature compatibility differ by blob type.

Strong Consistency and ETags

Committed changes are visible immediately. Every committed state has an ETag.

Use conditions such as:

If-Match: "etag-value"

to prevent lost updates. A failed precondition tells the client that another writer changed the object.
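
The lost-update check can be illustrated without any Azure dependency. The following is a toy in-memory model, not the Azure SDK: `ToyBlobStore`, `put`, `get`, and `PreconditionFailed` are hypothetical names, and the counter-based ETag only mimics the idea that each committed state receives a new opaque tag.

```python
import itertools


class PreconditionFailed(Exception):
    """Stale If-Match value (the real service answers HTTP 412)."""


class ToyBlobStore:
    """Hypothetical in-memory stand-in for a container; not an Azure API."""

    def __init__(self):
        self._blobs = {}  # name -> (etag, content)
        self._counter = itertools.count(1)

    def get(self, name):
        return self._blobs[name]  # (etag, content)

    def put(self, name, content, if_match=None):
        current = self._blobs.get(name)
        # Emulate If-Match: reject the write if the stored ETag differs.
        if if_match is not None and (current is None or current[0] != if_match):
            raise PreconditionFailed(name)
        etag = f'"{next(self._counter)}"'
        self._blobs[name] = (etag, content)
        return etag


store = ToyBlobStore()
store.put("config.json", b"v1")
etag_a, _ = store.get("config.json")  # writer A reads
etag_b, _ = store.get("config.json")  # writer B reads the same state

store.put("config.json", b"v2-from-A", if_match=etag_a)  # A commits first
try:
    store.put("config.json", b"v2-from-B", if_match=etag_b)  # B is now stale
    b_rejected = False
except PreconditionFailed:
    b_rejected = True  # B must re-read, merge, and retry
```

Writer B's update is rejected instead of silently overwriting A's change, which is the behavior the `If-Match` condition is meant to provide.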

Leases provide stronger exclusive-write coordination where needed, but applications must handle lease expiration, renewal, and abandoned owners.

Block Blobs

Block blobs are the default for:

Documents.
Images.
Videos.
Backups.
Exports.
Data lake objects.
Application packages.
User uploads.

They are composed of individually identified blocks. A client can:

Stage blocks in any order.
Upload blocks in parallel.
Retry failed blocks independently.
Commit an ordered block list atomically.

Only committed blocks form the visible blob content.

Single-Request Block Blob Upload

Small objects can be uploaded with one request. This is simple and reduces coordination overhead.

The SDK chooses single-request or staged upload according to payload size and transfer options. Do not hardcode assumptions based on an old SDK threshold.

Use a single upload when:

The object is small.
Network reliability is good.
Resumability is unnecessary.
Memory and request limits are acceptable.

Staged Block Upload

Large uploads use staged blocks:

Stage block A
Stage block B
Stage block C
Commit [A, B, C]

Benefits include:

Parallel transfer.
Per-block retry.
Resume support.
Ordered final assembly.
Atomic visibility at commit.

Block IDs must have a consistent encoded length within one blob. Applications often derive them from a zero-padded sequence number.
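
A minimal sketch of the zero-padded approach, assuming a six-digit width is enough for the upload. The helper name `make_block_id` and the width are illustrative choices, not part of any SDK; the only service rules relied on are that block IDs are Base64 strings of equal length within a blob.

```python
import base64


def make_block_id(index, width=6):
    """Return a Base64 block ID built from a zero-padded sequence number."""
    if index < 0 or index >= 10 ** width:
        # A wider number would change the encoded length and break consistency.
        raise ValueError("index does not fit the chosen width")
    raw = f"{index:0{width}d}".encode("ascii")
    return base64.b64encode(raw).decode("ascii")


block_ids = [make_block_id(i) for i in range(3)]
# All IDs share one encoded length, so they can coexist in the same blob.
same_length = len({len(b) for b in block_ids}) == 1
```

Because the IDs are derived from the block index alone, a restarted client can regenerate them without persisted state.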

Uncommitted Blocks

Staged blocks are uncommitted until included in a committed block list. They:

Do not form visible blob content.
Can be listed for resume logic.
Are garbage-collected by the service if never committed (documented as about one week after the last successful block upload).
Can be replaced by uploading the same block ID.

The application should persist upload state or deterministically regenerate block IDs. After interruption, list uncommitted blocks and continue missing parts.
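
The resume step can be sketched as a pure function. It assumes the list of uncommitted block IDs has already been fetched from the service (for example through the Get Block List operation), that block IDs are derived deterministically from the block index, and that the source file has not changed since the interrupted attempt. `plan_resume` and `make_block_id` are hypothetical helper names.

```python
import base64


def make_block_id(index, width=6):
    """Deterministic Base64 block ID from a zero-padded index (illustrative)."""
    return base64.b64encode(f"{index:0{width}d}".encode("ascii")).decode("ascii")


def plan_resume(total_blocks, uncommitted_ids):
    """Return (indexes still to upload, full ordered block list to commit)."""
    staged = set(uncommitted_ids)
    missing = [i for i in range(total_blocks) if make_block_id(i) not in staged]
    commit_order = [make_block_id(i) for i in range(total_blocks)]
    return missing, commit_order


# Pretend blocks 0 and 2 were staged before the interruption.
already_staged = [make_block_id(0), make_block_id(2)]
missing, commit_order = plan_resume(4, already_staged)
```

Only the missing indexes are re-uploaded; the commit still lists every block in order.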

Atomic Block List Commit

Committing the block list creates or replaces the visible block blob as one operation. If a referenced block is missing, the commit fails and the previous committed blob remains unchanged.

Use conditional headers to avoid replacing an object modified by another writer.

Committing a new block list can also replace properties and metadata, so callers must set the intended final values.

Block Blob Size and Scale

Current REST service versions support up to 50,000 committed blocks per blob. With the current maximum block size of 4,000 MiB, that yields block blobs of approximately 190.7 TiB.
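
The figure can be checked with a few lines of arithmetic. The 50,000-block limit comes from the text; the 4,000 MiB per-block maximum is the value assumed here for recent service versions.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

MAX_BLOCKS = 50_000          # committed blocks per block blob
MAX_BLOCK_SIZE = 4000 * MIB  # assumed per-block maximum, recent service versions

max_blob_bytes = MAX_BLOCKS * MAX_BLOCK_SIZE
max_blob_tib = max_blob_bytes / TIB
print(f"{max_blob_tib:.1f} TiB")  # prints "190.7 TiB"
```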

Practical limits are often reached earlier through:

Client memory.
Network throughput.
Request duration.
Application retry behavior.
Account throughput.
Downstream processing.

Use SDK transfer options and production-like tests rather than targeting theoretical maximums without measurement.

Parallelism

Parallel block upload improves throughput until constrained by:

Client CPU and memory.
Network bandwidth.
Storage account limits.
Server throttling.
Proxy limits.

Excessive concurrency causes throttling and unstable tail latency. Tune block size and concurrency together.

For example, larger blocks reduce request count but increase retry cost and memory use. Smaller blocks improve retry granularity but increase transactions.
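
The trade-off can be made concrete with a rough planning helper. This is a sketch under a simplifying assumption: each concurrent worker buffers one whole block in memory. `transfer_plan` is a hypothetical name, and the numbers are estimates, not measurements.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MAX_BLOCKS = 50_000  # committed-block limit for a block blob


def transfer_plan(blob_size, block_size, concurrency):
    """Estimate request count and buffered memory for a staged upload."""
    block_count = -(-blob_size // block_size)  # integer ceiling division
    return {
        "block_count": block_count,
        # Assumes every in-flight worker holds one full block in memory.
        "in_flight_bytes": min(concurrency, block_count) * block_size,
        "fits_block_limit": block_count <= MAX_BLOCKS,
    }


small_blocks = transfer_plan(10 * GIB, 4 * MIB, concurrency=8)
large_blocks = transfer_plan(10 * GIB, 100 * MIB, concurrency=8)
```

For a 10 GiB object, 4 MiB blocks mean 2,560 requests with about 32 MiB buffered, while 100 MiB blocks mean 103 requests with about 800 MiB buffered and a costlier retry per failed block.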

Integrity Checks

Validate transport and content using supported checksums such as MD5 or CRC64 for transfer operations, plus an application-level checksum when long-term integrity or deduplication requires it.

A robust workflow stores:

Expected length.
Checksum algorithm.
Checksum value.
Upload completion state.
Content validation result.

Do not assume a successful HTTP response proves that the uploaded bytes represent a safe or expected file.
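
A minimal sketch of the application-level part of that workflow, using SHA-256 from the standard library. The record layout and the name `describe_stream` are assumptions for illustration; transport checksums such as MD5 or CRC64 are handled separately by the transfer layer.

```python
import hashlib
import io


def describe_stream(stream, chunk_size=1024 * 1024):
    """Read a binary stream in chunks and return its length and SHA-256."""
    digest = hashlib.sha256()
    length = 0
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
        length += len(chunk)
    return {"length": length, "algorithm": "sha256", "checksum": digest.hexdigest()}


# Computed before upload and stored with the upload record.
expected = describe_stream(io.BytesIO(b"hello blob"))
# Computed again from the downloaded bytes during validation.
actual = describe_stream(io.BytesIO(b"hello blob"))
content_valid = expected == actual
```

A matching checksum shows the bytes arrived intact; it does not show that the file is safe or well-formed, which still needs content validation.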

Block Blob Updates

A block blob is not a general random-write file. Updating content typically means:

Uploading a complete replacement.
Staging changed and reused blocks, then committing a new list.
Creating a new immutable version.

For application documents, immutable versioned blob names are often safer than in-place replacement.

Access Tiers and Block Blobs

Hot, cool, cold, archive, and smart tier behavior applies to eligible block blobs in supported accounts.

Block blobs are therefore appropriate for lifecycle-managed documents and archives. Append and page blobs do not use these access tiers in the same way.

Append Blobs

Append blobs are block-based but expose only append operations. A writer adds blocks to the end:

Existing content
+ new append block
= longer content

Existing blocks cannot be updated or deleted individually. Append blob block IDs are managed by the service rather than exposed to the client.

Use cases include:

Append-only application logs.
Audit trails, provided storage-level immutability is designed separately.
Sequential telemetry batches.
Event records from controlled writers.

Append Blob Limits

An append block is limited to 4 MiB in the documented REST model, and an append blob supports up to 50,000 blocks, resulting in a maximum slightly above 195 GiB.
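
The limit and a simple rotation check can be sketched as follows, using the 4 MiB and 50,000-block figures stated above. `should_rotate` and the 90% headroom are illustrative assumptions.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MAX_APPEND_BLOCKS = 50_000
MAX_APPEND_BLOCK_SIZE = 4 * MIB
MAX_APPEND_BLOB_SIZE = MAX_APPEND_BLOCKS * MAX_APPEND_BLOCK_SIZE

max_gib = MAX_APPEND_BLOB_SIZE / GIB  # 195.3125


def should_rotate(block_count, size_bytes, headroom=0.9):
    """True once either limit is within the chosen headroom."""
    return (
        block_count >= MAX_APPEND_BLOCKS * headroom
        or size_bytes >= MAX_APPEND_BLOB_SIZE * headroom
    )
```

Writers that append many small records usually reach the block-count limit long before the size limit, so the check tracks both.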

Applications should rotate append blobs before reaching limits:

logs/2026/06/15/00.log
logs/2026/06/15/01.log

Rotation also improves lifecycle management, parallel processing, and fault isolation.
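
A small sketch of an hourly naming scheme that matches the example names above. The function name and the `logs` prefix are illustrative; the timestamp is normalized to UTC so that all writers agree on the current blob.

```python
from datetime import datetime, timezone


def hourly_log_blob_name(ts, prefix="logs"):
    """Return the append blob name for the UTC hour containing ts."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    return f"{prefix}/{ts:%Y/%m/%d/%H}.log"


name = hourly_log_blob_name(datetime(2026, 6, 15, 1, 30, tzinfo=timezone.utc))
```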

Append Concurrency

Multiple writers can race to append. Conditional append operations can enforce:

Expected append position.
Maximum blob size.
ETag conditions.
Lease ownership.

If strict global ordering is required across many producers, Blob Storage alone may not provide the desired event semantics. Use a messaging service or partitioned event log, then write batches to storage.

An append completing successfully means its bytes were added, but application-level ordering still depends on producer coordination.
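
The expected-append-position condition can be illustrated with a toy model. `ToyAppendBlob` and `AppendPositionMismatch` are hypothetical names, not SDK types; the real service expresses the same idea through a conditional header on the append request and rejects a mismatch with a precondition failure.

```python
class AppendPositionMismatch(Exception):
    """The blob's current length differs from the position the caller expected."""


class ToyAppendBlob:
    """Hypothetical in-memory model of an append blob; not an Azure API."""

    def __init__(self):
        self._data = bytearray()

    @property
    def length(self):
        return len(self._data)

    def append(self, block, expected_position=None):
        # Emulate the append-position condition: only append at the expected end.
        if expected_position is not None and expected_position != len(self._data):
            raise AppendPositionMismatch(len(self._data))
        self._data += block
        return len(self._data)


log = ToyAppendBlob()
position = log.length                       # writer records the end offset: 0
log.append(b"record-1\n", expected_position=position)

# Suppose the response was lost and the writer retries the same request.
try:
    log.append(b"record-1\n", expected_position=position)
    duplicated = True
except AppendPositionMismatch:
    duplicated = False  # the retry is rejected instead of appending twice
```

The condition turns a blind retry into a detectable conflict; the writer then reads the tail of the blob to decide whether its record already landed.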

Append Blobs Are Not a Message Broker

Append blobs lack features expected from a messaging system:

Consumer offsets.
Per-message acknowledgement.
Dead-letter queues.
Delivery attempts.
Ordered competing consumers.
Message locks.

Use Service Bus or Event Hubs for messaging. Persist logs or event archives to append or block blobs as a separate concern.

Page Blobs

Page blobs are sparse collections of 512-byte pages optimized for random reads and writes.

Creation specifies a maximum size. Writes:

Specify an offset and range.
Must align to 512-byte page boundaries.
Modify content in place.
Are committed immediately.

The maximum documented page blob size is 8 TiB.
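
The alignment rule for page writes can be expressed as two small helpers. This is a sketch; the function names are illustrative, and widening a range with `align_range` implies the caller must read and rewrite the surrounding bytes.

```python
PAGE_SIZE = 512


def is_valid_page_write(offset, length, blob_size):
    """True if the range starts and ends on 512-byte boundaries inside the blob."""
    return (
        offset >= 0
        and length > 0
        and offset % PAGE_SIZE == 0
        and length % PAGE_SIZE == 0
        and offset + length <= blob_size
    )


def align_range(offset, length):
    """Widen an arbitrary byte range to the enclosing page-aligned range."""
    start = (offset // PAGE_SIZE) * PAGE_SIZE
    end = -(-(offset + length) // PAGE_SIZE) * PAGE_SIZE  # round up
    return start, end - start


aligned = align_range(700, 100)  # bytes 700..799 fall within page 1
```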

Page Blob Use Cases

Page blobs are designed primarily for:

Virtual hard disks.
Azure VM disk backing.
Specialized random-access data.

Most application files should not use page blobs. A page blob adds alignment and sparse-file semantics that are unnecessary for documents, media, and ordinary uploads.

Managed disks abstract page blob management for most Azure VM scenarios. Applications should generally use managed disks rather than manipulating VHD page blobs directly.

Sparse Allocation

Only written pages consume storage capacity in the sparse representation, subject to service billing rules. Unwritten ranges read as zeros.

This makes page blobs useful for large logical files in which only some regions are ever written, such as virtual disks.

Clearing page ranges can deallocate them. Applications need careful concurrency because writes happen in place.
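
The sparse behavior can be modeled in memory to make it concrete. `ToySparsePageBlob` is a hypothetical teaching model, not an Azure API, and it ignores billing rules and concurrency entirely.

```python
PAGE_SIZE = 512


class ToySparsePageBlob:
    """Hypothetical in-memory model of a sparse page blob."""

    def __init__(self, size):
        self._check_aligned(size)
        self.size = size
        self._pages = {}  # page index -> 512 bytes; absent means unallocated

    @staticmethod
    def _check_aligned(*values):
        if any(v % PAGE_SIZE for v in values):
            raise ValueError("offsets and lengths must be multiples of 512")

    def _check_range(self, offset, length):
        self._check_aligned(offset, length)
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("range is outside the blob")

    def write(self, offset, data):
        self._check_range(offset, len(data))
        for i in range(0, len(data), PAGE_SIZE):
            self._pages[(offset + i) // PAGE_SIZE] = bytes(data[i:i + PAGE_SIZE])

    def clear(self, offset, length):
        self._check_range(offset, length)
        for page in range(offset // PAGE_SIZE, (offset + length) // PAGE_SIZE):
            self._pages.pop(page, None)  # deallocate the page

    def read(self, offset, length):
        self._check_range(offset, length)
        pages = range(offset // PAGE_SIZE, (offset + length) // PAGE_SIZE)
        # Unallocated pages read back as zeros.
        return b"".join(self._pages.get(p, bytes(PAGE_SIZE)) for p in pages)

    def allocated_bytes(self):
        return len(self._pages) * PAGE_SIZE


disk = ToySparsePageBlob(size=1024 * 1024)  # 1 MiB logical size
disk.write(4096, b"\x01" * 1024)            # allocates two pages only
```

The logical size stays at 1 MiB while only the two written pages count as allocated; clearing a page returns it to the unallocated state.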

Page Ranges and Incremental Snapshots

Page-range APIs identify populated or changed ranges. Incremental snapshots can capture changes efficiently for supported disk and backup workflows.

These capabilities are specialized. Avoid rebuilding an application database or file system on page blobs unless the team is prepared to implement consistency, locking, indexing, and recovery.

Choosing a Blob Type

Workload	Blob type	Reason
PDF, image, video, ZIP	Block	Whole-object transfer and streaming
Large resumable upload	Block	Parallel staged blocks and commit
Lifecycle-managed archive	Block	Access tier support
Append-only log	Append	End-only writes
Virtual hard disk	Page	Random aligned page updates
Mutable JSON document	Block, often immutable versions	Simple replacement and versioning
Event queue	None	Use a messaging service

Start with block blobs unless the workload clearly requires append-only or random page writes.
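
The decision rule in the table can be written as a short function. This is only a sketch of the reasoning; the flag names are hypothetical and real workloads need more context than three booleans.

```python
from enum import Enum


class BlobType(Enum):
    BLOCK = "block"
    APPEND = "append"
    PAGE = "page"


def choose_blob_type(*, append_only=False, random_aligned_writes=False,
                     needs_delivery_semantics=False):
    """Pick a blob type from the write pattern, defaulting to block blobs."""
    if needs_delivery_semantics:
        # Queues and event streams belong in a messaging service.
        raise ValueError("use a messaging service, not a blob type")
    if random_aligned_writes:
        return BlobType.PAGE
    if append_only:
        return BlobType.APPEND
    return BlobType.BLOCK


pdf_choice = choose_blob_type()
log_choice = choose_blob_type(append_only=True)
vhd_choice = choose_blob_type(random_aligned_writes=True)
```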

Leases

A blob lease grants its holder exclusive write and delete access, either for a fixed duration (15 to 60 seconds, renewable) or indefinitely until released.

Use leases for:

Coordinating one writer.
Preventing deletion during processing.
Leadership for a simple storage-based process.

Do not use a lease as the only business lock when:

Work can outlive the lease.
Network partitions are possible.
Database state also changes.
Fencing tokens are required.

Lease loss must cause the worker to stop or validate ownership before committing.

Snapshots and Versions

A snapshot is a read-only point-in-time copy. Versioning automatically creates versions for supported block blob changes.

Use them for recovery, but account for:

Additional storage.
Lifecycle cleanup.
Authorization.
Interactions with immutability.

Snapshots and versions do not replace tested backup and disaster-recovery procedures.

Idempotency

Blob operations should be retry-safe.

Patterns include:

Deterministic blob names from operation IDs.
If-None-Match: * for create-only upload.
Stable block IDs for resumable transfer.
ETags for update-if-current.
Append position conditions.
Persisted upload state.

Without idempotency, a retry can create duplicate objects, overwrite newer data, or append the same record twice.
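
Two of these patterns, deterministic names and create-only writes, can be sketched together. The store below is a toy dictionary that imitates `If-None-Match: *`; `blob_name_for_operation`, `create_only`, and the `exports` prefix are hypothetical names for illustration.

```python
import hashlib


def blob_name_for_operation(operation_id, prefix="exports", suffix=".json"):
    """Derive a stable blob name so that a retry targets the same object."""
    digest = hashlib.sha256(operation_id.encode("utf-8")).hexdigest()
    return f"{prefix}/{digest[:2]}/{digest}{suffix}"


def create_only(store, name, content):
    """Toy version of If-None-Match: * (write only if the name is unused)."""
    if name in store:
        return False  # already created by an earlier attempt; nothing to do
    store[name] = content
    return True


store = {}
name = blob_name_for_operation("order-1842-export")
first_attempt = create_only(store, name, b"{}")
retry_attempt = create_only(store, name, b"{}")  # retry after a timeout
```

The retry resolves to the same name and is treated as a no-op, so the operation produces exactly one object.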

Security

All blob types use the same core authorization approaches:

Microsoft Entra ID and Azure RBAC.
Managed identity.
SAS.
Shared Key where still permitted.

Use private endpoints, TLS, restricted public network access, and data-plane least privilege. Blob type does not alter the need for authorization and content validation.

Monitoring

Monitor:

Request latency.
Success and error status codes.
Throttling.
Ingress and egress.
Transaction count.
Capacity.
Uncommitted block accumulation.
Append failures and blob rotation.
Page write patterns.
Authentication failures.

Application metrics should track upload completion, checksum failures, retry count, and orphan cleanup.

Common Mistakes

Common mistakes include:

Using append blobs for ordinary files.
Using page blobs for application documents.
Assuming append blobs provide message-queue semantics.
Creating random block IDs that cannot support resume.
Retrying append without duplicate protection.
Ignoring ETags during concurrent replacement.
Using excessive parallel upload concurrency.
Failing to commit or clean uncommitted blocks.
Applying block-blob access tier assumptions to append or page blobs.
Editing mutable content in place when immutable versions are safer.
Manipulating page blobs instead of using managed disks.

Best-Practice Selection Checklist

A production design should normally:

Choose block blobs by default.
Use append blobs only for controlled append-only workloads.
Use page blobs only for genuine random-write or VHD scenarios.
Define idempotent names and conditional writes.
Use staged blocks for large resumable uploads.
Tune block size and concurrency using realistic tests.
Validate checksums and content.
Rotate append blobs before limits.
Use messaging services for delivery semantics.
Apply ETags or leases where writers compete.
Monitor retries, throttling, and incomplete uploads.
Secure every type with identity and network controls.
