Azure Blob Storage and file handling

Large file uploads, block upload, retry behavior, checksums, and resumable upload patterns

Overview

Large file upload design is a distributed workflow, not merely a call to UploadAsync. Networks fail, browsers close, processes restart, requests time out after the server may have accepted data, and several clients can target the same object.

Azure block blobs support reliable large uploads by dividing a file into blocks:

  1. Generate a stable upload ID and target blob name.
  2. Split the content into numbered blocks.
  3. Stage blocks independently, often in parallel.
  4. Retry only failed blocks.
  5. Persist upload progress outside process memory.
  6. Commit an ordered block list.
  7. Verify final length and checksum.
  8. Mark the business record complete.

Until the block list is committed, staged blocks do not form the visible blob. This provides a useful atomic publication boundary.
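The publication boundary can be illustrated without a storage account. The sketch below is a minimal in-memory model, not the SDK: InMemoryBlockBlob and BlockUploader are hypothetical names that stand in for the stage and commit operations of a block blob client, and the only behavior modeled is that staged data stays invisible until an ordered block list is committed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical in-memory stand-in for a block blob (not an SDK type).
public sealed class InMemoryBlockBlob
{
    private readonly Dictionary<string, byte[]> _staged = new();

    public bool IsVisible { get; private set; }
    public byte[] Committed { get; private set; } = Array.Empty<byte>();

    // Staging under an existing ID replaces the earlier uncommitted block.
    public void Stage(string blockId, byte[] content) => _staged[blockId] = content;

    public void Commit(IReadOnlyList<string> orderedBlockIds)
    {
        // Fail before changing anything if a referenced block is missing.
        foreach (var id in orderedBlockIds)
            if (!_staged.ContainsKey(id))
                throw new InvalidOperationException($"Block {id} was never staged.");

        Committed = orderedBlockIds.SelectMany(id => _staged[id]).ToArray();
        IsVisible = true;
        _staged.Clear();
    }
}

public static class BlockUploader
{
    public static string CreateBlockId(int index) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(index.ToString("D8")));

    // Splits the source into fixed-size blocks, stages each, then commits once.
    public static void Upload(Stream source, int blockSize, InMemoryBlockBlob blob)
    {
        var blockIds = new List<string>();
        var buffer = new byte[blockSize];
        int read;
        while ((read = ReadBlock(source, buffer)) > 0)
        {
            var id = CreateBlockId(blockIds.Count);
            blob.Stage(id, buffer.AsSpan(0, read).ToArray());
            blockIds.Add(id);
        }
        blob.Commit(blockIds);
    }

    // Stream.Read may return fewer bytes than requested, so fill in a loop.
    private static int ReadBlock(Stream source, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = source.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

In this model a failed upload leaves IsVisible false, which mirrors why a reader of the real blob never observes a half-written file.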

For interviews, candidates should distinguish:

  • SDK request retries from resumability across application restarts.
  • Transfer checksums from a whole-file business checksum.
  • A timeout from a known failed write.
  • Parallelism from uncontrolled concurrency.
  • Upload completion from content validation and publication.
  • Server-proxied upload from direct-to-Blob browser upload.

Core Concepts

Why Large Uploads Fail Differently

Large uploads are exposed to:

  • Wi-Fi and mobile-network changes.
  • Reverse-proxy request limits.
  • Application restarts.
  • Browser tab closure.
  • Authentication token expiry.
  • Storage throttling.
  • Client memory pressure.
  • Long request timeouts.
  • Duplicate retries.
  • Concurrent upload attempts.

One long HTTP request forces the entire transfer to restart after a failure. Block upload limits the retry scope to one block.

Single-Request Versus Block Upload

The .NET SDK can upload a small block blob with one Put Blob request. For larger data it can stage blocks and commit them automatically according to StorageTransferOptions.

Use high-level UploadAsync when:

  • Resume is needed only within the current SDK operation.
  • Default or tuned SDK partitioning is sufficient.
  • The process is expected to remain alive.
  • The application does not need to persist per-block state.

Use explicit StageBlockAsync and CommitBlockListAsync when:

  • Uploads must resume after process or browser restart.
  • The application must display durable progress.
  • Blocks are uploaded by several workers or clients.
  • The server must validate an expected block manifest.
  • A business workflow needs explicit staged and committed states.

Block Blob Model

A block blob can contain up to 50,000 committed blocks. Current service versions allow blocks of up to 4,000 MiB, which gives a theoretical maximum blob size of about 190.7 TiB (50,000 × 4,000 MiB).

Practical limits usually come from:

  • Available memory.
  • Network bandwidth.
  • Client timeout.
  • Transaction cost.
  • Account throughput.
  • User experience.
  • Downstream scanning and processing.

Do not choose block size solely from the service maximum.

Deterministic Block IDs

Block IDs are Base64-encoded strings. Within a blob, every block ID must have the same length, and the value before encoding can be at most 64 bytes.

A deterministic scheme:

Code
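// Requires: using System; using System.Text;
// Zero-padding to 8 digits keeps every encoded ID the same length.
static string CreateBlockId(int index) =>
    Convert.ToBase64String(
        Encoding.UTF8.GetBytes(index.ToString("D8")));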

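Before Base64 encoding, the stable values are:

Code
00000000
00000001
00000002

The encoded IDs sent to the service are MDAwMDAwMDA=, MDAwMDAwMDE=, and MDAwMDAwMDI=.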

Deterministic IDs allow a restarted client to identify which blocks are already staged. Random GUID block IDs make reconciliation harder unless every ID is persisted.
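Because the scheme is deterministic, a restarted client can rebuild the whole ordered manifest from the block count alone and map any ID back to its index. The helper names below (BlockIds, Parse, Manifest) are illustrative, not SDK members; the encoding is the same zero-padded scheme shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class BlockIds
{
    // Zero-padded index, then Base64, so every ID has the same length.
    public static string Create(int index) =>
        Convert.ToBase64String(
            Encoding.UTF8.GetBytes(index.ToString("D8", CultureInfo.InvariantCulture)));

    // Recovers the block index from an ID produced by Create.
    public static int Parse(string blockId) =>
        int.Parse(
            Encoding.UTF8.GetString(Convert.FromBase64String(blockId)),
            CultureInfo.InvariantCulture);

    // Rebuilds the full ordered manifest from the block count alone.
    public static IReadOnlyList<string> Manifest(int blockCount) =>
        Enumerable.Range(0, blockCount).Select(i => Create(i)).ToList();
}
```

Nothing per-block has to be persisted for this to work; the session only needs the block size and expected block count.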

Upload Session Record

Persist an upload session in a durable database:

Code
UploadId
TenantId
BlobName
ExpectedLength
BlockSize
ExpectedBlockCount
WholeFileChecksum
ChecksumAlgorithm
Status
CreatedAt
ExpiresAt

Optional per-block state can include:

Code
UploadId
BlockIndex
BlockId
Length
Checksum
Status
AttemptCount

The record prevents the upload from depending on one application process.
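One possible shape for the session record in C# is sketched below. The type name, the field types, and the choice to derive ExpectedBlockCount instead of storing it are assumptions made for illustration; Status is kept as a string for brevity.

```csharp
using System;

// Illustrative session record; field names follow the list above.
public sealed record UploadSession(
    Guid UploadId,
    string TenantId,
    string BlobName,
    long ExpectedLength,
    int BlockSize,
    string WholeFileChecksum,
    string ChecksumAlgorithm,
    string Status,
    DateTimeOffset CreatedAt,
    DateTimeOffset ExpiresAt)
{
    // Derived here so it cannot drift from the length and block size.
    public int ExpectedBlockCount =>
        (int)((ExpectedLength + BlockSize - 1) / BlockSize);

    // A session at or past its expiry should not accept new blocks.
    public bool IsExpired(DateTimeOffset now) => now >= ExpiresAt;
}
```

If the block count is stored as a column instead, it is worth validating it against the length and block size when the session is created.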

Server-Controlled Blob Names

Generate blob names on the trusted server:

Code
quarantine/{tenant-id}/{upload-id}

Do not let a browser choose an arbitrary final path. Server-controlled names prevent:

  • Tenant path confusion.
  • Overwriting another user's file.
  • Unsafe filenames.
  • Guessable business identifiers.
  • Mutable display names becoming storage identity.

Store the original filename separately after validation.
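A small server-side helper can make the naming rule hard to bypass. This is a sketch under stated assumptions: the tenant ID format (lowercase letters, digits, and hyphens, at most 64 characters) is invented for the example, and BlobNames.ForQuarantine is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlobNames
{
    // Assumed tenant ID format; \A and \z anchor the whole string.
    private static readonly Regex TenantIdPattern =
        new Regex(@"\A[a-z0-9-]{1,64}\z", RegexOptions.CultureInvariant);

    public static string ForQuarantine(string tenantId, Guid uploadId)
    {
        // Reject anything that could introduce extra path segments.
        if (tenantId is null || !TenantIdPattern.IsMatch(tenantId))
            throw new ArgumentException("Unexpected tenant ID format.", nameof(tenantId));

        // "N" formats the GUID as 32 hex digits with no separators.
        return $"quarantine/{tenantId}/{uploadId:N}";
    }
}
```

The user-supplied filename never appears in the path, so renaming a file in the business record does not change its storage identity.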

Upload State Machine

Useful states include:

Code
Pending
Uploading
Committed
Scanning
Available
Rejected
Expired
Failed

State transitions should be conditional. For example, only the expected upload owner can move Uploading to Committed.

Blob existence alone should not mean the file is safe to expose.
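Conditional transitions can be expressed as an explicit table plus an ownership check. The allowed transitions below are an assumed example, not a prescribed workflow, and UploadStateMachine is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public enum UploadState
{
    Pending, Uploading, Committed, Scanning, Available, Rejected, Expired, Failed
}

public static class UploadStateMachine
{
    // Assumed transition table; adjust it to the real business workflow.
    private static readonly Dictionary<UploadState, UploadState[]> Allowed = new()
    {
        [UploadState.Pending] = new[] { UploadState.Uploading, UploadState.Expired, UploadState.Failed },
        [UploadState.Uploading] = new[] { UploadState.Committed, UploadState.Expired, UploadState.Failed },
        [UploadState.Committed] = new[] { UploadState.Scanning, UploadState.Failed },
        [UploadState.Scanning] = new[] { UploadState.Available, UploadState.Rejected, UploadState.Failed },
    };

    public static bool CanTransition(UploadState from, UploadState to) =>
        Allowed.TryGetValue(from, out var targets) && Array.IndexOf(targets, to) >= 0;

    // Conditional transition: only the session owner may move the state.
    public static UploadState Transition(
        UploadState current, UploadState next, string callerId, string ownerId)
    {
        if (!string.Equals(callerId, ownerId, StringComparison.Ordinal))
            throw new UnauthorizedAccessException("Caller does not own this upload.");
        if (!CanTransition(current, next))
            throw new InvalidOperationException($"{current} -> {next} is not allowed.");
        return next;
    }
}
```

In a database-backed system the same rule is usually enforced by a conditional update, so that two racing callers cannot both apply a transition.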

Tuning StorageTransferOptions

The .NET SDK exposes:

  • InitialTransferSize
  • MaximumTransferSize
  • MaximumConcurrency

Example:

Code
var options = new BlobUploadOptions
{
    TransferOptions = new StorageTransferOptions
    {
        InitialTransferSize = 16 * 1024 * 1024,
        MaximumTransferSize = 8 * 1024 * 1024,
        MaximumConcurrency = 4
    }
};

await blobClient.UploadAsync(stream, options, cancellationToken);

These values are environment-specific, not universal recommendations.

Initial Transfer Size

For a seekable stream, a blob smaller than InitialTransferSize can be uploaded in one request. A larger stream is partitioned into subtransfers.

A larger initial request:

  • Reduces request overhead for medium files.
  • Costs more to retry.
  • Requires a stable connection for longer.

For unreliable networks, smaller requests can be more robust.

Maximum Transfer Size

MaximumTransferSize controls the maximum subtransfer size.

Larger blocks:

  • Reduce transaction count.
  • Improve throughput on fast networks.
  • Increase retry cost.
  • Increase buffering needs for non-seekable streams.

Smaller blocks:

  • Improve retry granularity.
  • Increase request count and block-list size.
  • Can reduce peak buffering.

Always ensure:

Code
ceil(file length / block size) <= 50,000
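
The block-count check is easy to automate. The sketch below uses integer ceiling division and, as one possible policy, doubles a preferred block size until the file fits in 50,000 blocks. BlockPlanner is a hypothetical helper, and it does not check the service's per-block size limit.

```csharp
using System;

public static class BlockPlanner
{
    public const int MaxBlocks = 50_000;

    // Integer form of ceil(fileLength / blockSize).
    public static long BlockCount(long fileLength, long blockSize)
    {
        if (fileLength < 0) throw new ArgumentOutOfRangeException(nameof(fileLength));
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        return (fileLength + blockSize - 1) / blockSize;
    }

    // Doubles the preferred block size until the file fits in MaxBlocks blocks.
    public static long ChooseBlockSize(long fileLength, long preferredBlockSize)
    {
        long size = preferredBlockSize;
        while (BlockCount(fileLength, size) > MaxBlocks)
            size *= 2;
        return size;
    }
}
```

For example, a 1 TiB file with 8 MiB blocks would need 131,072 blocks, so this policy settles on 32 MiB blocks (32,768 blocks).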

Maximum Concurrency

Concurrency allows several blocks to upload in parallel.

Benefits:

  • Better bandwidth utilization.
  • Lower total upload duration.

Risks:

  • Memory pressure.
  • Client connection exhaustion.
  • Storage throttling.
  • Unstable tail latency.
  • Mobile-device resource usage.

Increase concurrency only while measured throughput improves. Use asynchronous methods: synchronous SDK operations ignore MaximumConcurrency and transfer sequentially.

Seekable and Non-Seekable Streams

A seekable stream can move its position and supports easier retry. A non-seekable stream cannot replay bytes after they are consumed.

The SDK buffers individual subtransfers from non-seekable streams to make request retries possible. Large transfer sizes and high concurrency can therefore multiply memory use:

Code
Approximate buffering = block size × active workers

For large server uploads, prefer a seekable file stream or explicit staging rather than buffering the entire request body.
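The buffering estimate can be turned into a quick sizing check. This is a rough sketch that only models the product above; real memory use also includes SDK and runtime overhead. BufferEstimator is a hypothetical helper.

```csharp
using System;

public static class BufferEstimator
{
    // Approximate peak buffering: block size × active workers.
    public static long EstimateBytes(long blockSizeBytes, int activeWorkers) =>
        checked(blockSizeBytes * activeWorkers);

    // Largest worker count whose buffers fit a memory budget (at least 1).
    public static int MaxWorkersFor(long memoryBudgetBytes, long blockSizeBytes) =>
        (int)Math.Max(1, memoryBudgetBytes / blockSizeBytes);
}
```

With 8 MiB blocks and 4 workers the estimate is 32 MiB per upload, which multiplies again by the number of uploads a server handles at once.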

SDK Retry Behavior

Azure SDK clients use bounded retry policies for transient errors such as:

  • Network failures.
  • HTTP 408.
  • HTTP 429.
  • Selected 5xx responses.

Retries should use exponential backoff with jitter and respect server guidance such as Retry-After.

Avoid:

  • Infinite retries.
  • Retrying authorization failures.
  • Retrying checksum mismatch without rereading the source.
  • Immediate parallel retry storms.
  • Extending retries beyond the user or request deadline.

The SDK retries individual requests. It does not automatically preserve a manual upload workflow across process termination.
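The SDK already implements its own retry policy, so the following is only a sketch of the same ideas for code that retries at the workflow level, for example when restaging a block. The reading of "selected 5xx" as 500, 502, 503, and 504 is an assumption, and TransientRetry is a hypothetical helper.

```csharp
using System;

public static class TransientRetry
{
    // Assumed transient set: 408, 429, and a few 5xx codes.
    public static bool IsTransient(int statusCode) =>
        statusCode is 408 or 429 or 500 or 502 or 503 or 504;

    // Full-jitter exponential backoff: a random delay between zero and
    // min(maxDelay, baseDelay * 2^attempt). A server Retry-After value wins.
    public static TimeSpan NextDelay(
        int attempt, TimeSpan baseDelay, TimeSpan maxDelay,
        Random random, TimeSpan? retryAfter = null)
    {
        if (retryAfter.HasValue)
            return retryAfter.Value;

        double ceilingMs = Math.Min(
            maxDelay.TotalMilliseconds,
            baseDelay.TotalMilliseconds * Math.Pow(2, attempt));
        return TimeSpan.FromMilliseconds(random.NextDouble() * ceilingMs);
    }

    // Bounded retries: stop on non-transient errors, after maxAttempts,
    // or when the next delay would overrun the remaining deadline.
    public static bool ShouldRetry(
        int statusCode, int attempt, int maxAttempts,
        TimeSpan nextDelay, TimeSpan remaining) =>
        IsTransient(statusCode) && attempt < maxAttempts && nextDelay < remaining;
}
```

Authorization failures (401, 403) and checksum mismatches (400) fall outside the transient set, so they surface immediately instead of being retried.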

Request Retry Versus Workflow Resume

These are separate layers:

Request retry

  • Repeats one failed HTTP operation.
  • Usually handled by the SDK.
  • Lives within the current process.

Workflow resume

  • Continues a multi-block upload after process or browser restart.
  • Requires stable upload ID and block IDs.
  • Requires persisted state or listing uncommitted blocks.
  • Must refresh expired authorization.

A reliable system implements both.

Unknown Outcomes

A timeout does not prove that a write failed. The service might have accepted the block even though the response never reached the client.

After an uncertain result:

  1. Query staged blocks or blob properties.
  2. Compare the deterministic block ID, expected length, ETag, or checksum.
  3. Treat the operation as successful if the intended state exists.
  4. Retry only if it does not.

This is idempotent recovery.

Listing Uncommitted Blocks

A resumed block upload can request the uncommitted block list and compare it with the expected manifest.

The client should still verify:

  • Block ID.
  • Expected length.
  • Optional block checksum.
  • Upload ownership.

Do not commit every uncommitted block found under an attacker-controlled or reused blob name. Commit only the server-approved ordered manifest.
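The comparison itself is plain set logic. In the sketch below, the staged dictionary stands for whatever an uncommitted block listing returned (in the .NET SDK that would typically come from GetBlockListAsync), reduced to block ID and size. ExpectedBlock and Reconciler are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record ExpectedBlock(string BlockId, long Length);

public static class Reconciler
{
    // Returns manifest entries that are missing or have the wrong size.
    // Staged IDs that are not in the manifest are ignored: they are never
    // added to the list that gets committed.
    public static IReadOnlyList<ExpectedBlock> FindBlocksToStage(
        IReadOnlyList<ExpectedBlock> manifest,
        IReadOnlyDictionary<string, long> staged)
    {
        return manifest
            .Where(b => !staged.TryGetValue(b.BlockId, out var size) || size != b.Length)
            .ToList();
    }
}
```

The commit list is always built from the server-approved manifest, never from the listing, which is what keeps stray or hostile blocks out of the final blob.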

Atomic Commit

CommitBlockListAsync defines the visible blob content from the ordered IDs.

If a referenced block is missing, commit fails and the previous committed blob remains unchanged.

Use conditions:

Code
var commitOptions = new CommitBlockListOptions
{
    Conditions = new BlobRequestConditions
    {
        IfNoneMatch = ETag.All
    }
};

Create-only semantics prevent accidental overwrite. For intended replacement, use If-Match with the expected ETag.

Transfer Checksums

The .NET SDK supports upload transfer validation using:

  • Automatic algorithm selection.
  • MD5.
  • Storage CRC64.

Example:

Code
var options = new BlobUploadOptions
{
    TransferValidation = new UploadTransferValidationOptions
    {
        ChecksumAlgorithm = StorageChecksumAlgorithm.Auto
    }
};

await blobClient.UploadAsync(stream, options, cancellationToken);

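The SDK computes a checksum for each request payload and sends it with the request. The service recomputes the checksum over the bytes it received and rejects the request on a mismatch.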

Checksum Mismatch

A checksum mismatch indicates that the transmitted bytes do not match the supplied checksum. It normally produces a 400 response and is not treated as a transient failure by the default retry policy.

The application should:

  • Reopen or rewind the source.
  • Recalculate the checksum.
  • Investigate source mutation.
  • Avoid repeatedly resending corrupted buffered data.

Do not catch every failure and retry it as transient.

Whole-File Checksums

Per-request checksums validate blocks in transit. They do not necessarily prove that the final object is the exact business file expected by the application.

Calculate a whole-file digest such as SHA-256:

Code
static async Task<string> ComputeSha256Async(
    Stream stream,
    CancellationToken cancellationToken)
{
    using var sha256 = SHA256.Create();
    var hash = await sha256.ComputeHashAsync(stream, cancellationToken);
    return Convert.ToHexString(hash);
}

Store the expected digest in the authoritative upload record. After commit, verify the final object or trusted manifest.

SHA-256 is useful for identity and tamper detection. MD5 and CRC64 are primarily transport-integrity mechanisms in this context.
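When blocks are staged manually, the whole-file digest can be accumulated during the same read pass instead of hashing the file separately. The sketch below uses IncrementalHash from the base class library. BlockHashing and the onBlock callback are hypothetical; the callback stands for whatever stages the block.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class BlockHashing
{
    // Reads the stream block by block, feeds each block to a running SHA-256,
    // and hands the same bytes to onBlock. The buffer is reused, so onBlock
    // must consume or copy the data before it returns.
    public static string HashWhileReading(
        Stream source, int blockSize, Action<int, ReadOnlyMemory<byte>> onBlock)
    {
        using var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        var buffer = new byte[blockSize];
        int index = 0;

        while (true)
        {
            // Stream.Read may return short reads, so fill the block in a loop.
            int filled = 0;
            int n;
            while (filled < blockSize &&
                   (n = source.Read(buffer, filled, blockSize - filled)) > 0)
            {
                filled += n;
            }
            if (filled == 0) break;

            hash.AppendData(buffer, 0, filled);
            onBlock(index++, buffer.AsMemory(0, filled));
        }

        return Convert.ToHexString(hash.GetHashAndReset());
    }
}
```

The resulting hex string can be compared with the WholeFileChecksum stored in the upload session before the block list is committed.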

Source Mutation During Upload

If a local file changes during upload, blocks can come from different file states.

Mitigations:

  • Open the file with restrictive sharing.
  • Record size and modification time.
  • Compute a whole-file digest.
  • Upload from an immutable temporary copy.
  • Reject the commit if source state changed.

For browser File objects, treat the selected object as the upload source and calculate a digest if the workflow requires it.

Retry-Safe Block Staging

Staging the same bytes under the same block ID is idempotent. The later block replaces the earlier uncommitted block for that ID.

This makes deterministic block IDs useful. A retry does not add duplicate content when the final block list includes each ID once.

The commit request itself should also be retry-safe because the same ordered list produces the same blob content, subject to conditions and metadata.

Browser Resumability

For direct browser upload:

  1. API creates an upload session.
  2. API issues a short-lived user delegation SAS.
  3. Browser splits the File into chunks.
  4. Browser stages blocks and persists local progress.
  5. On restart, browser asks the API to resume.
  6. API reauthorizes and issues a fresh SAS.
  7. Browser compares expected and staged blocks.
  8. Browser stages missing blocks.
  9. API or browser commits the approved manifest.

Browser storage should not be the only source of truth. The server must know the upload owner, target, expected size, and expiry.

SAS Expiry During Upload

A long transfer can outlive its SAS.

Do not issue a day-long broad SAS. Instead:

  • Choose a short lifetime sized to the time the next batch of blocks is expected to take.
  • Allow the authenticated client to request a replacement.
  • Keep the same upload ID and blob name.
  • Recheck business authorization before renewal.

Already staged blocks remain available until the service's retention window for uncommitted blocks expires (about one week without further block activity on the blob), so a new valid token can continue the workflow.

Server Proxy Versus Direct Upload

Server-proxied upload

  • Simpler browser authorization.
  • Allows inline validation.
  • Consumes application bandwidth and compute.
  • Can hit request-size and timeout limits.

Direct-to-Blob upload

  • Removes large payloads from the API.
  • Scales storage transfer independently.
  • Requires SAS issuance, CORS, and completion validation.
  • Exposes the storage endpoint to the browser.

Use direct upload for large files when the security workflow is designed correctly.

CORS

Browser direct upload requires Blob service CORS configuration for:

  • Approved origins.
  • Required methods.
  • Required request headers.
  • Exposed response headers.
  • Preflight cache duration.

CORS is browser enforcement, not storage authorization. A non-browser client can ignore it. The SAS or identity still controls access.

Avoid wildcard origins for authenticated upload applications.

Quarantine and Publication

Upload untrusted data to a private quarantine location.

After commit:

  • Verify expected size.
  • Verify checksum.
  • Inspect content signature, not only extension.
  • Scan for malware.
  • Enforce file-type and decompression limits.
  • Extract metadata safely.
  • Move or copy to an approved location.
  • Mark the business record available.

A successful upload only proves that bytes reached storage.

Idempotent Completion

The completion endpoint should be safe to call repeatedly.

It can:

  • Load the upload session.
  • Return success if already completed.
  • Verify the committed blob matches the expected manifest.
  • Transition state using a concurrency token.
  • Publish one outbox event.

This avoids duplicate processing when a client retries after losing the completion response.
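The steps above can be sketched with an in-memory store. Everything here is a stand-in: CompletionService is hypothetical, the lock plays the role of a database conditional update with a concurrency token, and the Outbox list represents an outbox table.

```csharp
using System;
using System.Collections.Generic;

public sealed class CompletionService
{
    private sealed class Session
    {
        public bool Completed;
        public int Version;
    }

    private readonly Dictionary<Guid, Session> _sessions = new();

    // Stand-in for an outbox table.
    public List<string> Outbox { get; } = new();

    public void Create(Guid uploadId) => _sessions[uploadId] = new Session();

    // Returns true whether this call or an earlier one completed the upload.
    public bool Complete(Guid uploadId, Func<bool> committedBlobMatchesManifest)
    {
        if (!_sessions.TryGetValue(uploadId, out var session))
            throw new KeyNotFoundException("Unknown upload session.");

        lock (session)
        {
            // Repeat call: report success without new side effects.
            if (session.Completed) return true;

            // Do not complete unless the committed blob was verified.
            if (!committedBlobMatchesManifest()) return false;

            session.Completed = true;
            session.Version++;
            Outbox.Add($"upload-completed:{uploadId}");
            return true;
        }
    }
}
```

A client that lost the first response can call the endpoint again and receive the same answer, while downstream consumers still see exactly one event.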

Cancellation and Deadlines

Pass cancellation tokens to SDK calls. Stopping abandoned work reduces:

  • Wasted bandwidth.
  • Application resources.
  • User confusion.

Cancellation does not guarantee the service did not accept the last request. Reconcile state before continuing or deleting staged data.

Cleanup

Clean up:

  • Expired upload sessions.
  • Committed but rejected quarantine blobs.
  • Unreferenced completed blobs.
  • Temporary server files.

Uncommitted blocks expire automatically after the service retention window, but application records and any placeholder objects require explicit cleanup.

Use a grace period so an active slow upload is not deleted.

Observability

Track:

  • Upload size.
  • Duration.
  • Throughput.
  • Block count and size.
  • Retry count.
  • Failure status and error code.
  • Resume count.
  • Checksum mismatch.
  • Commit latency.
  • Scan duration.
  • Abandoned sessions.

Use a correlation ID across API, browser telemetry, storage operations, and processing workers. Never log SAS query strings.
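One simple way to keep SAS tokens out of logs is to strip the query string before a URI reaches any logger. The helper below is a minimal sketch; LogSanitizer is a hypothetical name and the placeholder text is arbitrary.

```csharp
using System;

public static class LogSanitizer
{
    // Keeps scheme, host, and path. The query string (where a SAS lives)
    // and any fragment are dropped.
    public static string RedactQuery(Uri uri)
    {
        if (uri is null) throw new ArgumentNullException(nameof(uri));
        if (!uri.IsAbsoluteUri)
            throw new ArgumentException("An absolute URI is required.", nameof(uri));

        string withoutQuery = uri.GetLeftPart(UriPartial.Path);
        return string.IsNullOrEmpty(uri.Query)
            ? withoutQuery
            : withoutQuery + "?<redacted>";
    }
}
```

The blob path stays visible for correlation, while the signature and other SAS parameters never reach log storage.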

Common Mistakes

Common mistakes include:

  • Uploading multi-gigabyte files through one API request.
  • Assuming SDK retry provides restart-safe resume.
  • Generating random block IDs without persisting them.
  • Retrying every error.
  • Ignoring unknown outcomes after timeouts.
  • Using too much concurrency.
  • Buffering whole files in application memory.
  • Trusting file extensions or browser content types.
  • Publishing immediately after commit.
  • Issuing a broad, long-lived container SAS.
  • Allowing unconditional overwrite.
  • Treating per-block checksums as a complete malware or authenticity check.

Best-Practice Upload Checklist

A production design should normally:

  • Use block blobs.
  • Generate upload and blob IDs on the server.
  • Persist upload state.
  • Use deterministic equal-length block IDs.
  • Tune block size and concurrency through realistic tests.
  • Use bounded transient retries with jitter.
  • Reconcile unknown outcomes.
  • Validate transfer and whole-file checksums.
  • Commit with create-only or ETag conditions.
  • Use short-lived, per-object user delegation SAS tokens for browser uploads.
  • Separate upload, validation, and publication.
  • Make completion idempotent.
  • Clean expired and rejected data.
  • Monitor end-to-end transfer and processing.
