DEV_NET_CORE
GET_STARTED
AzureAzure data, storage, and caching services

Azure SQL Database tiers, scaling, serverless options, and failover groups

Overview

Azure SQL Database is a fully managed relational database service based on the SQL Server engine. Microsoft operates the infrastructure, database engine patching, automated backups, and built-in high availability, while the customer remains responsible for schema design, query performance, security configuration, data access, capacity choices, and disaster-recovery planning.

Choosing an Azure SQL Database configuration involves several independent decisions:

  • Purchasing model: DTU or vCore.
  • Service tier: Basic, Standard, and Premium in the DTU model, or General Purpose, Business Critical, and Hyperscale in the vCore model.
  • Compute tier: Provisioned or serverless where supported.
  • Deployment model: Single database or elastic pool.
  • Availability design: Local high availability, zone redundancy, backups, geo-replication, and failover groups.
  • Scaling strategy: Query tuning, vertical compute scaling, storage scaling, read scale-out, elastic pooling, partitioning, or sharding.

There is no universally best tier. The correct choice depends on:

  • CPU, memory, data I/O, and transaction-log requirements.
  • Database size and growth.
  • Read/write ratio.
  • Latency sensitivity.
  • Usage predictability.
  • Recovery point objective, or RPO.
  • Recovery time objective, or RTO.
  • Region and zone resilience.
  • Feature requirements.
  • Cost constraints.

For interviews, candidates should be able to explain why a workload belongs in General Purpose, Business Critical, Hyperscale, serverless, or an elastic pool. They should also understand that changing compute can briefly interrupt connections, that failover groups use asynchronous geo-replication, and that application retry logic and end-to-end disaster-recovery testing remain necessary.

Core Concepts

Azure SQL Database as a PaaS Service

Azure SQL Database provides database-as-a-service capabilities. The service manages:

  • Database engine installation and patching.
  • Infrastructure replacement.
  • Automated backups.
  • Built-in high availability.
  • Monitoring integration.
  • Point-in-time restore capabilities.
  • Service-level resource governance.

The customer manages:

  • Tables, indexes, constraints, and stored procedures.
  • Query design and performance.
  • Authentication and authorization.
  • Network exposure.
  • Encryption and auditing configuration.
  • Retention and disaster-recovery requirements.
  • Application connection and retry behavior.

Azure SQL Database is not the same as running SQL Server on a virtual machine. It reduces infrastructure ownership but also limits operating-system access and some instance-level features. If an application requires full SQL Server instance control, unsupported extensions, or operating-system access, SQL Server on Azure Virtual Machines or Azure SQL Managed Instance might be a better fit.

Logical Servers

A logical server is a management boundary for Azure SQL databases. It provides:

  • A server-level DNS name.
  • Firewall and networking configuration.
  • Microsoft Entra administration.
  • Auditing and threat-protection settings.
  • A scope for databases, elastic pools, and failover groups.

It is not a customer-managed virtual machine or a traditional SQL Server instance. Databases on the same logical server do not automatically share compute unless they belong to the same elastic pool.

DTU Purchasing Model

A database transaction unit, or DTU, is a bundled measure of:

  • CPU.
  • Memory.
  • Data reads and writes.
  • Transaction-log throughput.

The DTU model offers:

  • Basic: Small and infrequently used workloads.
  • Standard: General workloads with moderate performance needs.
  • Premium: Higher I/O performance and features for demanding workloads.

DTUs simplify purchasing because the resource mix is packaged into a service objective. The trade-off is less transparency and less control over the individual resource dimensions.

DTU utilization is constrained by the busiest governed dimension. A database can appear CPU-light but still be throttled by data I/O or log-write limits.

Useful indicators include:

Code
SELECT
    end_time,
    avg_cpu_percent,
    avg_data_io_percent,
    avg_log_write_percent,
    max_worker_percent,
    max_session_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;

Scaling DTUs can address a genuine resource ceiling, but it should not replace query and index analysis.

vCore Purchasing Model

A virtual core, or vCore, represents a logical CPU. The vCore model exposes resource choices more directly:

  • Number of vCores.
  • Memory characteristics.
  • Hardware family where supported.
  • Compute tier.
  • Storage allocation and performance characteristics.
  • Service tier.

The vCore model is often preferred when teams need:

  • Easier mapping from an existing SQL Server workload.
  • Greater resource transparency.
  • Azure Hybrid Benefit.
  • Reserved capacity pricing.
  • Serverless compute.
  • Hyperscale.
  • More granular hardware and compute choices.

The vCore model does not eliminate performance testing. Two workloads with the same CPU requirement can have very different memory, log, or I/O behavior.

DTU Versus vCore

ConcernDTU modelvCore model
Resource expressionBundled DTU or eDTUvCores, memory, storage, hardware
SimplicityHigherMore explicit choices
ServerlessNot supportedSupported where available
HyperscaleNot supportedSupported
Azure Hybrid BenefitNot availableAvailable where eligible
Best fitSimple sizing and existing DTU workloadsTransparent sizing and advanced options

Do not choose solely by comparing a DTU count with a vCore count. Benchmark the real workload and measure CPU, memory pressure, data I/O, log rate, workers, sessions, latency, and storage growth.

General Purpose Tier

General Purpose is the default vCore tier for many business workloads. It separates compute from remote storage.

Typical characteristics include:

  • Balanced cost and performance.
  • Remote premium storage.
  • Built-in high availability.
  • Optional zone redundancy in supported configurations.
  • Provisioned and serverless compute options.
  • Storage latency suitable for many transactional applications.

General Purpose is a strong starting point when:

  • The workload does not require consistently very low storage latency.
  • Database size is within supported limits.
  • A built-in readable secondary is not a core requirement.
  • Cost efficiency matters more than maximum I/O performance.

A compute failover can start with a cold cache because the replacement compute node must rebuild memory state from storage. Applications should tolerate transient connection failures.

Business Critical Tier

Business Critical uses multiple database engine replicas with local SSD storage. It targets workloads that require:

  • Low and predictable I/O latency.
  • High transaction throughput.
  • Fast local high-availability failover.
  • A built-in read-only replica for read scale-out.
  • Features such as In-Memory OLTP where applicable.
  • Stronger resilience characteristics.

Business Critical costs more because it maintains multiple replicas and uses local high-performance storage. It is not justified simply because an application is "important." The workload should have measurable latency, throughput, availability, or feature requirements that General Purpose cannot meet cost-effectively.

Hyperscale Tier

Hyperscale uses a cloud-native architecture that separates compute, log processing, page serving, and storage. It is designed for:

  • Very large databases.
  • Rapid storage growth.
  • Fast backup and restore behavior independent of traditional full database copies.
  • Fast compute scaling.
  • Multiple read scale-out replicas.
  • Workloads that benefit from independently scalable compute and storage.

Current supported configurations can reach very large database sizes, including up to 128 TB for supported single-database scenarios. Exact limits depend on configuration and evolve, so architecture decisions should verify current regional and hardware constraints.

Hyperscale can provide:

  • High-availability replicas selected according to resilience needs.
  • Named replicas with independently sized compute for read workloads.
  • Geo-replicas for regional resilience.
  • Serverless compute in supported configurations.

Hyperscale is not only for enormous databases, but it adds architectural and cost considerations. Evaluate:

  • Replica cost.
  • Migration and reverse-migration constraints.
  • Feature compatibility.
  • Backup retention behavior.
  • Geo-replication design.
  • Workload sensitivity to its distributed storage architecture.

Provisioned Compute

Provisioned compute reserves a configured compute size continuously, independent of actual utilization. It is a good fit for:

  • Predictable workloads.
  • Sustained utilization.
  • Low tolerance for warm-up delay.
  • Stable performance requirements.
  • Elastic pools.

Provisioned compute is billed according to the capacity provisioned. Cost optimization usually involves right-sizing, reservations, Azure Hybrid Benefit where eligible, and scaling schedules for predictable environments.

Serverless Compute

Serverless automatically scales compute within configured minimum and maximum vCore limits. Billing reflects compute used per second, subject to the configured minimum, plus storage costs.

Serverless is well suited to:

  • Intermittent single-database workloads.
  • Unpredictable bursts.
  • Development or low-duty-cycle applications.
  • New applications without reliable sizing history.

Important configuration includes:

  • Minimum vCores.
  • Maximum vCores.
  • Auto-pause delay where supported.
  • Whether auto-pause is enabled.

Auto-pause and auto-resume are currently supported for serverless databases in General Purpose, not Hyperscale. Hyperscale serverless can autoscale compute but does not provide the same auto-pause behavior.

When a General Purpose serverless database is paused:

  • Compute billing stops.
  • Storage billing continues.
  • The next connection triggers resume.
  • The application can experience resume latency.
  • Memory caches must warm again.

Features such as active geo-replication, failover groups, and some retention or scheduling features prevent auto-pause even though autoscaling can still operate.

Open sessions are a common reason a database does not pause. Connection pools, health checks, monitoring, and administrative tools can generate enough activity to keep it online.

Serverless Trade-Offs

Serverless is not automatically cheaper. It can cost more than provisioned compute when:

  • Average utilization is consistently high.
  • The configured minimum is too large.
  • Background activity prevents pause.
  • Frequent cache reclamation increases query work.
  • Resume latency causes timeouts or poor user experience.

Before choosing serverless, model:

Code
Compute cost while active
+ minimum compute floor
+ storage
+ backup storage
+ expected active hours
+ operational impact of warm-up

Load testing should include first-request behavior after inactivity, not only steady-state throughput.

Elastic Pools

An elastic pool lets multiple databases share a configured pool of compute and storage resources. Each database can consume resources within per-database minimum and maximum limits.

Elastic pools are effective for:

  • Many tenant databases.
  • Low average utilization per database.
  • Usage spikes that occur at different times.
  • A need for predictable aggregate cost.

Pools are a poor fit when:

  • Most databases are busy at the same time.
  • One database has consistently high utilization.
  • Tenants require strict dedicated performance.
  • Aggregate log or I/O demand regularly reaches pool limits.

The main risk is the noisy-neighbor effect. Per-database maximums protect the pool, while minimums reserve capacity but can reduce consolidation efficiency.

Monitor both pool-level and database-level metrics. A database may be throttled by its own maximum even when the pool has capacity, or the pool may be saturated while individual databases appear moderate.

Vertical Scaling

Vertical scaling changes the service objective, tier, or compute size. It can increase or decrease:

  • CPU.
  • Memory.
  • I/O capacity.
  • Log throughput.
  • Worker and session limits.
  • Storage capabilities.

Azure performs most of the scale operation online, but the final switchover can terminate open connections and roll back uncommitted transactions. Applications need transient-fault retry logic.

A safe scaling plan includes:

  1. Identify the actual bottleneck.
  2. Validate the target tier supports required features.
  3. Check regional capacity and quotas.
  4. Avoid unnecessary long-running transactions.
  5. Scale during a controlled window when risk warrants it.
  6. Monitor connection failures and performance.
  7. Keep a rollback or scale-down plan.

Scaling is not a substitute for fixing scans, missing indexes, excessive chatty queries, or poor transaction design.

Application Retry Logic

Database connections can be interrupted by scaling, maintenance, failover, throttling, or transient network conditions. Retry logic should:

  • Retry only transient failures.
  • Use bounded exponential backoff with jitter.
  • Open a new connection for the retry.
  • Limit total attempts and elapsed time.
  • Preserve cancellation and request deadlines.
  • Avoid automatically replaying unsafe business operations.

For a transaction that fails after an uncertain commit outcome, the application needs idempotency or reconciliation. Blindly retrying a payment or order operation can create duplicates.

Storage Scaling

Storage decisions include:

  • Maximum configured data size.
  • Actual allocated storage.
  • Data and log growth.
  • tempdb limits.
  • Backup storage.
  • Long-term retention.

In General Purpose and Business Critical, billing and performance behavior can depend on configured maximum storage. In Hyperscale, storage grows automatically and is billed based on allocated data storage.

Monitor both used and allocated space. Deleting rows does not necessarily return allocated file space immediately, and shrinking files can be disruptive and should not be routine maintenance.

Read Scale-Out

Read-heavy workloads can be scaled by directing read-only queries to replicas:

  • Business Critical provides a readable secondary.
  • Hyperscale supports high-availability and named replicas.
  • Failover groups provide a read-only listener for the geo-secondary.

Read replicas are eventually consistent with the primary to varying degrees. Do not send operations that require immediate read-after-write consistency to a lagging geo-secondary.

Connection strings commonly signal read intent:

Code
ApplicationIntent=ReadOnly

The application should separate read-only and read-write data-access paths explicitly.

Scaling Beyond One Database

If one database cannot meet the workload cost-effectively, options include:

  • Partitioning large tables.
  • Archiving old data.
  • Caching repeated reads.
  • Read replicas.
  • Tenant-per-database architecture.
  • Sharding by tenant, geography, or another stable key.
  • Moving analytical workloads to a separate platform.

Sharding adds routing, cross-shard query, transaction, schema deployment, and rebalancing complexity. It should follow measured limits, not speculative scale concerns.

Built-In High Availability Versus Disaster Recovery

Built-in high availability protects against local infrastructure failures. Zone redundancy improves resilience to availability-zone failures. Neither automatically provides complete recovery from a regional outage.

Disaster recovery may require:

  • Active geo-replication or failover groups.
  • Application deployment in another region.
  • Replicated storage and messaging services.
  • Regional networking and DNS.
  • Secrets and configuration.
  • Tested operational procedures.

Backups protect against corruption, accidental deletion, and point-in-time recovery. They do not provide the same RTO as a warm geo-secondary.

Failover Groups

A failover group manages geo-replication and failover for one or more databases between logical servers in different Azure regions.

It provides stable DNS listener names:

  • Read-write listener: Routes to the current primary.
  • Read-only listener: Routes read workloads to the configured secondary.

After failover, Azure updates listener DNS records. Applications still need:

  • Connection retry logic.
  • Sensible DNS caching.
  • Region-ready application components.
  • Network access to both logical servers.
  • Credentials and identities that work in both regions.

Failover groups are a convenience over active geo-replication. They do not make the entire application multi-region.

Planned and Forced Failover

A planned failover synchronizes the databases before switching roles and is intended to avoid data loss when both sides are reachable.

A forced failover promotes the secondary without waiting for full synchronization. It is used when the primary is unavailable and can lose recent transactions because geo-replication is asynchronous.

This distinction connects directly to:

  • RPO: Maximum acceptable data loss.
  • RTO: Maximum acceptable recovery time.

If the business requires zero data loss during a regional disaster, asynchronous cross-region replication may not satisfy that requirement. The architecture or business process must address the gap explicitly.

Customer-Managed and Microsoft-Managed Failover Policies

Customer-managed failover is the recommended policy for most designs because the organization controls when to fail over after evaluating application health and dependencies.

Microsoft-managed failover applies to widespread regional events and can occur only after its configured grace period and service evaluation. It is not a fast per-application health mechanism. The application must tolerate an extended outage and potential data loss if this model is selected.

Disaster-recovery automation should make the decision process explicit:

  • What signals declare the primary region unavailable?
  • Who or what authorizes forced failover?
  • How is replication lag evaluated?
  • Which application components move with the database?
  • How is split-brain or accidental failback avoided?

Failover and Application Consistency

All databases in a failover group switch as a unit, but this does not create distributed transactions across databases. Applications using several databases should ensure that the group boundary aligns with the recovery unit.

After failover:

  • DNS caches may briefly point to the previous primary.
  • Existing connections fail and must reconnect.
  • Read-only workloads may have different routing.
  • Cross-region latency can increase if application components did not move.
  • Recent asynchronous writes can be absent after forced failover.

Failback is another migration event and must be planned and tested, not assumed to be automatic and risk-free.

Security and Networking

Secure Azure SQL Database design normally includes:

  • Microsoft Entra authentication where practical.
  • Managed identities for workloads.
  • Least-privilege database permissions.
  • Private endpoints for private access requirements.
  • Disabled or restricted public network access.
  • Encryption in transit.
  • Transparent Data Encryption.
  • Auditing and threat detection.
  • Key management aligned with compliance requirements.

A failover group requires equivalent networking and identity configuration in both regions. A secondary that is replicated correctly but unreachable by the application does not satisfy disaster-recovery objectives.

Monitoring and Capacity Management

Monitor:

  • CPU and memory pressure.
  • Data I/O and log-write utilization.
  • Worker and session limits.
  • Deadlocks and blocking.
  • Query duration and regressions.
  • Storage growth.
  • Serverless active and paused behavior.
  • Elastic pool utilization.
  • Geo-replication lag.
  • Failed connections and throttling.
  • Backup and restore tests.

Use Query Store, Query Performance Insight, Azure Monitor metrics, dynamic management views, alerts, and workload-level service indicators.

Do not scale from CPU alone. High latency might result from blocking, an inefficient plan, log throughput, or application round trips.

Common Mistakes

Common mistakes include:

  • Selecting Business Critical based only on business importance.
  • Assuming serverless always pauses or is always cheaper.
  • Leaving connection pools or monitoring sessions that prevent auto-pause.
  • Scaling without transient retry logic.
  • Treating a failover group as complete application disaster recovery.
  • Assuming forced failover cannot lose data.
  • Sending consistency-sensitive reads to a geo-secondary.
  • Putting uniformly busy databases into an elastic pool.
  • Scaling compute before investigating query and index problems.
  • Ignoring log-write, worker, session, or storage limits.
  • Testing only initial failover and never failback.
  • Configuring the secondary region without equivalent networking and identity.

Best-Practice Selection Process

A practical selection process is:

  1. Measure the workload over representative business cycles.
  2. Identify CPU, memory, I/O, log, storage, and concurrency requirements.
  3. Define latency, RPO, RTO, and availability targets.
  4. Start with the simplest tier that meets measurable requirements.
  5. Choose serverless only for suitable duty cycles and warm-up tolerance.
  6. Use elastic pools when database peaks are statistically independent.
  7. Add read replicas or Hyperscale for demonstrated scale requirements.
  8. Implement retries and idempotency before relying on online scaling.
  9. Design disaster recovery for the complete application.
  10. Test scaling, failover, failback, restore, and regional connectivity.

Interview Practice

PreviousAzure Cache for Redis for low-latency reads and session/state scenariosNext UpBlob Storage, access tiers, lifecycle management, and immutability