CHAR vs VARCHAR vs NVARCHAR and Unicode storage choices Interview Questions

Overview

CHAR, VARCHAR, and NVARCHAR are SQL Server string data types. They all store character data, but they make different trade-offs around fixed length versus variable length, Unicode support, storage size, collation behavior, indexing, and compatibility with multilingual data.

CHAR stores fixed-length non-Unicode or UTF-8 character data depending on collation. VARCHAR stores variable-length non-Unicode or UTF-8 character data depending on collation. NVARCHAR stores variable-length Unicode character data using UCS-2 or UTF-16 behavior depending on the collation. NCHAR is the fixed-length Unicode counterpart, but most practical comparisons focus on CHAR, VARCHAR, and NVARCHAR.

This topic matters because string column choices affect correctness, storage, index size, query performance, sorting, comparisons, API behavior, and internationalization. A bad choice can truncate data, corrupt names from other languages, waste storage, make indexes unnecessarily large, or create bugs when literals and parameters use the wrong type.

For interviews, strong candidates can explain the storage difference, know that length declarations are byte-oriented, understand when Unicode is required, and choose string types based on actual data shape rather than habit.

Core Concepts

Character Data Type Families

SQL Server has two major character data type families:

Non-Unicode or UTF-8-capable types: CHAR and VARCHAR
Unicode types: NCHAR and NVARCHAR

The older shorthand is that VARCHAR is non-Unicode and NVARCHAR is Unicode. That is still useful, but it needs nuance in modern SQL Server because CHAR and VARCHAR can store Unicode data when the database or column uses a UTF-8 enabled collation.

Practical rule:

Use VARCHAR for variable-length text when the allowed character set is well understood and compatible with the chosen collation.
Use NVARCHAR when the application must safely store multilingual text across languages and symbols, especially in systems that do not use UTF-8 collations.
Use CHAR only for truly fixed-size values.

CHAR

CHAR(n) stores fixed-size string data. The value is padded to the declared size.

Example:

Code

CREATE TABLE dbo.Country
(
    CountryCode CHAR(2) NOT NULL PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);

CHAR(2) is reasonable for a fixed-length country code such as US, CA, or VN.

Less ideal example:

Code

CREATE TABLE dbo.Customer
(
    FirstName CHAR(50) NOT NULL
);

Most names are not exactly 50 bytes. CHAR(50) wastes space and may introduce confusing trailing-space behavior. VARCHAR(50) or NVARCHAR(50) is usually better.

Use CHAR when:

Values are always the same length.
The column stores codes, flags, fixed-width hashes, or fixed-format identifiers.
You can tolerate padding semantics.

Avoid CHAR when:

Values vary widely in length.
User-entered text is stored.
Storage efficiency matters for many rows.
Trailing spaces might confuse application logic.

VARCHAR

VARCHAR(n) stores variable-length string data. It stores the actual bytes used by the value plus a small length overhead.

Example:

Code

CREATE TABLE dbo.Product
(
    ProductId BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Sku VARCHAR(40) NOT NULL,
    DisplayName NVARCHAR(200) NOT NULL
);

VARCHAR(40) can be a good fit for SKU values when the SKU format is ASCII or otherwise supported by the column collation. DisplayName is NVARCHAR because product names may include customer-facing multilingual text.

Use VARCHAR when:

Values vary in length.
The character set is limited or controlled.
The database uses a collation that supports the needed characters.
Storage and index width matter.
You are storing technical identifiers, slugs, URLs, email addresses, or codes with predictable character sets.

Avoid using VARCHAR blindly for names, addresses, comments, and content that may need international characters unless you have intentionally selected UTF-8 collations and tested the application path.

NVARCHAR

NVARCHAR(n) stores variable-length Unicode string data. In SQL Server, the n value is expressed in byte-pairs, not a guaranteed number of user-perceived characters.

Example:

Code

CREATE TABLE dbo.CustomerProfile
(
    CustomerId BIGINT NOT NULL PRIMARY KEY,
    FullName NVARCHAR(200) NOT NULL,
    City NVARCHAR(100) NULL,
    Bio NVARCHAR(1000) NULL
);

NVARCHAR is a strong default for user-facing text in international applications. It reduces the chance that characters are lost because the database code page cannot represent them.

Use NVARCHAR when:

User-entered text can include multiple languages.
Names, addresses, product titles, comments, or free-form content need Unicode.
The system integrates with modern APIs that use Unicode strings.
The application has no strict single-code-page assumption.

Trade-off: NVARCHAR often uses more storage than VARCHAR for simple Latin text. Wider strings also make indexes larger, reduce rows per page, increase memory use, and can slow comparisons.

The Length Parameter Is Not Always Characters

One of the most common interview traps is assuming VARCHAR(50) always means 50 characters and NVARCHAR(50) always means 50 characters.

In SQL Server:

For CHAR(n) and VARCHAR(n), n is the string length in bytes.
For NCHAR(n) and NVARCHAR(n), n is the string length in byte-pairs.
With UTF-8 or supplementary Unicode characters, one visible character can require multiple bytes or byte-pairs.

Example:

Code

CREATE TABLE dbo.Messages
(
    ShortCode VARCHAR(10) NOT NULL,
    DisplayText NVARCHAR(10) NOT NULL
);

This is not a guarantee that every possible 10-character human string can fit. Some Unicode characters can require more storage units than a simple Latin letter.

Interview answer: declare lengths based on the real maximum stored value, understand whether the length is bytes or byte-pairs, and test with representative characters, not only ASCII test data.

Unicode Literals And The N Prefix

Unicode string literals should be prefixed with N.

Correct:

Code

SELECT *
FROM dbo.CustomerProfile
WHERE FullName = N'Nguyen Van A';

The N prefix tells SQL Server to treat the literal as Unicode input. Without it, the literal may be converted through the database default code page before it is compared or stored. Characters unsupported by that code page can be lost or changed before SQL Server ever compares them to the NVARCHAR column.

Common mistake:

Code

INSERT INTO dbo.CustomerProfile (CustomerId, FullName)
VALUES (1, 'Pham Thi Hoa');

For simple ASCII text this may appear to work, which is why the bug hides. It becomes a real issue when names, symbols, accents, or non-Latin characters appear.

Collation And Unicode Storage

Collation controls sorting, comparison, case sensitivity, accent sensitivity, and code page behavior for character data.

Examples of collation-sensitive behavior:

Whether abc equals ABC.
Whether accented characters compare as equal to unaccented forms.
How strings sort.
Which code page is used for CHAR and VARCHAR data when a UTF-8 collation is not used.
Whether VARCHAR can store Unicode through UTF-8.

Example with a column-level collation:

Code

CREATE TABLE dbo.SearchTerm
(
    TermId BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Term VARCHAR(200) COLLATE Latin1_General_100_CI_AI_SC_UTF8 NOT NULL
);

This uses a UTF-8 capable collation for a VARCHAR column. That can reduce storage for mostly Latin text while still supporting Unicode. The trade-off is that every part of the system must understand and consistently use the chosen collation behavior.

VARCHAR With UTF-8 Vs NVARCHAR

Modern SQL Server supports UTF-8 collations for CHAR and VARCHAR. This gives teams another Unicode storage option.

VARCHAR with UTF-8 can be attractive when:

Most text is Latin-based.
Unicode support is needed.
Storage size matters.
The system standardizes on UTF-8 collations.
Application and integration paths are tested with UTF-8 data.

NVARCHAR remains attractive when:

The system already uses Unicode NVARCHAR widely.
You want compatibility with existing SQL Server conventions.
Data contains many characters that may not be smaller under UTF-8.
You want to avoid mixing Unicode strategies.
Third-party tools and procedures expect NVARCHAR.

A good interview answer does not say "NVARCHAR is always better" or "VARCHAR is always faster." It explains the data, collation, storage, and compatibility trade-offs.

Storage And Index Impact

String data type choice affects index size.

Example:

Code

CREATE TABLE dbo.Users
(
    UserId BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Email VARCHAR(320) NOT NULL,
    DisplayName NVARCHAR(200) NOT NULL
);

CREATE UNIQUE INDEX UX_Users_Email
ON dbo.Users (Email);

Email addresses are typically constrained to a predictable character set in many systems, so VARCHAR may be appropriate. DisplayName is user-facing and may need Unicode.

Larger index keys can cause:

More storage.
More memory use.
More page splits.
Lower cache efficiency.
Slower sorts and joins.
Lower maximum index key flexibility.

Best practice: use the narrowest type that safely represents the required data. Narrow does not mean unsafe. A compact column that corrupts names is not a good design.

VARCHAR(MAX) And NVARCHAR(MAX)

VARCHAR(MAX) and NVARCHAR(MAX) are for large values, not for avoiding design decisions.

Use MAX types for:

Long descriptions.
Documents or message bodies.
Large JSON text.
Free-form content that can exceed normal row-size expectations.

Avoid MAX types for:

Names.
Codes.
Email addresses.
Titles with known limits.
Indexed search keys.

Example:

Code

CREATE TABLE dbo.Article
(
    ArticleId BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Title NVARCHAR(250) NOT NULL,
    Body NVARCHAR(MAX) NOT NULL
);

This is better than making every string column NVARCHAR(MAX). The title has a real limit. The body is long-form content.

Parameters And Implicit Conversions

Application parameters should match column types.

Problem pattern:

Code

-- Column is VARCHAR, parameter is NVARCHAR
SELECT *
FROM dbo.Users
WHERE Email = @Email;

If @Email is sent as NVARCHAR while Email is VARCHAR, SQL Server may need implicit conversion. Depending on data type precedence and query shape, implicit conversions can make indexes less useful or produce unexpected comparison behavior.

Better:

Match parameter types to column types.
Use NVARCHAR parameters for NVARCHAR columns.
Use VARCHAR parameters for VARCHAR columns when Unicode is not needed.
Avoid wrapping indexed columns in conversion functions in predicates.

Common Mistakes

Common mistakes include:

Using VARCHAR for multilingual names and addresses without a UTF-8 collation.
Forgetting the N prefix for Unicode literals.
Assuming VARCHAR(50) means 50 characters under every collation.
Making every string column NVARCHAR(MAX).
Using CHAR for variable-length user-entered text.
Mixing VARCHAR columns with NVARCHAR parameters and ignoring implicit conversions.
Choosing string types without considering indexes.
Treating collation as an afterthought.
Storing structured data in strings instead of proper typed columns.

Best Practices

Best practices:

Use VARCHAR for variable-length controlled character data.
Use NVARCHAR for user-facing multilingual text unless UTF-8 VARCHAR is an intentional standard.
Use CHAR only for truly fixed-length values.
Always specify explicit lengths.
Prefix Unicode literals with N.
Avoid MAX unless values can genuinely be large.
Match parameter types to column types.
Choose collations intentionally and document the decision.
Test with real multilingual examples, not only ASCII data.
Consider storage and index width before choosing broad types.

CHAR vs VARCHAR vs NVARCHAR and Unicode storage choices

Overview

Core Concepts

Character Data Type Families

CHAR

VARCHAR

NVARCHAR

The Length Parameter Is Not Always Characters

Unicode Literals And The N Prefix

Collation And Unicode Storage

VARCHAR With UTF-8 Vs NVARCHAR

Storage And Index Impact

VARCHAR(MAX) And NVARCHAR(MAX)

Parameters And Implicit Conversions

Common Mistakes

Best Practices

Interview Practice

Beginner Interview Practice

Intermediate Interview Practice

Advanced Interview Practice