Overview
CHAR, VARCHAR, and NVARCHAR are SQL Server string data types. They all store character data, but they make different trade-offs around fixed length versus variable length, Unicode support, storage size, collation behavior, indexing, and compatibility with multilingual data.
CHAR stores fixed-length non-Unicode or UTF-8 character data depending on collation. VARCHAR stores variable-length non-Unicode or UTF-8 character data depending on collation. NVARCHAR stores variable-length Unicode character data using UCS-2 or UTF-16 behavior depending on the collation. NCHAR is the fixed-length Unicode counterpart, but most practical comparisons focus on CHAR, VARCHAR, and NVARCHAR.
This topic matters because string column choices affect correctness, storage, index size, query performance, sorting, comparisons, API behavior, and internationalization. A bad choice can truncate data, corrupt names from other languages, waste storage, make indexes unnecessarily large, or create bugs when literals and parameters use the wrong type.
For interviews, strong candidates can explain the storage difference, know that length declarations are byte-oriented, understand when Unicode is required, and choose string types based on actual data shape rather than habit.
Core Concepts
Character Data Type Families
SQL Server has two major character data type families:
- Non-Unicode or UTF-8-capable types:
CHARandVARCHAR - Unicode types:
NCHARandNVARCHAR
The older shorthand is that VARCHAR is non-Unicode and NVARCHAR is Unicode. That is still useful, but it needs nuance in modern SQL Server because CHAR and VARCHAR can store Unicode data when the database or column uses a UTF-8 enabled collation.
Practical rule:
- Use
VARCHARfor variable-length text when the allowed character set is well understood and compatible with the chosen collation. - Use
NVARCHARwhen the application must safely store multilingual text across languages and symbols, especially in systems that do not use UTF-8 collations. - Use
CHARonly for truly fixed-size values.
CHAR
CHAR(n) stores fixed-size string data. The value is padded to the declared size.
Example:
CREATE TABLE dbo.Country
(
CountryCode CHAR(2) NOT NULL PRIMARY KEY,
Name NVARCHAR(100) NOT NULL
);
CHAR(2) is reasonable for a fixed-length country code such as US, CA, or VN.
Less ideal example:
CREATE TABLE dbo.Customer
(
FirstName CHAR(50) NOT NULL
);
Most names are not exactly 50 bytes. CHAR(50) wastes space and may introduce confusing trailing-space behavior. VARCHAR(50) or NVARCHAR(50) is usually better.
Use CHAR when:
- Values are always the same length.
- The column stores codes, flags, fixed-width hashes, or fixed-format identifiers.
- You can tolerate padding semantics.
Avoid CHAR when:
- Values vary widely in length.
- User-entered text is stored.
- Storage efficiency matters for many rows.
- Trailing spaces might confuse application logic.
VARCHAR
VARCHAR(n) stores variable-length string data. It stores the actual bytes used by the value plus a small length overhead.
Example:
CREATE TABLE dbo.Product
(
ProductId BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
Sku VARCHAR(40) NOT NULL,
DisplayName NVARCHAR(200) NOT NULL
);
VARCHAR(40) can be a good fit for SKU values when the SKU format is ASCII or otherwise supported by the column collation. DisplayName is NVARCHAR because product names may include customer-facing multilingual text.
Use VARCHAR when:
- Values vary in length.
- The character set is limited or controlled.
- The database uses a collation that supports the needed characters.
- Storage and index width matter.
- You are storing technical identifiers, slugs, URLs, email addresses, or codes with predictable character sets.
Avoid using VARCHAR blindly for names, addresses, comments, and content that may need international characters unless you have intentionally selected UTF-8 collations and tested the application path.
NVARCHAR
NVARCHAR(n) stores variable-length Unicode string data. In SQL Server, the n value is expressed in byte-pairs, not a guaranteed number of user-perceived characters.
Example:
CREATE TABLE dbo.CustomerProfile
(
CustomerId BIGINT NOT NULL PRIMARY KEY,
FullName NVARCHAR(200) NOT NULL,
City NVARCHAR(100) NULL,
Bio NVARCHAR(1000) NULL
);
NVARCHAR is a strong default for user-facing text in international applications. It reduces the chance that characters are lost because the database code page cannot represent them.
Use NVARCHAR when:
- User-entered text can include multiple languages.
- Names, addresses, product titles, comments, or free-form content need Unicode.
- The system integrates with modern APIs that use Unicode strings.
- The application has no strict single-code-page assumption.
Trade-off: NVARCHAR often uses more storage than VARCHAR for simple Latin text. Wider strings also make indexes larger, reduce rows per page, increase memory use, and can slow comparisons.
The Length Parameter Is Not Always Characters
One of the most common interview traps is assuming VARCHAR(50) always means 50 characters and NVARCHAR(50) always means 50 characters.
In SQL Server:
- For
CHAR(n)andVARCHAR(n),nis the string length in bytes. - For
NCHAR(n)andNVARCHAR(n),nis the string length in byte-pairs. - With UTF-8 or supplementary Unicode characters, one visible character can require multiple bytes or byte-pairs.
Example:
CREATE TABLE dbo.Messages
(
ShortCode VARCHAR(10) NOT NULL,
DisplayText NVARCHAR(10) NOT NULL
);
This is not a guarantee that every possible 10-character human string can fit. Some Unicode characters can require more storage units than a simple Latin letter.
Interview answer: declare lengths based on the real maximum stored value, understand whether the length is bytes or byte-pairs, and test with representative characters, not only ASCII test data.
Unicode Literals And The N Prefix
Unicode string literals should be prefixed with N.
Correct:
SELECT *
FROM dbo.CustomerProfile
WHERE FullName = N'Nguyen Van A';
The N prefix tells SQL Server to treat the literal as Unicode input. Without it, the literal may be converted through the database default code page before it is compared or stored. Characters unsupported by that code page can be lost or changed before SQL Server ever compares them to the NVARCHAR column.
Common mistake:
INSERT INTO dbo.CustomerProfile (CustomerId, FullName)
VALUES (1, 'Pham Thi Hoa');
For simple ASCII text this may appear to work, which is why the bug hides. It becomes a real issue when names, symbols, accents, or non-Latin characters appear.
Collation And Unicode Storage
Collation controls sorting, comparison, case sensitivity, accent sensitivity, and code page behavior for character data.
Examples of collation-sensitive behavior:
- Whether
abcequalsABC. - Whether accented characters compare as equal to unaccented forms.
- How strings sort.
- Which code page is used for
CHARandVARCHARdata when a UTF-8 collation is not used. - Whether
VARCHARcan store Unicode through UTF-8.
Example with a column-level collation:
CREATE TABLE dbo.SearchTerm
(
TermId BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
Term VARCHAR(200) COLLATE Latin1_General_100_CI_AI_SC_UTF8 NOT NULL
);
This uses a UTF-8 capable collation for a VARCHAR column. That can reduce storage for mostly Latin text while still supporting Unicode. The trade-off is that every part of the system must understand and consistently use the chosen collation behavior.
VARCHAR With UTF-8 Vs NVARCHAR
Modern SQL Server supports UTF-8 collations for CHAR and VARCHAR. This gives teams another Unicode storage option.
VARCHAR with UTF-8 can be attractive when:
- Most text is Latin-based.
- Unicode support is needed.
- Storage size matters.
- The system standardizes on UTF-8 collations.
- Application and integration paths are tested with UTF-8 data.
NVARCHAR remains attractive when:
- The system already uses Unicode
NVARCHARwidely. - You want compatibility with existing SQL Server conventions.
- Data contains many characters that may not be smaller under UTF-8.
- You want to avoid mixing Unicode strategies.
- Third-party tools and procedures expect
NVARCHAR.
A good interview answer does not say "NVARCHAR is always better" or "VARCHAR is always faster." It explains the data, collation, storage, and compatibility trade-offs.
Storage And Index Impact
String data type choice affects index size.
Example:
CREATE TABLE dbo.Users
(
UserId BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
Email VARCHAR(320) NOT NULL,
DisplayName NVARCHAR(200) NOT NULL
);
CREATE UNIQUE INDEX UX_Users_Email
ON dbo.Users (Email);
Email addresses are typically constrained to a predictable character set in many systems, so VARCHAR may be appropriate. DisplayName is user-facing and may need Unicode.
Larger index keys can cause:
- More storage.
- More memory use.
- More page splits.
- Lower cache efficiency.
- Slower sorts and joins.
- Lower maximum index key flexibility.
Best practice: use the narrowest type that safely represents the required data. Narrow does not mean unsafe. A compact column that corrupts names is not a good design.
VARCHAR(MAX) And NVARCHAR(MAX)
VARCHAR(MAX) and NVARCHAR(MAX) are for large values, not for avoiding design decisions.
Use MAX types for:
- Long descriptions.
- Documents or message bodies.
- Large JSON text.
- Free-form content that can exceed normal row-size expectations.
Avoid MAX types for:
- Names.
- Codes.
- Email addresses.
- Titles with known limits.
- Indexed search keys.
Example:
CREATE TABLE dbo.Article
(
ArticleId BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
Title NVARCHAR(250) NOT NULL,
Body NVARCHAR(MAX) NOT NULL
);
This is better than making every string column NVARCHAR(MAX). The title has a real limit. The body is long-form content.
Parameters And Implicit Conversions
Application parameters should match column types.
Problem pattern:
-- Column is VARCHAR, parameter is NVARCHAR
SELECT *
FROM dbo.Users
WHERE Email = @Email;
If @Email is sent as NVARCHAR while Email is VARCHAR, SQL Server may need implicit conversion. Depending on data type precedence and query shape, implicit conversions can make indexes less useful or produce unexpected comparison behavior.
Better:
- Match parameter types to column types.
- Use
NVARCHARparameters forNVARCHARcolumns. - Use
VARCHARparameters forVARCHARcolumns when Unicode is not needed. - Avoid wrapping indexed columns in conversion functions in predicates.
Common Mistakes
Common mistakes include:
- Using
VARCHARfor multilingual names and addresses without a UTF-8 collation. - Forgetting the
Nprefix for Unicode literals. - Assuming
VARCHAR(50)means 50 characters under every collation. - Making every string column
NVARCHAR(MAX). - Using
CHARfor variable-length user-entered text. - Mixing
VARCHARcolumns withNVARCHARparameters and ignoring implicit conversions. - Choosing string types without considering indexes.
- Treating collation as an afterthought.
- Storing structured data in strings instead of proper typed columns.
Best Practices
Best practices:
- Use
VARCHARfor variable-length controlled character data. - Use
NVARCHARfor user-facing multilingual text unless UTF-8VARCHARis an intentional standard. - Use
CHARonly for truly fixed-length values. - Always specify explicit lengths.
- Prefix Unicode literals with
N. - Avoid
MAXunless values can genuinely be large. - Match parameter types to column types.
- Choose collations intentionally and document the decision.
- Test with real multilingual examples, not only ASCII data.
- Consider storage and index width before choosing broad types.