

Locales and Internationalization

Different countries and cultures have varying conventions for how to communicate. These conventions range from very simple ones, such as the format for representing dates and times, to very complex ones, such as the language spoken.

Internationalization of software means programming it to be able to adapt to the user's favorite conventions. In ANSI C, internationalization works by means of locales. Each locale specifies a collection of conventions, one convention for each purpose. The user chooses a set of conventions by specifying a locale (via environment variables).

All programs inherit the chosen locale as part of their environment. Provided the programs are written to obey the choice of locale, they will follow the conventions preferred by the user.

What Effects a Locale Has

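Each locale specifies conventions for several purposes, including the collating sequence for strings, the classification and conversion of characters (including multibyte and wide characters), the formatting of numbers and monetary amounts, and the formatting of dates and times.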

Some aspects of adapting to the specified locale are handled automatically by the library subroutines. For example, all your program needs to do in order to use the collating sequence of the chosen locale is to use strcoll or strxfrm to compare strings.
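As a minimal sketch of this, the comparison function below hands strcoll to qsort, so that an array of strings is sorted by the collating sequence of the current locale. The names compare_by_locale and sort_names are hypothetical helpers invented for this illustration; only strcoll and qsort come from the library.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparison function: order two strings by the collating
   sequence of the current locale.  (Hypothetical helper.)  */
static int
compare_by_locale (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcoll (*sa, *sb);
}

/* Sort an array of N strings in place.  (Hypothetical helper.)  */
void
sort_names (const char **names, size_t n)
{
  qsort (names, n, sizeof names[0], compare_by_locale);
}
```

In the standard `C' locale this gives the same order as strcmp; under another locale the order follows that locale's conventions.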

Other aspects of locales are beyond the comprehension of the library. For example, the library can't automatically translate your program's output messages into other languages. The only way you can support output in the user's favorite language is to program this more or less by hand. (Eventually, we hope to provide facilities to make this easier.)

This chapter discusses the mechanism by which you can modify the current locale. The effects of the current locale on specific library functions are discussed in more detail in the descriptions of those functions.

Choosing a Locale

The simplest way for the user to choose a locale is to set the environment variable LANG. This specifies a single locale to use for all purposes. For example, a user could specify a hypothetical locale named `espana-castellano' to use the standard conventions of most of Spain.

The set of locales supported depends on the operating system you are using, and so do their names. We can't make any promises about what locales will exist, except for one standard locale called `C' or `POSIX'.

A user also has the option of specifying different locales for different purposes--in effect, choosing a mixture of multiple locales.

For example, the user might specify the locale `espana-castellano' for most purposes, but specify the locale `usa-english' for currency formatting. This might make sense if the user is a Spanish-speaking American, working in Spanish, but representing monetary amounts in US dollars.

Note that both locales `espana-castellano' and `usa-english', like all locales, would include conventions for all of the purposes to which locales apply. However, the user can choose to use each locale for a particular subset of those purposes.

Categories of Activities that Locales Affect

The purposes that locales serve are grouped into categories, so that a user or a program can choose the locale for each category independently. Here is a table of categories; each name is both an environment variable that a user can set, and a macro name that you can use as an argument to setlocale.

LC_COLLATE
This category applies to collation of strings (functions strcoll and strxfrm); see section Collation Functions.
LC_CTYPE
This category applies to classification and conversion of characters, and to multibyte and wide characters; see section Character Handling and section Extended Characters.
LC_MONETARY
This category applies to formatting monetary values; see section Numeric Formatting.
LC_NUMERIC
This category applies to formatting numeric values that are not monetary; see section Numeric Formatting.
LC_TIME
This category applies to formatting date and time values; see section Formatting Date and Time.
LC_ALL
This is not an environment variable; it is only a macro that you can use with setlocale to set a single locale for all purposes.
LANG
If this environment variable is defined, its value specifies the locale to use for all purposes except as overridden by the variables above.
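The precedence described in this table can be sketched in a few lines. This is only an illustration of the rule "the category variable overrides LANG", not the library's actual lookup code. The function names are hypothetical, and two details are assumptions of the sketch: that an empty value counts as unset, and that `C' is the fallback when neither variable is set.

```c
#include <stddef.h>
#include <stdlib.h>

/* Pick a locale name given the value of a category variable and the
   value of LANG.  Either may be a null pointer.  (Hypothetical helper;
   treating "" as unset and falling back to "C" are assumptions.)  */
const char *
choose_locale_name (const char *category_value, const char *lang_value)
{
  if (category_value != NULL && category_value[0] != '\0')
    return category_value;
  if (lang_value != NULL && lang_value[0] != '\0')
    return lang_value;
  return "C";
}

/* Apply the rule above to the real environment, e.g. for "LC_TIME".  */
const char *
locale_name_from_environment (const char *category_variable)
{
  return choose_locale_name (getenv (category_variable), getenv ("LANG"));
}
```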

How Programs Set the Locale

A C program inherits its locale environment variables when it starts up. This happens automatically. However, these variables do not automatically control the locale used by the library functions, because ANSI C says that all programs start by default in the standard `C' locale. To use the locales specified by the environment, you must call setlocale. Call it as follows:

setlocale (LC_ALL, "");

to select a locale based on the appropriate environment variables.
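A program would typically make this call once, near the start of main. The sketch below wraps it in a hypothetical helper, use_environment_locale, that also copes with an environment naming a locale that is not installed: setlocale then returns a null pointer and leaves the locale unchanged, so the helper selects the standard `C' locale explicitly in order to have a name to return.

```c
#include <locale.h>
#include <stddef.h>

/* Adopt the locale named by the environment and return its name.
   If the environment names an unavailable locale, fall back to the
   standard "C" locale.  (Hypothetical helper.)  */
const char *
use_environment_locale (void)
{
  const char *name = setlocale (LC_ALL, "");

  if (name == NULL)
    name = setlocale (LC_ALL, "C");
  return name;
}
```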

You can also use setlocale to specify a particular locale, for general use or for a specific category.

The symbols in this section are defined in the header file `locale.h'.

Function: char * setlocale (int category, const char *locale)
The function setlocale sets the current locale for category category to locale.

If category is LC_ALL, this specifies the locale for all purposes. The other possible values of category specify an individual purpose (see section Categories of Activities that Locales Affect).

You can also use this function to find out the current locale by passing a null pointer as the locale argument. In this case, setlocale returns a string that is the name of the locale currently selected for category category.

The string returned by setlocale can be overwritten by subsequent calls, so you should make a copy of the string (see section Copying and Concatenation) if you want to save it past any further calls to setlocale. (The standard library is guaranteed never to call setlocale itself.)

You should not modify the string returned by setlocale. It might be the same string that was passed as an argument in a previous call to setlocale.

When you read the current locale for category LC_ALL, the value encodes the entire combination of selected locales for all categories. In this case, the value is not just a single locale name. In fact, we don't make any promises about what it looks like. But if you specify the same "locale name" with LC_ALL in a subsequent call to setlocale, it restores the same combination of locale selections.

When the locale argument is not a null pointer, the string returned by setlocale reflects the newly modified locale.

If you specify an empty string for locale, this means to read the appropriate environment variable and use its value to select the locale for category.

If you specify an invalid locale name, setlocale returns a null pointer and leaves the current locale unchanged.
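The null return value gives a simple way to test whether a locale name is usable. The following sketch, built around a hypothetical helper named locale_is_available, saves the current LC_ALL value, tries the candidate name, and then restores what was there before. It relies only on the behavior of setlocale described above.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if NAME can be selected for LC_ALL, 0 if it cannot, and -1
   if the current locale name could not be saved.  The current locale
   is the same afterward as before.  (Hypothetical helper.)  */
int
locale_is_available (const char *name)
{
  const char *current = setlocale (LC_ALL, NULL);
  char *saved;
  int ok;

  if (current == NULL)
    return -1;

  /* Copy the name; a later call to setlocale may overwrite it.  */
  saved = malloc (strlen (current) + 1);
  if (saved == NULL)
    return -1;
  strcpy (saved, current);

  /* A null pointer here means NAME is not a valid locale name.  */
  ok = setlocale (LC_ALL, name) != NULL;

  /* Restore the original combination of locale selections.  */
  setlocale (LC_ALL, saved);
  free (saved);
  return ok;
}
```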

Here is an example showing how you might use setlocale to temporarily switch to a new locale.

#include <stddef.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void
with_other_locale (const char *new_locale,
                   void (*subroutine) (int),
                   int argument)
{
  char *old_locale, *saved_locale;

  /* Get the name of the current locale.  */
  old_locale = setlocale (LC_ALL, NULL);

  /* Copy the name so it won't be clobbered by setlocale. */
  saved_locale = strdup (old_locale);
  if (saved_locale == NULL)
    {
      /* The copy failed; we cannot restore the locale later. */
      fprintf (stderr, "Out of memory\n");
      exit (EXIT_FAILURE);
    }

  /* Now change the locale and do some stuff with it. */
  setlocale (LC_ALL, new_locale);
  (*subroutine) (argument);

  /* Restore the original locale. */
  setlocale (LC_ALL, saved_locale);
  free (saved_locale);
}

Portability Note: Some ANSI C systems may define additional locale categories. For portability, assume that any symbol beginning with `LC_' might be defined in `locale.h'.

Standard Locales

The only locale names you can count on finding on all operating systems are these three standard ones:

"C"
This is the standard C locale. The attributes and behavior it provides are specified in the ANSI C standard. When your program starts up, it initially uses this locale by default.
"POSIX"
This is the standard POSIX locale. Currently, it is an alias for the standard C locale.
""
The empty name says to select a locale based on environment variables. See section Categories of Activities that Locales Affect.

Defining and installing named locales is normally a responsibility of the system administrator at your site (or the person who installed the GNU C library). Some systems may allow users to create locales, but we don't discuss that here.

If your program needs to use something other than the `C' locale, it will be more portable if you use whatever locale the user specifies with the environment, rather than trying to specify some non-standard locale explicitly by name. Remember, different machines might have different sets of locales installed.

Numeric Formatting

When you want to format a number or a currency amount using the conventions of the current locale, you can use the function localeconv to get the data on how to do it. The function localeconv is declared in the header file `locale.h'.

Function: struct lconv * localeconv (void)
The localeconv function returns a pointer to a structure whose components contain information about how numeric and monetary values should be formatted in the current locale.

You shouldn't modify the structure or its contents. The structure might be overwritten by subsequent calls to localeconv, or by calls to setlocale, but no other function in the library overwrites this value.

Data Type: struct lconv
This is the data type of the value returned by localeconv.

If a member of the structure struct lconv has type char, and the value is CHAR_MAX, it means that the current locale has no value for that parameter.
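Here is a small sketch of both points: copying a string out of the structure before a later call can overwrite it, and checking a char member against CHAR_MAX. The helper names copy_decimal_point and monetary_frac_digits are hypothetical, and the choice of fallback value is left to the caller.

```c
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Copy the current decimal-point string into BUF, since the structure
   returned by localeconv may be overwritten later.  Return 0 on
   success, or -1 if BUF is too small.  (Hypothetical helper.)  */
int
copy_decimal_point (char *buf, size_t size)
{
  const struct lconv *lc = localeconv ();
  size_t len = strlen (lc->decimal_point);

  if (len + 1 > size)
    return -1;
  memcpy (buf, lc->decimal_point, len + 1);
  return 0;
}

/* Return the number of fractional digits for local monetary format,
   or FALLBACK if the locale has no value for it (CHAR_MAX).
   (Hypothetical helper.)  */
int
monetary_frac_digits (int fallback)
{
  const struct lconv *lc = localeconv ();

  return lc->frac_digits == CHAR_MAX ? fallback : lc->frac_digits;
}
```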

Generic Numeric Formatting Parameters

These are the standard members of struct lconv; there may be others.

char *decimal_point
char *mon_decimal_point
These are the decimal-point separators used in formatting non-monetary and monetary quantities, respectively. In the `C' locale, the value of decimal_point is ".", and the value of mon_decimal_point is "".
char *thousands_sep
char *mon_thousands_sep
These are the separators used to delimit groups of digits to the left of the decimal point in formatting non-monetary and monetary quantities, respectively. In the `C' locale, both members have a value of "" (the empty string).
char *grouping
char *mon_grouping
These are strings that specify how to group the digits to the left of the decimal point. grouping applies to non-monetary quantities and mon_grouping applies to monetary quantities. Use thousands_sep or mon_thousands_sep, respectively, to separate the digit groups. Each element of the string is a small integer stored in a char (a numeric value, not a printed digit). Successive elements (from left to right) give the sizes of successive groups (from right to left, starting at the decimal point). When the string ends, the last element is used over and over for all the remaining groups. If an element is CHAR_MAX, it means that there is no more grouping--or, put another way, any remaining digits form one large group without separators. For example, if grouping is "\04\03\02", the correct grouping for the number 123456787654321 is `12', `34', `56', `78', `765', `4321'. This uses a group of 4 digits at the end, preceded by a group of 3 digits, preceded by groups of 2 digits (as many as needed). With a separator of `,', the number would be printed as `12,34,56,78,765,4321'. A value of "\03" indicates repeated groups of three digits, as normally used in the U.S. In the standard `C' locale, both grouping and mon_grouping have a value of "". This value specifies no grouping at all.
char int_frac_digits
char frac_digits
These are small integers indicating how many fractional digits (to the right of the decimal point) should be displayed in a monetary value in international and local formats, respectively. (Most often, both members have the same value.) In the standard `C' locale, both of these members have the value CHAR_MAX, meaning "unspecified". The ANSI standard doesn't say what to do when you find this value; we recommend printing no fractional digits. (This locale also specifies the empty string for mon_decimal_point, so printing any fractional digits would be confusing!)
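The grouping rule can be sketched as a function that inserts separators into a string of digits. This sketch assumes the ISO C encoding of grouping and mon_grouping: each element is a small integer stored in a char, the end of the string means "repeat the last size", and CHAR_MAX means "no further grouping". The names next_group and group_digits are hypothetical; the library itself provides no such function.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the next group size from *G, given the previous size PREV.
   A result of 0 means the remaining digits form a single group.  */
static size_t
next_group (const char **g, size_t prev)
{
  if (**g == '\0')
    return prev;                /* end of string: repeat last size */
  if (**g == CHAR_MAX)
    return 0;                   /* no further grouping */
  return (size_t) (unsigned char) *(*g)++;
}

/* Copy DIGITS into OUT with SEP inserted as GROUPING directs.
   Return 0 on success, or -1 if OUT is too small.  */
int
group_digits (const char *digits, const char *grouping,
              const char *sep, char *out, size_t out_size)
{
  const char *r = digits + strlen (digits);
  size_t seplen = strlen (sep);
  size_t size = 0;
  char *w;

  if (out_size == 0)
    return -1;

  /* Build the result from right to left at the end of OUT.  */
  w = out + out_size - 1;
  *w = '\0';
  while (r > digits)
    {
      size_t left = (size_t) (r - digits);
      size_t take, need;

      size = next_group (&grouping, size);
      take = (size == 0 || size > left) ? left : size;
      need = take + (left > take ? seplen : 0);
      if ((size_t) (w - out) < need)
        return -1;

      r -= take;
      w -= take;
      memcpy (w, r, take);
      if (r > digits)
        {
          w -= seplen;
          memcpy (w, sep, seplen);
        }
    }

  /* Slide the finished string to the start of OUT.  */
  memmove (out, w, strlen (w) + 1);
  return 0;
}
```

Calling group_digits with the digits of 123456787654321, a grouping whose elements are 4, 3 and 2, and a separator of `,' produces `12,34,56,78,765,4321'.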

Printing the Currency Symbol

These members of the struct lconv structure specify how to print the symbol to identify a monetary value--the international analog of `$' for US dollars.

Each country has two standard currency symbols. The local currency symbol is used commonly within the country, while the international currency symbol is used internationally to refer to that country's currency when it is necessary to indicate the country unambiguously.

For example, many countries use the dollar as their monetary unit, and when dealing with international currencies it's important to specify that one is dealing with (say) Canadian dollars instead of U.S. dollars or Australian dollars. But when the context is known to be Canada, there is no need to make this explicit--dollar amounts are implicitly assumed to be in Canadian dollars.

char *currency_symbol
The local currency symbol for the selected locale. In the standard `C' locale, this member has a value of "" (the empty string), meaning "unspecified". The ANSI standard doesn't say what to do when you find this value; we recommend you simply print the empty string as you would print any other string found in the appropriate member.
char *int_curr_symbol
The international currency symbol for the selected locale. The value of int_curr_symbol should normally consist of a three-letter abbreviation determined by the international standard ISO 4217 Codes for the Representation of Currency and Funds, followed by a one-character separator (often a space). In the standard `C' locale, this member has a value of "" (the empty string), meaning "unspecified". We recommend you simply print the empty string as you would print any other string found in the appropriate member.
char p_cs_precedes
char n_cs_precedes
These members are 1 if the currency_symbol string should precede the value of a monetary amount, or 0 if the string should follow the value. The p_cs_precedes member applies to positive amounts (or zero), and the n_cs_precedes member applies to negative amounts. In the standard `C' locale, both of these members have a value of CHAR_MAX, meaning "unspecified". The ANSI standard doesn't say what to do when you find this value, but we recommend printing the currency symbol before the amount. That's right for most countries. In other words, treat all nonzero values alike in these members. The POSIX standard says that these two members apply to the int_curr_symbol as well as the currency_symbol. The ANSI C standard seems to imply that they should apply only to the currency_symbol--so the int_curr_symbol should always precede the amount. We can only guess which of these (if either) matches the usual conventions for printing international currency symbols. Our guess is that the int_curr_symbol should always precede the amount. If we find out a reliable answer, we will put it here.
char p_sep_by_space
char n_sep_by_space
These members are 1 if a space should appear between the currency_symbol string and the amount, or 0 if no space should appear. The p_sep_by_space member applies to positive amounts (or zero), and the n_sep_by_space member applies to negative amounts. In the standard `C' locale, both of these members have a value of CHAR_MAX, meaning "unspecified". The ANSI standard doesn't say what you should do when you find this value; we suggest you treat it as one (print a space). In other words, treat all nonzero values alike in these members. The POSIX standard says that these two members apply to the int_curr_symbol as well as the currency_symbol. But an example in the ANSI C standard clearly implies that they should apply only to the currency_symbol--that the int_curr_symbol contains any appropriate separator, so you should never print an additional space. Based on what we know now, we recommend you ignore these members when printing international currency symbols, and print no extra space.
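The recommendations above for the local currency symbol can be sketched as follows. The function attach_currency_symbol is a hypothetical helper; it takes the relevant member values as arguments rather than calling localeconv, so that it can be tried with any combination. It follows the suggestion to treat CHAR_MAX like 1 for both members, and it does not handle int_curr_symbol.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Combine SYMBOL and AMOUNT into OUT as CS_PRECEDES and SEP_BY_SPACE
   direct.  Return 0 on success, or -1 if OUT is too small.
   (Hypothetical helper.)  */
int
attach_currency_symbol (const char *symbol, const char *amount,
                        char cs_precedes, char sep_by_space,
                        char *out, size_t size)
{
  const char *gap;
  int n;

  /* "Unspecified" is treated like 1, as recommended in the text.  */
  if (cs_precedes == CHAR_MAX)
    cs_precedes = 1;
  if (sep_by_space == CHAR_MAX)
    sep_by_space = 1;

  gap = sep_by_space != 0 ? " " : "";
  if (cs_precedes != 0)
    n = snprintf (out, size, "%s%s%s", symbol, gap, amount);
  else
    n = snprintf (out, size, "%s%s%s", amount, gap, symbol);

  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```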

Printing the Sign of an Amount of Money

These members of the struct lconv structure specify how to print the sign (if any) in a monetary value.

char *positive_sign
char *negative_sign
These are strings used to indicate positive (or zero) and negative (respectively) monetary quantities. In the standard `C' locale, both of these members have a value of "" (the empty string), meaning "unspecified". The ANSI standard doesn't say what to do when you find this value; we recommend printing positive_sign as you find it, even if it is empty. For a negative value, print negative_sign as you find it unless both it and positive_sign are empty, in which case print `-' instead. (Failing to indicate the sign at all seems rather unreasonable.)
char p_sign_posn
char n_sign_posn
These members have values that are small integers indicating how to position the sign for nonnegative and negative monetary quantities, respectively. (The string used by the sign is what was specified with positive_sign or negative_sign.) The possible values are as follows:
0
The currency symbol and quantity should be surrounded by parentheses.
1
Print the sign string before the quantity and currency symbol.
2
Print the sign string after the quantity and currency symbol.
3
Print the sign string right before the currency symbol.
4
Print the sign string right after the currency symbol.
CHAR_MAX
"Unspecified". Both members have this value in the standard `C' locale.
The ANSI standard doesn't say what you should do when the value is CHAR_MAX. We recommend you print the sign after the currency symbol.

It is not clear whether you should let these members apply to the international currency format or not. POSIX says you should, but intuition plus the examples in the ANSI C standard suggest you should not. We hope that someone who knows well the conventions for formatting monetary quantities will tell us what we should recommend.
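The sign positions listed above can be sketched as a single switch. The function place_sign is a hypothetical helper that applies to the local currency symbol only; it takes the member values as arguments instead of calling localeconv, ignores p_sep_by_space and n_sep_by_space for brevity, and treats CHAR_MAX like 4, following the recommendation to print the sign after the currency symbol.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Combine SIGN, SYMBOL and AMOUNT into OUT according to SIGN_POSN.
   Return 0 on success, or -1 if OUT is too small or SIGN_POSN is not
   one of the documented values.  (Hypothetical helper.)  */
int
place_sign (char sign_posn, char cs_precedes,
            const char *sign, const char *symbol, const char *amount,
            char *out, size_t size)
{
  /* FIRST and SECOND are the symbol and amount in printing order.  */
  const char *first = cs_precedes != 0 ? symbol : amount;
  const char *second = cs_precedes != 0 ? amount : symbol;
  int n;

  switch (sign_posn)
    {
    case 0:                     /* parentheses around both */
      n = snprintf (out, size, "(%s%s)", first, second);
      break;
    case 1:                     /* sign before both */
      n = snprintf (out, size, "%s%s%s", sign, first, second);
      break;
    case 2:                     /* sign after both */
      n = snprintf (out, size, "%s%s%s", first, second, sign);
      break;
    case 3:                     /* sign right before the symbol */
      if (cs_precedes != 0)
        n = snprintf (out, size, "%s%s%s", sign, first, second);
      else
        n = snprintf (out, size, "%s%s%s", first, sign, second);
      break;
    case 4:                     /* sign right after the symbol */
    case CHAR_MAX:              /* unspecified: treated like 4 */
      if (cs_precedes != 0)
        n = snprintf (out, size, "%s%s%s", first, sign, second);
      else
        n = snprintf (out, size, "%s%s%s", first, second, sign);
      break;
    default:
      return -1;
    }

  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```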

