12.10.2012 16:40
Category: Helicon Translator

Internationalization of Applications

Internationalization of Applications is more than just translation.


This document describes the most important steps needed to internationalize an application and prepare it for localization. It covers the general requirements, not just the steps necessary for Helicon Translator.

What is Internationalization?

Internationalization (i18n) is the process of designing or adapting an application so that it can be adapted to various languages and regions without additional changes to the core application. Internationalization is the required preparation for successful localization of an application.

Basically the following points need to be kept in mind because they vary all over the world:

  • Language
  • Data formats (such as ZIP codes, phone numbers, date and time formats)
  • Time zones
  • Currency
  • Number formats
  • Weights and measures
  • Paper sizes
  • Cultural differences, such as different meanings of colors
  • Names and Titles

The list shows which aspects you need to keep variable for localized versions.

Sometimes it's less important to translate the application than to keep it open for local requirements. For example, an English application that accepts metric units may have a better chance in Europe than a translated application that requires input in inches or feet. On the other hand, an application that requires input in metric units may have difficulties in the US.

Translation is not the most important part of localization, but it completes it.

The most important step is to avoid hard-coding of things that may need to be different in localized versions.

Language and Translation

Do not write strings directly into the source, but declare them as resource-strings or put them in a resource file.

Well-internationalized software does not need to be changed when it is localized into additional languages.
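As a language-neutral illustration of keeping strings out of the source logic, here is a minimal Python sketch. The `RESOURCES` table, the `tr` function and the keys are hypothetical names chosen for this example; in a real project the table would live in a resource file rather than in code.

```python
# Hypothetical string table; a real application would load this from a resource file.
RESOURCES = {
    "en": {"cancel": "Cancel", "greeting": "Hello, {name}!"},
    "de": {"cancel": "Abbrechen", "greeting": "Hallo, {name}!"},
}

DEFAULT_LANGUAGE = "en"


def tr(key, language, **params):
    """Look up a string by key, falling back to the default language."""
    table = RESOURCES.get(language, {})
    template = table.get(key, RESOURCES[DEFAULT_LANGUAGE][key])
    return template.format(**params)


print(tr("cancel", "de"))
print(tr("greeting", "fr", name="Ada"))  # no French table, so English is used
```

Adding a language then means adding a table, while the calling code stays unchanged.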

Reserve Space

Text may become longer when translated, so always reserve additional space. For example, the German "Abbrechen" is 3 characters longer than the English "Cancel". A button that is designed too small might cut off the text. As a rule of thumb, about 50-100% additional space should be reserved for short text, and 20-50% for long text.
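The rule of thumb above could be sketched roughly as follows. This is only an illustration: the function name, the 20-character boundary between "short" and "long" text, and the choice of the upper end of each range are assumptions, and real layouts depend on pixel widths and fonts rather than character counts.

```python
import math


def reserved_length(source_text, short_limit=20, short_factor=2.0, long_factor=1.5):
    """Return the number of characters to reserve for translations of source_text.

    short_factor=2.0 means 100% extra space, long_factor=1.5 means 50% extra.
    short_limit is an assumed cut-off, not a value from the article.
    """
    factor = short_factor if len(source_text) <= short_limit else long_factor
    return math.ceil(len(source_text) * factor)


print(reserved_length("Cancel"))
print(len("Abbrechen") <= reserved_length("Cancel"))
```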

Character-Set

One of the big drawbacks of Delphi's VCL, in versions before Delphi 2009, is that it is not really based on Unicode. Since Delphi 6, Delphi stores resources in Unicode format, which is a big step forward. Unfortunately, the components themselves still convert the text from Unicode to ANSI. One reason was probably that Unicode was not supported on Win9x.

The problem is that one character is not necessarily one byte long. If the application receives input that requires a multibyte character set (MBCS), string-handling routines that rely on 1 byte = 1 character may fail. Fortunately, the standard VCL/RTL functions handle MBCS strings well.
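The following Python sketch illustrates why the assumption 1 byte = 1 character breaks. It uses UTF-8 as a stand-in for a multibyte encoding (the article refers to MBCS code pages), and the helper names are made up for this example.

```python
def utf8_length(text):
    """Return the number of bytes that text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_by_bytes(text, max_bytes):
    """Naive truncation that assumes 1 byte = 1 character; may split a character."""
    return text.encode("utf-8")[:max_bytes]


def truncate_by_chars(text, max_chars):
    """Truncation that counts characters, so no character is split."""
    return text[:max_chars]


word = "Größe"
print(len(word), utf8_length(word))  # character count vs. byte count

broken = truncate_by_bytes(word, 3)  # cut lands in the middle of "ö"
try:
    broken.decode("utf-8")
except UnicodeDecodeError:
    print("byte-based cut split a character")

print(truncate_by_chars(word, 3))
```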

Writing-Direction

In some languages text is written from left to right, in others from right to left. To make an application look good, the layout has to be mirrored according to the writing direction. Fortunately, the VCL helps somewhat, for example through the BiDiMode property of its controls.

Data-Formats

Data formats differ around the world. So if your application uses input masks or validates data, don't rely on the format you are used to.

The most important formats (date, time, numbers) can be configured in the Windows Control Panel, so applications should use the format configured there instead of relying on their own. Fortunately, most of Delphi's conversion functions use the system settings, so internationalization is not a big problem here. You just have to keep different data formats in mind when you manipulate data yourself.

Input-Masks and validation

While Europeans also use area codes and phone numbers, an input and validation mask of (999)999-9999 will make most Europeans unhappy because the area-code field is too narrow. In addition, the mask does not accept a country code.

If you want to use input masks or validate data, keep the input mask in resource strings or in the form resource so that it can be localized.

It is also better not to validate or format data at all than to do it in a way that prevents the user from entering valid data.

Examples of formats that differ between countries include:

  • Phone numbers
  • ZIP codes
  • Social security numbers
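As a sketch of validation that does not lock out valid input, the Python function below accepts any phone number made of digits and common separators instead of enforcing one national mask. The function name, the allowed separator set and the minimum of 5 digits are assumptions for illustration; the maximum of 15 digits follows the E.164 numbering plan.

```python
import re

# Assumption: an optional leading "+", then digits, spaces and common separators.
ALLOWED = re.compile(r"^\+?[0-9 ()/.\-]+$")


def is_plausible_phone(text, min_digits=5, max_digits=15):
    """Return True if text looks like a phone number in any national format."""
    text = text.strip()
    if not ALLOWED.match(text):
        return False
    digits = sum(ch.isdigit() for ch in text)
    return min_digits <= digits <= max_digits


print(is_plausible_phone("(555) 123-4567"))   # US-style number
print(is_plausible_phone("+49 30 1234567"))   # number with country code
print(is_plausible_phone("call me"))          # not a number at all
```

A strict mask such as (999)999-9999 would reject the second number, while this check only rules out input that cannot be a phone number anywhere.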

Number Formats

Some countries use a decimal point, others a decimal comma. For example, "99.99" is the format in the US, and "99,99" in most of continental Europe. The same is true for the thousands separator: it's "1,000.00" in the US, but "1.000,00" in many European countries.
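To show how the separators can be kept configurable, here is a small Python sketch. The functions `format_number` and `parse_number` are hypothetical helpers written for this example; a real application would normally take the separators from the system settings rather than pass them by hand.

```python
PLACEHOLDER = "\x00"  # temporary marker so the two separators can be swapped safely


def format_number(value, decimal_sep=".", thousands_sep=",", decimals=2):
    """Format value with the given decimal and thousands separators."""
    text = f"{value:,.{decimals}f}"  # always "1,000.00" style at this point
    return (text.replace(",", PLACEHOLDER)
                .replace(".", decimal_sep)
                .replace(PLACEHOLDER, thousands_sep))


def parse_number(text, decimal_sep=".", thousands_sep=","):
    """Parse user input written with the given separators into a float."""
    return float(text.replace(thousands_sep, "").replace(decimal_sep, "."))


print(format_number(1000))             # US-style separators
print(format_number(1000, ",", "."))   # separators swapped
print(parse_number("1.000,00", ",", "."))
```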

Data storage

While it's a good idea not to rely on fixed formats for user input and output, it can be important to use fixed formats for data exchange and data storage. Data should be stored in a format that is independent of the format configured on a specific computer, so that data exchange remains possible.
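One possible way to do this, sketched in Python, is to store dates in ISO 8601 form and numbers with a fixed decimal point, regardless of how they are displayed. The choice of JSON and the function names `to_storage` and `from_storage` are assumptions for this example, not something the article prescribes.

```python
import json
from datetime import date


def to_storage(day, amount):
    """Serialize a date and an amount in a format independent of local settings."""
    return json.dumps({"date": day.isoformat(), "amount": amount})


def from_storage(text):
    """Read the record back, whatever the regional settings of this machine are."""
    data = json.loads(text)
    return date.fromisoformat(data["date"]), data["amount"]


stored = to_storage(date(2012, 10, 12), 1000.5)
print(stored)
print(from_storage(stored))
```

The stored text reads the same on every machine; only the display layer applies the user's regional format.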

Units, Weights and measures

Imperial units are common in many English-speaking countries, while metric units are the usual choice in Europe and in scientific communities.

Thus, if an application stores weights or measures, it is a good idea to store them in one specific unit internally, but let the user choose the units in which he or she wants to see or enter the data.

For example, data could always be stored in millimeters internally, but could be displayed in inches in a region where this unit is commonly used.
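A minimal Python sketch of this idea follows. The function names and the unit codes "mm" and "in" are chosen for illustration; the conversion factor of 25.4 mm per inch is exact by definition.

```python
MM_PER_INCH = 25.4  # exact by definition of the international inch


def mm_to_display(length_mm, unit):
    """Convert an internally stored millimeter value to the user's display unit."""
    if unit == "mm":
        return length_mm
    if unit == "in":
        return length_mm / MM_PER_INCH
    raise ValueError(f"unsupported unit: {unit}")


def display_to_mm(value, unit):
    """Convert user input in the display unit back to millimeters for storage."""
    if unit == "mm":
        return value
    if unit == "in":
        return value * MM_PER_INCH
    raise ValueError(f"unsupported unit: {unit}")


stored_mm = display_to_mm(10, "in")        # user enters 10 inches
print(round(stored_mm, 3))                 # value kept internally, in millimeters
print(round(mm_to_display(stored_mm, "in"), 3))
```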

Cultural Differences

Some symbols and colors have a different meaning, or no meaning at all, in other cultures, so icons should be kept as culture-independent as possible or be replaced in localized versions.