The languagelens translation technology employs new statistical techniques that make it possible to quickly build translation systems customized for a given language pair and text type. The key to the technology is that the system learns from previously translated texts. It examines each word sequence in the translation data and compiles a translation table containing all the translations observed in that data. The translation table is similar to a traditional bilingual dictionary, but it differs in two important ways. First, while a traditional dictionary consists primarily of individual words, the translation table includes a large number of multi-word sequences. Second, the translation table records a probability score for each translation choice, estimated from the number of times that choice occurs in the translation data. These probabilities are central to the success of the system: they allow its translations to mimic the subtle, complex preferences observed in the translation data.
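
As a rough illustration of how such a table could be compiled, the sketch below estimates each probability as a simple relative frequency. It is not the languagelens implementation: the function name `build_translation_table`, the French-English sample pairs, and the assumption that source and target word sequences have already been paired up are all hypothetical.

```python
from collections import Counter, defaultdict


def build_translation_table(phrase_pairs):
    """Estimate P(target | source) as count(source, target) / count(source)."""
    pair_counts = Counter(phrase_pairs)
    source_totals = Counter(src for src, _ in phrase_pairs)
    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_totals[src]
    return dict(table)


# Hypothetical phrase pairs, assumed to be already extracted from
# previously translated texts (the pairing step itself is not shown).
observed_pairs = [
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "home"),
    ("à la maison", "at home"),
    ("à la maison", "at home"),
    ("à la maison", "at home"),
    ("à la maison", "in the house"),
]

table = build_translation_table(observed_pairs)
for source, options in table.items():
    print(source, options)
```

Note that the table holds an entry for the multi-word sequence "à la maison" alongside the single word "maison", and that each entry carries its own set of scored alternatives.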

To translate a sentence, the system consults the translation table to determine all possible translations for each sub-part of the sentence. This gives the system a large search space of possible translations -- the system examines thousands of options for each sentence, consistently arriving at a high-quality translation in a matter of milliseconds.
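
The search over sub-parts can be sketched with a small dynamic program. This is a simplified illustration under stated assumptions, not the system's actual search procedure: it translates left to right without reordering, scores a candidate only by the product of its table probabilities, and uses a hypothetical `toy_table` and `max_phrase_len` parameter.

```python
import math


def translate(sentence, table, max_phrase_len=3):
    """Return the highest-scoring left-to-right translation, or None."""
    words = sentence.split()
    n = len(words)
    # best[i] holds (log-probability, output phrases) for words[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            if best[start] is None:
                continue  # no way to cover words[:start]
            source_phrase = " ".join(words[start:end])
            for target, prob in table.get(source_phrase, {}).items():
                score = best[start][0] + math.log(prob)
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + [target])
    if best[n] is None:
        return None  # some part of the sentence is not in the table
    return " ".join(best[n][1])


# Hypothetical translation table with single-word and multi-word entries.
toy_table = {
    "je": {"I": 1.0},
    "suis": {"am": 1.0},
    "à": {"at": 0.6, "to": 0.4},
    "la": {"the": 1.0},
    "maison": {"house": 0.7, "home": 0.3},
    "à la maison": {"at home": 0.8, "in the house": 0.2},
}

result = translate("je suis à la maison", toy_table)
print(result)
```

In this toy example the multi-word entry "à la maison" outscores the word-by-word path, so the sketch picks "I am at home" over "I am at the house". A production system would consider far more options per sentence, as the paragraph above describes.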

A key factor in the success of the statistical machine translation (SMT) system is the flexibility of its translation tables. These tables correlate not only individual words, but also multi-word sequences of varying sizes. This effectively liberates the system from the plodding word-by-word approach often thought to be characteristic of automatic translation. Another essential feature of the system is that its choices are sensitive to context: a potential translation will often be rejected because it doesn't fit smoothly with what comes before or after. This flexibility and context-sensitivity enable the system to produce translations that are remarkably faithful to its translation data, not only in terms of grammatical correctness, but also in terms of the style and idioms of a given text type.
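
The text does not say how the fit with neighboring words is measured. One common way to do it in statistical translation is a language model over the target language, and the sketch below assumes a simple bigram model with add-one smoothing. The function `train_bigram_model`, the sample sentences, and the candidate list are hypothetical and serve only to show how a poorly fitting candidate can be scored lower.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Return a function scoring a sentence by smoothed bigram log-probability."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])          # counts of each context word
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(set(unigrams) | {"</s>"})

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            # Add-one smoothing keeps unseen word pairs from scoring zero.
            total += math.log((bigrams[(prev, cur)] + 1)
                              / (unigrams[prev] + vocab_size))
        return total

    return log_prob


# Hypothetical target-language text of the relevant text type.
target_text = [
    "I am at home",
    "she is at home",
    "the house is old",
    "he went to the bank",
    "we sat on the river bank",
]
fluency = train_bigram_model(target_text)

# Two candidate translations that differ only in the final word.
candidates = ["I am at home", "I am at house"]
best = max(candidates, key=fluency)
print(best)
```

Here "at house" never occurs in the sample text, so that candidate receives a lower score and is rejected in favor of "I am at home", even though "house" is a perfectly valid translation of the word in isolation.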