Multilingualism on the Web

Chapter of the Association for Computational Linguistics (EACL), which provides a regional focus for its members.

Eurodicautom is the multilingual terminological database of the Translation Service of the European Commission. Initially developed to assist in-house translators, it is consulted today by an increasing number of European Union officials other than translators, as well as by language professionals throughout the world. Its huge, constantly updated content is drafted in twelve languages (Danish, Dutch, English, Finnish, French, German, Greek, Italian, Latin, Portuguese, Spanish, Swedish) and covers a broad spectrum of human knowledge, although its main core relates to European Union topics.

ILOTERM is the quadrilingual (English, French, German, Spanish) terminology database maintained by the Terminology and Reference Unit of the Official Documentation Branch (OFFDOC) of the International Labour Office (ILO), Geneva, Switzerland. Its primary purpose is to provide solutions, reflecting current usage, to terminological problems in the social and labor fields. Terms are entered in English with their French, Spanish and/or German equivalents. The database also includes records (in up to four languages) concerning the structure and programmes of the ILO, official names of international institutions, national bodies and employers' and workers' organizations, as well as titles of international meetings and instruments.

The ITU Telecommunication Terminology Database (TERMITE) is maintained by the Terminology, References and Computer Aids to Translation Section of the Conference Department of the International Telecommunication Union (ITU), Geneva, Switzerland. TERMITE (59,000 entries) is a quadrilingual (English, French, Spanish, Russian) terminological database containing all the terms that have appeared in ITU printed glossaries since 1980, as well as more recent entries relating to the different activities of the Union.

Maintained by the World Health Organization (WHO), Geneva, Switzerland, the WHO Terminology Information System (WHOTERM) includes: the WHO General Dictionary Index, giving access to an English glossary of terms, with the French and Spanish equivalents for each term; three glossaries in English: Health for All, Programme Development and Management, and Health Promotion; the WHO TermWatch, an awareness service of the Technical Terminology, which is a service reflecting the current WHO usage -- but not necessarily terms officially approved by WHO -- and a series of links to health-related terminology.

4. TRANSLATION RESOURCES

[In this chapter:]

[4.1. Translation Services / 4.2. Machine Translation / 4.3. Computer-assisted Translation]

4.1. Translation Services

Maintained by Vorontsoff, Wesseling & Partners, Amsterdam, the Netherlands, Aquarius is a directory of translators and interpreters including 6,100 translators, 800 translation companies, 91 specialized areas of expertise and 369 language combinations. This non-commercial project helps to locate and contact the best translators in the world directly, without intermediaries or agencies. Aquarius Database can be searched using location, language combination and specialization.

Founded by Bill Dunlap, Euro-Marketing Associates proposes Global Reach, a methodology for companies to expand their Internet presence into a more international framework. This includes translating a website into other languages, actively promoting it and using local banner advertising to increase local website traffic in all on-line countries. Bill Dunlap explains:

"Promoting your website is at least as important as creating it, if not more important. You should be prepared to spend at least as much time and money in promoting your website as you did in creating it in the first place. With the "Global Reach" program, you can have it promoted in countries where English is not spoken, and achieve a wider audience... and more sales. There are many good reasons for taking the on-line international market seriously. "Global Reach" is a means for you to extend your website to many countries, speak to on-line visitors in their own language and reach on-line markets there."

In his e-mail of December 11, 1998, he also explains what the Internet has brought to his professional life:

"Since 1981, when my professional life started, I've been involved with bringing American companies in Europe. This is very much an issue of language, since the products and their marketing have to be in the languages of Europe in order for them to be visible here. Since the Web became popular in 1995 or so, I've turned these activities to their on-line dimension, and have come to champion European e-commerce among my fellow American compatriates. Most lately at Internet World in New York, I spoke about European e-commerce and how to use a website to address the various markets in Europe."

4.2. Machine Translation

Machine translation (MT) is the automated process of translating from one natural language to another. MT analyzes the text in the source language and automatically generates corresponding text in the target language.

Characterized by the absence of any human intervention during the translation process, machine translation (MT) is also called "fully automatic machine translation (FAMT)". It differs from "machine-aided human translation (MAHT)" or "computer-assisted translation (CAT)", which involves some interaction between the translator and the computer.

As SYSTRAN, a company specializing in translation software, explains on its website:

"Machine translation software translates one natural language into another natural language. MT takes into account the grammatical structure of each language and uses rules to transfer the grammatical structure of the source language (text to be translated) into the target language (translated text). MT cannot replace a human translator, nor is it intended to."

The European Association for Machine Translation (EAMT) gives the following definition:

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful for certain specific applications, usually in the domain of technical documentation. In addition, translation software packages which are designed primarily to assist the human translator in the production of translations are enjoying increasing popularity within professional translation organizations."

Machine translation is the earliest type of natural language processing. Here are the explanations given by Globalink:

"From the very beginning, machine translation (MT) and natural language processing (NLP) have gone hand-in-hand with the evolution of modern computational technology. The development of the first general-purpose programmable computers during World War II was driven and accelerated by Allied cryptographic efforts to crack the German Enigma machine and other wartime codes. Following the war, the translation and analysis of natural language text provided a testbed for the newly emerging field of Information Theory.

During the 1950s, research on Automatic Translation (known today as Machine Translation, or "MT") took form in the sense of literal translation, more commonly known as word-for-word translations, without the use of any linguistic rules.

The Russian project initiated at Georgetown University in the early 1950s represented the first systematic attempt to create a demonstrable machine translation system. Throughout the decade and into the 1960s, a number of similar university and government-funded research efforts took place in the United States and Europe. At the same time, rapid developments in the field of Theoretical Linguistics, culminating in the publication of Noam Chomsky's Aspects of the Theory of Syntax (1965), revolutionized the framework for the discussion and understanding of the phonology, morphology, syntax and semantics of human language.

In 1966, the U.S. government-issued ALPAC report offered a prematurely negative assessment of the value and prospects of practical machine translation systems, effectively putting an end to funding and experimentation in the field for the next decade. It was not until the late 1970s, with the growth of computing and language technology, that serious efforts began once again. This period of renewed interest also saw the development of the Transfer model of machine translation and the emergence of the first commercial MT systems.

While commercial ventures such as SYSTRAN and METAL began to demonstrate the viability, utility and demand for machine translation, these mainframe-bound systems also illustrated many of the problems in bringing MT products and services to market. High development cost, labor-intensive lexicography and linguistic implementation, slow progress in developing new language pairs, inaccessibility to the average user, and inability to scale easily to new platforms are all characteristics of these second-generation systems."
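The word-for-word approach of the 1950s mentioned in this account can be illustrated with a minimal Python sketch. The lexicon and the sample sentence are hypothetical and purely illustrative; they are not taken from any historical system. The point is only to show why translation "without the use of any linguistic rules" falls short: each word is replaced in place, so the source word order is carried over unchanged.

```python
# Hypothetical toy English-to-French lexicon (illustrative entries only).
LEXICON = {
    "the": "le",
    "black": "noir",
    "cat": "chat",
    "sleeps": "dort",
}


def word_for_word(sentence, lexicon):
    """Replace each source word by its dictionary equivalent.

    No linguistic rules are applied: the source word order is kept,
    and words missing from the lexicon are passed through unchanged.
    """
    output = []
    for word in sentence.lower().split():
        output.append(lexicon.get(word, word))
    return " ".join(output)


print(word_for_word("The black cat sleeps", LEXICON))
```

The result keeps the English adjective-noun order ("noir chat"), whereas French would normally place the adjective after the noun ("chat noir"). Handling such cases is what the later rule-based systems set out to do.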

A number of companies specialize in machine translation development, such as Lernout & Hauspie, Globalink, Logos and SYSTRAN.

Based in Ieper (Belgium) and Burlington (Massachusetts, USA), Lernout & Hauspie (L&H) is an international leader in the development of advanced speech technology for various commercial applications and products. The company offers four core technologies - automatic speech recognition (ASR), text-to-speech (TTS), text-to-text and digital speech compression. Its ASR, TTS and digital speech compression technologies are licensed to leading companies in the telecommunications, computers and multimedia, consumer electronics and automotive electronics industries. Its text-to-text (translation) services are provided to information technology (IT) companies and vertical and automation markets.

The Machine Translation Group of Lernout & Hauspie comprises enterprises that develop, produce, and market highly sophisticated machine translation systems: L&H Language Technology, AppTek, AILogic, NeocorTech and Globalink. Each is an international leader in its particular segment.

Founded in 1990, Globalink is a major U.S. company in language translation software and services, which offers customized translation solutions built around a range of software products, on-line options and professional translation services. The company publishes language translation software products in Spanish, French, Portuguese, German, Italian and English, and finds solutions to the translation problems of customers ranging from individuals and small businesses to multinational corporations and governments, whether a stand-alone product that gives a fast, draft translation or a full system to manage professional document translations. Globalink describes how its translation applications work on its website as follows:

"With Globalink's translation applications, the computer uses three sets of data: the input text, the translation program and permanent knowledge sources (containing a dictionary of words and phrases of the source language), and information about the concepts evoked by the dictionary and rules for sentence development. These rules are in the form of linguistic rules for syntax and grammar, and some are algorithms governing verb conjugation, syntax adjustment, gender and number agreement and word re-ordering.

Once the user has selected the text and set the machine translation process in motion the program begins to match words of the input text with those stored in its dictionary. Once a match is found, the application brings up a complete record that includes information on possible meanings of the word and its contextual relationship to other words that occur in the same sentence. The time required for the translation depends on the length of the text. A three-page, 750-word document takes about three minutes to render a first draft translation."
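The matching process described by Globalink can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not Globalink's actual implementation: the Record structure, the dictionary entries and the single word re-ordering rule are all hypothetical, and the timing helper simply assumes that translation time scales linearly from the quoted figure of 750 words in about three minutes.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical dictionary record: target word plus part of speech."""
    target: str
    pos: str  # "det", "adj", "noun", "verb" or "unknown"


# Hypothetical English-to-French dictionary (illustrative entries only).
DICTIONARY = {
    "the": Record("le", "det"),
    "black": Record("noir", "adj"),
    "cat": Record("chat", "noun"),
    "sleeps": Record("dort", "verb"),
}


def lookup(words, dictionary):
    """Match each input word against the dictionary and return its record.

    Words without an entry are passed through with pos "unknown".
    """
    return [dictionary.get(w, Record(w, "unknown")) for w in words]


def reorder_adjectives(records):
    """Example re-ordering rule: move an adjective after the noun it precedes."""
    result = list(records)
    i = 0
    while i < len(result) - 1:
        if result[i].pos == "adj" and result[i + 1].pos == "noun":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return result


def translate(sentence):
    """Dictionary lookup followed by the single re-ordering rule."""
    records = lookup(sentence.lower().split(), DICTIONARY)
    return " ".join(r.target for r in reorder_adjectives(records))


def estimated_minutes(word_count, words_per_minute=250):
    """Rough draft-translation time, assuming 750 words take 3 minutes."""
    return word_count / words_per_minute


print(translate("The black cat sleeps"))
print(estimated_minutes(750))
```

A real system would need many more rules (verb conjugation, gender and number agreement, and so on) and records rich enough to choose between several possible meanings of a word; the sketch only shows how a lookup step and a rule step fit together.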

Randy Hobler is a Marketing Consultant for Globalink. He is currently acting as the Product Marketing Manager for Globalink's suite of Internet-based products and services. In his e-mail of 3 September 1998, he wrote:

"85% of the content of the Web in 1998 is in English and going down. This trend is driven not only by more websites and users in non-English-speaking countries, but by increasing localization of company and organization sites, and increasing use of machine translation to/from various languages to translate websites.

Because the Internet has no national boundaries, the organization of users is bounded by other criteria driven by the medium itself. In terms of multilingualism, you have virtual communities, for example, of what I call "Language Nations"... all those people on the Internet wherever they may be, for whom a given language is their native language. Thus, the Spanish Language nation includes not only Spanish and Latin American users, but millions of Hispanic users in the US, as well as odd places like Spanish-speaking Morocco.

Language Transparency: We are rapidly reaching the point where highly accurate machine translation of text and speech will be so common as to be embedded in computer platforms, and even in chips in various ways. At that point, and as the growth of the Web slows, the accuracy of language translation hits 98% plus, and the saturation of language pairs has covered the vast majority of the market, language transparency (any-language-to-any-language communication) will be too limiting a vision for those selling this technology. The next development will be "transcultural, transnational transparency", in which other aspects of human communication, commerce and transactions beyond language alone will come into play. For example, gesture has meaning, facial movement has meaning and this varies among societies. The thumb-index finger circle means "OK" in the United States. In Argentina, it is an obscene gesture.

When the inevitable growth of multi-media, multi-lingual videoconferencing comes about, it will be necessary to "visually edit" gestures on the fly. The MIT Media Lab [MIT: Massachusetts Institute of Technology], Microsoft and many others are working on computer recognition of facial expressions, biometric access identification via the face, etc. It won't be any good for a U.S. business person to be making a great point in a Web-based multi-lingual video conference to an Argentinian, having his words translated into perfect Argentinian Spanish if he makes the "O" gesture at the same time. Computers can intercept this kind of thing and edit them on the fly.

There are thousands of ways in which cultures and countries differ, and most of these are computerizable to change as one goes from one culture to the other. They include laws, customs, business practices, ethics, currency conversions, clothing size differences, metric versus English system differences, etc., etc.

Enterprising companies will be capturing and programming these differences and selling products and services to help the peoples of the world communicate better. Once this kind of thing is widespread, it will truly contribute to international understanding."

Logos is an international company (US, Canada and Europe) that has specialized in machine translation for 25 years and provides various translation tools, machine translation systems and supporting services.

SYSTRAN (an acronym for System Translation) is a company specializing in machine translation software. SYSTRAN's headquarters are located in Soisy-sous-Montmorency, France. Sales and marketing, along with most development, operate out of its subsidiary in La Jolla, California. The SYSTRAN site gives an interesting overview of the company's history. One of the company's products is AltaVista Translation, an automatic translation service of English Web pages into French, German, Italian, Portuguese, or Spanish, and vice versa, which is available on the AltaVista site, the most frequently used search engine on the Web.

Based in Montreal, Canada, Alis Technologies is an international company specializing in the development and marketing of language handling solutions and services, particularly language implementation in the IT industry. Alis Translation Solutions (ATS) offers a wide selection of applications and languages, and multiple tools and services for the best possible translation quality. Language Technology Solutions (LTS) is devoted to commercializing advanced tools and services in the field of language engineering and information technology. Unilingual information systems are transformed into software that users can put to work in their own language (90 languages covered).

Other machine translation developments are SPANAM and ENGSPAN, fully automatic machine translation systems developed and maintained by the computational linguists, translators, and systems programmer of the Pan American Health Organization (PAHO), Washington, D.C. The PAHO Translation Unit has used SPANAM (Spanish to English) and ENGSPAN (English to Spanish) to process over 25 million words since 1980. Staff and free-lance translators post-edit the raw output to produce high-quality translations with a 30-50% gain in productivity.

The system is installed on a local area network at PAHO Headquarters and is used regularly by staff in the technical and administrative units. The software is also installed in a number of PAHO field offices and has been licensed to public and non-profit institutions in the US, Latin America, and Spain.

Some associations also contribute to machine translation development.

The Association for Computational Linguistics (ACL) is the main international scientific and professional society for people working on problems involving natural language and computation. Published by MIT Press, the ACL quarterly journal, Computational Linguistics (ISSN 0891-2017), continues to be the primary forum for research on computational linguistics and natural language processing.

The Finite String is its newsletter supplement. The European branch of ACL is the European Chapter of the Association for Computational Linguistics (EACL), which provides a regional focus for its members.
