Multilingualism on the Web

Chapter 7


That means that the barrier between personal information (your phone lists and diary) and non-personal information (Seneca and Moses) will be overcome, so that you can get to both types anytime. I would love to have something that tells me, when next I am at a conference and someone steps up, smiling to say hello, who this person is, where last I met him/her, and what we said then!

But that is the future. Today, the Web has made big changes in the way I shop (I spent 20 minutes looking for plane routes for my next trip with a difficult transition on the Web, instead of waiting for my secretary to ask the travel agent, which takes a day). I look for information on anything I want to know about, instead of having to make a trip to the library and look through complicated indexes. I send e-mail to you about this question, at a time that is convenient for me, rather than your having to make a phone appointment and then us talking for 15 minutes. And so on."

The Computing Research Laboratory (CRL) at New Mexico State University (NMSU) is a non-profit research enterprise committed to basic research and software development in advanced computing applications concentrated in the areas of natural language processing, artificial intelligence and graphical user interface design. Applications developed from basic research endeavors include a variety of configurations of machine translation, information extraction, knowledge acquisition, intelligent teaching, and translator workstation systems.

Maintained by the Department of Linguistics of the Translation Research Group of Brigham Young University (BYU), Utah, TTT.org (Translation, Theory and Technology) provides information about language theory and technology, particularly relating to translation. Translation technology includes translator workbench tools and machine translation. In addition to translation tools, TTT.org is interested in data exchange standards that allow various tools to interoperate, so that tools from multiple vendors can be integrated in the multilingual document production chain.

In the area of data exchange standards, TTT.org is actively involved in the development of MARTIF (machine-readable terminology interchange format). MARTIF is a format to facilitate the interchange of terminological data among terminology management systems. This format is the result of several years of intense international collaboration among terminologists and database experts from various organizations, including academic institutions, the Text Encoding Initiative (TEI), and the Localisation Industry Standards Association (LISA).

5.2. Computational Linguistics

The Laboratoire de recherche appliquee en linguistique informatique (RALI) (Laboratory of Applied Research in Computational Linguistics) is a laboratory of the University of Montreal, Quebec. The RALI's personnel includes computer scientists and linguists experienced in natural language processing, both in classical symbolic methods and in newer probabilistic methods.

Thanks to the Incognito laboratory, which was founded in 1983, the University of Montreal's Computer Science and Operational Research Department (DIRO) established itself as a leading research centre in the area of natural language processing. In June 1997, Industry Canada agreed to transfer to the DIRO all the activities of the machine-aided translation program (TAO), which had been conducted at the Centre for Information Technology Innovation (CITI) since 1984.

A new laboratory -- the RALI -- was opened in order to promote and develop the results of the CITI's research, allowing the members of the former TAO team to pursue their work within the university community. The RALI's areas of expertise include work in: automatic text alignment, automatic text generation, automatic reaccentuation, language identification and finite state transducers.

The RALI produces the "TransX family" of what it calls "a new generation" of translation support tools (TransType, TransTalk, TransCheck and TransSearch), which are based on probabilistic translation models that automatically calculate the correspondences between the text produced by a translator and the original source language text.

"TransType speeds up the keying-in of a translation by anticipating a translator's choices and critiquing them when appropriate. In proposing its suggestions, TransType takes into account both the source text and the partial translation that the translator has already produced.

TransTalk is an automatic dictation system that makes use of a probabilistic translation model in order to improve the performance of its voice recognition model.

TransCheck automatically detects certain types of translation errors by verifying that the correspondences between the segments of a draft and the segments of the source text respect well-known properties of a good translation.

TransSearch allows translators to search databases of pre-existing translations in order to find ready-made solutions to all sorts of translation problems. In order to produce the required databases, the translations and the source language texts must first be aligned."

Some of RALI's other projects are:

- the SILC Project, concerning language identification. When a document is submitted to the system, SILC attempts to determine what language the document is written in and the character set in which it is encoded.

- the Finite Automata Package (FAP), a project concerning finite-state transducers. The finite-state automaton is a simple and efficient computational device for describing sequences of symbols (words, characters, etc.) known as the regular languages. The finite-state transducer is a device for linking pairs of these sequences under the control of a grammar of local correspondences, and thus provides a means of rewriting one sequence as another. Applications of these techniques in NLP include: dictionaries, morphological analysis, part-of-speech tagging, syntactic analysis, and speech processing.
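The idea of a finite-state transducer rewriting one sequence as another can be illustrated with a minimal sketch in Python. This is not the FAP's own code or interface: the rewriting rule (turn "n" into "m" when it directly precedes "p"), the transition table and the function name transduce are all hypothetical, chosen only to show how a local correspondence is encoded as states and transitions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy rule: rewrite "n" as "m" when it directly precedes "p".
# Key: (state, input symbol); None stands for "any other symbol".
# Value: (next state, output template), where "{}" is the current input symbol.
TRANSITIONS: Dict[Tuple[int, Optional[str]], Tuple[int, str]] = {
    (0, "n"): (1, ""),       # hold the "n" back until the next symbol is known
    (0, None): (0, "{}"),    # copy every other symbol unchanged
    (1, "p"): (0, "mp"),     # the held "n" precedes "p": emit "m", then "p"
    (1, "n"): (1, "n"),      # emit the held "n", hold the new one
    (1, None): (0, "n{}"),   # no "p" followed: emit the held "n" unchanged
}

# Output flushed when the input ends in a given state.
FINAL_OUTPUT: Dict[int, str] = {0: "", 1: "n"}


def transduce(text: str) -> str:
    """Run the toy transducer over text and return the rewritten sequence."""
    state = 0
    output = []
    for symbol in text:
        key = (state, symbol) if (state, symbol) in TRANSITIONS else (state, None)
        state, template = TRANSITIONS[key]
        output.append(template.replace("{}", symbol))
    output.append(FINAL_OUTPUT[state])
    return "".join(output)
```

With this table, transduce("input") gives "imput", while a word such as "banana", which contains no "np" pair, passes through unchanged. Real morphological analysers use far larger tables, usually compiled from rule grammars rather than written by hand.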

The Xerox Palo Alto Research Center (PARC) runs two main projects concerning languages: Inter-Language Unification (ILU) and Natural Language Theory and Technology (NLTT).

The Inter-Language Unification (ILU) System is a multi-language object interface system. The object interfaces provided by ILU hide implementation distinctions between different languages, between different address spaces, and between operating system types. ILU can be used to build multilingual object-oriented libraries ("class libraries") with well-specified language-independent interfaces. It can also be used to implement distributed systems, or to define and document interfaces between the modules of non-distributed programs.

The goal of Natural Language Theory and Technology (NLTT) is to develop theories of how information is encoded in natural language and technologies for mapping information to and from natural language representations. This will enable the efficient and intelligent handling of natural language text in critical phases of document processing, such as recognition, summarizing, indexing, fact extraction and presentation, document storage and retrieval, and translation. It will also increase the power and convenience of communicating with machines in natural language.

Based in Cambridge, United Kingdom, and Grenoble, France, the Xerox Research Centre Europe (XRCE) is also a research organization of the international company Xerox. It focuses on increasing productivity in the workplace through new document technologies, with several tools and projects relating to languages.

One of Xerox's research activities is MultiLingual Theory and Technology (MLTT), which studies how to analyze and generate text in many languages (English, French, German, Italian, Spanish, Russian, Arabic, etc.). The MLTT team creates basic tools for linguistic analysis, e.g. morphological analysers, parsing and generation platforms and corpus analysis tools. These tools are used to develop descriptions of various languages and the relation between them. Currently under development are phrasal parsers for French and German, a lexical functional grammar (LFG) for French and projects on multilingual information retrieval, translation and generation.

Founded in 1979, the American Association for Artificial Intelligence (AAAI) is a non-profit scientific society devoted to advancing the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines. AAAI also aims to increase public understanding of artificial intelligence, improve the teaching and training of AI practitioners, and provide guidance for research planners and funders concerning the importance and potential of current AI developments and future directions.

The Institut Dalle Molle pour les etudes semantiques et cognitives (ISSCO) (Dalle Molle Institute for Semantic and Cognitive Studies) is a research laboratory attached to the University of Geneva, Switzerland, which conducts basic and applied research in computational linguistics (CL) and artificial intelligence (AI). The site gives a presentation of the ISSCO projects (European projects, projects of the Swiss National Science Foundation, projects of the French-speaking community, etc.).

Created by the Foundation Dalle Molle in 1972 for research into cognition and semantics, ISSCO has come to specialize in natural language processing and, in particular, in multilingual language processing, in a number of areas: machine translation, linguistic environments, multilingual generation, discourse processing, data collection, etc. The University of Geneva provides administrative support and infrastructure for ISSCO. The research is funded solely by grants and by contracts with public and private bodies.

ISSCO is multi-disciplinary and multi-national, "drawing its staff and its visitors from the disciplines of computer science, linguistics, mathematics, psychology and philosophy. The long-term staff of the Institute is relatively small in number; with a much larger number of visitors coming for stays ranging from a month to two years. This ensures a continual exchange of ideas and encourages flexibility of approach amongst those associated with the Institute."

The International Conferences on Computational Linguistics (COLINGs) are organized every two years by the International Committee on Computational Linguistics (ICCL).

"The International Committee on Computational Linguistics was set up by David Hays in the mid-Sixties as a permanent body to run international computational linguistics conferences in an original way, with no permanent secretariat, subscriptions or funds. It was ahead of its time in that and other ways. COLING has always been distinguished by pleasant venues and atmosphere, rather than by the clinical efficiency of an airport conference hotel: COLINGs are simply nice conferences to be at. [...] In recent years, the ACL [Association for Computational Linguistics] has given great assistance and cooperation in keeping COLING proceedings available and distributed."

5.3. Language Engineering

Launched in January 1999 by the European Commission, the website HLTCentral (HLT: Human Language Technologies) gives a short definition of language engineering:

"Through language engineering we can find ways of living comfortably with technology. Our knowledge of language can be used to develop systems that recognise speech and writing, understand text well enough to select information, translate between different languages, and generate speech as well as the printed word.

By applying such technologies we have the ability to extend the current limits of our use of language. Language enabled products will become an essential and integral part of everyday life."

A full presentation of language engineering can be found in Language Engineering: Harnessing the Power of Language.

From 1992 to 1998, the Language Engineering Sector was part of the Telematics Applications Programme of the European Commission. Its aim was to facilitate the use of telematics applications and to increase the possibilities for communication in and between European languages. RTD (research and technological development) work focused on pilot projects that integrated language technologies into information and communications applications and services. A key objective was to improve their ease of use and functionality and broaden their scope across different languages.

In January 1999, the Language Engineering Sector was rebranded as Human Language Technologies (HLT), a sector of the IST Programme (IST: Information Society Technologies) of the European Commission for 1999-2002. HLTCentral has been set up by the LINGLINK Project as the springboard for access to Language Technology resources on the Web: information, news, downloads, links, events, discussion groups and a number of specially-commissioned studies (e-commerce, telecommunications, Call Centres, Localization, etc.).

The Multilingual Application Interface for Telematic Services (MAITS) is a consortium formed to specify an applications programming interface (API) for multilingual applications in telematic services. A number of telematic applications, such as X.500, WWW, X.400, Internet mail and databases, are to be enhanced to use this i18n API, and products built on the API are planned.

FRANCIL (Reseau francophone de l'ingenierie de la langue) (Francophone Network in Language Engineering) is a programme launched in June 1994 by the Agence universitaire de la francophonie (AUPELF-UREF) (University Agency for Francophony) to strengthen activities in linguistic engineering, particularly for automatic language processing. This quickly-growing sector includes research and development for text analysis and generation, and for speech recognition, comprehension and synthesis. It also includes some applications in the following fields: document management, communication between the human being and the machine, writing aid, and computer-assisted translation.

5.4. Internationalization and Localization

"Towards communicating on the Internet in any language..." Babel is an Alis Technologies/Internet Society joint project to internationalize the Internet.

Its multilingual site (English, French, German, Italian, Portuguese, Spanish and Swedish) has two main sections: languages (the world's languages; typographical and linguistic glossary; Francophonie (French-speaking countries)) and the Internet and multilingualism (developing your multilingual Web site; coding the world's writing).

The Localisation Industry Standards Association (LISA) is a major organization for the localization and internationalization industry. The current membership of 130 leading players from all around the world includes software publishers, hardware manufacturers, localization service vendors, and an increasing number of companies from related IT sectors. LISA defines its mission as "promoting the localization and internationalization industry and providing a mechanism and services to enable companies to exchange and share information on the development of processes, tools, technologies and business models connected with localization, internationalization and related topics". Its site is housed and maintained by the University of Geneva, Switzerland.

W3C Internationalization/Localization is part of the World Wide Web Consortium (W3C), an international industry consortium founded in 1994 to develop common protocols for the World Wide Web. The site gives in particular a definition of protocols used for internationalization/localization: HTML; base character set; new tags and attributes; HTTP; language negotiation; URLs & other identifiers including non-ASCII characters; etc. It also offers some help with creating a multilingual site.
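Language negotiation, one of the HTTP mechanisms listed above, can be illustrated with a short sketch in Python. A browser sends an Accept-Language header listing the languages its user prefers, each with an optional quality weight (q), and the server picks the best match among the versions of a page it actually holds. The function names parse_accept_language and choose_language are hypothetical, and the matching is deliberately simplified (exact tag first, then primary language only); it does not claim to reproduce the full matching rules of the HTTP specification.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language tag, quality) pairs, highest quality first."""
    preferences = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        preferences.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags of equal quality keep their header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_language(header: str, available: List[str], default: str) -> str:
    """Pick the best available language for a given Accept-Language header."""
    by_lowercase = {lang.lower(): lang for lang in available}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in by_lowercase:
            return by_lowercase[tag]
        primary = tag.split("-")[0]  # e.g. "fr-ca" falls back to "fr"
        if primary in by_lowercase:
            return by_lowercase[primary]
    return default
```

For a site available in English and French, choose_language("fr-CA,fr;q=0.8,en;q=0.5", ["en", "fr"], "en") returns "fr": the Canadian French request falls back to the generic French version rather than to the English default.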
