ML: "How do you see multilingualism on the Web?"
BL: "Although English is still the most important language used on the Web, and the Internet in general, I believe that multilingualism is an inevitable part of the future direction of cybers.p.a.ce.
Here are some of the important developments that I see as making a multilingual Web become a reality:
a) Popularization of information technology
Computer technology has traditionally been the sole domain of a "techie" elite, fluent in both complex programming languages and in English -- the universal language of science and technology. Computers were never designed to handle writing systems that couldn't be translated into ASCII. There wasn't much room for anything other than the 26 letters of the English alphabet in a coding system that originally couldn't even recognize acute accents and umlauts -- not to mention nonalphabetic systems like Chinese.
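To see that constraint concretely: 7-bit ASCII defines only 128 code points, so accented or nonalphabetic text simply has no representation in it. A minimal Python sketch (the sample strings are invented for illustration):

    # 7-bit ASCII covers only 128 code points: unaccented Latin
    # letters, digits and basic punctuation.
    "resume".encode("ascii")          # fine: plain English letters

    try:
        "résumé".encode("ascii")      # acute accents have no ASCII slot
    except UnicodeEncodeError as err:
        print(err)                    # ordinal not in range(128)

    try:
        "中文".encode("ascii")         # nonalphabetic scripts, even less so
    except UnicodeEncodeError as err:
        print(err)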
But tradition has been turned upside down. Technology has been popularized. GUIs (graphical user interfaces) like Windows and Macintosh have hastened the process (and indeed it's no secret that it was Microsoft's marketing strategy to use their operating system to make computers easy to use for the average person).
These days this ease of use has spread beyond the PC to the virtual, networked space of the Internet, so that now nonprogrammers can even insert Java applets into their webpages without understanding a single line of code.
b) Competition for a chunk of the "global market" by major industry players
An extension of (local) popularization is the export of information technology around the world. Popularization has now occurred on a global scale and English is no longer necessarily the lingua franca of the user. Perhaps there is no true lingua franca, but only the individual languages of the users. One thing is certain -- it is no longer necessary to understand English to use a computer, nor is it necessary to have a degree in computer science.
A pull from non-English-speaking computer users and a push from technology companies competing for global markets have made localization a fast-growing area in software and hardware development. This development has not been as fast as it could have been. The first step was for ASCII to become Extended ASCII. This meant that computers could begin to recognize the accents and symbols used in variants of the English alphabet -- mostly used by European languages.
But only one language could be displayed on a page at a time.
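This was the "code page" problem: each Extended ASCII variant reassigned character slots 128-255 to a single script, so two scripts could not coexist in one 8-bit document. A small illustrative sketch in Python, using the standard Latin-1 (Western European) and ISO 8859-5 (Cyrillic) code pages (the sample words are invented):

    # Each 8-bit "Extended ASCII" code page reuses slots 128-255
    # for one script only.
    "Müller".encode("latin-1")        # Western European accents: fine
    "Щука".encode("iso8859-5")        # Cyrillic in its own code page: fine

    try:
        "Müller и Щука".encode("latin-1")   # mixing scripts on one page
    except UnicodeEncodeError as err:
        print(err)                    # Cyrillic letters have no Latin-1 slot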
c) Technological developments
The most recent development is Unicode. Although still evolving and only just being incorporated into the latest software, this new coding system encodes each character in 16 bits. Whereas 8-bit Extended ASCII could only handle a maximum of 256 characters, Unicode can handle over 65,000 unique characters and therefore potentially accommodate all of the world's writing systems on the computer.
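The arithmetic behind these figures: 8 bits allow 2^8 = 256 distinct characters, while 16 bits allow 2^16 = 65,536. The sketch below illustrates the difference with UTF-16, the modern descendant of the 16-bit scheme described here (the mixed-script string is an invented example):

    # 8 bits give 2**8 = 256 possible characters; 16 bits give
    # 2**16 = 65,536 -- room for many scripts in a single encoding.
    print(2 ** 8, 2 ** 16)                  # 256 65536

    mixed = "English, 中文, 日本語, 한국어, Русский"
    data = mixed.encode("utf-16")           # five scripts, one encoding
    print(data.decode("utf-16") == mixed)   # round-trips losslessly: True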
So now the tools are more or less in place. They are still not perfect, but at last we can at least surf the Web in Chinese, Japanese, Korean, and numerous other languages that don't use the Western alphabet. As the Internet spreads to parts of the world where English is rarely used -- such as China, for example -- it is natural that Chinese, and not English, will be the preferred choice for interacting with it. For the majority of the users in China, their mother tongue will be the only choice.
There is a change-over period, of course. Much of the technical terminology on the Web is still not translated into other languages. And as we found with our Multilingual Glossary of Internet Terminology -- known as NetGlos -- the translation of these terms is not always a simple process. Before a new term becomes accepted as the "correct" one, there is a period of instability where a number of competing candidates are used. Often an English loanword becomes the starting point -- and in many cases the endpoint. But eventually a winner emerges that becomes codified into published technical dictionaries as well as the everyday interactions of the nontechnical user. The latest version of NetGlos is the Russian one and it should be available in a couple of weeks or so [end of September 1998]. It will no doubt be an excellent example of the ongoing, dynamic process of "Russification" of Web terminology.
d) Linguistic democracy
Whereas "mother-tongue education" was deemed a human right for every child in the world by a UNESCO report in the early "50s, "mother-tongue surfing" may very well be the Information Age equivalent. If the Internet is to truly become the Global Network that it is promoted as being, then all users, regardless of language background, should have access to it. To keep the Internet as the preserve of those who, by historical accident, practical necessity, or political privilege, happen to know English, is unfair to those who don"t.
e) Electronic commerce
Although a multilingual Web may be desirable on moral and ethical grounds, such high ideals are not enough to make it other than a reality on a small scale. As well as the appropriate technology being available so that the non-English speaker can go online, there is the impact of "electronic commerce" as a major force that may make multilingualism the most natural path for cyberspace.
Sellers of products and services in the virtual global marketplace into which the Internet is developing must be prepared to deal with a virtual world that is just as multilingual as the physical world. If they want to be successful, they had better make sure they are speaking the languages of their customers!"
ML: "What did the Internet bring to the life of your organization?"
BK: "Our main service is providing language instruction via the Web. Our company is in the unique position of having come into existence BECAUSE of the Internet!"
ML: "How do you see the future of Internet-related activities as regards languages?"
BK: "As a company that derives its very existence from the importance attached to languages, I believe the future will be an exciting and challenging one. But it will be impossible to be complacent about our successes and accomplishments.
Technology is already changing at a frenetic pace. Life-long learning is a strategy that we all must use if we are to stay ahead and be competitive. This is a difficult enough task in an English-speaking environment. If we add in the complexities of interacting in a multilingual/multicultural cyberspace, then the task becomes even more demanding. As well as competition, there is also the necessity for cooperation -- perhaps more so than ever before.
The seeds of cooperation across the Internet have certainly already been sown.
Our NetGlos Project has depended on the goodwill of volunteer translators from Canada, U.S., Austria, Norway, Belgium, Israel, Portugal, Russia, Greece, Brazil, New Zealand and other countries. I think the hundreds of visitors who come to the NetGlos pages every day are an excellent testimony to the success of these types of working relationships. I see the future depending even more on cooperative relationships -- although not necessarily on a volunteer basis."
3.4. Textual Databases
Let us take the example of two textual databases relating to the French language -- the French FRANTEXT and the US-French ARTFL Project.
The FRANTEXT textual database has been available on the Web through subscription since the beginning of 1995. It is prepared in France by the Institut national de la langue française (INaLF) (National Institute of the French Language), a section of the Centre national de la recherche scientifique (CNRS) (National Center for Scientific Research). This interactive database includes 180 million words resulting from the automatic processing of a collection of 3,500 texts in arts, techniques and sciences, representing five centuries of literature (16th-20th centuries).
At the beginning of 1998, 82 research centers and university libraries in Europe, Australia, Canada and Japan were subscribing to FRANTEXT, with 1,250 workstations connected to the database, and about 50 search sessions per day. The detailed results of the survey sent to FRANTEXT users in January 1998 are presented on the website by Arlette Attali.
In the future, Arlette Attali is thinking about "contributing to the development of the linguistic tools associated with the FRANTEXT database and making them known to teachers, researchers and students." In her e-mail of June 11, 1998, she also explained the changes brought by the Internet in her professional life:
"As I was more specially a.s.signed to the development of textual databases at the INaLF, I had to explore the websites giving access to electronic texts and test them. I became a "textual tourist" with the good and bad sides of this activity.
The tendency to go quickly from one link to another, and to skip through the information, was a permanent danger -- it is necessary to target what you are looking for if you don't want to waste your time. The use of the Web totally changed my working methods -- my investigations are no longer only bookish and confined to a narrow circle; on the contrary, they are expanding thanks to the electronic texts available on the Internet."
The ARTFL Project (ARTFL: American and French Research on the Treasury of the French Language) is a cooperative project established in 1981 by the Institut national de la langue française (INaLF) (National Institute of the French Language, based in France) and the Division of the Humanities of the University of Chicago. Its purpose is to be a research tool for scholars and students in all areas of French studies.
The origin of the project is a 1957 initiative of the French government to create a new dictionary of the French language, the Trésor de la Langue Française (Treasure of the French Language). In order to provide access to a large body of word samples, it was decided to transcribe an extensive selection of French texts for use with a computer. Twenty years later, a corpus totaling some 150 million words had been created, representing a broad range of written French -- from novels and poetry to biology and mathematics -- stretching from the 17th to the 20th centuries.
This corpus of French texts was an important resource not only for lexicographers, but also for many other types of humanists and social scientists engaged in French studies -- on both sides of the Atlantic. The result of this realization was the ARTFL Project, as explained on its website:
"At present the corpus consists of nearly 2,000 texts, ranging from cla.s.sic works of French literature to various kinds of non-fiction prose and technical writing. The eighteenth, nineteenth and twentieth centuries are about equally represented, with a smaller selection of seventeenth century texts as well as some medieval and Renaissance texts. We have also recently added a Provencal database that includes 38 texts in their original spellings. Genres include novels, verse, theater, journalism, essays, correspondence, and treatises.
Subjects include literary criticism, biology, history, economics, and philosophy. In most cases standard scholarly editions were used in converting the text into machine-readable form, and the data contain page references to these editions."
One of the largest of its kind in the world, the ARTFL database permits both the rapid exploration of single texts and inter-textual research of a kind virtually impossible without a computer.
ARTFL is now on the Web, and the system is available through the Internet to its subscribers. Access to the database is organized through a consortium of user inst.i.tutions, in most cases universities and colleges which pay an annual subscription fee.
The ARTFL Encyclopédie Project is currently developing an on-line version of Diderot and d'Alembert's Encyclopédie, ou Dictionnaire raisonné des sciences, des arts et des métiers, including all 17 volumes of text and 11 volumes of plates from the first edition, that is to say about 18,000 pages of text and exactly 20,736,912 words.
Published under the direction of Diderot between 1751 and 1772, the Encyclopédie counted among its contributors the most prominent philosophers of the time: Voltaire, Rousseau, d'Alembert, Marmontel, d'Holbach, Turgot, etc.
"These great minds (and some lesser ones) collaborated in the goal of a.s.sembling and disseminating in clear, accessible prose the fruits of acc.u.mulated knowledge and learning. Containing 72,000 articles written by more than 140 contributors, the Encyclopedie was a ma.s.sive reference work for the arts and sciences, as well as a machine de guerre which served to propagate Enlightened ideas [...] The impact of the Encyclopedie was enormous, not only in its original edition, but also in multiple reprintings in smaller formats and in later adaptations. It was hailed, and also persecuted, as the sum of modern knowledge, as the monument to the progress of reason in the eighteenth century. Through its attempt to cla.s.sify learning and to open all domains of human activity to its readers, the Encyclopedie gave expression to many of the most important intellectual and social developments of its time."
At present, while work continues on the fully navigational, full-text version, ARTFL is providing public access on its website to the Prototype Demonstration of Volume One. From autumn 1998, a preliminary version will be released for consultation by all ARTFL subscribers.
Mentioned on the ARTFL home page in the Reference Collection, other ARTFL projects are: the 1st (1694) and 5th (1798) editions of the Dictionnaire de l'Académie française; Jean Nicot's Trésor de la langue française (1606) dictionary; Pierre Bayle's Dictionnaire historique et critique (1740 edition) (text of an image-only version); The Wordsmyth English Dictionary-Thesaurus; Roget's Thesaurus, 1911 edition; Webster's Revised Unabridged Dictionary; the French Bible by Louis Segond and parallel Bibles in German, Latin, and English, etc.
Created by Michael S. Hart in 1971, Project Gutenberg was the first information provider on the Internet. It is now the oldest digital library on the Web, and the biggest in terms of the number of works (1,500) that have been digitized for it, with 45 new titles per month. Michael Hart's purpose is to put on the Web as many literary texts as possible for free.
In his e-mail of August 23, 1998, Michael S. Hart explained:
"We consider e-text to be a new medium, with no real relationship to paper, other than presenting the same material, but I don"t see how paper can possibly compete once people each find their own comfortable way to e-texts, especially in schools. [...] My own personal goal is to put 10,000 e-texts on the Net, and if I can get some major support, I would like to expand that to 1,000,000 and to also expand our potential audience for the average e-text from 1.x% of the world population to over 10%... thus changing our goal from giving away 1,000,000,000,000 e-texts to 1,000 time as many... a trillion and a quadrillion in US terminology."
Project Gutenberg is now developing its foreign collections, as announced in the Newsletter of October 1997. In the Newsletter of March 1998, Michael S. Hart mentioned that Project Gutenberg's volunteers were now working on e-texts in French, German, Portuguese and Spanish, and he was also hoping to get some e-texts in the following languages: Arabic, Chinese, Danish, Dutch, Esperanto, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Latin, Lithuanian, Polish, Romanian, Russian, Slovak, Slovene, and Valencian (Catalan).
3.5. Terminological Databases
The free consultation of terminological databases on the Web is much appreciated by language specialists. There are several terminological databases maintained by international organizations, such as: Eurodicautom, maintained by the Translation Service of the European Commission; ILOTERM, maintained by the International Labour Organization (ILO); the ITU Telecommunication Terminology Database (TERMITE), maintained by the International Telecommunication Union (ITU); and the WHO Terminology Information System (WHOTERM), maintained by the World Health Organization (WHO).