Towards a WELD Charter: a vision for language documentation
A Charter for the WELD paradigm would include at least the following five benchmark principles of comprehensiveness, efficiency, state of the art, affordability and fairness:
Language documentation must be comprehensive. In principle this means that language documentation must apply to all languages. But economy is a component of efficiency, and priorities must be set which may be hard to justify in social or political terms: if one language is more similar to a well-documented language than another is, then priority must go to the second.
Language documentation must be efficient. Simple, workable, efficient and inexpensive enabling technologies must be developed, and new applications for existing technologies created, which will empower local academic communities to multiply the human resources available for the task. A model of this kind of development is provided by the Simputer ("Simple Computer") handheld Community Digital Assistant (CDA) enterprise of the "Bangalore Seven" in India (see http://www.simputer.org/), which could easily be incorporated into European and US project funding.
Language documentation must be state-of-the-art. In addition to using modern exchange formats and compatibility-enhancing archiving technologies such as XML and schema languages, efficient language documentation requires the deployment of state-of-the-art techniques from computational linguistics, human language technologies and artificial intelligence, for instance the use of machine learning techniques for lexicon construction and grammar induction. The SIL organisation, for example, has a long history of applying advanced computational linguistic methodologies (see www.sil.org), and more research is needed here.
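To make the exchange-format idea concrete, here is a minimal sketch in Python of how one lexicon entry might be serialised as XML for archiving and interchange. The element and attribute names (entry, form, gloss) are illustrative only, not a real documentation schema, and the example word is invented:

```python
import xml.etree.ElementTree as ET

# A minimal, illustrative XML interchange record for one lexicon entry.
# "mis" is the ISO 639-3 identifier for an uncoded language; the element
# names below are hypothetical, not drawn from any published schema.
entry = ET.Element("entry", attrib={"lang": "mis"})
ET.SubElement(entry, "form").text = "kaku"
ET.SubElement(entry, "gloss", attrib={"lang": "en"}).text = "to write"

xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

In practice a schema language (e.g. XML Schema or RELAX NG) would constrain such records, so that archives from different projects remain machine-comparable.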
Language documentation must be affordable. In order to achieve a multiplier effect, and at the same time benefit education, research and development world-wide, local conditions must be taken into account. Traditional colonial policies of presenting "white elephants" to local communities, which must be expensively cared for and then rapidly become dysfunctional, must be replaced by inexpensive dissemination methods - at third-world Internet prices, it can cost hundreds of Euros to download a large, modern software package (not counting landline interruptions), and net-based registration and support is unthinkably costly, as is wireless data transfer.
Language documentation must be fair. If a language community shares its most valuable commodity, its language, with the rest of the world, then the human language engineering and computational linguistic communities must do likewise, and provide open source software (also to reap the other well-known potential benefits of open source software such as transparency and reliability). The Simputer Public Licence for hardware and the Gnu Public Licence for software are useful references. The development and deployment of proprietary software (and hardware for that matter) and closed websites in this topic domain is a form of exploitation which is ethically comparable to other forms of one-way exploitation in biology and geology, for example in medical ethnobotany and oil prospecting.[cont]
Tuesday, November 25, 2008
Monday, November 17, 2008
World Centre for Language Documentation
Official Launch of the World Centre for Language Documentation
16-05-2007 (Paris)
Debbie Garside, CEO, WLDC
© UNESCO
The World Language Documentation Centre (WLDC), which comprises world-renowned experts in language technologies, linguistics, terminology standardisation, and localisation, was officially launched on 9 May 2007 at the offices of UNESCO in Paris.
The aims and objectives of the WLDC are wide and far-reaching and include the promotion of multilingualism in cyberspace and the maintenance and sustainability of the wealth of information about the languages of the world. Developed countries may think of the Web as ubiquitous, but there is a distinct lack of content in a majority of the world's languages.
The predominance of English in readable Web content is gradually diminishing, but as a variety of studies have demonstrated, the Web is not a reliable surrogate for the actual distribution of languages in the world.
In a number of cases, this is because the capability for representing these languages and the variety within these languages is lacking.
The launch of a World Centre is due, in part, to a significant expansion of a series of international standards that are fundamental to many information systems, and to the need to bring together a broad range of linguistic and technical expertise.
The International Organization for Standardization (ISO) publishes the standards that define identifiers, referred to by some as metadata, such as “en” and “fr”, used in computer systems to stand for “English” and “French”, respectively.
Some web search engines, Accoona for example, allow users to restrict their searches to pages tagged with these language identifiers.
Until this year, there were about 400 such identifiers in ISO standards; early in 2007 this number was expanded to over 7,500, and 2008 is expected to see it grow well beyond 30,000.
The reason for this significant expansion is to allow for the identification of languages in all their written, spoken and signed varieties.
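The relationship between the older two-letter identifiers and the expanded three-letter set can be sketched with a tiny, illustrative lookup. The tables below contain only a hand-picked handful of entries, not the full ISO registries:

```python
# Illustrative subsets of the ISO language-identifier registries.
# ISO 639-1 uses two-letter codes; ISO 639-3 uses three-letter codes
# and covers thousands more languages, including signed languages.
ISO_639_1 = {"en": "English", "fr": "French"}
ISO_639_3 = {
    "eng": "English",
    "fra": "French",
    "yid": "Yiddish",
    "ase": "American Sign Language",
}

def language_name(tag):
    """Resolve a two- or three-letter identifier to a language name."""
    if len(tag) == 2:
        return ISO_639_1.get(tag)
    return ISO_639_3.get(tag)

print(language_name("en"))   # English
print(language_name("ase"))  # American Sign Language
```

The point of the expanded code set is visible even in this toy example: "ase" identifies a signed language that the old two-letter inventory could not name at all.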
Until now, ISO standards have only catered for a small proportion of languages.
These new ISO standards make it possible to index and retrieve the content of a truly diverse and multilingual information society, and enable the future development of technologies with greater language-targeting features.
Work is already in progress in the Internet community, through the Internet Engineering Task Force (IETF), to make use of these emerging standards, and discussions are already underway in relation to the so-called "Multilingual Internet", described by some as a major element of the Next Generation Internet.
Participants during the meeting
© UNESCO
Related themes/countries
· France
· Multilingualism in Cyberspace