About Oriental COCOSDA

Promoting speech research and coordination of spoken language corpora for Oriental languages since 1997

What is Oriental COCOSDA?

Oriental COCOSDA (O-COCOSDA) was originally the Oriental branch of COCOSDA, the International Committee for the Coordination and Standardisation of Speech Databases and Assessment Techniques.

Oriental COCOSDA was set up at the beginning of the 1990s so that people involved in spoken language processing could exchange ideas, share information, and discuss regional matters on all issues related to the creation, use, and dissemination of spoken language corpora of Oriental languages. It also aimed to launch the design of assessment methods for speech recognition and synthesis systems, and to promote speech research on Oriental languages.

Since its independent establishment in 1997, O-COCOSDA has pursued the primary goal of fostering the exchange of ideas, sharing insights, and discussing regional matters related to the creation, use, and distribution of spoken language corpora for Oriental languages.

Today, O-COCOSDA is an independent organization that promotes speech research in Asia, with minimal ties to COCOSDA or other regional groups. It also focuses on assessing speech technologies, such as speech recognition and synthesis systems, for Oriental languages.

🌏 Conference History

The annual Oriental COCOSDA International Conference is the flagship event of O-COCOSDA.

The first preparatory meeting took place in Hong Kong in 1997, and since then, 27 workshops have been hosted in various countries, including:

🇯🇵 Japan
🇹🇼 Taiwan
🇨🇳 China
🇰🇷 Korea
🇹🇭 Thailand
🇸🇬 Singapore
🇮🇳 India
🇮🇩 Indonesia
🇲🇾 Malaysia
🇻🇳 Vietnam
🇳🇵 Nepal
🇲🇴 Macau
🇲🇲 Myanmar
🇵🇭 Philippines

🎯 Background & Purpose

It is well understood that large amounts of speech data of various kinds must be collected and maintained, with unrestricted access, so that they can be used for research and development as well as for assessing recognizer performance.

Why Speech Corpora Matter

🔬 Research Repeatability: Using common speech corpora increases the repeatability and objectivity of speech research

🌍 Cultural Preservation: From a linguistic or cultural viewpoint, it is important to preserve speech data of various languages, especially those that are becoming extinct

⏰ Urgency: Many local languages or dialects are disappearing by the day

Hence there is a pressing need to preserve natural recordings of such languages. This is another important purpose of speech databases.


Figure: The necessity and purpose of speech corpora

🎯 Our Missions

O-COCOSDA supports the development of spoken language resources and speech technology evaluation.

Resource Development

Promoting the development of distinctive types of spoken language data corpora for the purpose of building and/or evaluating current or future spoken language technology.

Research Coordination

Coordinating projects and research efforts to improve their efficiency.

📋 Strategy

Technical interests are organized on both a country and a topical basis.

Country Representation

Each country is represented on the central committee by country rapporteurs.

Topic Domains

Each agreed topic domain is represented by a topic domain rapporteur.

Synergy

Interaction between regional and topical rapporteurs provides the basis for promotion and coordination activities informed by both local and global expertise.

🔬 Topic Domains

O-COCOSDA supports the development of new topic domains based on technological needs.

Current Topic Areas

Our focus areas include:

🎤 Speech recognition
🗣️ Speech synthesis
🏷️ Speech classification
📚 Speech corpora
🔧 Corpus annotation tools
🌏 Local languages

Criteria for New Domains

A new topic domain is warranted by a new speech technology application ONLY if that application places new demands on:

📊 Data corpora form and structure

🧪 Technology evaluation approaches

Open Documentation

We’re open to topic domains that relate to the formal documentation of spoken language without reference to any specific technological application.

Avoiding Redundancy

If a new topic domain proposal overlaps with an existing one, combining them into a single topic domain will be considered.