About Oriental COCOSDA

Promoting speech research and coordination of spoken language corpora for Oriental languages since 1997

What is Oriental COCOSDA?

Oriental COCOSDA (O-COCOSDA) was originally the Oriental branch of COCOSDA, the International Committee for the Coordination and Standardisation of Speech Databases and Assessment Techniques.

Established in 1997, its primary goal is to foster the exchange of ideas, share insights, and discuss regional matters related to the creation, use, and distribution of spoken language corpora for Oriental languages.

O-COCOSDA is now independent, with minimal ties to COCOSDA or other regional groups. It also focuses on assessing speech recognition and synthesis systems and on promoting speech research in Oriental languages.

Conference History

The annual Oriental COCOSDA International Conference is the flagship event of O-COCOSDA.

The first preparatory meeting took place in Hong Kong in 1997, and since then, 27 workshops have been hosted in various countries, including:

🇯🇵 Japan
🇹🇼 Taiwan
🇨🇳 China
🇰🇷 Korea
🇹🇭 Thailand
🇸🇬 Singapore
🇮🇳 India
🇮🇩 Indonesia
🇲🇾 Malaysia
🇻🇳 Vietnam
🇳🇵 Nepal
🇲🇴 Macau
🇲🇲 Myanmar
🇵🇭 Philippines

🎯 Background & Purpose

It is well understood that large amounts of speech data of various kinds must be collected and maintained, with unrestricted access, so that the data can be used for research and development as well as for assessing recognizer performance.

Why Speech Corpora Matter

🔬 Research Repeatability: Using common speech corpora increases the repeatability and objectivity of speech research

Cultural Preservation: From a linguistic or cultural viewpoint, it is important to preserve speech data of various languages, especially those that are becoming extinct

⏰ Urgency: Many local languages or dialects are disappearing by the day

Hence there is a pressing need to preserve a natural record of such languages. This is another important purpose of speech databases.

Figure: The necessity and purpose of speech corpora

🎯 Our Missions

O-COCOSDA supports the development of spoken language resources and speech technology evaluation.

Resource Development

Promoting the development of distinctive types of spoken language data corpora for the purpose of building and/or evaluating current or future spoken language technology.

Research Coordination

Offering coordination of projects and research efforts to improve their efficiency.

📋 Strategy

Technical interests are organized on both a country and a topical basis.

Country Representation

Each country is represented on the central committee by country rapporteurs.

Topic Domains

Each agreed topic domain is represented by a topic domain rapporteur.

Synergy

Interaction between regional and topical rapporteurs provides the basis for promotion and coordination activities informed by both local and global expertise.

🔬 Topic Domains

O-COCOSDA supports the development of new topic domains based on technological needs.

Current Topic Areas

Our focus areas include:

🎤 Speech recognition
🗣️ Speech synthesis
🏷️ Speech classification
📚 Speech corpora
🔧 Corpus annotation tools
Local languages

Criteria for New Domains

A new topic domain is warranted by a new speech technology application ONLY if that application places new demands on:

📊 Data corpora form and structure

🧪 Technology evaluation approaches

Open Documentation

We're open to topic domains that relate to the formal documentation of spoken language without reference to any specific technological application.

Avoiding Redundancy

If redundancy is seen between a new topic domain proposal and an existing one, a combined topic domain will be considered.