Bodo Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-BRX-RAW-Speech-120
Sample Download | size: 2MB | type: zip
Added on: 29 Jul 2019

176:53:28 hours of speech data (113 GB) | 456 speakers | 77443 audio segments | 48 kHz | 16-bit WAV

Bodo, one of the scheduled languages of India, is one of the tonal languages of the world. It has two clearly distinguishable tones, known as Low and High. The language belongs to the Tibeto-Burman linguistic family. It is the language of the Bodos, a major tribe of the Indian state of Assam.

The LDC-IL speech data was collected in the Chirang, Baksa, Sonitpur, Udalguri, Kamrup, Barpeta, and Kokrajhar districts of the state of Assam, India, and covers the Bwrdwnari, Eastern, and Standard dialects. The data was collected from speakers of both genders and of different age groups.

The LDC-IL Bodo speech dataset consists of several types of data: word lists, sentences, running texts, and date formats.

Details of the available speech corpus:

    • Total of 456 speakers (220 female and 236 male)
    • Contemporary Text (News) - 411 Audio Segments - 53:47:56 Hours
    • Creative Text - 413 Audio Segments - 26:43:07 Hours
    • Sentence - 10257 Audio Segments - 9:38:58 Hours
    • Date - 938 Audio Segments - 1:16:54 Hours
    • Command and Control Words - 12348 Audio Segments - 14:19:32 Hours
    • Person Name - 8222 Audio Segments - 14:49:44 Hours
    • Place Name - 4115 Audio Segments - 5:17:14 Hours
    • Most Frequent Word-Part - 12397 Audio Segments - 14:34:05 Hours
    • Most Frequent Word-Full - 15999 Audio Segments - 20:07:33 Hours
    • Phonetically Balanced - 5960 Audio Segments - 7:50:00 Hours
    • Form and Function Word - 6383 Audio Segments - 8:28:25 Hours
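
The per-category figures above can be cross-checked against the stated totals (77443 audio segments, 176:53:28 hours). The following is a minimal Python sketch of that arithmetic. The names `CATEGORIES`, `to_seconds`, and `to_hms` are illustrative and not part of the corpus distribution; the numbers are copied from the list above.

```python
# Per-category figures copied from the listing: (category, segments, "H:MM:SS").
CATEGORIES = [
    ("Contemporary Text (News)", 411, "53:47:56"),
    ("Creative Text", 413, "26:43:07"),
    ("Sentence", 10257, "9:38:58"),
    ("Date", 938, "1:16:54"),
    ("Command and Control Words", 12348, "14:19:32"),
    ("Person Name", 8222, "14:49:44"),
    ("Place Name", 4115, "05:17:14"),
    ("Most Frequent Word-Part", 12397, "14:34:05"),
    ("Most Frequent Word-Full", 15999, "20:07:33"),
    ("Phonetically Balanced", 5960, "7:50:00"),
    ("Form and Function Word", 6383, "8:28:25"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to an 'H:MM:SS' string."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in CATEGORIES)
total_duration = to_hms(sum(to_seconds(d) for _, _, d in CATEGORIES))

print(total_segments)  # 77443
print(total_duration)  # 176:53:28
```

Both sums agree with the totals given in the summary line and in the attribute list.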

Speech Data Attributes
  • Annotation: Raw Speech Corpus
  • Language: Bodo
  • Duration: 176:53:28 hours
  • Speaker Type: Native
  • File Size: 113 GB
  • No. of Audio Segments: 77443
  • Speaker Gender: Male and Female
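
After downloading, a file can be checked against the listed format (48 kHz, 16 bit wav) with Python's standard `wave` module. This is only a sketch: it assumes the files are uncompressed PCM WAV, which the listing does not state explicitly, and the function names and the file path passed in are hypothetical.

```python
import wave

# Format values taken from the listing; PCM encoding is an assumption.
EXPECTED_SAMPLE_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def matches_listed_format(path: str) -> bool:
    """Return True if the WAV header reports 48 kHz, 16-bit samples."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == EXPECTED_SAMPLE_RATE_HZ
            and wav_file.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )


def duration_seconds(path: str) -> float:
    """Return the duration of the audio in seconds, read from the header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

Calling `matches_listed_format` on each extracted segment (for example, on the files in the sample zip) gives a quick sanity check before any further processing.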

Tags: Boro, Bodo, Raw Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.