Assamese Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-ASM-RAW-Speech-132
Sample Download | size: 1.3MB | type: zip
Added on: 26 Aug 2021

Dataset Description 

54:21:12 (hh:mm:ss) | 32.5 GB | 304 Speakers | 37,570 Audio Segments | 48 kHz | 16-bit WAV
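The stated audio format (48 kHz, 16 bit, WAV) can be checked on a downloaded segment. The following is a minimal sketch using Python's standard `wave` module; it assumes the files are plain PCM WAV, and the function names and the file name in the usage note are hypothetical, not part of the corpus documentation.

```python
import wave

EXPECTED_RATE_HZ = 48_000  # "48 kHz" from the description
EXPECTED_BITS = 16         # "16 bit" from the description


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8      # sample width is reported in bytes
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, bits, channels, duration


def matches_stated_format(path):
    """True if the file has the sample rate and bit depth stated on this page."""
    rate, bits, _, _ = describe_wav(path)
    return rate == EXPECTED_RATE_HZ and bits == EXPECTED_BITS
```

For example, `matches_stated_format("segment.wav")` would return `True` for a file that follows the stated format, where `segment.wav` stands in for any segment taken from the sample download.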

Assamese is the official language of Assam. It is widely spoken in the state of Assam and in some parts of Arunachal Pradesh and Nagaland. According to the 2011 census, Assamese is spoken by 15 million speakers. As a widely spoken language, Assamese has several dialectal variations. The regional dialects can be broadly divided into two groups: the Eastern Group and the Western Group. LDC-IL divided the Assamese-speaking areas into four regions (Xiboxagoria, Central Assam, Kamrupi and Goalparia) and collected speech data from speakers of each region. The LDC-IL Assamese speech data set consists of different types of datasets made up of word lists, sentences, running texts and date formats.


The available Speech Corpus details:

Total Speakers: 304 (154 Female and 150 Male)

Domains | Audio Segments (Each Domain) | Duration
Contemporary Text (News) | 304 | 17:23:25
Creative Text | 304 | 11:44:37
Sentence | 7593 | 5:55:29
Date Format | 599 | 0:33:59
Command and Control Words | 9118 | 4:56:49
Person Name | 6081 | 5:38:07
Place Name | 3044 | 1:58:33
Phonetically Balanced-W4 | 6567 | 3:41:45
Form and Function Word-W5 | 3960 | 2:28:28
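The per-domain figures can be cross-checked against the headline totals with a few lines of arithmetic. The sketch below copies the segment counts and durations from the table above; the variable and function names are illustrative only.

```python
# Per-domain figures copied from the table: (domain, audio segments, "H:MM:SS").
DOMAINS = [
    ("Contemporary Text (News)", 304, "17:23:25"),
    ("Creative Text", 304, "11:44:37"),
    ("Sentence", 7593, "5:55:29"),
    ("Date Format", 599, "0:33:59"),
    ("Command and Control Words", 9118, "4:56:49"),
    ("Person Name", 6081, "5:38:07"),
    ("Place Name", 3044, "1:58:33"),
    ("Phonetically Balanced-W4", 6567, "3:41:45"),
    ("Form and Function Word-W5", 3960, "2:28:28"),
]


def to_seconds(hms):
    """Convert an "H:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert a number of seconds back to "H:MM:SS"."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)

print(total_segments)         # 37570
print(to_hms(total_seconds))  # 54:21:12
```

Both results agree with the totals given in the dataset description (37,570 audio segments and 54:21:12).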


A detailed explanation of the Assamese Speech Corpus will be available in the Assamese Speech Data Documentation. 

For research citations, please cite the following:

  • Ramamoorthy L., Narayan Kumar Choudhary, Atreyee Sharma, Jahnobi Kalita, Samhita Bharadwaj, Plabita Bora, Priyanshee Adhyapak, Mustafiza Tamim, Rajesha N., Manasa G. 2021. Assamese Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Speech Data Attributes
Annotation: Raw Speech Corpus
Language: Assamese
Duration: 54:21:12
Speaker Type: Native
No. of Audio Segments: 37,570
Speaker Gender: Male and Female


Tags: Assamese, Raw Speech Corpus, Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.