Indian English Raw Speech Corpus - Kannada Variant
  • Contributor: CIIL Mysore
  • Product Code: CIIL-KAN-RAW-Speech-141
Sample Download | size: 2.8MB | type: zip
Added on: 27 Aug 2021
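The dataset description gives the audio format as 48 kHz, 16-bit WAV. The minimal sketch below checks that every `.wav` file inside a downloaded zip archive, such as the sample listed above, matches that format. It uses only the standard `zipfile` and `wave` modules. The function name `check_wav_format` is hypothetical, and the layout of files inside the archive is not documented here. The `wave` module reads plain PCM WAV files; older Python versions raise `wave.Error` on WAVE_FORMAT_EXTENSIBLE files.

```python
import io
import wave
import zipfile


def check_wav_format(zip_bytes: bytes, rate: int = 48000, width: int = 2) -> dict:
    """Return {member_name: passed} for every .wav member of a zip archive.

    A file passes if its sample rate equals `rate` (Hz) and its sample
    width equals `width` bytes (2 bytes = 16-bit). Hypothetical helper.
    """
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".wav"):
                continue  # skip folders, text files and other non-audio members
            with archive.open(name) as member:
                # Buffer the member in memory so wave gets a seekable stream
                with wave.open(io.BytesIO(member.read()), "rb") as wav:
                    results[name] = (
                        wav.getframerate() == rate and wav.getsampwidth() == width
                    )
    return results
```

For example, `check_wav_format(open("sample.zip", "rb").read())` (the local file name is an assumption) returns one `True`/`False` entry per WAV file.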

Dataset Description

23:43:04 hours | 15.3 GB | 56 speakers | 14,455 audio segments | 48 kHz | 16-bit WAV
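As a rough check on these headline figures, uncompressed 16-bit PCM at 48 kHz takes 48,000 × 2 bytes per second per channel. This page does not state the channel count, so the sketch below treats mono and stereo both as assumptions. The helper names are hypothetical. With two channels, 23:43:04 of audio comes to about 15.27 GiB (16.39 GB), which matches the listed 15.3 GB if that figure is in binary units. With one channel the size is half that. This is only an estimate and does not confirm the corpus's actual channel layout.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to whole seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def pcm_size_bytes(seconds: int, rate: int = 48000, bits: int = 16, channels: int = 1) -> int:
    """Raw PCM payload size in bytes (WAV headers ignored)."""
    return seconds * rate * (bits // 8) * channels


total = hms_to_seconds("23:43:04")
for ch in (1, 2):  # assumption: the channel count is not given on this page
    size = pcm_size_bytes(total, channels=ch)
    print(f"{ch} channel(s): {size / 1e9:.2f} GB, {size / 2**30:.2f} GiB")
```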

The English language is rooted in Anglo-Saxon, the prominent language of Britain in the Middle Ages, and colonists carried it to every corner of the world. English is the most visible legacy of British rule in India: the country was under British rule for almost two centuries, and English remains part of its education system. Most Indian states use their own regional languages and lack a common language with one another, so English serves as the language of inter-state communication.

LDC-IL has about 23 hours of Indian English speech data in the Kannada variant. The LDC-IL Indian English speech data set consists of several types of material: word lists, sentences, texts and date formats. Approximately 15 minutes of speech per speaker was recorded from 56 speakers (29 female and 27 male) of different age groups, all with Kannada as their mother tongue. Each speaker recorded items randomly selected from a master dataset.

Corpus details:

Total speakers: 56 (29 female, 27 male)


Domain | Audio Segments | Duration (h:mm:ss)
Contemporary Text (News) | 52 | 7:19:31
Creative Text | 58 | 3:57:15
Sentence | 1522 | 1:54:10
Date Format | 106 | 0:04:32
Command and Control Words | 2543 | 1:55:43
Person Name | 2040 | 0:39:43
Place Name | 762 | 2:38:49
Most Frequent Word - Part | 1563 | 1:09:10
Most Frequent Word - Full Set | 3999 | 2:49:55
Phonetically Balanced | 1194 | 0:49:21
Form and Function - Word | 616 | 0:24:55



A detailed explanation of the Indian English Raw Speech Corpus - Kannada Variant will be available in the Indian English Raw Speech Corpus - Kannada Variant Documentation. 

For research citations, please cite:

  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview.” In Linguistic Resources for AI/NLP in Indian Languages, pp. 160-174. Central Institute of Indian Languages, Mysore.

Speech Data Attributes

Annotation: Raw Speech Corpus
Language: Indian English (Kannada variant)
Duration: 23:43:04
Speaker type: Native (Kannada mother tongue)
No. of audio segments: 14,455
Speaker gender: Male and female


Tags: Indian English, Raw Speech Corpus, Kannada Variant, Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.