Odia Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-ORI-RAW-Speech-137
Sample Download | size: 1.4MB | type: zip
Added on: 27 Aug 2021

Dataset Description 

138:06:18 hours | 89 GB | 474 Speakers | 73,418 Audio segments | 48 kHz | 16 bit wav

Odia is an Indo-Aryan language spoken mainly in the state of Odisha and also in some of the bordering states, such as West Bengal, Jharkhand, Chhattisgarh and Andhra Pradesh. The Govt. of India has accorded it Classical Language status. The LDC-IL Odia speech data was collected in the Central and Northern parts of Odisha from speakers of both genders and different age groups. The data comprises several types of datasets: word lists, sentences, running texts and date formats.
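The headline figures above can be related to one another with a little arithmetic. The following is a rough sketch only: it assumes uncompressed PCM audio and ignores WAV header overhead, and because this page does not state the channel count, it computes the size for both one and two channels. The helper names (`to_seconds`, `pcm_bytes`) are illustrative and not part of any LDC-IL tooling.

```python
# Rough size estimate for uncompressed PCM audio.
# Assumptions (not stated on this page): no compression, header overhead
# ignored, and the channel count is unknown, so both 1 and 2 are tried.

SAMPLE_RATE_HZ = 48_000      # 48 kHz, from the listing
BYTES_PER_SAMPLE = 2         # 16 bit, from the listing
TOTAL_DURATION = "138:06:18" # hours:minutes:seconds, from the listing


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pcm_bytes(duration_s: int, channels: int) -> int:
    """Bytes of raw PCM data for the given duration and channel count."""
    return duration_s * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels


duration_s = to_seconds(TOTAL_DURATION)
for channels in (1, 2):
    size = pcm_bytes(duration_s, channels)
    print(f"{channels} channel(s): {size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

Comparing the printed values with the listed 89 GB gives a plausibility check on the download size before fetching the full corpus.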



The available Speech Corpus details:

Total Speakers: 474 (239 Female and 235 Male)



Domains                        | Audio Segments (Each Domain) | Duration (h:mm:ss)
Contemporary Text (News)       | 449                          | 42:49:56
Creative Text                  | 450                          | 19:43:50
Sentence                       | 11,248                       | 8:22:57
Date Format                    | 900                          | 1:27:49
Command and Control Words      | 13,499                       | 14:18:49
Person Name                    | 8,998                        | 5:01:40
Place Name                     | 4,496                        | 13:22:45
Most Frequent Word - Part      | 8,994                        | 9:40:04
Most Frequent Word - Full Set  | 10,989                       | 10:21:04
Phonetically Balanced          | 10,438                       | 10:05:10
Form and Function - Word       | 2,957                        | 2:52:14
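The per-domain figures listed above can be checked against the corpus totals (73,418 audio segments and 138:06:18 hours). The following is a minimal sketch of that check; the values are copied from this page, and the helper names (`to_seconds`, `to_hms`) are illustrative only.

```python
# Per-domain figures copied from this page: (domain, segments, "H:MM:SS").
DOMAINS = [
    ("Contemporary Text (News)", 449, "42:49:56"),
    ("Creative Text", 450, "19:43:50"),
    ("Sentence", 11_248, "8:22:57"),
    ("Date Format", 900, "1:27:49"),
    ("Command and Control Words", 13_499, "14:18:49"),
    ("Person Name", 8_998, "5:01:40"),
    ("Place Name", 4_496, "13:22:45"),
    ("Most Frequent Word - Part", 8_994, "9:40:04"),
    ("Most Frequent Word - Full Set", 10_989, "10:21:04"),
    ("Phonetically Balanced", 10_438, "10:05:10"),
    ("Form and Function - Word", 2_957, "2:52:14"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Format a number of seconds as 'H:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(d) for _, _, d in DOMAINS))

print(total_segments)   # 73418
print(total_duration)   # 138:06:18
```

Both sums agree with the totals given in the dataset description.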



A detailed explanation of the Odia Speech Corpus will be available in the Odia Raw Speech Data Documentation.

For research citations, please cite the following:

  • Ramamoorthy, L., Narayan Choudhary, Raja Kumar Naik, Pramod Kumar Rout, Kshirod Kumar Das & Santosh Kumar Mohanty. 2021. Odia Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview”. In Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

 

Speech Data Attributes
Annotation: Raw Speech Corpus
Language: Odia
Duration: 138:06:18
Speaker Type: Native
No. of Audio Segments: 73,418
Speaker Gender: Male and Female

Tags: Odia, Raw Speech Corpus, Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.