Hindi Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-HIN-RAW-Speech-121
Sample Download | size: 1.4MB | type: zip
Added on: 29 Jul 2019

Hindi is a major Indo-Aryan language, a descendant of Sanskrit, spoken in central and northern India.

The LDC-IL Hindi speech data set contains about 118 hours of speech. It is made up of several types of recorded material: word lists, sentences, running texts and date formats.

Approximately 15 minutes of speech per speaker were recorded from 234 female and 255 male native speakers of different age groups. Each speaker recorded items from these datasets, randomly selected from a master dataset.

Corpus details:

  • 489 speakers in total (234 female and 255 male)
  • 73,695 audio segments
  • 78.6 GB of WAV files and metadata text files
  • 118:40:03 (hh:mm:ss) of speech data

Speech Data Attributes

  • Annotation: Raw Speech Corpus
  • Language: Hindi
  • Duration: 118:40:03
  • Speaker Type: Native
  • File Size: 78.6 GB
  • No. of Audio Segments: 73695
  • Speaker Gender: Male and Female
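
The figures listed above can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the corpus distribution; the helper name hms_to_seconds is hypothetical, and the only inputs are the numbers stated on this page. It derives the average amount of speech per speaker and the average segment length.

```python
# Consistency check on the published corpus statistics.
# All inputs are copied from the listing; nothing is read from the corpus itself.

def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

total_seconds = hms_to_seconds("118:40:03")  # total duration from the listing
speakers = 234 + 255                         # female + male speakers
segments = 73695                             # number of audio segments

minutes_per_speaker = total_seconds / speakers / 60
seconds_per_segment = total_seconds / segments
segments_per_speaker = segments / speakers

print(f"Speakers: {speakers}")
print(f"Average speech per speaker: {minutes_per_speaker:.1f} min")
print(f"Average segment length: {seconds_per_segment:.1f} s")
print(f"Average segments per speaker: {segments_per_speaker:.1f}")
```

With the published numbers this gives 489 speakers, roughly 14.6 minutes of speech per speaker (in line with the "approximately 15 minutes" stated above), and an average segment length of about 5.8 seconds.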

Tags: Hindi, Raw Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.