Hindi Raw Speech Corpus

Contributor: CIIL Mysore
Product Code: CIIL-HIN-RAW-Speech-121

Sample Download | size: 1.4MB | type: zip

Added on: 29 Jul 2019

Hindi is a major Indo-Aryan language, descended from Sanskrit, and is spoken mainly in central and northern India.

The LDC-IL Hindi speech corpus contains about 118 hours of speech data. It consists of several types of datasets: word lists, sentences, running texts and date formats.

Approximately 15 minutes of speech per speaker were recorded from 234 female and 255 male native speakers of different age groups. Each speaker recorded datasets that were randomly selected from a master dataset.

Corpus details:

A total of 489 speakers (234 female and 255 male)
73695 audio segments
78.6 gigabytes of WAV files and Metadata Text Files
118:40:03 (hh:mm:ss) of speech data
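
The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the corpus distribution: the function and variable names are illustrative assumptions, and the only inputs are the numbers listed on this page. It derives the average amount of speech per speaker, which can be compared against the stated "approximately 15 minutes", along with the average segment length and segments per speaker.

```python
# Sanity check of the corpus statistics listed on this page.
# All names here are hypothetical; only the numeric values come from the listing.

def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

TOTAL_DURATION = "118:40:03"   # total speech duration (hh:mm:ss)
SPEAKERS = 234 + 255           # female + male speakers
SEGMENTS = 73695               # number of audio segments

total_seconds = hms_to_seconds(TOTAL_DURATION)
minutes_per_speaker = total_seconds / SPEAKERS / 60
seconds_per_segment = total_seconds / SEGMENTS
segments_per_speaker = SEGMENTS / SPEAKERS

print(f"Speakers: {SPEAKERS}")
print(f"Average speech per speaker: {minutes_per_speaker:.1f} min")
print(f"Average segment length: {seconds_per_segment:.1f} s")
print(f"Average segments per speaker: {segments_per_speaker:.1f}")
```

Under these inputs the average comes to roughly 14.6 minutes per speaker, which agrees with the approximate figure quoted in the description. These are averages only; the listing does not state how duration is distributed across individual speakers.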

Speech Data Attributes
Annotation	Raw Speech Corpus
Language	Hindi
Duration	118:40:03
Speaker Type	Native
File Size	78.6 GB
No. of Audio Segments	73695
Speaker Gender	Male and Female

Tags: Hindi, Raw Speech Corpus
