Bengali Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-BEN-RAW-Speech-119
Sample Download | size: 1.9MB | type: zip
Added on: 29 Jul 2019

Bengali is the official language of West Bengal and Tripura. It belongs to the Indo-Aryan language family.

The LDC-IL Bengali speech dataset consists of several types of word lists along with sentence lists, running text and date formats. Approximately 15 minutes of speech per speaker was recorded from 223 female and 227 male native speakers across different age groups. Each speaker recorded items randomly selected from a master dataset. In addition to these random sets, the database contains some full sets, in which a speaker uttered a complete set of words.


Corpus details:

  • Total number of speakers: 450 (random set) and 28 (full set)
  • Total audio segments: 73399
  • Total duration: 128:40:17 (hh:mm:ss)
  • Total volume: 81.7 GB of WAV files and metadata text files
  • Age groups: 16 to 20, 21 to 50, 51 and above
  • Recording format: WAV, 16-bit
  • Sampling frequency: 48.0 kHz
Speech Data Attributes:

  • Annotation: Raw Speech Corpus
  • Language: Bengali
  • Duration: 130:11:14
  • Speaker Type: Native
  • File Size: 81.7 GB
  • No. of Audio Segments: 73399
  • Speaker Gender: Male and Female
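
The recording parameters listed above (16-bit WAV at 48.0 kHz) can be checked on a downloaded file before further processing. The following is a minimal sketch using only Python's standard `wave` module. The function names and the file name in the usage note are hypothetical and not part of the corpus distribution, and the channel count is not checked because this page does not state it.

```python
import wave


def wav_matches_spec(path, sample_rate=48000, sample_width_bytes=2):
    """Return True if the WAV file has the expected sample rate and bit depth.

    The defaults mirror the values listed on this page: 48.0 kHz and
    16-bit samples (2 bytes per sample).
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == sample_rate
            and wav.getsampwidth() == sample_width_bytes
        )


def duration_to_seconds(text):
    """Convert an 'H:MM:SS' duration string such as '128:40:17' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds
```

For example, `wav_matches_spec("segment.wav")` (a hypothetical file name) would return `False` for a file that had been resampled to 16 kHz, and `duration_to_seconds` turns the listed durations into a plain number of seconds for further arithmetic.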

Tags: Bengali, Raw Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.