Punjabi Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-PAN-RAW-Speech-129
Sample Download | size: 3MB | type: zip
Added on: 29 Jul 2019

Punjabi is an Indo-Aryan language. It is tonal, with three tones: high-falling, low-rising, and level (neutral).

101:09:28 hours of Punjabi speech data | 76,240 audio segments | 467 speakers | 65.5 GB | 48 kHz | 16-bit WAV
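As a minimal sketch, a downloaded segment can be checked against the stated audio format (48 kHz, 16 bit) with Python's standard `wave` module. The function names `describe_wav` and `matches_listing` are hypothetical helpers introduced here for illustration, and the file path is whatever location the corpus was unpacked to; the listing does not state the channel count, so the sketch reports it without checking it.

```python
import wave

EXPECTED_RATE_HZ = 48_000      # 48 kHz, as stated in the listing
EXPECTED_BITS_PER_SAMPLE = 16  # 16 bit, as stated in the listing


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return rate, bits, channels, frames / rate


def matches_listing(path):
    """True if the file has the sample rate and bit depth given in the listing."""
    rate, bits, _channels, _duration = describe_wav(path)
    return rate == EXPECTED_RATE_HZ and bits == EXPECTED_BITS_PER_SAMPLE
```

Calling `matches_listing` on each unpacked file is one way to spot segments that were resampled or converted after download.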

The LDC-IL Punjabi speech data set contains 101 hours of recordings. It consists of several types of datasets: word lists, sentences, running texts and date formats.

Speech recordings were taken from 234 female and 233 male native speakers of different age groups. Each speaker recorded datasets that were randomly selected from a master dataset.

Corpus details:

    •          Total of 467 speakers (234 female and 233 male)
    •          Contemporary Text (News) - 448 Audio Segments - 27:07:41 hours
    •          Created Text - 446 Audio Segments - 19:29:15 hours
    •          Date - 887 Audio Segments - 00:27:53 hours
    •          Sentence - 11,168 Audio Segments - 08:58:33 hours
    •          Command and Control Words - 13,274 Audio Segments - 07:49:16 hours
    •          Place Name - 4,473 Audio Segments - 03:17:02 hours
    •          Person Names - 8,949 Audio Segments - 10:28:40 hours
    •          Most Frequent Word-Part - 8,889 Audio Segments - 05:21:56 hours
    •          Most Frequent Word-Full - 3,988 Audio Segments - 02:52:44 hours
    •          Phonetically Balanced Vocabulary - 13,939 Audio Segments - 08:56:04 hours
    •          Form and Function Word - 9,779 Audio Segments - 06:24:07 hours
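
The per-category figures above can be tallied with a short script. This is a sketch only: the `CATEGORIES` table is copied by hand from the list, and `to_seconds` and `to_hms` are hypothetical helper names. The segment counts add up to the 76,240 segments stated for the corpus; the per-category durations as listed add up to 101:13:11, a few minutes more than the stated overall duration of 101:09:28, and the listing does not explain the difference.

```python
# (category, audio segments, duration as HH:MM:SS), copied from the listing
CATEGORIES = [
    ("Contemporary Text (News)", 448, "27:07:41"),
    ("Created Text", 446, "19:29:15"),
    ("Date", 887, "00:27:53"),
    ("Sentence", 11168, "08:58:33"),
    ("Command and Control Words", 13274, "07:49:16"),
    ("Place Name", 4473, "03:17:02"),
    ("Person Names", 8949, "10:28:40"),
    ("Most Frequent Word-Part", 8889, "05:21:56"),
    ("Most Frequent Word-Full", 3988, "02:52:44"),
    ("Phonetically Balanced Vocabulary", 13939, "08:56:04"),
    ("Form and Function Word", 9779, "06:24:07"),
]


def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert a number of seconds back to an HH:MM:SS string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _name, count, _duration in CATEGORIES)
total_seconds = sum(to_seconds(duration) for _name, _count, duration in CATEGORIES)

print(total_segments)         # 76240
print(to_hms(total_seconds))  # 101:13:11
```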

Speech Data Attributes

  • Annotation: Raw Speech Corpus
  • Language: Punjabi
  • Duration: 101:09:28
  • Speaker Type: Native
  • File Size: 65.5 GB
  • No. of Audio Segments: 76,240
  • Speaker Gender: Male and Female

Tags: Punjabi, Raw Speech Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.