Dogri Raw Speech Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-DOI-RAW-Speech-133
Added on : 26 Aug 2021

Dataset Description

17:10:26 Hours | 11 GB speech data | 61 Speakers | 12,036 Audio segments | 48 kHz | 16 bit wav. 
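The stated audio format (48 kHz, 16 bit wav) can be checked on any downloaded segment. The following is a minimal sketch using only Python's standard `wave` module; the function names and the synthetic test file are illustrative assumptions, not part of the corpus tooling, and the channel count is reported rather than checked because the description does not state it.

```python
import io
import wave

# Values taken from the dataset description above.
EXPECTED_RATE_HZ = 48_000
EXPECTED_BITS = 16


def describe_wav(source):
    """Read header fields from a WAV file (str path or binary file object)."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate_hz": rate,
        "bits_per_sample": bits,
        "channels": channels,
        "duration_s": frames / rate,
    }


def matches_corpus_format(info):
    """True if the sample rate and bit depth match the stated format."""
    return (
        info["sample_rate_hz"] == EXPECTED_RATE_HZ
        and info["bits_per_sample"] == EXPECTED_BITS
    )


def make_silent_wav(seconds=1, rate=EXPECTED_RATE_HZ):
    """Hypothetical helper: build an in-memory mono 16-bit WAV of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Replace make_silent_wav() with the path to a real segment from the corpus.
    info = describe_wav(make_silent_wav(seconds=2))
    print(info, matches_corpus_format(info))
```

In practice, `describe_wav` would be called with the path of an extracted `.wav` file; the in-memory file is only there so the sketch runs without the corpus.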

 

Dogri, the language of the Dogras, belongs to the Indo-Aryan group and is the first major language of the multilingual Jammu region of the state of Jammu & Kashmir. It derives its name from ‘Duggar’, the ancient title of this region. Dogri is a morphologically rich language with a predominant Subject-Object-Verb (SOV) word order and, like many Indian languages, the flexibility to rearrange constituents.

Dogri had its own script, “Dogare Akkhar” or “Dogare”, based on the Takri script, which is closely related to the Sharada script used for Kashmiri. It was the official script during the reign of Maharaja Ranbir Singh (1857-1885 AD). After independence, the state government constituted a committee on 29 October 1953, headed by Sh. Girdhari Lal Dogra. The committee presented a report, and on that basis the state government decided to adopt both the Devanagari and the Persian script for Dogri; this decision was incorporated in the State Constitution in 1957.

 

The LDC-IL speech data was collected in Jammu from speakers of both genders and different age groups. The LDC-IL Dogri speech dataset consists of several types of data: words, sentences, running texts and date formats. Each speaker recorded items randomly selected from a master dataset.

The available Speech Corpus details:

Total Speakers: 61 (30 Female and 31 Male)

Domains                        | Audio Segments (Each Domain) | Duration
Contemporary Text (News)       | 60                           | 4:27:51
Creative Text                  | 61                           | 2:51:42
Sentence                       | 1527                         | 1:24:48
Date Format                    | 122                          | 0:14:07
Command and Control Words      | 1830                         | 1:24:31
Person Name                    | 1222                         | 1:23:41
Place Name                     | 609                          | 0:29:10
Most Frequent Word - Part      | 1831                         | 1:18:06
Most Frequent Word - Full Set  | 2000                         | 1:16:27
Phonetically Balanced          | 2050                         | 1:50:38
Form and Function - Word       | 724                          | 0:29:25
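The per-domain figures listed above can be cross-checked against the headline totals (12,036 audio segments, 17:10:26 hours). The following is a small sketch of that arithmetic; the variable and function names are illustrative, and the numbers are copied from the listing above.

```python
from datetime import timedelta

# (domain, audio segments, duration as h:mm:ss), copied from the listing above.
DOMAINS = [
    ("Contemporary Text (News)", 60, "4:27:51"),
    ("Creative Text", 61, "2:51:42"),
    ("Sentence", 1527, "1:24:48"),
    ("Date Format", 122, "0:14:07"),
    ("Command and Control Words", 1830, "1:24:31"),
    ("Person Name", 1222, "1:23:41"),
    ("Place Name", 609, "0:29:10"),
    ("Most Frequent Word - Part", 1831, "1:18:06"),
    ("Most Frequent Word - Full Set", 2000, "1:16:27"),
    ("Phonetically Balanced", 2050, "1:50:38"),
    ("Form and Function - Word", 724, "0:29:25"),
]


def parse_hms(text):
    """Convert an 'h:mm:ss' string to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta):
    """Format a timedelta as 'h:mm:ss' with total hours (no day rollover)."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in DOMAINS)
total_duration = sum((parse_hms(d) for _, _, d in DOMAINS), timedelta())

print(total_segments)              # 12036
print(format_hms(total_duration))  # 17:10:26
```

Both sums agree with the totals stated in the dataset description.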

 

A detailed explanation of the Dogri Speech Corpus will be available in the Dogri Raw Speech Documentation. 

For research citations, please use the following:

  • Narayan Kumar Choudhary, Sunil Kumar Choudhary, Rajesha N., Manasa G. 2021. Dogri Raw Speech Corpus. Central Institute of Indian Languages, Mysore.

  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Speech Data Attributes
  • Annotation: Raw Speech Corpus
  • Language: Dogri
  • Duration: 17:10:26
  • Speaker Type: Native
  • No. of Audio Segments: 12036
  • Speaker Gender: Male and Female

      Tags: Dogri, Raw Speech Corpus, Speech Corpus

      Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.