Resources

Here the term Resources refers to a set of speech or language data and descriptions in machine-readable form, intended for building, improving, or evaluating natural language and speech algorithms or systems.

NLTM Pilot TTS Data for Indian Languages — Hindi, Punjabi, Tamil, and Indian English.

TTS data for Indian languages (Hindi, Punjabi, Tamil, and Indian English): text and the corresponding speech data recorded in a studio environment...

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 423.2MB | type: zip

Added on : 16 Aug 2021

Indian English ASR Challenge Data (ASR Speech Data released under 3rd Challenge) - NLTMP

The data set comprises English read and conversational speech data along with the corresponding transcriptions. This speech data was collected by S..

Available Under License:
Research  

Added on : 26 Jul 2021

Hindi ASR Challenge Data (ASR Speech Data released under 3rd Challenge) - NLTMP

The data set comprises Hindi read and conversational speech data along with the corresponding transcriptions. This speech data was collected by Spe..

Available Under License:
Research  

Added on : 26 Jul 2021

English-Hindi, Tamil-Telugu Parallel Data Developed Under PSA Pilot

English-Hindi, Tamil-Telugu parallel data developed under the PSA Pilot on SSMT, led by IIIT-Hyderabad..

Available Under License:
CC BY-NC-SA 4.0  

Sample Download | size: 978B | type: zip

Added on : 23 Jul 2021

Hindi-Telugu Domain Dictionary by IIIT-H

Hindi and Telugu domain dictionary developed under the ILMT Hindi-Telugu Pilot by IIIT-Hyderabad (Part 1). The domain of the dictionary is Chemistry and ..

Available Under License:
CC BY-NC-SA 4.0  

Sample Download | size: 566B | type: zip

Added on : 20 Jun 2021

Hindi ASR Challenge Data (ASR Speech Data released under 1st Challenge) - NLTMP

The data set comprises Hindi read speech data along with the corresponding transcriptions. The text data was crawled from newspapers, and then volu..

Available Under License:
Research  

Sample Download | size: 66MB | type: zip

Added on : 10 Jun 2021

Hindi–Telugu Parallel Text Corpus IIIT-Hyd

Hindi–Telugu parallel text corpus developed under the NLTM Pilot by IIIT-Hyderabad. The domains of the corpus are Chemistry, Law, News & General, ..

Available Under License:
CC BY-NC-SA 4.0  

Sample Download | size: 29.8KB | type: zip

Added on : 17 Mar 2021

Hindi Annotated Text Corpus - IIIT Hyderabad

Hindi annotated corpus developed under the NLTM Pilot by IIIT-Hyderabad (Part 1). Domains of the corpus are Chemistry, Law, News & General, HealthCare, ..

Available Under License:
CC BY-NC-SA 4.0  

Sample Download | size: 10.6KB | type: zip

Added on : 17 Mar 2021

Gujarati Wordnet

Under the Indo-Wordnet Consortium project, led by IIT Bombay, the synsets (synonym sets) of the Gujarati Wordnet have been developed. For each synset a POS categ..

Available Under License:
Commercial, Research

Sample Download | size: 65.6KB | type: rar

Added on : 17 Jul 2019

Indian English Raw Speech Corpus - Kannada Variant

Dataset Description: 23:43:04 Hours | 15.3 GB | 56 Speakers | 14,455 Audio Segments | 48 kHz | 16-bit WAV. The English language is a blend of Anglo-Saxo..

Sample Download | size: 2.8MB | type: zip

Added on : 27 Aug 2021

Indian English Raw Speech Corpus - Bengali Variant

Dataset Description: 25:47:11 Hours | 15.5 GB | 53 Speakers | 16,044 Audio Segments | 48 kHz | 16-bit WAV. The English language is a blend of Anglo-Saxo..

Sample Download | size: 1.7MB | type: zip

Added on : 27 Aug 2021

Multilingual Raw Speech Corpus

Dataset Description: 97:43:54 Hours | 62.2 GB speech data | 1,916 Speakers ..

Sample Download | size: 387.1KB | type: pdf

Added on : 27 Aug 2021

Tamil Raw Speech Corpus

Dataset Description: 139:11:41 Hours | 86 GB speech data | 452 Speakers | 60,287 Audio Segments | 48 kHz | 16-bit WAV. Tamil is one of the longes..

Sample Download | size: 2.8MB | type: zip

Added on : 27 Aug 2021

Odia Raw Speech Corpus

Dataset Description: 138:06:18 Hours | 89 GB | 474 Speakers | 73,418 Audio Segments | 48 kHz | 16-bit WAV. Odia is an Indo-Aryan ..

Sample Download | size: 1.4MB | type: zip

Added on : 27 Aug 2021

Kashmiri Raw Speech Corpus

Dataset Description: 28:10:07 Hours | 18 GB speech data | 150 Speakers | 16,380 Audio Segments | 48 kHz | 16 bit wa..

Sample Download | size: 1.6MB | type: zip

Added on : 26 Aug 2021

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.