Here, the term Resources refers to a set of speech or language data and descriptions in machine-readable form, used for building, improving, or evaluating natural language and speech algorithms or systems.
TTS data for Indian languages: Hindi, Punjabi, Tamil, and Indian English. Text and corresponding speech data recorded in a studio environment...
Available Under License: CC BY-SA 2.0
The data set comprises English read and conversational speech data along with the corresponding transcriptions. This speech data was collected by S..
Available Under License: Research
The data set comprises Hindi read and conversational speech data along with the corresponding transcriptions. This speech data was collected by Spe..
English-Hindi, Tamil-Telugu parallel data developed under PSA Pilot on SSMT, led by IIIT-Hyderabad..
Available Under License: CC BY-NC-SA 4.0
Hindi and Telugu Domain Dictionary developed under ILMT Hindi-Telugu Pilot by IIIT-Hyderabad (Part 1). The domain of the dictionary is Chemistry and ..
The data set comprises Hindi read speech data along with the corresponding transcriptions. The text data was crawled from newspapers, and then volu..
Hindi-Telugu Parallel Text corpus developed under NLTM Pilot by IIIT-Hyderabad. The domain of the corpus is Chemistry, Law, News & General, ..
Hindi Annotated corpus developed under NLTM Pilot by IIIT-Hyderabad (Part 1). Domains of the corpus are Chemistry, Law, News & General, HealthCare, ..
Under the Indo-Wordnet Consortium project, led by IIT Bombay, the Gujarati Wordnet's synsets (synonym sets) have been developed. For each synset a POS categ..
Available Under License: Commercial Research
Dataset Description: 23:43:04 hours | 15.3 GB | 56 speakers | 14,455 audio segments | 48 kHz | 16-bit WAV. English language is a blend of Anglo-Saxo..
Dataset Description: 25:47:11 hours | 15.5 GB | 53 speakers | 16,044 audio segments | 48 kHz | 16-bit WAV. English language is a blend of Anglo-Saxo..
Dataset Description: 97:43:54 hours | 62.2 GB speech data | 1,916 speakers ..
Dataset Description: 139:11:41 hours | 86 GB speech data | 452 speakers | 60,287 audio segments | 48 kHz | 16-bit WAV. Tamil is one of the longes..
Dataset Description: 138:06:18 hours | 89 GB | 474 speakers | 73,418 audio segments | 48 kHz | 16-bit WAV. Odia is an Indo-Aryan ..
Dataset Description: 28:10:07 hours | 18 GB speech data | 150 speakers | 16,380 audio segments | 48 kHz | 16-bit wa..
Dataset Description: 64:44:02 hours | 7.1 GB | 233 speakers | 26,223 audio segments | 16 kHz | 16-bit WAV. Gujarati is one of ..
Dataset Description: 57:17:08 hours | 37 GB | 204 speakers | 25,712 audio segments | 48 kHz | 16-bit WAV. Gujarati is one of the ma..
Dataset Description: 17:10:26 hours | 11 GB speech data | 61 speakers | 12,036 audio segments | 48 kHz | 16..
Dataset Description: 54:21:12 hours | 32.5 GB | 304 speakers | 37,570 audio segments | 48 kHz | 16-bit WAV..
The data set comprises Tamil read and conversational speech data along with the corresponding transcriptions. This speech data was collected by Spe..
The data set comprises Indian English read speech and lecture speech data along with the corresponding transcriptions. The read speech covers genre..
English-Urdu Parallel Tourism Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) consortium. The core vo..
English-Urdu Parallel Health Text corpus is developed in Unicode, under English to Indian Language Machine Translation (EILMT) Consortium. This corpus..
English-Urdu Parallel Agriculture Text corpus is developed in Unicode, under English to Indian Language Machine Translation (EILMT) Consortium. This c..
English-Tamil Parallel Tourism Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) consortium. The core v..
English-Tamil Parallel Health Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) Consortium. This corpus..
English-Tamil Parallel Agriculture Text corpus is developed in Unicode, under English to Indian Language Machine Translation (EILMT) Consortium. This ..
English-Odia Parallel Tourism Text corpus is developed in Unicode, under English to Indian Language Machine Translation (EILMT) consortium. The core v..
English-Odia Parallel Health Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) Consortium. This corpus ..
English-Odia Parallel Agriculture Text corpus is developed in Unicode, under English to Indian Language Machine Translation (EILMT) Consortium. This c..
Under the Indian Languages Corpora Initiative Phase II (ILCI Phase-II) project, initiated by MeitY, Govt. of India, Jawaharlal Nehru University, ..
English-Marathi Parallel Tourism Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) consortium. The core..
English-Marathi Parallel Health Text corpus is developed in Unicode, under English to Indian Language Machine Translation (EILMT) Consortium. This cor..
English-Marathi Parallel Agriculture Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) Consortium. This..
English-Gujarati Parallel Tourism Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) consortium. The cor..
English-Gujarati Parallel Health Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) Consortium. This cor..
English-Gujarati Parallel Agriculture Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) Consortium. Thi..
This corpus contains a collection of English sentences from the tourism domain, provided by the EILMT consortium and translated into Bodo by the NE Consortium. This cou..
English-Bodo Parallel Health Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) Consortium. This corpus ..
English-Bodo Parallel Agriculture Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) Consortium. This co..
English-Bangla Parallel Tourism Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) consortium. The core ..
English-Bangla Parallel Health Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) Consortium. This corpu..
English-Bangla Agriculture Parallel Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) Consortium. This ..
English-Hindi Tourism Parallel Text corpus is developed in Unicode, under English to Indian Language Machine Translation (EILMT) consortium..
English-Hindi Health Parallel Text corpus is developed in Unicode, under English to Indian Language Machine Translation (EILMT) consortium. Th..
English-Hindi Agriculture Parallel Text corpus is developed in Unicode under English to Indian Language Machine Translation (EILMT) consortium. This c..
This is a monolingual aligned corpus developed for the Tourism domain under the English to Indian Language Machine Translation (EILMT) Consortium. Supported t..
This is a monolingual aligned corpus developed for the Health domain under the English to Indian Language Machine Translation (EILMT) Consortium. Supported te..
This is a monolingual aligned corpus developed for the Agriculture domain under the English to Indian Language Machine Translation (EILMT) Consortium. Support..
This is voice data collected for building HTS-based statistical speech synthesis for the Telugu language under the project developing text-to-speech (TTS)..
This is voice data collected for building HTS-based statistical speech synthesis for the Tamil language under the project developing text-to-speech (TTS) ..
This is voice data collected for building HTS-based statistical speech synthesis for the Rajasthani language under the project developing text-to-speech (..
This is voice data collected for building HTS-based statistical speech synthesis for the Odia language under the project developing text-to-speech (TTS) s..
This is voice data collected for building HTS-based statistical speech synthesis for the Manipuri language under the project developing text-to-speech (TT..
This is voice data collected for building HTS-based statistical speech synthesis for the Malayalam language under the project developing text-to-speech (T..
Under the Indian Languages Speech Resources Development for Speech Applications project, initiated by MeitY, Govt. of India, Speech Consortium..
This is voice data collected for building HTS-based statistical speech synthesis for the Assamese language under the project developing text-to-speech (TT..
This is voice data collected for building HTS-based statistical speech synthesis for the Kannada language under the project developing text-to-speech (TTS..
This is voice data collected for building HTS-based statistical speech synthesis for the Bengali language under the project developing text-to-speech (TTS..
This is voice data collected for building HTS-based statistical speech synthesis for the Marathi language under the project developing text-to-speech (TTS..
This is voice data collected for building HTS-based statistical speech synthesis for the Hindi language under the project developing text-to-speech (TTS) ..
Marathi treebank data is in Shakti Standard Format (SSF). SSF is a common representation for data. SSF allows information in a sentence to be represen..