CIIL Mysore Repository

A list of linguistic resources developed by the Linguistic Data Consortium for Indian Languages (LDC-IL), CIIL Mysore.

Repository Last Crawled Date: 26/08/2021

A Gold Standard Marathi Raw Text Corpus

Marathi is an Indo-Aryan language. It is the official language of the state of Maharashtra, India. Marathi Text Corpus encoded in a machine-readable f..

Sample Download | size: 59.9KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Malayalam Raw Text Corpus

63,70,954 words taken from 1,119 different titles. Malayalam is a highly agglutinative and morphologically rich language. The actual pattern of la..

Sample Download | size: 25.2KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Maithili Raw Text Corpus

Maithili Raw Text Corpus encoded in a machine-readable form and stored in a standard format. Maithili is an Indo-Aryan language, a direct descendant o..

Sample Download | size: 57.2KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Konkani Raw Text Corpus

Konkani is the principal and administrative language of Goa. Konkani is an Indo-Aryan language belonging to the Indo-European family of languages and ..

Sample Download | size: 61.6KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Kashmiri Raw Text Corpus

The Kashmiri language is one of the 22 scheduled languages of India and is a part of the Eighth Schedule in the constitution of Jammu and Kashmir. Kashmiri..

Sample Download | size: 34.3KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Kannada Raw Text Corpus

Kannada text corpus of 77,63,124 words | 1772 titles | Data and metadata in XML format | 6 text domains. Kannada is one of the Ancient Indian..

Sample Download | size: 18.8KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Hindi Raw Text Corpus

Hindi is a major Indo-Aryan language, a descendant of Sanskrit, spoken in central and northern India. Hindi Text Corpus encoded in a mach..

Sample Download | size: 66.8KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Gujarati Raw Text Corpus

Gujarati is a major Indo-Aryan language and the administrative language of Gujarat and of the Union territories of Daman and Diu and Dadra and Nagar Haveli. Guj..

Sample Download | size: 30.7KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Dogri Raw Text Corpus

Dogri is an Indo-Aryan language spoken by about five million people in India and Pakistan, particularly in the Jammu region. Dogri Text Corpus encoded in a m..

Sample Download | size: 45.8KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Bodo Raw Text Corpus

Unicode Standard Bodo text corpus of 29,15,544 words | 80 titles | Data and metadata in XML format | 5 text domains. Bodo is a major tribal language..

Sample Download | size: 21.4KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Bengali Raw Text Corpus

Bengali is the official language of West Bengal and Tripura. It belongs to the Indo-Aryan language family. Bengali Text Corpus encoded in a machine r..

Sample Download | size: 56.5KB | type: zip

Added on : 26 Jul 2019

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.