Other Repositories

A list of linguistic resources and tools for Indian languages developed by institutions outside the NPLT umbrella.

A Gold Standard Urdu Raw Text Corpus

Unicode Standard Urdu text corpus of 51,61,927 words | 739 Titles | Data and Metadata in XML format | 5 Text domains. Urdu is one am..

Sample Download | size: 15.5KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Telugu Raw Text Corpus

Standard Telugu Text Corpus of 30,10,993 words | 859 Titles | Data and Metadata in XML format | 6 Text Domains | Telugu Text Corpus encoded in a machine re..

Sample Download | size: 39.8KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Tamil Raw Text Corpus

Tamil is one of the longest-surviving classical languages in the world. It belongs to the Dravidian language family. Tamil Text Corpus encoded in a machine reada..

Sample Download | size: 33.1KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Punjabi Raw Text Corpus

Punjabi Text Corpus encoded in a machine readable form and stored in a standard format. The major encoding being used is Unicode and stored in XM..

Sample Download | size: 46.8KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Odia Raw Text Corpus

LDC-IL Odia Raw Text Corpus developed according to various factors such as quality of the text, representativeness, retrievable format, size of corpus..

Sample Download | size: 19.9KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Nepali Raw Text Corpus

Nepali is one of the 22 scheduled languages of India. It is a descendant of Sanskrit. Nepali Text Corpus encoded in a machine readable form and stored in ..

Sample Download | size: 14KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Manipuri Raw Text Corpus

Manipuri Text Corpus is encoded in a machine readable form and stored in a standard format. The major encoding being used is Unicode and stored in XML..

Sample Download | size: 27.7KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Marathi Raw Text Corpus

Marathi is an Indo-Aryan language. It is the official language of the state of Maharashtra in India. Marathi Text Corpus encoded in a machine readable f..

Sample Download | size: 59.9KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Malayalam Raw Text Corpus

63,70,954 words taken from 1,119 different titles. Malayalam is a highly agglutinative and morphologically rich language. The actual pattern of la..

Sample Download | size: 25.2KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Maithili Raw Text Corpus

Maithili Raw Text Corpus encoded in a machine readable form and stored in a standard format. Maithili is an Indo-Aryan language, a direct descendant o..

Sample Download | size: 57.2KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Konkani Raw Text Corpus

Konkani is the principal and administrative language of Goa. Konkani is an Indo-Aryan language belonging to the Indo-European family of languages and ..

Sample Download | size: 61.6KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Kashmiri Raw Text Corpus

Kashmiri is one of the 22 scheduled languages of India and is a part of the Eighth Schedule in the constitution of Jammu and Kashmir. Kashmiri..

Sample Download | size: 34.3KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Kannada Raw Text Corpus

Kannada Text Corpus of 77,63,124 words | 1772 Titles | Data and Metadata in XML format | 6 Text Domains. Kannada is one of the Ancient Indian..

Sample Download | size: 18.8KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Hindi Raw Text Corpus

Hindi is a major Indo-Aryan language, a descendant of Sanskrit, spoken in central and northern India. Hindi Text Corpus encoded in a mach..

Sample Download | size: 66.8KB | type: zip

Added on : 26 Jul 2019

A Gold Standard Gujarati Raw Text Corpus

Gujarati is a major Indo-Aryan language and the administrative language of Gujarat and of the Union Territories of Daman and Diu and Dadra and Nagar Haveli. Guj..

Sample Download | size: 30.7KB | type: zip

Added on : 26 Jul 2019

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.