CIIL Mysore

A Gold Standard Marathi Raw Text Corpus

Marathi is an Indo-Aryan language. It is the official language of the state of Maharashtra in India. Marathi Text Corpus encoded in a machine readable f..

A Gold Standard Malayalam Raw Text Corpus

63,70,954 words taken from 1,119 different titles. Malayalam is a highly agglutinative and morphologically rich language. The actual pattern of la..

A Gold Standard Maithili Raw Text Corpus

Maithili Raw Text Corpus encoded in a machine readable form and stored in a standard format. Maithili is an Indo-Aryan language, a direct descendant o..

A Gold Standard Konkani Raw Text Corpus

Konkani is the principal and administrative language of Goa. Konkani is an Indo-Aryan language belonging to the Indo-European family of languages and ..

A Gold Standard Kashmiri Raw Text Corpus

Kashmiri is one of the 22 scheduled languages of India, listed in the Eighth Schedule of the Constitution of India. Kashmiri..

A Gold Standard Kannada Raw Text Corpus

Kannada text corpus of 77,63,124 words | 1,772 titles | Data and metadata in XML format | 6 text domains. Kannada is one of the ancient Indian..

A Gold Standard Hindi Raw Text Corpus

Hindi is a major Indo-Aryan language, a descendant of Sanskrit, spoken in central and northern India. Hindi Text Corpus encoded in a mach..

A Gold Standard Gujarati Raw Text Corpus

Gujarati is a major Indo-Aryan language and the administrative language of Gujarat and of the Union territories of Daman and Diu and Dadra and Nagar Haveli. Guj..

A Gold Standard Dogri Raw Text Corpus

Dogri is an Indo-Aryan language spoken by about five million people in India and Pakistan, particularly in Jammu. Dogri Text Corpus encoded in a m..

A Gold Standard Bodo Raw Text Corpus

Unicode-standard Bodo text corpus of 29,15,544 words | 80 titles | Data and metadata in XML format | 5 text domains. Bodo is a major tribal language..

A Gold Standard Bengali Raw Text Corpus

Bengali is the official language of West Bengal and Tripura. It belongs to the Indo-Aryan language family. Bengali Text Corpus encoded in a machine r..
