Hindi–Telugu Parallel Text Corpus IIIT-Hyd

Hindi–Telugu parallel text corpus developed under the NLTM Pilot by IIIT-Hyderabad. The corpus covers the domains of Chemistry, Law, News & General, Health-Care, Education, Open Education...

Contributor: NLTM IIIT-Hyderabad

Tags: NLTM Pilot, Hindi, Telugu, Hindi–Telugu, Parallel, Text Corpus

Hindi Annotated Text Corpus - IIIT Hyderabad

Hindi annotated corpus developed under the NLTM Pilot by IIIT-Hyderabad (Part 1). The corpus covers the domains of Chemistry, Law, News & General, HealthCare, Education Others, open education books....

Contributor: NLTM IIIT-Hyderabad

Tags: NLTM Pilot, Hindi, Telugu, Hindi–Telugu, Annotated, Text Corpus, IIIT-Hyderabad

Hindi-Magahi parallel data set

This Hindi-Magahi parallel data set, with a total of 1000 sentences (500 dev, 500 test), has been released under the CC BY-NC-SA 4.0 license by Panlingua Language Processing LLP, New Delhi, India....

Contributor: Panlingua Language Processing LLP

Tags: Hindi, Magahi, Parallel Text Corpus

Hindi-Bhojpuri parallel data set

This Hindi-Bhojpuri parallel data set, with a total of 1000 sentences (500 dev, 500 test), has been released under the CC BY-NC-SA 4.0 license by Panlingua Language Processing LLP, New Delhi, India....

Contributor: Panlingua Language Processing LLP

Tags: Hindi, Bhojpuri, Parallel, Text Corpus

Hindi Monolingual Data Set

This Hindi monolingual data set, with 473605 sentences and a total word count of 7092870, has been released under the CC BY-NC-SA 4.0 license by Panlingua Language Processing LLP, New Delhi, India....

Contributor: Panlingua Language Processing LLP

Tags: Hindi, Monolingual, Text Corpus

Magahi monolingual data set

This Magahi monolingual data set, with 148606 sentences and a total word count of 2178424, has been released under the CC BY-NC-SA 4.0 license by Panlingua Language Processing LLP, New Delhi, India....

Contributor: Panlingua Language Processing LLP

Tags: Magahi, Monolingual, Text Corpus

Bhojpuri monolingual data set

This Bhojpuri monolingual data set, with 91131 sentences and a total word count of 1562465, has been released under the CC BY-NC-SA 4.0 license by Panlingua Language Processing LLP, New Delhi, India....

Contributor: Panlingua Language Processing LLP

Tags: Bhojpuri, Monolingual, Text Corpus

English-Urdu Tourism Set - II Parallel Text corpus-EILMT

The English-Urdu parallel tourism text corpus was developed in Unicode under the English to Indian Language Machine Translation (EILMT) consortium. The core vocabulary of this corpus consists of various names, ...