By Private Players

Resources owned and managed by startups, MNCs, and private companies working in NLP, language technology, localization, and related domains.

Hindi-Magahi parallel data set

This Hindi-Magahi parallel data set, with 1000 sentences in total (500 dev, 500 test), has been released under the CC BY-NC-SA 4.0 license by Panlingua Lang..

Added on : 15 Dec 2020

Hindi-Bhojpuri parallel data set

This Hindi-Bhojpuri parallel data set, with 1000 sentences in total (500 dev, 500 test), has been released under the CC BY-NC-SA 4.0 license by Panlingua La..

Available Under License:
CC BY-NC-SA 4.0  

Added on : 15 Dec 2020

Hindi Monolingual Data Set

This Hindi monolingual data set, with 473605 sentences and a total word count of 7092870, has been released under the CC BY-NC-SA 4.0 license by Panlingua..

Available Under License:
CC BY-NC-SA 4.0  

Added on : 14 Dec 2020

Magahi monolingual data set

This Magahi monolingual data set, with 148606 sentences and a total word count of 2178424, has been released under the CC BY-NC-SA 4.0 license by Panlingu..

Available Under License:
CC BY-NC-SA 4.0  

Added on : 14 Dec 2020

Bhojpuri monolingual data set

This Bhojpuri monolingual data set, with 91131 sentences and a total word count of 1562465, has been released under the CC BY-NC-SA 4.0 license by Panling..

Available Under License:
CC BY-NC-SA 4.0  

Added on : 14 Dec 2020

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.