Gujarati Monolingual Chunked Text Corpus ILCI

Available Under License: Commercial, Research

Sample Download | size: 22.2KB | type: zip
Added on : 29 Jul 2020

Under the Indian Languages Corpora Initiative Phase II (ILCI Phase-II) project, initiated by MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi collected a monolingual corpus in Gujarati. This corpus is the final outcome of the project and contains 30,000 general-domain sentences. The translated sentences have been chunk-tagged according to the BIS (Bureau of Indian Standards) tagset. The corpus has the following features: unique IDs, UTF-8 encoding, and text file format.

Text Corpus Attributes
Language: Gujarati
Parallel or Monolingual: Monolingual
Annotation: Chunked Tagged
No. of Sentences: 30,000
Word-Count: 3,04,394
File Format: Text File
Encoding: UTF-8
File Size: 1.29MB

Tags: Gujarati, Monolingual, Chunked Tagged, Text Corpus, ILCI

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.