A Gold Standard Kashmiri Raw Text Corpus
  • Contributor: CIIL Mysore
  • Product Code: CIIL-KAS-RAW-TEXT-107
Sample download: 34.3 KB (zip)
Added on: 26 Jul 2019

Kashmiri is one of the 22 scheduled languages of India and is listed in the Eighth Schedule of the Constitution of India.

The Kashmiri text was typed in Unicode using the InScript keyboard and stored in XML files, with metadata provided alongside the data. The corpus was built from contemporary text drawn from books, newspapers and magazines. The Kashmiri text corpus in LDC-IL comprises 466,054 words and 2,646,948 characters. Aesthetics and Social Sciences are the two major domains represented.

Text Corpus Attributes
Language: Kashmiri
Parallel or monolingual: Monolingual
Annotation: Raw text corpus
Word count: 466,054
Encoding: UTF-8
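
The catalog does not state how the word and character figures were counted. As a hedged illustration only, the following Python sketch shows one way a user might recount the UTF-8 XML files after downloading them. The directory name `kashmiri_corpus`, the choice to count every text node (metadata included), and the definitions of "word" (whitespace-separated token) and "character" (non-whitespace Unicode code point) are all assumptions, so its totals need not match the published 466,054 words and 2,646,948 characters.

```python
"""Hypothetical sketch: recount words and characters in UTF-8 XML corpus files.

Assumptions (not from the catalog): files sit in one directory with a .xml
extension, all text nodes are counted (metadata included), words are
whitespace-separated tokens, and characters are non-whitespace code points.
"""
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def count_text(text: str) -> tuple[int, int]:
    """Return (word_count, char_count) for a string of text."""
    words = text.split()  # splits on any Unicode whitespace
    chars = sum(len(w) for w in words)  # code points, whitespace excluded
    return len(words), chars


def count_xml_file(path: Path) -> tuple[int, int]:
    """Parse one XML file (encoding read from its declaration, UTF-8 by default)."""
    root = ET.parse(path).getroot()
    # Joining text nodes with a space keeps adjacent elements apart, but it
    # would also split a word that is broken across inline tags.
    return count_text(" ".join(root.itertext()))


def count_corpus(directory: Path) -> tuple[int, int]:
    """Sum word and character counts over every *.xml file in a directory."""
    total_words = total_chars = 0
    for path in sorted(directory.glob("*.xml")):
        words, chars = count_xml_file(path)
        total_words += words
        total_chars += chars
    return total_words, total_chars


if __name__ == "__main__":
    # "kashmiri_corpus" is a hypothetical directory name.
    corpus_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "kashmiri_corpus")
    words, chars = count_corpus(corpus_dir)
    print(f"words={words} chars={chars}")
```

Run it as `python count_corpus.py path/to/unzipped/files`. Excluding metadata elements would need the actual XML tag names, which the catalog does not give.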

Tags: Kashmiri, Raw Text Corpus

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.