Hindi - Telugu Parallel POS Tagged Text Corpus ILCI

Contributor: ILCI Consortia
Product Code: ILCI-HIN-TEL-POS-TEXT-030

Available Under License: Commercial Research

Sample Download | size: 261.5KB | type: rar

Added on: 15 Jul 2019

Under the Indian Languages Corpora Initiative (ILCI) project, the ILCI Consortia, led by Jawaharlal Nehru University, New Delhi, has created a parallel corpus with Hindi as the source language and Telugu translations as the target language. The corpus covers the Health, Tourism, Agriculture and Entertainment domains. Each sentence has a unique sentence ID, and the complete corpus is in UTF-8 encoding. The corpus is POS tagged according to the BIS (Bureau of Indian Standards) tagset.

Text Corpus Attributes
Language	Hindi and Telugu
Parallel or Monolingual	Parallel Hindi to Telugu
Annotation	POS Tagged
No. of Sentences	70,000 Hindi sentences; 70,000 Telugu sentences
Word Count	1,128,215 Hindi words; 788,955 Telugu words
File Format	Text File
Encoding	UTF-8
File Size	6.46 MB
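
As a rough sanity check on the figures in the attribute table, the sketch below derives the average sentence length per language and the Telugu-to-Hindi word ratio. The counts are copied from the listing; the function names are illustrative and not part of the corpus distribution.

```python
# Counts copied from the attribute table in this listing.
SENTENCES = {"Hindi": 70_000, "Telugu": 70_000}
WORDS = {"Hindi": 1_128_215, "Telugu": 788_955}


def words_per_sentence(language: str) -> float:
    """Average number of words per sentence for one language."""
    return WORDS[language] / SENTENCES[language]


def word_ratio(source: str, target: str) -> float:
    """Target-language word count divided by source-language word count."""
    return WORDS[target] / WORDS[source]


if __name__ == "__main__":
    for lang in SENTENCES:
        print(f"{lang}: {words_per_sentence(lang):.2f} words per sentence")
    print(f"Telugu/Hindi word ratio: {word_ratio('Hindi', 'Telugu'):.2f}")
```

The Telugu side has noticeably fewer words for the same number of sentences, which is plausibly a consequence of Telugu's agglutinative morphology rather than of missing text, though the listing itself does not comment on this.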

Tags: Hindi, Telugu, Text Corpus, POS tag, Parallel text corpus
