Introducing a Gold Standard Corpus from Young Multilinguals for
the Evaluation of Automatic UD-PoS Taggers for Italian

Detailed Information on Publication Record

SCHMALZ, Verena, Jennifer-Carmen FREY and Egon STEMLE. Introducing a Gold Standard Corpus from Young Multilinguals for the Evaluation of Automatic UD-PoS Taggers for Italian. Online. In 8th Italian Conference on Computational Linguistics, CLiC-it 2021. Milan, Italy: CEUR Workshop Proceedings, 2021, p. 1-7. ISSN 1613-0073.

Basic information
Original name	Introducing a Gold Standard Corpus from Young Multilinguals for the Evaluation of Automatic UD-PoS Taggers for Italian
Authors	SCHMALZ, Verena (380 Italy), Jennifer-Carmen FREY (40 Austria) and Egon STEMLE (276 Germany, guarantor, belonging to the institution).
Edition	Milan, Italy, 8th Italian Conference on Computational Linguistics, CLiC-it 2021, p. 1-7, 7 pp. 2021.
Publisher	CEUR Workshop Proceedings

Other information
Original language	English
Type of outcome	Proceedings paper
Country of publisher	Italy
Confidentiality degree	is not subject to a state or trade secret
Publication form	electronic version available online
RIV identification code	RIV/00216224:14330/21:00125291
Organization	Faculty of Informatics – Repository
ISSN	1613-0073
Keywords in English	PoS tagging; automatic evaluation
Changed by	RNDr. Daniel Jakubík, učo 139797. Changed: 7/4/2023 04:30.

Abstract

Part-of-speech (PoS) tagging is a common task in Natural Language Processing (NLP), given its widespread applicability. However, with the advance of new information technologies and language variation, the contents and methods for PoS-tagging have changed. The majority of existing Italian data for this task originate from standard texts, where language use is far from the multifaceted, informal language of real-life situations. Automatic PoS-tagging models trained with such data do not perform reliably on non-standard language, such as social media content or language learners’ texts. Our aim is to provide additional training and evaluation data from language learners tagged in Universal Dependencies (UD), as well as to test current automatic PoS-tagging systems and evaluate their performance on such data. We use a multilingual corpus of young language learners, LEONIDE, to create a tagged gold standard for evaluating UD PoS-tagging performance on non-standard Italian. With the 3.7 version of Stanza, a Python NLP package, we apply available automatic PoS-taggers, namely ISDT, ParTUT, POSTWITA, TWITTIRÒ and VIT, trained with both standard and non-standard data, to our dataset. Our results show that the above taggers, trained on non-standard data or multilingual treebanks, can achieve up to 95% accuracy on multilingual learner data, if combined.
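
The evaluation described in the abstract (scoring several taggers against a gold standard, then scoring them in combination) can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this record: the abstract does not say how the taggers were combined, so token-level majority voting is used here purely as one plausible scheme; the names `model_a`, `model_b`, `model_c` and all tags below are invented toy data, not LEONIDE data or real tagger output.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Share of tokens whose predicted UPOS tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def majority_vote(predictions):
    """Combine aligned tag sequences token by token.

    Assumption: majority voting stands in for the unspecified combination
    method. Ties are resolved in favour of the earliest-listed tagger.
    """
    combined = []
    for tags in zip(*predictions):
        counts = Counter(tags)
        best = max(counts.values())
        combined.append(next(t for t in tags if counts[t] == best))
    return combined


# Hypothetical gold standard and tagger outputs for one five-token sentence.
gold = ["PRON", "VERB", "DET", "NOUN", "ADJ"]
tagger_outputs = {
    "model_a": ["PRON", "VERB", "DET", "NOUN", "NOUN"],
    "model_b": ["PRON", "AUX", "DET", "NOUN", "ADJ"],
    "model_c": ["DET", "VERB", "DET", "NOUN", "ADJ"],
}

scores = {name: accuracy(gold, tags) for name, tags in tagger_outputs.items()}
combined = majority_vote(list(tagger_outputs.values()))
combined_score = accuracy(gold, combined)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print(f"combined: {combined_score:.2f}")
```

In this toy case each tagger makes a different single error, so the vote recovers the gold tag at every position. With real data the gain from combining depends on how far the taggers' errors overlap.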

Type	Name	Uploaded/Created by	Uploaded/Created
	paper13.pdf		27/1/2022
Properties
Name	paper13.pdf
Address within IS	https://repozitar.cz/auth/repo/48309/1233896/
Address for the users outside IS	https://repozitar.cz/repo/48309/1233896/
Address within Manager	https://repozitar.cz/auth/repo/48309/1233896/?info
Address within Manager for the users outside IS	https://repozitar.cz/repo/48309/1233896/?info
Uploaded/Created	Thu 27/1/2022 02:16

Rights
Right to read	anyone on the Internet
Right to upload	
Right to administer	a concrete person Mgr. Lucie Vařechová, uco 106253; a concrete person RNDr. Daniel Jakubík, uco 139797; a concrete person Mgr. Jolana Surýnková, uco 220973
