Publication overview
2025
Database and Corpora Creation within RapCor Project for Czech
NĚMCOVÁ POLICKÁ, Alena and Pavel RYCHLÝ
Basic information
Original name
Database and Corpora Creation within RapCor Project for Czech
Authors
NĚMCOVÁ POLICKÁ, Alena and Pavel RYCHLÝ
Edition
Brno, Raslan 2025 : recent advances in slavonic natural language processing, p. 137-144, 8 pp. 2025
Publisher
Tribun EU
Other information
Language
English
Type of outcome
Proceedings paper
Country of publisher
Czech Republic
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
Marked to be transferred to RIV
No
Organization
Faculty of Arts – Repository
ISBN
978-80-263-1858-3
Keywords in English
database; corpora; hip hop; RapCor; Czech
Links
LINDAT/CLARIAH-CZ II, large research infrastructures.
Changed: 21/2/2026 00:51, RNDr. Daniel Jakubík
Abstract
In the original language
This paper introduces the motivations for and first results of the Czech RapCor project, focusing on the construction of Czech RapCor Boosted v1 (Czech RCB), a specialized corpus of Czech rap lyrics designed for sociolinguistic and NLP research. The corpus highlights distinctive linguistic features, such as written colloquialisms, frequent vulgarisms, and non-standard forms, which pose challenges for traditional NLP tools. Preliminary results demonstrate the corpus’s potential for studying authentic spoken language in written form, offering insights into rap culture and sociolinguistic phenomena.