Database and Corpora Creation within RapCor Project for Czech

NĚMCOVÁ POLICKÁ, Alena and Pavel RYCHLÝ

Basic information

Original name

Database and Corpora Creation within RapCor Project for Czech

Authors

NĚMCOVÁ POLICKÁ, Alena and Pavel RYCHLÝ

Edition

Brno, RASLAN 2025: Recent Advances in Slavonic Natural Language Processing, pp. 137-144, 8 pp., 2025

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

Marked to be transferred to RIV

No

Organization

Filozofická fakulta (Faculty of Arts) – Repository

ISBN

978-80-263-1858-3

Keywords in English

database; corpora; hip hop; RapCor; Czech

Links

LINDAT/CLARIAH-CZ II, large research infrastructures.

Abstract

In the original language

This paper introduces the motivations behind the Czech RapCor project and its first results, in particular the process of building Czech RapCor Boosted v1 (Czech RCB), a specialized corpus of Czech rap lyrics designed for sociolinguistic and NLP research. The corpus highlights distinctive linguistic features, such as written colloquialism, frequent use of vulgarisms, and non-standard forms, which pose challenges for traditional NLP tools. Preliminary results demonstrate the corpus’s potential for studying authentic spoken language in written form, offering insights into rap culture and sociolinguistic phenomena.

Files attached