Publication overview
2014
Effective Corpus Virtualization
JAKUBÍČEK, Miloš; Pavel RYCHLÝ and Adam KILGARRIFF
Basic information
Original name
Effective Corpus Virtualization
Authors
JAKUBÍČEK, Miloš; Pavel RYCHLÝ and Adam KILGARRIFF
Edition
Reykjavik, Challenges in the Management of Large Corpora (CMLC-2), p. 7-9, 3 pp. 2014
Publisher
EUROPEAN LANGUAGE RESOURCES ASSOCIATION-ELRA
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
Informatics
Country of publisher
France
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
Marked to be transferred to RIV
Yes
RIV identification code
RIV/00216224:14330/14:00094187
Organization
Faculty of Informatics – Repository
ISBN
978-2-9517408-8-4
UT WoS
Keywords in English
corpus; corpus linguistics; virtualization; indexing; database
Links
LM2010013, research and development project. VF20102014003, research and development project.
Changed: 1/9/2020 20:35, RNDr. Daniel Jakubík
Abstract
In the original language
In this paper we describe an implementation of corpus virtualization within the Manatee corpus management system. By corpus virtualization we mean the logical manipulation of corpora or their parts, grouping them into new (virtual) corpora. We discuss the motivation for such a setup in detail and show the space and time efficiency of this approach, evaluated on an 11-billion-word corpus of Spanish.
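
The idea of grouping parts of existing corpora into a new virtual corpus can be illustrated with a minimal sketch. This is not Manatee's implementation or API. The names `Segment`, `VirtualCorpus`, `locate` and `token` are hypothetical, and the corpora are assumed to be plain in-memory token lists. The sketch only shows how a virtual corpus might store references to ranges of source corpora and translate virtual positions into source positions without copying any tokens.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """A contiguous token range [start, end) of one source corpus (hypothetical)."""
    source: Sequence[str]  # tokens of an existing corpus, assumed to be in memory
    start: int
    end: int


class VirtualCorpus:
    """Hypothetical virtual corpus: an ordered list of segments, no token copying."""

    def __init__(self, segments: List[Segment]) -> None:
        self.segments = segments
        # offsets[i] is the virtual position at which segment i begins
        self.offsets: List[int] = []
        total = 0
        for seg in segments:
            self.offsets.append(total)
            total += seg.end - seg.start
        self.size = total

    def __len__(self) -> int:
        return self.size

    def locate(self, pos: int) -> Tuple[int, int]:
        """Map a virtual position to (segment index, position in the source corpus)."""
        if not 0 <= pos < self.size:
            raise IndexError(pos)
        # Binary search for the last segment starting at or before pos
        i = bisect_right(self.offsets, pos) - 1
        return i, self.segments[i].start + (pos - self.offsets[i])

    def token(self, pos: int) -> str:
        """Return the token at a virtual position by reading the source corpus."""
        i, src_pos = self.locate(pos)
        return self.segments[i].source[src_pos]


# Usage: build a virtual corpus from parts of two toy corpora
news = ["el", "gato", "duerme", "hoy"]
web = ["la", "casa", "es", "grande"]
vc = VirtualCorpus([Segment(news, 1, 3), Segment(web, 0, 2)])
tokens = [vc.token(i) for i in range(len(vc))]
```

In this sketch the virtual corpus costs only one small record per segment, and each lookup adds a binary search over the segment offsets. A real system would apply the same kind of mapping to on-disk indexes rather than Python lists.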