D 2014

Effective Corpus Virtualization

JAKUBÍČEK, Miloš; Pavel RYCHLÝ and Adam KILGARRIFF

Basic information

Original name

Effective Corpus Virtualization

Authors

JAKUBÍČEK, Miloš; Pavel RYCHLÝ and Adam KILGARRIFF

Edition

Reykjavik, Challenges in the Management of Large Corpora (CMLC-2), p. 7-9, 3 pp. 2014

Publisher

EUROPEAN LANGUAGE RESOURCES ASSOCIATION-ELRA

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

Informatics

Country of publisher

France

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

Marked to be transferred to RIV

Yes

RIV identification code

RIV/00216224:14330/14:00094187

Organization

Faculty of Informatics – Repository – Repository

ISBN

978-2-9517408-8-4

Keywords in English

corpus; corpus linguistics; virtualization; indexing; database

Links

LM2010013, research and development project. VF20102014003, research and development project.

Abstract

In the original language

In this paper we describe an implementation of corpus virtualization within the Manatee corpus management system. By corpus virtualization we mean the logical manipulation of corpora or their parts, grouping them into new (virtual) corpora. We discuss the motivation for such a setup in detail and show the space and time efficiency of this approach, evaluated on an 11-billion-word corpus of Spanish.
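
The notion of a virtual corpus described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not Manatee's actual implementation or API: the names `Segment`, `VirtualCorpus`, `resolve`, and the corpus names `news_es` and `web_es` are all hypothetical. The sketch assumes a virtual corpus is an ordered list of token ranges taken from existing corpora, so that no token data is copied and a virtual position is translated to a source position on demand.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A contiguous token range [start, end) of one source corpus (hypothetical)."""
    corpus: str
    start: int
    end: int


class VirtualCorpus:
    """Logical concatenation of segments; only the ranges are stored."""

    def __init__(self, segments):
        self.segments = list(segments)
        self.offsets = []  # offsets[i] = virtual position where segment i begins
        total = 0
        for seg in self.segments:
            if seg.end <= seg.start:
                raise ValueError(f"empty or inverted segment: {seg}")
            self.offsets.append(total)
            total += seg.end - seg.start
        self.size = total  # number of tokens in the virtual corpus

    def resolve(self, vpos):
        """Map a virtual position to (source corpus name, source position)."""
        if not 0 <= vpos < self.size:
            raise IndexError(vpos)
        # Binary search for the last segment starting at or before vpos.
        i = bisect_right(self.offsets, vpos) - 1
        seg = self.segments[i]
        return seg.corpus, seg.start + (vpos - self.offsets[i])


# Assumed example data: two ranges from two made-up source corpora.
vc = VirtualCorpus([
    Segment("news_es", 0, 1000),
    Segment("web_es", 500, 700),
])
```

In this sketch the storage cost of a virtual corpus grows with the number of segments rather than the number of tokens, and each lookup is a binary search over the segment offsets. Whether Manatee uses the same layout is not stated in the abstract.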
