D 2018

Modifying Hamming Spaces for Efficient Search

MÍČ, Vladimír; David NOVÁK a Pavel ZEZULA

Základní údaje

Originální název

Modifying Hamming Spaces for Efficient Search

Autoři

MÍČ, Vladimír; David NOVÁK a Pavel ZEZULA

Vydání

USA, 18th International Conference on Data Mining Workshops (ICDMW), Singapore, November 17-21, 2018, od s. 945-953, 9 s. 2018

Nakladatel

IEEE

Další údaje

Jazyk

angličtina

Typ výsledku

Stať ve sborníku

Stát vydavatele

Spojené státy

Utajení

není předmětem státního či obchodního tajemství

Forma vydání

tištěná verze "print"

Označené pro přenos do RIV

Ano

Kód RIV

RIV/00216224:14330/18:00103668

Organizace

Fakulta informatiky – Masarykova univerzita – Repozitář

ISBN

978-1-5386-9288-2

ISSN

UT WoS

000465766800128

EID Scopus

2-s2.0-85062890466

Klíčová slova anglicky

Similarity search;Hamming space;Hamming Weight Tree; Lower bound inthe Hamming space

Návaznosti

EF16_019/0000822, projekt VaV.
Změněno: 6. 9. 2020 07:04, RNDr. Daniel Jakubík

Anotace

V originále

We focus on the efficient search for the most similar bit strings to a given query in the Hamming space. The distance of this space can be lower-bounded by a function based on a difference of the number of ones in the compared strings, i.e. their weights. Recently, such property has been successfully used by the Hamming Weight Tree (HWT) indexing structure. We propose modifications of the bit strings that preserve pairwise Hamming distances but improve the tightness of these lower bounds, so the query evaluation with the HWT is several times faster. We also show that the unbalanced bit strings, recently reported to provide similar quality of search as the traditionally used balanced bit strings, are more easy to index with the HWT. Combined with the distance preserving modifications, the HWT query evaluation can be more than one order of magnitude faster than the HWT baseline. Finally, we show that such modifications are useful even for a very complex data where the search with the HWT is slower than a sequential search.
Zobrazeno: 9. 5. 2026 06:45