Přehled o publikaci
2018
Modifying Hamming Spaces for Efficient Search
MÍČ, Vladimír; David NOVÁK and Pavel ZEZULABasic information
Original name
Modifying Hamming Spaces for Efficient Search
Authors
MÍČ, Vladimír; David NOVÁK and Pavel ZEZULA
Edition
USA, 18th International Conference on Data Mining Workshops (ICDMW), Singapore, November 17-21, 2018, p. 945-953, 9 pp. 2018
Publisher
IEEE
Other information
Language
English
Type of outcome
Proceedings paper
Country of publisher
United States of America
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
Marked to be transferred to RIV
Yes
RIV identification code
RIV/00216224:14330/18:00103668
Organization
Fakulta informatiky – Repository – Repository
ISBN
978-1-5386-9288-2
ISSN
UT WoS
EID Scopus
Keywords in English
Similarity search;Hamming space;Hamming Weight Tree; Lower bound inthe Hamming space
Links
EF16_019/0000822, research and development project.
Changed: 6/9/2020 07:04, RNDr. Daniel Jakubík
Abstract
In the original language
We focus on the efficient search for the most similar bit strings to a given query in the Hamming space. The distance of this space can be lower-bounded by a function based on a difference of the number of ones in the compared strings, i.e. their weights. Recently, such property has been successfully used by the Hamming Weight Tree (HWT) indexing structure. We propose modifications of the bit strings that preserve pairwise Hamming distances but improve the tightness of these lower bounds, so the query evaluation with the HWT is several times faster. We also show that the unbalanced bit strings, recently reported to provide similar quality of search as the traditionally used balanced bit strings, are more easy to index with the HWT. Combined with the distance preserving modifications, the HWT query evaluation can be more than one order of magnitude faster than the HWT baseline. Finally, we show that such modifications are useful even for a very complex data where the search with the HWT is slower than a sequential search.