TY - JOUR
T1 - Aligning codebooks for near duplicate image detection
AU - Battiato, Sebastiano
AU - Farinella, Giovanni Maria
AU - Puglisi, Giovanni
AU - Ravì, Daniele
N1 - Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2014/9
Y1 - 2014/9
AB - The detection of near duplicate images in large databases, such as those of popular social networks, digital investigation archives, and surveillance systems, is an important task for a number of image forensics applications. In digital investigation, hashing techniques are commonly used to index large quantities of images for the detection of copies belonging to different archives. In the last few years, different image hashing techniques based on the Bags of Visual Features paradigm have appeared in the literature. Recently, this paradigm has been augmented by using multiple descriptors (e.g., Bags of Visual Phrases) in order to exploit the coherence between different feature spaces. In this paper we propose to further improve the Bags of Visual Phrases approach by considering the coherence between feature spaces not only at the level of image representation, but also during the codebook generation phase. We also introduce a novel image database specifically designed for the development and benchmarking of near duplicate image retrieval techniques. The dataset consists of more than 3,300 images depicting more than 500 different scenes, each having at least three real near duplicates. The dataset exhibits high variability in terms of geometric and photometric transformations between scenes and their corresponding near duplicates. Finally, we suggest a method to compress the proposed image representation for storage purposes. Experiments show the effectiveness of the proposed near duplicate retrieval technique, which outperforms the original Bags of Visual Phrases approach.
KW - Bags of visual phrases
KW - Bags of visual words
KW - Codebooks alignment
KW - Image forensics
KW - Image retrieval
KW - Near duplicate images
UR - http://www.scopus.com/inward/record.url?scp=84904857411&partnerID=8YFLogxK
U2 - 10.1007/s11042-013-1470-4
DO - 10.1007/s11042-013-1470-4
M3 - Article
AN - SCOPUS:84904857411
SN - 1380-7501
VL - 72
SP - 1483
EP - 1506
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 2
ER -