Aligning codebooks for near duplicate image detection

Sebastiano Battiato, Giovanni Maria Farinella, Giovanni Puglisi, Daniele Ravì

Research output: Contribution to journal, Article, peer-review

16 Citations (Scopus)


The detection of near duplicate images in large databases, such as those of popular social networks, digital investigation archives, and surveillance systems, is an important task for a number of image forensics applications. In digital investigation, hashing techniques are commonly used to index large quantities of images for the detection of copies belonging to different archives. In the last few years, different image hashing techniques based on the Bags of Visual Features paradigm have appeared in the literature. Recently, this paradigm has been augmented by using multiple descriptors (e.g., Bags of Visual Phrases) in order to exploit the coherence between different feature spaces. In this paper we propose to further improve the Bags of Visual Phrases approach by considering the coherence between feature spaces not only at the level of image representation, but also during the codebook generation phase. We also introduce a novel image database specifically designed for the development and benchmarking of near duplicate image retrieval techniques. The dataset consists of more than 3,300 images depicting more than 500 different scenes, each having at least three real near duplicates. The dataset exhibits wide variability in terms of geometric and photometric transformations between scenes and their corresponding near duplicates. Finally, we suggest a method to compress the proposed image representation for storage purposes. Experiments show the effectiveness of the proposed near duplicate retrieval technique, which outperforms the original Bags of Visual Phrases approach.
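To make the representation level concrete, the following is a minimal sketch of a Bags of Visual Phrases histogram, assuming that each keypoint carries two descriptors computed on the same local region and that each descriptor is quantized against its own codebook. It is not the authors' implementation: the function names, the toy codebooks, and the use of histogram intersection as the similarity measure are all assumptions made for illustration, and the codebook alignment step proposed in the paper is not shown because the abstract does not specify it.

```python
from collections import Counter
from math import dist


def nearest_word(descriptor, codebook):
    """Index of the codebook centroid closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bag_of_visual_phrases(keypoints, codebook_a, codebook_b):
    """Count (word_a, word_b) pairs over all keypoints of one image.

    Each keypoint is a pair (descriptor_a, descriptor_b): two descriptors
    of the same local region, one per feature space (an assumption here).
    """
    phrases = Counter()
    for desc_a, desc_b in keypoints:
        word_a = nearest_word(desc_a, codebook_a)
        word_b = nearest_word(desc_b, codebook_b)
        phrases[(word_a, word_b)] += 1
    return phrases


def histogram_intersection(p, q):
    """Similarity of two phrase histograms in [0, 1] (illustrative choice)."""
    shared = sum(min(count, q[phrase]) for phrase, count in p.items())
    return shared / max(sum(p.values()), sum(q.values()), 1)


# Hypothetical toy codebooks: two centroids in each feature space.
codebook_a = [(0.0, 0.0), (1.0, 1.0)]
codebook_b = [(0.0,), (5.0,)]

# Hypothetical keypoints for an image and a candidate near duplicate.
image = [((0.1, 0.0), (0.2,)), ((0.9, 1.1), (4.8,)), ((0.0, 0.2), (0.1,))]
candidate = [((0.2, 0.1), (0.0,)), ((1.0, 0.9), (5.1,)), ((0.1, 0.1), (4.9,))]

hist_image = bag_of_visual_phrases(image, codebook_a, codebook_b)
hist_candidate = bag_of_visual_phrases(candidate, codebook_a, codebook_b)
similarity = histogram_intersection(hist_image, hist_candidate)
```

In this sketch a keypoint contributes to the match only when both of its descriptors fall on the same pair of visual words, which is one way to read "coherence between different feature spaces" at the representation level.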

Original language: English
Pages (from-to): 1483-1506
Number of pages: 24
Journal: Multimedia Tools and Applications
Issue number: 2
Publication status: Published - Sept 2014
Externally published: Yes


  • Bags of visual phrases
  • Bags of visual words
  • Codebooks alignment
  • Image forensics
  • Image retrieval
  • Near duplicate images

