How to build a content-based image retrieval (CBIR) system for a Sitefinity travel agency blog - using a Google Landmark V2 deep neural network to extract feature vectors and Elasticsearch to find visually similar blog posts when a user uploads a photo.
Artificial intelligence is becoming an ever bigger part of our everyday lives - it's in our smartphone cameras, the recommended products in online shops, the suggested ads on social media, and much more.
Our travel agency has a blog where trips to different places and landmarks are posted. Each blog post has a title, summary, description, tags with the nearest cities, and a media gallery with images from the trip. We're building an image search engine where a user can upload an image of a public place or landmark and get related blog posts as a result - with links to buy flight tickets or book a hotel in the nearest city.
There are three main types of image search engines: search by example, search by metadata, and hybrid search. For this project I chose to implement search by example - also known as Content-Based Image Retrieval (CBIR) - which means a user uploads a photo, that image is compared to all images from the blog posts, and the most similar ones are returned as results.
There are four steps to designing and building a CBIR system:
1. Define your image descriptor - decide how each image will be described.
2. Index your dataset - extract a descriptor from every image and store it.
3. Define your similarity metric - decide how two descriptors are compared.
4. Search - extract the descriptor from the query image and rank the indexed images against it.
The descriptor is what you use to describe an image and compare it to others. This could be colour distribution (colour histogram), shape, texture, and so on - the choice depends on the expected results. For this project I'm using a Deep Neural Network to extract the feature vector, specifically the Google Landmark V2 pre-trained model.
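As a minimal sketch of the extraction step, the helper below runs an image through a feature-extraction model and L2-normalises the output. The function name and the injected `model` callable are illustrative assumptions - in the real pipeline it would be the Google Landmark V2 network with its classification head removed:

```python
import numpy as np

def extract_feature_vector(image: np.ndarray, model) -> np.ndarray:
    """Run an image through a feature-extraction model and return a
    normalised feature vector.

    `model` is any callable mapping a preprocessed image batch to an
    embedding - here a stand-in for the Google Landmark V2 network.
    """
    # Scale pixel values to [0, 1] and add a batch dimension, as most
    # pre-trained vision models expect.
    batch = np.expand_dims(image.astype("float32") / 255.0, axis=0)
    embedding = np.asarray(model(batch)).reshape(-1)
    # L2-normalise so that distances between vectors are comparable.
    norm = np.linalg.norm(embedding)
    return embedding / norm if norm > 0 else embedding
```

Injecting the model as a plain callable keeps the pipeline testable without loading the heavyweight network.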
Our dataset consists of all images associated with blog posts. Each image is passed through the DNN and its feature vector is extracted. Elasticsearch has a dense_vector type field which we use to store these vectors.
You can read more about the dense_vector type field in the Elasticsearch documentation.
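A sketch of how the index could be shaped - the index name, field names, and the vector dimension (2048 here) are illustrative assumptions; `dims` must match the size of the vector your model produces:

```python
# Index mapping with a dense_vector field holding each image's embedding.
# Field names and the 2048-dim size are illustrative placeholders.
IMAGE_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "post_id": {"type": "keyword"},
            "image_url": {"type": "keyword"},
            "feature_vector": {"type": "dense_vector", "dims": 2048},
        }
    }
}

def build_image_doc(post_id: str, image_url: str, vector) -> dict:
    """Shape one indexed document: the blog post the image belongs to,
    the image location, and its extracted feature vector."""
    return {
        "post_id": post_id,
        "image_url": image_url,
        "feature_vector": list(vector),
    }
```

With the official elasticsearch-py client, the index would be created with `es.indices.create(index="blog-images", body=IMAGE_INDEX_MAPPING)` and each document added with `es.index(...)`.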
There are different metrics for defining similarity between images. Elasticsearch provides two convenient functions: cosine similarity (cosineSimilarity) and Euclidean distance (l2norm). While both can be used to compare vectors, I found that Euclidean distance gives better results for images - cosine similarity is more suitable for text comparisons.
Note: The l2norm function is available in Elasticsearch 7.x. If you are on an older version you can still use cosineSimilarity. You can read more about both in the Elasticsearch documentation.
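In Elasticsearch 7.x these functions are used inside a script_score query. The builder below sketches both variants; the `feature_vector` field name is the assumption from the indexing step above. Note that l2norm is a distance (0 = identical), so it is inverted to make higher scores mean "more similar", while cosineSimilarity is shifted by +1.0 to keep scores non-negative:

```python
def build_similarity_query(query_vector, metric: str = "l2norm") -> dict:
    """Build an Elasticsearch 7.x script_score query that scores every
    indexed image against the uploaded image's feature vector."""
    if metric == "l2norm":
        # Invert the distance so that closer vectors score higher.
        source = "1 / (1 + l2norm(params.query_vector, 'feature_vector'))"
    else:
        # Shift cosine similarity into the non-negative range.
        source = "cosineSimilarity(params.query_vector, 'feature_vector') + 1.0"
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": source,
                    "params": {"query_vector": list(query_vector)},
                },
            }
        }
    }
```

The resulting dict is passed as the request body of a standard `_search` call.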
Here is a comparison of results using both methods:
[Image: side-by-side comparison of search results using cosineSimilarity vs. l2norm]
When the user uploads an image, we use the same DNN to extract its feature vector and compare it against the vectors stored in Elasticsearch. One important consideration is defining the right threshold for accurate results. Through trial and error, I found that the most accurate results come from setting the threshold to the mean of the minimum and maximum score.
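The thresholding rule above - keep only hits scoring at or above the midpoint of the minimum and maximum score - can be sketched as a small post-processing step (the `(post_id, score)` pair shape is an illustrative assumption):

```python
def filter_by_threshold(hits):
    """Keep only hits scoring at or above the mean of the minimum and
    maximum score, per the trial-and-error rule described above.

    `hits` is a list of (blog_post_id, score) pairs as returned by
    the search step.
    """
    if not hits:
        return []
    scores = [score for _, score in hits]
    threshold = (min(scores) + max(scores)) / 2
    return [(post, score) for post, score in hits if score >= threshold]
```

For example, with scores 0.9, 0.5, and 0.2 the threshold is 0.55, so only the 0.9 hit survives.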
As you can see, AI can be put to good use even in a simple blog. Stay tuned for more interesting use cases.
If you missed the previous article about using face recognition for cropping images, you can read it here: AI Sitefinity Image Processor.