Teaching a search engine to “see” images

If a picture is worth a thousand words, an online image search today is effectively one conducted with the search engine partly blindfolded. Back when ye olde World Wide Web was a collection of static, interlinked pages composed mostly of text, uncovering relevant images wasn’t of paramount importance. But today’s web has exploded to include immense volumes of multimedia, most of which lacks the text that search engines rely on to crawl and index content.

Case in point: search for “miami hurricanes t-shirt” in Google and the first listing in Google’s shopping results is a green Adidas Miami Hurricanes t-shirt. But type in “green miami hurricanes t-shirt” and the same product doesn’t appear, even though it’s a perfect match. Ironically, the shopper is penalized for specificity, and Adidas has lost a potential sale.

Effective merchandising depends not only on the pervasive use of images, but on using the right image. You can now see why image search is critical not just for the potential customer, but especially for an online business.

Yet, while many products on the web today are represented by an image and accompanying text, the description is often limited. Search engines do not easily understand much of the richness (e.g., color) conveyed through an image or photo.

So how does one begin to identify, analyze and categorize the immense volume of image-based content on the web that is essentially invisible to searchers? The manual option is to rely on available metadata: search engines look for text associated with an image (its alt text) as an indicator of what the image represents. The main challenge with this approach is that it depends on the website owner to create the metadata, which often results in missing or inaccurate descriptions. To encourage labeling, Carnegie Mellon University created the ESP Game, which packaged the task of writing labels for images as a game, using humans to help computers recognize images.
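To see what this dependence looks like in practice, here is a minimal Python sketch (the sample HTML and file paths are hypothetical) of the kind of metadata a crawler can extract from a page. If the site owner omits the alt attribute, the image is effectively invisible to text-based search.

    # Minimal sketch: what a text-based crawler can "see" for an image is only
    # the metadata the site owner chose to provide (the sample HTML is made up).
    from html.parser import HTMLParser

    class ImageAltExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.images = []

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                attrs = dict(attrs)
                self.images.append((attrs.get("src"), attrs.get("alt", "")))

    page = ('<img src="/products/tee-123.jpg" alt="Green Adidas Miami Hurricanes t-shirt">'
            '<img src="/products/tee-456.jpg">')  # second image has no alt text

    parser = ImageAltExtractor()
    parser.feed(page)
    print(parser.images)
    # [('/products/tee-123.jpg', 'Green Adidas Miami Hurricanes t-shirt'),
    #  ('/products/tee-456.jpg', '')]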

People are also trying to crack the problem in an automated fashion. In color extraction alone, multiple challenges exist, including distinguishing background from foreground colors and separating the main product from the “noise” within an image, e.g., separating the high-heeled shoe that the model is wearing from her dress. Within products, it’s also necessary to comb through subtleties to identify the main color (e.g., interior vs. exterior colors, patterns, stripes, dots, accent and highlight colors) and to overcome the image dithering techniques commonly used to approximate colors. In the dithering example below, humans perceive the intended color, violet, but computers “see” only the individual red and blue colors used to form the diffused image.

An illustration of dithering. Red and blue are the only colors used, but as the red and blue squares are made smaller, the patch appears violet.
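To make that distinction concrete, here is a minimal sketch (using NumPy, which is an assumption of this example rather than anything from the original post) of what a computer finds in a dithered patch: the pixels are only pure red and pure blue, yet their average, roughly what the eye perceives at a distance, is violet.

    # A checkerboard of pure red and pure blue pixels contains no violet pixel,
    # but its average color -- roughly what the eye perceives -- is violet.
    import numpy as np

    red, blue = np.array([255, 0, 0]), np.array([0, 0, 255])
    rows, cols = np.indices((8, 8))
    patch = np.where(((rows + cols) % 2 == 0)[..., None], red, blue)

    print(np.unique(patch.reshape(-1, 3), axis=0))   # only [0 0 255] and [255 0 0]
    print(patch.reshape(-1, 3).mean(axis=0))         # [127.5  0.  127.5] -- violet

Averaging neighboring pixels before extracting colors is, in essence, the kind of blending mentioned below as a way to circumvent dithering.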

Categorizing and classifying image-based content is a difficult but important problem to solve. No perfect solution exists, and many challenges surround the automated extraction of metadata from images. BloomReach—along with other companies—is actively working to solve this problem algorithmically (e.g., blending colors and regenerating images to circumvent dithering).
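As a rough illustration of what an algorithmic approach can involve, one common building block for finding a main color is to cluster an image’s pixels and report the center of the largest cluster. This is a simplified sketch assuming NumPy and scikit-learn are available, not a description of BloomReach’s actual pipeline.

    # Simplified main-color extraction: cluster pixel values with k-means and
    # return the largest cluster's center. Real systems also have to separate
    # the product from the background, which this sketch ignores.
    import numpy as np
    from sklearn.cluster import KMeans

    def dominant_color(image, n_clusters=4):
        """image: H x W x 3 uint8 array; returns the RGB center of the biggest cluster."""
        pixels = image.reshape(-1, 3).astype(float)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pixels)
        counts = np.bincount(km.labels_, minlength=n_clusters)
        return km.cluster_centers_[counts.argmax()].round().astype(int)

    # Toy example: a mostly green "t-shirt" with a small white logo region.
    img = np.zeros((100, 100, 3), dtype=np.uint8)
    img[:, :] = (0, 128, 0)               # green body
    img[40:60, 40:60] = (255, 255, 255)   # white patch
    print(dominant_color(img, n_clusters=2))   # approximately [  0 128   0]

On the toy image, the function returns green, which is exactly the attribute the “green miami hurricanes t-shirt” query above was looking for.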

Images sell. Helping search engines “see” and interpret them, so that the most relevant image reaches the right consumer, is imperative as image-based content continues its rapid growth across the web.