Please send your code, any additional data you used, and the output to email@example.com.
We encourage you to think outside the box and use absolutely any means necessary to solve the following problems: Wine Tasting, Perplexing Puzzles, Color Me Purple and Up Scrambled All. Solutions will be judged on code quality, accuracy, and scalability to significantly larger input sizes.
Wine Tasting
A large group of friends from the town of Nocillis visit the vineyards of Apan to taste wines. The vineyards produce many fine wines, and the friends decide to buy as many as 3 bottles of wine each if they are available to purchase. Unfortunately, the vineyards of Apan have a peculiar restriction: they cannot sell more than one bottle of the same wine. So the vineyards come up with the following scheme: they ask each person to write down a list of up to 10 wines that they enjoyed and would be happy buying. With this information, please help the vineyards maximize the number of wines that they can sell to the group of friends.
The input is a two-column TSV file with the first column containing the ID (just a string) of a person and the second column the ID of a wine that they like. Here are three input data sets of increasing sizes. Please send us your solution even if it runs only on the first file.
The first line of the output contains the total number of wine bottles sold by your solution. Each subsequent line has two tab-separated columns: the first is the ID of a person and the second is the ID of the wine that they will buy.
Please check your work. Note that the IDs in the second column of the output should be unique, since a single bottle of wine cannot be sold to two people, and an ID in the first column can appear at most three times, since each person can buy at most 3 bottles of wine.
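One way to read the problem above is as a bipartite matching in which each wine can be matched once and each person up to three times. The following is a minimal sketch of that idea using simple augmenting paths, not a reference solution. The function names `assign_wines` and `format_output` and the sample preferences are made up for illustration, and the recursive search is only suitable for small inputs; the larger files would likely need an iterative max-flow or Hopcroft-Karp style implementation.

```python
from collections import defaultdict


def assign_wines(pairs, max_per_person=3):
    """Return a dict wine -> person that sells as many wines as possible.

    pairs: iterable of (person_id, wine_id) preferences.
    Each wine is sold at most once; each person buys at most max_per_person.
    """
    likes = defaultdict(list)
    for person, wine in pairs:
        likes[person].append(wine)

    owner = {}  # wine -> person who currently buys it

    def try_assign(person, seen):
        # Try to give `person` one more wine, moving other buyers
        # to different wines when that frees a bottle up.
        for wine in likes[person]:
            if wine in seen:
                continue
            seen.add(wine)
            if wine not in owner or try_assign(owner[wine], seen):
                owner[wine] = person
                return True
        return False

    for person in list(likes):
        for _ in range(max_per_person):
            if not try_assign(person, set()):
                break  # no further bottle can be found for this person
    return owner


def format_output(owner):
    # First line: total bottles sold; then "person<TAB>wine" per sale.
    lines = [str(len(owner))]
    lines += [f"{person}\t{wine}" for wine, person in sorted(owner.items())]
    return "\n".join(lines)


# Hypothetical preferences: "a" likes four wines, "b" likes only w1.
prefs = [("a", "w1"), ("a", "w2"), ("a", "w3"), ("a", "w4"), ("b", "w1")]
owner = assign_wines(prefs)
print(format_output(owner))
```

In this small example "a" first takes w1, w2 and w3, and is then moved from w1 to w4 so that "b" can buy w1, which sells four bottles in total.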
Perplexing Puzzles
Finding synonyms for words or phrases is an integral part of search retrieval. Not all users search with the same terms, and understanding user intent is critical for finding relevant content. There are several online resources for finding synonyms for common words in English. However, finding synonyms for phrases used in specific domains such as commercial products can be tricky.
In this challenge, we keep things simpler and try to build a program that can consider a candidate pair of words or phrases and flag whether it is a synonym pair. For the purpose of this exercise, we’ll focus on shopping queries, and consider two candidate phrases to be synonyms if a typical shopper would use the two candidates interchangeably. For example, “ac adapter” is a synonym for “power adapter” but is not a synonym for “power tools.”
The input is a two-column, comma-separated file of candidate synonym pairs.
The output is a three-column, comma-separated file in which each line corresponds to a line of the input file and the third column is either “true” or “false”, denoting whether or not the candidate pair is a synonym.
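To make the input and output formats concrete, here is a small sketch that reads candidate pairs and writes the three-column result. The classifier plugged in is only a naive word-overlap baseline chosen for illustration; the names `token_overlap`, `is_synonym` and `label_pairs`, the 0.3 threshold, and the assumption that the file has no header row are all hypothetical, and a real solution would need a much stronger notion of similarity.

```python
import csv
import io


def token_overlap(a, b):
    # Jaccard similarity between the word sets of two phrases.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_synonym(a, b, threshold=0.3):
    # Naive baseline: call it a synonym if enough words are shared.
    return token_overlap(a, b) >= threshold


def label_pairs(in_file, out_file, classify=is_synonym):
    # Copy each candidate pair and append "true" or "false".
    writer = csv.writer(out_file, lineterminator="\n")
    for row in csv.reader(in_file):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        a, b = row[0].strip(), row[1].strip()
        writer.writerow([a, b, "true" if classify(a, b) else "false"])


# The two candidate pairs from the example in the text.
source = io.StringIO("ac adapter,power adapter\nac adapter,power tools\n")
result = io.StringIO()
label_pairs(source, result)
print(result.getvalue())
```

Passing the classifier as a parameter keeps the file handling separate from the synonym logic, so the baseline can be swapped for something better without touching the I/O code.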
Color Me Purple
Detecting content present in images is a hard problem for computers and is easy for humans. This has simultaneously led to a variety of hard problems and useful applications. For example, captchas rely on this hardness to ensure that websites only allow humans but not bots to access parts of the site. Unfortunately, search engines have the same problem as well. Google Image search largely relies on text around the image to determine its content (see Wikipedia for details).
The task in this challenge is to take a first crack at understanding an image. Specifically, we wish to identify the color of the main product that is being sold in an image. Note that this might require distinguishing the color of the product from that of the background, the model’s skin or hair, etc. If there are multiple main colors that are good matches for an image (a striped shirt, for example), then select any one color. It is worse to mislabel an image with a color than not to label it at all. That is, the program can determine that it is hard to distinguish the main product color from other colors in the image and choose simply not to label the image.
In addition to the image, we also provide the title of the product which can be used to better understand the image. As an example, the following image with the title “Marrow Finished V-Neck Tee” should be labelled “orange”:
You can get a bigger sample solution here: https://s3.amazonaws.com/br-user/puzzles/color_me_purple_sample.tsv
The input is a two-column, tab-separated file with the image URL followed by the product title.
The color palette is a file in which each line contains a comma-separated LAB color value followed by a tab and the name of the color.
The output is a two-column, tab-separated file where the first column is the image URL and the second contains the color from the palette, or “-1” if your algorithm chooses not to classify the image with a color.
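Once a candidate product color has been estimated from an image, it still has to be mapped to a palette entry or rejected. The sketch below covers only that last step, assuming the palette line format described above and using plain Euclidean distance in LAB space (the CIE76 color difference). The function names, the `max_distance` cutoff of 25, and the two palette entries are illustrative assumptions, not the real palette; separating the product from the background is not addressed here.

```python
import math


def parse_palette(lines):
    # Each line: "L,a,b<TAB>name", as described for the palette file.
    palette = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        lab_text, name = line.split("\t", 1)
        l, a, b = (float(x) for x in lab_text.split(","))
        palette.append(((l, a, b), name.strip()))
    return palette


def nearest_color(lab, palette, max_distance=25.0):
    # Return the closest palette name, or "-1" when nothing is close
    # enough, since a wrong label is worse than no label.
    best_name, best_dist = "-1", float("inf")
    for ref, name in palette:
        dist = math.dist(lab, ref)  # Euclidean distance in LAB (CIE76)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else "-1"


# Hypothetical palette lines with approximate LAB values.
palette = parse_palette(["53.2,80.1,67.2\tred", "32.3,79.2,-107.9\tblue"])
print(nearest_color((50.0, 75.0, 60.0), palette))  # close to the red entry
print(nearest_color((90.0, 0.0, 0.0), palette))    # far from both entries
```

The cutoff is the knob that trades coverage for accuracy: lowering it makes the program abstain more often, which matches the scoring preference stated in the challenge.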
Up Scrambled All
Each kind of text on the Internet can be modelled as a language in itself. Academic writing, personal blogs, and online catalogs all have their own distinctive language structure. When people search for products that they want to buy, or retailers advertise the products that they are selling, they too use a language of their own.
In this challenge, we will examine the structure of product descriptions as present on retailers’ websites. Specifically, we will focus on the product heading or title, which is the most concise description of a product. We will work with the words of a title in shuffled order and determine whether the most likely word order can be recovered programmatically.
For example, if the shuffled words are “espresso frother milk maker citiz nespresso with” then the product as advertised would most likely be “nespresso citiz espresso maker with milk frother.” Sometimes more than one ordering is acceptable. For instance, “frozen organic blueberry muffins” and “organic frozen blueberry muffins” are both reasonable product titles, but “muffins frozen blueberry organic” is not likely to be a product title as advertised. If multiple word orderings seem valid, then as part of this exercise any one of them can be selected by your program.
The input is a file in which each line contains the possibly shuffled words of one product title.
The output is a two-column, tab-separated file whose lines correspond to the input lines. The first column repeats the input line and the second column contains the best guess for the recovered product title.
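One possible starting point is to learn which words tend to follow which from a set of known titles and then pick the ordering those statistics like best. The sketch below does this with add-one smoothed bigram counts and a brute-force search over permutations. The training titles, the function names, and the `max_words` limit are assumptions made for illustration; brute force grows factorially with the number of words, so longer titles would need a beam search or a similar approximation.

```python
import itertools
import math
from collections import Counter


def train_bigrams(titles):
    # Count word pairs, with <s> and </s> marking title start and end.
    unigrams, bigrams = Counter(), Counter()
    for title in titles:
        tokens = ["<s>"] + title.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(words, unigrams, bigrams, vocab_size):
    # Add-one smoothed log-probability of a word order.
    tokens = ["<s>"] + list(words) + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def unscramble(shuffled, unigrams, bigrams, max_words=8):
    words = shuffled.lower().split()
    if len(words) > max_words:
        return shuffled  # too many orderings to enumerate; leave as is
    vocab_size = len(unigrams) + 1
    best = max(
        itertools.permutations(words),
        key=lambda order: score(order, unigrams, bigrams, vocab_size),
    )
    return " ".join(best)


# Hypothetical training titles, standing in for a real product catalog.
titles = ["organic blueberry muffins", "frozen blueberry pie", "organic frozen peas"]
unigrams, bigrams = train_bigrams(titles)
line = "muffins blueberry organic"
print(line + "\t" + unscramble(line, unigrams, bigrams))
```

The final line prints the input followed by a tab and the recovered title, matching the two-column output format described above.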