Illustration

Map Reduce Data Mining and Ranking

Given a Google Search Query, find the AdWords Categories best describing it


Problem Statement

You have one file with (query, URL, score) comma separated row data in which for each URL, the score of its relevance to the given query is given. You have another file that has (URL, categories) info where categories are also comma separated and describe the URL content. Given a query, find the categories best describing its results.

Example:


search, www.google.com, 100

search, www.facebook.com, 20
social, www.google.com, 2
awesomeness, www.aminariana.com, 100

and


www.google.com, advertising, engineering, internet

www.facebook.com, advertising, php


Evaluation


  1. Writing an algorithm that outputs some meaningful results (25%)

  2. Being memory-efficient at scale (25%)

  3. Producing complete result coverage (25%)

  4. Producing percentages for category rankings (25%)


References

Sashi is from Google. He asked me this question in my interview.

Amin A.

Written by

Amin Ariana

A software entrepreneur from San Francisco