Language Vocabulary Sampling

You are given a stream of conversation in a mysterious language from planet Mars. Approximate the set of word vocabulary in this language.


Problem Statement

You are given a reference to an instance of a class called "WordStream" that has a method called "string GetNextMartianWord()", where Martian is a mysterious language from planet Mars. This method gives you a random word from the Martian language, picked with equal probability, and with replacement (i.e. it could pick the same word more than once). Given N is the maximum number of times that you can sample Martian language by calling this method, and the guarantee that N is at least an order of magnitude larger than the number of possible Martian words, write a pseudo-code algorithm to approximate the number of words Martians have in their language.

Note: The signature of your method is:


Evaluation


  • Completeness of the pseudo-code algorithm (25%)

  • Correctness of the algorithm, demonstrated by quick manual unit tests (25%)

  • What is the time and space complexity in terms of N? (25%)

  • Describe how you would change your code to print the most frequently encountered 3-letter prefix before returning. (25%)


References

I designed this question

Amin A.

Written by

Amin Ariana

A software entrepreneur from San Francisco