MillionGazillion™: A New Search Engine – Practice Questions for the Facebook Interview – Queslers

Get Practice Questions for the Facebook Interview

Facebook’s coding interviews are hard, but not impossible. Like anything else, it just takes practice. We’ll walk you through it, step by step.

MillionGazillion™: A New Search Engine

I’m making a search engine called MillionGazillion™.

I wrote a crawler that visits web pages, stores a few keywords in a database, and follows links to other web pages. I noticed that my crawler was wasting a lot of time visiting the same pages over and over, so I made a set, visited, where I’m storing URLs I’ve already visited. Now the crawler only visits a URL if it hasn’t already been visited.

Thing is, the crawler is running on my old desktop computer in my parents’ basement (where I totally don’t live anymore), and it keeps running out of memory because visited is getting so huge.

How can I trim down the amount of space taken up by visited?


Your strategy shouldn’t take a hit on runtime.

Replacing common substrings like “.com” and “www” with characters that aren’t allowed in URLs definitely wins us something, but we can do even better. How can we even further exploit overlaps or shared prefixes between URLs?


Notice that a boatload of URLs start with “www.”

We could make visited a nested dictionary where the outer key is the subdomain and the inner key is the rest of the URL, so for example visited[‘www.’][‘’] = True and visited[‘www.’][‘’] = True. Now instead of storing the “www.” for each of these URLs, we’ve just stored it once in memory. If we have 1,000 URLs and half of them start with “www.” then we’ve replaced 500 * 4 characters with just 4 characters in memory.

But we can do even better.

What if we used this same approach of separating out shared prefixes recursively? How long should we make the prefixes?

What if we made the prefixes just one character?


We can use a trie. If you’ve never heard of a trie, think of it this way:

Let’s make visited a nested dictionary where each map has keys of just one character. So we would store ‘’ as visited[‘g’][‘o’][‘o’][‘g’][‘l’][‘e’][‘.’][‘c’][‘o’][‘m’][‘*’] = True.

The ‘*’ at the end means ‘this is the end of an entry’. Otherwise we wouldn’t know what parts of visited are real URLs and which parts are just prefixes. In the example above, ‘’ is a prefix that we might think is a visited URL if we didn’t have some way to mark ‘this is the end of an entry.’

Now when we go to add ‘’ to visited, we only have to add the characters ‘/maps’, because the ‘’ prefix is already there. Same with ‘’.

We can visualize this as a tree, where each character in a string corresponds to a node.

A trie is a type of tree.

To check if a string is in the trie, we just descend from the root of the tree to a leaf, checking for a node in the tree corresponding to each character in the string.

A trie containing "", "", "", "", "", and ""

A trie containing “”, “”, “”, “”, “”, and “”

How could we implement this structure? There are lots of ways! We could use nested dictionaries, nodes and pointers, or some combination of the two. Evaluating the pros and cons of different options and choosing one is a great thing to do in a programming interview.

In our implementation, we chose to use nested dictionaries. To determine if a given site has been visited, we just call add_word(), which adds a word to the trie if it’s not already there.

  class Trie(object):
    def __init__(self):
        self.root_node = {}

    def add_word(self, word):
        current_node = self.root_node
        is_new_word = False

        # Work downwards through the trie, adding nodes
        # as needed, and keeping track of whether we add
        # any nodes.
        for char in word:
            if char not in current_node:
                is_new_word = True
                current_node[char] = {}
            current_node = current_node[char]

        # Explicitly mark the end of a word.
        # Otherwise, we might say a word is
        # present if it is a prefix of a different,
        # longer word that was added earlier.
        if "End Of Word" not in current_node:
            is_new_word = True
            current_node["End Of Word"] = {}

        return is_new_word

If you used a ternary search tree, that’s a great answer too. A bloom filter also works—specially if you use run-length encoding.


How much space does this save? This is about to get MATHEMATICAL.

How many characters were we storing in our flat dictionary approach? Suppose visited includes all possible URLs of length 5 or fewer characters. Let’s ignore non-alphabetical characters to simplify, sticking to the standard 26 English letters in lowercase. There are 26^5 different possible 5-character URLs (26 options for the first character, times 26 options for the 2nd character, etc), and of course 26^4 different possible 4-character URLs, etc. If we store each 5-character URL as a normal string in memory, we are storing 5 characters per string, for a total of 5 * 26^5 characters for all possible 5-character strings (and 4 * 26^4 total characters for all 4-character strings, etc). So for all 1, 2, 3, 4, or 5 character URLs, our total number of characters stored is:

5 * 26^5 + 4 * 26^4 + 3 * 26^3 + 2 * 26^2 + 1 * 26 ^ 1

So for all possible URLs of length n or fewer, our total storage space is:

n26^n + (n-1)26^{(n-1)} + . . . + 1 * 26 ^ 1

This is O(n26^n).

How many characters are stored in our trie? The first layer has 26 nodes (and thus 26 characters), one for each possible starting character. On the second layer, each of those 26 nodes has 26 children, for a total of 26^2 nodes. The fifth layer has 26^5 nodes. To store all 1, 2, 3, 4, or 5 character URLs our trie will have 5 layers. So the total number of nodes is:


So for all URLs of length nn or fewer, we have:


This is O(26n). We’ve shaved off a factor of n.

Bonus trivia: although the HTTP spec allows for unlimited URL length, in practice many web browsers won’t support URLs over 2,000 characters.

What We Learned:

We ended up using a trie. Even if you’ve never heard of a trie before, you can reason your way to deriving one for this question. That’s what we did: we started with a strategy for compressing a common prefix (“www”) and then we asked ourselves, “How can we take this idea even further?” That gave us the idea to treat each character as a common prefix.

That strategy—starting with a small optimization and asking, “How can we take this same idea even further?”—is hugely powerful. It’s one of the keys to unlocking complex algorithms and data structures for problems you’ve never seen before.

Practice Questions for the Facebook Interview Review:

In our experience, we suggest you solve this Practice Questions for the Facebook Interview and gain some new skills from Professionals completely free and we assure you will be worth it.

If you are stuck anywhere between any coding problem, just visit Queslers to get the Practice Questions for the Facebook Interview

Find on Interview Cake


I hope this Practice Questions for the Facebook Interview would be useful for you to learn something new from this problem. If it helped you then don’t forget to bookmark our site for more Coding Solutions.

This Problem is intended for audiences of all experiences who are interested in learning about Data Science in a business context; there are no prerequisites.

Keep Learning!

More Coding Solutions >>

LeetCode Solutions

Hacker Rank Solutions

CodeChef Solutions

Leave a Reply

Your email address will not be published. Required fields are marked *