.
  • Interview Questions: Edit distance

    Edit distance related questions are frequently asked by many large companies such as Google, Facebook, Apple, Microsoft, Uber, Airbnb, Amazon, et al. In this post, we describe three kinds of problems related to edit distance.

    What is edit distance?

    Edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances can be applied in natural language processing for automatic spell correction by selecting words from a dictionary that have a low distance to the word in question. 

    [Read More...]
  • Aho-Corasick Algorithm

    aho-corasick

    Problem:

    Given an input text and an a dictionary, find all occurrences of all words in the input text.
    Example:

    We can solve this problem using KMP algorithm, but it is not efficient.

    The solution is Aho-Corasick automation (AC automaton), which is one of the most famous string pattern match algorithms. See https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm for an introduction.

    By using AC automaton, we can solve the problem in O(n + m +  z), where n is the total length of the words in the dictionary, m is the length of the string T,

    [Read More...]