Longest Increasing Subsequence

Intro

The algorithms to find the longest increasing subsequence in a sequence are discussed in various places (including Wikipedia), but in my opinion their exposition is not intuitive.

 

Wikipedia:

 

M[j] — stores the index k of the smallest value X[k] such that there is an increasing subsequence of length j ending at X[k] on the range ki.

 

Precise and succinct but not easy to parse and grasp. And why “index of smallest”? Why are we doing it this way?

Working through an example of discovery of longer and/or more promising subsequences will lead to a clean and hopefully intuitively understandable implementation.

Example

 

Let’s find the length of the longest increasing subsequence in the following sequence:

 

{4, 3, 8, 4, 10, 12, 5, 6, 7}

 

  1. We will consider all the values, from left to right, and keep track of all newly discovered increasing subsequences.
  2. We always have the empty sequence {} of length 0 before seeing any values from the input.
  3. An input value v can extend an existing sequence {x1, …, xk} iff v > xk producing a new sequence {x1, …, xk, v} of length k + 1
    1. eg. 5 can extend {3, 4} to {3, 4, 5} since 5 > 4
  4. The empty sequence can always be extended to a length 1 sequence
  5. We keep track of most promising discovered increasing subsequences listed in order of their growing length
  6. For any length of a discovered subsequence we prefer one that ends in a smaller number as having more potential for extension

 

Let’s look at an example.

Columns correspond to subsequence length.

Rows correspond to input values considered in order.

 

We will always try to extend the longest possible subsequence, so scan the list of sequences from the longest so far, back towards the empty sequence looking for the first opportunity to extend one. To find a sequence to extend we pay attention only to last elements of sequences, underlined in the table below.

 

 

0

1

2

3

4

5

 

{}

         

4

 

{4}

       

3

 

{3}

       

8

   

{3, 8}

     

4

   

{3, 4}

     

10

     

{3, 4, 10}

   

12

       

{3, 4, 10, 12}

 

5

     

{3, 4, 5}

   

6

       

{3, 4, 5, 6}

 

7

         

{3,  4, 5, 6 , 7}

 

  1. Before seeing any input we have the empty sequence.
  2. Number 4 can extend the empty sequence to form {4} of length 1.
  3. Number 3 cannot extend {4}, but it can extend the empty sequence to form {3}
    1. of {4} and {3} the latter is better (== worth keeping) for possible extension
  4. Number 8 can extend {3} to form the new longest seq {3, 8}
  5. Number 4 cannot extend {3, 8} but it can extend {3} to form {3, 4}
    1. {3, 4} replaces {3, 8} again because of its “extension potential”
  6. Number 10 extends {3, 4} to form the new longest {3, 4, 10}
    1. {3, 4, 10} is stored in the correct column for length 3
    2. {3, 4} remains unchanged in the column for length 2
    3. please note that 10 could also extend all shorter sequences, {3} and {}, but that would produce worse sequences that we already have, in this case eg. {3, 10} is worse than {3, 4}
  7. Number 12 easily extends {3, 4, 10} to {3, 4, 10, 12}, the new champion
  8. Number 5 cannot extend either {3, 4, 10, 12} or {3, 4, 10}. The longest seq it can extend is {3, 4}, forming {3, 4, 5}. As before, the new sequence of length 3 replaces {3, 4, 10} as better/preferable/of better extension potential as we will see shortly
  9. Number 6 cannot extend the current champion {3, 4, 10, 12} but it can extend {3, 4, 5}.
  10. Number 7 creates new champion {3, 4, 5, 6, 7}
  11. Please note the each and every input number introduced a new sequence
    1. created new champion, or
    2. extended empty {}, or
    3. improved an in-between sequence
      1. improvement of {3,8} to {3,4} enabled future extensions by 5,6,7

 

Please note that in the above table we have only used the underlined final values of the best subsequences in deciding whether a sequence can be extended.

And we only needed the smallest, most recently discovered such value (OK to overwrite previous findings).

The table above has shown all the discovered subsequences explicitly for illustrative purposes and because, logically, it is indeed these sequences that we were considering.

The implementation, however, can just use these underlined final values as proxies for whole sequences.

Storing just the ends of best sequences is sufficient for finding the length of a longest subsequence, but the subsequence itself; for that additional bookkeeping will be needed.

 

The table below keeps just the final values and shows the evolution of an array of (smallest) ending values of subsequences of various lengths. We underline values producing new champions, and bolden values representing an improvement in shorter sequences.

Unchanging values are copied forward from row to row.

 

 

0

1

2

3

4

5

 

{}

         

4

 

4

       

3

 

3

       

8

 

3

8

     

4

 

3

4

     

10

 

3

4

10

   

12

 

3

4

12

12

 

5

 

3

4

5

12

 

6

 

3

4

5

6

 

7

 

3

4

5

6

7

 

The length of the longest increasing subsequence is 5 — the highest column number with a non-zero entry.

 

// intuition: prefer smaller endings — more extension potential
   // it is enough to store only the latest seq found
   // each value extends some sequence, perhaps the empty seq
   // progress: extension of longest so far (which creates new longest)
   // or replacement of existing with one with better extension potential

   static int longestIncreasingSubsequenceLength(final int[] a) {
       final int n = a.length;
       // keep track of last elements of sequences indexed by seq length
       final int[] lastVal = new int[n + 1];
       int maxlen = 0; // empty sequence
       lastVal[0] = Integer.MIN_VALUE; // … so that the empty sequence can always be extended

       for (int i = 0; i < n; i++) {
           final int value = a[i];
           // search back for longest seq that can be extended
           for (int len = maxlen; len >= 0; –len) {
               if (value > lastVal[len]) { // seq of length len can be extended
                   lastVal[len + 1] = value;   // extend it by storing seq of length len+1
                   if (len == maxlen) {
                       maxlen = len + 1;   // we have a new maxlen
                   }
                   break;
               }
           }
       }
       return maxlen;
   }

This Java code is brief. The inner loop searches backwards for the longest subsequence to extend with the current value. Worst case, it will extend the empty sequence.

The “new champion?” test (len == maxlen?) can only be true on 1st execution of that loop.

That special border case can be moved outside the loop

 

static int longestIncreasingSubsequenceLength2(final int[] a) {
       final int n = a.length;
       // keep track of last elements of sequences indexed by seq length
       final int[] lastVal = new int[n + 1];

       int maxlen = 0; // empty sequence
       lastVal[0] = Integer.MIN_VALUE; // … so that the empty sequence can always be extended

       for (int i = 0; i < n; i++) {
           final int value = a[i];
           if (value > lastVal[maxlen]) {
               lastVal[++maxlen] = value;
           } else {
               // search back for longest seq that can be extended
               for (int len = maxlen; –len >= 0; ) {
                   if (value > lastVal[len]) { // seq of length len can be extended
                       lastVal[len + 1] = value;   // extend it by storing seq of length len+1
                       if (len == maxlen) {
                           maxlen = len + 1;   // we have a new maxlen
                       }
                       break;
                   }
               }
           }
       }
       return maxlen;
   }

 

Please note that final values of the best sequences are themselves increasing. That’s one of the loop’s invariants. This property enables binary search for a sequence to extend rather than linear scan.

 

In addition, to find an example of an actual longest subsequence we need more bookkeeping.

From: https://docs.google.com/document/d/1zSYdFPMFr_wHbF_l0Kb49Lz5Rc8n3ewttqV92XH_ies/edit#