Tags: , ,

# Median of a stream of numbers

Given a stream of integers, find the median of the stream of numbers received so far.

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

Median of stream of numbers can be queried at multiple times at different point of time. Insertion and median functions can be called in any order. For example:

First solution be to store the stream received in an unsorted array. Finding median of stream of numbers will take O(N log N) (because we have to sort array first before returning median), time complexity for insertion will be O(1).

How about storing integer in a sorted array? Insertion will take O(N) as we need to shift elements once correct place is identified for new integer. Median can be found O(1).

A better solution:

We can use heap to solve this problem.

In Java, the `PriorityQueue` class is a priority heap. We can use two heaps to store the lower half and the higher half of the data stream. The size of the two heaps differs at most 1.

Follow up:

What if there are infinite numbers such that it is impossible to hold the data in memory?

There are many solutions mentioned in this thesis:

An Experimental Study of Distributed Quantile Estimation https://arxiv.org/pdf/1508.05710.pdf

q-digest : an algorithm for computing approximate quantiles on a collection of integers. It lets you estimate arbitrary quantiles (median would be the 50th percentile).

The original paper of Q-digest: http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf

Quantiles on Streams is another paper worth to read.