# Leetcode H-Index

**Problem Description**

*h*if

*h*of his/her

*N*papers have at least

*h*citations each, and the other

*N − h*papers have no more than

*h*citations each.”

`citations = [3, 0, 6, 1, 5]`

, which means the researcher has `5`

papers in total and each of them had received `3, 0, 6, 1, 5`

citations respectively. Since the researcher has `3`

papers with at least `3`

citations each and the remaining two with no more than `3`

citations each, his h-index is `3`

.`h`

, the maximum one is taken as the h-index.

There are many blogs to describe this problem. I found the following two articles are easiest to understand. Thanks for the author.

The following articles are from:

http://traceformula.blogspot.com/2015/09/h-index-leetcode.html

http://traceformula.blogspot.com/2015/09/h-index-ii-leetcode.html

**Solution**

As we know, H-Index is a measure of productivity and citation impact of a researcher. It is also called Hirsch index or Hirsch number after the name of the physicist Jorge Hirsch.

*citations*array (Credit: Wiki)

*citations*is an array of numbers of citations of a researcher. We can easily see that the following formula holds if

*citations*is sorted in decreasing order:

**h-index of**

*citations =*max (min (*citations[i], i*) for all i from 1 to number of papers)Therefore, we can easily implement the calculation of h-index by sorting the array in decreasing order then applying the above formula. Below is the **2-line Python code in O(nlogn) time**.

1 2 3 4 5 6 7 8 9 |
class Solution(object): def hIndex(self, citations): """ :type citations: List[int] :rtype: int """ citations.sort(reverse=True) return max([min(k+1, v) for k,v in enumerate(citations)]) if citations else 0 |

By sorting, it takes O(nlogn) time complexity.

However, we can do much better by an O(n) time algorithm. Pause yourself for a few minutes to think about it before continue reading ^_^.

We have some simple observation here, but it really helps to improve the performance. For each number *h* from *0* to *n *where *n *is the number of papers, we need to find out how many citations of a paper equal *h *called *equal_h*. Based on this, we can find the number of citations values that is at least *h, *and no more than h. To find the number of citations value that is at least h, we take sum of *equal_h[i]* for *i* from *h* to *n*!!! And similarly, to find the number of citations values that is no more than *h *citations each, we can sum up *equal_h[i] *for *i*from 0 to* i. *And we can find the h-index by simple running from the back of the array *equal_h.*

So if running from the back of the array *equal_h *, and h is the first index that satisfies the following conditions:*equal_h[h] + equal_h[h+1] + … + equal_h[n] >= h (*)* *equal_h[0] + equal_h[1] + … + equal_h[h] >= N-h (**)*

Then *h *is what we are looking for.

However, we have: *equal_h[0] + equal_h[1] + … + equal_h[n] = n *

Therefore: *equal_h[0] + equal_h[1] + … + equal_h[h] = n- (equal_h[h+1] + … + equal_h[n]) *

Another note is that since *h* is the first element satisfies the 2 above conditions, then *h+1 *does not satisfies one of them, meaning that either*(1): { equal_h[h+1] + equal_h[h+2] + … + equal_h[n] <= h *

or *(2): { equal_h[h+1] + equal_h[h+2] + … + equal_h[n] >= h+1 *

and *equal_h[0] + equal_h[1] + … + equal_h[h+1] < N-(h+1) }*

(1) suggests that: *equal_h[0] + equal_h[1] + … + equal_h[h] >=N-h *which is (**)

(2) suggests that: *equal_h[h+2] + equal_h[h+3] + … + equal_h[n] >= h+2 *

This inequality will be repeated until *equal_h[n] >= n *, which is wrong.

So all we need is to find the first *h *satisfies the condition (*), and we do not need to check the condition (**).

Below are the codes implement in different languages.

**I. Python code – O(n) **

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
public class Solution { public int hIndex(int[] citations) { int n = citations.length; int [] equal_h = new int[n+1]; for (int h = 0; h<n; h++){ if(citations[h] >= n) equal_h[n] += 1; else equal_h[citations[h]] += 1; } int s = 0; //we don't need check overflow here coz sum always <= n for (int h = n; h>0; h--){ s += equal_h[h]; if (s >= h) return h; } return 0; } } |

### H-Index II [LeetCode]

**Problem Description**

Follow up for H-Index: What if the

`citations`

array is sorted in ascending order? Could you optimize your algorithm?**Solution**

In the previous version, we sorted citations in reverse order. In this version of problem, the

*citations*array is in ascending order.

**I. O(n) Solution**

As we already know in the previous version of finding H-Index, we can solve the problem in linear time. Below is the 1-line Python solution:

1 2 3 4 5 6 7 8 |
class Solution(object): def hIndex(self, citations): """ :type citations: List[int] :rtype: int """ return max(min(v, len(citations)-k) for k, v in enumerate(citations + [0])) |

**II. Java Code**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
public class Solution { public int hIndex(int[] citations) { int n = citations.length; int start = 0; int end = n-1; int middle; while (start <= end){ middle = (end-start)/2 + start; if (citations[middle] == n-middle) return citations[middle]; else if(citations[middle] < n-middle) start = middle + 1; else end = middle - 1; } return n - end - 1; } } |