Finding H-index

Problem

Find h-index given array of numbers.

For more information about h-index, check out the following link.

https://en.wikipedia.org/wiki/H-index

The idea is that you need to find the last value in the array where the value is larger than its index.

For example,

 Say Scientist A has 5 publications with the following citation count.

  Publications = {10, 8, 5, 4, 3}.

In this case, the h-index is 4 since 4 whose index is 3 is the last value which value is larger than its index.


Answer

Given the fact that you are searching the sorted values, we can use a binary search to find the h-index.
This algorithm will run in O( log N).

My first attempt was a little lousy and ran in O(N). It used the brute force search.

This is the code using binary search. We assume that the input is already sorted in descending order.

If not, the running time will be O( N log N) due to the sorting. We can use either quick sort or merge sort for this.

#include <iostream>
#include <stdlib.h>
using namespace std;
int find_hindex(int *input, int s, int e, int last)
{
if (s > e)
return last;
int m = s+(e-s)/2;
if (input[m] >= m)
{
last = input[m];
s = m+1;
} else {
e = m-1;
}
return find_hindex(input, s, e, last);
}
int compare(const void *a, const void*b)
{
return (*(int*)b -*(int*)a);
}
int main()
{
int input[5] = {5, 3, 10, 4, 8};
qsort(input, 5, sizeof(int), compare);
cout << find_hindex(input, 0, 4, -1) << endl;
return 1;
}
view raw hindex.cpp hosted with ❤ by GitHub



UPDATE (2019-07-31): solved the problem again in python using the binary search. Made silly mistakes that was found after running the code.

def hindex_binary_search(list, s, e, current_max):
if s > e:
return current_max
half = s + (e - s)//2
if list[half] <= half:
# values right will be less than index, search left
return hindex_binary_search(list, s, half - 1, current_max)
else:
#list[half] > half
#values left will be greater than index, update max if necessary and search right.
if current_max == None or current_max < half:
current_max = half
return hindex_binary_search(list, half + 1, e, current_max)
def calculate_hindex(citations):
max_hindex_pos = hindex_binary_search(citations, 0, len(citations) - 1, None)
return citations[max_hindex_pos]
input = [10, 8, 5, 4, 3]
print ("input = {} h-index = {}".format(input, calculate_hindex(input)))

Practice statistics:

12:19: to write the correct algorithm for h-index
10:26: to write the code and test code (two bugs found after running. not returning the search result.)


UPDATE(2022-06-12): Solved the problem again. Took a while to understand what H-index is. Had a critical bug in the binary search to find the h-index.

class HIndexFinder:
def __init__(self):
self.index = None
def search(self, min, max, input):
if min > max:
return
half = (min + max)//2
if input[half] <= half:
self.index = half
self.search(min, half - 1, input)
else:
self.search(half + 1, max, input)
def find_h_index(self, input):
min = 0
max = len(input) - 1
# log (N)
self.search(min, max, input)
return self.index
input = [10, 8, 5, 4, 3]
finder = HIndexFinder()
h_index = finder.find_h_index(input)
print("h index = {} for {}".format(h_index, input))
input = [10, 8, 5, 3, 3]
finder = HIndexFinder()
h_index = finder.find_h_index(input)
print("h index = {} for {}".format(h_index, input))
input = [25, 8, 5, 3, 3]
finder = HIndexFinder()
h_index = finder.find_h_index(input)
print("h index = {} for {}".format(h_index, input))
view raw find_h_index.py hosted with ❤ by GitHub

Comments

Popular posts from this blog

Planting flowers with no adjacent flower plots

Find the shorted path from the vertex 0 for given list of vertices.