Finding H-index
Problem
Find h-index given array of numbers.For more information about h-index, check out the following link.
https://en.wikipedia.org/wiki/H-index
The idea is that you need to find the last value in the array where the value is larger than its index.
For example,
Say Scientist A has 5 publications with the following citation count.
Publications = {10, 8, 5, 4, 3}.
In this case, the h-index is 4 since 4 whose index is 3 is the last value which value is larger than its index.
Answer
Given the fact that you are searching the sorted values, we can use a binary search to find the h-index.
This algorithm will run in O( log N).
My first attempt was a little lousy and ran in O(N). It used the brute force search.
This is the code using binary search. We assume that the input is already sorted in descending order.
If not, the running time will be O( N log N) due to the sorting. We can use either quick sort or merge sort for this.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#include <iostream> | |
#include <stdlib.h> | |
using namespace std; | |
int find_hindex(int *input, int s, int e, int last) | |
{ | |
if (s > e) | |
return last; | |
int m = s+(e-s)/2; | |
if (input[m] >= m) | |
{ | |
last = input[m]; | |
s = m+1; | |
} else { | |
e = m-1; | |
} | |
return find_hindex(input, s, e, last); | |
} | |
int compare(const void *a, const void*b) | |
{ | |
return (*(int*)b -*(int*)a); | |
} | |
int main() | |
{ | |
int input[5] = {5, 3, 10, 4, 8}; | |
qsort(input, 5, sizeof(int), compare); | |
cout << find_hindex(input, 0, 4, -1) << endl; | |
return 1; | |
} |
UPDATE (2019-07-31): solved the problem again in python using the binary search. Made silly mistakes that was found after running the code.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def hindex_binary_search(list, s, e, current_max): | |
if s > e: | |
return current_max | |
half = s + (e - s)//2 | |
if list[half] <= half: | |
# values right will be less than index, search left | |
return hindex_binary_search(list, s, half - 1, current_max) | |
else: | |
#list[half] > half | |
#values left will be greater than index, update max if necessary and search right. | |
if current_max == None or current_max < half: | |
current_max = half | |
return hindex_binary_search(list, half + 1, e, current_max) | |
def calculate_hindex(citations): | |
max_hindex_pos = hindex_binary_search(citations, 0, len(citations) - 1, None) | |
return citations[max_hindex_pos] | |
input = [10, 8, 5, 4, 3] | |
print ("input = {} h-index = {}".format(input, calculate_hindex(input))) |
Practice statistics:
12:19: to write the correct algorithm for h-index
10:26: to write the code and test code (two bugs found after running. not returning the search result.)
UPDATE(2022-06-12): Solved the problem again. Took a while to understand what H-index is. Had a critical bug in the binary search to find the h-index.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class HIndexFinder: | |
def __init__(self): | |
self.index = None | |
def search(self, min, max, input): | |
if min > max: | |
return | |
half = (min + max)//2 | |
if input[half] <= half: | |
self.index = half | |
self.search(min, half - 1, input) | |
else: | |
self.search(half + 1, max, input) | |
def find_h_index(self, input): | |
min = 0 | |
max = len(input) - 1 | |
# log (N) | |
self.search(min, max, input) | |
return self.index | |
input = [10, 8, 5, 4, 3] | |
finder = HIndexFinder() | |
h_index = finder.find_h_index(input) | |
print("h index = {} for {}".format(h_index, input)) | |
input = [10, 8, 5, 3, 3] | |
finder = HIndexFinder() | |
h_index = finder.find_h_index(input) | |
print("h index = {} for {}".format(h_index, input)) | |
input = [25, 8, 5, 3, 3] | |
finder = HIndexFinder() | |
h_index = finder.find_h_index(input) | |
print("h index = {} for {}".format(h_index, input)) |
Comments
Post a Comment