
ResearchHub Reputation Hackathon Resources Doc

Published
Jan 24, 2023

Overview and Aims of the Hackathon

With the following document, ResearchHub aims to provide Hackathon participants with guidelines, resources, and examples to better understand the topic of academic reputation and how it is currently computed.

The object, and ultimate output, of the hackathon should be a mathematical formulation, or algorithm, that computes the reputation of any individual who registers on ResearchHub. Currently, the ResearchHub reputation algorithm only takes into account activity carried out on the platform, so any individual who joins ResearchHub as a new user always starts from a default of 100 points of Lifetime Reputation (LR). In its current form, however, this LR cannot capture the academic reputation that users have built through their track record in academia. Moreover, it is only a general metric, unable to fully capture the diversity and nuance of our users’ academic profiles. It is therefore crucial to provide a reputation that is dynamic and can change depending on the user’s field of expertise.

This is the motivation behind the Reputation Hackathon, and the task that you, as hackers, are required to solve. LR should be an all-encompassing metric that, at a minimum, takes into account:

  • Academic credentials, such as the traditional metrics used in academia (e.g. affiliation, number of published papers, citations, etc.)
  • Field of expertise, so that each user can have different reputation scores depending on the context in which they are displayed (i.e. a researcher who develops biomedical implants has a very different reputation in the fields of medicine and engineering than in fields like finance and geophysics)
  • ResearchHub activity, which could help individuals who are underrepresented in academia, or who are citizen scientists, build their own reputation by contributing to the scientific discussion that happens on ResearchHub.

Each of these elements has several variables that could be used in computing a mathematical formulation, and the choice of these variables is part of your task as participants of the ResearchHub Reputation Hackathon.
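
To make this concrete, below is a purely illustrative sketch of how these three elements could be combined into a per-field reputation. The component names, weights, and scaling are our own assumptions for illustration, not a prescribed or expected solution:

def per_field_reputation(credentials_by_field, rh_activity):
    """Illustrative sketch only: combine normalized (0-1) component scores
    into a separate Lifetime Reputation per field of expertise.

    credentials_by_field maps a field (e.g. "medicine", "engineering") to a
    normalized academic-credentials score; rh_activity is a normalized
    ResearchHub activity score shared across fields. Weights are arbitrary.
    """
    baseline = 100.0                      # current default LR for new users
    w_credentials, w_activity = 0.7, 0.3  # assumed weights
    return {
        field: baseline + 1000.0 * (w_credentials * cred + w_activity * rh_activity)
        for field, cred in credentials_by_field.items()
    }

For example, a biomedical-implant researcher might carry a high credentials score under "medicine" and "engineering" but a near-zero score under "finance", yielding very different LR values depending on the context in which the score is displayed.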
 

Reputation Algorithms

Other platforms have already tried to compute their own reputation scores, including ResearchGate and SciVal, as detailed briefly below. There are surely other implementations of reputation developed elsewhere, and participants are free to browse them as needed. These specific examples are provided simply because of their prevalence in academia.

 

ResearchGate

One such example is ResearchGate, which had a reputation score called the ResearchGate Score (RG Score) until 2022, when it was removed. Interestingly, ResearchGate provided a closing update that detailed some of the outcomes and user experiences from the RG Score, some of which we highlight below:

  • “From the start, our members appreciated the ability to quickly evaluate their own and other researchers’ contributions to science through the RG Score.”
  • “On the flip side, some members were frustrated with the RG Score’s intransparency and with their research impact being bound to a single metric.”
  • “We also heard from members who reported fluctuations in their score that they couldn’t explain.”
  • “At the same time, we have been following the movement within the academic community towards responsible use of research metrics and a more holistic approach to assessing research impact.”

These reviews highlight a few important considerations when developing a reputation algorithm. A high reputation in research has the potential to become an important determinant of an individual’s success and opportunities in the field. Therefore, researchers should be able to readily understand exactly which aspects of their reputation need improving. This could mean providing an in-depth breakdown of various skills, records, and areas of improvement, as well as making the weighting of each of these individual aspects clear and transparent. Further, the metric should be robust against small changes in its variables, so as to provide a reasonable degree of stability for the users being evaluated.

If you would like to read more about the RG Score, which has also received mixed critiques from various authors, please feel free to browse the following links:

 

SciVal

SciVal is a tool curated by Elsevier that pulls from over 55 million publication records from 5,000+ publishers worldwide to quantify the impact of a given researcher (Elsevier’s SciVal). Importantly, SciVal provides a Research Metrics Guidebook as a downloadable PDF (SciVal Guidebook), which contains a variety of useful resources, including sections dedicated to the selection (Chapter 3.0) and discussion (Chapter 4.0) of research metrics, with previews shown below:

Figure: Section overviews from the SciVal Guidebook provided on Elsevier’s website. Chapters 3.0 and 4.0 detail potentially useful discussion on research reputation metrics.

 

Example Equation from ChatGPT

In order to visualize what it means to compute a reputation algorithm, we went one step further and asked ChatGPT how such an algorithm could be composed; the figure below shows the elements it decided to include:

Figure: Example hackathon deliverable, written by ChatGPT, which details a calculate_reputation_score() function that has several research variables including citations, h-index, altmetrics, and more.
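
A minimal sketch in the same spirit is reproduced below. The variable names, normalization caps, and weights are illustrative assumptions on our part and are not the exact function produced by ChatGPT:

def calculate_reputation_score(citations, h_index, num_publications,
                               altmetric_score, years_active):
    """Toy reputation score combining common research variables.

    All normalization caps and weights below are illustrative assumptions.
    """
    # Normalize each raw input to a rough 0-1 range before weighting.
    citation_score = min(citations / 1000.0, 1.0)
    h_index_score = min(h_index / 50.0, 1.0)
    publication_score = min(num_publications / 100.0, 1.0)
    altmetric_norm = min(altmetric_score / 100.0, 1.0)
    experience_score = min(years_active / 20.0, 1.0)

    # Weighted sum of the normalized components, scaled to 0-100.
    score = (0.35 * citation_score
             + 0.25 * h_index_score
             + 0.15 * publication_score
             + 0.15 * altmetric_norm
             + 0.10 * experience_score)
    return round(100.0 * score, 2)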

 

Example Output Visualization

One of the key limitations of current reputation metrics is their inability to specify the discrete aspects of a researcher’s skillset that define them, and to show these clearly and concisely to the user. In an ideal algorithm, the individual traits that comprise a researcher’s background would be visible, as is possible with a radar plot (example image shown below). In a radar plot implementation, a user could be given an "Estimated Reputation" as an average/median/integral of individual metrics, while those discrete metrics are also shown individually on the radar chart. This enables the identification of individual users’ skill sets for specific tasks without requiring mastery of all subsets. As an example, if someone needs a peer review of a biotechnology paper, they should be able to find users who are highly skilled in that discrete metric.

Figure: Radar plot example showing the different values for each relative sub-field. Image transposed from datanovia.com.
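
As a rough illustration, the snippet below draws a comparable radar chart with matplotlib; the field names and values are invented for demonstration, and the "Estimated Reputation" shown in the title is simply the mean of the sub-scores:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sub-field reputation scores for one user (0-100 scale).
labels = ["Medicine", "Engineering", "Biology", "Data Science", "Peer Review"]
values = [85, 70, 60, 40, 90]

# Compute one angle per field and close the polygon by repeating the first point.
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
values_closed = values + values[:1]
angles_closed = angles + angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles_closed, values_closed, linewidth=2)
ax.fill(angles_closed, values_closed, alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(labels)
ax.set_ylim(0, 100)
ax.set_title(f"Estimated Reputation: {np.mean(values):.0f} (mean of sub-scores)")
plt.show()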

While these are only simple examples of reputation algorithms and outputs, they should already give you, as hackers, an idea of the kind of output that could be presented during Demo Day.

Hackers are allowed to use any resource they find valuable and relevant to the challenge; however, the use of these resources will have to be disclosed and motivated when presenting the final result, as the motivation behind picking a particular metric is sometimes more important than the metric itself.

OpenAlex API

“The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. There are five types of entities: works, authors, venues, institutions, and concepts.

Together, these make a huge web (or more technically, heterogeneous directed graph) of hundreds of millions of entities and billions of connections between them all.” 

– Docs.openalex.org

Hackathon participants are encouraged to utilize the OpenAlex API, which ResearchHub already employs for some elements of its codebase. OpenAlex aggregates data from several projects, with two important sources being MAG and Crossref. Other key sources used include:

Additional details on how OpenAlex works and the kinds of resources it provides can be found in their docs (https://docs.openalex.org/api). A step-by-step tutorial on how to use the OpenAlex API can be found at https://docs.openalex.org/api/api-tutorial. By using OpenAlex, you can get access to five different kinds of resources:

  1. Works (papers, books, datasets, etc.)
  2. Authors
  3. Venues (journals and repositories that host works)
  4. Institutions (universities and other organizations to which authors are affiliated)
  5. Concepts (tags)

Figure: Graph connections of available resources and their linkages from the OpenAlex API documentation.
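
As a minimal sketch of how author-level signals could be pulled from OpenAlex (the search term below is a placeholder, and the response fields used, such as works_count, cited_by_count, and x_concepts, should be verified against the documentation linked above):

import requests

BASE_URL = "https://api.openalex.org"

def fetch_author_profile(author_search):
    """Search OpenAlex for an author and return basic reputation signals.

    Field names follow the OpenAlex Authors docs at the time of writing;
    verify them against docs.openalex.org before relying on them.
    """
    resp = requests.get(f"{BASE_URL}/authors", params={"search": author_search})
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return {}
    author = results[0]
    return {
        "id": author.get("id"),
        "display_name": author.get("display_name"),
        "works_count": author.get("works_count"),
        "cited_by_count": author.get("cited_by_count"),
        # Top associated concepts could feed a per-field reputation score.
        "top_concepts": [c.get("display_name") for c in author.get("x_concepts", [])[:5]],
    }

print(fetch_author_profile("Jane Doe"))  # "Jane Doe" is a placeholder name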

 

Validation of Algorithm

The best way to see whether a reputation algorithm is working well is to test it on real-world data. Luckily, the resources we have provided are already sufficient for testing the algorithm you will develop on real data. However, participants are free to pull any additional data from available online sources if needed for testing and showcasing the algorithm. As with the resources used to develop the algorithm, participants will also be required to provide details on where these real-world data were gathered from, so that the team can verify their validity.
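
One simple sanity check, offered purely as a suggestion (the comparison metric and sample numbers below are assumptions, not a required validation protocol), is to compare the ranking produced by your reputation score against an established metric such as the h-index for a disclosed sample of real authors:

from scipy.stats import spearmanr

# Hypothetical (reputation_score, h_index) pairs for authors pulled from a
# disclosed source such as OpenAlex; the numbers here are placeholders.
sample = [(812, 34), (640, 21), (1105, 48), (230, 9), (505, 17)]
reputation_scores = [s[0] for s in sample]
h_indices = [s[1] for s in sample]

# A high rank correlation suggests the new score broadly tracks established
# metrics, while large deviations highlight where it captures something different.
rho, p_value = spearmanr(reputation_scores, h_indices)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")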
