Reputation v2 & Healthy Research Rewards


Tyler Diorio, Patrick Joyce, Kobe Attias, Taki Koutsomitis, Gregor Zurowski, Brian Armstrong, Riccardo Goldoni, Jeffrey Koury


The purpose of our newly revamped Reputation v2 and Healthy Research Rewards is to provide new incentives that promote research content that is maximally reproducible. To this end, we’ve implemented two categories of incentives that we’ll dive deeper into below:

  • Reputation (REP v2)
  • Financial (ResearchCoin Rewards)

Reputational Incentives:

Proper documentation and assessment of reputation are integral to science. At ResearchHub, we want to give users more context on who they interact with, who is making claims, and what their expertise is. REP v2 is surfaced most notably in the newly revamped Author Profile Pages, which highlight the “Reputation” of a given author in their fields of expertise along with key statistics from their works, research achievements, and more.

Subfields

Reputation is quantified across 248 research areas, termed “Subfields”. These Subfields are labeled using the OpenAlex topic classification system, which leverages Scopus’ ASJC structure (see OpenAlex Topic Classification Whitepaper.docx). Using OpenAlex’s research information, we can initialize Reputation and Author Profile Pages automatically for any researcher who verifies their identity, even one who has just joined ResearchHub for the first time.
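For the curious, here is a minimal sketch of how a profile could be seeded from OpenAlex’s public REST API. The endpoint and the display_name / works_count / cited_by_count fields are documented parts of OpenAlex; mapping OpenAlex “topics” onto ResearchHub Subfields, and the helper itself, are illustrative assumptions rather than ResearchHub’s actual pipeline.

```python
# Hypothetical profile-seeding helper built on the public OpenAlex API.
import requests

OPENALEX = "https://api.openalex.org"

def seed_author_profile(orcid: str) -> dict:
    """Fetch an author's public OpenAlex record by ORCID."""
    author = requests.get(f"{OPENALEX}/authors/orcid:{orcid}").json()
    return {
        "name": author["display_name"],
        "works_count": author["works_count"],
        "cited_by_count": author["cited_by_count"],
        # Assumption: OpenAlex topic labels stand in for ResearchHub Subfields.
        "subfields": [t["display_name"] for t in author.get("topics", [])],
    }

profile = seed_author_profile("0000-0002-1825-0097")  # ORCID's demo record
print(profile["name"], profile["cited_by_count"])
```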

 

An Iterative Approach to Reputation: "REP"

Reputation is quantified by assigning point values to various actions and outcomes - these points are termed REP. We are using citations to initialize Reputation in a way that allows us to continually iterate towards better, more comprehensive metrics. We’ve opted to begin with citations alongside upvotes because it is important to recognize the existing corpus of researchers, who have contributed to their fields through published, peer-reviewed research in addition to ResearchHub-specific actions (i.e. upvotes). We’ve built the framework for REP using a composable structure that gives us plenty of room to be creative and continually add more nuance; future candidates include open access scores, reproducibility, cross-disciplinary work, innovation, and more. We’re excited to iterate on REP with the ResearchHub community over the next few years and continue to push towards our mission of accelerating the pace of scientific research.

Reputation Tiers

There are now 4 tiers of Reputation for every subfield on ResearchHub. These tiers are calculated by sampling all of the original research articles and preprints within a subfield and processing their citation counts. It is important to note that the supply of citations varies considerably from one subfield to another. For example, the average number of citations per article or preprint is X in the “Molecular Biology” subfield but just Y in the “Machine Learning” subfield. This is because the culture, standards, and pace of research are not uniform across subfields and require nuance.

We’ve attempted to account for this nuance by ranking authors within each subfield by citation count. Tier 1 researchers who have built Reputation solely through citations fall in the 0-50th percentile of their subfield, Tier 2 in the 50-70th percentile, Tier 3 in the 70-90th percentile, and Tier 4 in the 90-100th percentile. A value of REP is assigned per citation based on the supply of citations in the subfield, with tiers bounded at log10-spaced point values (a sketch of the tier assignment follows the list below):

  • Tier 1: 0 to 1,000 REP
  • Tier 2: 1,001 to 10,000 REP
  • Tier 3: 10,001 to 100,000 REP
  • Tier 4: 100,001 to 1,000,000 REP
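As a rough illustration, here is a minimal Python sketch of the percentile-based tier assignment described above. The 50th/70th/90th percentile cutoffs come straight from this post; how ResearchHub handles ties and exact boundary values is an assumption.

```python
# Minimal sketch of citation-percentile tier assignment (assumptions noted
# above; this is not ResearchHub's production code).
from bisect import bisect_right

TIER_CUTOFFS = [0.50, 0.70, 0.90]  # upper percentile bound of Tiers 1-3

def citation_tier(author_citations: int, subfield_citations: list[int]) -> int:
    """Return the Reputation tier (1-4) for an author, given the citation
    counts of every author in the same subfield."""
    below = sum(c <= author_citations for c in subfield_citations)
    percentile = below / len(subfield_citations)
    return 1 + bisect_right(TIER_CUTOFFS, percentile)

# An author at the 85th percentile of their subfield lands in Tier 3:
print(citation_tier(120, [10] * 85 + [500] * 15))  # -> 3
```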

Earning Reputation

Reputation can be gained through actions on ResearchHub that result in upvotes or citations such as:

  • Writing peer reviews
  • Preregistering an experiment
  • Completing research bounties
  • Discussing scientific literature
  • Publishing original research or review articles

Upvotes: Users will earn at least 1 REP per upvote received. 

Citations: Users will earn a variable amount of REP per citation, depending on their subfield and their Reputation Tier within that subfield. The amount of REP per citation in any given subfield is calculated as follows:
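A plausible form of this calculation, consistent with the Tier 4 cap of 1,000,000 REP and the worked examples below, is:

$$\text{REP per citation} = \frac{1{,}000{,}000}{C_{\max}}$$

where $C_{\max}$ is the citation count of the most cited researcher in that subfield.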

This results in a dynamic REP value per citation that is unique to each subfield, as a function of the supply of citations available in that field. To illustrate, here are three examples of the REP earned per citation, drawn from the Molecular Biology (high supply of citations), Artificial Intelligence (medium supply of citations), and Philosophy (low supply of citations) subfields:

 

Example 1: Molecular Biology subfield

The most cited researcher in Molecular Biology has 312,048 citations in original research articles or preprints.

Example 2: Artificial Intelligence subfield

The most cited researcher in Artificial Intelligence has 147,078 citations in original research articles or preprints.

Example 3: Philosophy subfield

The most cited researcher in Philosophy has 16,097 citations in original research articles or preprints.
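Under the reconstructed formula above, these counts imply roughly:

$$\frac{1{,}000{,}000}{312{,}048} \approx 3.2, \qquad \frac{1{,}000{,}000}{147{,}078} \approx 6.8, \qquad \frac{1{,}000{,}000}{16{,}097} \approx 62.1$$

REP per citation in Molecular Biology, Artificial Intelligence, and Philosophy respectively; that is, the scarcer citations are in a subfield, the more REP each one confers.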

These values of REP per citation are likely to change as we continue iterating on the REP metric. In the medium term, we anticipate adjusting the balance of REP earned per upvote versus per citation, and we will rely heavily on feedback from the ResearchHub community.

 

Benefits of Reputation Tiers

Reputation tiers will eventually serve multiple purposes on the platform - as a starting point, REP tiers will provide:

  • At-a-glance reputation - No need to wonder whether you can trust a peer review; check for yourself whether the researcher has a record of expertise in that subfield.
  • Faster withdrawals - All users will be able to withdraw at increased frequency after they’ve verified their identity.
    • Unverified Users: 2-week withdrawal cooldown
    • Verified* Users (Tiers 1 to 4): 24-hour withdrawal cooldown

*Note: Users who verified their accounts before July 31, 2024 will need to verify again to receive these benefits.

 

Financial Incentives:

Beyond reputational incentives, we want to financially reward researchers who have participated in healthy research behaviors including:

  • Publishing open-access articles/preprints 
  • Providing open-access data 
  • Preregistering studies 

To this end, we have created author profiles for all researchers and allowed first authors to claim RSC rewards for their publications based on citation counts. The amount of RSC distributed per publication depends on 1) the subfield of the publication, 2) the number of citations, and 3) the degree of “openness”: its open access, preregistration, and open data status.


*It is important to note that ResearchHub has implemented these financial incentives in a way that allows for and encourages continuous iteration and improvement. All of the structures described below are upgradable; please continue to provide us with feedback as we launch these new rewards.

RSC Allocation per Subfield

Over the past 5 years, the ResearchHub reward algorithm has under-emitted a total of 105M RSC. This RSC is set aside specifically to incentivize behavior that accelerates the pace of science - including actions that increase reproducibility. This presents a significant and unique opportunity to retrospectively reward researchers who have practiced healthy research behaviors over the years. Establishing an initial condition for divvying up RSC across research disciplines is non-trivial. As a first pass, we’ve opted to distribute RSC as a function of the number of papers and the number of citations in each subfield, measured over the window from ResearchHub’s inception (August 1st, 2019) to the present day.

To this end, we calculated the total number of citations and the total number of papers in each of the 248 subfields over the past 5 years. This gives us each subfield’s percentage share of citations and of publications relative to the global research ecosystem. Since either metric alone would be insufficient for allocation, we averaged the two percentage weights to compute the RSC distributions, as sketched below. Final RSC distributions across all 248 subfields can be seen in Healthy Rewards Per Subfield.xlsx as well as visually in the pie chart below.
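In code, the stated allocation rule amounts to something like the following sketch. The 105M pool size comes from this post; the per-subfield counts shown are placeholders, not real data.

```python
# Sketch: each subfield's slice of the 105M RSC pool is the average of its
# share of global citations and its share of global papers (per this post).
TOTAL_POOL_RSC = 105_000_000

def allocate_pool(citations: dict[str, int], papers: dict[str, int]) -> dict[str, float]:
    """Return the RSC allocated to each subfield."""
    total_c = sum(citations.values())
    total_p = sum(papers.values())
    return {
        sf: TOTAL_POOL_RSC * (citations[sf] / total_c + papers[sf] / total_p) / 2
        for sf in citations
    }

# Placeholder counts for two subfields, for illustration only:
print(allocate_pool(
    {"Molecular Biology": 9_000_000, "Philosophy": 400_000},
    {"Molecular Biology": 600_000, "Philosophy": 90_000},
))
```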

RSC Per Paper (retrospective)

Choosing the ideal distribution of healthy research rewards within a subfield is also non-trivial and deserving of nuance. As an initial approach, we’ve chosen to use a Zipf’s law rank-based ordering system where publications within a given subfield are first ranked from 1 to N, where 1 is the most highly cited paper. You can find a step-by-step walkthrough of the calculations below.

1. Rank order all publications within each subfield by citation count

2. Fit a Zipf’s law model to each subfield

  • Calculate a raw Zipf weight f(r) = 1/r^s for each citation rank. We used an `s` value of 0.70 after qualitative inspection of the resulting RSC distributions.
  • Normalize each rank by the sum of all f(r) values across citation ranks. This provides a rank-based fractional weighting (f(r)) used to distribute RSC within a subfield; see the sketch below.
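A minimal sketch of this weighting step:

```python
# Sketch of step 2: raw Zipf weights f(r) = 1/r^s with s = 0.70, normalized
# so the weights across a subfield sum to 1. A paper's retrospective RSC is
# then its weight times the subfield's pool.
def zipf_weights(n_papers: int, s: float = 0.70) -> list[float]:
    raw = [1 / r**s for r in range(1, n_papers + 1)]  # rank 1 = most cited
    total = sum(raw)
    return [w / total for w in raw]

weights = zipf_weights(10_000)
print(f"rank 1 share: {weights[0]:.4%}, rank 1,000 share: {weights[999]:.4%}")
```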

3. Approximate the Zipf’s law model for compatibility with the ResearchHub web app

  • Generating a look-up table based on this rank-based system would be impractical for the ResearchHub web application. We were able to recover the Zipf’s law model in a more convenient manner by fitting a series of log-linear models to the RSC distributions, ordered by citations, which resulted in a qualitatively strong fit.
  • We fit models for the RSC distribution per citation rank in a series of 8 bins. These 8 bins allow us to use 8 log-linear models to approximate the more complicated underlying data in a manner that is easy to extend to future publications or imperfectly fitting data (exact citation counts are often in flux and vary by source). The log-linear models all follow the form shown below, where A and B are constants fit to each bin of the subfield models:
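The exact variables in the fitted form are our assumption; a log-log linear model consistent with a Zipf distribution would be:

$$\log_{10}\!\big(\text{RSC}(r)\big) = A \cdot \log_{10}(r) + B$$

where r is a publication’s citation rank within the bin.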

4. Scale the distributions by healthy research status

  • The distributions shown in the Google Sheets and plots are the upper limit of RSC that can be earned if the research is open access, preregistered, and has open data.
  • Specifically, the following multipliers will be available for researchers:
    1. Open Access: 1x
    2. Open Data: 3x
    3. Preregistered: 2x
  • Additionally, a flat multiplier of 5x will be applied to all rewards to account for the estimated fraction of articles/preprints that have reported open data and preregistrations.
    • Open data is reportedly available for only 9-76% of articles, and 14-41% of open data requests to authors go unanswered [Tedersoo et al. 2021].
    • Preregistrations are uncommon outside the social sciences, where only 39% of authors reported ever having submitted a preregistration with their work [Sarafoglou et al. 2022].

*The “multipliers” for open data and preregistration status are additive, as shown above. For example, having open data and a preregistration attached to an open access article would result in 6x the base reward (open access: 2,000 RSC base reward; open data: +3 × 2,000 RSC = +6,000 RSC; preregistration: +2 × 2,000 RSC = +4,000 RSC; total: 12,000 RSC).
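A minimal sketch of this additive scheme, reproducing the worked example above (how the separate flat 5x adjustment composes with these multipliers is not spelled out here, so it is omitted):

```python
# Sketch of the additive openness multipliers from this post. The flat 5x
# adjustment described above is applied separately and is omitted here.
OPENNESS_MULTIPLIERS = {"open_access": 1, "open_data": 3, "preregistered": 2}

def openness_reward(base_rsc: float, flags: set[str]) -> float:
    """Total RSC for a paper given its base reward and openness flags."""
    return base_rsc * sum(OPENNESS_MULTIPLIERS[f] for f in flags)

# Worked example from the post: 2,000 RSC base, all three flags -> 12,000 RSC
print(openness_reward(2_000, {"open_access", "open_data", "preregistered"}))
```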

 

Moving Forward

RSC per Citation (continuous)

As we’ve greatly enriched the data underlying works on ResearchHub, we’re now able to reward more actions in scientific research. Moving forward, we will add the capability to distribute RSC rewards on citations in addition to upvotes. The exact details of this implementation are still being worked out, and we’d love to hear your feedback on the best way to do it. Some open questions we have are:

  • How should we reward citations on the various types of academic outputs (preprints, publications, conference papers, review papers)?
  • Should upvotes distribute a flat 1 RSC per upvote?
  • Should certain ResearchHub specific content/actions receive more distribution signal than others (e.g. peer reviews, preregistrations, original ResearchHub posts)?

 

Closing Thoughts

We are eager to release these new incentive structures on the ResearchHub platform and are looking forward to hearing critical feedback from our community. It is important that we make sure these rewards are incentivizing healthy research behavior. In the words of the great American biochemist, Linus Pauling:


"If you want to have good ideas you must have many ideas. Most of them will be wrong, and what you have to learn is which ones to throw away." 

- Linus Pauling

We will continue to iterate on these models for the foreseeable future in hopes of stepping closer towards a better future for scientific research. 


Tell us what you think

If you’d like to chat with members of the ResearchHub and ResearchHub Foundation team, consider connecting with us in the ResearchHub Discord or reach out to us via X/Twitter, LinkedIn, or email.
