ResearchHub | Open Science Community

Ronak Pradeep

Author with expertise in Statistical Machine Translation and Natural Language Processing

Achievements

Cited Author

Open Access Advocate

Key Stats

Upvotes received:

0

Publications:

4

(50% Open Access)

Cited by:

489

h-index:

11

i10-index:

12

Reputation

Biology

< 1%

Chemistry

< 1%

Economics

< 1%

Publications

Document Ranking with a Pretrained Sequence-to-Sequence Model

Rodrigo Nogueira et al., Jan 1, 2020

This work proposes the use of a pretrained sequence-to-sequence model for document ranking. Our approach is fundamentally different from a commonly adopted classification-based formulation based on encoder-only pretrained transformer architectures such as BERT. We show how a sequence-to-sequence model can be trained to generate relevance labels as “target tokens”, and how the underlying logits of these target tokens can be interpreted as relevance probabilities for ranking. Experimental results on the MS MARCO passage ranking task show that our ranking approach is superior to strong encoder-only models. On three other document retrieval test collections, we demonstrate a zero-shot transfer-based approach that outperforms previous state-of-the-art models requiring in-domain cross-validation. Furthermore, we find that our approach significantly outperforms an encoder-only architecture in a data-poor setting. We investigate this observation in more detail by varying target tokens to probe the model’s use of latent knowledge. Surprisingly, we find that the choice of target tokens impacts effectiveness, even for words that are closely related semantically. This finding sheds some light on why our sequence-to-sequence formulation for document ranking is effective. Code and models are available at pygaggle.ai.
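The idea of reading target-token logits as relevance probabilities can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes exactly two target tokens (one meaning "relevant", one meaning "not relevant") and a softmax restricted to those two logits. The function names and the logit values are hypothetical; in practice the logits would come from the sequence-to-sequence model's decoder.

```python
import math


def relevance_probability(logit_relevant: float, logit_not_relevant: float) -> float:
    """Softmax over two target-token logits; returns P(relevant).

    Assumption: only two target tokens are considered, so the softmax
    is taken over just these two logits rather than the full vocabulary.
    """
    # Subtract the max for numerical stability before exponentiating.
    m = max(logit_relevant, logit_not_relevant)
    e_rel = math.exp(logit_relevant - m)
    e_not = math.exp(logit_not_relevant - m)
    return e_rel / (e_rel + e_not)


def rerank(candidates):
    """Sort (doc_id, logit_relevant, logit_not_relevant) tuples by P(relevant)."""
    scored = [
        (doc_id, relevance_probability(l_rel, l_not))
        for doc_id, l_rel, l_not in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up logits for three candidate passages.
candidates = [("d1", 1.0, 2.0), ("d2", 3.0, 0.5), ("d3", 0.0, 0.0)]
ranking = rerank(candidates)
```

Because only the two target-token logits enter the softmax, swapping in a different pair of target words changes the logits and therefore the scores, which is the kind of variation the abstract reports probing.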

Artificial Intelligence

Paper

Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations

Jimmy Lin et al., Jul 11, 2021

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. Around this toolkit, our group has built a culture of reproducibility through shared norms and tools that enable rigorous automated testing.
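One common way to integrate sparse and dense results is to interpolate their scores. The sketch below is a plain-Python illustration of that idea under stated assumptions, not Pyserini's API: the function `hybrid_fuse`, the weight `alpha`, and the choice to back off to a list's minimum score for documents missing from that list are all hypothetical, and the scores are made up.

```python
def hybrid_fuse(sparse, dense, alpha=0.5, k=10):
    """Fuse two {doc_id: score} result sets by linear interpolation.

    fused = alpha * sparse_score + (1 - alpha) * dense_score

    Assumption: a document absent from one result set receives that
    set's minimum score, so it is penalized but not discarded.
    """
    if not sparse and not dense:
        return []
    min_sparse = min(sparse.values()) if sparse else 0.0
    min_dense = min(dense.values()) if dense else 0.0

    fused = {}
    for doc_id in set(sparse) | set(dense):
        s = sparse.get(doc_id, min_sparse)
        d = dense.get(doc_id, min_dense)
        fused[doc_id] = alpha * s + (1 - alpha) * d

    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Made-up scores: BM25-style sparse scores and similarity-style dense scores.
sparse_hits = {"a": 10.0, "b": 8.0}
dense_hits = {"b": 0.9, "c": 0.8}
fused_hits = hybrid_fuse(sparse_hits, dense_hits, alpha=0.1)
```

Sparse and dense scores usually live on different scales, so `alpha` in a sketch like this would need tuning on held-out queries rather than being fixed in advance.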

Artificial Intelligence

Paper

ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large Language Models

Ronak Pradeep et al., Jan 1, 2024

Artificial Intelligence

Theoretical Computer Science

Paper

Entity Disambiguation via Fusion Entity Decoding

Junxiong Wang et al., Jan 1, 2024

Artificial Intelligence

Information Systems

Paper
