Introducing MS MARCO Web Search: A New Era for LLM and IR Data


by datasets... June 28th, 2025

MS MARCO Web Search is the first large-scale, information-rich web dataset with millions of real clicked query-document labels, aimed at advancing research on LLMs and information retrieval.

Table of Links

Abstract and 1 Introduction

2 Background and Related Work

2.1 Web Scale Information Retrieval

2.2 Existing Datasets

3 MS MARCO Web Search Dataset and 3.1 Document Preparation

3.2 Query Selection and Labeling

3.3 Dataset Analysis

3.4 New Challenges Raised by MS MARCO Web Search

4 Benchmark Results and 4.1 Environment Setup

4.2 Baseline Methods

4.3 Evaluation Metrics

4.4 Evaluation of Embedding Models and 4.5 Evaluation of ANN Algorithms

4.6 Evaluation of End-to-end Performance

5 Potential Biases and Limitations

6 Future Work and Conclusions, and References

3 MS MARCO WEB SEARCH DATASET

In this paper, we present MS MARCO Web Search, a ...


Copyright of this story belongs solely to hackernoon.com.