IRDataset

class lightning_ir.data.dataset.IRDataset(dataset: str)

Bases: object

__init__(dataset: str) -> None

Initializes a new IRDataset.

Parameters:

dataset (str) – Dataset name.

Methods

__init__(dataset) – Initializes a new IRDataset.

prepare_constituent(constituent) – Downloads the constituent of the dataset using ir_datasets if needed.

Attributes

DASHED_DATASET_MAP – Map of dataset names with dashes to dataset names with slashes.

dashed_docs_dataset_id – Dataset id with dashes instead of slashes for the documents dataset.

dataset – Dataset name.

dataset_id – Dataset id.

docs – Documents in the dataset.

docs_dataset_id – ID of the dataset containing the documents.

ir_dataset – Instance of ir_datasets.Dataset.

qrels – Qrels in the dataset.

queries – Queries in the dataset.

property DASHED_DATASET_MAP: Dict[str, str]

Map of dataset names with dashes to dataset names with slashes.

Returns:

Dataset map.

Return type:

Dict[str, str]
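
A mapping of this kind could be built as in the minimal sketch below. It is an illustration only, not necessarily how the library constructs the map: REGISTERED_IDS is a hypothetical stand-in for a list of slash-separated dataset names, and build_dashed_map is an assumed helper name.

```python
from typing import Dict, Iterable

# Hypothetical stand-in for a list of registered, slash-separated dataset names.
REGISTERED_IDS = ["msmarco-passage/train", "msmarco-passage/trec-dl-2019/judged"]


def build_dashed_map(dataset_ids: Iterable[str]) -> Dict[str, str]:
    """Map each dashed name (slashes replaced by dashes) back to its slashed name."""
    return {dataset_id.replace("/", "-"): dataset_id for dataset_id in dataset_ids}


dashed_map = build_dashed_map(REGISTERED_IDS)

# A known dashed name resolves to its slashed form; an unknown name falls through unchanged.
resolved = dashed_map.get("msmarco-passage-train", "msmarco-passage-train")
unknown = dashed_map.get("my-custom-dataset", "my-custom-dataset")
```

A lookup table is useful here because dataset names can themselves contain dashes (as in msmarco-passage), so a dashed name cannot be converted back to its slashed form by string replacement alone.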

property dashed_docs_dataset_id: str

Dataset id with dashes instead of slashes for the documents dataset.

Returns:

Document dataset id with dashes.

Return type:

str

property dataset: str

Dataset name.

Returns:

Dataset name.

Return type:

str

property dataset_id: str

Dataset id.

Returns:

Dataset id.

Return type:

str

property docs: Docstore | Dict[str, GenericDoc]

Documents in the dataset.

Returns:

Documents.

Return type:

ir_datasets.indices.Docstore | Dict[str, GenericDoc]

Raises:

ValueError – If no documents are found in the dataset.
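
Because docs may be either a docstore or a plain dictionary keyed by document id, a caller might access it through a small helper such as the sketch below. StandInDoc is a hypothetical replacement for GenericDoc (its field names are assumptions), lookup_text is an assumed helper name, and only the dictionary case is exercised here; how a Docstore reports a missing id is not covered.

```python
from typing import Dict, NamedTuple


class StandInDoc(NamedTuple):
    """Hypothetical stand-in for GenericDoc; the field names are assumptions."""

    doc_id: str
    text: str


def lookup_text(docs: Dict[str, StandInDoc], doc_id: str) -> str:
    """Return the text of one document, raising KeyError for an unknown id."""
    doc = docs.get(doc_id)  # a dict returns None for a missing key
    if doc is None:
        raise KeyError(doc_id)
    return doc.text


# Illustrative documents; in practice these would come from the docs property.
docs = {
    "d1": StandInDoc("d1", "Neural ranking models score query-document pairs."),
    "d2": StandInDoc("d2", "BM25 is a lexical retrieval function."),
}
text = lookup_text(docs, "d2")
```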

property docs_dataset_id: str

ID of the dataset containing the documents.

Returns:

Document dataset id.

Return type:

str

property ir_dataset: Dataset | None

Instance of ir_datasets.Dataset.

Returns:

Instance of ir_datasets.Dataset, or None if the dataset is not found.

Return type:

ir_datasets.Dataset | None

prepare_constituent(constituent: Literal["qrels", "queries", "docs", "scoreddocs", "docpairs"]) -> None

Downloads the constituent of the dataset using ir_datasets if needed.

Parameters:

constituent (Literal["qrels", "queries", "docs", "scoreddocs", "docpairs"]) – Constituent to download.
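
Since the parameter accepts only five literal values, a caller that receives the name from user input or a config file may want to validate it first. The sketch below shows one way to do that with the standard typing module; Constituent and check_constituent are hypothetical names that are not part of the library.

```python
from typing import Literal, get_args

# Mirrors the literal values listed in the parameter documentation above.
Constituent = Literal["qrels", "queries", "docs", "scoreddocs", "docpairs"]


def check_constituent(name: str) -> str:
    """Return name if it is a documented constituent, otherwise raise ValueError."""
    allowed = get_args(Constituent)  # tuple of the five allowed strings
    if name not in allowed:
        raise ValueError(f"Unknown constituent {name!r}; expected one of {allowed}")
    return name


checked = check_constituent("docs")
```

The validated name would then be passed to prepare_constituent on an IRDataset instance.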

property qrels: DataFrame | None

Qrels in the dataset.

Returns:

Qrels.

Return type:

pd.DataFrame | None

property queries: Series

Queries in the dataset.

Returns:

Queries.

Return type:

pd.Series

Raises:

ValueError – If no queries are found in the dataset.
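
The two documented failure modes (ir_dataset being None and queries raising ValueError) can be handled on the caller side roughly as sketched below. The three classes are hypothetical stand-ins that only mimic the documented attribute names; in particular, TinyDataset returns a plain dict where the real property returns a pd.Series, and queries_or_none is an assumed helper name.

```python
from typing import Any, Optional


def queries_or_none(dataset: Any) -> Optional[Any]:
    """Return dataset.queries, or None if the dataset is missing or has no queries."""
    if dataset.ir_dataset is None:
        # Documented behavior: ir_dataset is None when the dataset is not found.
        return None
    try:
        return dataset.queries
    except ValueError:
        # Documented behavior: queries raises ValueError when none are found.
        return None


class MissingDataset:
    """Hypothetical stand-in for a dataset that could not be found."""

    ir_dataset = None
    queries = None


class EmptyDataset:
    """Hypothetical stand-in for a dataset that exists but has no queries."""

    ir_dataset = object()

    @property
    def queries(self) -> Any:
        raise ValueError("no queries found")


class TinyDataset:
    """Hypothetical stand-in with two queries (a dict instead of a pd.Series)."""

    ir_dataset = object()
    queries = {"q1": "what is bm25", "q2": "neural ranking"}


missing = queries_or_none(MissingDataset())
empty = queries_or_none(EmptyDataset())
found = queries_or_none(TinyDataset())
```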