IRDataset

class lightning_ir.data.dataset.IRDataset(dataset: str)

Bases: object

__init__(dataset: str) → None

Methods

__init__(dataset)

prepare_constituent(constituent)

Downloads the constituent of the dataset using ir_datasets if needed.

Attributes

DASHED_DATASET_MAP

Map of dataset names with dashes to dataset names with slashes.

dataset

Dataset name.

dataset_id

Dataset id.

docs

Documents in the dataset.

docs_dataset_id

ID of the dataset containing the documents.

ir_dataset

Instance of ir_datasets.Dataset.

qrels

Qrels in the dataset.

queries

Queries in the dataset.

property DASHED_DATASET_MAP: Dict[str, str]

Map of dataset names with dashes to dataset names with slashes.

Returns:

Mapping from dashed dataset names to the corresponding slashed dataset names

Return type:

Dict[str, str]
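The map can be pictured as a simple renaming in which every "/" in a registered dataset id becomes "-". The sketch below assumes exactly that; the build_dashed_map helper and the registered_ids list are hypothetical stand-ins, and the real property may collect the registered ids differently.

```python
from typing import Dict, Iterable


def build_dashed_map(registered_ids: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map dashed aliases to slashed dataset ids."""
    # Only slashes are replaced; dashes already present in an id are kept.
    return {dataset_id.replace("/", "-"): dataset_id for dataset_id in registered_ids}


# Hypothetical stand-in for the ids registered with ir_datasets.
registered_ids = [
    "msmarco-passage",
    "msmarco-passage/train",
    "msmarco-passage/trec-dl-2019/judged",
]
dashed_map = build_dashed_map(registered_ids)

print(dashed_map["msmarco-passage-trec-dl-2019-judged"])
# → msmarco-passage/trec-dl-2019/judged
```

Because dataset ids can already contain dashes, the lookup only runs from dashed to slashed names: a dashed name cannot be split back into path components without such a table. If two ids ever collapsed to the same dashed form, the dictionary comprehension would keep the last one.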

property dataset: str

Dataset name.

Returns:

Dataset name

Return type:

str

property dataset_id: str

Dataset id.

Returns:

Dataset id

Return type:

str

property docs: Docstore | Dict[str, GenericDoc]

Documents in the dataset.

Raises:

ValueError – If no documents are found in the dataset

Returns:

Documents

Return type:

ir_datasets.indices.Docstore | Dict[str, GenericDoc]
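When the documents are held as a plain dictionary, lookups go by document id. The sketch below models that case with a local GenericDoc NamedTuple (fields doc_id and text, assumed to resemble the ir_datasets class of the same name) and a hypothetical get_docs accessor that raises ValueError when nothing was loaded, mirroring the Raises entry above.

```python
from typing import Dict, NamedTuple, Optional


class GenericDoc(NamedTuple):
    """Local stand-in assumed to resemble ir_datasets' GenericDoc."""

    doc_id: str
    text: str


def get_docs(loaded: Optional[Dict[str, GenericDoc]]) -> Dict[str, GenericDoc]:
    """Hypothetical accessor: return the documents or raise if there are none."""
    if not loaded:
        raise ValueError("No documents found in dataset")
    return loaded


docs = get_docs({
    "d1": GenericDoc("d1", "First document."),
    "d2": GenericDoc("d2", "Second document."),
})
print(docs["d2"].text)
# → Second document.

try:
    get_docs({})
except ValueError as error:
    print(f"ValueError: {error}")
# → ValueError: No documents found in dataset
```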

property docs_dataset_id: str

ID of the dataset containing the documents.

Returns:

Document dataset id

Return type:

str
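Several ir_datasets ids reuse the document collection of a parent dataset; for example, msmarco-passage/trec-dl-2019/judged draws its documents from msmarco-passage. One rough way to picture the lookup is to strip trailing path components until an id that owns a collection is reached. The find_docs_dataset_id helper and the providers list are hypothetical, and ir_datasets resolves the parent with its own logic, which this prefix walk only approximates.

```python
from typing import Iterable, Optional


def find_docs_dataset_id(dataset_id: str, docs_providers: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the closest ancestor id that owns a document collection."""
    providers = set(docs_providers)
    parts = dataset_id.split("/")
    # Try the full id first, then drop trailing path components one at a time.
    for end in range(len(parts), 0, -1):
        candidate = "/".join(parts[:end])
        if candidate in providers:
            return candidate
    return None  # no known collection owner


providers = ["msmarco-passage", "msmarco-document"]
print(find_docs_dataset_id("msmarco-passage/trec-dl-2019/judged", providers))
# → msmarco-passage
```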

property ir_dataset: Dataset | None

Instance of ir_datasets.Dataset.

Returns:

ir_datasets dataset

Return type:

ir_datasets.Dataset | None
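The "| None" in the return type suggests that a name ir_datasets does not know yields None instead of an error; that reading is inferred from the signature, not stated on this page. The sketch uses a hypothetical StubDataset class and REGISTRY dict in place of ir_datasets to show the guard a caller would need.

```python
from typing import Dict, Optional


class StubDataset:
    """Hypothetical stand-in for ir_datasets.Dataset; it only stores its id."""

    def __init__(self, dataset_id: str) -> None:
        self.dataset_id = dataset_id


# Hypothetical registry of known dataset ids.
REGISTRY: Dict[str, StubDataset] = {
    name: StubDataset(name)
    for name in ["msmarco-passage", "msmarco-passage/trec-dl-2019/judged"]
}


def load_ir_dataset(name: str) -> Optional[StubDataset]:
    """Return the registered dataset, or None if the name is not registered."""
    return REGISTRY.get(name)


def describe(name: str) -> str:
    # Callers must handle the None case explicitly before using the dataset.
    ir_dataset = load_ir_dataset(name)
    if ir_dataset is None:
        return f"{name}: not an ir_datasets dataset"
    return f"{name}: registered as {ir_dataset.dataset_id}"


print(describe("msmarco-passage"))
# → msmarco-passage: registered as msmarco-passage
print(describe("my-local-run"))
# → my-local-run: not an ir_datasets dataset
```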

prepare_constituent(constituent: Literal['qrels', 'queries', 'docs', 'scoreddocs', 'docpairs']) → None

Downloads the constituent of the dataset using ir_datasets if needed.

Parameters:

constituent (Literal["qrels", "queries", "docs", "scoreddocs", "docpairs"]) – Constituent to download
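The phrase "if needed" implies that repeated calls should not download again. The sketch below shows that idempotent pattern with a hypothetical LazyPreparer class and an injected download callback; the constituent names are taken from the signature above. In the real method the caching is presumably left to ir_datasets, which keeps downloaded files on disk, so the in-memory set here is only a stand-in.

```python
from typing import Callable, Literal, Set, get_args

# Constituent names as listed in the documented signature.
Constituent = Literal["qrels", "queries", "docs", "scoreddocs", "docpairs"]


class LazyPreparer:
    """Hypothetical sketch: run a download step at most once per constituent."""

    def __init__(self, download: Callable[[str], None]) -> None:
        self._download = download
        self._prepared: Set[str] = set()

    def prepare_constituent(self, constituent: Constituent) -> None:
        if constituent not in get_args(Constituent):
            raise ValueError(f"Unknown constituent: {constituent!r}")
        if constituent in self._prepared:
            return  # already available, nothing to do
        self._download(constituent)
        self._prepared.add(constituent)


calls = []
preparer = LazyPreparer(calls.append)
preparer.prepare_constituent("queries")
preparer.prepare_constituent("queries")  # second call is a no-op
preparer.prepare_constituent("qrels")
print(calls)
# → ['queries', 'qrels']
```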

property qrels: DataFrame | None

Qrels in the dataset.

Returns:

Qrels

Return type:

pd.DataFrame | None
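The page does not list the columns of the qrels DataFrame. Assuming the judgments originate from the common four-column TREC qrels format (query id, iteration, document id, relevance label), the stdlib sketch below parses such lines and groups labels by query id. The parse_trec_qrels helper is hypothetical, and the real DataFrame's layout may differ.

```python
from collections import defaultdict
from typing import Dict


def parse_trec_qrels(text: str) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: parse TREC qrels lines into {query_id: {doc_id: relevance}}."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # TREC qrels columns: query id, iteration (unused), document id, relevance label.
        query_id, _iteration, doc_id, relevance = line.split()
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


sample = """\
q1 0 d1 2
q1 0 d2 0
q2 0 d3 1
"""
qrels = parse_trec_qrels(sample)
print(qrels)
# → {'q1': {'d1': 2, 'd2': 0}, 'q2': {'d3': 1}}
```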

property queries: Series

Queries in the dataset.

Raises:

ValueError – If no queries are found in the dataset

Returns:

Queries

Return type:

pd.Series
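A pd.Series is a labelled one-dimensional array. Assuming its index holds query ids and its values hold query text, a plain dict is a close stdlib analogue. The sketch below reads hypothetical tab-separated "query_id<TAB>text" lines with a hypothetical load_queries helper and raises ValueError when no queries are present, mirroring the Raises entry above.

```python
from typing import Dict


def load_queries(tsv: str) -> Dict[str, str]:
    """Hypothetical helper: read 'query_id<TAB>text' lines into an id -> text mapping."""
    queries: Dict[str, str] = {}
    for line in tsv.splitlines():
        if not line.strip():
            continue  # skip blank lines
        query_id, text = line.split("\t", maxsplit=1)
        queries[query_id] = text
    if not queries:
        # Mirrors the documented behaviour of raising when no queries exist.
        raise ValueError("No queries found in dataset")
    return queries


queries = load_queries("q1\twhat is dense retrieval\nq2\thow do cross-encoders rank passages\n")
print(queries["q2"])
# → how do cross-encoders rank passages
```

With pandas installed, pd.Series(queries) would turn this dict into a Series whose index is the query ids.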