RegisterLocalDatasetCallback

class lightning_ir.callbacks.callbacks.RegisterLocalDatasetCallback(dataset_id: str, docs: str | None = None, queries: str | None = None, qrels: str | None = None, docpairs: str | None = None, scoreddocs: str | None = None, qrels_defs: Dict[int, str] | None = None)

Bases: Callback

__init__(dataset_id: str, docs: str | None = None, queries: str | None = None, qrels: str | None = None, docpairs: str | None = None, scoreddocs: str | None = None, qrels_defs: Dict[int, str] | None = None)

Registers a local dataset with ir_datasets. Once registered, the dataset can be loaded with ir_datasets.load(dataset_id). The following file types are currently supported, each optionally gzipped:

  • .tsv, .json, or .jsonl for documents and queries

  • .tsv or .qrels for qrels

  • .tsv for training n-tuples

  • .tsv or .run for scored documents / run files
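As a rough illustration of the file types listed above, the following sketch writes a tiny dataset to disk using only the standard library. The column layouts are assumptions, not taken from this page: documents and queries are written as two tab-separated columns (id, text), and the qrels file uses the common TREC layout (query id, iteration, document id, relevance). The helper names `write_tsv` and `write_example_dataset` and all file names are hypothetical.

```python
import gzip
from pathlib import Path


def write_tsv(path: Path, rows: list[tuple[str, ...]]) -> Path:
    """Write rows as tab-separated lines; gzip the output if the path ends in .gz."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(row) + "\n")
    return path


def write_example_dataset(root: Path) -> dict[str, Path]:
    """Create a minimal documents, queries, and qrels file below root."""
    # Assumed layout: doc_id <TAB> text (gzipped to show the optional compression)
    docs = write_tsv(
        root / "docs.tsv.gz",
        [
            ("d1", "Berlin is the capital of Germany."),
            ("d2", "Paris is the capital of France."),
        ],
    )
    # Assumed layout: query_id <TAB> text
    queries = write_tsv(root / "queries.tsv", [("q1", "capital of germany")])
    # Assumed TREC-style layout: query_id, iteration, doc_id, relevance
    qrels = root / "judgments.qrels"
    qrels.write_text("q1 0 d1 1\nq1 0 d2 0\n", encoding="utf-8")
    return {"docs": docs, "queries": queries, "qrels": qrels}
```

The returned paths can then be passed as the docs, queries, and qrels arguments described below. If the registered files use a different column order, the writer needs to be adapted accordingly.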

Parameters:
  • dataset_id (str) – Id under which the dataset is registered with ir_datasets

  • docs (str | None, optional) – Path to documents file or valid ir_datasets id from which documents should be taken, defaults to None

  • queries (str | None, optional) – Path to queries file or valid ir_datasets id from which queries should be taken, defaults to None

  • qrels (str | None, optional) – Path to qrels file or valid ir_datasets id from which qrels should be taken, defaults to None

  • docpairs (str | None, optional) – Path to training n-tuple file or valid ir_datasets id from which training tuples should be taken, defaults to None

  • scoreddocs (str | None, optional) – Path to run file or valid ir_datasets id from which scored documents should be taken, defaults to None

  • qrels_defs (Dict[int, str] | None, optional) – Optional dictionary describing the relevance levels of the qrels, defaults to None
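The parameters above can be combined as in the following minimal sketch, which mixes an existing ir_datasets id for the documents with local files for queries and qrels. The dataset id `my-local-dataset`, the file paths, the relevance labels, and the helper `make_callback` are all hypothetical; only the import path and the keyword names come from the signature on this page. The import is deferred into the helper so the argument dictionary can be inspected without lightning_ir installed.

```python
from typing import Any, Dict

# Keyword arguments for RegisterLocalDatasetCallback; every value is illustrative.
registration: Dict[str, Any] = {
    "dataset_id": "my-local-dataset",   # hypothetical id used later with ir_datasets.load
    "docs": "msmarco-passage",          # reuse documents from an existing ir_datasets id
    "queries": "data/queries.tsv",      # hypothetical local path
    "qrels": "data/judgments.qrels",    # hypothetical local path
    "qrels_defs": {0: "not relevant", 1: "relevant"},  # label per relevance level
}


def make_callback(kwargs: Dict[str, Any]) -> Any:
    """Build the callback from keyword arguments (requires lightning_ir)."""
    # Imported lazily so this module can be loaded without lightning_ir.
    from lightning_ir.callbacks.callbacks import RegisterLocalDatasetCallback

    return RegisterLocalDatasetCallback(**kwargs)
```

Arguments left at None (here docpairs and scoreddocs) are simply omitted. Per the description above, the dataset should be loadable with ir_datasets.load("my-local-dataset") once the callback has registered it.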

Methods

__init__(dataset_id[, docs, queries, qrels, ...])

Registers a local dataset with ir_datasets.

setup(trainer, pl_module, stage)

Hook that registers the dataset.

setup(trainer: Trainer, pl_module: LightningIRModule, stage: str) → None

Hook that registers the dataset with ir_datasets.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (LightningIRModule) – Lightning IR module

  • stage (str) – Stage of the trainer (for example fit, validate, test, or predict)
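Because setup is a standard Lightning callback hook, it is normally not called by hand: a trainer invokes it at the start of fit, validate, test, or predict for every callback it was given. The sketch below shows one way to attach the callback. The helper `build_trainer` is hypothetical, and falling back to `lightning.Trainer` is an assumption; Lightning IR may provide its own trainer class that should be used instead.

```python
from typing import Any, Optional, Type


def build_trainer(
    callback: Any, trainer_cls: Optional[Type] = None, **trainer_kwargs: Any
) -> Any:
    """Create a trainer that owns the given callback.

    The trainer class is injectable so a Lightning IR specific trainer
    (or a test double) can be used instead of the default.
    """
    if trainer_cls is None:
        # Assumption: the plain Lightning trainer is sufficient here.
        from lightning import Trainer

        trainer_cls = Trainer
    # Lightning trainers accept a list of callbacks and call their hooks.
    return trainer_cls(callbacks=[callback], **trainer_kwargs)
```

With a callback instance in hand, `build_trainer(callback)` returns a trainer whose subsequent fit, validate, test, or predict call triggers the registration before any data is loaded.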