RegisterLocalDatasetCallback
- class lightning_ir.callbacks.callbacks.RegisterLocalDatasetCallback(dataset_id: str, docs: str | None = None, queries: str | None = None, qrels: str | None = None, docpairs: str | None = None, scoreddocs: str | None = None, qrels_defs: Dict[int, str] | None = None)[source]
Bases: Callback
- __init__(dataset_id: str, docs: str | None = None, queries: str | None = None, qrels: str | None = None, docpairs: str | None = None, scoreddocs: str | None = None, qrels_defs: Dict[int, str] | None = None)[source]
Registers a local dataset with ir_datasets. After registering the dataset, it can be loaded using ir_datasets.load(dataset_id). Currently, the following (optionally gzipped) file types are supported:
- .tsv, .json, or .jsonl for documents and queries
- .tsv or .qrels for qrels
- .tsv for training n-tuples
- .tsv or .run for scored documents / run files
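The supported file types listed above can be summarized as a small lookup table. The following is a minimal sketch, not part of Lightning IR: the helper name `is_supported_file`, the `SUPPORTED_SUFFIXES` table, and its keys are hypothetical and only mirror the list in this docstring. The library may perform its own checks differently.

```python
from pathlib import PurePath

# Suffixes copied from the list above; the dictionary keys are hypothetical
# labels that reuse the constructor's parameter names.
SUPPORTED_SUFFIXES = {
    "docs": {".tsv", ".json", ".jsonl"},
    "queries": {".tsv", ".json", ".jsonl"},
    "qrels": {".tsv", ".qrels"},
    "docpairs": {".tsv"},
    "scoreddocs": {".tsv", ".run"},
}


def is_supported_file(path: str, kind: str) -> bool:
    """Return True if `path` has a suffix documented as supported for `kind`."""
    suffixes = PurePath(path).suffixes
    # Files may optionally be gzipped, so drop a trailing ".gz" first.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    return bool(suffixes) and suffixes[-1] in SUPPORTED_SUFFIXES[kind]
```

A value that fails this check is not necessarily invalid, because each argument may also be an existing ir_datasets id rather than a file path.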
- Parameters:
dataset_id (str) – Id under which the dataset is registered and can later be loaded with ir_datasets.load(dataset_id)
docs (str | None, optional) – Path to documents file or valid ir_datasets id from which documents should be taken, defaults to None
queries (str | None, optional) – Path to queries file or valid ir_datasets id from which queries should be taken, defaults to None
qrels (str | None, optional) – Path to qrels file or valid ir_datasets id from which qrels will be taken, defaults to None
docpairs (str | None, optional) – Path to training n-tuple file or valid ir_datasets id from which training tuples will be taken, defaults to None
scoreddocs (str | None, optional) – Path to run file or valid ir_datasets id from which scored documents will be taken, defaults to None
qrels_defs (Dict[int, str] | None, optional) – Optional dictionary describing the relevance levels of the qrels, defaults to None
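As an illustration of how these parameters fit together, the sketch below writes a tiny dataset to a temporary directory and collects the matching keyword arguments. The file contents, the column layouts (id and text separated by a tab, TREC-style qrels lines), the dataset id `my-local-dataset`, and the relevance labels are all assumptions made for this example. Check the formats your version of Lightning IR expects before relying on them.

```python
import gzip
import tempfile
from pathlib import Path

# All contents below are made-up sample data; column layouts are assumptions.
data_dir = Path(tempfile.mkdtemp())

# Documents as gzipped TSV: assumed layout "doc_id<TAB>text".
docs_path = data_dir / "docs.tsv.gz"
with gzip.open(docs_path, "wt", encoding="utf-8") as f:
    f.write("d1\tLightning IR is a library for neural ranking.\n")
    f.write("d2\tBananas are rich in potassium.\n")

# Queries as plain TSV: assumed layout "query_id<TAB>text".
queries_path = data_dir / "queries.tsv"
queries_path.write_text("q1\twhat is lightning ir\n", encoding="utf-8")

# Qrels in TREC style: "query_id iteration doc_id relevance".
qrels_path = data_dir / "judgments.qrels"
qrels_path.write_text("q1 0 d1 1\nq1 0 d2 0\n", encoding="utf-8")

# Keyword arguments named after the constructor parameters documented above.
callback_kwargs = {
    "dataset_id": "my-local-dataset",
    "docs": str(docs_path),
    "queries": str(queries_path),
    "qrels": str(qrels_path),
    "qrels_defs": {0: "not relevant", 1: "relevant"},
}

# With lightning_ir installed, these could be passed on as:
#   RegisterLocalDatasetCallback(**callback_kwargs)
```

Any of the path arguments could instead be an existing ir_datasets id, for example to reuse the documents of a published collection while supplying local queries and qrels.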
Methods
__init__(dataset_id[, docs, queries, qrels, ...]) – Registers a local dataset with ir_datasets.
setup(trainer, pl_module, stage) – Hook that registers dataset.
- setup(trainer: Trainer, pl_module: LightningIRModule, stage: str) → None[source]
Hook that registers dataset.
- Parameters:
trainer (Trainer) – PyTorch Lightning Trainer
pl_module (LightningIRModule) – Lightning IR module
stage (str) – Stage of the trainer
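To make the role of the hook concrete, here is a dependency-free toy that mirrors the `setup(trainer, pl_module, stage)` signature. It is not Lightning IR's implementation: `ToyRegisterCallback` and `TOY_REGISTRY` are hypothetical stand-ins for the real callback and for the ir_datasets registry. The toy also assumes that construction only stores the configuration and that registration happens when the hook is called.

```python
from typing import Any, Dict

# Hypothetical stand-in for the ir_datasets registry; not a real API.
TOY_REGISTRY: Dict[str, Dict[str, str]] = {}


class ToyRegisterCallback:
    """Toy callback mirroring the hook signature documented above."""

    def __init__(self, dataset_id: str, **files: str) -> None:
        # Construction only stores the configuration; nothing is registered yet.
        self.dataset_id = dataset_id
        self.files = files

    def setup(self, trainer: Any, pl_module: Any, stage: str) -> None:
        # Registration happens here, when the trainer invokes the hook.
        TOY_REGISTRY[self.dataset_id] = dict(self.files)


callback = ToyRegisterCallback("my-local-dataset", docs="docs.tsv")
registered_before = "my-local-dataset" in TOY_REGISTRY

# A trainer would call the hook itself; the call is made by hand here.
callback.setup(trainer=None, pl_module=None, stage="fit")
registered_after = "my-local-dataset" in TOY_REGISTRY
```

In PyTorch Lightning, callbacks are passed to the trainer through its `callbacks` argument, and the trainer invokes `setup` at the start of a stage such as fit, validate, test, or predict, so the dataset is available before any data is loaded.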