IndexCallback
- class lightning_ir.callbacks.callbacks.IndexCallback(index_config: IndexConfig, index_dir: Path | str | None = None, index_name: str | None = None, overwrite: bool = False, verbose: bool = False)[source]
Bases: Callback, _GatherMixin, _IndexDirMixin, _OverwriteMixin
- __init__(index_config: IndexConfig, index_dir: Path | str | None = None, index_name: str | None = None, overwrite: bool = False, verbose: bool = False) None [source]
Callback to index documents using an Indexer.
- Parameters:
index_config (IndexConfig) – Configuration for the indexer
index_dir (Path | str | None, optional) – Directory to save index(es) to. If None, indexes will be stored in the model’s directory, defaults to None
index_name (str | None, optional) – Name of the index. If None, the dataset’s dataset_id or file name will be used, defaults to None
overwrite (bool, optional) – Whether to overwrite already existing indexes; if False, existing indexes are skipped, defaults to False
verbose (bool, optional) – Toggle verbose output, defaults to False
Methods
__init__(index_config[, index_dir, ...]): Callback to index documents using an Indexer.
on_test_batch_end(trainer, pl_module, ...[, ...]): Hook to pass encoded documents to the indexer.
on_test_batch_start(trainer, pl_module, ...): Hook to set up the indexer between datasets.
on_test_start(trainer, pl_module): Hook to check that the test datasets are configured correctly.
setup(trainer, pl_module, stage): Hook to set up the callback.
teardown(trainer, pl_module, stage): Hook to clean up the callback.
Attributes
index_dir
index_name
- on_test_batch_end(trainer: Trainer, pl_module: BiEncoderModule, outputs: BiEncoderOutput, batch: Any, batch_idx: int, dataloader_idx: int = 0) None [source]
Hook to pass encoded documents to the indexer.
- Parameters:
trainer (Trainer) – PyTorch Lightning Trainer
pl_module (BiEncoderModule) – LightningIR bi-encoder module
outputs (BiEncoderOutput) – Encoded documents
batch (Any) – Batch of input data
batch_idx (int) – Index of batch in the current dataset
dataloader_idx (int, optional) – Index of the dataloader, defaults to 0
- on_test_batch_start(trainer: Trainer, pl_module: BiEncoderModule, batch: Any, batch_idx: int, dataloader_idx: int = 0) None [source]
Hook to set up the indexer between datasets.
- Parameters:
trainer (Trainer) – PyTorch Lightning Trainer
pl_module (BiEncoderModule) – LightningIR bi-encoder module
batch (Any) – Batch of input data
batch_idx (int) – Index of batch in the current dataset
dataloader_idx (int, optional) – Index of the dataloader, defaults to 0
- on_test_start(trainer: Trainer, pl_module: BiEncoderModule) None [source]
Hook to check that the test datasets are configured correctly.
- Parameters:
trainer (Trainer) – PyTorch Lightning Trainer
pl_module (BiEncoderModule) – LightningIR bi-encoder module
- Raises:
ValueError – If no test_dataloaders are found
ValueError – If not all test datasets are DocDataset
- setup(trainer: Trainer, pl_module: BiEncoderModule, stage: str) None [source]
Hook to set up the callback.
- Parameters:
trainer (Trainer) – PyTorch Lightning Trainer
pl_module (BiEncoderModule) – LightningIR bi-encoder module used for indexing
stage (str) – Stage of the trainer, must be “test”
- Raises:
ValueError – If the stage is not “test”
- teardown(trainer: Trainer, pl_module: BiEncoderModule, stage: str) None [source]
Hook to clean up the callback.
- Parameters:
trainer (Trainer) – PyTorch Lightning Trainer
pl_module (BiEncoderModule) – LightningIR bi-encoder module used for indexing
stage (str) – Stage of the trainer, must be “test”