IndexCallback

class lightning_ir.callbacks.callbacks.IndexCallback(index_config: IndexConfig, index_dir: Path | str | None = None, index_name: str | None = None, overwrite: bool = False, verbose: bool = False)[source]

Bases: Callback, _GatherMixin, _IndexDirMixin, _OverwriteMixin

__init__(index_config: IndexConfig, index_dir: Path | str | None = None, index_name: str | None = None, overwrite: bool = False, verbose: bool = False) None[source]

Callback to index documents using an Indexer.

Parameters:
  • index_config (IndexConfig) – Configuration for the indexer

  • index_dir (Path | str | None, optional) – Directory to save the index(es) to. If None, indexes are stored in the model’s directory. Defaults to None

  • index_name (str | None, optional) – Name of the index. If None, the dataset’s dataset_id or file name is used. Defaults to None

  • overwrite (bool, optional) – Whether to overwrite existing indexes; if False, existing indexes are skipped. Defaults to False

  • verbose (bool, optional) – Toggle verbose output. Defaults to False
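The defaulting behavior of index_dir and index_name described above can be approximated with a small stand-in. This is a minimal, stdlib-only sketch; resolve_index_path and the "indexes" subdirectory are hypothetical illustrations, not part of the lightning_ir API:

```python
from __future__ import annotations

from pathlib import Path


def resolve_index_path(
    model_dir: Path,
    dataset_id: str,
    index_dir: Path | str | None = None,
    index_name: str | None = None,
) -> Path:
    """Illustrative approximation of the fallback logic: when index_dir
    is None, fall back to a location under the model's directory; when
    index_name is None, fall back to the dataset's id."""
    base = Path(index_dir) if index_dir is not None else model_dir / "indexes"
    name = index_name if index_name is not None else dataset_id
    return base / name


# With all defaults, the index location is derived from the model
# directory and the dataset id:
path = resolve_index_path(Path("models/bi-encoder"), "msmarco-passage")
print(path.as_posix())  # models/bi-encoder/indexes/msmarco-passage
```

Explicit index_dir and index_name arguments take precedence over both fallbacks.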

Methods

__init__(index_config[, index_dir, ...])

Callback to index documents using an Indexer.

on_test_batch_end(trainer, pl_module, ...[, ...])

Hook to pass encoded documents to the indexer.

on_test_batch_start(trainer, pl_module, ...)

Hook to set up the indexer between datasets.

on_test_start(trainer, pl_module)

Hook to verify that the test datasets are configured correctly.

setup(trainer, pl_module, stage)

Hook to set up the callback.

teardown(trainer, pl_module, stage)

Hook to clean up the callback.

Attributes

index_dir

index_name

on_test_batch_end(trainer: Trainer, pl_module: BiEncoderModule, outputs: BiEncoderOutput, batch: Any, batch_idx: int, dataloader_idx: int = 0) None[source]

Hook to pass encoded documents to the indexer.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module

  • outputs (BiEncoderOutput) – Encoded documents

  • batch (Any) – Batch of input data

  • batch_idx (int) – Index of batch in the current dataset

  • dataloader_idx (int, optional) – Index of the dataloader, defaults to 0

on_test_batch_start(trainer: Trainer, pl_module: BiEncoderModule, batch: Any, batch_idx: int, dataloader_idx: int = 0) None[source]

Hook to set up the indexer between datasets.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module

  • batch (Any) – Batch of input data

  • batch_idx (int) – Index of batch in the current dataset

  • dataloader_idx (int, optional) – Index of the dataloader, defaults to 0

on_test_start(trainer: Trainer, pl_module: BiEncoderModule) None[source]

Hook to verify that the test datasets are configured correctly.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module

Raises:
  • ValueError – If no test_dataloaders are found

  • ValueError – If not all test datasets are DocDataset
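The two ValueError conditions above can be sketched as a standalone check. This is a stdlib-only illustration; validate_test_datasets and the stand-in dataset classes are hypothetical, shown only to make the validation logic concrete:

```python
class DocDataset:
    """Stand-in for lightning_ir's DocDataset (a dataset of documents)."""


class QueryDataset:
    """Stand-in for a dataset type that cannot be indexed."""


def validate_test_datasets(datasets: list) -> None:
    """Illustrative version of the checks on_test_start performs:
    at least one test dataset must exist, and every test dataset
    must be a DocDataset."""
    if not datasets:
        raise ValueError("No test_dataloaders found")
    if not all(isinstance(d, DocDataset) for d in datasets):
        raise ValueError("Indexing requires all test datasets to be DocDataset")


validate_test_datasets([DocDataset(), DocDataset()])  # passes silently
```

Passing an empty list, or mixing in a QueryDataset, raises ValueError before any indexing starts.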

setup(trainer: Trainer, pl_module: BiEncoderModule, stage: str) None[source]

Hook to set up the callback.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module used for indexing

  • stage (str) – Stage of the trainer, must be “test”

Raises:
  • ValueError – If the stage is not “test”

teardown(trainer: Trainer, pl_module: BiEncoderModule, stage: str) None[source]

Hook to clean up the callback.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module used for indexing

  • stage (str) – Stage of the trainer, must be “test”
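Taken together, the hooks above follow PyTorch Lightning’s callback lifecycle: setup, then on_test_start, then per-batch start/end hooks, and finally teardown. A minimal, stdlib-only simulation of that call order (MiniIndexCallback is a toy stand-in, not the real IndexCallback):

```python
class MiniIndexCallback:
    """Toy callback recording the hook order IndexCallback relies on:
    setup -> on_test_start -> (on_test_batch_start / on_test_batch_end)*
    -> teardown. Only the "test" stage is accepted, mirroring the
    ValueError documented for setup."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def setup(self, stage: str) -> None:
        if stage != "test":
            raise ValueError("IndexCallback can only be used in the test stage")
        self.calls.append("setup")

    def on_test_start(self) -> None:
        self.calls.append("on_test_start")

    def on_test_batch_start(self, batch_idx: int) -> None:
        self.calls.append(f"batch_start:{batch_idx}")

    def on_test_batch_end(self, batch_idx: int) -> None:
        self.calls.append(f"batch_end:{batch_idx}")

    def teardown(self) -> None:
        self.calls.append("teardown")


# Simulate a test run over two batches:
cb = MiniIndexCallback()
cb.setup("test")
cb.on_test_start()
for i in range(2):
    cb.on_test_batch_start(i)
    cb.on_test_batch_end(i)
cb.teardown()
print(cb.calls)
# ['setup', 'on_test_start', 'batch_start:0', 'batch_end:0',
#  'batch_start:1', 'batch_end:1', 'teardown']
```

In a real run, the Trainer invokes these hooks automatically; documents encoded in each batch are handed to the indexer in on_test_batch_end, and the index is finalized in teardown.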