IndexCallback

class lightning_ir.callbacks.callbacks.IndexCallback(index_config: IndexConfig, index_dir: Path | str | None = None, index_name: str | None = None, overwrite: bool = False, verbose: bool = False)[source]

Bases: Callback, _GatherMixin, _IndexDirMixin, _OverwriteMixin

__init__(index_config: IndexConfig, index_dir: Path | str | None = None, index_name: str | None = None, overwrite: bool = False, verbose: bool = False) None[source]

Callback to index documents using an Indexer.

Parameters:
  • index_config (IndexConfig) – Configuration for the indexer

  • index_dir (Path | str | None, optional) – Directory to save the index(es) to. If None, indexes are stored in the model’s directory. Defaults to None

  • index_name (str | None, optional) – Name of the index. If None, the dataset’s dataset_id or file name is used. Defaults to None

  • overwrite (bool, optional) – Whether to overwrite existing indexes; if False, existing indexes are skipped. Defaults to False

  • verbose (bool, optional) – Toggle verbose output. Defaults to False
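The defaulting behavior of index_dir and index_name described above can be approximated with a small stand-in. This is a minimal, stdlib-only sketch; resolve_index_path and the "indexes" subdirectory are hypothetical illustrations, not part of the lightning_ir API:

```python
from __future__ import annotations

from pathlib import Path


def resolve_index_path(
    model_dir: Path,
    dataset_id: str,
    index_dir: Path | str | None = None,
    index_name: str | None = None,
) -> Path:
    """Illustrative approximation of the fallback logic: when index_dir
    is None, fall back to a location under the model's directory; when
    index_name is None, fall back to the dataset's id."""
    base = Path(index_dir) if index_dir is not None else model_dir / "indexes"
    name = index_name if index_name is not None else dataset_id
    return base / name


# With all defaults, the index location is derived from the model
# directory and the dataset id:
path = resolve_index_path(Path("models/bi-encoder"), "msmarco-passage")
print(path.as_posix())  # models/bi-encoder/indexes/msmarco-passage
```

Explicit index_dir and index_name arguments take precedence over both fallbacks.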

Methods

__init__(index_config[, index_dir, ...])

Callback to index documents using an Indexer.

on_test_batch_end(trainer, pl_module, ...[, ...])

Hook to pass encoded documents to the indexer.

on_test_batch_start(trainer, pl_module, ...)

Hook to set up the indexer between datasets.

on_test_start(trainer, pl_module)

Hook to verify that the test datasets are configured correctly.

setup(trainer, pl_module, stage)

Hook to set up the callback.

teardown(trainer, pl_module, stage)

Hook to clean up the callback.

Attributes

index_dir

index_name

on_test_batch_end(trainer: Trainer, pl_module: BiEncoderModule, outputs: BiEncoderOutput, batch: Any, batch_idx: int, dataloader_idx: int = 0) None[source]

Hook to pass encoded documents to the indexer.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module

  • outputs (BiEncoderOutput) – Encoded documents

  • batch (Any) – Batch of input data

  • batch_idx (int) – Index of batch in the current dataset

  • dataloader_idx (int, optional) – Index of the dataloader, defaults to 0

on_test_batch_start(trainer: Trainer, pl_module: BiEncoderModule, batch: Any, batch_idx: int, dataloader_idx: int = 0) None[source]

Hook to set up the indexer between datasets.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module

  • batch (Any) – Batch of input data

  • batch_idx (int) – Index of batch in the current dataset

  • dataloader_idx (int, optional) – Index of the dataloader, defaults to 0

on_test_start(trainer: Trainer, pl_module: BiEncoderModule) None[source]

Hook to verify that the test datasets are configured correctly.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module

Raises:
  • ValueError – If no test_dataloaders are found

  • ValueError – If not all test datasets are DocDataset
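The two ValueError conditions above can be sketched as a standalone check. This is a stdlib-only illustration; validate_test_datasets and the stand-in dataset classes are hypothetical, shown only to make the validation logic concrete:

```python
class DocDataset:
    """Stand-in for lightning_ir's DocDataset (a dataset of documents)."""


class QueryDataset:
    """Stand-in for a dataset type that cannot be indexed."""


def validate_test_datasets(datasets: list) -> None:
    """Illustrative version of the checks on_test_start performs:
    at least one test dataset must exist, and every test dataset
    must be a DocDataset."""
    if not datasets:
        raise ValueError("No test_dataloaders found")
    if not all(isinstance(d, DocDataset) for d in datasets):
        raise ValueError("Indexing requires all test datasets to be DocDataset")


validate_test_datasets([DocDataset(), DocDataset()])  # passes silently
```

Passing an empty list, or mixing in a QueryDataset, raises ValueError before any indexing starts.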

setup(trainer: Trainer, pl_module: BiEncoderModule, stage: str) None[source]

Hook to set up the callback.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module used for indexing

  • stage (str) – Stage of the trainer, must be “test”

Raises:
  • ValueError – If the stage is not “test”

teardown(trainer: Trainer, pl_module: BiEncoderModule, stage: str) None[source]

Hook to clean up the callback.

Parameters:
  • trainer (Trainer) – PyTorch Lightning Trainer

  • pl_module (BiEncoderModule) – LightningIR bi-encoder module used for indexing

  • stage (str) – Stage of the trainer, must be “test”
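Taken together, the hooks above follow PyTorch Lightning’s callback lifecycle: setup, then on_test_start, then per-batch start/end hooks, and finally teardown. A minimal, stdlib-only simulation of that call order (MiniIndexCallback is a toy stand-in, not the real IndexCallback):

```python
class MiniIndexCallback:
    """Toy callback recording the hook order IndexCallback relies on:
    setup -> on_test_start -> (on_test_batch_start / on_test_batch_end)*
    -> teardown. Only the "test" stage is accepted, mirroring the
    ValueError documented for setup."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def setup(self, stage: str) -> None:
        if stage != "test":
            raise ValueError("IndexCallback can only be used in the test stage")
        self.calls.append("setup")

    def on_test_start(self) -> None:
        self.calls.append("on_test_start")

    def on_test_batch_start(self, batch_idx: int) -> None:
        self.calls.append(f"batch_start:{batch_idx}")

    def on_test_batch_end(self, batch_idx: int) -> None:
        self.calls.append(f"batch_end:{batch_idx}")

    def teardown(self) -> None:
        self.calls.append("teardown")


# Simulate a test run over two batches:
cb = MiniIndexCallback()
cb.setup("test")
cb.on_test_start()
for i in range(2):
    cb.on_test_batch_start(i)
    cb.on_test_batch_end(i)
cb.teardown()
print(cb.calls)
# ['setup', 'on_test_start', 'batch_start:0', 'batch_end:0',
#  'batch_start:1', 'batch_end:1', 'teardown']
```

In a real run, the Trainer invokes these hooks automatically; documents encoded in each batch are handed to the indexer in on_test_batch_end, and the index is finalized in teardown.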