BiEncoderModule

class lightning_ir.bi_encoder.bi_encoder_module.BiEncoderModule(model_name_or_path: str | None = None, config: BiEncoderConfig | None = None, model: BiEncoderModel | None = None, BackboneModel: Type[PreTrainedModel] | None = None, loss_functions: Sequence[LossFunction | Tuple[LossFunction, float]] | None = None, evaluation_metrics: Sequence[str] | None = None, index_dir: Path | None = None, search_config: SearchConfig | None = None, model_kwargs: Mapping[str, Any] | None = None)

Bases: LightningIRModule

__init__(model_name_or_path: str | None = None, config: BiEncoderConfig | None = None, model: BiEncoderModel | None = None, BackboneModel: Type[PreTrainedModel] | None = None, loss_functions: Sequence[LossFunction | Tuple[LossFunction, float]] | None = None, evaluation_metrics: Sequence[str] | None = None, index_dir: Path | None = None, search_config: SearchConfig | None = None, model_kwargs: Mapping[str, Any] | None = None)

LightningIRModule for bi-encoder models. It contains a BiEncoderModel and a BiEncoderTokenizer and implements the training, validation, and testing steps for the model.

Parameters:
  • model_name_or_path (str | None) – Name or path of backbone model or fine-tuned Lightning IR model. Defaults to None.

  • config (BiEncoderConfig | None) – BiEncoderConfig to apply when loading from backbone model. Defaults to None.

  • model (BiEncoderModel | None) – Already instantiated BiEncoderModel. Defaults to None.

  • BackboneModel (Type[PreTrainedModel] | None) – Huggingface PreTrainedModel class to use as backbone instead of the default AutoModel. Defaults to None.

  • loss_functions (Sequence[LossFunction | Tuple[LossFunction, float]] | None) – Loss functions to apply during fine-tuning. An optional weight can be provided per loss function as a (loss function, weight) tuple. Defaults to None.

  • evaluation_metrics (Sequence[str] | None) – Metrics, given as ir-measures measure strings, to apply during validation or testing. Defaults to None.

  • index_dir (Path | None) – Path to an index used for retrieval. Defaults to None.

  • search_config (SearchConfig | None) – Configuration to use during retrieval. Defaults to None.

  • model_kwargs (Mapping[str, Any] | None) – Additional keyword arguments to pass to from_pretrained when loading a model. Defaults to None.
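The loss_functions parameter accepts a mix of bare loss functions and (loss function, weight) tuples. The following is a minimal sketch of how such a sequence could be normalized into uniform pairs and combined into a weighted sum. It does not use Lightning IR: normalize_losses, rank_loss, and reg_loss are hypothetical names, plain callables stand in for LossFunction, and the default weight of 1.0 for unweighted entries is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

# Hypothetical stand-in for LossFunction: any callable returning a float.
Loss = Callable[[], float]
LossEntry = Union[Loss, Tuple[Loss, float]]


def normalize_losses(
    loss_functions: Optional[Sequence[LossEntry]] = None,
    default_weight: float = 1.0,  # assumption: unweighted losses count fully
) -> List[Tuple[Loss, float]]:
    """Turn a mixed sequence of losses and (loss, weight) tuples into pairs."""
    pairs: List[Tuple[Loss, float]] = []
    for entry in loss_functions or []:
        if isinstance(entry, tuple):
            loss, weight = entry
        else:
            loss, weight = entry, default_weight
        pairs.append((loss, float(weight)))
    return pairs


def rank_loss() -> float:
    return 0.5  # toy value


def reg_loss() -> float:
    return 2.0  # toy value


pairs = normalize_losses([rank_loss, (reg_loss, 0.1)])
# Weighted sum of all losses: 1.0 * 0.5 + 0.1 * 2.0
total = sum(weight * loss() for loss, weight in pairs)
```

Passing None yields an empty list in this sketch, which mirrors the documented default of None for loss_functions.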

Methods

__init__([model_name_or_path, config, ...])

LightningIRModule for bi-encoder models.

forward(batch)

Runs a forward pass of the model on a batch of data.

on_test_start()

Called at the beginning of testing.

score(queries, docs)

Computes relevance scores for queries and documents.

validation_step(batch, batch_idx[, ...])

Handles the validation step for the model.

Attributes

searcher

Searcher used for retrieval if index_dir and search_config are set.

training

forward(batch: RankBatch | IndexBatch | SearchBatch) → BiEncoderOutput

Runs a forward pass of the model on a batch of data. The output depends on the type of batch. If the batch is a RankBatch, query and document embeddings are computed and the relevance score is the similarity between the two embeddings. If the batch is an IndexBatch, only document embeddings are computed. If the batch is a SearchBatch, only query embeddings are computed, and the model additionally retrieves documents if searcher is set.

Parameters:

batch (RankBatch | IndexBatch | SearchBatch) – Input batch containing queries and/or documents.

Returns:

Output of the model.

Return type:

BiEncoderOutput

Raises:

ValueError – If the input batch contains neither queries nor documents.
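The batch-dependent behavior of forward can be sketched with toy types. This is an illustration only, not the library's implementation: ToyBatch, ToyOutput, toy_encode, and toy_forward are hypothetical names, the "embedding" is a two-number toy feature vector, the dot product is an assumed similarity function, and retrieval through a searcher is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ToyBatch:
    """Hypothetical stand-in for RankBatch / IndexBatch / SearchBatch."""
    queries: Optional[Sequence[str]] = None
    docs: Optional[Sequence[str]] = None


@dataclass
class ToyOutput:
    """Hypothetical stand-in for BiEncoderOutput."""
    query_embeddings: Optional[List[List[float]]] = None
    doc_embeddings: Optional[List[List[float]]] = None
    scores: Optional[List[float]] = None


def toy_encode(text: str) -> List[float]:
    # Toy embedding: [character count, word count]. Not a real encoder.
    return [float(len(text)), float(len(text.split()))]


def toy_forward(batch: ToyBatch) -> ToyOutput:
    if batch.queries is None and batch.docs is None:
        raise ValueError("Batch contains neither queries nor documents.")
    output = ToyOutput()
    if batch.queries is not None:
        output.query_embeddings = [toy_encode(q) for q in batch.queries]
    if batch.docs is not None:
        output.doc_embeddings = [toy_encode(d) for d in batch.docs]
    if output.query_embeddings is not None and output.doc_embeddings is not None:
        # Assumed similarity: dot product of the i-th query and i-th document.
        output.scores = [
            sum(a * b for a, b in zip(q, d))
            for q, d in zip(output.query_embeddings, output.doc_embeddings)
        ]
    return output


rank_output = toy_forward(ToyBatch(queries=["a b"], docs=["c d e"]))
index_output = toy_forward(ToyBatch(docs=["only a document"]))
```

In this sketch, a batch with both queries and documents yields scores, a documents-only batch yields only document embeddings, and an empty batch raises ValueError, matching the three cases described above.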

on_test_start() → None

Called at the beginning of testing. Initializes the searcher if index_dir and search_config are set.

score(queries: Sequence[str] | str, docs: Sequence[Sequence[str]] | Sequence[str]) → BiEncoderOutput

Computes relevance scores for queries and documents.

Parameters:
  • queries (Sequence[str] | str) – Queries to score.

  • docs (Sequence[Sequence[str]] | Sequence[str]) – Documents to score.

Returns:

Output of the model.

Return type:

BiEncoderOutput
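The signature of score accepts either a single query with a flat list of documents or several queries with one list of documents each. The sketch below shows one way such inputs could be brought into a uniform shape before scoring. The helper normalize_score_inputs is hypothetical, and the rule that a flat list of documents belongs to a single query is an assumption, not confirmed behavior of the library.

```python
from typing import List, Sequence, Tuple, Union


def normalize_score_inputs(
    queries: Union[str, Sequence[str]],
    docs: Union[Sequence[str], Sequence[Sequence[str]]],
) -> Tuple[List[str], List[List[str]]]:
    """Return a list of queries and one list of documents per query."""
    # A single query string becomes a one-element list.
    query_list = [queries] if isinstance(queries, str) else list(queries)
    doc_items = list(docs)
    if doc_items and isinstance(doc_items[0], str):
        # Assumption: a flat list of documents belongs to a single query.
        doc_lists = [[str(d) for d in doc_items]]
    else:
        doc_lists = [list(group) for group in doc_items]
    if len(query_list) != len(doc_lists):
        raise ValueError("Expected one list of documents per query.")
    return query_list, doc_lists


single = normalize_score_inputs("what is a bi-encoder?", ["doc one", "doc two"])
multiple = normalize_score_inputs(["q1", "q2"], [["a"], ["b", "c"]])
```

After normalization, both call styles share the same nested structure, so the downstream scoring code only has to handle one case.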

property searcher: Searcher | None

Searcher used for retrieval if index_dir and search_config are set.

Returns:

Searcher instance, or None if no searcher is available.

Return type:

Searcher | None
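The relationship between on_test_start and the searcher property can be pictured as lazy initialization. The following sketch assumes the searcher is None until testing starts and is only created when both index_dir and search_config are set. ToySearcher and ToyModule are hypothetical stand-ins, and a plain dict stands in for SearchConfig.

```python
from pathlib import Path
from typing import Any, Mapping, Optional


class ToySearcher:
    """Hypothetical stand-in for Searcher."""

    def __init__(self, index_dir: Path, search_config: Mapping[str, Any]) -> None:
        self.index_dir = index_dir
        self.search_config = search_config


class ToyModule:
    """Hypothetical stand-in for the module's searcher handling."""

    def __init__(
        self,
        index_dir: Optional[Path] = None,
        search_config: Optional[Mapping[str, Any]] = None,
    ) -> None:
        self.index_dir = index_dir
        self.search_config = search_config
        self._searcher: Optional[ToySearcher] = None

    @property
    def searcher(self) -> Optional[ToySearcher]:
        return self._searcher

    def on_test_start(self) -> None:
        # Only build a searcher when both pieces of configuration are present.
        if self.index_dir is not None and self.search_config is not None:
            self._searcher = ToySearcher(self.index_dir, self.search_config)


without_index = ToyModule()
without_index.on_test_start()

with_index = ToyModule(Path("index"), {"k": 10})
before_start = with_index.searcher
with_index.on_test_start()
```

In this sketch, code that reads searcher has to handle the None case, which is consistent with the Searcher | None annotation on the property.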

validation_step(batch: TrainBatch | IndexBatch | SearchBatch | RankBatch, batch_idx: int, dataloader_idx: int = 0) → BiEncoderOutput

Handles the validation step for the model.

Parameters:
  • batch (TrainBatch | IndexBatch | SearchBatch | RankBatch) – Batch of validation or testing data.

  • batch_idx (int) – Index of the batch.

  • dataloader_idx (int) – Index of the dataloader. Defaults to 0.

Returns:

Output of the model.

Return type:

BiEncoderOutput