MultiVectorBiEncoderModel

class lightning_ir.bi_encoder.bi_encoder_model.MultiVectorBiEncoderModel(config: MultiVectorBiEncoderConfig, *args, **kwargs)

Bases: BiEncoderModel

__init__(config: MultiVectorBiEncoderConfig, *args, **kwargs) → None

Initializes a multi-vector bi-encoder model given a MultiVectorBiEncoderConfig.

Parameters:

config (MultiVectorBiEncoderConfig) – Configuration for the multi-vector bi-encoder model

Raises:
  • ValueError – If mask scoring tokens are specified in the configuration but no tokenizer is available.

  • ValueError – If the specified mask scoring tokens are not in the tokenizer's vocabulary.

Methods

__init__(config, *args, **kwargs)

Initializes a multi-vector bi-encoder model given a MultiVectorBiEncoderConfig.

aggregate_similarity(similarity, ...[, num_docs])

Aggregates the matrix of query-document similarities into a single score based on the configured aggregation strategy.

score(output[, num_docs])

Compute relevance scores between queries and documents.

scoring_mask(encoding, input_type)

Computes a scoring mask for batched tokenized text sequences; the scoring function uses it to mask out vectors.

Attributes

supports_retrieval_models

training

aggregate_similarity(similarity: Tensor, query_scoring_mask: Tensor, doc_scoring_mask: Tensor, num_docs: Sequence[int] | int | None = None) → Tensor

Aggregates the matrix of query-document similarities into a single score based on the configured aggregation strategy.

Parameters:
  • similarity (torch.Tensor) – Query–document similarity matrix.

  • query_scoring_mask (torch.Tensor) – Which query vectors should be masked out during scoring.

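  • doc_scoring_mask (torch.Tensor) – Which document vectors should be masked out during scoring.

  • num_docs (Sequence[int] | int | None) – Specifies how many documents are passed per query; see score() for the accepted forms. Defaults to None.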

Returns:

Aggregated similarity scores.

Return type:

torch.Tensor
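
The aggregation strategy is set in the configuration and is not spelled out here. As an illustration only, the sketch below assumes one common late-interaction choice: take the maximum over document vectors, then sum over query vectors. It uses plain Python lists for a single query-document pair rather than tensors. The name maxsim_score and the convention that True means "keep this vector" are assumptions for this example, not part of the library.

```python
from typing import List


def maxsim_score(
    similarity: List[List[float]],
    query_keep: List[bool],
    doc_keep: List[bool],
) -> float:
    """Hypothetical helper: sum, over kept query vectors, of the maximum
    similarity to any kept document vector."""
    total = 0.0
    for q_idx, row in enumerate(similarity):
        if not query_keep[q_idx]:
            continue  # a masked-out query vector contributes nothing
        kept = [s for s, keep in zip(row, doc_keep) if keep]
        if kept:
            total += max(kept)
    return total


# Rows are query vectors, columns are document vectors (assumed layout).
similarity = [
    [0.75, 0.25, 0.5],
    [0.25, 1.0, 0.5],
    [0.5, 0.5, 0.5],
]
# Third query vector and second document vector are masked out.
score = maxsim_score(similarity, [True, True, False], [True, False, True])
print(score)
```

Masking the second document vector hides the 1.0 entry in the second row, so that row falls back to its next-best kept value.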

config_class

Configuration class for the multi-vector bi-encoder model.

alias of MultiVectorBiEncoderConfig

score(output: BiEncoderOutput, num_docs: Sequence[int] | int | None = None) → BiEncoderOutput

Compute relevance scores between queries and documents.

Parameters:
  • output (BiEncoderOutput) – Output containing the query and document embeddings and their scoring masks.

  • num_docs (Sequence[int] | int | None) – Specifies how many documents are passed per query. If a sequence of integers, len(num_docs) should equal the number of queries and sum(num_docs) should equal the number of documents, i.e., the sequence contains one value per query specifying the number of documents for that query. If an integer, assumes an equal number of documents per query. If None, infers the number of documents per query by dividing the total number of documents by the number of queries. Defaults to None.

Returns:

Output containing relevance scores.

Return type:

BiEncoderOutput

Raises:
  • ValueError – If query or document embeddings are not provided in the output.

  • ValueError – If scoring masks are not provided for the embeddings.
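
The three accepted forms of num_docs described above can be sketched as a normalization to one count per query. This is a minimal illustration under stated assumptions: resolve_num_docs is a hypothetical helper, and raising ValueError when the counts do not add up is a choice made for this sketch, not documented library behavior.

```python
from typing import List, Optional, Sequence, Union


def resolve_num_docs(
    num_queries: int,
    total_docs: int,
    num_docs: Optional[Union[Sequence[int], int]] = None,
) -> List[int]:
    """Hypothetical helper: return one document count per query."""
    if num_docs is None:
        # Infer an equal split; only possible if the documents divide evenly.
        if num_queries <= 0 or total_docs % num_queries != 0:
            raise ValueError("cannot infer num_docs from the batch sizes")
        return [total_docs // num_queries] * num_queries
    if isinstance(num_docs, int):
        # Same number of documents for every query.
        if num_docs * num_queries != total_docs:
            raise ValueError("num_docs * num_queries must equal total_docs")
        return [num_docs] * num_queries
    # Explicit per-query counts.
    counts = list(num_docs)
    if len(counts) != num_queries or sum(counts) != total_docs:
        raise ValueError("num_docs must have one entry per query and sum to total_docs")
    return counts


print(resolve_num_docs(2, 6))          # inferred equal split
print(resolve_num_docs(2, 6, 3))       # integer form
print(resolve_num_docs(2, 5, [2, 3]))  # per-query sequence
```

The sequence form is the only one that can express an uneven split such as two documents for the first query and three for the second.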

scoring_mask(encoding: BatchEncoding, input_type: Literal["query", "doc"]) → Tensor

Computes a scoring mask for batched tokenized text sequences; the scoring function uses it to mask out vectors.

Parameters:
  • encoding (BatchEncoding) – Tokenizer encodings for the text sequence.

  • input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.

Returns:

Scoring mask.

Return type:

torch.Tensor
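
To make the idea of a scoring mask concrete, the sketch below derives one from token ids and an attention mask using plain Python lists. It is an illustration, not the library's implementation: sketch_scoring_mask is a hypothetical name, the token ids are made up, and both the polarity (True means the vector takes part in scoring) and the handling of excluded tokens are assumptions.

```python
from typing import Iterable, List


def sketch_scoring_mask(
    input_ids: List[List[int]],
    attention_mask: List[List[int]],
    excluded_token_ids: Iterable[int] = (),
) -> List[List[bool]]:
    """Hypothetical helper: True where a token's vector would be scored.

    A position is kept if it is not padding (attention mask is 1) and its
    token id is not in the excluded set.
    """
    excluded = set(excluded_token_ids)
    return [
        [bool(attn) and tok not in excluded for tok, attn in zip(ids_row, attn_row)]
        for ids_row, attn_row in zip(input_ids, attention_mask)
    ]


# Two sequences of length 4; the second is padded with two 0 tokens.
input_ids = [[101, 2054, 103, 102], [101, 2003, 0, 0]]
attention_mask = [[1, 1, 1, 1], [1, 1, 0, 0]]
mask = sketch_scoring_mask(input_ids, attention_mask, excluded_token_ids={103})
print(mask)
```

In this example the third position of the first sequence is dropped because its id is in the excluded set, and the last two positions of the second sequence are dropped as padding.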