MultiVectorBiEncoderConfig

class lightning_ir.bi_encoder.bi_encoder_config.MultiVectorBiEncoderConfig(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalization: 'l2' | None = None, sparsification: None | 'relu' | 'relu_log' | 'relu_2xlog' = None, add_marker_tokens: bool = False, query_mask_scoring_tokens: Sequence[str] | 'punctuation' | None = None, doc_mask_scoring_tokens: Sequence[str] | 'punctuation' | None = None, query_aggregation_function: 'sum' | 'mean' | 'max' = 'sum', doc_aggregation_function: 'sum' | 'mean' | 'max' = 'max', **kwargs)

Bases: BiEncoderConfig

Configuration class for a multi-vector bi-encoder model.

__init__(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalization: 'l2' | None = None, sparsification: None | 'relu' | 'relu_log' | 'relu_2xlog' = None, add_marker_tokens: bool = False, query_mask_scoring_tokens: Sequence[str] | 'punctuation' | None = None, doc_mask_scoring_tokens: Sequence[str] | 'punctuation' | None = None, query_aggregation_function: 'sum' | 'mean' | 'max' = 'sum', doc_aggregation_function: 'sum' | 'mean' | 'max' = 'max', **kwargs)

A multi-vector bi-encoder model keeps a separate representation for every token in the query and in the document, and computes a relevance score by aggregating the similarities of query-document token pairs. Optionally, some tokens can be masked out during scoring.
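The aggregation described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation (which presumably operates on batched tensors); the helper names `dot` and `late_interaction_score` are hypothetical and only mirror the default settings similarity_function='dot', doc_aggregation_function='max', and query_aggregation_function='sum'.

```python
def dot(u, v):
    """Dot-product similarity between two token embeddings."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_embeddings, doc_embeddings):
    """Score one query-document pair (hypothetical helper, for illustration).

    For every query token, take the maximum similarity over all document
    tokens (doc aggregation "max"), then add these maxima up over the
    query tokens (query aggregation "sum").
    """
    return sum(
        max(dot(q, d) for d in doc_embeddings)
        for q in query_embeddings
    )


# Toy 2-dimensional embeddings: 2 query tokens, 3 document tokens.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [2.0, 0.0], [0.0, 3.0]]

# First query token matches the second doc token best (2.0),
# second query token matches the third doc token best (3.0).
score = late_interaction_score(query, doc)
print(score)
```

With these toy values the two per-query-token maxima are 2.0 and 3.0, so the pair receives a score of 5.0.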

Parameters:
  • query_length (int | None) – Maximum number of tokens per query. If None, queries are not truncated. Defaults to 32.

  • doc_length (int | None) – Maximum number of tokens per document. If None, documents are not truncated. Defaults to 512.

  • similarity_function (Literal['cosine', 'dot']) – Similarity function to compute scores between query and document embeddings. Defaults to “dot”.

  • normalization (Literal['l2'] | None) – Whether and how to normalize query and document embeddings. If None, embeddings are not normalized. Defaults to None.

  • sparsification (Literal['relu', 'relu_log', 'relu_2xlog'] | None) – Whether and which sparsification function to apply. Defaults to None.

  • add_marker_tokens (bool) – Whether to prepend extra marker tokens [Q] / [D] to queries / documents. Defaults to False.

  • query_mask_scoring_tokens (Sequence[str] | Literal['punctuation'] | None) – Whether and which query tokens to ignore during scoring. Defaults to None.

  • doc_mask_scoring_tokens (Sequence[str] | Literal['punctuation'] | None) – Whether and which document tokens to ignore during scoring. Defaults to None.

  • query_aggregation_function (Literal['sum', 'mean', 'max']) – How to aggregate similarity scores over query tokens. Defaults to “sum”.

  • doc_aggregation_function (Literal['sum', 'mean', 'max']) – How to aggregate similarity scores over doc tokens. Defaults to “max”.
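To make the interplay of the normalization, masking, and aggregation parameters concrete, here is a hedged sketch in plain Python. All function names (`l2_normalize`, `keep_token`, `score_pair`) are hypothetical, tokens are represented as plain strings, and dot-product similarity is assumed throughout; the library's actual tokenization and tensor code will differ.

```python
import math
import string

PUNCTUATION = set(string.punctuation)
AGGREGATORS = {
    "sum": sum,
    "mean": lambda values: sum(values) / len(values),
    "max": max,
}


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def keep_token(token, mask):
    """Return True if the token should take part in scoring."""
    if mask is None:
        return True
    if mask == "punctuation":
        return token not in PUNCTUATION
    return token not in mask  # explicit sequence of tokens to ignore


def score_pair(query, doc, *, normalization=None,
               query_mask_scoring_tokens=None, doc_mask_scoring_tokens=None,
               query_aggregation_function="sum",
               doc_aggregation_function="max"):
    """Score lists of (token, embedding) pairs; assumes at least one
    unmasked token remains on each side."""
    q_embs = [e for t, e in query if keep_token(t, query_mask_scoring_tokens)]
    d_embs = [e for t, e in doc if keep_token(t, doc_mask_scoring_tokens)]
    if normalization == "l2":
        q_embs = [l2_normalize(e) for e in q_embs]
        d_embs = [l2_normalize(e) for e in d_embs]
    agg_doc = AGGREGATORS[doc_aggregation_function]
    agg_query = AGGREGATORS[query_aggregation_function]
    per_query_token = [
        agg_doc([sum(a * b for a, b in zip(q, d)) for d in d_embs])
        for q in q_embs
    ]
    return agg_query(per_query_token)


query = [("what", [1.0, 0.0]), ("?", [0.0, 1.0])]
doc = [("answer", [3.0, 0.0]), (".", [0.0, 4.0])]

unmasked = score_pair(query, doc)
masked = score_pair(query, doc,
                    query_mask_scoring_tokens="punctuation",
                    doc_mask_scoring_tokens="punctuation")
normalized = score_pair(query, doc, normalization="l2")
print(unmasked, masked, normalized)
```

In this toy setup, masking punctuation removes the "?" and "." tokens from scoring, so only the "what"/"answer" pair contributes, while L2 normalization bounds every token-pair similarity to at most 1.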

Methods

__init__([query_length, doc_length, ...])

A multi-vector bi-encoder model keeps the representation of all tokens in query or document and computes a relevance score by aggregating the similarities of query-document token pairs.

Attributes

model_type

Model type for multi-vector bi-encoder models.

model_type: str = 'multi-vector-bi-encoder'