SingleVectorBiEncoderConfig
- class lightning_ir.bi_encoder.bi_encoder_config.SingleVectorBiEncoderConfig(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalization: 'l2' | None = None, sparsification: 'relu' | 'relu_log' | 'relu_2xlog' | None = None, add_marker_tokens: bool = False, query_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'mean', doc_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'mean', **kwargs)[source]
Bases: BiEncoderConfig
Configuration class for a single-vector bi-encoder model.
- __init__(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalization: 'l2' | None = None, sparsification: 'relu' | 'relu_log' | 'relu_2xlog' | None = None, add_marker_tokens: bool = False, query_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'mean', doc_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'mean', **kwargs)[source]
Configuration class for a single-vector bi-encoder model. A single-vector bi-encoder model pools the representations of queries and documents into a single vector before computing a similarity score.
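The pooling step described above can be sketched in plain Python. This is an illustration only, not the library's implementation: `pool` and `dot` are hypothetical helpers, the token embeddings are made-up numbers, and padding masks (which a real model would need to respect) are ignored.

```python
def pool(token_embeddings, strategy="mean"):
    """Collapse a list of token vectors into one vector (hypothetical helper)."""
    if strategy == "first":
        return list(token_embeddings[0])
    # Regroup values so that each tuple holds one embedding dimension.
    columns = list(zip(*token_embeddings))
    if strategy == "mean":
        return [sum(c) / len(c) for c in columns]
    if strategy == "max":
        return [max(c) for c in columns]
    if strategy == "sum":
        return [sum(c) for c in columns]
    raise ValueError(f"unknown pooling strategy: {strategy}")


def dot(u, v):
    """Dot-product similarity between two pooled vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy token embeddings: 2 query tokens and 3 document tokens, 2 dimensions each.
query_tokens = [[1.0, 0.0], [3.0, 2.0]]
doc_tokens = [[2.0, 2.0], [0.0, 4.0], [4.0, 0.0]]

query_vector = pool(query_tokens, "mean")  # [2.0, 1.0]
doc_vector = pool(doc_tokens, "mean")      # [2.0, 2.0]
score = dot(query_vector, doc_vector)      # 6.0
```

Whatever the number of input tokens, each side is reduced to exactly one vector, so the relevance score is a single vector-to-vector comparison.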
- Parameters:
query_length (int | None) – Maximum number of tokens per query. If None, queries are not truncated. Defaults to 32.
doc_length (int | None) – Maximum number of tokens per document. If None, documents are not truncated. Defaults to 512.
similarity_function (Literal['cosine', 'dot']) – Similarity function to compute scores between query and document embeddings. Defaults to “dot”.
normalization (Literal['l2'] | None) – Normalization to apply to query and document embeddings. If None, embeddings are not normalized. Defaults to None.
sparsification (Literal['relu', 'relu_log', 'relu_2xlog'] | None) – Whether and which sparsification function to apply. Defaults to None.
add_marker_tokens (bool) – Whether to prepend extra marker tokens [Q] / [D] to queries / documents. Defaults to False.
query_pooling_strategy (Literal['first', 'mean', 'max', 'sum'] | str) – How to pool the query token embeddings. Defaults to “mean”.
doc_pooling_strategy (Literal['first', 'mean', 'max', 'sum'] | str) – How to pool the document token embeddings. Defaults to “mean”.
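To see how similarity_function and normalization interact, the sketch below compares the two similarity options on already-pooled vectors. It is plain Python rather than the library's code: `dot`, `cosine`, and `l2_normalize` are hypothetical helpers and the vectors are made up. It relies only on the standard identity that cosine similarity equals the dot product of L2-normalized vectors.

```python
import math


def dot(u, v):
    """Dot-product similarity."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u, v):
    """Cosine similarity; defined as 0.0 here if either vector is all zeros."""
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom > 0 else 0.0


# Made-up pooled embeddings, each with L2 norm 5.
query_vector = [3.0, 4.0]
doc_vector = [4.0, 3.0]

raw_dot = dot(query_vector, doc_vector)    # depends on vector magnitudes
cos = cosine(query_vector, doc_vector)     # depends only on the angle
normalized_dot = dot(l2_normalize(query_vector), l2_normalize(doc_vector))
```

Here `raw_dot` is 24.0, while `cos` and `normalized_dot` agree up to floating-point rounding (about 0.96). With unnormalized embeddings the two similarity functions can therefore rank documents differently, because the dot product also rewards larger vector norms.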
Methods
__init__([query_length, doc_length, ...]) – Configuration class for a single-vector bi-encoder model.
Attributes
Model type for single-vector bi-encoder models.