BiEncoderConfig

class lightning_ir.bi_encoder.bi_encoder_config.BiEncoderConfig(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalization_strategy: 'l2' | None = None, sparsification_strategy: 'relu' | 'relu_log' | 'relu_2xlog' | None = None, add_marker_tokens: bool = False, **kwargs)

Bases: LightningIRConfig

Configuration class for a bi-encoder model.

A bi-encoder model encodes queries and documents separately and computes a relevance score based on the similarity of the query and document embeddings. Normalization and sparsification can be applied to the embeddings before computing the similarity score.

Parameters:

query_length (int | None) – Maximum number of tokens per query. If None, queries are not truncated. Defaults to 32.
doc_length (int | None) – Maximum number of tokens per document. If None, documents are not truncated. Defaults to 512.
similarity_function (Literal['cosine', 'dot']) – Similarity function to compute scores between query and document embeddings. Defaults to “dot”.
normalization_strategy (Literal['l2'] | None) – Whether and how to normalize query and document embeddings. If None, embeddings are not normalized. Defaults to None.
sparsification_strategy (Literal['relu', 'relu_log', 'relu_2xlog'] | None) – Whether and which sparsification function to apply. Defaults to None.
add_marker_tokens (bool) – Whether to prepend extra marker tokens [Q] / [D] to queries / documents. Defaults to False.
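
To make the interaction of similarity_function, normalization_strategy, and sparsification_strategy concrete, here is a minimal sketch in plain Python that scores one query embedding against one document embedding. It is not the library's implementation. The helper names (sparsify, l2_normalize, score) are hypothetical, the formulas for 'relu_log' and 'relu_2xlog' (log(1 + relu(x)), applied once or twice) are an assumption inferred from the option names, and so is the order of operations (sparsify first, then normalize).

```python
import math

def sparsify(vec, strategy=None):
    """Element-wise sparsification (formulas are assumptions, see lead-in)."""
    if strategy is None:
        return list(vec)
    relu = [max(0.0, x) for x in vec]
    if strategy == "relu":
        return relu
    if strategy == "relu_log":
        return [math.log1p(x) for x in relu]
    if strategy == "relu_2xlog":
        return [math.log1p(math.log1p(x)) for x in relu]
    raise ValueError(f"unknown sparsification strategy: {strategy!r}")

def l2_normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def score(query_emb, doc_emb, similarity_function="dot",
          normalization_strategy=None, sparsification_strategy=None):
    """Hypothetical scoring pipeline: sparsify, optionally normalize, compare."""
    q = sparsify(query_emb, sparsification_strategy)
    d = sparsify(doc_emb, sparsification_strategy)
    if normalization_strategy == "l2":
        q, d = l2_normalize(q), l2_normalize(d)
    if similarity_function == "dot":
        return dot(q, d)
    if similarity_function == "cosine":
        # Cosine similarity is the dot product of the unit-length vectors.
        return dot(l2_normalize(q), l2_normalize(d))
    raise ValueError(f"unknown similarity function: {similarity_function!r}")
```

Under these assumptions, a dot product over L2-normalized embeddings yields the same value as cosine similarity over the unnormalized ones, so combining normalization_strategy='l2' with similarity_function='cosine' would be redundant in this sketch.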

Methods

`__init__`([query_length, doc_length, ...])	A bi-encoder model encodes queries and documents separately and computes a relevance score based on the similarity of the query and document embeddings.
`to_diff_dict`()	Removes all attributes from the configuration that correspond to the default config attributes for better readability, while always retaining the config attribute from the class.

Attributes

model_type	Model type for bi-encoder models.

model_type: str = 'bi-encoder': Model type for bi-encoder models.

to_diff_dict() → dict[str, Any]

Serializes the configuration to a Python dictionary. Attributes that equal the default config attributes are removed for better readability, while the config attribute from the class is always retained.

Returns: Dictionary of all the attributes that make up this configuration instance.
Return type: dict[str, Any]
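
The idea behind a "diff dict" can be sketched with a small stand-in class. ToyConfig below is hypothetical and does not use or reproduce the library's code. It mirrors a few of the documented defaults and assumes that model_type is the attribute that is always retained.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

@dataclass
class ToyConfig:
    """Hypothetical stand-in mirroring some documented BiEncoderConfig defaults."""
    model_type: str = "bi-encoder"
    query_length: Optional[int] = 32
    doc_length: Optional[int] = 512
    similarity_function: str = "dot"
    normalization_strategy: Optional[str] = None

    def to_diff_dict(self) -> Dict[str, Any]:
        # Compare against a freshly constructed default instance.
        defaults = asdict(ToyConfig())
        current = asdict(self)
        diff = {k: v for k, v in current.items() if v != defaults[k]}
        # Assumption: model_type is kept even when it equals the default.
        diff["model_type"] = self.model_type
        return diff

config = ToyConfig(similarity_function="cosine", normalization_strategy="l2")
diff = config.to_diff_dict()
```

Here diff contains only the two overridden values plus model_type, which is what makes a serialized configuration easier to read than a full dump of every attribute.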