BiEncoderConfig
- class lightning_ir.bi_encoder.bi_encoder_config.BiEncoderConfig(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalization: 'l2' | None = None, sparsification: 'relu' | 'relu_log' | 'relu_2xlog' | None = None, add_marker_tokens: bool = False, **kwargs)
Bases: LightningIRConfig
Configuration class for a bi-encoder model.
- __init__(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalization: 'l2' | None = None, sparsification: 'relu' | 'relu_log' | 'relu_2xlog' | None = None, add_marker_tokens: bool = False, **kwargs)
A bi-encoder model encodes queries and documents separately and computes a relevance score based on the similarity of the query and document embeddings. Normalization and sparsification can be applied to the embeddings before computing the similarity score.
- Parameters:
query_length (int | None) – Maximum number of tokens per query. If None, queries are not truncated. Defaults to 32.
doc_length (int | None) – Maximum number of tokens per document. If None, documents are not truncated. Defaults to 512.
similarity_function (Literal['cosine', 'dot']) – Similarity function to compute scores between query and document embeddings. Defaults to “dot”.
normalization (Literal['l2'] | None) – Normalization to apply to query and document embeddings. If None, embeddings are not normalized. Defaults to None.
sparsification (Literal['relu', 'relu_log', 'relu_2xlog'] | None) – Sparsification function to apply to the embeddings. If None, no sparsification is applied. Defaults to None.
add_marker_tokens (bool) – Whether to prepend extra marker tokens [Q] / [D] to queries / documents. Defaults to False.
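The following is a minimal, library-independent sketch of how these options could combine when scoring one query embedding against one document embedding. It is not the Lightning IR implementation: the helper names (sparsify, l2_normalize, score), the order of operations (sparsify first, then normalize), and the exact formulas for 'relu_log' (log(1 + relu(x))) and 'relu_2xlog' (that function applied twice) are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def sparsify(vec: Sequence[float], mode: Optional[str]) -> List[float]:
    """Hypothetical helper: apply the named sparsification element-wise."""
    if mode is None:
        return list(vec)
    relu = [max(0.0, x) for x in vec]
    if mode == "relu":
        return relu
    if mode == "relu_log":  # assumed to mean log(1 + relu(x))
        return [math.log1p(x) for x in relu]
    if mode == "relu_2xlog":  # assumed to mean log(1 + log(1 + relu(x)))
        return [math.log1p(math.log1p(x)) for x in relu]
    raise ValueError(f"unknown sparsification: {mode}")


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Hypothetical helper: scale a vector to unit L2 norm (zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def score(
    query: Sequence[float],
    doc: Sequence[float],
    similarity_function: str = "dot",
    normalization: Optional[str] = None,
    sparsification: Optional[str] = None,
) -> float:
    """Score a query/document pair; sparsify-then-normalize order is an assumption."""
    q, d = sparsify(query, sparsification), sparsify(doc, sparsification)
    if normalization == "l2":
        q, d = l2_normalize(q), l2_normalize(d)
    dot = sum(a * b for a, b in zip(q, d))
    if similarity_function == "dot":
        return dot
    if similarity_function == "cosine":
        q_norm = math.sqrt(sum(a * a for a in q))
        d_norm = math.sqrt(sum(b * b for b in d))
        return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
    raise ValueError(f"unknown similarity function: {similarity_function}")
```

Under these assumptions, the dot product of L2-normalized embeddings equals their cosine similarity, so normalization='l2' combined with similarity_function='dot' yields the same score as 'cosine' on the unnormalized embeddings.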
Methods
__init__([query_length, doc_length, ...]) – A bi-encoder model encodes queries and documents separately and computes a relevance score based on the similarity of the query and document embeddings.
to_diff_dict() – Removes all attributes from the configuration that correspond to the default config attributes for better readability, while always retaining the config attribute from the class.
Attributes
model_type – Model type for bi-encoder models.
- to_diff_dict() → dict[str, Any]
Removes all attributes from the configuration that correspond to the default config attributes for better readability, while always retaining the config attribute from the class. Serializes to a Python dictionary.
- Returns:
Dictionary of all the attributes that make up this configuration instance.
- Return type:
dict[str, Any]
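The idea of a "diff dict" can be illustrated with a small, self-contained sketch. ToyConfig below is a hypothetical stand-in, not the library class: it drops every field that still has its default value and always keeps one identifying field (here assumed to be model_type), which is one possible reading of "always retaining the config attribute from the class".

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ToyConfig:
    """Hypothetical config used only to illustrate diff-style serialization."""

    model_type: str = "bi-encoder"
    query_length: Optional[int] = 32
    doc_length: Optional[int] = 512
    similarity_function: str = "dot"

    def to_diff_dict(self) -> Dict[str, Any]:
        defaults = ToyConfig()
        # Keep only the fields whose values differ from the defaults.
        diff = {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) != getattr(defaults, f.name)
        }
        # Always retain the identifying field, even when it equals the default.
        diff["model_type"] = self.model_type
        return diff
```

With this sketch, ToyConfig(query_length=16).to_diff_dict() contains only query_length and model_type, while a fully default instance serializes to just the model_type entry.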