DprConfig

class lightning_ir.models.bi_encoders.dpr.DprConfig(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalize: bool = False, sparsification: 'relu' | 'relu_log' | 'relu_2xlog' | None = None, add_marker_tokens: bool = False, query_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'first', doc_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'first', embedding_dim: int | None = None, projection: 'linear' | 'linear_no_bias' | None = 'linear', **kwargs)[source]

Bases: SingleVectorBiEncoderConfig

Configuration class for a DPR model.

__init__(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalize: bool = False, sparsification: 'relu' | 'relu_log' | 'relu_2xlog' | None = None, add_marker_tokens: bool = False, query_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'first', doc_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'first', embedding_dim: int | None = None, projection: 'linear' | 'linear_no_bias' | None = 'linear', **kwargs) None[source]

A DPR model encodes queries and documents separately. Before computing the similarity score, the contextualized token embeddings are aggregated to obtain a single embedding using a pooling strategy. Optionally, the pooled embeddings can be projected using a linear layer.
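The pooling strategies named above ("first", "mean", "max", "sum") can be illustrated with a minimal sketch. The `pool` helper below is hypothetical, written in plain Python over lists of token vectors rather than lightning-ir's actual tensor implementation, but it mirrors the per-dimension aggregation each strategy performs:

```python
# Hypothetical sketch of the four pooling strategies, not lightning-ir code.
# Each "token embedding" is a list of floats; pooling reduces the sequence
# of per-token vectors to a single vector.

def pool(token_embeddings, strategy="first"):
    """Aggregate contextualized token embeddings into one embedding."""
    if strategy == "first":  # take the first token, e.g. [CLS]
        return token_embeddings[0]
    dims = zip(*token_embeddings)  # iterate dimension-wise over tokens
    if strategy == "mean":
        return [sum(d) / len(token_embeddings) for d in dims]
    if strategy == "max":
        return [max(d) for d in dims]
    if strategy == "sum":
        return [sum(d) for d in dims]
    raise ValueError(f"unknown pooling strategy: {strategy}")

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
pool(tokens, "first")  # [1.0, 2.0]
pool(tokens, "mean")   # [3.0, 2.0]
```

With `projection="linear"`, the pooled vector would additionally be passed through a linear layer mapping it to `embedding_dim` dimensions.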

Parameters:
  • query_length (int | None) – Maximum number of tokens per query. If None, queries are not truncated. Defaults to 32.

  • doc_length (int | None) – Maximum number of tokens per document. If None, documents are not truncated. Defaults to 512.

  • similarity_function (Literal["cosine", "dot"]) – Similarity function to compute scores between query and document embeddings. Defaults to “dot”.

  • normalize (bool) – Whether to normalize the embeddings. Defaults to False.

  • sparsification (Literal['relu', 'relu_log', 'relu_2xlog'] | None) – Whether and which sparsification function to apply. Defaults to None.

  • add_marker_tokens (bool) – Whether to add marker tokens to the input sequences. Defaults to False.

  • query_pooling_strategy (Literal["first", "mean", "max", "sum"]) – Pooling strategy for query embeddings. Defaults to “first”.

  • doc_pooling_strategy (Literal["first", "mean", "max", "sum"]) – Pooling strategy for document embeddings. Defaults to “first”.

  • embedding_dim (int | None) – Dimension of the final embeddings. If None, it will be set to the hidden size of the backbone model. Defaults to None.

  • projection (Literal["linear", "linear_no_bias"] | None) – Type of projection layer to apply on the pooled embeddings. If None, no projection is applied. Defaults to “linear”.
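A minimal configuration sketch, assuming lightning-ir is installed; the parameter values below are illustrative, not recommended defaults:

```python
from lightning_ir.models.bi_encoders.dpr import DprConfig

# Configure a DPR model that mean-pools query and document embeddings,
# projects them to 128 dimensions, and scores with normalized cosine similarity.
config = DprConfig(
    query_length=32,
    doc_length=256,
    similarity_function="cosine",
    normalize=True,
    query_pooling_strategy="mean",
    doc_pooling_strategy="mean",
    embedding_dim=128,
    projection="linear",
)
```

Any additional `**kwargs` are forwarded to the backbone configuration.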

Methods

__init__([query_length, doc_length, ...])

A DPR model encodes queries and documents separately.

Attributes

model_type

Model type for a DPR model.

model_type: str = 'lir-dpr'

Model type for a DPR model.