DprConfig

class lightning_ir.models.bi_encoders.dpr.DprConfig(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalize: bool = False, sparsification: 'relu' | 'relu_log' | 'relu_2xlog' | None = None, add_marker_tokens: bool = False, query_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'first', doc_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'first', embedding_dim: int | None = None, projection: 'linear' | 'linear_no_bias' | None = 'linear', **kwargs)[source]

Bases: SingleVectorBiEncoderConfig

Configuration class for a DPR model.

__init__(query_length: int | None = 32, doc_length: int | None = 512, similarity_function: 'cosine' | 'dot' = 'dot', normalize: bool = False, sparsification: 'relu' | 'relu_log' | 'relu_2xlog' | None = None, add_marker_tokens: bool = False, query_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'first', doc_pooling_strategy: 'first' | 'mean' | 'max' | 'sum' = 'first', embedding_dim: int | None = None, projection: 'linear' | 'linear_no_bias' | None = 'linear', **kwargs) None[source]

A DPR model encodes queries and documents separately. Before computing the similarity score, the contextualized token embeddings are aggregated to obtain a single embedding using a pooling strategy. Optionally, the pooled embeddings can be projected using a linear layer.
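The pooling strategies named above ("first", "mean", "max", "sum") can be illustrated with a minimal sketch. The `pool` helper below is hypothetical, written in plain Python over lists of token vectors rather than lightning-ir's actual tensor implementation, but it mirrors the per-dimension aggregation each strategy performs:

```python
# Hypothetical sketch of the four pooling strategies, not lightning-ir code.
# Each "token embedding" is a list of floats; pooling reduces the sequence
# of per-token vectors to a single vector.

def pool(token_embeddings, strategy="first"):
    """Aggregate contextualized token embeddings into one embedding."""
    if strategy == "first":  # take the first token, e.g. [CLS]
        return token_embeddings[0]
    dims = zip(*token_embeddings)  # iterate dimension-wise over tokens
    if strategy == "mean":
        return [sum(d) / len(token_embeddings) for d in dims]
    if strategy == "max":
        return [max(d) for d in dims]
    if strategy == "sum":
        return [sum(d) for d in dims]
    raise ValueError(f"unknown pooling strategy: {strategy}")

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
pool(tokens, "first")  # [1.0, 2.0]
pool(tokens, "mean")   # [3.0, 2.0]
```

With `projection="linear"`, the pooled vector would additionally be passed through a linear layer mapping it to `embedding_dim` dimensions.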

Parameters:
  • query_length (int | None) – Maximum number of tokens per query. If None, queries are not truncated. Defaults to 32.

  • doc_length (int | None) – Maximum number of tokens per document. If None, documents are not truncated. Defaults to 512.

  • similarity_function (Literal["cosine", "dot"]) – Similarity function to compute scores between query and document embeddings. Defaults to “dot”.

  • normalize (bool) – Whether to normalize the embeddings. Defaults to False.

  • sparsification (Literal['relu', 'relu_log', 'relu_2xlog'] | None) – Whether and which sparsification function to apply. Defaults to None.

  • add_marker_tokens (bool) – Whether to add marker tokens to the input sequences. Defaults to False.

  • query_pooling_strategy (Literal["first", "mean", "max", "sum"]) – Pooling strategy for query embeddings. Defaults to “first”.

  • doc_pooling_strategy (Literal["first", "mean", "max", "sum"]) – Pooling strategy for document embeddings. Defaults to “first”.

  • embedding_dim (int | None) – Dimension of the final embeddings. If None, it will be set to the hidden size of the backbone model. Defaults to None.

  • projection (Literal["linear", "linear_no_bias"] | None) – Type of projection layer to apply on the pooled embeddings. If None, no projection is applied. Defaults to “linear”.
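A minimal configuration sketch, assuming lightning-ir is installed; the parameter values below are illustrative, not recommended defaults:

```python
from lightning_ir.models.bi_encoders.dpr import DprConfig

# Configure a DPR model that mean-pools query and document embeddings,
# projects them to 128 dimensions, and scores with normalized cosine similarity.
config = DprConfig(
    query_length=32,
    doc_length=256,
    similarity_function="cosine",
    normalize=True,
    query_pooling_strategy="mean",
    doc_pooling_strategy="mean",
    embedding_dim=128,
    projection="linear",
)
```

Any additional `**kwargs` are forwarded to the backbone configuration.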

Methods

__init__([query_length, doc_length, ...])

A DPR model encodes queries and documents separately.

Attributes

model_type

Model type for a DPR model.

model_type: str = 'lir-dpr'

Model type for a DPR model.