MvrModel

class lightning_ir.models.bi_encoders.mvr.MvrModel(config: MvrConfig, *args, **kwargs)[source]

MVR model for multi-view representation learning.

__init__(config: MvrConfig, *args, **kwargs)[source]

Initializes a multi-vector bi-encoder model given a MultiVectorBiEncoderConfig.

Parameters:

config (MultiVectorBiEncoderConfig) – Configuration for the multi-vector bi-encoder model

Raises:

ValueError – If mask scoring tokens are specified in the configuration but the tokenizer is not available
ValueError – If the specified mask scoring tokens are not in the tokenizer vocab

Methods

`__init__`(config, args, *kwargs)	Initializes a multi-vector bi-encoder model given a `MultiVectorBiEncoderConfig`.
`encode`(encoding, input_type)	Encodes a batched tokenized text sequences and returns the embeddings and scoring mask.
`scoring_mask`(encoding, input_type)	Computes a scoring mask for batched tokenized text sequences which is used in the scoring function to mask out vectors during scoring.

Attributes

training

config_class

Configuration class for MVR models.

encode(encoding: BatchEncoding, input_type: 'query' | 'doc') → BiEncoderEmbedding[source]

Encodes a batched tokenized text sequences and returns the embeddings and scoring mask.

Parameters:

encoding (BatchEncoding) – Tokenizer encodings for the text sequence.
input_type (Literal["query", "doc"]) – type of input, either “query” or “doc”.

Returns:

Embeddings and scoring mask.

Return type:

BiEncoderEmbedding

scoring_mask(encoding: BatchEncoding, input_type: 'query' | 'doc') → Tensor[source]

Computes a scoring mask for batched tokenized text sequences which is used in the scoring function to mask out vectors during scoring.

Parameters:

encoding (BatchEncoding) – Tokenizer encodings for the text sequence.
input_type (Literal["query", "doc"]) – type of input, either “query” or “doc”.

Returns:

Scoring mask.

Return type:

torch.Tensor