MvrModel

class lightning_ir.models.bi_encoders.mvr.MvrModel(config: MvrConfig, *args, **kwargs)

Bases: MultiVectorBiEncoderModel

MVR model for multi-view representation learning.

__init__(config: MvrConfig, *args, **kwargs)

Initializes a multi-vector bi-encoder model given a MultiVectorBiEncoderConfig.

Parameters:

config (MultiVectorBiEncoderConfig) – Configuration for the multi-vector bi-encoder model

Raises:
  • ValueError – If mask scoring tokens are specified in the configuration but the tokenizer is not available

  • ValueError – If the specified mask scoring tokens are not in the tokenizer vocabulary

Methods

__init__(config, *args, **kwargs)

Initializes a multi-vector bi-encoder model given a MultiVectorBiEncoderConfig.

encode(encoding, input_type)

Encodes batched tokenized text sequences and returns the embeddings and scoring mask.

scoring_mask(encoding, input_type)

Computes a scoring mask for batched tokenized text sequences. The scoring function uses this mask to exclude vectors from scoring.

Attributes

training

config_class

Configuration class for MVR models.

alias of MvrConfig

encode(encoding: BatchEncoding, input_type: 'query' | 'doc') → BiEncoderEmbedding

Encodes batched tokenized text sequences and returns the embeddings and scoring mask.

Parameters:
  • encoding (BatchEncoding) – Tokenizer encodings for the text sequence.

  • input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.

Returns:

Embeddings and scoring mask.

Return type:

BiEncoderEmbedding
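To make the "embeddings and scoring mask" pairing concrete, the following is a minimal pure-Python sketch. The class ToyEmbedding, its fields, and the active_vectors helper are hypothetical stand-ins invented for illustration. They are not the BiEncoderEmbedding API, which holds tensors rather than lists. The sketch only assumes that the mask has one entry per embedding vector and that a true entry means the vector takes part in scoring.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class ToyEmbedding:
    """Hypothetical stand-in for an embeddings-plus-mask container."""

    embeddings: List[List[Vector]]  # indexed as [batch][sequence][dim]
    scoring_mask: List[List[bool]]  # indexed as [batch][sequence]; True = keep

    def active_vectors(self, batch_index: int) -> List[Vector]:
        """Return the vectors of one sequence that the mask keeps."""
        vectors = self.embeddings[batch_index]
        mask = self.scoring_mask[batch_index]
        return [vec for vec, keep in zip(vectors, mask) if keep]


# Two sequences padded to length 3; the last position of the second
# sequence is padding, so its mask entry is False (an assumption about
# what a mask would typically exclude).
toy = ToyEmbedding(
    embeddings=[
        [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
        [[1.0, 1.0], [0.2, 0.8], [0.0, 0.0]],
    ],
    scoring_mask=[
        [True, True, True],
        [True, True, False],
    ],
)

kept = toy.active_vectors(1)  # only the two unmasked vectors remain
```

Keeping the mask next to the embeddings means downstream scoring code does not need the original tokenizer output to know which vectors to ignore.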

scoring_mask(encoding: BatchEncoding, input_type: 'query' | 'doc') → Tensor

Computes a scoring mask for batched tokenized text sequences. The scoring function uses this mask to exclude vectors from scoring.

Parameters:
  • encoding (BatchEncoding) – Tokenizer encodings for the text sequence.

  • input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.

Returns:

Scoring mask.

Return type:

torch.Tensor
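The effect of a scoring mask on a late-interaction score can be sketched in plain Python. This page does not specify the scoring function, so the example assumes a MaxSim-style score (sum over query vectors of the maximum dot product with any document vector), which is a common choice for multi-vector models. The helper names dot and masked_max_sim are hypothetical, not part of lightning_ir.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def masked_max_sim(
    query_vecs: List[Vector],
    query_mask: List[bool],
    doc_vecs: List[Vector],
    doc_mask: List[bool],
) -> float:
    """Assumed MaxSim score that skips vectors whose mask entry is False."""
    doc_kept = [d for d, keep in zip(doc_vecs, doc_mask) if keep]
    if not doc_kept:
        return 0.0  # no document vector takes part in scoring
    score = 0.0
    for q, keep in zip(query_vecs, query_mask):
        if not keep:
            continue  # masked query vectors contribute nothing
        score += max(dot(q, d) for d in doc_kept)
    return score


query_vecs = [[1.0, 0.0], [0.0, 1.0]]
query_mask = [True, True]
doc_vecs = [[0.5, 0.5], [2.0, 0.0], [0.0, 9.0]]

# Mask out the third document vector, then score again with it included.
masked = masked_max_sim(query_vecs, query_mask, doc_vecs, [True, True, False])
unmasked = masked_max_sim(query_vecs, query_mask, doc_vecs, [True, True, True])
print(masked, unmasked)  # 2.5 11.0
```

In this toy input the masked-out vector would otherwise dominate the second query vector's maximum, which shows why excluding vectors before scoring changes the result.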