MvrModel
- class lightning_ir.models.bi_encoders.mvr.MvrModel(config: MvrConfig, *args, **kwargs)[source]
Bases:
MultiVectorBiEncoderModelMVR model for multi-view representation learning.
- __init__(config: MvrConfig, *args, **kwargs)[source]
Initializes a multi-vector bi-encoder model given a
MultiVectorBiEncoderConfig.- Parameters:
config (MultiVectorBiEncoderConfig) – Configuration for the multi-vector bi-encoder model
- Raises:
ValueError – If mask scoring tokens are specified in the configuration but the tokenizer is not available
ValueError – If the specified mask scoring tokens are not in the tokenizer vocab
Methods
__init__(config, *args, **kwargs)Initializes a multi-vector bi-encoder model given a
MultiVectorBiEncoderConfig.encode(encoding, input_type)Encodes a batched tokenized text sequences and returns the embeddings and scoring mask.
scoring_mask(encoding, input_type)Computes a scoring mask for batched tokenized text sequences which is used in the scoring function to mask out vectors during scoring.
Attributes
training- config_class
Configuration class for MVR models.
alias of
MvrConfig
- encode(encoding: BatchEncoding, input_type: 'query' | 'doc') BiEncoderEmbedding[source]
Encodes a batched tokenized text sequences and returns the embeddings and scoring mask.
- Parameters:
encoding (BatchEncoding) – Tokenizer encodings for the text sequence.
input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.
- Returns:
Embeddings and scoring mask.
- Return type:
- scoring_mask(encoding: BatchEncoding, input_type: 'query' | 'doc') Tensor[source]
Computes a scoring mask for batched tokenized text sequences which is used in the scoring function to mask out vectors during scoring.
- Parameters:
encoding (BatchEncoding) – Tokenizer encodings for the text sequence.
input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.
- Returns:
Scoring mask.
- Return type:
torch.Tensor