ColModel
- class lightning_ir.models.bi_encoders.col.ColModel(config: ColConfig, *args, **kwargs)[source]
Bases:
MultiVectorBiEncoderModelMulti-vector late-interaction Col model. See
ColConfigfor configuration options.- __init__(config: ColConfig, *args, **kwargs) None[source]
Initializes a Col model given a
ColConfig.- Parameters:
config (ColConfig) – Configuration for the Col model.
- Raises:
ValueError – If the embedding dimension is not specified in the configuration.
Methods
__init__(config, *args, **kwargs)Initializes a Col model given a
ColConfig.encode(encoding, input_type)Encodes a batched tokenized text sequences and returns the embeddings and scoring mask.
scoring_mask(encoding, input_type)Computes a scoring mask for batched tokenized text sequences which is used in the scoring function to mask out vectors during scoring.
Attributes
training- config_class
Configuration class for the Col model.
alias of
ColConfig
- encode(encoding: BatchEncoding, input_type: 'query' | 'doc') BiEncoderEmbedding[source]
Encodes a batched tokenized text sequences and returns the embeddings and scoring mask.
- Parameters:
encoding (BatchEncoding) – Tokenizer encodings for the text sequence.
input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.
- Returns:
Embeddings and scoring mask.
- Return type:
- scoring_mask(encoding: BatchEncoding, input_type: 'query' | 'doc') Tensor[source]
Computes a scoring mask for batched tokenized text sequences which is used in the scoring function to mask out vectors during scoring.
- Parameters:
encoding (BatchEncoding) – Tokenizer encodings for the text sequence.
input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.
- Returns:
Scoring mask.
- Return type:
torch.Tensor