ColModel

class lightning_ir.models.bi_encoders.col.ColModel(config: ColConfig, *args, **kwargs)[source]

Bases: MultiVectorBiEncoderModel

Multi-vector late-interaction Col model. See ColConfig for configuration options.

__init__(config: ColConfig, *args, **kwargs) None[source]

Initializes a Col model given a ColConfig.

Parameters:

config (ColConfig) – Configuration for the Col model.

Raises:

ValueError – If the embedding dimension is not specified in the configuration.

Methods

__init__(config, *args, **kwargs)

Initializes a Col model given a ColConfig.

encode(encoding, input_type)

Encodes a batched tokenized text sequences and returns the embeddings and scoring mask.

scoring_mask(encoding, input_type)

Computes a scoring mask for batched tokenized text sequences which is used in the scoring function to mask out vectors during scoring.

Attributes

training

config_class

Configuration class for the Col model.

alias of ColConfig

encode(encoding: BatchEncoding, input_type: 'query' | 'doc') BiEncoderEmbedding[source]

Encodes a batched tokenized text sequences and returns the embeddings and scoring mask.

Parameters:
  • encoding (BatchEncoding) – Tokenizer encodings for the text sequence.

  • input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.

Returns:

Embeddings and scoring mask.

Return type:

BiEncoderEmbedding

scoring_mask(encoding: BatchEncoding, input_type: 'query' | 'doc') Tensor[source]

Computes a scoring mask for batched tokenized text sequences which is used in the scoring function to mask out vectors during scoring.

Parameters:
  • encoding (BatchEncoding) – Tokenizer encodings for the text sequence.

  • input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.

Returns:

Scoring mask.

Return type:

torch.Tensor