ColModel

class lightning_ir.models.bi_encoders.col.ColModel(config: ColConfig, *args, **kwargs)[source]

Bases: MultiVectorBiEncoderModel

Multi-vector late-interaction Col model. See ColConfig for configuration options.

__init__(config: ColConfig, *args, **kwargs) → None[source]

Initializes a Col model given a ColConfig.

Parameters:

config (ColConfig) – Configuration for the Col model.

Raises:

ValueError – If the embedding dimension is not specified in the configuration.

Methods

__init__(config, *args, **kwargs)

Initializes a Col model given a ColConfig.

encode(encoding, input_type)

Encodes batched tokenized text sequences and returns their embeddings and scoring mask.

score(output[, num_docs])

Compute relevance scores between queries and documents.

scoring_mask(encoding, input_type)

Computes a scoring mask for batched tokenized text sequences; the scoring function uses it to mask out vectors that should not contribute to the score.

Attributes

training

config_class

Configuration class for the Col model.

alias of ColConfig

encode(encoding: BatchEncoding, input_type: 'query' | 'doc') → BiEncoderEmbedding[source]

Encodes batched tokenized text sequences and returns their embeddings and scoring mask.

Parameters:
  • encoding (BatchEncoding) – Tokenizer encodings for the text sequence.

  • input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.

Returns:

Embeddings and scoring mask.

Return type:

BiEncoderEmbedding

score(output: BiEncoderOutput, num_docs: Sequence[int] | int | None = None) → BiEncoderOutput[source]

Compute relevance scores between queries and documents.

Parameters:
  • output (BiEncoderOutput) – Output containing embeddings and scoring mask.

  • num_docs (Sequence[int] | int | None) – Specifies how many documents are passed per query. If a sequence of integers, len(num_docs) should equal the number of queries and sum(num_docs) should equal the total number of documents, i.e., the sequence contains one value per query specifying the number of documents for that query. If an integer, assumes an equal number of documents per query. If None, infers an equal split by dividing the total number of documents by the number of queries. Defaults to None.

Returns:

Output containing relevance scores.

Return type:

BiEncoderOutput
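Col models score queries against documents with ColBERT-style late interaction: each query vector is matched against its most similar unmasked document vector, and the maxima are summed. The sketch below illustrates this mechanism and the num_docs resolution described above; it is a minimal illustration, not the library's implementation, and the helper names (maxsim, infer_num_docs) are hypothetical.

```python
import torch


def infer_num_docs(num_queries, total_docs, num_docs=None):
    """Resolve num_docs as described in the docstring (hypothetical helper)."""
    if num_docs is None:
        # Infer an equal split: total documents divided by number of queries.
        assert total_docs % num_queries == 0
        num_docs = total_docs // num_queries
    if isinstance(num_docs, int):
        num_docs = [num_docs] * num_queries
    assert len(num_docs) == num_queries and sum(num_docs) == total_docs
    return list(num_docs)


def maxsim(query_emb, doc_emb, doc_mask):
    """ColBERT-style MaxSim: for each query vector, take the maximum
    similarity over the document's unmasked vectors, then sum."""
    sim = query_emb @ doc_emb.T                      # (q_tokens, d_tokens)
    sim = sim.masked_fill(~doc_mask, float("-inf"))  # drop masked vectors
    return sim.max(dim=-1).values.sum().item()
```

For example, with two orthogonal query vectors and a document whose third vector is masked out, only the two unmasked document vectors can contribute to the per-query maxima.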

scoring_mask(encoding: BatchEncoding, input_type: 'query' | 'doc') → Tensor[source]

Computes a scoring mask for batched tokenized text sequences; the scoring function uses it to mask out vectors that should not contribute to the score.

Parameters:
  • encoding (BatchEncoding) – Tokenizer encodings for the text sequence.

  • input_type (Literal["query", "doc"]) – Type of input, either “query” or “doc”.

Returns:

Scoring mask.

Return type:

torch.Tensor
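Conceptually, the scoring mask marks which token vectors participate in scoring. A minimal sketch, assuming the mask is derived from the tokenizer's attention mask (the actual mask may additionally exclude special or punctuation tokens depending on the ColConfig options; the helper name is hypothetical):

```python
import torch


def simple_scoring_mask(attention_mask):
    """Illustrative sketch: mark every non-padding token as scorable.

    attention_mask: (batch, seq_len) tensor of 0/1 values from the tokenizer.
    Returns a boolean tensor of the same shape.
    """
    return attention_mask.bool()
```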