ColModel
- class lightning_ir.models.bi_encoders.col.ColModel(config: ColConfig, *args, **kwargs)[source]
Bases: MultiVectorBiEncoderModel
Multi-vector late-interaction Col model. See ColConfig for configuration options.
- __init__(config: ColConfig, *args, **kwargs) → None[source]
Initializes a Col model given a ColConfig.
- Parameters:
config (ColConfig) – Configuration for the Col model.
- Raises:
ValueError – If the embedding dimension is not specified in the configuration.
Methods
__init__(config, *args, **kwargs) – Initializes a Col model given a ColConfig.
encode(encoding, input_type) – Encodes batched tokenized text sequences and returns the embeddings and scoring mask.
score(output[, num_docs]) – Computes relevance scores between queries and documents.
scoring_mask(encoding, input_type) – Computes a scoring mask for batched tokenized text sequences, used in the scoring function to mask out vectors during scoring.
Attributes
training
- config_class
Configuration class for the Col model.
alias of ColConfig
- encode(encoding: BatchEncoding, input_type: 'query' | 'doc') → BiEncoderEmbedding[source]
Encodes batched tokenized text sequences and returns the embeddings and scoring mask.
- Parameters:
encoding (BatchEncoding) – Tokenizer encodings for the text sequences.
input_type (Literal["query", "doc"]) – Type of input, either "query" or "doc".
- Returns:
Embeddings and scoring mask.
- Return type:
BiEncoderEmbedding
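The output contract can be pictured as one embedding per token together with a parallel boolean mask. A minimal pure-Python sketch of those shapes (illustrative names and nested lists, not the library's actual BiEncoderEmbedding internals):

```python
# Toy stand-in for the encode() output: per-token embeddings of shape
# (batch, seq_len, dim) paired with a scoring mask of shape (batch, seq_len).
batch_size, seq_len, dim = 2, 4, 3
embeddings = [[[0.1] * dim for _ in range(seq_len)] for _ in range(batch_size)]
scoring_mask = [
    [True, True, True, False],  # last position is padding
    [True, True, True, True],   # no padding
]

# Each sequence carries exactly one mask entry per token embedding.
consistent = all(len(seq) == len(m) for seq, m in zip(embeddings, scoring_mask))
```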
- score(output: BiEncoderOutput, num_docs: Sequence[int] | int | None = None) → BiEncoderOutput[source]
Computes relevance scores between queries and documents.
- Parameters:
output (BiEncoderOutput) – Output containing embeddings and scoring mask.
num_docs (Sequence[int] | int | None) – Specifies how many documents are passed per query. If a sequence of integers, len(num_docs) must equal the number of queries and sum(num_docs) must equal the number of documents, i.e., the sequence contains one value per query specifying the number of documents for that query. If an integer, assumes an equal number of documents per query. If None, tries to infer the number of documents per query by dividing the number of documents by the number of queries. Defaults to None.
- Returns:
Output containing relevance scores.
- Return type:
BiEncoderOutput
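For intuition, Col-style models score a query–document pair with ColBERT-style late interaction (MaxSim): each query vector is matched against its best-matching document vector, and those maxima are summed. A minimal pure-Python sketch of that mechanism (the actual method operates on torch tensors carried in BiEncoderOutput; names here are illustrative):

```python
def dot(u, v):
    # Dot-product similarity between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, doc_vecs):
    # Late interaction: for each query vector take the similarity to its
    # best-matching document vector, then sum those maxima.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-dimensional multi-vector embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0]]

score_a = maxsim_score(query, doc_a)  # 1.0 + 0.5 = 1.5
score_b = maxsim_score(query, doc_b)  # 0.0 + 1.0 = 1.0
```

Here doc_a matches both query vectors at least partially and therefore outscores doc_b, which matches only the second one.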
- scoring_mask(encoding: BatchEncoding, input_type: 'query' | 'doc') → Tensor[source]
Computes a scoring mask for batched tokenized text sequences, which is used in the scoring function to mask out vectors during scoring.
- Parameters:
encoding (BatchEncoding) – Tokenizer encodings for the text sequences.
input_type (Literal["query", "doc"]) – Type of input, either "query" or "doc".
- Returns:
Scoring mask.
- Return type:
torch.Tensor
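As an illustration, a scoring mask is commonly derived from the tokenizer's attention mask, so that padding positions contribute no vectors to late-interaction scoring. A minimal sketch in plain Python (an assumption about the typical derivation; the actual method returns a torch.Tensor and may additionally mask special or punctuation tokens depending on the configuration):

```python
def toy_scoring_mask(attention_mask):
    # True exactly where the attention mask marks a real (non-padding) token.
    return [[bool(tok) for tok in seq] for seq in attention_mask]

# A batch of two tokenized sequences, padded to length 5.
attention_mask = [
    [1, 1, 1, 0, 0],  # 3 real tokens, 2 padding tokens
    [1, 1, 1, 1, 1],  # no padding
]
mask = toy_scoring_mask(attention_mask)
```

Vectors at positions where the mask is False are then excluded when computing the per-query-vector maxima in score().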