MvrTokenizer
- class lightning_ir.models.bi_encoders.mvr.MvrTokenizer(*args, query_length: int | None = 32, doc_length: int | None = 512, add_marker_tokens: bool = False, num_viewer_tokens: int = 8, **kwargs)[source]
Bases: BiEncoderTokenizer
- __init__(*args, query_length: int | None = 32, doc_length: int | None = 512, add_marker_tokens: bool = False, num_viewer_tokens: int = 8, **kwargs)[source]
LightningIRTokenizer for bi-encoder models. Encodes queries and documents separately. Optionally adds marker tokens to the encoded input sequences.
- Parameters:
query_length (int | None) – Maximum number of tokens per query. If None, queries are not truncated. Defaults to 32.
doc_length (int | None) – Maximum number of tokens per document. If None, documents are not truncated. Defaults to 512.
add_marker_tokens (bool) – Whether to add marker tokens to the query and document input sequences. Defaults to False.
num_viewer_tokens (int) – Number of viewer tokens. Defaults to 8.
- Raises:
ValueError – If add_marker_tokens is True and an unsupported tokenizer is used.
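The length parameters above follow a common convention: an integer caps the number of tokens, while None disables truncation. The following is a minimal sketch of that convention in plain Python. It does not call Lightning IR; the `truncate` helper and the whitespace "tokenization" are hypothetical stand-ins used only for illustration.

```python
from typing import List, Optional


def truncate(tokens: List[str], max_length: Optional[int]) -> List[str]:
    """Keep at most max_length tokens; None means no truncation (hypothetical helper)."""
    if max_length is None:
        return list(tokens)
    return tokens[:max_length]


# Toy whitespace tokenization, purely illustrative (not the real tokenizer).
query = "what is multi view representation learning".split()
doc = ("token " * 600).split()

short_query = truncate(query, 32)   # shorter than the limit, so unchanged
short_doc = truncate(doc, 512)      # cut down to the limit
full_doc = truncate(doc, None)      # None disables truncation
```

With the default limits of 32 and 512, a six-token query is left untouched, while a 600-token document is cut to 512 tokens unless doc_length is None.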
Methods
__init__(*args[, query_length, doc_length, ...]) – LightningIRTokenizer for bi-encoder models.
viewer_token_id(viewer_token_id) – The token id of the query token if marker tokens are added.
Attributes
- config_class
alias of MvrConfig
- viewer_token_id(viewer_token_id: int) → int | None[source]
The token id of the query token if marker tokens are added.
- Returns:
Token id of the query token
- Return type:
int | None
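The `int | None` return type suggests a lookup that yields a token id when the marker tokens exist and None otherwise. The sketch below illustrates that pattern with a plain dictionary. It is not the library's implementation: the `[VIEW{i}]` token naming, the toy vocabulary, and the `lookup_viewer_token_id` helper are all assumptions made for illustration.

```python
from typing import Dict, Optional

NUM_VIEWER_TOKENS = 8  # mirrors the default of num_viewer_tokens

# Hypothetical base vocabulary without any marker tokens.
base_vocab: Dict[str, int] = {"[PAD]": 0, "[CLS]": 1}

# Hypothetical vocabulary after marker tokens were added.
# The "[VIEW{i}]" naming scheme is an assumption, not confirmed by the docs.
marker_vocab: Dict[str, int] = dict(base_vocab)
for i in range(NUM_VIEWER_TOKENS):
    marker_vocab[f"[VIEW{i}]"] = len(marker_vocab)


def lookup_viewer_token_id(vocab: Dict[str, int], index: int) -> Optional[int]:
    """Return the id of the index-th viewer token, or None if it is absent."""
    return vocab.get(f"[VIEW{index}]")


first_id = lookup_viewer_token_id(marker_vocab, 0)
last_id = lookup_viewer_token_id(marker_vocab, 7)
out_of_range = lookup_viewer_token_id(marker_vocab, 8)
without_markers = lookup_viewer_token_id(base_vocab, 0)
```

Callers of such a method should handle the None case explicitly, since it signals that no marker tokens were added or that the requested index does not exist.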