MonoConfig

class lightning_ir.models.mono.MonoConfig(query_length: int = 32, doc_length: int = 512, pooling_strategy: Literal['first', 'mean', 'max', 'sum', 'bert_pool'] = 'first', linear_bias: bool = False, scoring_strategy: Literal['mono', 'rank'] = 'rank', tokenizer_pattern: str | None = None, **kwargs)

Bases: CrossEncoderConfig

Configuration class for mono cross-encoder models.

__init__(query_length: int = 32, doc_length: int = 512, pooling_strategy: Literal['first', 'mean', 'max', 'sum', 'bert_pool'] = 'first', linear_bias: bool = False, scoring_strategy: Literal['mono', 'rank'] = 'rank', tokenizer_pattern: str | None = None, **kwargs)

Initialize the configuration for mono cross-encoder models.

Parameters:
  • query_length (int) – Maximum query length. Defaults to 32.

  • doc_length (int) – Maximum document length. Defaults to 512.

  • pooling_strategy (Literal["first", "mean", "max", "sum", "bert_pool"]) – Pooling strategy for the embeddings. Defaults to "first".

  • linear_bias (bool) – Whether to use bias in the final linear layer. Defaults to False.

  • scoring_strategy (Literal["mono", "rank"]) – Scoring strategy to use. Defaults to "rank".

  • tokenizer_pattern (str | None) – Optional pattern for tokenization. Defaults to None.
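The documented defaults can be illustrated with a small stand-in (a hypothetical dataclass that mirrors the signature above; it is not the actual lightning_ir class, which inherits from CrossEncoderConfig):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonoConfigSketch:
    """Illustrative stand-in mirroring the documented MonoConfig defaults."""

    query_length: int = 32
    doc_length: int = 512
    pooling_strategy: str = "first"  # one of: first, mean, max, sum, bert_pool
    linear_bias: bool = False
    scoring_strategy: str = "rank"  # one of: mono, rank
    tokenizer_pattern: Optional[str] = None


# Default configuration, matching the parameter list above
cfg = MonoConfigSketch()
print(cfg.query_length, cfg.doc_length, cfg.pooling_strategy, cfg.scoring_strategy)
```

Overriding any keyword at construction time (for example `MonoConfigSketch(scoring_strategy="mono")`) follows the same pattern as the real configuration class.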

Methods

__init__([query_length, doc_length, ...])

Initialize the configuration for mono cross-encoder models.

Attributes

model_type

Model type for mono cross-encoder models.

backbone_model_type: str | None = None

Backbone model type for the configuration. Set by the LightningIRModelClassFactory.

classmethod from_pretrained(pretrained_model_name_or_path: str | Path, *args, **kwargs) → LightningIRConfig

Loads the configuration from a pretrained model. Wraps the transformers.PretrainedConfig.from_pretrained method.

Parameters:

pretrained_model_name_or_path (str | Path) – Pretrained model name or path.

Returns:

Derived LightningIRConfig class.

Return type:

LightningIRConfig

Raises:

ValueError – If pretrained_model_name_or_path is not a Lightning IR model and no LightningIRConfig is passed.

get_tokenizer_kwargs(Tokenizer: Type[LightningIRTokenizer]) → Dict[str, Any]

Returns the keyword arguments for the tokenizer. This method is used to pass the configuration parameters to the tokenizer.

Parameters:

Tokenizer (Type[LightningIRTokenizer]) – Class of the tokenizer to be used.

Returns:

Keyword arguments for the tokenizer.

Return type:

Dict[str, Any]
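One common way to pass configuration parameters to a tokenizer is to keep only the entries the tokenizer's __init__ accepts. The sketch below shows that pattern under stated assumptions; SketchTokenizer and the filtering helper are hypothetical, not lightning_ir code.

```python
import inspect


class SketchTokenizer:
    """Hypothetical tokenizer accepting a subset of config parameters."""

    def __init__(self, query_length: int = 32, doc_length: int = 512):
        self.query_length = query_length
        self.doc_length = doc_length


def get_tokenizer_kwargs_sketch(config: dict, tokenizer_cls: type) -> dict:
    """Keep only the config entries the tokenizer's __init__ accepts."""
    accepted = set(inspect.signature(tokenizer_cls.__init__).parameters)
    return {k: v for k, v in config.items() if k in accepted}


config = {"query_length": 32, "doc_length": 512, "linear_bias": False}
kwargs = get_tokenizer_kwargs_sketch(config, SketchTokenizer)
print(kwargs)  # linear_bias is dropped; the tokenizer does not accept it
```

The returned dict can then be splatted into the tokenizer constructor: `SketchTokenizer(**kwargs)`.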

model_type: str = 'mono'

Model type for mono cross-encoder models.

to_dict() → Dict[str, Any]

Overrides the transformers.PretrainedConfig.to_dict method to include the added arguments and the backbone model type.

Returns:

Configuration dictionary.

Return type:

Dict[str, Any]