SetEncoderConfig

class lightning_ir.models.set_encoder.SetEncoderConfig(*args, depth: int = 100, add_extra_token: bool = False, sample_missing_docs: bool = True, **kwargs)

Bases: MonoConfig

Configuration class for a SetEncoder model.

__init__(*args, depth: int = 100, add_extra_token: bool = False, sample_missing_docs: bool = True, **kwargs)

A SetEncoder model encodes a query and a set of documents jointly. Each document’s embedding is updated with context from the entire set, and a relevance score is computed per document using a linear layer.

Parameters:
  • depth (int) – Number of documents to encode per query. Defaults to 100.

  • add_extra_token (bool) – Whether to add an extra token to the input sequence to separate the query from the documents. Defaults to False.

  • sample_missing_docs (bool) – Whether to sample missing documents when the number of documents is less than the specified depth. Defaults to True.
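
The interplay between depth and sample_missing_docs can be illustrated with a small standalone sketch. It does not reproduce Lightning IR's internal logic: fill_to_depth is a hypothetical helper, and the strategy shown (sampling with replacement from the documents already present) is an assumption chosen only to make the parameter descriptions concrete.

```python
import random
from typing import List, Optional


def fill_to_depth(
    doc_ids: List[str],
    depth: int = 100,
    sample_missing_docs: bool = True,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical helper: pad a candidate list up to `depth` entries."""
    rng = rng or random.Random(0)
    missing = depth - len(doc_ids)
    if missing <= 0 or not sample_missing_docs or not doc_ids:
        # Nothing to fill, sampling disabled, or nothing to sample from.
        return list(doc_ids)
    # Assumption: missing slots are filled by sampling, with replacement,
    # from the documents that are present.
    return list(doc_ids) + rng.choices(doc_ids, k=missing)


padded = fill_to_depth(["d1", "d2", "d3"], depth=5)
unpadded = fill_to_depth(["d1", "d2", "d3"], depth=5, sample_missing_docs=False)
```

With sampling enabled the list grows to depth entries; with it disabled the list is returned unchanged.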

Methods

__init__(*args[, depth, add_extra_token, ...]) – A SetEncoder model encodes a query and a set of documents jointly.

Attributes

model_type – Model type for a SetEncoder model.

backbone_model_type: str | None = None

Backbone model type for the configuration. Set by LightningIRModelClassFactory().

classmethod from_pretrained(pretrained_model_name_or_path: str | Path, *args, **kwargs) → LightningIRConfig

Loads the configuration from a pretrained model. Wraps the transformers.PretrainedConfig.from_pretrained method.

Parameters:

pretrained_model_name_or_path (str | Path) – Pretrained model name or path.

Returns:

Derived LightningIRConfig class.

Return type:

LightningIRConfig

Raises:

ValueError – If pretrained_model_name_or_path is not a Lightning IR model and no LightningIRConfig is passed.

get_tokenizer_kwargs(Tokenizer: Type[LightningIRTokenizer]) → Dict[str, Any]

Returns the keyword arguments for the tokenizer. This method is used to pass the configuration parameters to the tokenizer.

Parameters:

Tokenizer (Type[LightningIRTokenizer]) – Class of the tokenizer to be used.

Returns:

Keyword arguments for the tokenizer.

Return type:

Dict[str, Any]
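
One way such a method could work is to forward only those configuration attributes that the tokenizer's constructor accepts. The sketch below is an assumption about the general pattern, not the library's code: ToyTokenizer and ToyConfig are hypothetical stand-ins, and matching by constructor parameter name is an illustrative choice.

```python
import inspect
from typing import Any, Dict, Type


class ToyTokenizer:
    """Hypothetical tokenizer that accepts one config-derived option."""

    def __init__(self, add_extra_token: bool = False, max_length: int = 512):
        self.add_extra_token = add_extra_token
        self.max_length = max_length


class ToyConfig:
    """Hypothetical config carrying the documented SetEncoderConfig fields."""

    def __init__(
        self,
        depth: int = 100,
        add_extra_token: bool = False,
        sample_missing_docs: bool = True,
    ):
        self.depth = depth
        self.add_extra_token = add_extra_token
        self.sample_missing_docs = sample_missing_docs

    def get_tokenizer_kwargs(self, Tokenizer: Type[Any]) -> Dict[str, Any]:
        # Assumption: keep only attributes named in the tokenizer's __init__.
        params = inspect.signature(Tokenizer.__init__).parameters
        return {
            name: getattr(self, name)
            for name in params
            if name != "self" and hasattr(self, name)
        }


kwargs = ToyConfig(add_extra_token=True).get_tokenizer_kwargs(ToyTokenizer)
tokenizer = ToyTokenizer(**kwargs)
```

Here only add_extra_token is forwarded: depth and sample_missing_docs are not constructor parameters of the toy tokenizer, and max_length is not an attribute of the toy config, so it keeps its default.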

model_type: str = 'set-encoder'

Model type for a SetEncoder model.

to_dict() → Dict[str, Any]

Overrides the transformers.PretrainedConfig.to_dict method to include the added arguments and the backbone model type.

Returns:

Configuration dictionary.

Return type:

Dict[str, Any]
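
The override pattern described for to_dict can be sketched without the real dependencies. In this minimal illustration, BaseConfig is a hypothetical stand-in for transformers.PretrainedConfig, ToySetEncoderConfig is a hypothetical subclass, and the dictionary key names are assumptions rather than the library's guaranteed output.

```python
from typing import Any, Dict, Optional


class BaseConfig:
    """Hypothetical stand-in for transformers.PretrainedConfig."""

    def __init__(self, hidden_size: int = 768):
        self.hidden_size = hidden_size

    def to_dict(self) -> Dict[str, Any]:
        return {"hidden_size": self.hidden_size}


class ToySetEncoderConfig(BaseConfig):
    """Hypothetical subclass mirroring the documented arguments."""

    model_type: str = "set-encoder"
    backbone_model_type: Optional[str] = None

    def __init__(
        self,
        depth: int = 100,
        add_extra_token: bool = False,
        sample_missing_docs: bool = True,
        **kwargs: Any,
    ):
        super().__init__(**kwargs)
        self.depth = depth
        self.add_extra_token = add_extra_token
        self.sample_missing_docs = sample_missing_docs

    def to_dict(self) -> Dict[str, Any]:
        # Start from the parent's dictionary, then add the extra arguments
        # and the backbone model type (key names are assumptions).
        output = super().to_dict()
        output.update(
            depth=self.depth,
            add_extra_token=self.add_extra_token,
            sample_missing_docs=self.sample_missing_docs,
        )
        output["backbone_model_type"] = self.backbone_model_type
        return output


config = ToySetEncoderConfig(depth=50)
config.backbone_model_type = "bert"  # illustrative value
config_dict = config.to_dict()
```

Extending the parent's dictionary, rather than building a new one, keeps every field the base class already serializes.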