LightningIRTokenizerClassFactory
- class lightning_ir.base.class_factory.LightningIRTokenizerClassFactory(MixinConfig: Type[LightningIRConfig])
Bases: LightningIRClassFactory
Class factory for creating derived LightningIRTokenizer classes from HuggingFace tokenizer classes.
Methods
from_backbone_class(BackboneClass) – Creates a derived LightningIRTokenizer from a transformers.PreTrainedTokenizerBase backbone tokenizer.
from_backbone_classes(BackboneClasses[, ...]) – Creates derived slow and fast LightningIRTokenizers from a tuple of backbone HuggingFace tokenizer classes.
from_pretrained(model_name_or_path, *args[, ...]) – Loads a derived LightningIRTokenizer from a pretrained HuggingFace tokenizer.
get_backbone_config(model_name_or_path) – Grabs the tokenizer configuration class from a checkpoint of a pretrained HuggingFace tokenizer.
get_backbone_model_type(model_name_or_path, ...) – Grabs the model type from a checkpoint of a pretrained HuggingFace tokenizer.
- from_backbone_class(BackboneClass: Type[PreTrainedTokenizerBase]) → Type[LightningIRTokenizer]
Creates a derived LightningIRTokenizer from a transformers.PreTrainedTokenizerBase backbone tokenizer. If the backbone tokenizer is already a LightningIRTokenizer, it is returned as is.
- Parameters:
BackboneClass (Type[PreTrainedTokenizerBase]) – Backbone tokenizer class.
- Returns:
Derived LightningIRTokenizer.
- Return type:
Type[LightningIRTokenizer]
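The pass-through rule described above (an already derived class is returned as is) can be sketched in plain Python. This is a minimal illustration, not Lightning IR's actual implementation: `BaseTokenizer`, `DerivedTokenizerMixin`, `WordTokenizer`, and `derive_tokenizer_class` are hypothetical stand-ins for `PreTrainedTokenizerBase`, `LightningIRTokenizer`, a backbone tokenizer class, and `from_backbone_class`, and the `Derived...` naming scheme and the `[QUE]` marker are assumptions.

```python
from typing import Type


class BaseTokenizer:
    """Hypothetical stand-in for transformers.PreTrainedTokenizerBase."""

    def tokenize(self, text: str) -> list:
        return text.split()


class DerivedTokenizerMixin:
    """Hypothetical stand-in for LightningIRTokenizer."""

    def tokenize_query(self, text: str) -> list:
        # Prepend a marker token so queries can be told apart from documents.
        return ["[QUE]"] + self.tokenize(text)


class WordTokenizer(BaseTokenizer):
    """Hypothetical backbone tokenizer class."""


def derive_tokenizer_class(backbone_class: Type[BaseTokenizer]) -> type:
    # A class that is already derived is returned unchanged.
    if issubclass(backbone_class, DerivedTokenizerMixin):
        return backbone_class
    # Otherwise build a new class that combines the mixin with the backbone.
    name = f"Derived{backbone_class.__name__}"
    return type(name, (DerivedTokenizerMixin, backbone_class), {})


Derived = derive_tokenizer_class(WordTokenizer)
```

Calling `derive_tokenizer_class(Derived)` a second time returns the same class object, so repeated derivation does not stack mixins.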
- from_backbone_classes(BackboneClasses: Tuple[Type[PreTrainedTokenizerBase] | None, Type[PreTrainedTokenizerBase] | None], BackboneConfig: Type[PretrainedConfig] | None = None) → Tuple[Type[LightningIRTokenizer] | None, Type[LightningIRTokenizer] | None]
Creates derived slow and fast LightningIRTokenizers from a tuple of backbone HuggingFace tokenizer classes.
- Parameters:
BackboneClasses (Tuple[Type[PreTrainedTokenizerBase] | None, Type[PreTrainedTokenizerBase] | None]) – Slow and fast backbone tokenizer classes.
BackboneConfig (Type[PretrainedConfig] | None, optional) – Backbone configuration class. Defaults to None.
- Returns:
Slow and fast derived LightningIRTokenizers.
- Return type:
Tuple[Type[LightningIRTokenizer] | None, Type[LightningIRTokenizer] | None]
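The signature suggests that either entry of the tuple may be None and that the result mirrors the input. The sketch below illustrates one way such a pairwise derivation could work, assuming that a missing entry simply stays None. `DerivedMixin`, `SlowTokenizer`, `derive`, and `derive_pair` are hypothetical names, and the `BackboneConfig` argument is left out.

```python
from typing import Optional, Tuple


class DerivedMixin:
    """Hypothetical stand-in for LightningIRTokenizer."""


class SlowTokenizer:
    """Hypothetical slow backbone tokenizer class."""


def derive(backbone_class: type) -> type:
    # Hypothetical single-class derivation: combine the mixin with the backbone.
    name = f"Derived{backbone_class.__name__}"
    return type(name, (DerivedMixin, backbone_class), {})


def derive_pair(
    backbone_classes: Tuple[Optional[type], Optional[type]]
) -> Tuple[Optional[type], Optional[type]]:
    slow, fast = backbone_classes
    # Assumption: a missing (None) entry stays None; only existing classes are derived.
    return (
        derive(slow) if slow is not None else None,
        derive(fast) if fast is not None else None,
    )


# Only a slow backbone exists here, so the fast slot remains None.
derived_slow, derived_fast = derive_pair((SlowTokenizer, None))
```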
- from_pretrained(model_name_or_path: str | Path, *args, use_fast: bool = True, **kwargs) → Type[LightningIRTokenizer]
Loads a derived LightningIRTokenizer from a pretrained HuggingFace tokenizer.
- Parameters:
model_name_or_path (str | Path) – Path to the tokenizer or its name.
use_fast (bool, optional) – Whether to use the fast tokenizer. Defaults to True.
- Returns:
Derived LightningIRTokenizer.
- Return type:
Type[LightningIRTokenizer]
- Raises:
ValueError – If no fast tokenizer is found when use_fast is True.
ValueError – If no slow tokenizer is found when use_fast is False.
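The two error conditions above can be illustrated with a small selection helper. This is a sketch of the documented behaviour only, not the library's code: `select_tokenizer_class` and `SlowOnly` are hypothetical, the error messages are invented, and loading from a checkpoint is omitted entirely.

```python
from typing import Optional, Tuple


def select_tokenizer_class(
    classes: Tuple[Optional[type], Optional[type]], use_fast: bool = True
) -> type:
    # `classes` holds the (slow, fast) tokenizer classes; either may be None.
    slow, fast = classes
    if use_fast:
        if fast is None:
            raise ValueError("No fast tokenizer found.")
        return fast
    if slow is None:
        raise ValueError("No slow tokenizer found.")
    return slow


class SlowOnly:
    """Hypothetical slow tokenizer class with no fast counterpart."""


# With only a slow tokenizer available, use_fast must be disabled explicitly.
chosen = select_tokenizer_class((SlowOnly, None), use_fast=False)
```

Because `use_fast` defaults to True, a backbone that ships only a slow tokenizer raises unless the caller passes `use_fast=False`.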
- static get_backbone_config(model_name_or_path: str | Path) → PretrainedConfig
Grabs the tokenizer configuration class from a checkpoint of a pretrained HuggingFace tokenizer.
- Parameters:
model_name_or_path (str | Path) – Path to the tokenizer or its name.
- Returns:
Configuration class of the backbone tokenizer.
- Return type:
PretrainedConfig
- static get_backbone_model_type(model_name_or_path: str | Path, *args, **kwargs) → str
Grabs the model type from a checkpoint of a pretrained HuggingFace tokenizer.
- Parameters:
model_name_or_path (str | Path) – Path to the tokenizer or its name.
- Returns:
Model type of the backbone tokenizer.
- Return type:
str