LightningIRTokenizerClassFactory
- class lightning_ir.base.class_factory.LightningIRTokenizerClassFactory(MixinConfig: type[LightningIRConfig])
Bases: LightningIRClassFactory
Class factory for creating derived LightningIRTokenizer classes from HuggingFace tokenizer classes.
Methods
from_backbone_class(BackboneClass) – Creates a derived LightningIRTokenizer from a transformers.PreTrainedTokenizerBase backbone tokenizer.
from_backbone_classes(BackboneClasses[, ...]) – Creates derived slow and fast LightningIRTokenizers from a tuple of backbone HuggingFace tokenizer classes.
from_pretrained(model_name_or_path, *args[, ...]) – Loads a derived LightningIRTokenizer from a pretrained HuggingFace tokenizer.
get_backbone_config(model_name_or_path) – Grabs the tokenizer configuration class from a checkpoint of a pretrained HuggingFace tokenizer.
get_backbone_model_type(model_name_or_path, ...) – Grabs the model type from a checkpoint of a pretrained HuggingFace tokenizer.
- from_backbone_class(BackboneClass: type[PreTrainedTokenizerBase]) → type[LightningIRTokenizer]
Creates a derived LightningIRTokenizer from a transformers.PreTrainedTokenizerBase backbone tokenizer. If the backbone tokenizer is already a LightningIRTokenizer, it is returned as is.
- Parameters:
BackboneClass (type[PreTrainedTokenizerBase]) – Backbone tokenizer class.
- Returns:
Derived LightningIRTokenizer.
- Return type:
type[LightningIRTokenizer]
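The derivation described for from_backbone_class can be pictured in plain Python. This is a minimal sketch of the general class-factory pattern, not Lightning IR's actual implementation: BackboneTokenizer, IRTokenizerMixin, derive_tokenizer_class, and the "[QUE]" marker are hypothetical stand-ins chosen for illustration.

```python
class BackboneTokenizer:
    """Hypothetical stand-in for a HuggingFace tokenizer class."""

    def tokenize(self, text: str) -> list[str]:
        return text.lower().split()


class IRTokenizerMixin:
    """Hypothetical stand-in for the behaviour a derived tokenizer adds."""

    def tokenize_query(self, query: str) -> list[str]:
        # self.tokenize resolves to the backbone's method through the MRO.
        return ["[QUE]"] + self.tokenize(query)


def derive_tokenizer_class(backbone_class: type) -> type:
    # A class that is already derived is returned unchanged.
    if issubclass(backbone_class, IRTokenizerMixin):
        return backbone_class
    # Otherwise build a new class with the mixin placed before the backbone.
    return type(
        f"IR{backbone_class.__name__}",
        (IRTokenizerMixin, backbone_class),
        {},
    )


DerivedTokenizer = derive_tokenizer_class(BackboneTokenizer)
```

Calling derive_tokenizer_class on DerivedTokenizer again returns the same class object, which mirrors the "returned as is" behaviour documented above.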
- from_backbone_classes(BackboneClasses: tuple[type[PreTrainedTokenizerBase] | None, type[PreTrainedTokenizerBase] | None], BackboneConfig: type[PretrainedConfig] | None = None) → tuple[type[LightningIRTokenizer] | None, type[LightningIRTokenizer] | None]
Creates derived slow and fast LightningIRTokenizers from a tuple of backbone HuggingFace tokenizer classes.
- Parameters:
BackboneClasses (tuple[type[PreTrainedTokenizerBase] | None, type[PreTrainedTokenizerBase] | None]) – Slow and fast backbone tokenizer classes.
BackboneConfig (type[PretrainedConfig] | None, optional) – Backbone configuration class. Defaults to None.
- Returns:
Slow and fast derived LightningIRTokenizers.
- Return type:
tuple[type[LightningIRTokenizer] | None, type[LightningIRTokenizer] | None]
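The signature allows either entry of the tuple to be None. The sketch below shows one plausible reading, assuming a missing slow or fast variant is passed through as None rather than derived. SlowBackbone, FastBackbone, derive, and derive_pair are hypothetical names, not part of Lightning IR.

```python
from typing import Optional


class SlowBackbone:
    """Hypothetical stand-in for a slow (pure-Python) tokenizer class."""


class FastBackbone:
    """Hypothetical stand-in for a fast (Rust-backed) tokenizer class."""


def derive(backbone_class: type) -> type:
    # Hypothetical derivation step: subclass the backbone under a new name.
    return type(f"Derived{backbone_class.__name__}", (backbone_class,), {})


def derive_pair(
    backbone_classes: tuple[Optional[type], Optional[type]],
) -> tuple[Optional[type], Optional[type]]:
    slow, fast = backbone_classes
    # Assumption: a missing variant stays None instead of being derived.
    return (
        derive(slow) if slow is not None else None,
        derive(fast) if fast is not None else None,
    )


both = derive_pair((SlowBackbone, FastBackbone))
fast_only = derive_pair((None, FastBackbone))
```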
- from_pretrained(model_name_or_path: str | Path, *args, use_fast: bool = True, **kwargs) → type[LightningIRTokenizer]
Loads a derived LightningIRTokenizer from a pretrained HuggingFace tokenizer.
- Parameters:
model_name_or_path (str | Path) – Path to the tokenizer or its name.
use_fast (bool, optional) – Whether to use the fast tokenizer. Defaults to True.
- Returns:
Derived LightningIRTokenizer.
- Return type:
type[LightningIRTokenizer]
- Raises:
ValueError – If no fast tokenizer is found when use_fast is True.
ValueError – If no slow tokenizer is found when use_fast is False.
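The two ValueError cases amount to a choice between the slow and fast variant. The sketch below illustrates that selection logic in isolation. It is an assumption about the control flow, not the library's code, and select_tokenizer_class, SlowTokenizer, FastTokenizer, and the error messages are hypothetical.

```python
from typing import Optional


class SlowTokenizer:
    """Hypothetical stand-in for a derived slow tokenizer class."""


class FastTokenizer:
    """Hypothetical stand-in for a derived fast tokenizer class."""


def select_tokenizer_class(
    slow: Optional[type], fast: Optional[type], use_fast: bool = True
) -> type:
    if use_fast:
        # The fast variant was requested but does not exist.
        if fast is None:
            raise ValueError("No fast tokenizer found.")
        return fast
    # The slow variant was requested but does not exist.
    if slow is None:
        raise ValueError("No slow tokenizer found.")
    return slow


# Requesting a fast tokenizer when only a slow one exists raises.
try:
    select_tokenizer_class(SlowTokenizer, None, use_fast=True)
except ValueError as error:
    message = str(error)
```

Under this reading there is no silent fallback: a caller who needs the slow variant for a checkpoint that ships only a slow tokenizer passes use_fast=False explicitly.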
- static get_backbone_config(model_name_or_path: str | Path) → PretrainedConfig
Grabs the tokenizer configuration class from a checkpoint of a pretrained HuggingFace tokenizer.
- Parameters:
model_name_or_path (str | Path) – Path to the tokenizer or its name.
- Returns:
Configuration class of the backbone tokenizer.
- Return type:
PretrainedConfig
- static get_backbone_model_type(model_name_or_path: str | Path, *args, **kwargs) → str
Grabs the model type from a checkpoint of a pretrained HuggingFace tokenizer.
- Parameters:
model_name_or_path (str | Path) – Path to the tokenizer or its name.
- Returns:
Model type of the backbone tokenizer.
- Return type:
str
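HuggingFace checkpoints record the model type under the model_type key of config.json. As a rough illustration of what "grabbing the model type" involves, the sketch below reads that key from a local checkpoint directory. It is not how Lightning IR implements the lookup: read_model_type is a hypothetical helper, it handles only local directories and not hub model names, and the temporary checkpoint is fabricated for the demonstration.

```python
import json
import tempfile
from pathlib import Path


def read_model_type(checkpoint_dir: Path) -> str:
    # Hypothetical helper: parse config.json and return its "model_type" field.
    config = json.loads((checkpoint_dir / "config.json").read_text())
    return config["model_type"]


# Build a throwaway checkpoint directory that holds only a config.json.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp)
    (checkpoint / "config.json").write_text(json.dumps({"model_type": "bert"}))
    model_type = read_model_type(checkpoint)
```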