LightningIRCLI
- class lightning_ir.main.LightningIRCLI(model_class: type[~lightning.pytorch.core.module.LightningModule] | ~typing.Callable[[...], ~lightning.pytorch.core.module.LightningModule] | None = None, datamodule_class: type[~lightning.pytorch.core.datamodule.LightningDataModule] | ~typing.Callable[[...], ~lightning.pytorch.core.datamodule.LightningDataModule] | None = None, save_config_callback: type[~lightning.pytorch.cli.SaveConfigCallback] | None = <class 'lightning.pytorch.cli.SaveConfigCallback'>, save_config_kwargs: dict[str, ~typing.Any] | None = None, trainer_class: type[~lightning.pytorch.trainer.trainer.Trainer] | ~typing.Callable[[...], ~lightning.pytorch.trainer.trainer.Trainer] = <class 'lightning.pytorch.trainer.trainer.Trainer'>, trainer_defaults: dict[str, ~typing.Any] | None = None, seed_everything_default: bool | int = True, parser_kwargs: dict[str, ~typing.Any] | dict[str, dict[str, ~typing.Any]] | None = None, parser_class: type[~lightning.pytorch.cli.LightningArgumentParser] = <class 'lightning.pytorch.cli.LightningArgumentParser'>, subclass_mode_model: bool = False, subclass_mode_data: bool = False, args: list[str] | dict[str, ~typing.Any] | ~jsonargparse._namespace.Namespace | None = None, run: bool = True, auto_configure_optimizers: bool = True, load_from_checkpoint_support: bool = True)[source]
Bases: LightningCLI

Lightning IR Command Line Interface that extends PyTorch LightningCLI for information retrieval tasks.
This CLI provides a unified command-line interface for fine-tuning neural ranking models and running information retrieval experiments. It extends the PyTorch LightningCLI with IR-specific subcommands and automatic configuration linking between model, data, and trainer settings.
Examples
Command line usage:
# Fine-tune a model
lightning-ir fit --config fine-tune.yaml

# Index documents
lightning-ir index --config index.yaml

# Search for documents
lightning-ir search --config search.yaml

# Re-rank documents
lightning-ir re_rank --config re-rank.yaml

# Generate default configuration
lightning-ir fit --print_config > config.yaml

Programmatic usage:
from lightning_ir.main import LightningIRCLI, LightningIRTrainer, LightningIRSaveConfigCallback

# Create CLI instance
cli = LightningIRCLI(
    trainer_class=LightningIRTrainer,
    save_config_callback=LightningIRSaveConfigCallback,
    save_config_kwargs={"config_filename": "pl_config.yaml", "overwrite": True},
)

YAML configuration example:
model:
  class_path: lightning_ir.BiEncoderModule
  init_args:
    model_name_or_path: bert-base-uncased
    loss_functions:
      - class_path: lightning_ir.InBatchCrossEntropy
data:
  class_path: lightning_ir.LightningIRDataModule
  init_args:
    train_dataset:
      class_path: lightning_ir.TupleDataset
      init_args:
        dataset_id: msmarco-passage/train/triples-small
    train_batch_size: 32
trainer:
  max_steps: 100000
  precision: "16-mixed"
optimizer:
  class_path: torch.optim.AdamW
  init_args:
    lr: 5e-5

Note
- Automatically links model and data configurations (model_name_or_path, config)
- Links trainer max_steps to the learning rate scheduler's num_training_steps
- Supports all PyTorch Lightning CLI features, including class-path instantiation
- Built-in support for warmup learning rate schedulers
- Saves configuration files automatically during training
Methods
add_arguments_to_parser(parser): Add Lightning IR specific arguments and links to the CLI parser.
configure_optimizers(lightning_module, optimizer): Configure optimizers and learning rate schedulers for Lightning training.
subcommands(): Defines the list of available subcommands and the arguments to skip.
- add_arguments_to_parser(parser)[source]
Add Lightning IR specific arguments and links to the CLI parser.
This method extends the base Lightning CLI parser with IR-specific learning rate schedulers and automatically links related configuration arguments to ensure consistency between model, data, and trainer configurations.
- Parameters:
parser – The CLI argument parser to extend.
Note
Automatic argument linking:

- model.init_args.model_name_or_path -> data.init_args.model_name_or_path
- model.init_args.config -> data.init_args.config
- trainer.max_steps -> lr_scheduler.init_args.num_training_steps
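A minimal sketch of how such links can be expressed with the parser's link_arguments method; the subclass name is hypothetical and the actual calls inside LightningIRCLI may differ:

from lightning.pytorch.cli import LightningArgumentParser, LightningCLI


class LinkedCLI(LightningCLI):  # hypothetical subclass, for illustration only
    def add_arguments_to_parser(self, parser: LightningArgumentParser) -> None:
        # Keep the model and datamodule pointed at the same checkpoint and config.
        parser.link_arguments(
            "model.init_args.model_name_or_path", "data.init_args.model_name_or_path"
        )
        parser.link_arguments("model.init_args.config", "data.init_args.config")
        # Derive the scheduler's training horizon from the trainer's step budget.
        parser.link_arguments(
            "trainer.max_steps", "lr_scheduler.init_args.num_training_steps"
        )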
- static configure_optimizers(lightning_module: LightningModule, optimizer: Optimizer, lr_scheduler: WarmupLRScheduler | None = None) Any[source]
Configure optimizers and learning rate schedulers for Lightning training.
This method automatically configures the optimizer and learning rate scheduler combination for Lightning training. It handles warmup learning rate schedulers by setting the appropriate interval and returning the correct format expected by Lightning.
- Parameters:
lightning_module (LightningModule) – The Lightning module being trained.
optimizer (torch.optim.Optimizer) – The optimizer instance to use for training.
lr_scheduler (WarmupLRScheduler | None) – Optional warmup learning rate scheduler. If None, only the optimizer is returned.
- Returns:
Either the optimizer alone (if no scheduler is given) or a tuple of optimizer and scheduler lists in Lightning's expected format.
- Return type:
Any
Note
- Warmup schedulers automatically set the correct interval based on the scheduler type
- Returns a format compatible with Lightning's configure_optimizers method
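A hedged sketch of the return shapes described above; the "interval" value and dict keys reflect common Lightning usage and are assumptions, not the verbatim lightning_ir implementation:

from torch.optim import Optimizer


def configure_optimizers_sketch(optimizer: Optimizer, lr_scheduler=None):
    if lr_scheduler is None:
        # Without a scheduler, a bare optimizer is a valid return value.
        return optimizer
    # Two-lists format accepted by Lightning's configure_optimizers:
    # optimizers in the first list, scheduler configs in the second.
    return [optimizer], [{"scheduler": lr_scheduler, "interval": "step"}]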
- static subcommands() Dict[str, Set[str]][source]
Defines the list of available subcommands and the arguments to skip.
Returns a dictionary mapping subcommand names to the set of configuration sections they require. This extends the base Lightning CLI with IR-specific subcommands for indexing, searching, and re-ranking operations.
- Returns:
Dictionary mapping subcommand names to required config sections:

- fit: standard Lightning training subcommand with all sections
- index: document indexing, requiring model, dataloaders, and datamodule
- search: document search, requiring model, dataloaders, and datamodule
- re_rank: document re-ranking, requiring model, dataloaders, and datamodule
- Return type:
Dict[str, Set[str]]
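As a rough illustration of that mapping, assuming the fit entry mirrors the base LightningCLI and the IR subcommands share the sections listed above (the exact sets are assumptions):

from typing import Dict, Set


def subcommands_sketch() -> Dict[str, Set[str]]:
    ir_sections = {"model", "dataloaders", "datamodule"}
    return {
        "fit": {"model", "train_dataloaders", "val_dataloaders", "datamodule"},
        "index": set(ir_sections),
        "search": set(ir_sections),
        "re_rank": set(ir_sections),
    }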