LightningIRCLI

class lightning_ir.main.LightningIRCLI(model_class: type[LightningModule] | Callable[..., LightningModule] | None = None, datamodule_class: type[LightningDataModule] | Callable[..., LightningDataModule] | None = None, save_config_callback: type[SaveConfigCallback] | None = SaveConfigCallback, save_config_kwargs: dict[str, Any] | None = None, trainer_class: type[Trainer] | Callable[..., Trainer] = Trainer, trainer_defaults: dict[str, Any] | None = None, seed_everything_default: bool | int = True, parser_kwargs: dict[str, Any] | dict[str, dict[str, Any]] | None = None, parser_class: type[LightningArgumentParser] = LightningArgumentParser, subclass_mode_model: bool = False, subclass_mode_data: bool = False, args: list[str] | dict[str, Any] | Namespace | None = None, run: bool = True, auto_configure_optimizers: bool = True, load_from_checkpoint_support: bool = True)

Bases: LightningCLI

Lightning IR Command Line Interface that extends PyTorch LightningCLI for information retrieval tasks.

This CLI provides a unified command-line interface for fine-tuning neural ranking models and running information retrieval experiments. It extends the PyTorch LightningCLI with IR-specific subcommands and automatic configuration management for seamless integration between models, data, and training.

Examples

Command line usage:

# Fine-tune a model
lightning-ir fit --config fine-tune.yaml

# Index documents
lightning-ir index --config index.yaml

# Search for documents
lightning-ir search --config search.yaml

# Re-rank documents
lightning-ir re_rank --config re-rank.yaml

# Generate default configuration
lightning-ir fit --print_config > config.yaml

Programmatic usage:

from lightning_ir.main import LightningIRCLI, LightningIRTrainer, LightningIRSaveConfigCallback

# Create CLI instance
cli = LightningIRCLI(
    trainer_class=LightningIRTrainer,
    save_config_callback=LightningIRSaveConfigCallback,
    save_config_kwargs={"config_filename": "pl_config.yaml", "overwrite": True}
)
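
Because run defaults to True, constructing the CLI parses the command line and immediately dispatches the selected subcommand. To drive it programmatically instead of through sys.argv, the args parameter from the signature above can be supplied directly. A minimal sketch, reusing the imports above (the config file name is illustrative):

# Dispatch the fit subcommand without reading sys.argv.
cli = LightningIRCLI(
    trainer_class=LightningIRTrainer,
    save_config_callback=LightningIRSaveConfigCallback,
    args=["fit", "--config", "fine-tune.yaml"],
)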

YAML configuration example:

model:
  class_path: lightning_ir.BiEncoderModule
  init_args:
    model_name_or_path: bert-base-uncased
    loss_functions:
      - class_path: lightning_ir.InBatchCrossEntropy

data:
  class_path: lightning_ir.LightningIRDataModule
  init_args:
    train_dataset:
      class_path: lightning_ir.TupleDataset
      init_args:
        dataset_id: msmarco-passage/train/triples-small
    train_batch_size: 32

trainer:
  max_steps: 100000
  precision: "16-mixed"

optimizer:
  class_path: torch.optim.AdamW
  init_args:
    lr: 5e-5

Note

  • Automatically links model and data configurations (model_name_or_path, config)

  • Links trainer max_steps to learning rate scheduler num_training_steps

  • Supports all PyTorch Lightning CLI features including class path instantiation

  • Built-in support for warmup learning rate schedulers

  • Saves configuration files automatically during training

Methods

add_arguments_to_parser(parser)

Add Lightning IR specific arguments and links to the CLI parser.

configure_optimizers(lightning_module, optimizer)

Configure optimizers and learning rate schedulers for Lightning training.

subcommands()

Defines the list of available subcommands and the arguments to skip.

add_arguments_to_parser(parser)

Add Lightning IR specific arguments and links to the CLI parser.

This method extends the base Lightning CLI parser with IR-specific learning rate schedulers and automatically links related configuration arguments to ensure consistency between model, data, and trainer configurations.

Parameters:

parser – The CLI argument parser to extend.

Note

Automatic argument linking:

  • model.init_args.model_name_or_path -> data.init_args.model_name_or_path

  • model.init_args.config -> data.init_args.config

  • trainer.max_steps -> lr_scheduler.init_args.num_training_steps
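
For orientation, the links above correspond to jsonargparse-style parser.link_arguments calls along these lines (a minimal sketch, not the verbatim implementation; the argument keys are taken from the note above):

from lightning.pytorch.cli import LightningArgumentParser

def add_arguments_to_parser(self, parser: LightningArgumentParser) -> None:
    # Keep the data module pointed at the same model checkpoint and config.
    parser.link_arguments("model.init_args.model_name_or_path",
                          "data.init_args.model_name_or_path")
    parser.link_arguments("model.init_args.config", "data.init_args.config")
    # Give warmup schedulers the total number of training steps.
    parser.link_arguments("trainer.max_steps",
                          "lr_scheduler.init_args.num_training_steps")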

static configure_optimizers(lightning_module: LightningModule, optimizer: Optimizer, lr_scheduler: WarmupLRScheduler | None = None) -> Any

Configure optimizers and learning rate schedulers for Lightning training.

This method automatically configures the optimizer and learning rate scheduler combination for Lightning training. It handles warmup learning rate schedulers by setting the appropriate interval and returning the correct format expected by Lightning.

Parameters:
  • lightning_module (LightningModule) – The Lightning module being trained.

  • optimizer (torch.optim.Optimizer) – The optimizer instance to use for training.

  • lr_scheduler (WarmupLRScheduler | None) – Optional warmup learning rate scheduler. If None, only the optimizer is returned.

Returns:

Either the optimizer alone (if no scheduler is given) or a tuple of an optimizer list and a scheduler list, in Lightning's expected format.

Return type:

Any

Note

  • The scheduler interval is set automatically based on the warmup scheduler type

  • The return value follows the format expected by Lightning's configure_optimizers hook
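
As a sketch of this return convention (the "step" interval is an assumption about how warmup schedulers are stepped, not something this page specifies):

def configure_optimizers(lightning_module, optimizer, lr_scheduler=None):
    # Sketch only: return shapes follow Lightning's configure_optimizers contract.
    if lr_scheduler is None:
        return optimizer
    scheduler_config = {
        "scheduler": lr_scheduler,
        "interval": "step",  # assumption: warmup schedulers step per batch
    }
    return [optimizer], [scheduler_config]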

static subcommands() -> Dict[str, Set[str]]

Defines the list of available subcommands and the arguments to skip.

Returns a dictionary mapping subcommand names to the set of configuration sections they require. This extends the base Lightning CLI with IR-specific subcommands for indexing, searching, and re-ranking operations.

Returns:

Dictionary mapping subcommand names to required config sections:

  • fit: Standard Lightning training subcommand with all sections

  • index: Document indexing requiring model, dataloaders, and datamodule

  • search: Document search requiring model, dataloaders, and datamodule

  • re_rank: Document re-ranking requiring model, dataloaders, and datamodule

Return type:

Dict[str, Set[str]]
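
An illustrative shape of the returned mapping, assuming the section names of the base Lightning CLI (the exact keys are not confirmed by this page):

from typing import Dict, Set

def subcommands() -> Dict[str, Set[str]]:
    # Sketch only: "fit" keeps the standard Lightning sections; the IR
    # subcommands each require model, dataloaders, and datamodule.
    return {
        "fit": {"model", "train_dataloaders", "val_dataloaders", "datamodule"},
        "index": {"model", "dataloaders", "datamodule"},
        "search": {"model", "dataloaders", "datamodule"},
        "re_rank": {"model", "dataloaders", "datamodule"},
    }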