LightningIRDataModule
- class lightning_ir.data.datamodule.LightningIRDataModule(train_dataset: RunDataset | TupleDataset | None = None, train_batch_size: int | None = None, shuffle_train: bool = True, inference_datasets: Sequence[RunDataset | TupleDataset | QueryDataset | DocDataset] | None = None, inference_batch_size: int | None = None, num_workers: int = 0)
Bases: LightningDataModule
- __init__(train_dataset: RunDataset | TupleDataset | None = None, train_batch_size: int | None = None, shuffle_train: bool = True, inference_datasets: Sequence[RunDataset | TupleDataset | QueryDataset | DocDataset] | None = None, inference_batch_size: int | None = None, num_workers: int = 0) → None
Initializes a new Lightning IR DataModule.
- Parameters:
train_dataset (RunDataset | TupleDataset | None) – A training dataset. Defaults to None.
train_batch_size (int | None) – Batch size to use for training. Defaults to None.
shuffle_train (bool) – Whether to shuffle the training data. Defaults to True.
inference_datasets (Sequence[RunDataset | TupleDataset | QueryDataset | DocDataset] | None) – List of datasets to use for inference (indexing, searching, and re-ranking). Defaults to None.
inference_batch_size (int | None) – Batch size to use for inference. Defaults to None.
num_workers (int) – Number of workers for loading data in parallel. Defaults to 0.
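The constructor arguments and their documented defaults can be summarized with a stdlib-only stand-in. `DataModuleArgs` is a hypothetical name, not part of lightning_ir, and the string placeholders stand in for real RunDataset, TupleDataset, QueryDataset, or DocDataset objects. With the real class, the same keyword arguments would be passed to LightningIRDataModule directly.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class DataModuleArgs:
    """Hypothetical stand-in mirroring LightningIRDataModule's constructor
    arguments and their documented defaults. Dataset types are left as Any
    because the real lightning_ir dataset classes are not imported here."""

    train_dataset: Optional[Any] = None                 # RunDataset | TupleDataset | None
    train_batch_size: Optional[int] = None
    shuffle_train: bool = True
    inference_datasets: Optional[Sequence[Any]] = None  # datasets for indexing, searching, re-ranking
    inference_batch_size: Optional[int] = None
    num_workers: int = 0


# Inference-only configuration: no training dataset, two placeholder inference datasets.
args = DataModuleArgs(inference_datasets=["queries", "docs"], inference_batch_size=64)
print(args.train_dataset, args.shuffle_train, args.num_workers)  # prints: None True 0
```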
Methods
__init__([train_dataset, train_batch_size, ...]) – Initializes a new Lightning IR DataModule.
inference_dataloader() – Returns a list of dataloaders for inference (validation, testing, or predicting).
predict_dataloader() – Returns a list of dataloaders for predicting.
prepare_data() – Downloads the data using ir_datasets if needed.
setup(stage) – Sets up the data module for a given stage.
test_dataloader() – Returns a list of dataloaders for testing.
train_dataloader() – Returns a dataloader for training.
val_dataloader() – Returns a list of dataloaders for validation.
- inference_dataloader() → List[DataLoader]
Returns a list of dataloaders for inference (validation, testing, or predicting).
- Returns:
Dataloaders for inference.
- Return type:
List[DataLoader]
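A minimal sketch of the behavior described above, assuming one loader per inference dataset and no shuffling during inference. Plain Python lists stand in for torch DataLoader objects, and `batches` and `inference_loaders` are hypothetical helpers, not lightning_ir functions.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive, unshuffled batches; the last batch may be smaller."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def inference_loaders(datasets: Sequence[Sequence[T]], batch_size: int) -> List[List[List[T]]]:
    """Hypothetical analogue of inference_dataloader(): one 'loader'
    (here, a list of batches) per inference dataset, keeping dataset
    order and item order unchanged."""
    return [list(batches(dataset, batch_size)) for dataset in datasets]


loaders = inference_loaders([["q1", "q2", "q3"], ["d1", "d2"]], batch_size=2)
print(loaders)  # prints: [[['q1', 'q2'], ['q3']], [['d1', 'd2']]]
```

Keeping one loader per dataset lets each dataset be evaluated and reported separately rather than being mixed into a single stream.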
- predict_dataloader() → Any
Returns a list of dataloaders for predicting.
- Returns:
Dataloaders for predicting.
- Return type:
List[DataLoader]
- prepare_data() → None
Downloads the data using ir_datasets if needed.
- setup(stage: Literal['fit', 'validate', 'test']) → None
Sets up the data module for a given stage.
- Parameters:
stage (Literal["fit", "validate", "test"]) – Stage to set up the data module for.
- Raises:
ValueError – If the stage is fit and no training dataset is provided.
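The documented check in setup can be illustrated with a small stand-alone function. `check_setup` and its error message are hypothetical; the sketch covers only the condition stated under Raises (stage "fit" without a training dataset) and makes no claim about any other checks the real method may perform.

```python
from typing import Any, Literal, Optional

Stage = Literal["fit", "validate", "test"]


def check_setup(stage: Stage, train_dataset: Optional[Any]) -> None:
    """Hypothetical re-creation of the documented check in setup():
    fitting requires a training dataset; the other stages do not."""
    if stage == "fit" and train_dataset is None:
        raise ValueError("A training dataset is required for stage 'fit'.")


check_setup("validate", train_dataset=None)  # inference-only stages pass

try:
    check_setup("fit", train_dataset=None)
except ValueError as err:
    print(f"ValueError: {err}")  # prints: ValueError: A training dataset is required for stage 'fit'.
```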
- test_dataloader() → List[DataLoader]
Returns a list of dataloaders for testing.
- Returns:
Dataloaders for testing.
- Return type:
List[DataLoader]
- train_dataloader() → DataLoader
Returns a dataloader for training.
- Returns:
Dataloader for training.
- Return type:
DataLoader
- Raises:
ValueError – If no training dataset is found.
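A rough stdlib analogue of train_dataloader, assuming shuffling is controlled by shuffle_train and batches are formed from train_batch_size. `train_batches` and its `seed` parameter are hypothetical; the real method returns a torch DataLoader rather than a list of batches.

```python
import random
from typing import Any, List, Optional, Sequence


def train_batches(
    train_dataset: Optional[Sequence[Any]],
    batch_size: int,
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> List[List[Any]]:
    """Hypothetical analogue of train_dataloader(): raise if there is no
    training dataset, otherwise optionally shuffle and split into batches."""
    if train_dataset is None:
        raise ValueError("No training dataset found.")
    order = list(range(len(train_dataset)))
    if shuffle:
        # A local, seeded generator keeps the order reproducible
        # without touching the global random state.
        random.Random(seed).shuffle(order)
    items = [train_dataset[i] for i in order]
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


print(train_batches(["a", "b", "c", "d", "e"], batch_size=2, shuffle=False))
# prints: [['a', 'b'], ['c', 'd'], ['e']]
```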
- val_dataloader() → List[DataLoader]
Returns a list of dataloaders for validation.
- Returns:
Dataloaders for validation.
- Return type:
List[DataLoader]