LightningIRDataModule

class lightning_ir.data.datamodule.LightningIRDataModule(train_dataset: RunDataset | TupleDataset | None = None, train_batch_size: int | None = None, shuffle_train: bool = True, inference_datasets: Sequence[RunDataset | TupleDataset | QueryDataset | DocDataset] | None = None, inference_batch_size: int | None = None, num_workers: int = 0)

Bases: LightningDataModule

__init__(train_dataset: RunDataset | TupleDataset | None = None, train_batch_size: int | None = None, shuffle_train: bool = True, inference_datasets: Sequence[RunDataset | TupleDataset | QueryDataset | DocDataset] | None = None, inference_batch_size: int | None = None, num_workers: int = 0) → None

Initializes a new Lightning IR DataModule.

Parameters:
  • train_dataset (RunDataset | TupleDataset | None) – A training dataset. Defaults to None.

  • train_batch_size (int | None) – Batch size to use for training. Defaults to None.

  • shuffle_train (bool) – Whether to shuffle the training data. Defaults to True.

  • inference_datasets (Sequence[RunDataset | TupleDataset | QueryDataset | DocDataset] | None) – List of datasets to use for inference (indexing, searching, and re-ranking). Defaults to None.

  • inference_batch_size (int | None) – Batch size to use for inference. Defaults to None.

  • num_workers (int) – Number of workers for loading data in parallel. Defaults to 0.
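
The parameters above could be assembled as in the following minimal sketch. The helper names `datamodule_kwargs` and `build_datamodule` and the batch sizes 32 and 64 are hypothetical and not part of the library. The dataset objects are passed in rather than constructed, because their constructors are not documented in this section. The library import is deferred so the helpers can be defined without `lightning_ir` installed.

```python
from typing import Any, Dict, Optional, Sequence


def datamodule_kwargs(
    train_dataset: Optional[Any] = None,
    inference_datasets: Optional[Sequence[Any]] = None,
    train_batch_size: int = 32,       # hypothetical value
    inference_batch_size: int = 64,   # hypothetical value
    shuffle_train: bool = True,
    num_workers: int = 0,
) -> Dict[str, Any]:
    """Collect keyword arguments that mirror the documented __init__ signature."""
    return {
        "train_dataset": train_dataset,
        "train_batch_size": train_batch_size,
        "shuffle_train": shuffle_train,
        "inference_datasets": (
            list(inference_datasets) if inference_datasets is not None else None
        ),
        "inference_batch_size": inference_batch_size,
        "num_workers": num_workers,
    }


def build_datamodule(**kwargs: Any):
    """Instantiate the data module. Requires lightning_ir to be installed."""
    # Deferred import: only needed when the data module is actually built.
    from lightning_ir.data.datamodule import LightningIRDataModule

    return LightningIRDataModule(**datamodule_kwargs(**kwargs))
```

Leaving `train_dataset` as None gives an inference-only configuration; in that case only `inference_datasets` and `inference_batch_size` matter.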

Methods

  • __init__([train_dataset, train_batch_size, ...]) – Initializes a new Lightning IR DataModule.

  • inference_dataloader() – Returns a list of dataloaders for inference (validation, testing, or predicting).

  • predict_dataloader() – Returns a list of dataloaders for predicting.

  • prepare_data() – Downloads the data using ir_datasets if needed.

  • setup(stage) – Sets up the data module for a given stage.

  • test_dataloader() – Returns a list of dataloaders for testing.

  • train_dataloader() – Returns a dataloader for training.

  • val_dataloader() – Returns a list of dataloaders for validation.

inference_dataloader() → List[DataLoader]

Returns a list of dataloaders for inference (validation, testing, or predicting).

Returns:

Dataloaders for inference.

Return type:

List[DataLoader]
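
Because the return value is a list, consumers need to loop over the dataloaders as well as over the batches in each one. The sketch below shows that nested loop. It assumes the list holds one dataloader per entry in `inference_datasets`, which this section does not state explicitly. The function name `enumerate_batches` is hypothetical, and plain Python lists stand in for `DataLoader` objects so the example runs without any dependencies.

```python
from typing import Any, Iterable, Iterator, List, Tuple


def enumerate_batches(
    dataloaders: List[Iterable[Any]],
) -> Iterator[Tuple[int, int, Any]]:
    """Yield (dataloader_idx, batch_idx, batch) for every batch in every dataloader."""
    for dataloader_idx, dataloader in enumerate(dataloaders):
        for batch_idx, batch in enumerate(dataloader):
            yield dataloader_idx, batch_idx, batch


# Plain lists stand in for DataLoader objects (assumption for illustration).
fake_dataloaders = [["batch_a0", "batch_a1"], ["batch_b0"]]
seen = list(enumerate_batches(fake_dataloaders))
```

The `dataloader_idx` name follows the Lightning convention for steps that receive batches from multiple dataloaders.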

predict_dataloader() → Any

Returns a list of dataloaders for predicting.

Returns:

Dataloaders for predicting.

Return type:

List[DataLoader]

prepare_data() → None

Downloads the data using ir_datasets if needed.

setup(stage: 'fit' | 'validate' | 'test') → None

Sets up the data module for a given stage.

Parameters:

stage (Literal["fit", "validate", "test"]) – Stage to set up the data module for.

Raises:

ValueError – If the stage is "fit" and no training dataset is provided.
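
The documented failure condition can be illustrated with a small stand-in. The function `check_stage` below is hypothetical and is not the library's implementation; it only reproduces the rule stated above, that the "fit" stage requires a training dataset while the other stages do not.

```python
from typing import Any, Literal, Optional

Stage = Literal["fit", "validate", "test"]


def check_stage(stage: Stage, train_dataset: Optional[Any]) -> None:
    """Hypothetical stand-in for the documented check, not the library's code."""
    if stage == "fit" and train_dataset is None:
        raise ValueError("A training dataset is required for the 'fit' stage.")


# Inference-only stages pass without a training dataset.
check_stage("validate", None)
check_stage("test", None)

# The 'fit' stage without a training dataset raises ValueError.
try:
    check_stage("fit", None)
    message = ""
except ValueError as err:
    message = str(err)
```

Callers that build inference-only data modules can therefore skip `train_dataset` entirely, as long as they never request the "fit" stage.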

test_dataloader() → List[DataLoader]

Returns a list of dataloaders for testing.

Returns:

Dataloaders for testing.

Return type:

List[DataLoader]

train_dataloader() → DataLoader

Returns a dataloader for training.

Returns:

Dataloader for training.

Return type:

DataLoader

Raises:

ValueError – If no training dataset is found.

val_dataloader() → List[DataLoader]

Returns a list of dataloaders for validation.

Returns:

Dataloaders for validation.

Return type:

List[DataLoader]