TupleDataset

class lightning_ir.data.dataset.TupleDataset(tuples_dataset: str, targets: 'order' | 'score' = 'order', num_docs: int | None = None)

Bases: IRDataset, IterableDataset

__init__(tuples_dataset: str, targets: 'order' | 'score' = 'order', num_docs: int | None = None) → None

Dataset containing tuples of a query and n documents. Used for fine-tuning models on ranking tasks.

Parameters:
  • tuples_dataset (str) – Path to file containing tuples or valid ir_datasets id.

  • targets (Literal["order", "score"], optional) – Data type to use as targets for a model during fine-tuning. Defaults to “order”.

  • num_docs (int | None, optional) – Maximum number of documents per query. Defaults to None.
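
The parameters above could be passed as in the following minimal sketch. The dataset id `msmarco-passage/train/triples-small` is only an assumed example of an ir_datasets id, and `DATASET_KWARGS` and `make_dataset` are hypothetical names introduced here for illustration; they are not part of the library.

```python
from typing import Any, Dict

# Keyword arguments mirroring the documented parameters.
# The dataset id is an assumed example, not taken from this page.
DATASET_KWARGS: Dict[str, Any] = {
    "tuples_dataset": "msmarco-passage/train/triples-small",  # path to a tuples file or an ir_datasets id
    "targets": "order",  # documented options: "order" (default) or "score"
    "num_docs": 2,       # maximum number of documents per query; None is the default
}


def make_dataset():
    """Hypothetical helper: build a TupleDataset from DATASET_KWARGS.

    The import is kept inside the function so that this snippet can be
    loaded even when lightning_ir is not installed.
    """
    from lightning_ir.data.dataset import TupleDataset

    return TupleDataset(**DATASET_KWARGS)


# Typical use (requires lightning_ir; prepare_data() may download data):
# dataset = make_dataset()
# dataset.prepare_data()
```

Construction and `prepare_data()` are left commented out because, per the method description, preparing the data may trigger a download through ir_datasets.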

Methods

  • __init__(tuples_dataset[, targets, num_docs]) – Dataset containing tuples of a query and n documents.

  • prepare_data() – Downloads tuples using ir_datasets if needed.


prepare_data() → None

Downloads tuples using ir_datasets if needed.