DocSample

class lightning_ir.data.data.DocSample(doc_id: str, doc: str)

Bases: object

A sample of document data containing a document and its ID.

doc_id

ID of the document.

Type:

str

doc

Document text.

Type:

str
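This page documents only the constructor signature and the two string attributes. The following is a minimal stand-in that assumes `DocSample` behaves like a plain dataclass. `DocSampleSketch` is a hypothetical name and is not part of Lightning IR. It shows how such a sample is built and how its fields are read:

```python
from dataclasses import dataclass


@dataclass
class DocSampleSketch:
    """Hypothetical stand-in mirroring DocSample's documented attributes."""

    doc_id: str  # ID of the document
    doc: str  # document text


sample = DocSampleSketch(doc_id="doc-1", doc="The quick brown fox jumps over the lazy dog.")
print(sample.doc_id, "->", sample.doc)
```

With the real class, the signature above indicates that the same two keyword arguments, `doc_id` and `doc`, are passed to `lightning_ir.data.data.DocSample`.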

__init__(doc_id: str, doc: str) → None

Methods

__init__(doc_id, doc)

from_ir_dataset_sample(sample[, text_fields])

Create a DocSample from an ir_datasets sample.

Attributes

doc_id

doc

classmethod from_ir_dataset_sample(sample: GenericDoc, text_fields: Sequence[str] | None = None) → DocSample

Create a DocSample from an ir_datasets sample.

Parameters:
  • sample (GenericDoc) – ir_datasets sample.

  • text_fields (Sequence[str] | None) – Optional names of the sample fields from which to build the document text. If None, the sample’s default_text() is used. Defaults to None.

Returns:

Document sample.

Return type:

DocSample
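The fallback behavior described under Parameters can be sketched without ir_datasets installed. This is not Lightning IR's implementation. `HypotheticalTitledDoc` is an illustrative stand-in for an ir_datasets document type, and its `default_text()` simply returns its `text` field. `DocSampleSketch` is a hypothetical mirror of `DocSample`. The sketch also assumes that multiple `text_fields` are joined with a single space, because this page does not say how fields are combined.

```python
from dataclasses import dataclass
from typing import Any, NamedTuple, Optional, Sequence


class HypotheticalTitledDoc(NamedTuple):
    """Hypothetical stand-in for an ir_datasets document with several text fields."""

    doc_id: str
    title: str
    text: str

    def default_text(self) -> str:
        # In this stand-in, the default text is simply the body.
        return self.text


@dataclass
class DocSampleSketch:
    """Hypothetical mirror of DocSample; not Lightning IR's implementation."""

    doc_id: str
    doc: str

    @classmethod
    def from_ir_dataset_sample(
        cls, sample: Any, text_fields: Optional[Sequence[str]] = None
    ) -> "DocSampleSketch":
        if text_fields is None:
            # No fields requested: fall back to the sample's default_text().
            doc = sample.default_text()
        else:
            # Assumption: the requested fields are joined with a single space.
            doc = " ".join(getattr(sample, field) for field in text_fields)
        return cls(doc_id=sample.doc_id, doc=doc)


raw = HypotheticalTitledDoc(
    doc_id="d1",
    title="Dense retrieval",
    text="Queries and documents are encoded separately.",
)
default_sample = DocSampleSketch.from_ir_dataset_sample(raw)
titled_sample = DocSampleSketch.from_ir_dataset_sample(raw, text_fields=["title", "text"])
print(default_sample.doc)
print(titled_sample.doc)
```

Passing `text_fields` lets a caller prepend a title, for example, without changing the dataset's own notion of default text. The `None` default keeps the common case to a single argument.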