Sampler
- class lightning_ir.data.dataset.Sampler
Bases: object
Helper class for sampling subsets of documents from a ranked list.
Methods
log_random(documents, sample_size) – Sampling strategy to randomly sample documents, with a higher probability of sampling documents near the top of the ranking.
random(documents, sample_size) – Sampling strategy to randomly sample sample_size documents.
sample(df, sample_size, sampling_strategy) – Samples a subset of documents from a ranked list given a sampling_strategy.
single_relevant(documents, sample_size) – Sampling strategy to randomly sample a single relevant document.
top(documents, sample_size) – Sampling strategy to select the top sample_size documents of the ranking.
top_and_random(documents, sample_size) – Sampling strategy that takes half of the sample_size documents from the top of the ranking and samples the other half randomly.
- static log_random(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy to randomly sample documents, with a higher probability of sampling documents near the top of the ranking.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents.
sample_size (int) – Number of documents to sample.
- Returns:
Sampled documents.
- Return type:
pd.DataFrame
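To illustrate the idea of favoring top-ranked documents, here is a minimal sketch using plain pandas. It is not the library's implementation: the function name `log_weighted_sample`, the `seed` argument, the 1-based `rank` column, and the weighting 1 / log(1 + rank) are all assumptions chosen for illustration.

```python
from typing import Optional

import numpy as np
import pandas as pd


def log_weighted_sample(
    documents: pd.DataFrame, sample_size: int, seed: Optional[int] = None
) -> pd.DataFrame:
    """Sample rows with probability proportional to 1 / log(1 + rank)."""
    # Assumption: `rank` is 1-based, so log1p(rank) is strictly positive
    # and the weight decreases as the rank grows.
    weights = 1.0 / np.log1p(documents["rank"])
    # Never request more rows than the ranking contains.
    n = min(sample_size, len(documents))
    # pandas normalizes the weights and samples without replacement.
    return documents.sample(n=n, weights=weights, random_state=seed)


# Hypothetical ranked list of 100 documents.
ranking = pd.DataFrame(
    {"doc_id": [f"d{i}" for i in range(1, 101)], "rank": list(range(1, 101))}
)
sampled = log_weighted_sample(ranking, sample_size=10, seed=0)
```

Because the weights decay only logarithmically, documents deep in the ranking still have a reasonable chance of being drawn, which is the usual motivation for this kind of weighting over a strict top-k cut.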
- static random(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy to randomly sample sample_size documents.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents.
sample_size (int) – Number of documents to sample.
- Returns:
Sampled documents.
- Return type:
pd.DataFrame
- static sample(df: DataFrame, sample_size: int, sampling_strategy: 'single_relevant' | 'top' | 'random' | 'log_random' | 'top_and_random') -> DataFrame
Samples a subset of documents from a ranked list given a sampling_strategy.
- Parameters:
df (pd.DataFrame) – Ranked list of documents.
sample_size (int) – Number of documents to sample.
sampling_strategy (str) – Name of the sampling strategy to apply: one of 'single_relevant', 'top', 'random', 'log_random', or 'top_and_random'.
- Returns:
Sampled documents.
- Return type:
pd.DataFrame
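Selecting a strategy by name can be pictured as a lookup from the strategy string to a function. The sketch below shows one way to do this with plain pandas; it makes no claim about how `Sampler.sample` is implemented. The names `take_top`, `take_random`, `STRATEGIES`, and `sample_by_name` are hypothetical, and only two strategies are included to keep it short.

```python
from typing import Callable, Dict

import pandas as pd


def take_top(documents: pd.DataFrame, sample_size: int) -> pd.DataFrame:
    # Hypothetical strategy: keep the first `sample_size` rows of the ranking.
    return documents.head(sample_size)


def take_random(documents: pd.DataFrame, sample_size: int) -> pd.DataFrame:
    # Hypothetical strategy: uniform sampling without replacement.
    return documents.sample(n=min(sample_size, len(documents)))


# Map each strategy name to the function that implements it.
STRATEGIES: Dict[str, Callable[[pd.DataFrame, int], pd.DataFrame]] = {
    "top": take_top,
    "random": take_random,
}


def sample_by_name(
    df: pd.DataFrame, sample_size: int, sampling_strategy: str
) -> pd.DataFrame:
    """Look up the strategy by name and apply it to the ranked list."""
    if sampling_strategy not in STRATEGIES:
        raise ValueError(f"Unknown sampling strategy: {sampling_strategy!r}")
    return STRATEGIES[sampling_strategy](df, sample_size)


# Hypothetical ranked list, already sorted by rank.
ranking = pd.DataFrame({"doc_id": ["d1", "d2", "d3", "d4"], "rank": [1, 2, 3, 4]})
top_two = sample_by_name(ranking, 2, "top")
```

Rejecting unknown names explicitly keeps a typo in a configuration file from silently falling back to some default strategy.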
- static single_relevant(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy to randomly sample a single relevant document. The remaining sample_size - 1 are non-relevant.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents.
sample_size (int) – Number of documents to sample.
- Returns:
Sampled documents.
- Return type:
pd.DataFrame
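The "one relevant, rest non-relevant" pattern can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name `one_relevant_sample`, the `seed` argument, and the `relevance` column (with values greater than 0 meaning relevant) are hypothetical.

```python
from typing import Optional

import pandas as pd


def one_relevant_sample(
    documents: pd.DataFrame, sample_size: int, seed: Optional[int] = None
) -> pd.DataFrame:
    """Return one random relevant row plus up to sample_size - 1 non-relevant rows."""
    # Assumption: a `relevance` column marks relevant documents with values > 0.
    relevant = documents[documents["relevance"] > 0]
    non_relevant = documents[documents["relevance"] <= 0]
    if relevant.empty:
        raise ValueError("The ranked list contains no relevant document.")
    positive = relevant.sample(n=1, random_state=seed)
    # Fill the remaining slots with non-relevant rows, capped by availability.
    n_negatives = max(0, min(sample_size - 1, len(non_relevant)))
    negatives = non_relevant.sample(n=n_negatives, random_state=seed)
    return pd.concat([positive, negatives])


# Hypothetical ranked list with two relevant documents.
ranking = pd.DataFrame(
    {
        "doc_id": ["d1", "d2", "d3", "d4", "d5", "d6"],
        "relevance": [1, 0, 0, 1, 0, 0],
    }
)
sampled = one_relevant_sample(ranking, sample_size=3, seed=0)
```

A sample of this shape (one positive followed by negatives) is the typical input for contrastive or listwise ranking losses that expect exactly one relevant document per query.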
- static top(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy to select the top sample_size documents of the ranking.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents.
sample_size (int) – Number of documents to sample.
- Returns:
Sampled documents.
- Return type:
pd.DataFrame
- static top_and_random(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy that takes half of the sample_size documents from the top of the ranking and samples the other half randomly.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents.
sample_size (int) – Number of documents to sample.
- Returns:
Sampled documents.
- Return type:
pd.DataFrame
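A minimal sketch of a half-top, half-random split is shown below. It is an illustration only: the function name `half_top_half_random` and the `seed` argument are hypothetical, and two details are assumptions rather than documented behavior, namely that the random half is drawn from the documents below the top half (so no document appears twice) and that an odd sample_size gives the extra slot to the random half.

```python
from typing import Optional

import pandas as pd


def half_top_half_random(
    documents: pd.DataFrame, sample_size: int, seed: Optional[int] = None
) -> pd.DataFrame:
    """Take the first half from the top of the ranking, the rest at random."""
    top_size = sample_size // 2
    # Assumption: `documents` is already sorted by rank, best first.
    top = documents.head(top_size)
    # Draw the random half only from rows below the top block to avoid duplicates.
    rest = documents.iloc[top_size:]
    n_random = min(sample_size - top_size, len(rest))
    random_part = rest.sample(n=n_random, random_state=seed)
    return pd.concat([top, random_part])


# Hypothetical ranked list of 10 documents.
ranking = pd.DataFrame(
    {"doc_id": [f"d{i}" for i in range(1, 11)], "rank": list(range(1, 11))}
)
sampled = half_top_half_random(ranking, sample_size=4, seed=0)
```

Mixing the two halves gives a sample that contains both hard candidates from the top of the ranking and easier ones from further down.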