TorchSparseIndexer

class lightning_ir.retrieve.pytorch.sparse_indexer.TorchSparseIndexer(index_dir: Path, index_config: TorchSparseIndexConfig, module: BiEncoderModule, verbose: bool = False)

Bases: Indexer

Sparse indexer for bi-encoder models using PyTorch.

__init__(index_dir: Path, index_config: TorchSparseIndexConfig, module: BiEncoderModule, verbose: bool = False) → None

Initialize the TorchSparseIndexer.

Parameters:
  • index_dir (Path) – Directory to store the index.

  • index_config (TorchSparseIndexConfig) – Configuration for the sparse index.

  • module (BiEncoderModule) – The bi-encoder module to use for indexing.

  • verbose (bool) – Whether to print verbose output. Defaults to False.

Methods

  • __init__(index_dir, index_config, module[, ...]) – Initialize the TorchSparseIndexer.

  • add(index_batch, output) – Add embeddings to the sparse index.

  • save() – Save the sparse index to disk.

  • to_cpu() – Move the index to CPU.

  • to_gpu() – Move the index to GPU if available.

  • to_sparse_csr(embeddings) – Convert embeddings to sparse CSR format.

add(index_batch: IndexBatch, output: BiEncoderOutput) → None

Add embeddings to the sparse index.

Parameters:
  • index_batch (IndexBatch) – The batch containing the embeddings to index.

  • output (BiEncoderOutput) – The output from the bi-encoder model containing embeddings.

Raises:

ValueError – If doc_embeddings are not present in the output.
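
The check described under Raises can be pictured with a minimal sketch. It does not use the library's classes: `StandInOutput` and `require_doc_embeddings` are hypothetical names introduced only for illustration, and the error message is an assumption, not the text the library emits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StandInOutput:
    """Hypothetical stand-in for BiEncoderOutput; models only the checked field."""

    doc_embeddings: Optional[List[List[float]]] = None


def require_doc_embeddings(output: StandInOutput) -> List[List[float]]:
    """Return the document embeddings, or raise ValueError if they are missing."""
    if output.doc_embeddings is None:
        # Assumed message; the library's actual wording may differ.
        raise ValueError("Expected doc_embeddings in the model output")
    return output.doc_embeddings


# An output without document embeddings triggers the error path.
try:
    require_doc_embeddings(StandInOutput())
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Callers that build outputs by hand (for example in tests) may want to confirm that document embeddings are populated before calling add.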

save() → None

Save the sparse index to disk.

to_cpu() → None

Move the index to CPU.

to_gpu() → None

Move the index to GPU if available.

static to_sparse_csr(embeddings: Tensor) → Tensor

Convert embeddings to sparse CSR format.

Parameters:

embeddings (torch.Tensor) – The embeddings tensor to convert.

Returns:

Compressed row (crow) indices, column indices, and values of the sparse matrix.

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
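
To make the three returned components concrete, here is a dependency-free sketch of the CSR layout. It is not the library's implementation: `dense_to_csr` and the example matrix are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(
    rows: Sequence[Sequence[float]],
) -> Tuple[List[int], List[int], List[float]]:
    """Return (crow_indices, col_indices, values) for a dense 2-D matrix."""
    # crow_indices[i] is the offset in col_indices/values where row i starts.
    crow_indices = [0]
    col_indices: List[int] = []
    values: List[float] = []
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                col_indices.append(col)
                values.append(value)
        # After each row, record the running count of stored non-zeros.
        crow_indices.append(len(values))
    return crow_indices, col_indices, values


# Hypothetical embedding matrix: 3 documents, 4 vocabulary dimensions.
embeddings = [
    [0.0, 1.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.5],
]
crow_indices, col_indices, values = dense_to_csr(embeddings)
# crow_indices == [0, 1, 1, 3]; col_indices == [1, 0, 3]; values == [1.5, 2.0, 0.5]
```

The non-zeros of row i are stored at positions crow_indices[i] up to (but not including) crow_indices[i + 1], so an all-zero row repeats the previous offset. In PyTorch, the same three components can be read from a CSR tensor produced by `Tensor.to_sparse_csr()` through its `crow_indices()`, `col_indices()`, and `values()` methods.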