TorchSparseIndexer
- class lightning_ir.retrieve.pytorch.sparse_indexer.TorchSparseIndexer(index_dir: Path, index_config: TorchSparseIndexConfig, module: BiEncoderModule, verbose: bool = False)
Bases: Indexer
Sparse indexer for bi-encoder models using PyTorch.
- __init__(index_dir: Path, index_config: TorchSparseIndexConfig, module: BiEncoderModule, verbose: bool = False) → None
Initialize the TorchSparseIndexer.
- Parameters:
index_dir (Path) – Directory to store the index.
index_config (TorchSparseIndexConfig) – Configuration for the sparse index.
module (BiEncoderModule) – The bi-encoder module to use for indexing.
verbose (bool) – Whether to print verbose output. Defaults to False.
Methods
__init__(index_dir, index_config, module[, ...]) – Initialize the TorchSparseIndexer.
add(index_batch, output) – Add embeddings to the sparse index.
save() – Save the sparse index to disk.
to_cpu() – Move the index to CPU.
to_gpu() – Move the index to GPU if available.
to_sparse_csr(embeddings) – Convert embeddings to sparse CSR format.
- add(index_batch: IndexBatch, output: BiEncoderOutput) → None
Add embeddings to the sparse index.
- Parameters:
index_batch (IndexBatch) – The batch containing the embeddings to index.
output (BiEncoderOutput) – The output from the bi-encoder model containing embeddings.
- Raises:
ValueError – If doc_embeddings are not present in the output.
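The contract of add (append a batch of document embeddings, reject outputs without them) can be illustrated with a minimal sketch. `SparseIndexSketch`, its method names, and the assumption that document embeddings arrive as a dense 2-D `(num_docs, vocab_size)` tensor of term weights are all hypothetical; the documented method instead takes an `IndexBatch` and a `BiEncoderOutput`, whose internals are not shown here.

```python
from typing import List, Optional

import torch


class SparseIndexSketch:
    """Hypothetical stand-in for a sparse index; not lightning_ir's implementation."""

    def __init__(self) -> None:
        self.doc_ids: List[str] = []
        # One CSR matrix per add() call; rows are documents, columns are vocabulary terms.
        self.batches: List[torch.Tensor] = []

    def add(self, doc_ids: List[str], doc_embeddings: Optional[torch.Tensor]) -> None:
        # Mirror the documented behaviour: refuse outputs without document embeddings.
        if doc_embeddings is None:
            raise ValueError("doc_embeddings are not present in the output")
        # Assumption: doc_embeddings is dense with shape (num_docs, vocab_size).
        self.doc_ids.extend(doc_ids)
        # Store only the nonzero term weights in compressed sparse row layout.
        self.batches.append(doc_embeddings.to_sparse_csr())

    def num_docs(self) -> int:
        return len(self.doc_ids)
```

Storing each batch in CSR form keeps memory proportional to the number of nonzero term weights rather than to the vocabulary size.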
- save() → None
Save the sparse index to disk.
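One way to persist a CSR matrix is to save its three component tensors plus its shape and rebuild it on load. The helpers `save_csr` and `load_csr` and the file name `index.pt` are hypothetical; this page does not document the on-disk format that save actually writes.

```python
from pathlib import Path

import torch


def save_csr(matrix: torch.Tensor, index_dir: Path) -> Path:
    """Hypothetical helper: write a CSR matrix's components to index_dir/index.pt."""
    index_dir.mkdir(parents=True, exist_ok=True)
    path = index_dir / "index.pt"  # file name is an assumption
    torch.save(
        {
            "crow_indices": matrix.crow_indices(),
            "col_indices": matrix.col_indices(),
            "values": matrix.values(),
            "size": tuple(matrix.shape),
        },
        path,
    )
    return path


def load_csr(path: Path) -> torch.Tensor:
    """Rebuild the CSR matrix from the saved components."""
    state = torch.load(path)
    return torch.sparse_csr_tensor(
        state["crow_indices"], state["col_indices"], state["values"], size=state["size"]
    )
```

Saving plain strided tensors and a tuple keeps the file loadable with `torch.load`'s default settings.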
- to_cpu() → None
Move the index to CPU.
- to_gpu() → None
Move the index to GPU if available.
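Because to_cpu and to_gpu return None, they presumably replace the stored tensor in place. The sketch below shows that pattern with a hypothetical `DeviceAwareIndex` class; the attribute name `matrix` is an assumption, and to_gpu silently stays on CPU when no CUDA device is visible, which is one reading of "if available".

```python
import torch


class DeviceAwareIndex:
    """Hypothetical holder for a sparse index tensor; not lightning_ir's class."""

    def __init__(self, matrix: torch.Tensor) -> None:
        self.matrix = matrix

    def to_cpu(self) -> None:
        # Tensor.to() returns a tensor on the target device; rebind the attribute.
        self.matrix = self.matrix.to("cpu")

    def to_gpu(self) -> None:
        # Only move when a CUDA device is actually present.
        if torch.cuda.is_available():
            self.matrix = self.matrix.to("cuda")
```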
- static to_sparse_csr(embeddings: Tensor) → Tensor
Convert embeddings to sparse CSR format.
- Parameters:
embeddings (torch.Tensor) – The embeddings tensor to convert.
- Returns:
Crow indices, column indices, and values of the sparse matrix.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
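The three returned tensors can be computed from a dense 2-D embedding matrix as sketched below. `to_sparse_csr_components` is a hypothetical name, and the 2-D `(num_docs, vocab_size)` input shape is an assumption; PyTorch's built-in `Tensor.to_sparse_csr()` followed by `crow_indices()`, `col_indices()` and `values()` yields the same components.

```python
from typing import Tuple

import torch


def to_sparse_csr_components(
    embeddings: torch.Tensor,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Hypothetical CSR conversion for a dense (num_docs, vocab_size) tensor."""
    num_rows = embeddings.shape[0]
    # nonzero() lists coordinates in row-major order, which CSR requires.
    row_idcs, col_idcs = torch.nonzero(embeddings, as_tuple=True)
    values = embeddings[row_idcs, col_idcs]
    # Count nonzeros per row; minlength keeps empty trailing rows.
    counts = torch.bincount(row_idcs, minlength=num_rows)
    # crow_indices[i] is the offset of row i's first entry; the last entry is nnz.
    crow_indices = torch.zeros(num_rows + 1, dtype=torch.long, device=embeddings.device)
    crow_indices[1:] = torch.cumsum(counts, dim=0)
    return crow_indices, col_idcs, values
```

For example, a row with no nonzero weights simply repeats the previous offset in `crow_indices`.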