ResidualCodec
- class lightning_ir.retrieve.plaid.residual_codec.ResidualCodec(index_config: PlaidIndexConfig, centroids: torch.Tensor, bucket_cutoffs: torch.Tensor, bucket_weights: torch.Tensor, verbose: bool = False)
Bases: object
Residual Codec for Plaid, a residual-based search method for efficient retrieval.
- __init__(index_config: PlaidIndexConfig, centroids: torch.Tensor, bucket_cutoffs: torch.Tensor, bucket_weights: torch.Tensor, verbose: bool = False) -> None
Initialize the ResidualCodec.
- Parameters:
index_config (PlaidIndexConfig) – Configuration for the Plaid indexer.
centroids (torch.Tensor) – The centroids used for indexing.
bucket_cutoffs (torch.Tensor) – The cutoffs for the residual buckets.
bucket_weights (torch.Tensor) – The weights for the residual buckets.
verbose (bool) – Whether to print verbose output. Defaults to False.
Methods
__init__(index_config, centroids, ...[, verbose]) – Initialize the ResidualCodec.
binarize(residuals) – Binarize the residuals using the bucket cutoffs and weights.
compress(embeddings) – Compress embeddings into codes and residuals.
compress_into_codes(embeddings) – Compress embeddings into codes using the centroids.
decompress(codes, compressed_residuals) – Decompress the codes and residuals into embeddings.
from_pretrained(index_config, index_dir[, ...]) – Load a ResidualCodec from the specified directory.
save(index_dir) – Save the ResidualCodec to the specified directory.
train(index_config, train_embeddings[, verbose]) – Train the ResidualCodec using the provided training embeddings.
try_load_torch_extensions(use_gpu) – Load the necessary C++ extensions for the ResidualCodec.
Attributes
Get the dimensionality of the centroids.
Get the number of centroids.
- binarize(residuals: Tensor) -> Tensor
Binarize the residuals using the bucket cutoffs and weights.
- Parameters:
residuals (torch.Tensor) – The residuals to binarize.
- Returns:
The binarized residuals.
- Return type:
torch.Tensor
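As a rough illustration of what binarizing residuals can involve, the plain-Python sketch below maps each residual value to a bucket index via the cutoffs and packs the bits of those indices into bytes. It follows the general ColBERTv2-style scheme and is an assumption about the idea, not this library's implementation: the names `binarize_sketch`, `cutoffs`, and `nbits` are hypothetical, and the real method operates on torch tensors.

```python
from bisect import bisect_left


def binarize_sketch(residuals, cutoffs, nbits):
    """Quantize residual values to bucket indices and pack their bits into bytes."""
    bits = []
    for value in residuals:
        # Bucket index = number of cutoffs strictly below the value.
        bucket = bisect_left(cutoffs, value)
        # Emit the nbits bits of the bucket index, least significant first.
        bits.extend((bucket >> shift) & 1 for shift in range(nbits))
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    packed = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit  # first bit ends up most significant
        packed.append(byte)
    return packed


# Hypothetical values: 3 cutoffs define 4 buckets, i.e. 2 bits per dimension.
cutoffs = [-0.1, 0.0, 0.1]
packed = binarize_sketch([-0.2, -0.05, 0.05, 0.2], cutoffs, nbits=2)
print(packed)  # [39], i.e. bits 00 10 01 11 for buckets 0, 1, 2, 3
```

With 2 bits per dimension, four residual values fit into a single byte, which is where the storage savings of the codec come from.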
- compress(embeddings: Tensor) -> Tuple[Tensor, Tensor]
Compress embeddings into codes and residuals.
- Parameters:
embeddings (torch.Tensor) – The embeddings to compress.
- Returns:
A tuple containing the compressed codes and residuals.
- Return type:
Tuple[torch.Tensor, torch.Tensor]
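The two outputs of compression can be pictured with a minimal plain-Python sketch: each embedding is assigned the id of its closest centroid (the code), and the difference between embedding and centroid (the residual) is quantized into bucket ids. This sketch assumes dot-product similarity and skips bit packing. `compress_sketch`, `centroids`, and `cutoffs` are hypothetical names, and the library's tensor-based implementation may differ.

```python
from bisect import bisect_left


def compress_sketch(embeddings, centroids, cutoffs):
    """Return (codes, bucket_ids): nearest-centroid ids and quantized residuals."""
    codes, bucket_ids = [], []
    for emb in embeddings:
        # Pick the centroid with the highest dot product (assumed similarity).
        scores = [sum(e * c for e, c in zip(emb, cen)) for cen in centroids]
        code = scores.index(max(scores))
        # The residual is the part of the embedding the centroid does not cover.
        residual = [e - c for e, c in zip(emb, centroids[code])]
        codes.append(code)
        # Quantize every residual dimension to a bucket index.
        bucket_ids.append([bisect_left(cutoffs, r) for r in residual])
    return codes, bucket_ids


# Hypothetical toy data: two 2-dimensional centroids and four residual buckets.
centroids = [[1.0, 0.0], [0.0, 1.0]]
cutoffs = [-0.1, 0.0, 0.1]
codes, bucket_ids = compress_sketch([[0.8, 0.3], [0.05, 1.05]], centroids, cutoffs)
print(codes)       # [0, 1]
print(bucket_ids)  # [[0, 3], [2, 2]]
```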
- compress_into_codes(embeddings: Tensor) -> Tensor
Compress embeddings into codes using the centroids.
- Parameters:
embeddings (torch.Tensor) – The embeddings to compress.
- Returns:
The compressed codes.
- Return type:
torch.Tensor
- decompress(codes: PackedTensor, compressed_residuals: PackedTensor) -> PackedTensor
Decompress the codes and residuals into embeddings.
- Parameters:
codes (PackedTensor) – The packed tensor containing the codes.
compressed_residuals (PackedTensor) – The packed tensor containing the compressed residuals.
- Returns:
The decompressed embeddings.
- Return type:
PackedTensor
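Decompression can be sketched as the inverse lookup: the code selects a centroid, and each residual bucket id is replaced by a representative value (its bucket weight) that is added back to the centroid. The plain-Python version below is only an illustration under these assumptions. It ignores the PackedTensor layout, bit unpacking, and any normalization the library may apply, and `decompress_sketch` and its arguments are hypothetical names.

```python
def decompress_sketch(codes, bucket_ids, centroids, bucket_weights):
    """Rebuild approximate embeddings from centroid ids and residual bucket ids."""
    embeddings = []
    for code, ids in zip(codes, bucket_ids):
        centroid = centroids[code]
        # Each bucket id is replaced by its representative residual value.
        embeddings.append([c + bucket_weights[i] for c, i in zip(centroid, ids)])
    return embeddings


# Hypothetical toy data: two centroids and one weight per residual bucket.
centroids = [[1.0, 0.0], [0.0, 1.0]]
bucket_weights = [-0.25, -0.125, 0.125, 0.25]
approx = decompress_sketch([0, 1], [[0, 3], [2, 2]], centroids, bucket_weights)
print(approx)  # [[0.75, 0.25], [0.125, 1.125]]
```

The result is an approximation of the original embeddings: the error per dimension is bounded by how far a residual lies from the weight of its bucket.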
- classmethod from_pretrained(index_config: PlaidIndexConfig, index_dir: Path, device: torch.device | None = None) -> ResidualCodec
Load a ResidualCodec from the specified directory.
- Parameters:
index_config (PlaidIndexConfig) – Configuration for the Plaid indexer.
index_dir (Path) – Directory containing the saved codec files.
device (torch.device | None) – Device to load the codec onto. Defaults to None, which uses the CPU.
- Returns:
An instance of the ResidualCodec loaded from the specified directory.
- Return type:
ResidualCodec
- save(index_dir: Path)
Save the ResidualCodec to the specified directory.
- Parameters:
index_dir (Path) – Directory to save the codec files.
- Raises:
ValueError – If residual_codec is None.
- classmethod train(index_config: PlaidIndexConfig, train_embeddings: torch.Tensor, verbose: bool = False) -> ResidualCodec
Train the ResidualCodec using the provided training embeddings.
- Parameters:
index_config (PlaidIndexConfig) – Configuration for the Plaid indexer.
train_embeddings (torch.Tensor) – The embeddings to use for training the codec.
verbose (bool) – Whether to print verbose output. Defaults to False.
- Returns:
An instance of the ResidualCodec trained on the provided embeddings.
- Return type:
ResidualCodec
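In ColBERTv2-style residual compression, training typically means clustering the embeddings into centroids and then deriving the bucket cutoffs and weights from quantiles of the resulting residuals. Whether this method follows exactly that recipe is an assumption. The sketch below illustrates only the quantile step in plain Python, with hypothetical names (`bucket_stats_sketch`, `residual_values`, `nbits`) and synthetic residuals.

```python
from statistics import quantiles


def bucket_stats_sketch(residual_values, nbits):
    """Derive bucket cutoffs and weights from quantiles of residual values."""
    num_buckets = 2 ** nbits
    # 2 * num_buckets - 1 evenly spaced quantile cut points.
    points = quantiles(residual_values, n=2 * num_buckets)
    cutoffs = points[1::2]  # boundaries between neighbouring buckets
    weights = points[0::2]  # one representative value inside each bucket
    return cutoffs, weights


# Synthetic residuals, evenly spread over [-0.5, 0.5].
residual_values = [i / 100 for i in range(-50, 51)]
cutoffs, weights = bucket_stats_sketch(residual_values, nbits=2)
print(len(cutoffs), len(weights))  # 3 4
```

Using quantiles rather than fixed-width intervals gives every bucket roughly the same number of residual values, so all `2 ** nbits` codes are used about equally often.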
- classmethod try_load_torch_extensions(use_gpu)
Load the necessary C++ extensions for the ResidualCodec.
- Parameters:
cls – The class to load the extensions for.
use_gpu (bool) – Whether to use GPU for the extensions.