ResidualCodec

class lightning_ir.retrieve.plaid.residual_codec.ResidualCodec(index_config: PlaidIndexConfig, centroids: torch.Tensor, bucket_cutoffs: torch.Tensor, bucket_weights: torch.Tensor, verbose: bool = False)

Bases: object

Residual Codec for Plaid, a residual-based search method for efficient retrieval.

__init__(index_config: PlaidIndexConfig, centroids: torch.Tensor, bucket_cutoffs: torch.Tensor, bucket_weights: torch.Tensor, verbose: bool = False) → None

Initialize the ResidualCodec.

Parameters:

index_config (PlaidIndexConfig) – Configuration for the Plaid indexer.
centroids (torch.Tensor) – The centroids used for indexing.
bucket_cutoffs (torch.Tensor) – The cutoffs for the residual buckets.
bucket_weights (torch.Tensor) – The weights for the residual buckets.
verbose (bool) – Whether to print verbose output. Defaults to False.

Methods

`__init__`(index_config, centroids, ...[, verbose])	Initialize the ResidualCodec.
`binarize`(residuals)	Binarize the residuals using the bucket cutoffs and weights.
`compress`(embeddings)	Compress embeddings into codes and residuals.
`compress_into_codes`(embeddings)	Compress embeddings into codes using the centroids.
`decompress`(codes, compressed_residuals)	Decompress the codes and residuals into embeddings.
`from_pretrained`(index_config, index_dir[, ...])	Load a ResidualCodec from the specified directory.
`save`(index_dir)	Save the ResidualCodec to the specified directory.
`train`(index_config, train_embeddings[, verbose])	Train the ResidualCodec using the provided training embeddings.
`try_load_torch_extensions`(use_gpu)	Load the necessary C++ extensions for the ResidualCodec.

Attributes

`dim`	Get the dimensionality of the centroids.
`num_centroids`	Get the number of centroids.

binarize(residuals: Tensor) → Tensor

Binarize the residuals using the bucket cutoffs and weights.

Parameters:

residuals (torch.Tensor) – The residuals to binarize.

Returns:

The binarized residuals.

Return type:

torch.Tensor
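The reference does not describe the bit layout of the binarized residuals, so the following is only a minimal pure-Python sketch of what bucket-based binarization can look like. It assumes each residual value is first mapped to a bucket index through the cutoffs and that the indices are then bit-packed. The helper names (bucketize, pack_bits), the sample cutoffs, and the most-significant-bit-first order are illustrative assumptions, not the library's implementation.

```python
from bisect import bisect_left


def bucketize(values, cutoffs):
    """Map each value to the index of the bucket it falls into.

    Hypothetical helper: bisect_left counts how many cutoffs are
    strictly smaller than the value, which is used as the bucket index.
    """
    return [bisect_left(cutoffs, value) for value in values]


def pack_bits(bucket_ids, nbits):
    """Pack nbits-wide bucket indices into bytes.

    Assumption: bits are written most significant bit first.
    """
    bits = []
    for bucket_id in bucket_ids:
        for shift in reversed(range(nbits)):
            bits.append((bucket_id >> shift) & 1)
    if len(bits) % 8 != 0:
        raise ValueError("total number of bits must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


# Three cutoffs define four buckets, so each value needs 2 bits.
cutoffs = [-0.1, 0.0, 0.1]
residuals = [-0.3, -0.05, 0.02, 0.4]

bucket_ids = bucketize(residuals, cutoffs)
packed = pack_bits(bucket_ids, nbits=2)
```

With 2 bits per value, the four residual values in this toy input fit into a single byte.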

compress(embeddings: Tensor) → Tuple[Tensor, Tensor]

Compress embeddings into codes and residuals.

Parameters:

embeddings (torch.Tensor) – The embeddings to compress.

Returns:

A tuple containing the compressed codes and residuals.

Return type:

Tuple[torch.Tensor, torch.Tensor]
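As a rough illustration of the idea behind residual compression, the sketch below assigns each embedding to a centroid and keeps the difference as the residual. It is plain Python under stated assumptions: the centroid is chosen by maximum dot product, and the functions dot, nearest_centroid, and compress_sketch are hypothetical names that do not belong to lightning_ir. The sketch stops at raw residuals and does not quantize them.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_centroid(embedding, centroids):
    """Index of the centroid with the highest dot product (assumed criterion)."""
    return max(range(len(centroids)), key=lambda i: dot(embedding, centroids[i]))


def compress_sketch(embeddings, centroids):
    """Return (codes, residuals) for a list of embedding vectors."""
    codes, residuals = [], []
    for embedding in embeddings:
        code = nearest_centroid(embedding, centroids)
        centroid = centroids[code]
        codes.append(code)
        # The residual is what the centroid alone fails to capture.
        residuals.append([e - c for e, c in zip(embedding, centroid)])
    return codes, residuals


centroids = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[0.75, 0.25], [0.25, 0.5]]

codes, residuals = compress_sketch(embeddings, centroids)
```

Storing a small integer code plus a coarsely quantized residual is what makes this representation much smaller than the full-precision embedding.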

compress_into_codes(embeddings: Tensor) → Tensor

Compress embeddings into codes using the centroids.

Parameters:

embeddings (torch.Tensor) – The embeddings to compress.

Returns:

The compressed codes.

Return type:

torch.Tensor

decompress(codes: PackedTensor, compressed_residuals: PackedTensor) → PackedTensor

Decompress the codes and residuals into embeddings.

Parameters:

codes (PackedTensor) – The packed tensor containing the codes.
compressed_residuals (PackedTensor) – The packed tensor containing the compressed residuals.

Returns:

The decompressed embeddings.

Return type:

PackedTensor
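Conceptually, decompression reverses the steps of compression: look up the centroid for each code and add back an approximate residual. The sketch below shows this in plain Python, assuming the compressed residuals have already been unpacked into per-dimension bucket indices and that each bucket index is replaced by its bucket weight. The name decompress_sketch and the sample values are hypothetical. PackedTensor handling and any normalization of the result are not modeled.

```python
def decompress_sketch(codes, bucket_ids, centroids, bucket_weights):
    """Reconstruct approximate embeddings from codes and bucket indices.

    codes: one centroid index per embedding
    bucket_ids: per embedding, one bucket index per dimension
    """
    embeddings = []
    for code, ids in zip(codes, bucket_ids):
        centroid = centroids[code]
        # Each bucket index stands in for a representative residual value.
        embeddings.append([c + bucket_weights[b] for c, b in zip(centroid, ids)])
    return embeddings


centroids = [[1.0, 0.0], [0.0, 1.0]]
bucket_weights = [-0.25, -0.125, 0.125, 0.25]
codes = [0, 1]
bucket_ids = [[0, 3], [2, 1]]

approx = decompress_sketch(codes, bucket_ids, centroids, bucket_weights)
```

The reconstruction is lossy: every residual value collapses to one of the few bucket weights, so the result only approximates the original embedding.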

property dim: int

Get the dimensionality of the centroids.

classmethod from_pretrained(index_config: PlaidIndexConfig, index_dir: Path, device: torch.device | None = None) → ResidualCodec

Load a ResidualCodec from the specified directory.

Parameters:

index_config (PlaidIndexConfig) – Configuration for the Plaid indexer.
index_dir (Path) – Directory containing the saved codec files.
device (torch.device | None) – Device to load the codec onto. Defaults to None, which uses the CPU.

Returns:

An instance of the ResidualCodec loaded from the specified directory.

Return type:

ResidualCodec

property num_centroids: int

Get the number of centroids.

save(index_dir: Path)

Save the ResidualCodec to the specified directory.

Parameters:

index_dir (Path) – Directory to save the codec files.

Raises:

ValueError – If residual_codec is None.

classmethod train(index_config: PlaidIndexConfig, train_embeddings: torch.Tensor, verbose: bool = False) → ResidualCodec

Train the ResidualCodec using the provided training embeddings.

Parameters:

index_config (PlaidIndexConfig) – Configuration for the Plaid indexer.
train_embeddings (torch.Tensor) – The embeddings to use for training the codec.
verbose (bool) – Whether to print verbose output. Defaults to False.

Returns:

An instance of the ResidualCodec trained on the provided embeddings.

Return type:

ResidualCodec
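The reference does not say how training derives bucket_cutoffs and bucket_weights. One plausible scheme, shown here purely as an assumption, places the cutoffs at evenly spaced quantiles of the observed residual values and uses the mid-bucket quantiles as weights. The functions quantile and fit_buckets and the nbits parameter are hypothetical, and fitting the centroids themselves is left out.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def fit_buckets(residual_values, nbits):
    """Derive bucket cutoffs and representative weights from residual values."""
    num_buckets = 2 ** nbits
    values = sorted(residual_values)
    # Cutoffs split the values into num_buckets equally populated buckets.
    cutoffs = [quantile(values, i / num_buckets) for i in range(1, num_buckets)]
    # Each weight is the value at the middle quantile of its bucket.
    weights = [quantile(values, (i + 0.5) / num_buckets) for i in range(num_buckets)]
    return cutoffs, weights


# Nine evenly spaced toy residual values between -0.5 and 0.5.
residual_values = [(v - 4) / 8 for v in range(9)]

cutoffs, weights = fit_buckets(residual_values, nbits=2)
```

Quantile-based cutoffs keep the buckets roughly equally populated, which uses the few available bits evenly.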

classmethod try_load_torch_extensions(use_gpu)

Load the necessary C++ extensions for the ResidualCodec.

Parameters:

cls – The class to load the extensions for (passed implicitly, since this is a classmethod).
use_gpu (bool) – Whether to use GPU for the extensions.