ResidualCodec

class lightning_ir.retrieve.plaid.residual_codec.ResidualCodec(index_config: PlaidIndexConfig, centroids: torch.Tensor, bucket_cutoffs: torch.Tensor, bucket_weights: torch.Tensor, verbose: bool = False)

Bases: object

Residual codec for Plaid. It compresses embeddings into centroid codes plus compressed residuals, and decompresses them again for efficient retrieval.

__init__(index_config: PlaidIndexConfig, centroids: torch.Tensor, bucket_cutoffs: torch.Tensor, bucket_weights: torch.Tensor, verbose: bool = False) → None

Initialize the ResidualCodec.

Parameters:
  • index_config (PlaidIndexConfig) – Configuration for the Plaid indexer.

  • centroids (torch.Tensor) – The centroids used for indexing.

  • bucket_cutoffs (torch.Tensor) – The cutoffs for the residual buckets.

  • bucket_weights (torch.Tensor) – The weights for the residual buckets.

  • verbose (bool) – Whether to print verbose output. Defaults to False.

Methods

  • __init__(index_config, centroids, ...[, verbose]) – Initialize the ResidualCodec.

  • binarize(residuals) – Binarize the residuals using the bucket cutoffs and weights.

  • compress(embeddings) – Compress embeddings into codes and residuals.

  • compress_into_codes(embeddings) – Compress embeddings into codes using the centroids.

  • decompress(codes, compressed_residuals) – Decompress the codes and residuals into embeddings.

  • from_pretrained(index_config, index_dir[, ...]) – Load a ResidualCodec from the specified directory.

  • save(index_dir) – Save the ResidualCodec to the specified directory.

  • train(index_config, train_embeddings[, verbose]) – Train the ResidualCodec using the provided training embeddings.

  • try_load_torch_extensions(use_gpu) – Load the necessary C++ extensions for the ResidualCodec.

Attributes

  • dim – Get the dimensionality of the centroids.

  • num_centroids – Get the number of centroids.

binarize(residuals: Tensor) → Tensor

Binarize the residuals using the bucket cutoffs and weights.

Parameters:

residuals (torch.Tensor) – The residuals to binarize.

Returns:

The binarized residuals.

Return type:

torch.Tensor
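The idea behind binarization can be pictured with a small sketch. It is not the library's implementation: plain Python lists stand in for tensors, the helper names `bucketize` and `to_bits` are hypothetical, and the cutoff values are made up. Each residual value is mapped to the index of the bucket it falls into, and that index is then written out as a fixed number of bits. How ties at a cutoff are resolved, the bit order, and the way the bucket weights enter the real method are assumptions left open here.

```python
from bisect import bisect_left

def bucketize(values, cutoffs):
    """Map each value to the index of the bucket it falls into.

    bisect_left counts the cutoffs that are strictly smaller than the value,
    so a value equal to a cutoff lands in the lower bucket (an assumption).
    """
    return [bisect_left(cutoffs, v) for v in values]

def to_bits(bucket_indices, nbits):
    """Write each bucket index as nbits bits, least significant bit first."""
    return [(idx >> bit) & 1 for idx in bucket_indices for bit in range(nbits)]

# Hypothetical values: 3 cutoffs define 4 buckets, i.e. 2 bits per dimension.
cutoffs = [-0.1, 0.0, 0.1]
residuals = [-0.25, -0.03, 0.04, 0.3]

indices = bucketize(residuals, cutoffs)
bits = to_bits(indices, nbits=2)
```

With `n` bits per dimension there are `2 ** n` buckets and therefore `2 ** n - 1` cutoffs.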

compress(embeddings: Tensor) → Tuple[Tensor, Tensor]

Compress embeddings into codes and residuals.

Parameters:

embeddings (torch.Tensor) – The embeddings to compress.

Returns:

A tuple containing the compressed codes and residuals.

Return type:

Tuple[torch.Tensor, torch.Tensor]
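As a rough illustration of residual compression in general, the sketch below assigns each embedding to a centroid and keeps the difference between the two. It is a simplified stand-in, not the library's code: `nearest_centroid` and `compress_sketch` are hypothetical names, plain lists replace tensors, dot-product similarity is an assumption, and the residuals are left unquantized.

```python
def nearest_centroid(embedding, centroids):
    """Return the index of the centroid with the highest dot product."""
    scores = [
        sum(e * c for e, c in zip(embedding, centroid))
        for centroid in centroids
    ]
    return max(range(len(centroids)), key=scores.__getitem__)

def compress_sketch(embeddings, centroids):
    """Return (codes, residuals): a centroid index and the leftover offset."""
    codes, residuals = [], []
    for emb in embeddings:
        code = nearest_centroid(emb, centroids)
        codes.append(code)
        # The residual is what the centroid alone fails to capture.
        residuals.append([e - c for e, c in zip(emb, centroids[code])])
    return codes, residuals

# Hypothetical toy data: two 2-dimensional centroids and two embeddings.
centroids = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[0.9, 0.1], [0.2, 0.8]]

codes, residuals = compress_sketch(embeddings, centroids)
```

Storing a small integer code plus a coarsely quantized residual takes far less space than storing the full-precision embedding.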

compress_into_codes(embeddings: Tensor) → Tensor

Compress embeddings into codes using the centroids.

Parameters:

embeddings (torch.Tensor) – The embeddings to compress.

Returns:

The compressed codes.

Return type:

torch.Tensor

decompress(codes: PackedTensor, compressed_residuals: PackedTensor) → PackedTensor

Decompress the codes and residuals into embeddings.

Parameters:
  • codes (PackedTensor) – The packed tensor containing the codes.

  • compressed_residuals (PackedTensor) – The packed tensor containing the compressed residuals.

Returns:

The decompressed embeddings.

Return type:

PackedTensor
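Conceptually, decompression reverses the two compression steps: look up the centroid for each code, then add back an approximate residual. The sketch below assumes the compressed residuals have already been unpacked into one bucket index per dimension and that each bucket index maps to a representative value in `bucket_weights`. The function name `decompress_sketch` and all values are hypothetical, and the packing handled by `PackedTensor` is ignored.

```python
def decompress_sketch(codes, bucket_indices, centroids, bucket_weights):
    """Approximate each embedding as centroid + per-dimension bucket weight."""
    embeddings = []
    for code, indices in zip(codes, bucket_indices):
        centroid = centroids[code]
        embeddings.append(
            [c + bucket_weights[i] for c, i in zip(centroid, indices)]
        )
    return embeddings

# Hypothetical toy data: 4 buckets (2 bits), one representative value each.
centroids = [[1.0, 0.0], [0.0, 1.0]]
bucket_weights = [-0.2, -0.05, 0.05, 0.2]
codes = [0, 1]
bucket_indices = [[1, 2], [3, 0]]  # one bucket index per embedding dimension

approx = decompress_sketch(codes, bucket_indices, centroids, bucket_weights)
```

The result is only an approximation of the original embedding, because every residual value in a bucket is replaced by that bucket's single weight.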

property dim: int

Get the dimensionality of the centroids.

classmethod from_pretrained(index_config: PlaidIndexConfig, index_dir: Path, device: torch.device | None = None) → ResidualCodec

Load a ResidualCodec from the specified directory.

Parameters:
  • index_config (PlaidIndexConfig) – Configuration for the Plaid indexer.

  • index_dir (Path) – Directory containing the saved codec files.

  • device (torch.device | None) – Device to load the codec onto. Defaults to None, which uses the CPU.

Returns:

An instance of the ResidualCodec loaded from the specified directory.

Return type:

ResidualCodec

property num_centroids: int

Get the number of centroids.

save(index_dir: Path)

Save the ResidualCodec to the specified directory.

Parameters:

index_dir (Path) – Directory to save the codec files.

Raises:

ValueError – If residual_codec is None.

classmethod train(index_config: PlaidIndexConfig, train_embeddings: torch.Tensor, verbose: bool = False) → ResidualCodec

Train the ResidualCodec using the provided training embeddings.

Parameters:
  • index_config (PlaidIndexConfig) – Configuration for the Plaid indexer.

  • train_embeddings (torch.Tensor) – The embeddings to use for training the codec.

  • verbose (bool) – Whether to print verbose output. Defaults to False.

Returns:

An instance of the ResidualCodec trained on the provided embeddings.

Return type:

ResidualCodec
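This page does not say how training derives `bucket_cutoffs` and `bucket_weights`. One plausible approach, shown here purely as an assumption about the general technique, is to take quantiles of the residual values seen during training: cutoffs at the bucket boundaries and weights at the bucket midpoints. The helpers `quantile` and `fit_buckets` are hypothetical, and the clustering step that would produce the centroids is omitted.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of a sorted list, for 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def fit_buckets(residual_values, nbits):
    """Derive bucket cutoffs and representative weights from quantiles."""
    num_buckets = 2 ** nbits
    values = sorted(residual_values)
    # Cutoffs sit at the boundaries between equally populated buckets.
    cutoffs = [quantile(values, k / num_buckets) for k in range(1, num_buckets)]
    # Each weight is the quantile at the middle of its bucket.
    weights = [
        quantile(values, (k + 0.5) / num_buckets) for k in range(num_buckets)
    ]
    return cutoffs, weights

# Hypothetical residual values pooled over all dimensions.
residual_values = [-0.4, -0.2, 0.0, 0.2, 0.4]
cutoffs, weights = fit_buckets(residual_values, nbits=1)
```

Quantile-based buckets each receive roughly the same share of the training residuals, so the available bits are spent where the values are concentrated.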

classmethod try_load_torch_extensions(use_gpu)

Load the necessary C++ extensions for the ResidualCodec.

Parameters:
  • cls – The class to load the extensions for. Passed implicitly, since this is a classmethod.

  • use_gpu (bool) – Whether to use GPU for the extensions.