What Do You Want to Do?

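Start here. Lightning IR supports four top-level workflows (fit, index, search, and re_rank), each exposed as a sub-command of the lightning-ir CLI and as a method on LightningIRTrainer. The diagram below groups them by goal; retrieving from a large collection takes two steps, index followed by search.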

                    ┌──────────────────────┐
                    │  What is your goal?  │
                    └──────────┬───────────┘
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│    Fine-tune     │  │   Retrieve docs  │  │ Improve existing │
│     a model      │  │   from a large   │  │     rankings     │
│                  │  │    collection    │  │                  │
│      ► fit       │  │     ► index      │  │    ► re_rank     │
│                  │  │     ► search     │  │                  │
└──────────────────┘  └──────────────────┘  └──────────────────┘

The table below summarizes the key ingredients for each workflow.

Workflow           CLI Sub-command  Module Type                            Dataset Type                        Required Callback
-----------------  ---------------  -------------------------------------  ----------------------------------  ----------------------------------
Fine-tune a model  fit              BiEncoderModule or CrossEncoderModule  TupleDataset or RunDataset (train)  None (ModelCheckpoint is optional)
Index documents    index            BiEncoderModule                        DocDataset                          IndexCallback
Search (retrieve)  search           BiEncoderModule                        QueryDataset                        SearchCallback
Re-rank            re_rank          BiEncoderModule or CrossEncoderModule  RunDataset                          ReRankCallback
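
The table can also be read as a lookup structure. The sketch below is illustrative only: Workflow, WORKFLOWS, and check_setup are hypothetical names that are not part of Lightning IR, and the class names appear as plain strings copied from the table, so nothing is imported from the library. It shows one way to check a planned combination of sub-command, module, dataset, and callbacks against the table before launching a run.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Workflow:
    """One row of the workflow table (hypothetical helper, not a Lightning IR class)."""

    module_types: Tuple[str, ...]
    dataset_types: Tuple[str, ...]
    required_callback: Optional[str]


# Keys are the CLI sub-commands; values mirror the table row by row.
WORKFLOWS = {
    "fit": Workflow(
        ("BiEncoderModule", "CrossEncoderModule"),
        ("TupleDataset", "RunDataset"),
        None,  # ModelCheckpoint is optional, so nothing is required
    ),
    "index": Workflow(("BiEncoderModule",), ("DocDataset",), "IndexCallback"),
    "search": Workflow(("BiEncoderModule",), ("QueryDataset",), "SearchCallback"),
    "re_rank": Workflow(
        ("BiEncoderModule", "CrossEncoderModule"),
        ("RunDataset",),
        "ReRankCallback",
    ),
}


def check_setup(
    subcommand: str,
    module_type: str,
    dataset_type: str,
    callbacks: Sequence[str] = (),
) -> List[str]:
    """Return a list of mismatches with the table; an empty list means none were found."""
    if subcommand not in WORKFLOWS:
        raise ValueError(f"unknown sub-command: {subcommand!r}")
    workflow = WORKFLOWS[subcommand]
    problems = []
    if module_type not in workflow.module_types:
        problems.append(f"{subcommand} expects one of {workflow.module_types}")
    if dataset_type not in workflow.dataset_types:
        problems.append(f"{subcommand} expects one of {workflow.dataset_types}")
    required = workflow.required_callback
    if required is not None and required not in callbacks:
        problems.append(f"{subcommand} requires {required}")
    return problems
```

For example, check_setup("index", "CrossEncoderModule", "DocDataset") reports two problems: indexing needs a BiEncoderModule, and the IndexCallback is missing.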

Tip

A typical end-to-end pipeline chains several workflows:

  1. fit: Fine-tune a model
  2. index: Encode all documents into an index (bi-encoder only)
  3. search: Retrieve candidate documents for queries from that index (bi-encoder only)
  4. re_rank: Re-score the retrieved candidates with a more powerful model (often a cross-encoder)

You can enter the pipeline at any point. For example, if you already have a fine-tuned model from the Model Zoo, skip straight to index or re_rank.
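
The idea of entering the pipeline at any point can be sketched as a small planning helper. This is a minimal illustration, not part of Lightning IR: PIPELINE, BI_ENCODER_ONLY, and steps_from are hypothetical names, and the rule that index and search need a bi-encoder is taken from the table above.

```python
from typing import List

# Order of the end-to-end pipeline, using the CLI sub-command names.
PIPELINE = ("fit", "index", "search", "re_rank")

# Per the workflow table, these steps only accept a BiEncoderModule.
BI_ENCODER_ONLY = frozenset({"index", "search"})


def steps_from(entry_point: str, model_kind: str = "bi-encoder") -> List[str]:
    """List the sub-commands still to run when entering the pipeline at entry_point."""
    if entry_point not in PIPELINE:
        raise ValueError(f"unknown entry point: {entry_point!r}")
    if model_kind not in ("bi-encoder", "cross-encoder"):
        raise ValueError(f"unknown model kind: {model_kind!r}")
    if model_kind == "cross-encoder" and entry_point in BI_ENCODER_ONLY:
        raise ValueError(f"{entry_point} requires a bi-encoder")
    remaining = PIPELINE[PIPELINE.index(entry_point):]
    if model_kind == "cross-encoder":
        # A cross-encoder skips the bi-encoder-only steps entirely.
        remaining = tuple(step for step in remaining if step not in BI_ENCODER_ONLY)
    return list(remaining)


print(steps_from("index"))                 # bi-encoder that is already fine-tuned
print(steps_from("fit", "cross-encoder"))  # cross-encoder trained from scratch
```

With a cross-encoder the plan contains no index or search step, so the candidates for re_rank have to come from an existing run, which is what RunDataset loads.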

Continue with the next decision: