What Do You Want to Do?
Start here. Lightning IR supports four top-level workflows, exposed as
sub-commands of the lightning-ir CLI and as methods on
LightningIRTrainer.

                         ┌──────────────────────┐
                         │  What is your goal?  │
                         └──────────┬───────────┘
                                    │
              ┌─────────────────────┼─────────────────────┐
              │                     │                     │
              ▼                     ▼                     ▼
    ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
    │ Fine-Tune        │  │ Retrieve docs    │  │ Improve existing │
    │ a model          │  │ from a large     │  │ rankings         │
    │                  │  │ collection       │  │                  │
    │ ► fit            │  │ ► index          │  │ ► re_rank        │
    │                  │  │ ► search         │  │                  │
    └──────────────────┘  └──────────────────┘  └──────────────────┘

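The decision diagram can also be read as a lookup from goal to sub-command. The following is a minimal sketch, not part of Lightning IR itself: the goal labels, the helper `build_commands`, and the YAML file names are hypothetical, and it assumes the `lightning-ir` CLI accepts a `--config` option pointing at a YAML file. It only builds command strings and does not run anything.

```python
import shlex

# Goal -> CLI sub-commands, transcribed from the decision diagram.
# The goal labels on the left are illustrative names, not library identifiers.
GOAL_TO_SUBCOMMANDS = {
    "fine-tune": ["fit"],
    "retrieve": ["index", "search"],
    "re-rank": ["re_rank"],
}


def build_commands(goal, config_paths):
    """Return one command line per sub-command needed for `goal`.

    `config_paths` maps a sub-command name to a config file path.
    The `--config` option is an assumption about the CLI.
    """
    commands = []
    for sub_command in GOAL_TO_SUBCOMMANDS[goal]:
        argv = ["lightning-ir", sub_command, "--config", config_paths[sub_command]]
        commands.append(shlex.join(argv))
    return commands


# Hypothetical config file names, used only for illustration.
retrieve_commands = build_commands(
    "retrieve", {"index": "index.yaml", "search": "search.yaml"}
)
for command in retrieve_commands:
    print(command)
```

Retrieval maps to two sub-commands because the collection has to be indexed before it can be searched.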
The table below summarizes the key ingredients for each workflow.
| Workflow | CLI Sub-command | Module Type | Dataset Type | Required Callback |
|---|---|---|---|---|
| Fine-tune a model | fit | | | (none; ModelCheckpoint is optional) |
| Index documents | index | | | |
| Search (retrieve) | search | | | |
| Re-rank | re_rank | | | |
Tip
A typical end-to-end pipeline chains several workflows:
1. fit: Fine-tune a model
2. index: Encode all documents into an index (bi-encoder only)
3. search: Retrieve candidate documents for queries
4. re_rank: Re-score candidates with a more powerful model (often a cross-encoder)
You can enter the pipeline at any point. For example, if you already have a fine-tuned model from the Model Zoo, skip straight to index or re_rank.
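To make "enter the pipeline at any point" concrete, here is a small sketch that lists the stages left to run from a chosen entry point. The helper `remaining_stages` is hypothetical and not part of Lightning IR; it only encodes the stage order given in this tip.

```python
# Stage order of the typical end-to-end pipeline described in the tip.
PIPELINE = ("fit", "index", "search", "re_rank")


def remaining_stages(entry_point):
    """Return the stages from `entry_point` to the end of the pipeline."""
    if entry_point not in PIPELINE:
        raise ValueError(
            f"unknown stage {entry_point!r}; expected one of {PIPELINE}"
        )
    start = PIPELINE.index(entry_point)
    return list(PIPELINE[start:])


# With an already fine-tuned model, fit can be skipped.
stages_from_index = remaining_stages("index")
# Re-ranking an existing run needs neither an index nor a search step.
stages_from_re_rank = remaining_stages("re_rank")
print(stages_from_index)
print(stages_from_re_rank)
```

Each returned stage name corresponds to one CLI sub-command, so the list doubles as the order in which to invoke them.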
Continue with the next decision:
Which Model Architecture to Use? — Pick a model architecture
Which Index Type to Use? — Pick an index type (bi-encoder only)
Which Loss Function to Use? — Pick a loss function for training
Which Dataset Format to Use? — Pick a dataset format
End-to-End Recipes — Jump straight to a complete end-to-end recipe