Model Zoo

The following tables list models from the HuggingFace Model Hub that are supported in Lightning IR.

Native models were fine-tuned using Lightning IR and the model’s HuggingFace model card provides Lightning IR configurations for reproduction. Non-native models were fine-tuned externally but are supported in Lightning IR for inference.
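Both native and non-native models load through the same Lightning IR module classes. As a minimal sketch (class names are taken from the reproduction configs on this page; exact arguments may differ between versions):

from lightning_ir import BiEncoderModule, CrossEncoderModule

# Cross-encoders from the tables below load via CrossEncoderModule ...
cross_encoder = CrossEncoderModule(model_name_or_path="webis/monoelectra-base")

# ... and bi-encoders (including ColBERT- and SPLADE-style models) via BiEncoderModule.
bi_encoder = BiEncoderModule(model_name_or_path="webis/bert-bi-encoder")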

Reranking Results

For each model, the table reports re-ranking effectiveness in terms of nDCG@10 when re-ranking the officially released run files (1,000 passages per query) of TREC Deep Learning 2019 and 2020.

Reproduction

The following command can be used to reproduce the results:

lightning-ir re_rank --config config.yaml
config.yaml
trainer:
  logger: false
  enable_checkpointing: false
model:
  class_path: CrossEncoderModule # for cross-encoders
  # class_path: BiEncoderModule # for bi-encoders
  init_args:
    model_name_or_path: {MODEL_NAME}
    evaluation_metrics:
    - nDCG@10
data:
  class_path: LightningIRDataModule
  init_args:
    inference_datasets:
    - class_path: RunDataset
      init_args:
        run_path_or_id: msmarco-passage/trec-dl-2019/judged
    - class_path: RunDataset
      init_args:
        run_path_or_id: msmarco-passage/trec-dl-2020/judged
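
The same evaluation can also be scripted in Python. Below is a minimal sketch that mirrors config.yaml; the module, data module, and dataset classes are taken from the config above, and it assumes the LightningIRTrainer exposes a re_rank method matching the CLI subcommand:

from lightning_ir import (
    CrossEncoderModule,
    LightningIRDataModule,
    LightningIRTrainer,
    RunDataset,
)

# Mirrors config.yaml; use BiEncoderModule instead for bi-encoders.
module = CrossEncoderModule(
    model_name_or_path="webis/monoelectra-base",  # any model from the table below
    evaluation_metrics=["nDCG@10"],
)
data_module = LightningIRDataModule(
    inference_datasets=[
        RunDataset(run_path_or_id="msmarco-passage/trec-dl-2019/judged"),
        RunDataset(run_path_or_id="msmarco-passage/trec-dl-2020/judged"),
    ],
)
trainer = LightningIRTrainer(logger=False, enable_checkpointing=False)
trainer.re_rank(module, data_module)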

Model Name                                      Native  TREC DL 2019  TREC DL 2020
----------------------------------------------  ------  ------------  ------------
Cross-Encoders
webis/monoelectra-base                          ✓       0.751         0.769
webis/monoelectra-large                         ✓       0.750         0.791
castorini/monot5-base-msmarco                   ✗       0.723         0.714
castorini/monot5-large-msmarco                  ✗       0.720         0.728
castorini/monot5-3b-msmarco                     ✗       0.726         0.752
Soyoung97/RankT5-base                           ✗       0.734         0.745
Soyoung97/RankT5-large                          ✗       0.737         0.759
Soyoung97/RankT5-3b                             ✗       0.721         0.776
Bi-Encoders
webis/bert-bi-encoder                           ✓       0.711         0.714
sentence-transformers/msmarco-bert-base-dot-v5  ✗       0.705         0.735
webis/colbert                                   ✓       0.751         0.749
colbert-ir/colbertv2.0                          ✗       0.732         0.746
lightonai/GTE-ModernColBERT-v1                  ✗       0.722         0.737
webis/splade                                    ✓       0.736         0.723
naver/splade-v3                                 ✗       0.715         0.749

Retrieval Results

For each model, the table reports retrieval effectiveness in terms of nDCG@10 on NanoBEIR. Scores are reported for each of the 13 individual NanoBEIR datasets and macro-averaged (unweighted mean over the 13 datasets) to obtain an overall score per model.
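
For example, the overall score of webis/bert-bi-encoder can be recomputed from its per-dataset scores in the breakdown table below:

# Per-dataset nDCG@10 for webis/bert-bi-encoder, in the order of the breakdown table below
scores = [0.268, 0.591, 0.874, 0.410, 0.733, 0.622, 0.312,
          0.627, 0.918, 0.313, 0.395, 0.539, 0.536]
print(round(sum(scores) / len(scores), 3))  # 0.549, matching the NanoBEIR (avg.) column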

Reproduction

The following commands can be used to reproduce the results:

lightning-ir index --config index.yaml # for indexing
lightning-ir search --config search.yaml # for retrieval
index.yaml
trainer:
  accelerator: auto
  strategy: auto
  devices: auto
  num_nodes: 1
  logger: false
  callbacks:
    - class_path: IndexCallback
      init_args:
        index_dir: {INDEX_DIR}
        index_config:
          class_path: TorchDenseIndexConfig # for Bi-Encoder and ColBERT
          # class_path: TorchSparseIndexConfig # for SPLADE

model:
  class_path: BiEncoderModule
  init_args:
    model_name_or_path: {MODEL_NAME}
    evaluation_metrics:
    - nDCG@10
data:
  class_path: LightningIRDataModule
  init_args:
    num_workers: 1
    inference_batch_size: 128
    inference_datasets:
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/climate-fever
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/dbpedia-entity
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/fever
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/fiqa
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/hotpotqa
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/msmarco
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/nfcorpus
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/nq
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/quora
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/scidocs
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/arguana
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/scifact
    - class_path: DocDataset
      init_args:
        doc_dataset: nano-beir/webis-touche2020
search.yaml
trainer:
  accelerator: auto
  strategy: auto
  devices: auto
  num_nodes: 1
  logger: false
  callbacks:
    - class_path: SearchCallback
      init_args:
        search_config:
          class_path: TorchDenseSearchConfig # for Bi-Encoder and ColBERT
          # class_path: TorchSparseSearchConfig # for SPLADE
          init_args:
            k: 100
        index_dir: {INDEX_DIR}
        use_gpu: true
        save_dir: ./runs
model:
  class_path: BiEncoderModule
  init_args:
    model_name_or_path: {MODEL_NAME}
    evaluation_metrics:
    - nDCG@10
data:
  class_path: LightningIRDataModule
  init_args:
    num_workers: 1
    inference_batch_size: 8
    inference_datasets:
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/climate-fever
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/dbpedia-entity
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/fever
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/fiqa
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/hotpotqa
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/msmarco
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/nfcorpus
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/nq
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/quora
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/scidocs
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/arguana
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/scifact
    - class_path: QueryDataset
      init_args:
        query_dataset: nano-beir/webis-touche2020
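
Both steps can likewise be scripted in Python. The sketch below condenses the two configs to a single NanoBEIR dataset; all classes and arguments come from the YAML above, and it assumes LightningIRTrainer exposes index and search methods matching the CLI subcommands:

from lightning_ir import (
    BiEncoderModule,
    DocDataset,
    IndexCallback,
    LightningIRDataModule,
    LightningIRTrainer,
    QueryDataset,
    SearchCallback,
    TorchDenseIndexConfig,
    TorchDenseSearchConfig,
)

module = BiEncoderModule(
    model_name_or_path="webis/bert-bi-encoder",  # any bi-encoder from the table below
    evaluation_metrics=["nDCG@10"],
)

# Step 1: index the document collection (use TorchSparseIndexConfig for SPLADE).
index_trainer = LightningIRTrainer(
    logger=False,
    callbacks=[IndexCallback(index_dir="./index", index_config=TorchDenseIndexConfig())],
)
doc_data = LightningIRDataModule(
    inference_datasets=[DocDataset(doc_dataset="nano-beir/nfcorpus")],
    inference_batch_size=128,
)
index_trainer.index(module, doc_data)

# Step 2: search the index with the queries (use TorchSparseSearchConfig for SPLADE).
search_trainer = LightningIRTrainer(
    logger=False,
    callbacks=[
        SearchCallback(
            index_dir="./index",
            search_config=TorchDenseSearchConfig(k=100),
            save_dir="./runs",
        )
    ],
)
query_data = LightningIRDataModule(
    inference_datasets=[QueryDataset(query_dataset="nano-beir/nfcorpus")],
    inference_batch_size=8,
)
search_trainer.search(module, query_data)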

Model Name                                          Native  NanoBEIR (avg.)
--------------------------------------------------  ------  ---------------
Bi-Encoders
(1) webis/bert-bi-encoder                           ✓       0.549
(2) sentence-transformers/msmarco-bert-base-dot-v5  ✗       0.574
(3) webis/colbert                                   ✓       0.635
(4) colbert-ir/colbertv2.0                          ✗       0.613
(5) lightonai/GTE-ModernColBERT-v1                  ✗       0.697
(6) webis/splade                                    ✓       0.619
(7) naver/splade-v3                                 ✗       0.635

Per-dataset nDCG@10 scores (columns correspond to the model numbers above):

Dataset           (1)    (2)    (3)    (4)    (5)    (6)    (7)
----------------  -----  -----  -----  -----  -----  -----  -----
climate-fever     0.268  0.260  0.266  0.292  0.420  0.295  0.307
dbpedia-entity    0.591  0.606  0.709  0.675  0.730  0.669  0.687
fever             0.874  0.838  0.937  0.941  0.945  0.927  0.913
fiqa              0.410  0.446  0.504  0.458  0.566  0.491  0.514
hotpotqa          0.733  0.764  0.873  0.790  0.901  0.821  0.826
msmarco           0.622  0.649  0.663  0.684  0.709  0.669  0.700
nfcorpus          0.312  0.320  0.362  0.366  0.398  0.354  0.387
nq                0.627  0.634  0.749  0.682  0.772  0.714  0.728
quora             0.918  0.936  0.942  0.931  0.970  0.922  0.908
scidocs           0.313  0.315  0.374  0.329  0.398  0.355  0.363
arguana           0.395  0.497  0.535  0.510  0.551  0.490  0.509
scifact           0.539  0.650  0.731  0.698  0.833  0.754  0.803
webis-touche2020  0.536  0.544  0.611  0.609  0.593  0.579  0.613