Reasoning with Foundation Models

Last updated: December 26, 2023

We organize current foundation models into three categories: language foundation models, vision foundation models, and multimodal foundation models. We then discuss how these models are applied to reasoning tasks, including commonsense, mathematical, logical, causal, visual, audio, multimodal, and agent reasoning. Finally, we summarize the reasoning techniques involved, including pre-training, fine-tuning, alignment training, mixture of experts, in-context learning, and autonomous agents.

We welcome contributions of additional resources to this repository. Please submit a pull request if you would like to contribute!

Table of Contents

0 Survey

This repository is primarily based on the following paper:

A Survey of Reasoning with Foundation Models

Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, and Zhenguo Li

If you find this repository helpful, please consider citing:

@article{sun2023survey,
  title={A Survey of Reasoning with Foundation Models},
  author={Sun, Jiankai and Zheng, Chuanyang and Xie, Enze and Liu, Zhengying and Chu, Ruihang and Qiu, Jianing and Xu, Jiaqi and Ding, Mingyu and Li, Hongyang and Geng, Mengzhe and others},
  journal={arXiv preprint arXiv:2312.11562},
  year={2023}
}

1 Relevant Surveys

(Back-to-Top)

  • Combating Misinformation in the Age of LLMs: Opportunities and Challenges - [arXiv] [Link]

  • The Rise and Potential of Large Language Model Based Agents: A Survey - [arXiv] [Link]

  • Multimodal Foundation Models: From Specialists to General-Purpose Assistants - [arXiv] [Tutorial]

  • A Survey on Multimodal Large Language Models - [arXiv] [Link]

  • Interactive Natural Language Processing - [arXiv] [Link]

  • A Survey of Large Language Models - [arXiv] [Link]

  • Self-Supervised Multimodal Learning: A Survey - [arXiv] [Link]

  • Large AI Models in Health Informatics: Applications, Challenges, and the Future - [arXiv] [Paper] [Link]

  • Towards Reasoning in Large Language Models: A Survey - [arXiv] [Paper] [Link]

  • Reasoning with Language Model Prompting: A Survey - [arXiv] [Paper] [Link]

  • Awesome Multimodal Reasoning - [Link]

2 Foundation Models

(Back-to-Top)

Table of Contents - 2

(Back-to-Top)

2.1 Language Foundation Models

Foundation Models (Back-to-Top)


2.2 Vision Foundation Models

Foundation Models (Back-to-Top)


2.3 Multimodal Foundation Models

Foundation Models (Back-to-Top)


2.4 Reasoning Applications

Foundation Models (Back-to-Top)


3 Reasoning Tasks

(Back-to-Top)

Table of Contents - 3

3.1 Commonsense Reasoning

Reasoning Tasks (Back-to-Top)


3.1.1 Commonsense Question and Answering (QA)

3.1.2 Physical Commonsense Reasoning

3.1.3 Spatial Commonsense Reasoning

3.1.x Benchmarks, Datasets, and Metrics


3.2 Mathematical Reasoning

Reasoning Tasks (Back-to-Top)


3.2.1 Arithmetic Reasoning

Mathematical Reasoning (Back-to-Top)

3.2.2 Geometry Reasoning

Mathematical Reasoning (Back-to-Top)

3.2.3 Theorem Proving

Mathematical Reasoning (Back-to-Top)

3.2.4 Scientific Reasoning

Mathematical Reasoning (Back-to-Top)

3.2.x Benchmarks, Datasets, and Metrics

Mathematical Reasoning (Back-to-Top)


3.3 Logical Reasoning

Reasoning Tasks (Back-to-Top)


3.3.1 Propositional Logic

  • 2022/09 | Propositional Reasoning via Neural Transformer Language Models - [Paper]

3.3.2 Predicate Logic

3.3.x Benchmarks, Datasets, and Metrics


3.4 Causal Reasoning

Reasoning Tasks (Back-to-Top)


3.4.1 Counterfactual Reasoning

3.4.x Benchmarks, Datasets, and Metrics


3.5 Visual Reasoning

Reasoning Tasks (Back-to-Top)


3.5.1 3D Reasoning

3.5.x Benchmarks, Datasets, and Metrics


3.6 Audio Reasoning

Reasoning Tasks (Back-to-Top)


3.6.1 Speech

3.6.x Benchmarks, Datasets, and Metrics


3.7 Multimodal Reasoning

Reasoning Tasks (Back-to-Top)


3.7.1 Alignment

3.7.2 Generation

3.7.3 Multimodal Understanding

3.7.x Benchmarks, Datasets, and Metrics


3.8 Agent Reasoning

Reasoning Tasks (Back-to-Top)


3.8.1 Introspective Reasoning

3.8.2 Extrospective Reasoning

3.8.3 Multi-agent Reasoning

3.8.4 Driving Reasoning

3.8.x Benchmarks, Datasets, and Metrics


3.9 Other Tasks and Applications

Reasoning Tasks (Back-to-Top)

3.9.1 Theory of Mind (ToM)

3.9.2 LLMs for Weather Prediction

  • 2022/09 | MetNet-2 | Deep learning for twelve hour precipitation forecasts - [Paper]

  • 2023/07 | Pangu-Weather | Accurate medium-range global weather forecasting with 3D neural networks - [Paper]

3.9.3 Abstract Reasoning

3.9.4 Defeasible Reasoning

3.9.5 Medical Reasoning

  • 2024/01 | CheXagent / CheXinstruct / CheXbench | Chen et al.
    CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
    [arXiv] [paper] [code] [project] [huggingface]

  • 2024/01 | EchoGPT | Chao et al.
    EchoGPT: A Large Language Model for Echocardiography Report Summarization
    [medRxiv] [paper]

  • 2023/10 | GPT4V-Medical-Report | Yan et al.
    Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
    [arXiv] [paper] [code]

  • 2023/10 | VisionFM | Qiu et al.
    VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
    [arXiv] [paper]

  • 2023/09 | Yang et al.
    The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
    [arXiv] [paper]

  • 2023/09 | RETFound | Zhou et al., Nature
    A foundation model for generalizable disease detection from retinal images
    [paper] [code]

  • 2023/08 | ELIXR | Xu et al.
    ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
    [arXiv] [paper]

  • 2023/07 | Med-Flamingo | Moor et al.
    Med-Flamingo: a Multimodal Medical Few-shot Learner
    [arXiv] [paper] [code]

  • 2023/07 | Med-PaLM M | Tu et al.
    Towards Generalist Biomedical AI
    [arXiv] [paper] [code]

  • 2023/06 | Endo-FM | Wang et al., MICCAI 2023
    Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
    [arXiv] [paper] [code]

  • 2023/06 | XrayGPT | Thawkar et al.
    XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models
    [arXiv] [paper] [code]

  • 2023/06 | LLaVA-Med | Li et al., NeurIPS 2023
    LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
    [arXiv] [paper] [code]

  • 2023/05 | HuatuoGPT | Zhang et al., Findings of EMNLP 2023
    HuatuoGPT, Towards Taming Language Model to Be a Doctor
    [arXiv] [paper] [code]

  • 2023/05 | Med-PaLM 2 | Singhal et al.
    Towards Expert-Level Medical Question Answering with Large Language Models
    [arXiv] [paper]

  • 2022/12 | Med-PaLM / MultiMedQA / HealthSearchQA | Singhal et al., Nature
    Large Language Models Encode Clinical Knowledge
    [arXiv] [paper]

3.9.6 Bioinformatics Reasoning

3.9.7 Long-Chain Reasoning


4 Reasoning Techniques

(Back-to-Top)

Table of Contents - 4

4.1 Pre-Training

Reasoning Techniques (Back-to-Top)

4.1.1 Data

a. Data - Text
b. Data - Image
c. Data - Multimodality

4.1.2 Network Architecture

a. Encoder-Decoder
b. Decoder-Only
c. CLIP Variants
d. Others

4.2 Fine-Tuning

Reasoning Techniques (Back-to-Top)

4.2.1 Data

4.2.2 Parameter-Efficient Fine-tuning

a. Adapter Tuning
b. Low-Rank Adaptation
c. Prompt Tuning
d. Partial Parameter Tuning
e. Mixture-of-Modality Adaption
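
The parameter-efficient methods listed above all freeze the pre-trained backbone and train only a small number of added parameters. As an illustration of entry b (low-rank adaptation), a minimal PyTorch-style sketch is given below; the rank r, scaling alpha, and initialization are illustrative assumptions rather than the settings of any particular surveyed paper.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the pre-trained weights stay frozen
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the scaled low-rank correction; only lora_a / lora_b get gradients.
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

Only the two small matrices are updated during fine-tuning, so the number of trainable parameters grows with r rather than with the size of the full weight matrix.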

4.3 Alignment Training

Reasoning Techniques (Back-to-Top)

4.3.1 Data

a. Data - Human
b. Data - Synthesis

4.3.2 Training Pipeline

a. Online Human Preference Training
b. Offline Human Preference Training
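
Offline human preference training optimizes the policy directly on logged preference pairs instead of sampling from the model during training. One widely used objective in this family is the DPO loss; the sketch below is a generic version that assumes the summed token log-probabilities of each chosen and rejected response have already been computed under the policy and a frozen reference model.

import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Direct preference optimization on a batch of (chosen, rejected) response pairs.

    All arguments are tensors of shape (batch,) holding per-response log-probabilities;
    beta controls how far the policy may drift from the reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Push the chosen response's log-ratio above the rejected one's.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()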

4.4 Mixture of Experts (MoE)

Reasoning Techniques (Back-to-Top)
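
Mixture-of-experts layers increase model capacity while keeping per-token compute roughly constant: a small gating network routes each token to only a few expert feed-forward blocks. The sketch below is a minimal top-k routing layer in the spirit of sparsely gated MoE designs; the expert architecture, the number of experts, and k are illustrative assumptions, and auxiliary load-balancing losses are omitted.

import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Sparse mixture-of-experts layer: every token is processed by its top-k experts."""

    def __init__(self, d_model: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model); route each token to its k highest-scoring experts.
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out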


4.5 In-Context Learning

Reasoning Techniques (Back-to-Top)


4.5.1 Demonstration Example Selection

a. Prior-Knowledge Approach
b. Retrieval Approach
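
The retrieval approach picks the demonstrations that are most similar to the test query, typically by nearest-neighbour search in an embedding space. A minimal sketch follows; the embed function (any sentence encoder mapping a string to a vector) and the cosine-similarity criterion are assumptions for illustration, not the procedure of a specific surveyed paper.

import numpy as np

def select_demonstrations(query: str, pool: list[str], embed, k: int = 4) -> list[str]:
    """Return the k candidate demonstrations whose embeddings are closest to the query."""
    q = np.asarray(embed(query))                            # (d,)
    cands = np.stack([np.asarray(embed(c)) for c in pool])  # (n, d)
    # Cosine similarity between the query and every candidate demonstration.
    sims = cands @ q / (np.linalg.norm(cands, axis=1) * np.linalg.norm(q) + 1e-8)
    top = np.argsort(-sims)[:k]
    return [pool[i] for i in top]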

4.5.2 Chain-of-Thought

a. Zero-Shot CoT
b. Few-Shot CoT
c. Multiple Paths Aggregation
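
Zero-shot CoT elicits a reasoning chain simply by appending a trigger phrase such as "Let's think step by step", few-shot CoT instead prepends worked examples, and multiple-paths aggregation (self-consistency) samples several chains and majority-votes the final answers. A minimal sketch of the zero-shot variant with aggregation is given below; generate (a sampling wrapper around the language model) and extract_answer (a parser for the final answer) are hypothetical helpers.

from collections import Counter

COT_TRIGGER = "Let's think step by step."

def self_consistent_answer(question: str, generate, extract_answer,
                           num_paths: int = 10, temperature: float = 0.7) -> str:
    """Sample several zero-shot CoT reasoning paths and majority-vote their final answers."""
    prompt = f"Q: {question}\nA: {COT_TRIGGER}"
    answers = []
    for _ in range(num_paths):
        completion = generate(prompt, temperature=temperature)  # one sampled reasoning chain
        answers.append(extract_answer(completion))              # e.g. parse the final number
    return Counter(answers).most_common(1)[0][0]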

4.5.3 Multi-Round Prompting

a. Learned Refiners
b. Prompted Refiners
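
Prompted refiners implement multi-round prompting without any extra training: the model drafts an answer, is prompted to critique it, and then revises in light of its own feedback. A minimal sketch, reusing the hypothetical generate wrapper from the previous example:

def refine(task: str, generate, rounds: int = 2) -> str:
    """Multi-round prompting: draft an answer, self-critique it, then revise."""
    answer = generate(f"Task: {task}\nAnswer:")
    for _ in range(rounds):
        feedback = generate(
            f"Task: {task}\nAnswer: {answer}\n"
            "List any errors or missing steps in this answer:"
        )
        answer = generate(
            f"Task: {task}\nPrevious answer: {answer}\n"
            f"Feedback: {feedback}\nWrite an improved answer:"
        )
    return answer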

4.6 Autonomous Agent

Reasoning Techniques (Back-to-Top)