Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Shunyu Yao (Princeton University), Dian Yu (Google DeepMind), Jeffrey Zhao (Google DeepMind), Izhak Shafran (Google DeepMind), Thomas L. Griffiths (Princeton University), Yuan Cao (Google DeepMind), Karthik Narasimhan (Princeton University)

Abstract

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/ysymyth/tree-of-thought-llm.

1 Introduction

Originally designed to generate text, scaled-up versions of language models (LMs) such as GPT [22, 23, 1, 20] and PaLM [5] have been shown to be increasingly capable of performing an ever wider range of tasks requiring mathematical, symbolic, commonsense, and knowledge reasoning. It is perhaps surprising that underlying all this progress is still the original autoregressive mechanism for generating text, which makes token-level decisions one by one, in a left-to-right fashion. Is such a simple mechanism sufficient for an LM to be built toward a general problem solver? If not, what problems would challenge the current paradigm, and what might alternative mechanisms be?

The literature on human cognition provides some clues to answer these questions. Research on “dual process” models suggests that people have two modes in which they engage with decisions: a fast, automatic, unconscious mode (“System 1”) and a slow, deliberate, conscious mode (“System 2”) [27, 28, 13, 12]. These two modes have previously been connected to a variety of mathematical models used in machine learning. For example, research on reinforcement learning in humans and other animals has explored the circumstances under which they engage in associative “model-free” learning or more deliberative “model-based” planning [6]. The simple associative token-level choices of LMs are also reminiscent of “System 1”, and thus might benefit from augmentation by a more deliberate “System 2” planning process that (1) maintains and explores diverse alternatives for current choices instead of just picking one, and (2) evaluates its current status and actively looks ahead or backtracks to make more global decisions.
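To make this deliberate search concrete, the sketch below shows a breadth-first instantiation of the idea: generate several candidate “thoughts” per state, let the LM itself score how promising each partial solution is, and keep only the best few at every depth. This is a minimal illustration under stated assumptions, not the paper’s implementation: the `llm` callable, the propose/value prompt wording, and the default parameters are all hypothetical stand-ins, and the actual prompts live in the linked repo.

```python
from typing import Callable, List

# Minimal breadth-first Tree-of-Thoughts sketch. Assumes a hypothetical
# `llm(prompt, n)` helper returning n text completions (e.g. wrapping a
# GPT-4 chat call). Prompts and defaults are illustrative, not the paper's.

def value(problem: str, state: str, llm: Callable[[str, int], List[str]],
          n_votes: int = 3) -> float:
    """Self-evaluation: ask the LM to rate how promising a partial solution is."""
    prompt = (f"Problem: {problem}\nPartial solution:\n{state}\n"
              "On a scale of 1-10, how likely is this to lead to a full "
              "solution? Reply with a single number.")
    total = 0.0
    for reply in llm(prompt, n_votes):
        try:
            total += float(reply.strip().split()[0])
        except (ValueError, IndexError):
            pass  # ignore malformed ratings
    return total

def tot_bfs(problem: str, llm: Callable[[str, int], List[str]],
            depth: int = 3, breadth: int = 5, k: int = 5) -> str:
    """Keep the `breadth` highest-valued thought sequences at each depth."""
    frontier = [""]  # start from the empty sequence of thoughts
    for _ in range(depth):
        candidates = []
        for state in frontier:
            # Thought generation: propose k possible next steps.
            prompt = (f"Problem: {problem}\nSteps so far:\n{state}\n"
                      "Propose one possible next step:")
            candidates += [state + step.strip() + "\n" for step in llm(prompt, k)]
        # Pruning doubles as backtracking: weak branches are dropped, so the
        # search can recover from bad early decisions instead of committing
        # to a single left-to-right decoding path.
        frontier = sorted(candidates,
                          key=lambda s: value(problem, s, llm),
                          reverse=True)[:breadth]
    return frontier[0]
```

Replacing the pruned frontier with a stack yields a depth-first variant of the same skeleton; the paper uses BFS-style search for Game of 24 and Creative Writing and a DFS-style search for Mini Crosswords.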