Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Shunyu Yao (Princeton University), Dian Yu (Google DeepMind), Jeffrey Zhao (Google DeepMind), Izhak Shafran (Google DeepMind), Thomas L. Griffiths (Princeton University), Yuan Cao (Google DeepMind), Karthik Narasimhan (Princeton University)

Abstract

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/ysymyth/tree-of-thought-llm.

1 Introduction

Originally designed to generate text, scaled-up versions of language models (LMs) such as GPT [22, 23, 1, 20] and PaLM [5] have been shown to be increasingly capable of performing an ever wider range of tasks requiring mathematical, symbolic, commonsense, and knowledge reasoning. It is perhaps surprising that underlying all this progress is still the original autoregressive mechanism for generating text, which makes token-level decisions one by one, in a left-to-right fashion. Is such a simple mechanism sufficient for an LM to be built toward a general problem solver? If not, what problems would challenge the current paradigm, and what might alternative mechanisms be?

The literature on human cognition provides some clues to answer these questions. Research on “dual process” models suggests that people have two modes in which they engage with decisions: a fast, automatic, unconscious mode (“System 1”) and a slow, deliberate, conscious mode (“System 2”) [27, 28, 13, 12]. These two modes have previously been connected to a variety of mathematical models used in machine learning. For example, research on reinforcement learning in humans and other animals has explored the circumstances under which they engage in associative “model-free” learning or more deliberative “model-based” planning [6]. The simple associative token-level choices of LMs are also reminiscent of “System 1”, and thus might benefit from augmentation by a more deliberate “System 2” planning process that (1) maintains and explores diverse alternatives for current choices instead of just picking one, and (2) evaluates its current status and actively looks ahead or backtracks to make more global decisions.
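To make this deliberate search concrete, the sketch below shows a breadth-first instantiation of the idea: generate several candidate “thoughts” per state, let the LM itself score how promising each partial solution is, and keep only the best few at every depth. This is a minimal illustration under stated assumptions, not the paper’s implementation: the `llm` callable, the propose/value prompt wording, and the default parameters are all hypothetical stand-ins, and the actual prompts live in the linked repo.

```python
from typing import Callable, List

# Minimal breadth-first Tree-of-Thoughts sketch. Assumes a hypothetical
# `llm(prompt, n)` helper returning n text completions (e.g. wrapping a
# GPT-4 chat call). Prompts and defaults are illustrative, not the paper's.

def value(problem: str, state: str, llm: Callable[[str, int], List[str]],
          n_votes: int = 3) -> float:
    """Self-evaluation: ask the LM to rate how promising a partial solution is."""
    prompt = (f"Problem: {problem}\nPartial solution:\n{state}\n"
              "On a scale of 1-10, how likely is this to lead to a full "
              "solution? Reply with a single number.")
    total = 0.0
    for reply in llm(prompt, n_votes):
        try:
            total += float(reply.strip().split()[0])
        except (ValueError, IndexError):
            pass  # ignore malformed ratings
    return total

def tot_bfs(problem: str, llm: Callable[[str, int], List[str]],
            depth: int = 3, breadth: int = 5, k: int = 5) -> str:
    """Keep the `breadth` highest-valued thought sequences at each depth."""
    frontier = [""]  # start from the empty sequence of thoughts
    for _ in range(depth):
        candidates = []
        for state in frontier:
            # Thought generation: propose k possible next steps.
            prompt = (f"Problem: {problem}\nSteps so far:\n{state}\n"
                      "Propose one possible next step:")
            candidates += [state + step.strip() + "\n" for step in llm(prompt, k)]
        # Pruning doubles as backtracking: weak branches are dropped, so the
        # search can recover from bad early decisions instead of committing
        # to a single left-to-right decoding path.
        frontier = sorted(candidates,
                          key=lambda s: value(problem, s, llm),
                          reverse=True)[:breadth]
    return frontier[0]
```

Replacing the pruned frontier with a stack yields a depth-first variant of the same skeleton; the paper uses BFS-style search for Game of 24 and Creative Writing and a DFS-style search for Mini Crosswords.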