Paper Reading ๐Ÿ“œ/Natural Language Processing

Tree of Thoughts: Deliberate Problem Solving with Large Language Models ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ

2023. 6. 8. 22:32

The overview of this paper

 LM๋“ค์€ ์ ์  ๊ด‘๋ฒ”์œ„ํ•œ task์— ์ ์šฉ๋˜๊ณ  ์žˆ๋Š”๋ฐ ์•„์ง token-level left-to-right decision-making ํ”„๋กœ์„ธ์Šค์— ๊ตญํ•œ๋˜์–ด ์žˆ๋‹ค. ์ด๊ฒƒ์€ ํƒ๊ตฌ์™€ ์ „๋žต์ ์ธ ๋ฐฉ๋ฒ•์„ ํ•„์š”๋กœ ํ•˜๋Š” task์—์„œ๋Š” ๋ชจ๋ธ์ด ํ•œ๊ณ„๋ฅผ ๊ฒช๊ฑฐ๋‚˜ ์ดˆ๊ธฐ์˜ ๊ฒฐ์ •์ด ์ค‘์‹ฌ ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜๋„ ์žˆ๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด LM ์ถ”๋ก ์„ ์œ„ํ•œ ์ƒˆ ํ”„๋ ˆ์ž„์›Œํฌ์ธ 'Tree of Thoughts'(ToT)๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ToT๋Š” CoT๋ฅผ ์ผ๋ฐ˜ํ™”ํ•˜๊ณ  ๋ฌธ์ œ ํ•ด๊ฒฐ์— ๋Œ€ํ•œ ์ค‘๊ฐ„ ์Šคํ…์œผ๋กœ ์—ฌ๊ฒจ์ง€๋Š” ์ผ๊ด€์„ฑ ์žˆ๋Š” ํ…์ŠคํŠธ์˜ ์œ ๋‹›์— ๋Œ€ํ•ด ํƒ๊ตฌ๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ด ์ค€๋‹ค. ToT๋Š” ์—ฌ๋Ÿฌ ์„œ๋กœ ๋‹ค๋ฅธ reasoning path๋ฅผ ๊ณ ๋ คํ•˜๊ณ  ๋‹ค์Œ ํ–‰๋™์˜ ์ฝ”์Šค๋ฅผ ๊ฒฐ์ •ํ•˜๊ธฐ ์œ„ํ•ด self-evaluating choice๋ฅผ ๊ณ ๋ คํ•จ์œผ๋กœ์จ ์ •๊ตํ•œ decision making์„ LM์ด ์ˆ˜ํ–‰ํ•˜๊ฒŒ ํ•œ๋‹ค.

 

 ๋…ผ๋ฌธ์—์„œ๋Š” ์ƒ๋‹นํ•œ planning๊ณผ search๋ฅผ ํ•„์š”๋กœ ํ•˜๋Š” 3๊ฐœ์˜ ์ƒˆ๋กœ์šด task์— ๋Œ€ํ•ด ์‹คํ—˜์„ ์ง„ํ–‰ํ•œ๋‹ค: Game of 24, Creative Writing, Mini Crosswords. GPT-4 with CoT๋Š” 4% ์ •๋„ ๋ฌธ์ œ ํ•ด๊ฒฐ์ด ๊ฐ€๋Šฅํ–ˆ์ง€๋งŒ, ToT๋Š” ์ด๋ฅผ 74%๋กœ ๋Œ์–ด์˜ฌ๋ ธ๋‹ค.

 

 

Table of Contents

1. Introduction

2. Background

3. Tree of Thoughts: Deliberate Problem Solving with LM

4. Experiments

5. Discussion

 

 

1. Introduction

 ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ๊ธฐ์กด์— ๋””์ž์ธ๋œ ๋ชจ๋ธ๋“ค์€ GPT์™€ PaLM๊ณผ ๊ฐ™์€ LM์˜ scaled-up ๋ฒ„์ „๋“ค์ด๋‹ค. ํ•˜์ง€๋งŒ ์ด๋“ค์˜ ๊ทผ๋ณธ์ ์ธ progress๋Š” ๊ธฐ์กด์˜ auto-regressive ๋ฉ”์ปค๋‹ˆ์ฆ˜์ด๋‹ค. ์ด ๋ฐฉ์‹์€ left-to-right ๋ฐฉ์‹์œผ๋กœ ํ•˜๋‚˜ํ•˜๋‚˜์”ฉ token-level decision์„ ๋งŒ๋“ค์–ด๊ฐ€๋Š” ๋ฐฉ์‹์ด๋‹ค. ํ•˜์ง€๋งŒ ์ด ๊ฐ„๋‹จํ•œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์ด general problem solver LM์œผ๋กœ ์ถฉ๋ถ„ํ• ๊นŒ? ๊ทธ๋ ‡์ง€ ์•Š๋‹ค๋ฉด ์–ด๋–ค ๋ฌธ์ œ๊ฐ€ ํ˜„์žฌ ํŒจ๋Ÿฌ๋‹ค์ž„์— ๋„์ „ํ•˜๊ณ  ๋Œ€์ฒด ๋ฉ”์ปค๋‹ˆ์ฆ˜์€ ๋ฌด์—‡์ผ๊นŒ?

 

 ์‚ฌ๋žŒ์˜ ์ธ์ง€ ๊ณผ์ •์€ ์ด ์งˆ๋ฌธ์— ๋Œ€๋‹ตํ•˜๊ธฐ ์œ„ํ•œ ๋ช‡ ๊ฐ€์ง€ ์‹ค๋งˆ๋ฆฌ๋ฅผ ์ œ๊ณตํ•œ๋‹ค. 'dual process'์— ๋Œ€ํ•œ ์—ฐ๊ตฌ๋Š” ์‚ฌ๋žŒ์ด ๊ฒฐ์ •์„ ๋‚ด๋ฆด ๋•Œ ๋‘ ๊ฐ€์ง€ ๋ชจ๋“œ๋ฅผ ๊ฐ€์ง„๋‹ค๊ณ  ์ œ์•ˆํ•œ๋‹ค.

 

  • System 1: fast & automatic. ๋ฌด์˜์‹์  ๋ชจ๋“œ
  • System 2: slow & deliberate(=๊ณ„ํš์ ์ธ). ์˜์‹์  ๋ชจ๋“œ

 

 LM์˜ ๊ฐ„๋‹จํ•œ ์—ฐํ•ฉ์  token-level์˜ ์„ ํƒ์€ 'System 1'์„ ์—ฐ์ƒ์‹œํ‚ค๊ณ , ๋”ฐ๋ผ์„œ ๋”์šฑ ๊ณ„ํš์ ์ธ 'System 2' planning ํ”„๋กœ์„ธ์Šค์— ์˜ํ•œ augmentation์œผ๋กœ๋ถ€ํ„ฐ ์ด์ต์„ ์–ป๋Š”๋‹ค. ์—ฌ๊ธฐ์„œ 'Systme 2'์˜ planning ํ”„๋กœ์„ธ์Šค๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

  1. ํ•˜๋‚˜๋งŒ์„ ๋ฝ‘๋Š” ๋Œ€์‹ ์— ํ˜„์žฌ ์„ ํƒ์— ๋Œ€ํ•œ ๋‹ค์–‘ํ•œ ๋Œ€์•ˆ์„ ์œ ์ง€ & ํƒ๊ตฌ
  2. ํ˜„์žฌ ์ƒํƒœ๋ฅผ ํ‰๊ฐ€ํ•˜๊ณ  ์ ๊ทน์ ์œผ๋กœ ๋‚˜์•„๊ฐˆ์ง€ ๋˜๋Š” ๋”์šฑ ์ „์—ญ์ ์ธ ๊ฒฐ์ •์„ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด backtrack ํ•  ์ง€ ์ •ํ•จ

 

 ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ planning ํ”„๋กœ์„ธ์Šค๋ฅผ ๋””์ž์ธํ•˜๊ธฐ ์œ„ํ•ด ํŠธ๋ฆฌ ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•œ Tree of Thoughts(ToT)๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ToT๋Š” ์ ๊ทน์ ์œผ๋กœ tree of thought๋ฅผ ์œ ์ง€ํ•˜๋Š”๋ฐ, ์—ฌ๊ธฐ์„œ thought๋Š” ๋ฌธ์ œ ํ•ด๊ฒฐ์„ ์œ„ํ•œ ์ค‘๊ฐ„ ์Šคํ…์œผ๋กœ ์—ฌ๊ฒจ์ง€๋Š” ์ผ๊ด€๋œ language sequence์ด๋‹ค(ํ‘œ 1). ์ด๋Ÿฌํ•œ high-level semantic unit์€ LM์ด ์„œ๋กœ ๋‹ค๋ฅธ thought์— ๋Œ€ํ•ด ์ฒ˜๋ฆฌํ•˜๋Š” ๊ฒƒ์„ ์ž์—ฐ์–ด๋กœ ๋˜์–ด ์žˆ๋Š” ๊ณ„ํš์ ์ธ ์ถ”๋ก  ํ”„๋กœ์„ธ์Šค๋ฅผ ํ†ตํ•ด self-evaluate ํ•˜๊ฒŒ ํ—ˆ๋ฝํ•ด ์ค€๋‹ค(๊ทธ๋ฆผ 2, 4, 6). ๋งˆ์ง€๋ง‰์œผ๋กœ ๋‹ค์–‘ํ•œ thought๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ์–ธ์–ด ๊ธฐ๋ฐ˜ ๋Šฅ๋ ฅ๊ณผ BFS์™€ DFS ๊ฐ™์€ ๊ฒ€์ƒ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๋ฌถ์—ˆ๋‹ค. ์ด๊ฒƒ์€ lookahead์™€ backtracking์„ ์‚ฌ์šฉํ•˜์—ฌ tree of thoughts์˜ ์ฒด๊ณ„์  ํƒ๊ตฌ๋ฅผ ํ—ˆ๋ฝํ•ด ์ค€๋‹ค.

 

 ์‹คํ—˜ ์ชฝ์œผ๋กœ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ธฐ์กด์˜ LM๋“ค๋„ ์–ด๋ ค์›€์„ ๊ฒช๊ณ  ์žˆ๋Š” 3๊ฐœ์˜ ๋ฌธ์ œ์— ๋Œ€ํ•ด์„œ ์ œ์•ˆํ•˜์˜€๋‹ค: Game of 24, Creative Writing, Crosswords(ํ‘œ 1). ์ด task๋“ค์€ ์—ฐ์—ญ์ , ์ˆ˜ํ•™์ , ๊ธฐ๋ณธ ์ƒ์‹, ์–ดํœ˜ ์ถ”๋ก  ๋Šฅ๋ ฅ, ์ฒด๊ณ„์  ๊ณ„ํš ์„ธ์šฐ๊ธฐ ๋˜๋Š” ๊ฒ€์ƒ‰์„ ํ•„์š”๋กœ ํ•œ๋‹ค. ์‹คํ—˜์— ๋Œ€ํ•œ ๊ฒฐ๊ณผ๋Š” ๋’ค์—์„œ ๋” ์ž์„ธํ•˜๊ฒŒ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ๋‹ค.

 

 

2. Background

 ToT์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด๊ธฐ ์ „์— ๊ธฐ์กด์—๋Š” ์–ด๋–ค prompting ๋ฐฉ๋ฒ•๋“ค์ด ์žˆ์—ˆ๋Š”์ง€์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด๋„๋ก ํ•˜๊ฒ ๋‹ค.

 

Input-output (IO) prompting.  ์ž…๋ ฅ $x$๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ $y$๋ฅผ ์ถœ๋ ฅํ•˜๋Š” ๊ฐ€์žฅ ์ผ๋ฐ˜์ ์ธ ๋ฐฉ๋ฒ•์ด๋‹ค: $y ~ p_{\theta}(y|prompt_{IO}(x))$. ์—ฌ๊ธฐ์„œ $prompt_{IO}(x)$๋Š” task instruction๊ณผ ์ž…๋ ฅ $x$๋ฅผ ๋ฌถ์€ ๊ฒƒ์œผ๋กœ ๋ช‡ ๊ฐœ์˜ few-shot input-output example์ด ๋“ค์–ด๊ฐ€๊ธฐ๋„ ํ•œ๋‹ค. 

 

Chain-of-thought (CoT) prompting.  input $x$์™€ output $y$์˜ ๋งคํ•‘์ด ์‚ฌ์†Œํ•˜์ง€ ์•Š๋‹ค๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ์ด prompting ๋ฐฉ์‹์˜ ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š” $x$์™€ $y$๋ฅผ ์ž‡๊ธฐ ์œ„ํ•ด ์ƒ๊ฐ์˜ ๊ณ ๋ฆฌ(chain of thought) $z_1, \cdots, z_n$์„ ์†Œ๊ฐœํ–ˆ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ์—ฌ๊ธฐ์„œ ๊ฐ $z_i$๋Š” ๋ฌธ์ œ ํ•ด๊ฒฐ์„ ์œ„ํ•ด ์˜๋ฏธ ์žˆ๋Š” ์ค‘๊ฐ„ ์Šคํ…์œผ๋กœ ์—ฌ๊ฒจ์ง€๋Š” ์ผ๊ด€๋œ language sequence์ด๋‹ค. 

 

Self-consistency with CoT(CoT-SC).  CoT method์˜ ensemble ๋ฐฉ์‹์ธ ๊ฒฉ์œผ๋กœ $k$ ๊ฐœ์˜ ์„œ๋กœ ๋‹ค๋ฅธ chain of thought๋ฅผ ์ƒ˜ํ”Œ๋งํ•œ๋‹ค. ๊ทธ๋‹ค์Œ์— ๊ฐ€์žฅ ๋นˆ๋ฒˆํ•˜๊ฒŒ ์ถœ๋ ฅ๋œ output์„ ์ตœ์ข… output์œผ๋กœ ๋ฐ˜ํ™˜ํ•œ๋‹ค. CoT-SC๋Š” CoT๋ฅผ ๊ฐœ์„ ์‹œํ‚จ method๋กœ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐ์—๋Š” ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ์„œ๋กœ ๋‹ค๋ฅธ output decision์ด ์žˆ์„ ๊ฒƒ์ด๋ผ๋Š” ์•„์ด๋””์–ด์—์„œ ์ฐฉ์•ˆ๋˜์—ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋” ํ’๋ถ€ํ•œ thought ์„ธํŠธ๋ฅผ ํƒ๊ตฌํ•จ์œผ๋กœ์จ ๋”์šฑ ์‹ ๋ขฐ๋„ ์žˆ๋Š” output decision์„ ๋‚ด๋ฆด ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜์˜€๋‹ค. 

 

 ๋‹ค์Œ์˜ ๊ทธ๋ฆผ 1์„ ๋ณด๋ฉด ๋” ์ดํ•ด๊ฐ€ ์ž˜ ๋  ๊ฒƒ์ด๋‹ค.

 

๊ทธ๋ฆผ 1. ์—ฌ๋Ÿฌ prompting method์˜ ๋ฌธ์ œ ํ•ด๊ฒฐ ๋ฐฉ์‹

 

 

 

3. Tree of Thoughts: Deliberate Problem Solving with LM

 ์‚ฌ๋žŒ์€ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ๋•Œ ์กฐํ•ฉ์  problem space์—์„œ search๋ฅผ ์ˆ˜ํ–‰ํ•œ๋‹ค. tree์˜ ๋…ธ๋“œ๋Š” ๋ถ€๋ถ„์  ์†”๋ฃจ์…˜์œผ๋กœ ํ‘œํ˜„๋˜๊ณ , branch๋Š” ์ด๋“ค์„ ์กฐ์ •ํ•˜๋Š” ์—ฐ์‚ฐ์ž์— ํ•ด๋‹น๋œ๋‹ค. ์ด๋Ÿฌํ•œ ๊ด€์ ์€ ๊ธฐ์กด์— LLM์ด ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์˜ ๋ฌธ์ œ์ ์„ ๊ฐ•์กฐํ•œ๋‹ค.

 

  1. Locally, thought process ๊ฐ„์˜ ์„œ๋กœ ๋‹ค๋ฅธ ์—ฐ๊ฒฐ์„ ํƒํ—˜ํ•˜์ง€ ๋ชปํ•œ๋‹ค. โ†’ tree์˜ branch
  2. Globally, ์–ด๋– ํ•œ planning lookahead๊ณผ backtracking๋„ ํฌํ•จํ•˜์ง€ ๋ชปํ•œ๋‹ค. โ†’ heuristic-guided search

 

 ์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋…ผ๋ฌธ์—์„œ๋Š” LM์ด thought์— ๋Œ€ํ•ด ์—ฌ๋Ÿฌ reasoning path๋ฅผ ํƒํ—˜ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ—ˆ๋ฝํ•ด ์ฃผ๋Š” Tree of Thought(ToT)๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค(๊ทธ๋ฆผ 1(c)). ToT๋Š” ๋ชจ๋“  ๋ฌธ์ œ๋ฅผ tree์— ๋Œ€ํ•œ ๊ฒ€์ƒ‰์œผ๋กœ ์—ฌ๊ธด๋‹ค. ์—ฌ๊ธฐ์„œ ๊ฐ๊ฐ์˜ ๋…ธ๋“œ๋Š” state $s = [x, z_{1\cdots i}]$๋กœ thought์˜ input๊ณผ sequence์™€ ํ•จ๊ป˜ ๋ถ€๋ถ„์  ์†”๋ฃจ์…˜์„ ๋‚˜ํƒ€๋‚ธ๋‹ค. ToT์˜ ๊ตฌ์ฒด์ ์ธ ๋ฐฉ๋ฒ•์€ ๋‹ค์Œ์˜ 4๊ฐ€์ง€ ์งˆ๋ฌธ์— ๋Œ€๋‹ตํ•˜๋Š” ๊ฒƒ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

1. Thought decomposition.  CoT๋Š” ๋ถ„๋ช…ํ•œ ๋ถ„ํ•ด ์—†์ด ์ผ๊ด€์ ์œผ๋กœ thought๋ฅผ ์ƒ˜ํ”Œ๋งํ•˜๋Š” ๋ฐ˜๋ฉด์—, ToT๋Š” thought์˜ ์ค‘๊ฐ„ ์Šคํ…์„ ๋””์ž์ธํ•˜๊ณ  ๋ถ„ํ•ดํ•˜๊ธฐ ์œ„ํ•ด ๋ฌธ์ œ์˜ ํŠน์„ฑ์„ ํ™œ์šฉํ•œ๋‹ค. ํ‘œ 1์—์„œ ๋ณด์ด๋Š” ๊ฒƒ์ฒ˜๋Ÿผ, ์„œ๋กœ ๋‹ค๋ฅธ ๋ฌธ์ œ์— ๋”ฐ๋ผ, thought์˜ ํ˜•ํƒœ๋Š” ๋ณ€ํ™”ํ•œ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ thought๋Š” LM์ด ์œ ๋งํ•˜๊ณ  ๋‹ค์–‘ํ•œ sample์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋„๋ก ์ถฉ๋ถ„ํžˆ ์ž‘์•„์•ผ ํ•˜๊ณ  LM์ด ์ด๋“ค์˜ ์˜ˆ์ธก์„ ํ‰๊ฐ€ํ•  ์ˆ˜ ์žˆ์„ ์ •๋„๋กœ๋Š” ์ปค์•ผ ํ•œ๋‹ค.

 

2. Thought generator $G(p_{\theta}, s, k)$.  tree state $s = [x, z_{1\cdots i}]$๊ฐ€ ์ฃผ์–ด์ง€๋ฉด, ๋‹ค์Œ thought step์— ๋Œ€ํ•ด $k$ ๊ฐœ์˜ ํ›„๋ณด๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•œ ๋‘ ๊ฐ€์ง€ ์ „๋žต์„ ๊ณ ๋ คํ•ด ๋ณผ ์ˆ˜ ์žˆ๋‹ค:

 

  • Sample: CoT prompt๋กœ๋ถ€ํ„ฐ thought๋ฅผ ์ƒ˜ํ”Œ๋งํ•จ. ์ด ์ž‘์—…์€ thought space๊ฐ€ ํ’๋ถ€ํ•  ๋•Œ ๋” ์ž˜ ๋จ.
  • Propose: 'propose prompt'๋ฅผ ์‚ฌ์šฉํ•ด์„œ ์ˆœ์ฐจ์ ์œผ๋กœ thought ์ œ์•ˆํ•จ. ์ด ์ž‘์—…์€ thought process๊ฐ€ ๋”์šฑ ์ œํ•œ๋  ๋•Œ ์ž˜ ๋ผ์„œ, ๋˜‘๊ฐ™์€ context์—์„œ ์„œ๋กœ ๋‹ค๋ฅธ  thought๋ฅผ ์ œ์•ˆํ•˜๋Š” ๊ฒƒ์€ ๋ณต์ œ๋ฅผ ํ”ผํ•˜๊ฒŒ ํ•ด ์คŒ.

 

3. State evaluator $V(p_{\theta|, S)$. ์„œ๋กœ ๋‹ค๋ฅธ state์˜ ๋ณ€๊ฒฝ์ด ์ฃผ์–ด์ง€๋ฉด state evaluator๋Š” progress๋ฅผ ํ‰๊ฐ€ํ•œ๋‹ค. ํœด๋ฆฌ์Šคํ‹ฑ์€ ๊ฒ€์ƒ‰ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ์ผ๋ฐ˜์ ์ธ ๋ฐฉ๋ฒ•์ด์—ˆ์ง€๋งŒ, ์ด๋“ค์€ ์ผ๋ฐ˜์ ์œผ๋กœ ํ”„๋กœ๊ทธ๋ž˜๋ฐ๋˜์–ด ์žˆ๊ฑฐ๋‚˜ ํ•™์Šต๋˜์–ด ์žˆ๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” ๊ณ„ํš์ ์œผ๋กœ state๋ฅผ ์ถ”๋ก ํ•˜๊ธฐ ์œ„ํ•ด LM์„ ์‚ฌ์šฉํ•˜๋Š” ์„ธ ๋ฒˆ์งธ ๋Œ€์•ˆ์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ ์šฉ ์‹œ์— ์ด๋Ÿฌํ•œ ๊ณ„ํš์  ํœด๋ฆฌ์Šคํ‹ฑ์€ ํ”„๋กœ๊ทธ๋ž˜๋ฐ๋œ ๊ทœ์น™๋ณด๋‹ค ์œ ์—ฐํ•˜๊ณ  ํ•™์Šต๋œ ๋ชจ๋ธ๋ณด๋‹ค ์ƒ˜ํ”Œ ํšจ์œจ์ ์ด๋‹ค. thought generator์™€ ์œ ์‚ฌํ•˜๊ฒŒ ๋…ผ๋ฌธ์—์„œ๋Š” state๋ฅผ ๋…๋ฆฝ์ ์œผ๋กœ ๋˜๋Š” ํ•จ๊ป˜ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•œ ๋‘ ๊ฐ€์ง€ ์ „๋žต์„ ์ œ์•ˆํ•˜์˜€๋‹ค:

 

  • Value: ๊ฐ state๋ฅผ ๋…๋ฆฝ์ ์œผ๋กœ - ์—ฌ๊ธฐ์„œ value prompt๋Š” ์Šค์นผ๋ผ ๊ฐ’(1-10) ๋˜๋Š” ํœด๋ฆฌ์Šคํ‹ฑ ํ•˜๊ฒŒ ๊ฐ’์œผ๋กœ ๋ณ€ํ™˜๋  ์ˆ˜ ์žˆ๋Š” ๋ถ„๋ฅ˜๋ฅผ ์ƒ์„ฑํ•ด ๋‚ด๊ธฐ ์œ„ํ•ด state $s$์— ๋Œ€ํ•ด ์ถ”๋ก ํ•œ๋‹ค. ์ด ์ž‘์—…์—์„œ๋Š” ์†Œ์ˆ˜์˜ lookahead ์‹œ๋ฎฌ๋ ˆ์ด์…˜ + commonsense๋ฅผ ํ†ตํ•ด ํ‰๊ฐ€๋ฅผ ํƒ๊ตฌํ•˜์˜€๋‹ค. ์—ฌ๊ธฐ์„œ ์ „์ž(์†Œ์ˆ˜์˜ lookahead ์‹œ๋ฎฌ๋ ˆ์ด์…˜)๋Š” 'good' state๋ฅผ ์ด‰์ง„ํ•˜๊ณ , ํ›„์ž(commonsense)๋Š” 'bad' state๋ฅผ ์ œ๊ฑฐํ•˜๋Š” ๊ฒƒ์„ ๋„์™€์ค€๋‹ค. ์ด๋Ÿฌํ•œ valuation์€ ์™„๋ฒฝํ•  ํ•„์š”๋Š” ์—†๊ณ , ๊ทผ์‚ฌ ์ •๋„๋งŒ ๋˜๋ฉด ๋œ๋‹ค.
  • Vote: state ๊ฐ„์— - ์—ฌ๊ธฐ์„œ 'good' state๋Š” vote prompt์—์„œ ์„œ๋กœ ๋‹ค๋ฅธ state $S$๋ฅผ ๊ณ„ํš์ ์œผ๋กœ ๋น„๊ตํ•œ ๊ฒƒ์— ๊ธฐ๋ฐ˜ํ•ด์„œ ํˆฌํ‘œ๋œ๋‹ค. problem success๋ฅผ ์ง์ ‘์ ์ธ ๊ฐ’์œผ๋กœ ๋‚˜ํƒ€๋‚ด๊ธฐ ํž˜๋“ค ๋•Œ๋Š” ์„œ๋กœ ๋‹ค๋ฅธ ๋ถ€๋ถ„์  ์†”๋ฃจ์…˜์„ ๋น„๊ตํ•˜๊ณ  ๊ฐ€์žฅ ์œ ๋งํ•œ ๊ฒƒ์— ํˆฌํ‘œ๋ฅผ ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ํ•œ๋‹ค. 

 

4. Search algorithm.  ๋…ผ๋ฌธ์—์„œ๋Š” ๋น„๊ต์  ๊ฐ„๋‹จํ•œ 2๊ฐœ์˜ ๊ฒ€์ƒ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๋Œ€ํ•ด ํƒ๊ตฌํ•˜๊ณ  ๋”์šฑ ๋ฐœ์ „๋œ ํ•˜๋‚˜๋ฅผ ์†Œ๊ฐœํ•œ๋‹ค:

 

  • Breadth-first search(BFS) (Alrorithm 1)์€ ์Šคํ… ๋‹น ๊ฐ€์žฅ ์œ ๋งํ•œ $b$๊ฐœ์˜ state ์„ธํŠธ๋ฅผ ์œ ์ง€์‹œํ‚จ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ Game of 24์— ์‚ฌ์šฉ๋˜๊ณ , ์—ฌ๊ธฐ์„œ tree depth๋Š” ์ œํ•œ๋˜์–ด ์žˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ดˆ๊ธฐ์˜ thought step์ด ํ‰๊ฐ€๋  ์ˆ˜ ์žˆ๊ณ , small set์œผ๋กœ ๊ฐ€์ง€์น˜๊ธฐ๊ฐ€ ๋  ์ˆ˜ ์žˆ๋‹ค.
  • Depth-first search(DFS) (Algorithm 2)์€ final output์— ๋„๋‹ฌํ•˜๊ธฐ ์ „๊นŒ์ง€๋‚˜ ํ˜„์žฌ state $s$์—์„œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ๊ฐ€ ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค๊ณ  ๋ณด์ผ ๋•Œ ๊ฐ€์žฅ ์œ ๋งํ•œ state๋ฅผ ํƒ๊ตฌํ•ด ๋‚˜๊ฐ„๋‹ค. ํ›„์ž์˜ ๊ฒฝ์šฐ์— $s$์˜ ํ•˜์œ„ ํŠธ๋ฆฌ๋Š” ์ฐฉ์ทจ๋ฅผ ์œ„ํ•ด ํƒ์ƒ‰์„ ๊ตํ™˜ํ•˜๊ธฐ ์œ„ํ•ด ๊ฐ€์ง€์น˜๊ธฐ๋œ๋‹ค. 2๊ฐœ์˜ ๊ฒฝ์šฐ์—์„œ ํƒ๊ตฌ๋ฅผ ๊ณ„์†ํ•˜๊ธฐ ์œ„ํ•ด DFS๋Š” $s$์˜ parent state๋ฅผ backtrack ํ•œ๋‹ค.

 

 ToT๋Š” general problem-solving method๋กœ์จ ์—ฌ๋Ÿฌ ์ด์ต์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค.

 

  1. Generality: IO, CoT, CoT-SC, self-refinement๋Š” ToT์˜ ์ŠคํŽ˜์…œ ์ผ€์ด์Šค๋กœ ๋ณผ ์ˆ˜ ์žˆ์Œ
  2. Modularity: base LM ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋ถ„ํ•ด, ์ƒ์„ฑ, ํ‰๊ฐ€, ๊ฒ€์ƒ‰ ํ”„๋กœ์‹œ์ €๋Š” ๋ชจ๋‘ ๋…๋ฆฝ์ ์œผ๋กœ ํ‰๊ฐ€๋  ์ˆ˜ ์žˆ์Œ
  3. Adaptability: ์„œ๋กœ ๋‹ค๋ฅธ ๋ฌธ์ œ ํŠน์„ฑ, LM ๋Šฅ๋ ฅ, ์ž์› ์ œ์•ฝ ๋“ฑ์„ ์ˆ˜์šฉํ•  ์ˆ˜ ์žˆ์Œ
  4. Convenience: ์ถ”๊ฐ€์ ์ธ ํ•™์Šต์ด ํ•„์š”ํ•˜์ง€ ์•Š๊ณ , ๋‹จ์ง€ PLM์ด๋ฉด ์ถฉ๋ถ„ํ•จ

 

4. Experiments

 ๋…ผ๋ฌธ์—์„œ๋Š” ๊ธฐ์กด์˜ SoTA LM์œผ๋กœ๋ถ€ํ„ฐ ์ƒ˜ํ”Œ๋ง์„ ํ•  ๋•Œ์—๋„ ์–ด๋ ค์šด 3๊ฐœ์˜ task๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค: Game of 24, Creative Writing, 5x5 Crosswords. 

 

ํ‘œ 1. Task overview. Input, output, thought example์€ ํŒŒ๋ž€์ƒ‰

 

4-1. Game of 24

 

 Game of 24๋Š” 4๊ฐœ์˜ ์ˆซ์ž๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, ์‚ฌ์น™์—ฐ์‚ฐ์„ ์‚ฌ์šฉํ•˜์—ฌ 24๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋Š”์ง€ ์—†๋Š”์ง€ ํŒ๋‹จํ•˜๋Š” task์ด๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด '4 9 10 13'์ด ์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง€๋ฉด, output์€ '(10 - 4) * (13 - 9) = 24'์ธ ๊ฒƒ์ด๋‹ค.

 

๊ทธ๋ฆผ 2. Game of 24์—์„œ์˜ ToT. LM์€ (a) thought ์ƒ์„ฑ๊ณผ (b) valuation์„ ์œ„ํ•ด prompt๋œ๋‹ค

 

 Baselines.  ๋…ผ๋ฌธ์—์„œ๋Š” 5๊ฐœ์˜ in-context example๊ณผ ํ•จ๊ป˜ input-output(IO) prompt๋ฅผ ์‚ฌ์šฉํ•˜์˜€๋‹ค. chain-of-thought(CoT) prompting์„ ์œ„ํ•ด ๊ฐ input-output ์Œ์„ 3๊ฐœ์˜ ์ค‘๊ฐ„ ๋ฐฉ์ •์‹์œผ๋กœ augment ํ•˜๊ณ , ๊ฐ ๋ฐฉ์ •์‹์€ ๋‚˜๋จธ์ง€ 2๊ฐœ์˜ ์ˆซ์ž์— ๋Œ€ํ•ด ์ž‘๋™ํ•˜๊ฒŒ ํ•˜์˜€๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด input์œผ๋กœ ' 4 9 10 13'์ด ์ฃผ์–ด์ง€๋ฉด, thought๋Š” "13 - 9 = 4(๋‚จ์€ ์ˆ˜: 4, 4, 10); 10 - 4 = 6(๋‚จ์€ ์ˆ˜: 4, 6); 4 * 6 = 24(๋‚จ์€ ์ˆ˜: 24)"๊ฐ€ ๋  ์ˆ˜ ์žˆ๋‹ค. ๋˜ํ•œ CoT self-consistency baseline์™€ IO sample ์œ„์—์„œ ์ตœ๋Œ€ 10๋ฒˆ์˜ ๋ฐ˜๋ณต์„ ํ•˜๋Š” iterative-refine approach๋ฅผ ๊ณ ๋ คํ•˜์˜€๋‹ค. ๊ฐ ๋ฐ˜๋ณต์—์„œ LM์€ ์ถœ๋ ฅ์ด ์˜ฌ๋ฐ”๋ฅด์ง€ ์•Š์€ ๊ฒฝ์šฐ "์‹ค์ˆ˜๋ฅผ ๋ฐ˜์„ฑํ•˜๊ณ  ๊ฐœ์„ ๋œ ๋‹ต๋ณ€์„ ์ƒ์„ฑ"ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋“  ์ด์ „ ๊ธฐ๋ก์„ ๋งž์ถ˜๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋ฐฉ์ •์‹ ์ •ํ™•์„ฑ์— ๋Œ€ํ•œ groundtruth ํ”ผ๋“œ๋ฐฑ ์‹ ํ˜ธ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

 

ToT Setup.  Game of 24๋ฅผ ToT์— ๋งž์ถ”๊ธฐ ์œ„ํ•ด, thought๋ฅผ ๊ฐ๊ฐ์ด ์ค‘๊ฐ„ ๋ฐฉ์ •์‹์ธ 3๊ฐœ์˜ step์œผ๋กœ ๋ถ„ํ•ดํ•˜๋Š” ๊ฒƒ์ด ์ž์—ฐ์Šค๋Ÿฝ๋‹ค. ๊ทธ๋ฆผ 2(a)์—์„œ ๋ณด์ด๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ฐ ํŠธ๋ฆฌ์˜ ๋…ธ๋“œ์—์„œ ๋‚จ์•„์žˆ๋Š” ์ˆซ์ž๋ฅผ ์š”๊ตฌํ•˜๊ณ  LM์ด ๊ฐ€๋Šฅํ•œ ๋‹ค์Œ ์Šคํ…์„ ์ œ์•ˆํ•˜๋„๋ก ์ด‰๊ตฌํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ToT์—์„œ BFS๋ฅผ ์ˆ˜ํ–‰ํ•˜์˜€๋Š”๋ฐ, ๊ฐ ์Šคํ…์—์„œ ์ตœ๊ณ ์˜ $b=5$ ํ›„๋ณด๋ฅผ ์œ ์ง€ํ•˜์˜€๋‹ค. ๊ทธ๋ฆผ 2(b)์—์„œ ๋ณด์ด๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ToT์—์„œ ๊ณ„ํš์ ์ธ BFS๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ๊ฐ thought ํ›„๋ณด๊ฐ€ 24์— ๋„๋‹ฌํ•˜๋Š”์ง€์— ๋”ฐ๋ผ ํ‰๊ฐ€ํ•˜๋„๋ก LM์„ ์ด‰๊ตฌํ•˜์˜€๋‹ค. ๋ชฉํ‘œ๋Š” ๋ช‡ ๊ฐœ์˜ lookahead ์‹คํ—˜ ๊ฐ„์— ํŒ๋‹จ๋  ์ˆ˜ ์žˆ๋Š” ์•Œ๋งž์€ ๋ถ€๋ถ„์  solution์„ ์ด‰์ง„ํ•˜๊ณ  'too big/small' commonsense์— ๊ธฐ๋ฐ˜ํ•œ ๋ถˆ๊ฐ€๋Šฅํ•œ ๋ถ€๋ถ„์  solution์„ ์ œ๊ฑฐํ•˜๊ณ  'maybe'๋Š” ์œ ์ง€์‹œํ‚ค๋Š” ๊ฒƒ์ด๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋…ผ๋ฌธ์—์„œ๋Š” ๊ฐ thought์— ๋Œ€ํ•ด 3๋ฒˆ value๋ฅผ ์ƒ˜ํ”Œ๋งํ•˜์˜€๋‹ค.

 

Results.  ํ‘œ 2์—์„œ ๋ณด์ด๋Š” ๊ฒƒ์ฒ˜๋Ÿผ IO, CoT, CoT-SC prompting method๋Š” task์—์„œ ๊ฒจ์šฐ 7.3%, 4.0%, 9.0%์˜ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ค„ ์ •๋„๋กœ ์•ˆ ์ข‹์€ ๋ชจ์Šต์„ ๋ณด์—ฌ์ค€๋‹ค. ์ด์™€๋Š” ๋ฐ˜๋Œ€๋กœ ํญ์ด $b=1$์ธ ToT๋Š” ์ด๋ฏธ 45%์˜ ์„ฑ๊ณต๋ฅ ์„ ๊ฐ€์ง€๊ณ , $b=5$์˜ ๊ฒฝ์šฐ์— 74%์˜ ์„ฑ๊ณต๋ฅ ์„ ๊ฐ€์ง„๋‹ค.

 

Error Analysis.  ๊ทธ๋ฆผ 3(b)์€ CoT์™€ ToT์—์„œ task์— ๋Œ€ํ•ด ์‹คํŒจํ•œ ์ƒ˜ํ”Œ์„ ๋ถ„ํ•ดํ•ด ๋ณธ๋‹ค. ํŠนํžˆ CoT ์ƒ˜ํ”Œ์˜ ์•ฝ 60%๋Š” ์ฒซ ๋ฒˆ์งธ ๋‹จ๊ณ„ ๋˜๋Š” ๋™๋“ฑํ•˜๊ฒŒ ์ฒ˜์Œ ์„ธ ๋‹จ์–ด๋ฅผ ์ƒ์„ฑํ•œ ํ›„ ์ด๋ฏธ task์— ๋Œ€ํ•ด ์‹คํŒจํ•˜์˜€๋‹ค. ์ด๋Š” left-to-right decoding์˜ ๋ฌธ์ œ์ ์„ ๊ฐ•์กฐํ•œ๋‹ค.

 

ํ‘œ 2. Game of 24 ๊ฒฐ๊ณผ / ๊ทธ๋ฆผ 3. Game of 24์˜ (a)๊ทœ๋ชจ ๋ถ„์„๊ณผ (b)์—๋Ÿฌ ๋ถ„์„

 

4-2. Creative Writing

 

 ๋…ผ๋ฌธ์—์„œ๋Š” 4๊ฐœ์˜ ๋žœ๋ค ํ•œ ๋ฌธ์žฅ์ด input์œผ๋กœ ๋“ค์–ด์˜ค๊ณ  ์ด ๋ฌธ์žฅ์œผ๋กœ ๋๋‚˜๋Š” ์ผ๊ด€๋œ 4๊ฐœ์˜ ๋ฌธ๋‹จ์„ ์ž‘์„ฑํ•˜๋Š” task๋ฅผ ๊ณ ์•ˆํ•˜์˜€๋‹ค. ์ด task๋Š” ์ฐฝ์˜์ ์ธ ์ƒ๊ฐ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋†’์€ ์ˆ˜์ค€์˜ ๊ณ„ํš์„ ํ•„์š”๋กœ ํ•œ๋‹ค.

 

Baselines.  ๋…ผ๋ฌธ์—์„œ๋Š” task ๋‹น 10๊ฐœ์˜ IO & CoT ์ƒ˜ํ”Œ์„ ์ƒ์„ฑํ•˜์˜€๋‹ค. ๋˜ํ•œ ๊ฐ task์— ๋Œ€ํ•ด ๋žœ๋คํ•œ IO ์ƒ˜ํ”Œ์˜ ์œ„์— iterative-refine method๋ฅผ ๊ณ ๋ คํ•˜์˜€๋Š”๋ฐ, ์—ฌ๊ธฐ์„œ LM์€ ์ž…๋ ฅ ์ œ์•ฝ์— ๋งž์ถฐ์ง€๊ณ , ๋งˆ์ง€๋ง‰์œผ๋กœ ์ƒ์„ฑ๋œ ๊ตฌ์ ˆ์ด ์ด๋ฏธ "์™„๋ฒฝํ•˜๊ฒŒ ์ผ๊ด€๋œ"์ง€ ์—ฌ๋ถ€๋ฅผ ๊ฒฐ์ •ํ•˜๊ณ  ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ ๊ฐœ์„ ๋œ ๊ตฌ์ ˆ์„ ์ƒ์„ฑํ•œ๋‹ค.

 

ToT Setup.  ๋…ผ๋ฌธ์—์„œ๋Š” ์˜ค์ง ํ•˜๋‚˜์˜ ์ค‘๊ฐ„ thought step์„ ๊ฐ€์ง€๋Š” ๊นŠ์ด๊ฐ€ 2์ธ ToT๋ฅผ ๋งŒ๋“ค์—ˆ๋‹ค - LM์€ ์ฒ˜์Œ์— $k=5$๊ฐœ์˜ ๊ณ„ํš์„ ์ƒ์„ฑํ•˜๊ณ  ์ตœ๊ณ ์˜ ํ•˜๋‚˜์— ๋Œ€ํ•ด ํˆฌํ‘œํ•œ๋‹ค(๊ทธ๋ฆผ 4). ๊ทธ๋‹ค์Œ์— ์ด์™€ ์œ ์‚ฌํ•˜๊ฒŒ ์ตœ๊ณ ์˜ ํ•˜๋‚˜๋กœ ๋ฝ‘ํžŒ ๊ณ„ํš์— ๊ธฐ๋ฐ˜ํ•ด์„œ $k=5$ ๊ฐœ์˜ ๊ตฌ์ ˆ์„ ๋งŒ๋“ ๋‹ค. ์—ฌ๊ธฐ์—์„œ ๋‹จ๊ณ„๋‹น ํ•˜๋‚˜์˜ ์„ ํƒ๋งŒ ์œ ์ง€๋˜๋ฏ€๋กœ ํญ ์ œํ•œ์€ $b = 1$์ด๋‹ค.

 

๊ทธ๋ฆผ 4. ๋žœ๋คํ•˜๊ฒŒ ๋ฝ‘ํžŒ Creative Writing task์—์„œ ๊ณ„ํš์ ์ธ ๊ฒ€์ƒ‰์˜ ์Šคํ…

 

Results.  ๊ทธ๋ฆผ 5(a)๋Š” 100๊ฐœ์˜ task์— ๋Œ€ํ•ด GPT-4์˜ ํ‰๊ท  score๋ฅผ ๋ณด์—ฌ์ค€๋‹ค. ์—ฌ๊ธฐ์„œ ToT(7.56)์€ IO(6.19)์™€ CoT(6.93) ๋ณด๋‹ค ํ‰๊ท ์ ์œผ๋กœ ๋” ์ผ๊ด€์ ์ธ ๊ตฌ์ ˆ์„ ๋งŒ๋“œ๋Š” ๊ฒƒ์œผ๋กœ ์—ฌ๊ฒจ์ง„๋‹ค. ์ด๋Ÿฌํ•œ automatic metric์€ ์žก์Œ์ด ์„ž์ผ ์ˆ˜๋„ ์žˆ์ง€๋งŒ, ๊ทธ๋ฆผ 5(b)๋Š” human preference ๋˜ํ•œ CoT๋ณด๋‹ค ToT๋ฅผ ๋” ์„ ํ˜ธํ•˜๋Š” ๊ฒƒ์œผ๋กœ ๋ณด์•„ ์ด ๋ฐœ๊ฒฌ์„ ์ž…์ฆํ•ด ์ค€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, iterative-refine์€ ์ด ์ž์—ฐ์–ด task์—์„œ ๋”์šฑ ํšจ๊ณผ์ ์ธ๋ฐ, ์—ฌ๊ธฐ์„œ IO ์ผ๊ด€์„ฑ์€ 6.19์—์„œ 7.67๋กœ ๊ฐœ์„ ๋˜๊ณ , ToT๋Š” 7.56์—์„œ 7.91๋กœ ๊ฐœ์„ ๋œ๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” ์ด ๋ฐฉ๋ฒ•์ด ToT ํ”„๋ ˆ์ž„์›Œํฌ์—์„œ thought ์ƒ์„ฑ์„ ์œ„ํ•œ 3๋ฒˆ์งธ ๋ฐฉ๋ฒ•์ด ๋  ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋ผ ๋ฏฟ๋Š”๋‹ค.

 

๊ทธ๋ฆผ 5. Creative Writing ๊ฒฐ๊ณผ

 

4-3. Mini Crosswords

 

 Game of 24์™€ Creative Writing์—์„œ, ToT๋Š” ๋น„๊ต์  ์–•์€ ํ™œ์•ฝ์ƒ์„ ๋ณด์—ฌ์ค€๋‹ค - ์ตœ์ข… output์— ๋„๋‹ฌํ•˜๊ธฐ ์œ„ํ•ด ์ตœ๋Œ€ 3๊ฐœ์˜ thought step์„ ํ•„์š”๋กœ ํ•œ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” 5x5 mini crossword๋ฅผ ์ž์—ฐ์–ด๋ฅผ ํฌํ•จํ•˜๋Š” ๋” ์–ด๋ ค์šด ๊ฒ€์ƒ‰ ๋ฌธ์ œ๋กœ ํƒ๊ตฌํ•ด๋ณด๋ ค ํ•œ๋‹ค. ๋‹ค์‹œ ๋งํ•ด ๋ชฉํ‘œ๊ฐ€ ๋‹จ์ง€ task๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, LM์ด ์ž์‹ ์˜ ์ƒ๊ฐ์„ ํƒ๊ตฌํ•˜๊ณ  ํœด๋ฆฌ์Šคํ‹ฑ์œผ๋กœ ์˜๋„์  ์ถ”๋ก ์„ ํ†ตํ•ด ์ž์‹ ์˜ ํƒ๊ตฌ๋ฅผ ๊ฐ€์ด๋“œํ•˜๋Š” ์ผ๋ฐ˜์ ์ธ problem solver๋กœ์จ์˜ ํ•œ๊ณ„๋ฅผ ํƒ๊ตฌํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค.

 

Baselines.  ๋…ผ๋ฌธ์—์„œ๋Š” IO prompt์—์„œ 5๊ฐœ์˜ example input-output ์Œ์„ ์ œ๊ณตํ•˜๊ณ , CoT prompt๋Š” ์ถ”๊ฐ€์ ์œผ๋กœ ์ค‘๊ฐ„ ๋‹จ์–ด๋ฅผ ํฌํ•จํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  10๊ฐœ์˜ ์ƒ˜ํ”Œ์— ๋Œ€ํ•œ ๊ฐ prompt๋ฅผ ์‹คํ–‰ํ•˜๊ณ  ๊ฒฐ๊ณผ๋ฅผ ํ‰๊ท  ๋‚ด์—ˆ๋‹ค.

 

ToT Setup.  ๋…ผ๋ฌธ์—์„œ๋Š” state๊ฐ€ ๋” ์ด์ƒ ์œ ๋งํ•˜์ง€ ์•Š์„ ๋•Œ๊นŒ์ง€ ๊ฐ€์žฅ ์œ ๋งํ•œ ํ›„์† ๋‹จ์–ด ๋‹จ์„œ๋ฅผ ๊ณ„์† ํƒ์ƒ‰ํ•œ ๋‹ค์Œ parent state๋กœ ๋˜๋Œ์•„๊ฐ€ ๋Œ€์ฒด thought๋ฅผ ํƒ์ƒ‰ํ•˜๋Š” DFS(Algorithm 2)์„ ํ™œ์šฉํ•œ๋‹ค. ๊ฒ€์ƒ‰์„ ๋‹ค๋ฃจ๊ธฐ ์‰ฝ๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด ํ›„์† thought๋Š” ์ฑ„์›Œ์ง„ ๋‹จ์–ด๋‚˜ ๋ฌธ์ž๋ฅผ ๋ณ€๊ฒฝํ•˜์ง€ ์•Š๋„๋ก ์ œํ•œ๋˜๋ฏ€๋กœ ToT์—๋Š” ์ตœ๋Œ€ 10๊ฐœ์˜ ์ค‘๊ฐ„ ๋‹จ๊ณ„๊ฐ€ ์žˆ๋‹ค. thought ์ƒ์„ฑ์„ ์œ„ํ•ด ๊ฐ state์—์„œ ๋ชจ๋“  ๊ธฐ์กด thought๋ฅผ ๋‚˜๋จธ์ง€ ๋‹จ์„œ, ๊ทธ๋ฆฌ๊ณ  ๋‹ค์Œ ๋‹จ์–ด๋ฅผ ์–ด๋””์— ๋ฌด์—‡์„ ์ฑ„์šธ ๊ฒƒ์ธ์ง€์— ๋Œ€ํ•œ ํ›„๋ณด๋ฅผ ์ œ์‹œํ•˜๊ธฐ ์œ„ํ•ด propose prompt๋ฅผ 5๋ฒˆ ์ด‰๋ฐœํ•œ๋‹ค. ๋˜ํ•œ LM์ด ๋‹ค์–‘ํ•œ thought์— ๋Œ€ํ•œ ์‹ ๋ขฐ ์ˆ˜์ค€์„ ์ œ๊ณตํ•˜๊ณ  ํƒ์ƒ‰ํ•  ๋‹ค์Œ ์ƒ๊ฐ์˜ ์ •๋ ฌ๋œ ๋ชฉ๋ก์„ ์–ป๊ธฐ ์œ„ํ•ด ์ œ์•ˆ ์ „์ฒด์—์„œ ์ด๋ฅผ ์ง‘๊ณ„ํ•˜๋„๋ก ์œ ๋„ํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค(๊ทธ๋ฆผ 6(a)). state ํ‰๊ฐ€์˜ ๊ฒฝ์šฐ ๊ฐ state๋ฅผ ๋‚˜๋จธ์ง€ ๋‹จ์„œ์— ๋Œ€ํ•œ ๋ฌธ์ž ์ œ์•ฝ ์กฐ๊ฑด์œผ๋กœ ์œ ์‚ฌํ•˜๊ฒŒ ๋ณ€ํ™˜ํ•œ ๋‹ค์Œ ์ฃผ์–ด์ง„ ์ œ์•ฝ ์กฐ๊ฑด์„ ์ฑ„์šธ ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ ๊ฐ ๋‹จ์„œ์— ๋Œ€ํ•ด ํ‰๊ฐ€ํ•œ๋‹ค. ๋‚˜๋จธ์ง€ ๋‹จ์„œ๊ฐ€ ์ฑ„์šธ ์ˆ˜ ์—†๋Š” ๊ฒƒ์œผ๋กœ ๊ฐ„์ฃผ๋˜๋ฉด state์˜ ํ•˜์œ„ ํŠธ๋ฆฌ ํƒ์ƒ‰์ด ๊ฐ€์ง€์น˜๊ธฐ๋˜๊ณ  DFS๋Š” ๋‹ค์Œ ์œ ๋งํ•œ thought๋ฅผ ํƒ์ƒ‰ํ•˜๊ธฐ ์œ„ํ•ด ๋ถ€๋ชจ๋กœ ์—ญ์ถ”์ ํ•œ๋‹ค. ์šฐ๋ฆฌ๋Š” DFS ๊ฒ€์ƒ‰ ๋‹จ๊ณ„๋ฅผ 100์œผ๋กœ ์ œํ•œํ•˜๊ณ  ๊ฐ€์žฅ ๊นŠ์ด ํƒ์ƒ‰๋œ state(์—ฌ๋Ÿฌ ๊ฐœ๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ ์ฒซ ๋ฒˆ์งธ ํƒ์ƒ‰๋œ state)๋ฅผ ์ตœ์ข… ์ถœ๋ ฅ์œผ๋กœ ๊ฐ„๋‹จํžˆ ๋ Œ๋”๋ง ํ•œ๋‹ค.

 

๊ทธ๋ฆผ 6. Mini Crossword์—์„œ (a) DFS๋ฅผ ์œ„ํ•œ ์šฐ์„ ์ˆœ์œ„ ํ์—์„œ thought๋“ค์ด ์ œ์•ˆ๋˜๊ณ  ๋ชจ์ด๋Š”์ง€์™€ (b) ๋‚จ์€ ๊ฐ ๋‹จ์–ด ๋‹จ์„œ๋ฅผ ์ฑ„์šธ ๊ฐ€๋Šฅ์„ฑ์— ๋”ฐ๋ผ state๋ฅผ ํ‰๊ฐ€ํ•˜๊ณ  ๋‚˜๋จธ์ง€ ๋‹จ์„œ๊ฐ€ LM์—์„œ ์ฑ„์šธ ์ˆ˜ ์—†๋‹ค๊ณ  ํŒ๋‹จ๋˜๋Š” ๊ฒฝ์šฐ ์–ด๋–ป๊ฒŒ ๊ฐ€์ง€์น˜๊ธฐํ•˜๋Š”์ง€

 

Results.  ํ‘œ 3์—์„œ ๋ณด์ด๋Š” ๊ฒƒ์ฒ˜๋Ÿผ, IO์™€ CoT prompting method๋Š” word-level ์„ฑ๊ณต๋ฅ ์ด 16%๋ณด๋‹ค ๋‚ฎ์ง€๋งŒ, ToT๋Š” ๋ชจ๋“  metric์—์„œ ๊ฐœ์„ ๋œ ๋ชจ์Šต์„ ๋ณด์—ฌ์ฃผ๋ฉด์„œ word-level์—์„œ 60%์˜ ์„ฑ๊ณต๋ฅ ์„ ๋ณด์—ฌ์คฌ๊ณ  20๊ฐœ์˜ ๊ฒŒ์ž„ ์ค‘ 4๊ฐœ๋ฅผ ํ•ด๊ฒฐํ•˜์˜€๋‹ค. ์ด๋Ÿฌํ•œ ๊ฐœ์„ ์€ ๋†€๋ผ์šด ๊ฒƒ์ด ์•„๋‹Œ ๊ฒŒ ์ฃผ์–ด์ง„ IO์™€ CoT๋Š” ์„œ๋กœ ๋‹ค๋ฅธ ๋‹จ์„œ, ๊ฒฐ์ •์— ๋ณ€ํ™”์ฃผ๊ธฐ, backtrack์„ ์‹œ๋„ํ•  ๋งŒํ•œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์ด ๋ถ€์กฑํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

 

ํ‘œ 3. Mini Crossword ๊ฒฐ๊ณผ

 

Oracle and ablation studies.  task ๋‹น oracle best DFS state๋กœ๋ถ€ํ„ฐ ์ถœ๋ ฅํ•  ๋•Œ, ToT ์„ฑ๋Šฅ์€ ๋†’์•„์ง€๊ณ  7๊ฐœ์˜ ๊ฒŒ์ž„์„ ํ’€์–ด๋ฒ„๋ฆฐ๋‹ค. ์ด๊ฒƒ์€ ๋…ผ๋ฌธ์˜ ๊ฐ„๋‹จํ•œ output ํœด๋ฆฌ์Šคํ‹ฑ์ด ๊ฐœ์„ ๋˜์—ˆ๋‹ค๋Š” ๊ฒƒ์„ ๊ฐ€๋ฆฌํ‚จ๋‹ค. 

 

 

5. Discussion

Limitations and future directions.  ToT๋Š” ์ด๋ฏธ GPT-4๊ฐ€ ์ถฉ๋ถ„ํžˆ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๋Š” task์—์„œ ๋ณ„๋กœ ํ•„์š”ํ•˜์ง€๊ฐ€ ์•Š๋‹ค. ๊ทธ๋ž˜์„œ ์ด ๋…ผ๋ฌธ์—์„œ๋Š” ๊ธฐ์กด์˜ LM๋“ค์ด ์–ด๋ ค์›€์„ ๊ฒช๋˜ 3๊ฐœ์˜ task์— ๋Œ€ํ•ด์„œ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜์˜€๋Š”๋ฐ ์ข‹์€ ๋ชจ์Šต์„ ๋ณด์—ฌ์ฃผ์—ˆ๋‹ค. ํ•˜์ง€๋งŒ ์•ž์œผ๋กœ LM์€ real-world์—์„œ ์‚ฌ์šฉ๋  ๊ฒƒ์ด๊ธฐ์— ๋”์šฑ ๋ณต์žกํ•œ task๋ฅผ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ๋Šฅ๋ ฅ์ด ํ•„์š”ํ•˜๋‹ค. ๋˜ํ•œ ToT ๊ฐ™์€ ๊ฒ€์ƒ‰ method๋Š” task ์„ฑ๋Šฅ์„ ํ–ฅ์ƒํ•˜๊ธฐ ์œ„ํ•ด ์ผ๋ฐ˜ ์ƒ˜ํ”Œ๋ง method๋ณด๋‹ค ๋” ๋งŽ์€ ์ž์›์„ ํ•„์š”๋กœ ํ•˜์ง€๋งŒ, ToT์˜ modular ์œตํ†ต์„ฑ์€ ์‚ฌ์šฉ์ž๋“ค์ด ์ด๋Ÿฌํ•œ performance-cost tradeoff๋ฅผ ์ปค์Šคํ„ฐ๋งˆ์ด์ง• ํ•˜๋„๋ก ํ—ˆ๋ฝํ•ด ์ฃผ๊ณ  ๊ณ„์† ์ง„ํ–‰๋˜๋Š” open-sourceํ™”๋Š” ๊ฐ€๊นŒ์šด ๋ฏธ๋ž˜์— ์ด๋Ÿฌํ•œ ๋น„์šฉ์„ ์ˆœ์กฐ๋กญ๊ฒŒ ์ค„์—ฌ๋‚˜๊ฐˆ ๊ฒƒ์ด๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ์ด ๋…ผ๋ฌธ์—์„œ๋Š” off-the-shelf LM์„ ์‚ฌ์šฉํ•˜๋Š”๋ฐ ์ง‘์ค‘ํ•˜์˜€๊ณ , ToT-style์˜ high-level counterfactual decision making์„ ์‚ฌ์šฉํ•ด์„œ LM์„ fine-tune ํ•˜๋Š” ๊ฒƒ์€ LM์˜ ๋ฌธ์ œ ํ•ด๊ฒฐ ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๊ธฐํšŒ๋ฅผ ์ค„ ๊ฒƒ์ด๋‹ค.

 

 

 

 

์ถœ์ฒ˜

https://arxiv.org/abs/2305.10601

 

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require

arxiv.org

https://betterprogramming.pub/tree-of-thoughts-an-improvement-of-chain-of-thoughts-paper-review-7c52171602bd

 

Tree of Thoughts: An Improvement of Chain of Thoughts (Paper Review)

Review paper: a new prompting method called โ€œTree of Thoughtsโ€

betterprogramming.pub

 

'Paper Reading ๐Ÿ“œ > Natural Language Processing' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

Why can GPT learn in-context? ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ  (0) 2023.06.12
LMSI: Large Language Models can Self-Improve ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ  (0) 2023.06.09
Instruction Tuning with GPT-4 ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ  (0) 2023.06.07
ChatGPT์— ๋ฐ˜๋ณต ๋ฉ”์ปค๋‹ˆ์ฆ˜(LSTM)์„ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด? - RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ  (0) 2023.05.29
LoRA: Low-Rank Adaptation of Large Language Models ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ  (0) 2023.05.26
'Paper Reading ๐Ÿ“œ/Natural Language Processing' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€
  • Why can GPT learn in-context? ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ
  • LMSI: Large Language Models can Self-Improve ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ
  • Instruction Tuning with GPT-4 ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ
  • ChatGPT์— ๋ฐ˜๋ณต ๋ฉ”์ปค๋‹ˆ์ฆ˜(LSTM)์„ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด? - RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ
Cartinoe
Cartinoe
Welcome! I'm a student studying about deep learning(NLP) ๐Ÿ˜‰ The goal of my study is to develop a competent LLM helping people!
  • faviconinstagram
  • faviconfacebook
  • favicongithub
  • faviconLinkedIn
Cartinoe's paper review
Cartinoe
Cartinoe
Cartinoe's paper review
Cartinoe
์ „์ฒด
์˜ค๋Š˜
์–ด์ œ
  • My Posting (141)
    • Paper Reading ๐Ÿ“œ (113)
      • Natural Language Processing (67)
      • Alignment Problem of LLM (11)
      • Computer Vision (4)
      • Deep Learning (6)
      • multimodal models (17)
      • Mathematics(์„ ํ˜•๋Œ€์ˆ˜, ํ™•๋ฅ ๊ณผ ํ†ต๊ณ„, ๋ฏธ.. (8)
    • Lecture ๐Ÿง‘โ€๐Ÿซ (16)
      • Hugging Face Course (1)
      • Coursera (15)
    • Insight ๐Ÿ˜Ž (10)
    • Research & Project ๐Ÿ”ฌ (2)

์ธ๊ธฐ ๊ธ€

์ตœ๊ทผ ๊ธ€

๊ณต์ง€์‚ฌํ•ญ

  • ๋ธ”๋กœ๊ทธ ๊ณต์ง€์‚ฌํ•ญ - ๋ชจ๋ฐ”์ผ ์ˆ˜์‹ ๊นจ์ง

ํƒœ๊ทธ

  • closed-source
  • context length
  • Evaluation Metric
  • open-source model
  • ChatGPT
  • Open-source
  • Vicuna Evaluation
  • proprietary model
  • scaling law
  • MT-Bench
  • RLHF
  • context window
  • GPT-4
  • transformer
  • closed-source model
  • LLAMA2
  • Chinchilla
  • LLM
  • LM
  • Vicuna
hELLO ยท Designed By ์ •์ƒ์šฐ.
Cartinoe
Tree of Thoughts: Deliberate Problem Solving with Large Language Models ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ
์ƒ๋‹จ์œผ๋กœ

ํ‹ฐ์Šคํ† ๋ฆฌํˆด๋ฐ”

๊ฐœ์ธ์ •๋ณด

  • ํ‹ฐ์Šคํ† ๋ฆฌ ํ™ˆ
  • ํฌ๋Ÿผ
  • ๋กœ๊ทธ์ธ

๋‹จ์ถ•ํ‚ค

๋‚ด ๋ธ”๋กœ๊ทธ

๋‚ด ๋ธ”๋กœ๊ทธ - ๊ด€๋ฆฌ์ž ํ™ˆ ์ „ํ™˜
Q
Q
์ƒˆ ๊ธ€ ์“ฐ๊ธฐ
W
W

๋ธ”๋กœ๊ทธ ๊ฒŒ์‹œ๊ธ€

๊ธ€ ์ˆ˜์ • (๊ถŒํ•œ ์žˆ๋Š” ๊ฒฝ์šฐ)
E
E
๋Œ“๊ธ€ ์˜์—ญ์œผ๋กœ ์ด๋™
C
C

๋ชจ๋“  ์˜์—ญ

์ด ํŽ˜์ด์ง€์˜ URL ๋ณต์‚ฌ
S
S
๋งจ ์œ„๋กœ ์ด๋™
T
T
ํ‹ฐ์Šคํ† ๋ฆฌ ํ™ˆ ์ด๋™
H
H
๋‹จ์ถ•ํ‚ค ์•ˆ๋‚ด
Shift + /
โ‡ง + /

* ๋‹จ์ถ•ํ‚ค๋Š” ํ•œ๊ธ€/์˜๋ฌธ ๋Œ€์†Œ๋ฌธ์ž๋กœ ์ด์šฉ ๊ฐ€๋Šฅํ•˜๋ฉฐ, ํ‹ฐ์Šคํ† ๋ฆฌ ๊ธฐ๋ณธ ๋„๋ฉ”์ธ์—์„œ๋งŒ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค.