
What If LMs Could Use Tools? 🔬: A Review of the Paper "Large Language Models as Tool Makers"

Cartinoe 2023. 6. 24. 15:56

The overview of this paper

 Recent research has shown that the problem-solving ability of LLMs can be improved through the use of tools. However, prior work depends heavily on the availability of existing tools. To remove this dependency, the paper proposes LLMs As Tool Makers (LATM), a closed-loop framework in which LLMs create their own reusable tools for problem solving. LATM consists of two main phases: tool making and tool using.

 

 Tool making lets an LLM continually generate tools that can be applied to different requests, so that future requests can call the corresponding API whenever it is judged beneficial for solving a task. Splitting the work into these two phases offers an opportunity to achieve cost efficiency without degrading the quality of the generated tools or of the problem solutions. The paper uses GPT-4 as the tool maker and GPT-3.5 as the tool user.

 

 

Table of Contents

1. Introduction

2. LLM as Tool Maker(LATM)

3. Experiments

 

 

1. Introduction

 Over the course of human evolution, people advanced by building their own tools to solve the problems they encountered. Inspired by the importance of tool-making in human history, the authors apply this evolutionary concept to the realm of LLMs. To that end, the paper proposes LATM, a closed-loop framework that enables LLMs to create their own reusable tools for tackling new tasks. LATM consists of the following two key stages:

 

  1. Tool Making: an LLM known as the tool maker designs a tool for a given task → requires a powerful model
  2. Tool Using: another LLM known as the tool user applies the tool to handle new requests → requires only a lightweight model

 

 This two-stage design lets LATM assign the job at each stage to the most suitable LLM. The approach not only improves the problem-solving ability of LLMs but also substantially reduces the computational cost of solving problems.

 

Figure 1. The closed-loop framework of LATM

 

2. LLM as Tool Maker(LATM)

2-1. Making New Tools and Reuse Them

 

 In the LATM paradigm, the main process can be split into two stages: tool making and tool using. Each stage uses a different type of LLM to balance performance against cost efficiency.

 

Figure 2. The LATM pipeline

 

Tool Making.  This stage uses a powerful but expensive model, such as GPT-4, as the tool maker. The tool maker's role is to create a generic, reusable tool from a few demonstrations of a task. The stage can be divided into three sub-stages:

 

  • Tool Proposing: the tool maker attempts to generate a Python function that solves the given demonstrations of the task. This follows the programming-by-example (PbE) paradigm, in which several demonstrations are provided and the model is asked to write a program that reproduces the demonstrated behavior. The paper's experiments use 3 demonstrations at this stage. If the proposed tool cannot be executed or runs into an error, the tool maker appends the error message to the history and makes another attempt.
  • Tool Verification: unit tests are generated from validation samples and run against the proposed tool. The paper uses 3 validation samples; if the tool fails a test, the tool maker records the error in the history and attempts to fix the issues within the unit tests. In the LATM pipeline, however, the verification stage is used somewhat differently and fulfills two key roles:
    1. It provides examples that show how to convert a natural-language question into a function call.
    2. It verifies the reliability of the tool, which allows the entire process to be fully automated.
  • Tool Wrapping: the tool maker wraps up the function code for the tool user and provides demonstrations of how to convert a task into a function call.
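
The three sub-stages above can be sketched as a single loop. This is a minimal sketch under stated assumptions, not the paper's implementation: `make_tool`, `propose_fn`, the `Demo` type, and the convention that the generated code defines a function called `tool` are all hypothetical, and `propose_fn` stands in for a call to the tool maker LLM.

```python
from typing import Callable, Optional

# Hypothetical type: a demonstration is (positional args, expected output).
Demo = tuple[tuple, object]


def make_tool(
    propose_fn: Callable[[list[Demo], list[str]], str],
    train_demos: list[Demo],
    val_demos: list[Demo],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Propose -> verify -> wrap. Returns a wrapped tool, or None on failure."""
    history: list[str] = []  # error messages fed back to the tool maker
    for _ in range(max_attempts):
        # Tool Proposing: the tool maker writes Python source defining `tool`.
        source = propose_fn(train_demos, history)
        namespace: dict = {}
        try:
            exec(source, namespace)  # NOTE: run untrusted code in a sandbox in real use
            tool = namespace["tool"]
            # Tool Verification: validation samples act as unit tests.
            for args, expected in val_demos:
                got = tool(*args)
                if got != expected:
                    raise AssertionError(
                        f"tool{args!r} returned {got!r}, expected {expected!r}"
                    )
        except Exception as err:
            history.append(f"{type(err).__name__}: {err}")
            continue
        # Tool Wrapping: ship the code plus call examples for the tool user.
        examples = [f"tool{args!r} -> {expected!r}" for args, expected in val_demos]
        return {"source": source, "examples": examples}
    return None
```

For simplicity this sketch regenerates the whole function whenever a unit test fails; the verification step in the paper repairs the unit tests rather than the function.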

 

Tool Using.  The paper uses a lightweight, cost-effective model as the tool user. The tool user's role is to apply the verified tool to solve various instances of the task. The prompt for this stage is the wrapped tool, which contains the function for solving the task together with demonstrations of how to convert a task query into a function call. Using these demonstrations, the tool user generates the required function call through in-context learning. The function call is then executed to solve the task, and post-processing is applied to match the task's expected output format.
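
The tool-using step can be sketched as follows. This is an illustrative sketch only: `build_user_prompt`, `use_tool`, and the prompt layout are hypothetical, and `user_llm` is a stand-in for the lightweight model, assumed to return a single Python call expression such as `tool([...])`.

```python
from typing import Callable


def build_user_prompt(tool_source: str, examples: list[str], question: str) -> str:
    """Assemble the wrapped tool and a new question into one prompt."""
    shots = "\n".join(examples)
    return (
        f"Tool:\n{tool_source}\n\n"
        f"Examples of converting a question into a call:\n{shots}\n\n"
        f"Question: {question}\nCall:"
    )


def use_tool(
    user_llm: Callable[[str], str],  # hypothetical stand-in for the tool user LLM
    tool_source: str,
    examples: list[str],
    question: str,
    postprocess: Callable[[object], str] = str,
) -> str:
    """Ask the tool user for a function call, execute it, and format the answer."""
    prompt = build_user_prompt(tool_source, examples, question)
    call = user_llm(prompt).strip()   # in-context learning produces the call
    namespace: dict = {}
    exec(tool_source, namespace)      # define the tool (sandbox this in real use)
    result = eval(call, namespace)    # execute the generated function call
    return postprocess(result)        # match the task's answer format
```

Note that the lightweight model never has to solve the task itself; it only has to translate the question into arguments for the tool.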

 

 For each type of task, the tool-making stage needs to be performed only once. The resulting tool can then be reused for every instance of that task. This makes LATM considerably more efficient and cost-effective than using the powerful model alone.
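
The amortization argument can be made concrete with a small calculation. This is a back-of-the-envelope sketch with hypothetical function names and cost units; none of the numbers come from the paper.

```python
def latm_cost(n_instances: int, making_cost: float, user_cost: float) -> float:
    """One-time tool-making cost plus a per-instance tool-user cost."""
    return making_cost + n_instances * user_cost


def strong_only_cost(n_instances: int, strong_cost: float) -> float:
    """Cost of answering every instance with the powerful model."""
    return n_instances * strong_cost


def break_even(making_cost: float, user_cost: float, strong_cost: float) -> float:
    """Number of instances beyond which LATM is cheaper than the strong model alone."""
    if user_cost >= strong_cost:
        raise ValueError("LATM only pays off when the tool user is cheaper per instance")
    return making_cost / (strong_cost - user_cost)


# Hypothetical units: tool making costs 100, the tool user 1 and the strong model 11 per instance.
print(break_even(100.0, 1.0, 11.0))  # 10.0
```

Under these assumed numbers, the one-time tool-making cost is recovered after 10 instances, and every further instance is cheaper with LATM.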

 

 Figure 3 shows an example of how the tool maker solves the logical deduction task from Big-Bench by generating a tool, and how the tool user then uses that tool. To solve the task, the tool maker generates a generic program that extracts the constraints from the question and then searches over all permutations for the result. The tool user can apply this program to the task with a function call that merely extracts the relevant information from the natural-language instance.
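
A tool of the kind described above might look like the following sketch. It is not the program from Figure 3: the function name `find_order`, the representation of constraints as predicates, and the three-object example are assumptions chosen for illustration.

```python
from itertools import permutations
from typing import Callable

# A constraint takes a mapping {object: position} (0 = leftmost) and returns True/False.
Constraint = Callable[[dict[str, int]], bool]


def find_order(objects: list[str], constraints: list[Constraint]) -> list[str]:
    """Return the first left-to-right ordering that satisfies every constraint."""
    for perm in permutations(objects):
        pos = {name: i for i, name in enumerate(perm)}
        if all(check(pos) for check in constraints):
            return list(perm)
    raise ValueError("no ordering satisfies the constraints")


# The tool user only has to translate the question into this call.
order = find_order(
    ["red", "green", "blue"],
    [
        lambda p: p["green"] < p["red"],  # "green is to the left of red"
        lambda p: p["blue"] == 2,         # "blue is the rightmost"
    ],
)
print(order)  # ['green', 'red', 'blue']
```

Brute-force search is practical here because the number of objects is small, so the number of permutations stays manageable.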

 

Figure 3. Illustration of the Tool Proposing and Tool Using stages of the LATM pipeline on the Logical Deduction task

 

2-2. Handling Streaming Data with Dispatcher

 

 In real-world scenarios, task instances typically arrive in sequence. To accommodate this stream of data, the paper introduces a third LLM, the dispatcher, which decides whether to engage the tool user or the tool maker for each incoming task. What distinguishes the dispatcher is its ability to identify new tasks that cannot be solved with existing tools and to engage the tool maker to create new tools for them.

 

 Specifically, the dispatcher maintains a record of the existing tools produced by the tool maker. When a new task instance arrives, the dispatcher first determines whether a suitable tool exists for it. If one does, the dispatcher passes the instance and the corresponding tool to the tool user. If not, it passes the instance to the tool maker so that a new tool can be made. The dispatcher's workflow is shown in Figure 4. Given how simple the dispatching task is, the dispatcher can itself be a lightweight model with an appropriate prompt.
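
The routing logic described above can be sketched as a small class. This is a simplified sketch under stated assumptions: the class and its parameters are hypothetical, `match_tool` stands in for the dispatcher LLM, `make_tool` stands in for the tool maker, and each tool is modeled as a plain callable that already folds in the tool user's work.

```python
from typing import Callable, Optional


class Dispatcher:
    """Routes each incoming instance to the tool user or the tool maker."""

    def __init__(
        self,
        match_tool: Callable[[str, list[str]], Optional[str]],
        make_tool: Callable[[str], tuple[str, Callable[[str], str]]],
    ) -> None:
        self.match_tool = match_tool  # returns the name of a suitable tool, or None
        self.make_tool = make_tool    # returns (tool name, tool) for a new task
        self.tools: dict[str, Callable[[str], str]] = {}  # record of existing tools

    def handle(self, instance: str) -> tuple[str, str]:
        """Return (route, answer), where route is 'tool_user' or 'tool_maker'."""
        name = self.match_tool(instance, list(self.tools))
        if name in self.tools:
            # A suitable tool exists: hand the instance and the tool to the tool user.
            return "tool_user", self.tools[name](instance)
        # No suitable tool: ask the tool maker for one and remember it.
        name, tool = self.make_tool(instance)
        self.tools[name] = tool
        return "tool_maker", tool(instance)
```

The first instance of an unseen task triggers tool making; later instances of the same task are routed to the cheaper tool-using path.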

 

Figure 4. An example of the dispatcher

 

3. Experiments

3-1. Experimental Setup

 

Datasets.  The paper evaluates LATM on 5 tasks from Big-Bench (Logical Deduction, Tracking Shuffled Objects, Dyck Language, Word Sorting, Chinese Remainder Theorem). To demonstrate LATM's effectiveness in a real-world scenario, the authors also constructed and evaluated a Scheduling Meeting task.

 

Table 1. Utility functions generated by the tool maker to solve each task

 

3-2. Effectiveness of the Tool-Making Stage

 

 In the tool-making stage, a powerful but slower model is used to generate a generic Python function for solving a specific task. This step is performed only once per task. The experiments use GPT-4 as the tool maker, giving the LM several few-shot examples and prompting it to generate a generic Python program like the one in Figure 3.

 

 The paper finds that, when GPT-4 is used as the tool maker, the model frequently devises a suitable algorithm for solving the task. In the experiments, the tool-verification stage mainly served to provide examples of how to convert natural-language questions into function calls. Only 2 of the 60 trials were cases in which the tool maker was able to correct its mistakes with guidance from error messages.

 

3-3. LATM Improves the Performance of Lightweight LLMs

 

 Table 2 compares the performance of CoT with that of LATM. GPT-4 serves as the tool maker that generates tools for the 6 tasks, and performance is evaluated with both GPT-3.5 Turbo and GPT-4 as the tool user. The results show that, with the help of the tool, a lightweight model such as GPT-3.5 Turbo significantly outperforms CoT prompting and reaches performance on par with GPT-4, at a substantially lower cost than GPT-4. Interestingly, on the Dyck Language task GPT-3.5 Turbo even outperforms GPT-4! Examining the failure cases revealed that, when converting a question into a function call, GPT-4 sometimes solves part of the problem unnecessarily, which leads to incorrect function output.

 

Table 2. Performance comparison between LATM and CoT

 

3-4. Extending LATM to a Streaming Setting with a Mixture of Tasks

 

 LATM can be extended to a streaming setting, in which case a separate dispatcher is required. The paper uses GPT-3.5 Turbo as the dispatcher and evaluates its ability to do the following:

 

  1. Identify an existing tool that can solve an incoming instance
  2. Request tool-making for instances of unseen tasks

 

Identifying existing tools.  The paper first evaluates the dispatcher's ability to identify an existing tool for a given instance. For each instance in the test set, the dispatcher is asked to identify the appropriate existing tool, using a prompt that contains task examples associated with the existing tools. The accuracy of selecting the correct tool was $94\% \pm 2\%$.

 

Requesting tool-making.  Next, the paper evaluates the dispatcher's ability to request tool-making for instances of unseen tasks. For each instance in the test set, the dispatcher decides whether tool-making needs to be requested or whether the instance can be solved with an existing tool. The accuracy of making the correct request was $95\% \pm 4\%$.

 

 The results show that the dispatcher can effectively identify existing tools and request tool-making for unseen tasks. This suggests that LATM can be smoothly extended to a streaming setting with a mixture of tasks.

 

Table 3. Success rate of creating new tools in the tool-making stage with GPT-4 vs. GPT-3.5 Turbo

 

 

 

 

Source

https://arxiv.org/abs/2305.17126

 
