
Super-Natural Instructions: Generalization via Declarative Instructions on 1600+ NLP Tasks (Paper Review)

2023. 5. 9. 14:07

The overview of this paper

 How well can NLP models generalize to a variety of unseen tasks when they are given task instructions? To address this question, the paper introduces Super-Natural Instructions, a benchmark of 1,616 diverse NLP tasks together with their expert-written instructions. This large and diverse collection enables rigorous benchmarking of cross-task generalization under instructions: a model is trained to follow instructions on a subset of the tasks and is then evaluated on the remaining unseen tasks.

 

 In addition, the paper builds Tk-Instruct, a transformer model trained to follow a variety of in-context instructions. The experiments show that Tk-Instruct outperforms existing instruction-following models such as InstructGPT.

 

 

Table of Contents

1. Introduction

2. Super-Natural Instructions

3. Tk-Instruct: Learning to Follow Instructions at Scale

4. Benchmarking Cross-Task Generalization with Sup-NatInst

5. Experimental Results

6. Further Analysis

 

 

1. Introduction

 In the NLP community, LLMs have been very successful at generalizing to unseen tasks. However, for models such as InstructGPT, how much each design choice contributes to that success remains opaque. In particular, the role of supervised data has stayed understudied, because the major models have released only limited data. Moreover, for most researchers, scaling up or retraining models of this size is close to impossible. Addressing these problems requires a large-scale public benchmark that covers a broad range of NLP tasks and their instructions, so that models which generalize to unseen tasks can be developed and evaluated more easily.

 

 This paper builds a meta-dataset made up of a broad range of NLP tasks with instructions, and trains a model that can perform a new task when given its instruction. The resulting model outperforms InstructGPT (with 16 times fewer parameters!).

 

Figure 1. An example task from Sup-NatInst

 

 The dataset used in the paper, Super-Natural Instructions, contains 1,616 NLP tasks and their instructions, covering 76 task types and 55 different languages. Each task is paired with an instruction that consists of a task definition, which describes how input text maps to the task output, and several examples that illustrate desired and undesired outputs (see Figure 1 for an example task). The tasks and their instructions were contributed by 88 NLP practitioners. The size and diversity of the data make it possible to split the tasks carefully into training and test sets and to study how state-of-the-art methods perform on them. Table 1 and Figure 2 highlight the task diversity and instruction types of the benchmark.

 

Table 1. Comparison of Sup-NatInst with other well-known datasets

 

Figure 2. Comparison with other datasets. Sup-NatInst covers a more diverse set of task types.

 

The paper's model, Tk-Instruct, is a generative model that transforms task inputs given declarative in-context instructions. It is built by multi-task training of a T5 model on all task instructions in the training set, and it is evaluated on the unseen tasks in the test set. Interestingly, the 11B Tk-Instruct outperforms the 175B InstructGPT on English tasks as well as on tasks in other languages.

 

 

2. Super-Natural Instructions

 Super-Natural Instructions is a meta-dataset that consists of a wide variety of NLP tasks and instructions that describe those tasks in plain language.

 

Instruction schema.  Every task instruction follows the same uniform schema, which consists of the following parts.

 

  • Definition: defines the given task in natural language, including how input text is mapped to output text.
  • Positive Examples: sample inputs and their correct outputs, each with a short explanation.
  • Negative Examples: sample inputs and their incorrect outputs, each with a short explanation.
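
The three schema parts can be pictured as a small data structure. The sketch below is only an illustration: the class names, field names, and the sentiment task are made up here and are not the dataset's actual file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Example:
    """One demonstration: an input, an output, and a short explanation."""
    input: str
    output: str
    explanation: str = ""


@dataclass
class TaskInstruction:
    """Hypothetical container mirroring the three schema parts."""
    definition: str
    positive_examples: List[Example] = field(default_factory=list)
    negative_examples: List[Example] = field(default_factory=list)


# A made-up sentiment task, for illustration only.
instruction = TaskInstruction(
    definition="Classify the sentiment of the review as 'positive' or 'negative'.",
    positive_examples=[
        Example("I loved this film.", "positive", "The review praises the film."),
    ],
    negative_examples=[
        Example("I loved this film.", "good", "The output is not an allowed label."),
    ],
)
```

A negative example here pairs a valid input with an output the model should not produce, plus the reason it is wrong.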

 

Task instances.  Given the instruction for a task, a model is expected to solve instances of that task. The paper uses a unified format to organize the instances of all tasks: each instance consists of a textual input and a list of acceptable textual outputs. To keep the tasks balanced, the number of instances per task is limited to 6.5K.
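
A minimal sketch of the unified instance format and the per-task cap follows. The field names `input` and `output` and the helper `cap_instances` are hypothetical, and random subsampling is an assumption: the text only says that the count is limited to 6.5K, not how the subset is chosen.

```python
import random
from typing import Dict, List

MAX_INSTANCES_PER_TASK = 6500  # the 6.5K cap mentioned in the text


def cap_instances(instances: List[Dict], limit: int = MAX_INSTANCES_PER_TASK,
                  seed: int = 0) -> List[Dict]:
    """Return at most `limit` instances (random subsample; an assumption)."""
    if len(instances) <= limit:
        return list(instances)
    rng = random.Random(seed)
    return rng.sample(instances, limit)


# Each instance: one textual input and a list of acceptable textual outputs.
instances = [{"input": f"example input {i}", "output": ["label"]}
             for i in range(10_000)]
capped = cap_instances(instances)
```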

 

Diversity of tasks.  Task collection for Super-Natural Instructions was carefully supervised so that it covers a wide variety of natural language understanding tasks, domains, and languages. To better understand this diversity, the paper categorizes the tasks along three different dimensions:

 

  • Task Type: the nature of the mapping from instance inputs to outputs
  • Language: the language(s) of the instances
  • Domain: the domain(s) to which the text of the task belongs

 

 These different categorizations can be used to study different senses of generalization.

 

Statistics.  Table 2 shows various statistics of the benchmark. In total, the dataset contains 1,616 tasks and 5M instances. On average, each instruction is paired with 2.8 positive examples and 2.4 negative examples. The average definition length is 56.6 words.

 

Table 2. Statistics of Sup-NatInst

 

 

3. Tk-Instruct: Learning to Follow Instructions at Scale

Defining Generalization to Unseen Tasks.  Each task $t$ is defined by its natural language instruction $I_{t}$, and each task has a set of input/output instances $(X_{t}, Y_{t})$. Given an input $x$ and the task instruction $I_{t}$, a model $M$ is expected to produce the output $y$: $M(I_{t}, x) = y$ for $(x, y) \in (X_{t}, Y_{t})$. The paper evaluates the model $M$ on tasks it has not observed. At inference time, the only source of signal for learning the task is the in-context instruction $I_{t}$, which consists of the task definition and demonstration examples.

 

Tk-Instruct.  The paper introduces Tk-Instruct, a model meta-trained on Sup-NatInst to solve a task given its instruction. Previous work has shown that this kind of meta-training is effective, and the breadth of tasks in Sup-NatInst allows multi-task meta-training at a larger scale than before. The experiments and analysis are based on the T5 model. Because each instruction $I_{t}$ consists of several elements, as described in the instruction schema above, these elements are mapped to a textual format and prepended to the input instance. By default, the paper uses the most effective instruction elements.
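
To make "mapped to a textual format and prepended to the input instance" concrete, here is a minimal sketch. The function `encode_prompt` and the exact template wording ("Definition:", "Positive Example", and so on) are assumptions chosen for illustration; the paper's real template may differ.

```python
from typing import List, Tuple


def encode_prompt(definition: str,
                  positive_examples: List[Tuple[str, str]],
                  instance_input: str) -> str:
    """Flatten instruction elements to text and prepend them to the input."""
    parts = [f"Definition: {definition}"]
    for i, (ex_in, ex_out) in enumerate(positive_examples, start=1):
        parts.append(f"Positive Example {i}\nInput: {ex_in}\nOutput: {ex_out}")
    # The instance to solve comes last, so the model continues after "Output:".
    parts.append(f"Now complete the following example\nInput: {instance_input}\nOutput:")
    return "\n\n".join(parts)


prompt = encode_prompt(
    definition="Classify the sentiment of the review as 'positive' or 'negative'.",
    positive_examples=[("I loved this film.", "positive")],
    instance_input="The plot was dull and predictable.",
)
```

The resulting string would be fed to the encoder of a sequence-to-sequence model such as T5, with the acceptable output as the training target.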

 

 

4. Benchmarking Cross-Task Generalization with Sup-NatInst

4-1. Evaluation Setup

 

An Evaluation Split of Unseen Tasks.  The paper splits the large task collection of Sup-NatInst into two subsets: one for evaluation and one for supervision. For evaluation, it fixes a manually selected collection of 12 categories that represent 154 tasks. The size and diversity of Sup-NatInst make it possible to choose a varied set of tasks for evaluation.

 

Divided Tracks for English and X-lingual Tasks.  Sup-NatInst contains tasks in many languages, which makes it possible to evaluate generalization to unseen tasks not only in English but also in other languages. The paper therefore divides the evaluation tasks into two tracks: English-only cross-task generalization (119 tasks) and cross-lingual cross-task generalization (35 tasks).
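
The split can be sketched as holding out whole categories and then routing the held-out tasks by language. Everything below is hypothetical: the metadata keys, the toy task names, and the rule that a task counts as English-only when English is its only listed language.

```python
from typing import Dict, List, Set


def split_tasks(tasks: List[Dict], eval_categories: Set[str]) -> Dict[str, List[str]]:
    """Hold out whole categories for evaluation, then split them by track."""
    split: Dict[str, List[str]] = {"train": [], "eval_english": [], "eval_xlingual": []}
    for task in tasks:
        if task["category"] not in eval_categories:
            split["train"].append(task["name"])
        elif set(task["languages"]) == {"English"}:
            split["eval_english"].append(task["name"])
        else:
            split["eval_xlingual"].append(task["name"])
    return split


# Toy metadata, not taken from the dataset.
tasks = [
    {"name": "toy_qa_en", "category": "Question Answering", "languages": ["English"]},
    {"name": "toy_title_en", "category": "Title Generation", "languages": ["English"]},
    {"name": "toy_title_es", "category": "Title Generation", "languages": ["Spanish", "English"]},
]
split = split_tasks(tasks, eval_categories={"Title Generation"})
```

Holding out entire categories, rather than random tasks, keeps the evaluation tasks unseen in kind as well as in name.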

 

Evaluation Metrics.  Because of the diversity of the tasks and the open-ended generation setting, the paper adopts ROUGE-L for reporting aggregated performance. It is a soft string-overlap metric that can be applied to a wide range of text generation tasks.
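
As a rough illustration of how a longest-common-subsequence overlap score behaves, here is a simplified ROUGE-L in plain Python. It is a sketch under assumptions, not the official scorer: it lowercases, tokenizes on whitespace, uses the balanced F-measure, and takes the best score over the acceptable outputs.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-L: F1 of LCS-based precision and recall."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_rouge_l(prediction: str, references: List[str]) -> float:
    """Score against every acceptable output and keep the best match."""
    return max(rouge_l_f1(prediction, ref) for ref in references)
```

For short classification labels the score reduces to something close to exact match, which is why a single metric can cover both classification and generation tasks.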

 

4-2. Baselines and Existing Models

 

Heuristic baselines.  The paper first evaluates the following heuristics to assess possible shortcuts in the data.

 

  • Copying Demo Output: copies the output of a random demonstration example. Because the labels of the test tasks are balanced, this baseline performs roughly like a random guess, or like a majority baseline, on classification tasks.
  • Copying Instance Input: copies the given instance input. This strategy works well on tasks where the target output largely overlaps with the input.
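
Both heuristics are simple enough to write down directly. The function names and the (input, output) tuple layout below are hypothetical.

```python
import random
from typing import List, Tuple


def copy_demo_output(demos: List[Tuple[str, str]], rng: random.Random) -> str:
    """Predict the output of a randomly chosen demonstration example."""
    _, demo_output = rng.choice(demos)
    return demo_output


def copy_instance_input(instance_input: str) -> str:
    """Predict the instance input itself, unchanged."""
    return instance_input


demos = [("I loved this film.", "positive"), ("Terrible acting.", "negative")]
guess = copy_demo_output(demos, random.Random(0))
echo = copy_instance_input("Rewrite: the cat sat on the mat.")
```

A model that scores no better than these two functions has likely learned a shortcut rather than the task.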

 

Off-the-shelf pre-trained language models.  The paper evaluates existing LMs that have not been fine-tuned on instruction-specific data. As a direct counterpart of Tk-Instruct, it evaluates the 11B T5. Because of its infilling pre-training objective, the original T5 does not continue text well, so the paper uses the LM-adapted version of T5, which was further trained with a language modeling objective. The paper also evaluates the 175B GPT-3.

 

Instruction-tuned models.  The paper compares Tk-Instruct with existing models that were fine-tuned to follow language instructions: InstructGPT and T0.

 

Upper bound estimates.  The paper estimates an upper bound on generalization to unseen tasks by fine-tuning an oracle model on labeled instances of the evaluation tasks. Because this model observes hidden instances of the evaluation tasks, it is by definition an estimated upper bound for the generalization-based models.

 

 

5. Experimental Results

5-1. Overall Results

 

 Table 3 summarizes the overall benchmarking results. The paper uses the same input encoding, containing the most effective instruction elements, for all methods. To better understand how the models generalize to different tasks, it also breaks performance down by task category (Figure 3).

 

Table 3. Overall performance of different methods on the unseen tasks in the Sup-NatInst test set

 

Figure 3. Performance per evaluation task type

 

Instruction tuning enables strong generalization to unseen tasks.  Instruction-tuned models outperform untuned LMs and the heuristic baselines. This indicates that models do learn to follow instructions by training on instruction data, and that this ability carries over to new instructions for unseen tasks. T0 is an exception: it is only slightly better than T5-LM, which the authors suspect is because the style of T0's training data differs substantially from the instructions in this paper.

 

Tk-Instruct outperforms InstructGPT.  Tk-Instruct performs better than InstructGPT and achieves the strongest results on both English and non-English tasks. It is worth noting, however, that InstructGPT's training data is not public, so it is unclear whether it overlaps with the evaluation tasks.

 

There is still a sizable gap for improvement.  Despite the impressive performance of current models, a considerable gap remains between instruction-based models and the supervised training approach.

 

5-2. Human Evaluation

 

 For language generation tasks, automatic metrics are only an approximation of human judgment, so the paper also runs a human evaluation. The resulting human evaluation metric indicates how often a model's prediction was rated at least as good as the ground-truth label. The theoretical upper bound of this metric is 100%, reached when the model is rated at least as good as the ground truth on every instance. The human evaluation results (Figure 4) agree closely with the automatic metric and confirm the human-perceived quality of the models.
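
The human evaluation metric described above is a simple percentage. A minimal sketch, assuming each judgment has already been reduced to a boolean (the function name and input format are hypothetical):

```python
from typing import List


def human_eval_score(at_least_as_good: List[bool]) -> float:
    """Percentage of predictions judged at least as good as the ground truth."""
    if not at_least_as_good:
        raise ValueError("need at least one judgment")
    return 100.0 * sum(at_least_as_good) / len(at_least_as_good)


# Made-up judgments for four instances.
score = human_eval_score([True, True, False, True])
```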

 

Figure 4. Human evaluation vs. ROUGE-L

 

 

6. Further Analysis

6-1. Scaling Trends of Generalization

 

 The paper studies the generalization performance of Tk-Instruct with respect to three scaling factors: the number of training tasks, the number of instances per task, and the model size. Figure 5 shows how performance changes along each of them.

 

Figure 5. Scaling trends of model performance. a: number of training tasks / b: number of instances per training task / c: model size

 

More observed tasks improve generalization.  The paper fine-tunes Tk-Instruct on different numbers of tasks randomly sampled from the full training set. As the number of training tasks increases, generalization performance grows log-linearly.
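
"Log-linear" here means the score rises by a roughly constant amount each time the number of tasks is multiplied by a constant factor, that is, score ≈ a + b · ln(number of tasks). The sketch below fits that form by ordinary least squares. The data points are synthetic and generated from a known line purely to show the fit; they are not results from the paper.

```python
import math
from typing import List, Tuple


def fit_log_linear(num_tasks: List[int], scores: List[float]) -> Tuple[float, float]:
    """Least-squares fit of score = a + b * ln(num_tasks); returns (a, b)."""
    xs = [math.log(n) for n in num_tasks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic points from score = 30 + 4 * ln(n); not results from the paper.
num_tasks = [8, 32, 128, 512]
scores = [30 + 4 * math.log(n) for n in num_tasks]
a, b = fit_log_linear(num_tasks, scores)
```

Under such a trend, each doubling of the task count adds b · ln 2 points, so gains continue but become more expensive in absolute task count.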

 

A large number of training instances does not help generalization 🛑.  The paper also varies the number of instances per task used for fine-tuning. The usual assumption in supervised learning is that more training instances generally help. In this setup, however, model performance starts to saturate at 64 instances per task. More training instances only lead to longer training time and a risk of overfitting.

 

Tuning larger models with instructions improves performance.  To study the effect of model scale, the paper evaluates Tk-Instruct at several sizes (small, base, large, xl, xxl) (Figure 5c). Increasing the model size consistently brings significant performance gains. Combining Figures 5a and 5c gives a correspondence between model size and the number of training tasks, which suggests that increasing the diversity of training tasks is an alternative to scaling up the model.

 

6-2. Instructing with Different Elements

 

 The paper evaluates the performance of Tk-Instruct under different instruction elements.

 

Table 4. Performance of models trained and evaluated with various encodings

 

Benefits of different instruction elements.  As Figure 1 shows, Sup-NatInst provides several elements for instructing a task. The paper trains separate models with different combinations of these elements. The diagonal cells of Table 4 show performance when a model is trained and evaluated with the same instruction encoding. Judging by the diagonal numbers, including the task definition helps the model generalize better, and combining the task definition with positive demonstration examples brings further improvement. Adding more demonstration examples, however, yields only negligible gains. Negative examples help slightly, while explanations actually reduce performance.

 

Generalization to different input encodings.  The paper also examines whether a model trained with one encoding generalizes to other encodings, which can be read from the off-diagonal cells of Table 4. The negative result here is that a definition-only model does not generalize well to example-only test encodings, and likewise an example-only model does not generalize well to definition-only test encodings. Models trained on encodings that contain both the definition and examples, however, are surprisingly robust across encoding variants.

 

 

 

 

Source

https://arxiv.org/abs/2204.07705

 


 
