Paper Reading 📜/Natural Language Processing

CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-tuning (Paper Review)

Cartinoe 2023. 7. 20. 20:15

The overview of this paper

How can we instill step-by-step reasoning into LMs smaller than 100B parameters? To answer this question, the paper introduces the CoT Collection, a new instruction-tuning dataset containing 1.88M CoT rationales across 1,060 tasks.

 

The paper shows that continually fine-tuning FLAN-T5 on the CoT Collection enables it to perform better CoT reasoning on unseen tasks.

 

 

Table of Contents

1. Introduction

2. The CoT Collection

3. Zero-shot Generalization

4. Few-shot Generalization

 

 

1. Introduction

CoT prompting improves a model's problem-solving ability, but it has two limitations:

 

  1. It requires LLMs with more than 100B parameters.
  2. It has not been proven that smaller models gain the same benefit.

 

There have been many attempts to address these limitations, but CoT data covering a wide range of tasks was still unavailable. To close this gap, the paper proposes the CoT Collection, an instruction-tuning dataset with 1.88M CoT rationales across 1,060 tasks extracted from the FLAN Collection.

 

In the CoT Collection, each instance consists of an instruction attached to the input instance, a ground-truth output, and a CoT rationale. By incorporating CoT fine-tuning into instruction tuning, the authors aim to give smaller LMs improved reasoning ability.

 

 ๋…ผ๋ฌธ์˜ ๋ชจ๋ธ์ธ C2F2๋Š” CoT Collection์„ ์‚ฌ์šฉํ•˜์—ฌ FLAN-T5๋ฅผ ๊ณ„์†์ ์œผ๋กœ fine-tune ํ•จ์œผ๋กœ์จ ์–ป์–ด์กŒ๋‹ค. ์ด ๋ชจ๋ธ์€ unseen task์— ๋Œ€ํ•ด CoT prompting์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•œ zero-shot ๋Šฅ๋ ฅ์— ์ƒ๋‹นํ•œ ๊ฐœ์„ ์„ ๋ณด์—ฌ์คฌ๋‹ค. ๊ฒŒ๋‹ค๊ฐ€, CoT Collection์„ ์‚ฌ์šฉํ•ด instruction-tuned ๋ชจ๋ธ์„ ๊ณ„์†์ ์œผ๋กœ fine-tune ํ•˜๋Š” ๊ฒƒ์€ ์ž‘์€ ๊ทœ๋ชจ์˜ ๋ฐ์ดํ„ฐ & multilingual ๋ฐ์ดํ„ฐ์—์„œ๋„ ํšจ๊ณผ์ ์ด์—ˆ๋‹ค. zero-shot ์„ฑ๋Šฅ์— ์ถ”๊ฐ€์ ์œผ๋กœ ํ–ฅ์ƒ์ด ๊ด€์ฐฐ๋จ์— ๋”ฐ๋ผ C2F2๋Š” few-shot learning์—์„œ ๋” ๋‚˜์€ base model๋กœ ์—ฌ๊ฒจ์ง„๋‹ค๋Š” ๊ฒƒ์„ ๋ฐœ๊ฒฌํ•˜์˜€๋‹ค. C2F2๋Š” HuggingFace์— ๊ณต๊ฐœ๋˜์–ด ์žˆ๊ณ , ์•„๋ž˜ ์ถœ์ฒ˜์— ๋งํฌ๋ฅผ ๋‚จ๊ฒจ๋‘๋„๋ก ํ•˜๊ฒ ๋‹ค.

 

 C2F2์˜ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” CoT fine-tuning๊ณผ instruction tuning ๊ฐ„์˜ ์‹œ๋„ˆ์ง€๊ฐ€ ์ž ์žฌ์ ์œผ๋กœ smaller model์˜ ๊ฐœ์„ ์„ ๋‚ณ์„ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋…ผ๋ฌธ์˜ ๊ฒฐ๊ณผ๋Š” zero-shot & few-shot learning ์„ธํŒ…์—์„œ smaller model์ด ์ด์ ์„ ์–ป์„ ์ˆ˜ ์žˆ์—ˆ๋‹ค๋Š” ๊ฒƒ์„ ๋‚˜ํƒ€๋‚ธ๋‹ค. ํŠนํžˆ, zero-shot & few-shot learning์˜ ๋ฌธ๋งฅ์—์„œ ๋…ผ๋ฌธ์˜ ๋ถ„์„์€ ๊ธฐ์กด ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ๋ถ€ํ„ฐ ํ’๋ถ€ํ•œ supervision์„ ์ถ”์ถœํ•˜๋Š” ๊ฐ„๋‹จํ•œ ๋ ˆ์‹œํ”ผ์™€ CoT fine-tuning์„ ํ†ตํ•ด CoT ๋Šฅ๋ ฅ์„ ์•ผ๊ธฐํ•˜๋Š” ๊ฒƒ์€ ๊ธฐ์กด LM์„ ์ถ”๊ฐ€์ ์œผ๋กœ ๊ฐœ์„ ์‹œํ‚ฌ ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋‹ค.

 

 

2. The CoT Collection

The CoT Collection is an instruction-tuning dataset containing 1.88M CoT rationales collected across 1,060 NLP tasks. The entire CoT Collection is publicly available on HuggingFace, so check the sources below if you are interested. Figure 1 depicts how the CoT Collection is composed.
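Since the dataset is hosted on the Hugging Face Hub, it can be pulled directly with the datasets library. Below is a minimal sketch; the field names are assumptions based on the dataset card, so verify them against the card linked in the sources.

```python
# Minimal sketch: loading the CoT Collection from the Hugging Face Hub.
# Field names ("source", "target", "rationale") are assumptions taken from
# the dataset card; depending on your datasets version, the script-based
# dataset may also require trust_remote_code=True.
from datasets import load_dataset

cot_collection = load_dataset("kaist-ai/CoT-Collection", split="train")

example = cot_collection[0]
print(example["source"])     # instruction + input instance
print(example["rationale"])  # machine-generated CoT rationale
print(example["target"])     # ground-truth output
```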

 

Figure 1. Depiction of the overall task groups and the dataset sources of the instances used to augment CoT rationales in the CoT Collection

 

2-1. CoT Rationale Augmentation

 

Given an input $X = [I, z]$, a CoT rationale $r$ is obtained by applying ICL with an LLM, where $I$ is the instruction and $z$ is an instance accompanied by its answer $y$. This differs from previous work (Self-Consistency, Unnatural Instructions, etc.), which mainly focuses on using an LLM to generate new instances.

 

Source Dataset Selection.  The FLAN Collection (including Super-NI and FLAN) was used as the source for CoT rationale extraction. From these datasets, 1,060 tasks were selected and narrowed down with the following criteria:

 

  • Multilingual datasets were excluded, since T5 was trained mainly on English data.
  • Subsets of generation tasks with long-form outputs were excluded.
  • Datasets that are not publicly available were excluded.
  • Datasets whose inputs & outputs are unrelated to each other were excluded.
  • When datasets overlap, only one of the overlapping datasets was kept and the rest were excluded.
  • For a few task types (sentiment analysis, sentence completion, coreference resolution, word disambiguation), LLM-generated CoT rationales tend to be uninformative and very short, so these datasets were excluded.

 

Prompt Creation.  The most intuitive way to run ICL with an LLM would be to prepare demonstrations for every single task. Instead, the paper takes a more efficient approach: similar tasks are grouped together, and 6~8 demonstrations are created per group.

 

Concretely, given several instances sampled from the FLAN Collection, two of the authors write CoT rationales, and the remaining author conducts A/B testing to select the better of the two. Through this process, a total of 135 CoT rationales were created across 26 tasks.

 

CoT Rationale Augmentation.  The main goal of the augmentation process is to generate coherent CoT rationales. To this end, the paper uses OpenAI's Codex model. Formally, given $(X_{i}^{t}, y_{i}^{t})$, the goal is to generate the corresponding CoT rationale $r_{i}^{t}$. During preliminary experiments, the authors found that placing the label before the rationale in each demonstration is important for producing good-quality rationales. They conjecture this is because it relieves the LLM of the need to solve the task itself, letting it focus on explaining the given answer.
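To make this concrete, here is a minimal sketch of how such a rationale-extraction prompt could be assembled. The function name and template wording are illustrative, not the paper's exact prompt; note the label-first ordering, where the answer precedes the rationale in each demonstration.

```python
# Minimal sketch of the rationale-extraction prompt: X = [I, z] plus the known
# answer y, preceded by a task group's demonstrations. Wording is illustrative.
def build_rationale_prompt(demonstrations, instruction, instance, answer):
    prompt = ""
    for d in demonstrations:  # 6~8 demonstrations shared by a task group
        prompt += f"{d['instruction']}\n{d['instance']}\n"
        # Label-first ordering: the answer comes before the rationale, so the
        # LLM only has to explain the answer rather than solve the task.
        prompt += f"Answer: {d['answer']}\nRationale: {d['rationale']}\n\n"
    prompt += f"{instruction}\n{instance}\nAnswer: {answer}\nRationale:"
    return prompt
```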

 

Filtering.  After generating multiple CoT rationales, the following criteria were applied to filter for high-quality rationales (a minimal sketch follows the list):

 

  • Rationales that do not include the ground-truth answer at least once are excluded.
  • Rationales longer than 256 tokens are excluded.
  • Rationales identical to a previously obtained rationale are excluded.
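A minimal sketch of these three rules, using whitespace splitting as a stand-in for the actual tokenizer:

```python
# Minimal sketch of the filtering rules above. Whitespace splitting stands in
# for the real tokenizer, so the 256-token limit is only approximate here.
def filter_rationales(candidates, ground_truth_answer):
    kept, seen = [], set()
    for rationale in candidates:
        if ground_truth_answer not in rationale:  # must contain the answer
            continue
        if len(rationale.split()) > 256:          # drop overly long rationales
            continue
        if rationale in seen:                     # drop exact duplicates
            continue
        seen.add(rationale)
        kept.append(rationale)
    return kept
```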

 

2-2. Analysis of the CoT Collection

 

Quality of Rationales.  To verify the quality of the CoT Collection, the rationales were evaluated with ROSCOE. Compared to human-authored CoT rationales, the CoT Collection contains rationales that are faithful, non-repetitive, informative, and logical. The 13 ROSCOE scores are shown in Table 1.

 

Table 1. Quality comparison between human-authored and machine-generated rationales

 

Diversity of Rationales.  While the 9 CoT datasets used for FLAN-T5 are heavily weighted toward 'answer question' and 'consider following', the CoT Collection covers much more diverse text formats (Figure 2; a sketch of this analysis follows the figure).

 

Figure 2. Root verbs and noun objects of the rationales from the 9 CoT tasks versus the paper's rationale training data
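For reference, a diversity analysis like Figure 2 can be reproduced by parsing each instruction and extracting the root verb and its direct-object noun. Below is a minimal sketch with spaCy, in the style of the Self-Instruct analysis; it is not necessarily the paper's exact code.

```python
# Minimal sketch: extract the root verb and its direct-object noun, as in
# Figure 2. Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def root_verb_and_object(text: str):
    doc = nlp(text)
    for token in doc:
        if token.dep_ == "ROOT" and token.pos_ == "VERB":
            dobj = next((c.lemma_ for c in token.children if c.dep_ == "dobj"), None)
            return token.lemma_, dobj
    return None, None

print(root_verb_and_object("Answer the following question about the passage."))
# -> ('answer', 'question')
```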

 

3. Zero-shot Generalization

This section shows how CoT fine-tuning on the CoT Collection effectively improves an LM's ability to solve unseen tasks. Figure 3 below shows how C2F2 works.

 

Figure 3. Overall depiction of the zero-shot & few-shot experiments with C2F2

 

Experiment #1: FLAN-T5 Setting.  C2F2 is obtained by continually fine-tuning FLAN-T5 on the CoT Collection. Besides FLAN-T5, it is compared against different baselines such as T5-LM, T0, Tk-Instruct, and GPT-3. In addition, to check whether CoT fine-tuning is more data-efficient than conventional instruction tuning, T5-LM was also trained on the CoT Collection. Note that the FLAN Collection contains 15M instances, 7.98 times more than the CoT Collection.
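The resulting checkpoint corresponds to the CoT-T5 model linked in the sources, so zero-shot CoT inference can be tried directly with transformers. A minimal sketch follows, assuming a standard "Let's think step by step." style trigger; the exact prompt format may differ from the model card.

```python
# Minimal sketch: zero-shot CoT inference with the released checkpoint.
# An 11B model needs substantial memory; the prompt suffix is an assumption.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "kaist-ai/CoT-T5-11B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prompt = (
    "A robe takes 2 bolts of blue fiber and half that much white fiber. "
    "How many bolts in total does it take?\nLet's think step by step."
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```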

 

The results are shown in Table 2. C2F2, obtained by continually training FLAN-T5 on the CoT Collection, improves over the original FLAN-T5 on CoT evaluation, and surprisingly it does so even though the CoT Collection does not include any direct instruction data. This supports the claim that training an instruction-tuned model with additional CoT instruction data helps the LM adapt to unseen tasks.

 

 ๋ฐ์ดํ„ฐ ํšจ์œจ์„ฑ ์ธก๋ฉด์—์„œ T5 + CoT fine-tuning์€ FLAN-T5๋ฅผ ๋Šฅ๊ฐ€ํ•˜๋Š” ๋ชจ์Šต์„ ๋ณด์—ฌ์คฌ๋‹ค. ๋˜ํ•œ, T5-3B + CoT fine-tuning์€ 4๋ฐฐ ํฐ T0-11B & TK-Instruct-11B๋„ direct & CoT ํ‰๊ฐ€์—์„œ ๋Šฅ๊ฐ€ํ•˜์˜€๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” CoT fine-tuning์ด direct instruction tuning์— ๋น„ํ•ด ์ ์€ ์–‘์˜ ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•  ๋•Œ๋„ ๋” ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•œ๋‹ค๋Š” ๊ฒฐ๋ก ์„ ๋‚ด๋ ธ๋‹ค. 

 

Table 2. Evaluation performance on 27 unseen tasks from the BBH benchmark. The best performance is in bold and the second best is underlined.

 

Experiment #2: T0 Setting.  To test whether CoT fine-tuning with the CoT Collection is also effective with a smaller number of tasks, the P3 subset of the CoT Collection was applied to T5 & T0 models. Besides T0, T5-LM, RoE, KiC, and Flipped were also included as baselines, and T0-11B & GPT-3 were used as oracle upper bounds.

 

The results are shown in Table 3. T0-3B + CoT fine-tuning further improves over T0-3B, showing that continually training an instruction-tuned model with additional CoT instruction data unlocks the LM's generalization ability. Even more surprisingly, T5-3B + CoT fine-tuning outperforms T0-3B while using only about 3.22% of the training data used for T0. Compared against the directly instruction-tuned T0-3B, this result indicates that learning to generate rationales enables even a 3B LM to generalize more efficiently.

 

Table 3. Evaluation performance on 11 different unseen P3 datasets

 

Experiment #3: Multilingual Setting.  To test whether CoT fine-tuning is also effective in a multilingual setting, the models were evaluated on the MGSM benchmark following the experimental setup of LMSI. Compared to the data used to train mT0, only 0.001% as much CoT instruction data was used for each single target language. To collect instruction data for the target languages, ChatGPT was used for translation.
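As a rough illustration of that translation step, a ChatGPT call like the following could be used. This is a hedged sketch with the OpenAI Python client; the model name and prompt wording are assumptions, not the paper's.

```python
# Minimal sketch: translating instruction data into a target language with
# ChatGPT. Model name and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def translate(text: str, target_language: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (f"Translate the following instruction data into "
                        f"{target_language}, keeping the format intact:\n\n{text}"),
        }],
    )
    return response.choices[0].message.content

print(translate("Answer the question with a step-by-step rationale.", "Korean"))
```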

 

The results are shown in Table 4. Across 5 different languages, mT5-3.7B + CoT fine-tuning outperforms mT0-3.7B. Notably, for less-represented languages such as Korean, Japanese, and Chinese, training on a single target language with CoT instruction data has an advantage over training on multiple languages at once, since multilingual training suffers from the forgetting problem. Moreover, mT0-3.7B + CoT fine-tuning improves over the base model mT0 and outperforms GPT-3 in all languages.

 

Table 4. Evaluation performance on the MGSM benchmark across 5 languages

 

 

4. Few-shot Generalization

Dataset Setup.  This section shows how applying CoT fine-tuning to C2F2 enables the LM to adapt effectively to few-shot settings. For this, the paper uses legal & medical datasets: LEDGAR, CaseHold, MedNLI, and PubMedQA. Each consists of 64 randomly sampled instances. Rationale data for the 64 instances was prepared using the same procedure as the CoT rationale augmentation above.
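As an illustration, sampling such a 64-instance split with the datasets library might look like the sketch below. LEDGAR is available as the 'ledgar' config of lex_glue on the Hub; the seed and split choices here are assumptions.

```python
# Minimal sketch: drawing 64 random training instances from LEDGAR via the
# lex_glue dataset on the Hugging Face Hub. Seed/split choices are assumptions.
from datasets import load_dataset

ledgar_64 = (
    load_dataset("lex_glue", "ledgar", split="train")
    .shuffle(seed=42)
    .select(range(64))
)
print(len(ledgar_64))  # 64
```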

 

Training Setup.  The paper trains two baselines: FLAN-T5 and C2F2. To each, it applies full fine-tuning, CoT fine-tuning, LoRA fine-tuning, and LoRA CoT fine-tuning. ICL baselines using Claude and ChatGPT were also included. For CoT prompting with these LLMs, CoT demonstrations drawn from the augmented rationale dataset used for fine-tuning were provided.
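For the LoRA variants, only low-rank adapter weights are trained on top of the frozen base model. Below is a minimal sketch with the PEFT library, assuming FLAN-T5 as the base; the rank, alpha, and target modules are illustrative hyperparameters, not the paper's exact configuration.

```python
# Minimal sketch: LoRA fine-tuning setup with PEFT. Hyperparameters and
# target modules are illustrative, not the paper's exact configuration.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # low-rank dimension
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q", "v"],  # T5 attention query/value projections
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapters are trainable
```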

 

Experimental Results.  The experimental results are shown in Table 5. Overall, LoRA CoT fine-tuning achieved the best performance on all 4 datasets.

 

C2F2 with CoT fine-tuning achieved higher performance than FLAN-T5 with direct fine-tuning. This supports the idea that the combination with CoT fine-tuning helps the LM adapt in few-shot settings.

 

Finally, the fine-tuning methods obtained better results overall than the ICL methods. The paper conjectures this is because the long inputs of the legal & medical datasets prevent fitting all the available demonstrations into the context.

 

Table 5. Evaluation results on 4 domain-specific datasets

 

 

 

 

Sources

The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning (paper): https://arxiv.org/abs/2305.14045

kaist-ai/CoT-Collection · Datasets at Hugging Face (dataset): https://huggingface.co/datasets/kaist-ai/CoT-Collection

kaist-ai/CoT-T5-11B · Hugging Face (model): https://huggingface.co/kaist-ai/CoT-T5-11B