All Posts

Welcome! I'm a student studying deep learning (NLP) ๐Ÿ˜‰ The goal of my studies is to develop a competent LLM that helps people!
Paper Reading ๐Ÿ“œ/Natural Language Processing

Why can GPT learn in-context? (Paper Review)

The overview of this paper: Large PLMs have shown a surprising in-context learning (ICL) ability, but despite this impressive performance, the mechanism behind it remains an open question. This paper explains language models as meta-optimizers and understands in-context learning as a form of implicit fine-tuning. The authors show theoretically that Transformer attention has a dual form of gradient descent. They understand in-context learning as follows: GPT produces meta-gradients according to the demonstration examples, and these gradients are applied to the original GPT to build the ICL model…
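The "attention as a weight update" duality mentioned above can be checked numerically under the linear-attention simplification (softmax removed) that this line of work uses. The sketch below is my own illustration of the identity, not the paper's code; the dimensions and names are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Keys/values projected from the demonstration (in-context example) tokens.
K = rng.standard_normal((3, d))  # one row per demonstration token
V = rng.standard_normal((3, d))
q = rng.standard_normal(d)       # query vector for the test token

# Linear attention over the demonstrations.
attn_out = V.T @ (K @ q)

# The same output, viewed as a weight update applied to the query:
# delta_W = sum_i v_i k_i^T acts like a "meta-gradient" update to a linear layer.
delta_W = sum(np.outer(V[i], K[i]) for i in range(3))
gd_out = delta_W @ q

assert np.allclose(attn_out, gd_out)
```

The point of the identity is that attending to demonstrations is equivalent to first accumulating a rank-limited weight update from them and then applying the updated layer to the query, which is what licenses the "implicit fine-tuning" reading.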

Paper Reading ๐Ÿ“œ/Natural Language Processing

LMSI: Large Language Models Can Self-Improve (Paper Review)

The overview of this paper LLM์€ fine-tune ํ•˜๋Š”๋ฐ ๊ด‘๋ฒ”์œ„ํ•œ supervision์„ ํ•„์š”๋กœ ํ•˜๋Š” ๋ฐ˜๋ฉด์— ์‚ฌ๋žŒ์€ ์™ธ๋ถ€์  ์ž…๋ ฅ ์—†์ด self-thinking์„ ํ•จ์œผ๋กœ์จ ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. ์ด ๋…ผ๋ฌธ์—์„œ๋Š” LLM๋„ ์˜ค์ง unlabeled dataset๋งŒ์„ ์‚ฌ์šฉํ•˜์—ฌ self-improve ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ์„ค๋ช…ํ•œ๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” CoT prompting๊ณผ Self-Consistency๋ฅผ ์‚ฌ์šฉํ•ด์„œ unlabeled question์— ๋Œ€ํ•œ 'high-confidence' ratinoale-augmented answer๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด PLM์„ ์‚ฌ์šฉํ•˜๊ณ  ์ด self-generated solution์„ ์ด self-generated solution์„ ํƒ€๊นƒ output์œผ๋กœ ํ•ด์„œ ..

Paper Reading ๐Ÿ“œ/Natural Language Processing

Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Paper Review)

The overview of this paper LM๋“ค์€ ์ ์  ๊ด‘๋ฒ”์œ„ํ•œ task์— ์ ์šฉ๋˜๊ณ  ์žˆ๋Š”๋ฐ ์•„์ง token-level left-to-right decision-making ํ”„๋กœ์„ธ์Šค์— ๊ตญํ•œ๋˜์–ด ์žˆ๋‹ค. ์ด๊ฒƒ์€ ํƒ๊ตฌ์™€ ์ „๋žต์ ์ธ ๋ฐฉ๋ฒ•์„ ํ•„์š”๋กœ ํ•˜๋Š” task์—์„œ๋Š” ๋ชจ๋ธ์ด ํ•œ๊ณ„๋ฅผ ๊ฒช๊ฑฐ๋‚˜ ์ดˆ๊ธฐ์˜ ๊ฒฐ์ •์ด ์ค‘์‹ฌ ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜๋„ ์žˆ๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด LM ์ถ”๋ก ์„ ์œ„ํ•œ ์ƒˆ ํ”„๋ ˆ์ž„์›Œํฌ์ธ 'Tree of Thoughts'(ToT)๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ToT๋Š” CoT๋ฅผ ์ผ๋ฐ˜ํ™”ํ•˜๊ณ  ๋ฌธ์ œ ํ•ด๊ฒฐ์— ๋Œ€ํ•œ ์ค‘๊ฐ„ ์Šคํ…์œผ๋กœ ์—ฌ๊ฒจ์ง€๋Š” ์ผ๊ด€์„ฑ ์žˆ๋Š” ํ…์ŠคํŠธ์˜ ์œ ๋‹›์— ๋Œ€ํ•ด ํƒ๊ตฌ๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ด ์ค€๋‹ค. ToT๋Š” ์—ฌ๋Ÿฌ ์„œ๋กœ ๋‹ค๋ฅธ reasoning path๋ฅผ ๊ณ ๋ คํ•˜๊ณ  ๋‹ค์Œ ํ–‰๋™์˜ ์ฝ”์Šค๋ฅผ ๊ฒฐ์ •ํ•˜๊ธฐ ์œ„ํ•ด self-evaluating choice๋ฅผ ..

Paper Reading ๐Ÿ“œ/Natural Language Processing

Instruction Tuning with GPT-4 (Paper Review)

The overview of this paper ์ด์ „์˜ ์—ฐ๊ตฌ(Self-Instruct)์—์„œ๋Š” human-written instruction ์—†์ด machine-generated instruction๋งŒ์„ ์‚ฌ์šฉํ•ด์„œ LLM์„ fine-tune ํ•ด์„œ ์ƒˆ๋กœ์šด task์— ๋Œ€ํ•ด์„œ ์ข‹์€ zero-shot ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์คฌ๋‹ค. ์ด ๋…ผ๋ฌธ์—์„œ๋Š” GPT-4๋กœ instruction data๋ฅผ ๋งŒ๋“ค์–ด์„œ LLM fine-tuning์— ์‚ฌ์šฉํ•˜๊ณ ์ž ํ•˜์˜€๋‹ค. ๋˜ํ•œ GPT-4๋กœ๋ถ€ํ„ฐ ํ”ผ๋“œ๋ฐฑ & ๋น„๊ต ๋ฐ์ดํ„ฐ ๋˜ํ•œ ์ˆ˜์ง‘ํ•ด์„œ ์ข…ํ•ฉ์ ์ธ ํ‰๊ฐ€์™€ reward model training์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๊ณ ์ž ํ•˜์˜€๋‹ค. Table of Contents 1. Introduction 2. Dataset 3. Instruction-Tuning Language Mode..

Paper Reading ๐Ÿ“œ/Alignment Problem of LLM

Aligning Large Language Models through Synthetic Feedback (Paper Review)

The overview of this paper LLM์„ human value๋กœ align ํ•˜๋Š” ๊ฒƒ์€ LLM์˜ ์ •๊ตํ•œ ์กฐ์ข…์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ด ์ฃผ๊ธฐ ๋•Œ๋ฌธ์— ์ค‘์š”ํ•ด์กŒ๋‹ค. ํ•˜์ง€๋งŒ alignment๋Š” ์ƒ๋‹นํ•œ ์–‘์˜ human demonstration๊ณผ ํ”ผ๋“œ๋ฐฑ์„ ํ•„์š”๋กœ ํ•œ๋‹ค. ์ตœ๊ทผ์˜ open-source model์€ ์ด๋ฏธ align ๋œ InstructGPT์™€ ChatGPT ๊ฐ™์€ LLM์œผ๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ distill ํ•จ์œผ๋กœ์จ alignment learning ํ”„๋กœ์„ธ์Šค๋ฅผ ๋ณต์ œํ•˜์˜€๋‹ค. ์ด ํ”„๋กœ์„ธ์Šค๋Š” ์‚ฌ๋žŒ์˜ ๋…ธ๋ ฅ์„ ์ค„์—ฌ์ฃผ์ง€๋งŒ, teacher model์— ์ƒ๋‹นํžˆ ์˜์กด์ ์ด๋‹ค. ์ด ๋…ผ๋ฌธ์—์„œ๋Š” ์‚ฌ๋žŒ์˜ ๋…ธ๋™์ด ๊ฑฐ์˜ ํ•„์š”ํ•˜์ง€ ์•Š๊ณ  pre-aligned LLM์— ์˜์กดํ•˜์ง€ ์•Š๋Š” ์ƒˆ๋กœ์šด ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์†Œ๊ฐœํ•˜์˜€๋‹ค. ์ด ํ”„๋ ˆ์ž„์›Œํฌ์˜ ํ”„๋กœ์„ธ์Šค๋Š” ๋‹ค..

Paper Reading ๐Ÿ“œ/Natural Language Processing

ChatGPT์— ๋ฐ˜๋ณต ๋ฉ”์ปค๋‹ˆ์ฆ˜(LSTM)์„ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด? - RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ

The overview of this paper Transformer์˜ ๊ณ ์ • ์‚ฌ์ด์ฆˆ context๋Š” GPT๊ฐ€ long text๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์—†๊ฒŒ ๋งŒ๋“ ๋‹ค. ์ด ๋…ผ๋ฌธ์—์„œ๋Š” RNN์˜ ๋ฐ˜๋ณต ๋ฉ”์ปค๋‹ˆ์ฆ˜์˜ ์–ธ์–ด ๊ธฐ๋ฐ˜ ๋ณต์ œ์ธ RecurrentGPT๋ฅผ ์†Œ๊ฐœํ•œ๋‹ค. RecurrentGPT๋Š” ChatGPT ๊ฐ™์€ LLM์— ๊ธฐ๋ฐ˜ํ•ด์„œ ๋งŒ๋“ค์–ด์ง€๊ณ  LSTM์˜ Long-Short Term Memory์„ ๊ตฌ๋™ํ•˜๊ธฐ ์œ„ํ•ด ์ž์—ฐ์–ด๋ฅผ ์‚ฌ์šฉํ•˜์˜€๋‹ค. ๊ฐ timestep์—์„œ RecurrentGPT๋Š” ํ…์ŠคํŠธ์˜ ๋ฌธ๋‹จ์„ ์ƒ์„ฑํ•˜๊ณ , ํ•˜๋“œ ๋“œ๋ผ์ด๋ธŒ์™€ prompt ๊ฐ๊ฐ์— ์ €์žฅ๋˜์–ด ์žˆ๋Š” ์–ธ์–ด ๊ธฐ๋ฐ˜ Long-Short Term Memory๋ฅผ ์—…๋ฐ์ดํŠธํ•œ๋‹ค. ์ด ๋ฐ˜๋ณต ๋ฉ”์ปค๋‹ˆ์ฆ˜์€ RecurrentGPT๊ฐ€ forgetting ์—†์ด ์ž„์˜์˜ ๊ธธ์ด์˜ ๊ธด ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋„..

Paper Reading ๐Ÿ“œ/Alignment Problem of LLM

ICIL: In-Context Instruction Learning (Paper Review)

The overview of this paper: Instruction learning has been approached as a fine-tuning problem, encompassing instruction tuning and RLHF, in which an LLM is fine-tuned on a variety of tasks together with their instructions. Applying in-context learning to instruction learning yields In-Context Instruction Learning (ICIL). ICIL considerably improves the zero-shot task generalization performance of both pre-trained and instruction-finetuned models. One key advantage of ICIL is that, to evaluate every task, it concatenates several cross-task demonstrations into a single fixed…
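Under my reading of the snippet, the fixed set of cross-task demonstrations is assembled once and prepended to every query. A minimal sketch, with a hypothetical demonstration format (the paper's actual template may differ):

```python
def build_icil_prompt(demonstrations, test_instruction, test_input):
    """Prepend one fixed, cross-task demonstration set to every query."""
    blocks = [f"Instruction: {ins}\nInput: {inp}\nOutput: {out}"
              for ins, inp, out in demonstrations]
    blocks.append(f"Instruction: {test_instruction}\nInput: {test_input}\nOutput:")
    return "\n\n".join(blocks)

# Demonstrations drawn from different tasks, reused verbatim for every query.
demos = [("Classify the sentiment.", "great movie!", "positive"),
         ("Translate to French.", "hello", "bonjour")]
prompt = build_icil_prompt(demos, "Summarize.", "a very long article ...")
assert prompt.count("Instruction:") == 3 and prompt.endswith("Output:")
```

Because the demonstration block never changes, no per-task example selection is needed at evaluation time, which is what makes the method zero-shot with respect to the target task.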

Paper Reading ๐Ÿ“œ/Natural Language Processing

LoRA: Low-Rank Adaptation of Large Language Models (Paper Review)

This paper review was written in PowerPoint, unlike my usual format. A brief overview of the paper follows; for details, please see the attached PowerPoint file. Explanations are included in the PowerPoint memos and slide notes. This post was written with reference to the following YouTube video. The overview of this paper: The dominant paradigm in NLP consists of large-scale pre-training on general-domain data followed by adaptation to a particular task or domain. As we pre-train ever larger models, full fine-tuning, which retrains all parameters, becomes less feasible. In this paper, the pre-trained model's weight…
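LoRA's core idea can be sketched in a few lines: freeze the pre-trained weight W and learn only a low-rank update scaled by alpha/r. The class below is my own minimal illustration, not the paper's or any library's implementation; B is zero-initialized so the adapted layer starts out identical to the frozen one:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                               # frozen pre-trained weight
        self.A = rng.standard_normal((r, d_in))  # trainable, random init
        self.B = np.zeros((d_out, r))            # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        # Full-rank path plus low-rank correction; only A and B would be trained.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.eye(3)
layer = LoRALinear(W)
x = np.array([1.0, 2.0, 3.0])
# B starts at zero, so the adapted layer initially matches the frozen one.
assert np.allclose(layer(x), W @ x)
```

With rank r much smaller than the layer dimensions, the trainable parameter count drops from d_out * d_in to r * (d_out + d_in), which is what makes adaptation of very large models practical.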

Paper Reading ๐Ÿ“œ/Alignment Problem of LLM

LIMA: Less Is More for Alignment (Paper Review)

The overview of this paper LLM์€ ๋‘ ๊ฐ€์ง€์˜ ๋‹จ๊ณ„๋กœ ํ•™์Šต๋œ๋‹ค. general-purpose representation์„ ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•ด, raw text๋กœ๋ถ€ํ„ฐ unsupervised pre-training์„ ์‚ฌ์šฉ end task์™€ ์‚ฌ์šฉ์ž ์„ ํ˜ธ๋ฅผ align ํ•˜๊ธฐ ์œ„ํ•ด ๋Œ€๊ทœ๋ชจ instruction tuning & RL์„ ์‚ฌ์šฉ ์ด ๋‘ ๊ฐ€์ง€ stage์˜ ์ค‘์š”์„ฑ์„ ์ธก์ •ํ•˜๊ธฐ ์œ„ํ•ด ์–ด๋– ํ•œ RL ๋˜๋Š” human preference modeling ์—†์ด ์˜ค์ง 1000๊ฐœ์˜ ์‹ ์ค‘ํ•˜๊ฒŒ ์„ ์ •๋œ prompt & response์—์„œ ๊ธฐ์กด supervised loss๋ฅผ ์‚ฌ์šฉํ•ด์„œ fine-tune ๋œ LLaMA-65B์ธ LIMA๋ฅผ ํ•™์Šต์‹œ์ผฐ๋‹ค. LIMA๋Š” ๋ณต์žกํ•œ ์ฟผ๋ฆฌ๋ฅผ ํฌํ•จํ•˜๋Š” training ๋ฐ์ดํ„ฐ์˜ ๋ช‡ ๊ฐ€์ง€ ์˜ˆ..

Paper Reading ๐Ÿ“œ/Natural Language Processing

OPT: Open Pre-trained Transformer Language Models (Paper Review)

The overview of this paper ํ•™์Šตํ•˜๋Š”๋ฐ ์ƒ๋‹นํžˆ ๋งŽ์€ compute๊ฐ€ ํ•„์š”ํ•œ LLM์€ zero-shot & few-shot learning์—์„œ ๋ˆˆ์— ๋Œ๋งŒํ•œ ๋Šฅ๋ ฅ์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ๋‹ค. computational cost๊ฐ€ ์ฃผ์–ด์ง€๋ฉด ์ƒ๋‹นํ•œ ์ž๋ณธ ์—†์ด ์ด๋ฅผ ๋ณต์ œํ•˜๋Š” ๊ฒƒ์€ ํž˜๋“ค๋‹ค. ๋Œ€๋ถ€๋ถ„์˜ ๋ชจ๋ธ์— ๋Œ€ํ•ด API๊ฐ€ ๊ณต๊ฐœ๋˜์–ด ์žˆ์ง€ ์•Š๊ณ  full model์˜ ๊ฐ€์ค‘์น˜์— ๋Œ€ํ•œ ์ ‘๊ทผ์ด ํ—ˆ๋ฝ๋˜์–ด ์žˆ์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ์—ฐ๊ตฌ๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ๋ฐ์— ์–ด๋ ค์›€์„ ์ œ๊ณตํ•˜๊ณ  ์žˆ๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” ์—ฐ๊ตฌ์ž๋“ค์—๊ฒŒ ์™„์ „ํžˆ ๊ณต๊ฐœ๋œ dcoder-only pre-trained Transformer์ธ Open Pre-trained Transformer(OPT)๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” OPT-175B๊ฐ€ GPT-3์— ๋น„ํ•ด ์˜ค์ง $\frac {1}{7}$์˜..

Cartinoe
Cartinoe's paper review