Insight 😎

You Can Fine-tune Too! with PEFT 🤗

Cartinoe 2023. 8. 1. 17:57

The current trend of LMs 📈

 In 2017, Vaswani et al. first introduced the Transformer in the paper 'Attention Is All You Need', and with BERT and GPT arriving in 2018, research on LMs (Language Models) took off in earnest. The pre-training & fine-tuning paradigm introduced back then became such a central framework for LMs that it is still widely used today. PEFT, the topic of this post (I'll explain exactly what it stands for a bit later! 😄), is also a method related to the fine-tuning part. Before getting into PEFT, let's first pin down what exactly pre-training and fine-tuning are!

 

Pre-training & Fine-tuning

As the name suggests, pre-training literally means training in advance. So what is being trained in advance? The LM itself! 😉 What does that mean, though? Training an LM in advance? Is an LM built in several stages rather than in a single step? Exactly! Today's LMs are not built in one step; an LM is usually trained in two stages: pre-training and fine-tuning.

 

  • Pre-training: the stage where the LM is trained in advance on a massive amount of data so that it develops a general ability to understand language → ↑ data & ↑ training time 😅
  • Fine-tuning: the stage where the pre-trained model's parameters are adjusted once more on data from a specific domain so that the model adapts better to that domain → ↓ data & ↓ training time 💪 (see the sketch right after this list)
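
To make the two stages concrete, here is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries (the checkpoint, dataset, and hyperparameters below are only illustrative choices, not anything prescribed by the papers discussed in this post):

```python
# Minimal fine-tuning sketch: take a pre-trained BERT checkpoint and
# adapt it to one specific downstream task (here, SST-2 sentiment classification).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"                      # pre-trained on huge generic corpora
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("glue", "sst2")                # small, task-specific data

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./sst2-finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()                                       # every parameter of the model is updated
```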

 

 The figure below is a very famous illustration of pre-training and fine-tuning, taken from the BERT paper. On the left is the pre-trained model, trained for general-purpose use, and on the right are the fine-tuned models obtained by fine-tuning that pre-trained model for each task. By building up the LM's overall ability through pre-training and then specializing it to a concrete task through fine-tuning, we can obtain models for many different tasks at comparatively low cost! 😆 Thanks to this effectiveness and efficiency, pre-training & fine-tuning still forms the backbone of how LMs are built.

 

Pre-training & Fine-tuning (Source: https://arxiv.org/abs/1810.04805)

 

What's the problem with Fine-tuning? 🤔

 As explained above, the advantage of fine-tuning is that it uses far less data than pre-training, and therefore far less computation. Indeed, the amount of data used for fine-tuning is much smaller than what pre-training requires, which can cut the cost considerably. However, fine-tuning, which we thought was flawless, eventually began to cast a shadow... 😥

 

 The problem with fine-tuning is that, as models and datasets keep growing, individuals inevitably run into compute limits even though far less data is used than for pre-training. In practice, fine-tuning a 13B model on a few million examples (now that I think about it, not exactly a small number 🤣) still means training for several days on multiple GPUs. 😰 Of course, this is not a big deal in environments with plenty of compute, but for an individual learner or researcher like me, that kind of compute is a serious burden, so it is a problem that needs fixing fast.. 😭

 

 If fine-tuning, a method created for efficiency, is itself no longer efficient, how do we solve this? The answer is PEFT (Parameter-Efficient Fine-Tuning)!! 🤗 As the name suggests, PEFT makes fine-tuning more efficient by tuning parameters in a parameter-efficient way. In this post, we'll look at which PEFT methods exist and how they can be used. So, let's get started!! 🛫

 

HuggingFace PEFT (Source: https://huggingface.co/blog/peft)

 

Parameter Efficient Fine-Tuning 🤗

 It's finally time to look at the long-awaited PEFT. So what exactly is PEFT? It is called parameter-efficient fine-tuning, but what does 'parameter-efficient' mean here? Let's think about fine-tuning once more. Fine-tuning means re-training a pre-trained base model on domain-specific data by 'readjusting its parameters'. The part to focus on is 'readjusting its parameters': standard fine-tuning readjusts all of the model's parameters to obtain a model better suited to the target domain. PEFT digs into exactly this point and shows that you can get an effect similar to fine-tuning without updating every parameter. By updating only a tiny fraction of the parameters instead of all of them, fine-tuning becomes far more efficient! 😆 (A toy sketch of the idea follows below.)
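
As a toy illustration of that idea (a sketch in plain PyTorch rather than any particular PEFT method; the checkpoint name and the extra linear layer are just examples), you freeze every pre-trained weight and make only a tiny newly added module trainable:

```python
# Freeze the pre-trained weights; only a small added module will receive gradients.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")   # example checkpoint

# 1) Freeze every pre-trained parameter.
for param in model.parameters():
    param.requires_grad = False

# 2) Add a small trainable module (here, just an illustrative extra linear layer).
hidden = model.config.hidden_size
small_trainable_part = nn.Linear(hidden, hidden)

trainable = sum(p.numel() for p in small_trainable_part.parameters())
total = sum(p.numel() for p in model.parameters()) + trainable
print(f"trainable: {trainable:,} / total: {total:,} ({100 * trainable / total:.2f}%)")
```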

 

Fine-tuning vs. PEFT

 

 At this point a question naturally arises: 'Can updating such a small number of parameters really match the performance of conventional fine-tuning with its full parameter update? 🤔' I think it's a perfectly natural question; when I first saw this, I also thought the logic made no sense at all. 😏 Let's talk about how this is actually possible while going through several PEFT methods! 😙

 

Adapter: Parameter-Efficient Transfer Learning for NLP (Houlsby et al., 2019)

 

 Adapters, pretty much the first proposal of parameter-efficient training, were introduced in the 2019 paper 'Parameter-Efficient Transfer Learning for NLP'. The goals behind Adapters are listed below; you can think of this as the point where the fundamental philosophy of PEFT was established. 😊

 

  • Use a small Adapter module to build a model that can easily be switched between tasks. 🤖
  • Update far fewer parameters while achieving performance similar to full fine-tuning. 🔻🆙

 

 To achieve these goals, the Adapter paper keeps the pre-trained weights frozen (left untouched, with no updates) and updates only the Adapter layers. This one design choice achieves both goals at once: because the pre-trained weights are never updated, you can handle any task simply by swapping in a different Adapter, and because only the Adapter is updated while the pre-trained weights stay frozen, far fewer parameters get updated! 🙂 So now, let's see exactly what an Adapter is!

 

 An Adapter is nothing grand; in reality it is just one small layer inserted in the middle of each Transformer layer. 😏 So how does this single small layer make parameter-efficient fine-tuning possible? As the figure below shows, an Adapter layer consists of three parts: a feed-forward down-projection, a non-linear layer, and a feed-forward up-projection. The down-projection maps the original dimension d to a smaller dimension m, the result passes through the non-linearity, and the up-projection maps it back from m to d. This is how far fewer parameters end up being updated! 😉 (A minimal sketch follows below.)
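
Here is a minimal sketch of such an adapter layer in PyTorch (assuming hidden size d, bottleneck size m, and the residual connection used in the Houlsby et al. design; the GELU non-linearity and the example sizes are my own illustrative choices):

```python
# A small bottleneck adapter: down-project, non-linearity, up-project, plus a skip connection.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d: int, m: int):
        super().__init__()
        self.down_proj = nn.Linear(d, m)   # feed-forward down-projection: d -> m (m << d)
        self.non_linear = nn.GELU()        # non-linear layer
        self.up_proj = nn.Linear(m, d)     # feed-forward up-projection: m -> d

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Only these few parameters are trained; the surrounding Transformer stays frozen.
        return h + self.up_proj(self.non_linear(self.down_proj(h)))

adapter = Adapter(d=768, m=64)             # e.g. BERT-base hidden size with a small bottleneck
out = adapter(torch.randn(2, 16, 768))     # (batch, seq_len, hidden)
```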

 

Adapter Layer

 

 Despite updating only a few percent of the original model's parameters, Adapters trailed fully fine-tuned BERT_LARGE by a mere 0.4% on the GLUE benchmark! 😮 That is truly a remarkable result, and it overturned the prevailing assumptions about fine-tuning. In short, Adapters showed that parameter-efficient tuning can genuinely rival the performance of full fine-tuning.

 

Adapters approached the base model's performance much faster, with far fewer parameter updates (Source: Adapter paper)

 

Prefix-Tuning: Optimizing Continuous Prompts for Generation (Li & Liang, 2021)

 

 The paper that followed Adapters is Prefix-Tuning (I'm not sure it was literally the direct successor, but I did read it right after the Adapter paper! 😅). Prefix-Tuning was introduced in the 2021 paper 'Prefix-Tuning: Optimizing Continuous Prompts for Generation'. Like Adapters, Prefix-Tuning points out that conventional fine-tuning requires tuning far too many parameters and storing one fully fine-tuned model per task, and it proposes tuning only a prefix to fix these problems.

 

 So what is this prefix? First think of GPT-2 and GPT-3, which rely on few-shot prompting. Few-shot prompting means giving the model a handful of examples, and the prefix can be thought of as the short instruction that accompanies those examples (e.g., 'Using the input, describe the input.'). Prefix-Tuning keeps the pre-trained model frozen and tunes only this prefix, so that the model makes better use of the input and produces better outputs! 🙂 (A rough sketch of the idea follows below.)
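
Here is a rough sketch of that idea in PyTorch (note this is a simplification: the actual paper optimizes prefix key/value activations injected at every attention layer rather than plain input embeddings, and the checkpoint and prefix length here are just examples):

```python
# Learn a few "virtual token" embeddings that are prepended to the input,
# while the pre-trained LM itself stays frozen.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False                          # PLM is frozen

num_prefix_tokens = 10
prefix = nn.Parameter(torch.randn(num_prefix_tokens, model.config.hidden_size) * 0.02)

inputs = tokenizer("Harry Potter is", return_tensors="pt")
token_embeds = model.get_input_embeddings()(inputs["input_ids"])        # (1, seq, d)
inputs_embeds = torch.cat([prefix.unsqueeze(0), token_embeds], dim=1)   # prepend the prefix

outputs = model(inputs_embeds=inputs_embeds)         # only `prefix` would receive gradients
```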

 

Prefix-Tuning Overview (Source: Prefix-Tuning paper)

 

 So how well did Prefix-Tuning perform? By tuning only 0.1% of the model's parameters, it sometimes even outperformed the fully fine-tuned model! 😮 Presenting the results, the paper explains it roughly as follows:

 

Because the PLM's weights are frozen and left untouched, the capabilities learned from general-purpose corpora can be used intact, which improves performance.

 

Prefix-Tuning Experiment Results (Source: Prefix-Tuning paper)

 

LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)

 

 If you have ever fine-tuned an LM yourself or worked on an LM-related project, you have probably heard of LoRA here and there. It is still one of the most widely used PEFT methods, and QLoRA, which builds on LoRA by adding quantization, is widely used as well! 😉 So we've all heard the name,, but what exactly is LoRA? 🤭 Let's work through it step by step!

 

Low Intrinsic Dimension of LM

 

 So what idea is LoRA built on? LoRA drew its inspiration from the 2020 paper 'Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning' (Aghajanyan et al., 2020). That paper talks about a model's intrinsic dimension: a much smaller set of directions such that tuning only along them is about as effective as tuning all of the parameters. In short, it is a concept much like the rank of a matrix: just as the whole matrix can be reconstructed from its rank-many basis components, tuning only those components is effectively enough to obtain the full update.

 

 Inspired by this idea, LoRA updates only a low-rank component instead of updating all of the parameters. As mentioned above, the rank corresponds to the basis components that make up a matrix. (A quick back-of-the-envelope comparison follows below.)
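
As a quick illustration of how much this saves (the 1024×1024 size and rank 8 below are just example numbers), compare the parameter counts of a full weight update and its rank-8 factorization:

$$\underbrace{1024 \times 1024}_{\text{full } \Delta W} = 1{,}048{,}576 \quad \text{vs.} \quad \underbrace{1024 \times 8}_{B} + \underbrace{8 \times 1024}_{A} = 16{,}384 \;\; (\approx 1.6\%)$$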

 

Adapter vs. LoRA

 

 Before looking at LoRA in detail, let's first compare LoRA with Adapters. Showing LoRA's structure and comparing it with Adapters before even explaining LoRA may seem like too much, but there is a really important point here, so I want to show it first! 😅 Looking at the figure below, both methods use a down-projection and an up-projection. The two structures are quite similar, and if you recall the Adapter described earlier, LoRA becomes much easier to understand; that's why I'm bringing up the structure and the comparison before the explanation! 😊

 

Adapter vs. LoRA (Source: https://arxiv.org/abs/2110.04366)

 

LoRA: Low-Rank Adaptation

 

 Like the other methods, LoRA keeps the PLM frozen and tunes only the adapter part. Here, the adapter updates only a low-rank component, which makes the update both more efficient and more effective. LoRA's tuning process can be written as the equation below, where $W_{0}$ denotes the original PLM weight and $\Delta W$ denotes the weight update contributed by the adapter part. The adapter is split into $A$ and $B$, which are the down-projection and the up-projection, respectively. In short, LoRA adds the adapter's updated weight on top of the original PLM weight! 😊
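
$$W_{0} + \Delta W = W_{0} + BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)$$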

 

 

 Now, feeding an input $x$ through this gives the final output $h$:
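
$$h = W_{0}x + \Delta W x = W_{0}x + BAx$$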

 

 

LoRA (Source: LoRA paper)
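
To make the structure concrete, here is a small illustrative LoRA-style linear layer in PyTorch (a simplified sketch, not the official implementation; the rank, the scaling by alpha/r, and the zero-initialization of B follow the paper's description, while the layer sizes are just examples):

```python
# The pre-trained weight W0 stays frozen; only the low-rank matrices A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # freeze W0 (and its bias)
        d, k = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # down-projection (r x k)
        self.B = nn.Parameter(torch.zeros(d, r))         # up-projection (d x r), starts at zero
        self.scale = alpha / r                           # scaling factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + B A x : the low-rank update is added on top of the frozen output
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)
h = layer(torch.randn(2, 16, 768))
```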

 

LoRA Experiment Results

 

 Looking at LoRA's experimental results, a remarkably small number of updated parameters was sometimes enough to even beat conventional full fine-tuning! 😮 By this point it's hardly surprising anymore that so few parameter updates can match fine-tuning.

 

 The LoRA paper actually runs many more experiments, carefully analyzing which weights LoRA should be applied to and what the optimal rank is, but I won't cover that part in this post. Please bear with me,, 🥲

 

Overall View of PEFT Methods

 

 And with that, we have taken a broad look at several PEFT methods. Of course there are more PEFT methods beyond Adapters, Prefix-Tuning, and LoRA (P-Tuning, (IA)3, etc.), but since we can't cover them all, we only looked at the major ones! 😅 Putting Adapters, Prefix-Tuning, and LoRA side by side gives the picture below.

 

Overall View of PEFT Methods (Source: https://arxiv.org/abs/2110.04366)

 

So, WHY?? Why does PEFT perform so well?? 🧐

 I haven't yet touched on the most important question: why! Why? Why do PEFT methods work so well? They not only match the performance of full fine-tuning but sometimes even surpass it, so how on earth is that possible? Here I'd like to offer some reasons that include my own opinions.

 

 First, I think PEFT performs well because 'it does not touch the PLM'. The PLM's weights hold very high-quality capabilities learned from general-purpose corpora, and fine-tuning gradually reshapes those capabilities toward one specific domain. In that process, the high-quality capabilities the PLM originally had can be damaged, so PEFT methods, which barely touch the PLM, are able to perform well.

 

 Second, I think today's fine-tuning updates an 'unnecessarily large number of parameters'. As we saw with LoRA, updating only the parameters that actually matter is reportedly enough to rival full fine-tuning. If so, wouldn't it be better to update only those important parameters rather than every single one? Of course I can't know this for certain, but this is why I believe PEFT, which updates few but important parameters, performs well, while current fine-tuning updates needlessly many.

 

 Of course, this argument may contain mistakes. If anything here is wrong, or has actually been verified, please let me know in the comments! 😊

 

How to use PEFT Methods? 🫤

 So we've now confirmed that PEFT methods really do make parameter-efficient fine-tuning possible. Now it's time to use PEFT to fine-tune an LM myself, something I had always wanted to do but couldn't because of compute limits! 🤓

 

 Fortunately, there is no need to implement any of this yourself: most PEFT methods are already implemented in Hugging Face's PEFT library, so you can run PEFT with just a few lines of code. For the exact usage, please check the link below; a short illustrative snippet follows it as well! 😊

 

https://huggingface.co/docs/peft/index
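
For reference, applying LoRA through the PEFT library looks roughly like this (the base checkpoint and hyperparameters below are only examples; see the docs above for the full API):

```python
# Wrap a pre-trained model with a LoRA configuration using the PEFT library.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # which kind of model/task we are tuning
    r=8,                           # rank of the low-rank update
    lora_alpha=16,                 # scaling factor
    lora_dropout=0.05,
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# prints something like "trainable params: ... || all params: ... || trainable%: ..."
# From here, `model` can be trained exactly like a normal transformers model.
```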

 


 

One ray of hope, PEFT! ✨

 To everyone reading the end of this post, I'd like to close with a question: what went through your mind while reading about PEFT? Let me answer first. As I mentioned in a previous post, I am a student with very limited compute. So whenever new models came out, I couldn't even dream of trying them and could only gaze at the models other people had fine-tuned and released. The more that happened, the stronger my desire to fine-tune new models myself grew, and what finally resolved it was none other than PEFT, because it let me tune models at a fraction of the cost of full fine-tuning. From where I stand, PEFT truly was 'a ray of hope'. 😊

 

 Of course, everyone will see this differently depending on their situation. With plenty of compute, you can simply do full fine-tuning without PEFT. But for students and researchers, I think PEFT is a huge help. I also believe that for any field to advance, it needs both accessibility and scalability. NLP had become constrained in both respects because of LLMs, and I think open-source LMs and PEFT have brought that accessibility and scalability back. 😊 I'll wrap up this post hoping for endless progress in this field! Thank you to everyone who read it!