
LoRA: Low-Rank Adaptation of Large Language Models (Paper Review)

Cartinoe 2023. 5. 26. 22:35

Unlike my previous reviews, this paper review was prepared as a PowerPoint. A brief overview of the paper is given below; for the details, please check the attached PowerPoint file. Explanations are written in the PowerPoint's memos and slide notes, so please refer to them. This post was written with reference to the YouTube video linked at the end.

 

Overview of this paper

 NLP์˜ ์ค‘์š” ํŒจ๋Ÿฌ๋‹ค์ž„์€ general domain ๋ฐ์ดํ„ฐ์—์„œ ๋Œ€๊ทœ๋ชจ pre-training์„ ํ•˜๊ณ  ํŠน์ • task ๋˜๋Š” domain์— ์ ์šฉ์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. larger model์„ pre-train ํ•˜๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ๋ชจ๋“  ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์žฌํ•™์Šตํ•˜๋Š” full fine-tuning์€ ์‹คํ–‰ ๊ฐ€๋Šฅ์„ฑ์ด ๋–จ์–ด์ง„๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” pre-trained model์˜ ๊ฐ€์ค‘์น˜๋ฅผ freezeํ•˜๊ณ  ํ•™์Šต ๊ฐ€๋Šฅํ•œ rank decomposition ํ–‰๋ ฌ๋“ค์„ Transformer architecture์˜ ๊ฐ layer์— ์ฃผ์ž…ํ•˜๋Š” LoRA๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ด๋Š” downstream task๋ฅผ ์œ„ํ•œ ํ•™์Šต ๊ฐ€๋Šฅํ•œ ํŒŒ๋ผ๋ฏธํ„ฐ์˜ ์ˆ˜๋ฅผ ์ค„์ธ๋‹ค. LoRA๋Š” fine-tuning๋ณด๋‹ค ๋” ์ ์€ training ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ๋” ๋†’์€ training throughput์„ ๋ณด์—ฌ์ฃผ๊ณ  ์ถ”๊ฐ€์  inference latency๊ฐ€ ์—†๋‹ค.

 

Attachment: LoRA.pptx (0.66 MB)


Source

https://arxiv.org/abs/2106.09685

 


https://www.youtube.com/watch?v=BJqwmDpa0wM