Insight 😎

Noise makes LLM better! - NEFTune 😉

What is the big difference of NLP compared to CV? 😮 Starting with the title of this post, there is probably more than one thing that seems puzzling. First it says we suddenly need to look back, and then it asks what the biggest difference between CV and NLP is. But to get at what this post wants to say, we do need to revisit that difference! So let me first put the question to the readers. What is the biggest difference between NLP and CV? Asked this abstractly, I expect answers like the following: the data used is different (text & image); the models used are different; the training methods are different. Of course, these answers are also correct, but the biggest diff.. between the two research fields that I want to discuss in this post..

Now it's time to fine-tune ChatGPT!! ⏰

What a BIG NEWS!!! 📰 I have been posting on the blog less often lately, but today I came across some truly surprising news, so here I am again after a long while. Straight to the point: as of today in Korean time! (Though in US time it is August 22.) OpenAI has finally made it possible to fine-tune its powerful language model, ChatGPT (gpt-3.5-turbo)!! 🫢 So in this post, based on the article OpenAI published to announce the news, I want to go through how ChatGPT can be fine-tuned, along with the specifics and the finer details! 🤗 This post is based on OpenAI's article, so if you want to check the details more closely, the following ..

The evolution of fine-tuning methods!! From fine-tuning to RLHF 🦖➡️🧑

A new spectrum of model learning, Fine-tuning ✨ This post is about how models are fine-tuned. I do have the feeling that the order of my posts has gone somewhat wrong, so please bear with me on that..!! 😅 Last time, while covering parameter-efficient fine-tuning, we looked at how to fine-tune efficiently. So is there also a way to fine-tune more effectively? Of course there is!! In this post, I want to look at how fine-tuning methods have changed over time. So, what is fine-tuning? As I said in the previous post, today's numerous language..

Let's think like a human, one step at a time! 🧠🤔

Let's think step-by-step! 🪜 Some people will be puzzled when they see the title of this post and of this section. 'Wait, this person was talking about NLP just fine, so what is this sudden nonsense? 🤨' That is a perfectly fair reaction! But anyone who has read NLP papers or is familiar with the latest methods will probably know what I am getting at. That is because the title of this section, 'Let's think step-by-step', is both the sentence that runs through this whole post and a method used in a famous paper. What am I talking about? If you are curious, this post, which looks at methods that try to get an LM to solve problems by reasoning in a way similar to people, to the end..

You can fine-tune too! with PEFT 🤗

The current trend of LM 📈 In 2017, Vaswani et al. first introduced the Transformer in the paper 'Attention Is All You Need', and once BERT and GPT appeared in 2018, research on LMs (Language Models) got under way in earnest. The concept of pre-training & fine-tuning introduced at that time went on to form a major LM framework, one that is still widely used today. PEFT, the subject of this post (I will explain what it stands for a little later! 😄), is a method tied to the fine-tuning half of that framework. Before looking at PEFT, what exactly this pre-training and fine-tuning ..

ChatGPT's performance is getting worse?!?!? 😲😲

Did you hear that..? 😱 They say there is a rumor going around these days. The rumor is that ChatGPT, which has become so familiar that we would feel inconvenienced without it, has gotten worse!! 😮 Before looking at what the rumors actually say, let's first go over the exact difference between ChatGPT and GPT-4 and the recent changes made to these models. ChatGPT and GPT-4 differ in the underlying model. ChatGPT is GPT-3.5 trained with RLHF, while GPT-4 refers, as the name says, to the GPT-4 model, which is a far more advanced successor to GPT-3.5. (Few details about GPT-4 have been disclosed, so an exact comparison is not possible,, 😓) Provided by OpenAI..

What is the best way to evaluate an LM? 😎

This post differs slightly from the previous ones in that it uses slides (PPT) for the explanation. As the title suggests, the topic of this post is the evaluation metrics of LMs! 😊 We will look at the existing evaluation metrics, then at the problems those metrics have, and finally at the improvements that have been proposed. If anything in the slides is unclear or looks wrong, leave a comment on the slides or on this post and I will reply! Enjoy! 🤩 https://docs.google.com/presentation/d/1XL_B0nI-yp2dgLDVrEzTlLcg9DpUnALBklmpJ4iOZRw/e..

The LM context window: should it be long? Should it be short? 🤨

Newly spotlighted elements of LM ✨ LMs are changing by the moment. These days, a model announced only a few days ago has already been picked apart in full, with its shortcomings and weaknesses pointed out. 😥 LMs are changing that quickly on many fronts, whether in parameters or in data. The topic of this post is one that went largely untouched for a long time and has recently come back into the spotlight through several studies (Chen et al., 2023; Ding et al., 2023; Liu et al., 2023): the context window of an LM! 😊 What is the 'context window'? 🤔 Before we begin, the key topic of this post, ..

Closed-source🔒? Open-source🔓? What's that?? 🤨🤔

Starting from ChatGPT 🤖 which is closed-source Last December, that is, December 2022, something happened that gave people around the world quite a jolt. It was the announcement of the famous 'ChatGPT'! This large language model (LLM) from OpenAI showed performance on a different level from anything before it, and it began to work its way into society and people's lives across the board. But even this seemingly perfect ChatGPT has several drawbacks, and the one this post takes up is that it is a 'closed-source' model. 🚫 What is closed-source? The term may feel unfamiliar if you are hearing it for the first time; clos..

How has scaling law developed in NLP? 🤔

Before Starting.. In 2017, the 'Transformer' was proposed, an innovative model that overturned the landscape of deep learning up to that point, NLP included. This post is not about the Transformer in detail, so we will not go into it in depth, but to follow this post you do need to know the size of the model. The Transformer was a model with 465M parameters. Yet only three years later came GPT-3 (175B), a model large enough to make that size feel truly small. And even larger models have kept appearing since. Why have LMs kept growing like this? The reason is Kaplan et al. 2020..

Cartinoe
Posts in the 'Insight 😎' category