Chinchilla

Insight ๐Ÿ˜Ž

How has scaling law developed in NLP? ๐Ÿค” - NLP์—์„œ scaling law๋Š” ์–ด๋–ป๊ฒŒ ๋ฐœ์ „๋˜์—ˆ์„๊นŒ?

Before Starting.. 2017๋…„ NLP๋ฅผ ํฌํ•จํ•œ ์ง€๊ธˆ๊นŒ์ง€์˜ ๋”ฅ๋Ÿฌ๋‹์˜ ํŒ๋„๋ฅผ ๋’ค์ง‘์–ด์—Ž๋Š” ํ˜์‹ ์ ์ธ ๋ชจ๋ธ์ธ 'Transformer'๊ฐ€ ์ œ์•ˆ๋˜์—ˆ๋‹ค. ์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ ๋‹ค๋ค„๋ณผ ๋‚ด์šฉ์€ Transformer์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์ด ์•„๋‹ˆ๊ธฐ์— ๋”ฐ๋กœ ๊นŠ์ด ์•Œ์•„๋ณด์ง€๋Š” ์•Š๊ฒ ์ง€๋งŒ, ์ด๋ฒˆ ํฌ์ŠคํŒ…์„ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ด ๋ชจ๋ธ์˜ ์‚ฌ์ด์ฆˆ์— ๋Œ€ํ•ด์„œ๋Š” ์•Œ์•„๋‘˜ ํ•„์š”๊ฐ€ ์žˆ๋‹ค. Transformer์˜ ์‚ฌ์ด์ฆˆ๋Š” 465M ๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ฐ€์ง€๋Š” ๋ชจ๋ธ์ด์—ˆ๋‹ค. ํ•˜์ง€๋งŒ, ๋ถˆ๊ณผ 3๋…„ ๋งŒ์— ์ด ์‚ฌ์ด์ฆˆ๊ฐ€ ์ •๋ง ์ž‘๊ฒŒ ๋Š๊ปด์ง€๊ฒŒ ํ•  ๋งŒํผ ํฐ ์‚ฌ์ด์ฆˆ์˜ ๋ชจ๋ธ์ธ GPT-3(175B)๊ฐ€ ๋‚˜์˜ค๊ฒŒ ๋˜์—ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ํ˜„์žฌ๊นŒ์ง€๋„ ์ด๋ณด๋‹ค ๋” ํฐ ๋ชจ๋ธ๋“ค์€ ๊ณ„์† ๋‚˜์˜ค๊ณ  ์žˆ๋‹ค. LM์˜ ์‚ฌ์ด์ฆˆ๊ฐ€ ์ด๋ ‡๊ฒŒ ์ ์  ์ปค์ง€๊ฒŒ ๋œ ์ด์œ ๋Š” ๋ฌด์—‡์ผ๊นŒ? ๊ทธ ์ด์œ ๋Š” Kaplan et al. 2020..

Cartinoe
'Chinchilla' ํƒœ๊ทธ์˜ ๊ธ€ ๋ชฉ๋ก