transformer

Insight 😎

Should an LM's context window be long or short? 📏🤨

Newly spotlighted elements of LM ✨ LMs are changing by the moment. These days, a model announced only a few days ago has already been picked apart, with its gaps and weaknesses called out. 😥 LMs are evolving quickly on every front, whether in parameters or in data. This post looks at a topic that went largely untouched for a long time but has recently come back into the spotlight through several studies (Chen et al., 2023; Ding et al., 2023; Liu et al., 2023): the LM's context window! 😊 What is the 'context window'? 🤔 Before we begin, the key topic of this post, ..

Insight 😎

How has scaling law developed in NLP? 🤔

Before Starting.. In 2017, 'Transformer' was proposed, a groundbreaking model that overturned the landscape of deep learning up to that point, NLP included. This post is not about the Transformer itself, so we will not go into depth on it, but to follow this post you do need to know how large the model was. The original Transformer had about 65M parameters in its base configuration and 213M in its big configuration. Yet only three years later came GPT-3 (175B), a model large enough to make that size feel tiny. Even larger models keep appearing to this day. Why have LMs kept growing like this? The reason is Kaplan et al. 2020..

Cartinoe
Posts tagged 'transformer'