Falcon🦅 LLM, the Model That Swept the Open LLM Leaderboard: Falcon & RefinedWeb

2023. 6. 14. 21:12

 While browsing Hugging Face's Open LLM Leaderboard recently, I noticed a new model sitting in first place and wondered what it was, which is what prompted this post. The new leader is Falcon🦅, a model developed by TII. Falcon comes in four versions: two sizes, 7B and 40B, each with a base version and an instruct-tuned version. Of these, 'falcon-40b-instruct', the instruct-tuned 40B model, took first place on the leaderboard.

 

Figure 1. Hugging Face Open LLM Leaderboard (as of 2023.06.14)

 

 This post looks at the Falcon models and then reviews the paper on RefinedWeb, the dataset that contributed heavily to building Falcon.

 

Table of Contents

1. Falcon Models

2. RefinedWeb Dataset

   2-1. Introduction

   2-2. Macrodata Refinement and RefinedWeb

   2-3. Experiments

   2-4. Limitations

   2-5. Conclusion

 

 

Falcon Models

 Unfortunately, Falcon, an Open LLM with such strong performance, does not have a paper of its own yet. (The Hugging Face model card only says 'paper coming soon ☺️'.) This section is therefore based on the Hugging Face blog post that briefly introduces the Falcon models.

 

 The Falcon family is built on two base models: Falcon-40B and Falcon-7B. As of 2023.06.14, Falcon-40B sits at the very top of the Open LLM Leaderboard, and Falcon-7B is the best-performing model in its size class.

 

 Falcon-40B requires about 90GB of GPU memory. That may sound like a lot, but it is still less than LLaMA-65B needs. Falcon-7B, by contrast, needs only about 15GB, which makes inference and fine-tuning feasible even on modest hardware.
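 As a rough sanity check on these memory figures, the sketch below estimates the memory taken by the weights alone. It assumes 2 bytes per parameter (16-bit weights such as bfloat16) and nominal parameter counts of 40B and 7B; both are assumptions made for illustration, not figures from the Hugging Face blog. The gap to the ~90GB and ~15GB quoted above would plausibly come from activations, the attention cache, and framework overhead.

```python
def weight_memory_gb(n_params: int, bytes_per_param: int = 2) -> float:
    """Memory taken by the model weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


# Nominal parameter counts (assumed from the model names).
FALCON_40B = 40_000_000_000
FALCON_7B = 7_000_000_000

for name, n_params in [("Falcon-40B", FALCON_40B), ("Falcon-7B", FALCON_7B)]:
    print(f"{name}: ~{weight_memory_gb(n_params):.0f} GB for 16-bit weights")
```

 Passing `bytes_per_param=1` gives the corresponding estimate for 8-bit weights, under the same assumptions.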

 

 Instruct versions of Falcon were also released: Falcon-40B-Instruct and Falcon-7B-Instruct. These experimental variants were fine-tuned on instruction and conversational data. You can also build your own custom instruct version on top of the Falcon base models!

 

 Falcon-7B and Falcon-40B were trained on 1.5T and 1T tokens respectively, in line with modern models that optimize for inference. The key ingredient behind the high quality of the Falcon models is their training data, which is predominantly (>80%) based on RefinedWeb. Instead of gathering scattered curated sources, TII focused on scaling and improving the quality of web data, using large-scale deduplication and strict filtering to match the quality of other corpora. This dataset is covered in more detail later in the post. The Falcon models still include some curated sources, but far fewer than current SoTA models such as GPT-3 or PaLM. Best of all, TII has publicly released a 600B-token extract of RefinedWeb! 🫢

 

 Another interesting aspect of the Falcon models is their use of multiquery attention. Vanilla multihead attention has one query, key, and value per head, whereas multiquery attention shares a single key and value across all heads.

 

Figure 2. Multi-Query Attention shares the key and value across attention heads
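 To make the key/value sharing concrete, here is a minimal pure-Python sketch of attention for a single query position. It is a toy illustration under simplifying assumptions (no learned projections, no batching, no masking), and the function names are hypothetical; it is not Falcon's actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over a sequence."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def multi_head_attention(queries, keys_per_head, values_per_head):
    """Vanilla multi-head: every head has its own keys and values."""
    return [attend(q, k, v)
            for q, k, v in zip(queries, keys_per_head, values_per_head)]


def multi_query_attention(queries, shared_keys, shared_values):
    """Multi-query: each head keeps its own query but reads the same keys/values."""
    return [attend(q, shared_keys, shared_values) for q in queries]


# Toy example: 2 heads, head dimension 2, 3 positions in the sequence.
queries = [[1.0, 0.0], [0.0, 1.0]]  # one query vector per head
shared_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shared_values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

outputs = multi_query_attention(queries, shared_keys, shared_values)
print(outputs)
```

 With multi-query attention, only one set of keys and values has to be kept per layer during generation, instead of one set per head.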

 

 This trick has little effect on pre-training, but it greatly improves the scalability of inference. The following table lists the Open LLMs released so far.

 

Table 1. Current Open LLMs

 

 

RefinedWeb Dataset

 Unlike the Falcon models, RefinedWeb does have a paper, so this section is organized as a review of that paper.

 

Overview of RefinedWeb

 

 LLMs are commonly trained on a mixture of filtered web data and curated high-quality corpora. This curation process has been believed to be necessary for producing capable models with broad zero-shot generalization abilities. However, as larger models come to require training on trillions of tokens, it is unclear how far curation can scale and whether unique high-quality data will soon run out.

 

 Contrary to this belief, properly filtered and deduplicated web data alone can yield powerful models. Models trained on this data significantly outperform even SoTA models trained on The Pile. Even after extensive filtering, the high-quality data extracted from the web remains plentiful: the authors obtained five trillion tokens from CommonCrawl. As mentioned earlier, the paper releases a 600B-token extract of the RefinedWeb dataset and trains 1.3B and 7.5B models on this data for its experiments.

 

Figure 3. Models trained only on RefinedWeb outperform models trained on curated corpora

 

2-1. Introduction

 

 According to the recently established scaling laws for LMs (Chinchilla), improving model performance is more effective when parameters and data are scaled up jointly than when only one of them is increased. Revisiting existing models in light of this finding, the amount of data needed to train GPT-3 optimally is roughly twice the size of the largest existing dataset.
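 To get a feel for the scale involved, the sketch below applies the commonly quoted Chinchilla rule of thumb of roughly 20 training tokens per parameter to a GPT-3-sized model (175B parameters). The ratio is an approximation assumed here for illustration; it is not a number taken from this post.

```python
TOKENS_PER_PARAM = 20  # commonly quoted Chinchilla rule of thumb (assumption)


def compute_optimal_tokens(n_params: int,
                           tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Approximate compute-optimal number of training tokens for a model size."""
    return n_params * tokens_per_param


GPT3_PARAMS = 175_000_000_000
tokens = compute_optimal_tokens(GPT3_PARAMS)
print(f"GPT-3-sized model: ~{tokens / 1e12:.1f}T tokens")
```

 Under this assumption, a GPT-3-sized model would want several trillion tokens, which is the scale of data that motivates the rest of the paper.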

 

 Despite this finding, researchers struggle to grow their datasets. The amount of text data is finite, and the problems multiply once quality and licensing are taken into account. Filtering the collected data is also costly, and the dataset tends to shrink considerably once it has been filtered.

 

 The paper therefore explores how web data can be processed better to improve its quality, in order to keep up with growing data requirements, streamline data pipelines, and reduce the need for human-intensive curation. Its contributions are as follows.

 

  • Introduces RefinedWeb, a five-trillion-token, web-only English pre-training dataset
  • Shows that models trained on web data alone can outperform models trained on both public and private curated corpora
  • Publicly releases a 600B-token extract of RefinedWeb, together with 1B and 7B models trained on it

 

Table 2. RefinedWeb improves on existing English pre-training datasets for LLMs by combining extensive filtering with strict deduplication at unprecedented scale

 

2-2. Macrodata Refinement and RefinedWeb

 

The paper introduces MDR (MacroData Refinement), a pipeline for filtering and deduplicating web data from CommonCrawl, and uses MDR to produce RefinedWeb. To raise the quality of the web data, it relies on strict filtering and deduplication.

 

Design Principles.  The paper adheres to the following guidelines:

 

  • Scale first.  MDR is intended to produce datasets for training 40-200B parameter models, so the target is a dataset of 3-6T tokens. Human curation is avoided wherever possible.
  • Strict Deduplication.  Inspired by prior work, the authors implement a strict deduplication pipeline that combines exact and fuzzy deduplication. This yields higher removal rates than previously reported!
  • Neutral Filtering.  To avoid introducing unintended biases into the model, ML-based filters are not used for anything other than language identification.
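 The "Strict Deduplication" principle above can be illustrated with a small sketch that combines an exact pass and a fuzzy pass. This is a deliberately simplified stand-in, not the paper's pipeline: the exact pass hashes whole normalized documents, and the fuzzy pass uses plain Jaccard similarity over word 3-gram shingles with a pairwise scan. The function names and the 0.8 threshold are hypothetical choices made for illustration.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def shingles(text, n=3):
    """Set of word n-grams used to compare documents approximately."""
    words = normalize(text).split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(docs, threshold=0.8):
    """Keep the first copy of each document; drop exact and near duplicates."""
    seen_hashes = set()
    kept, kept_shingles = [], []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        doc_shingles = shingles(doc)
        if any(jaccard(doc_shingles, other) >= threshold
               for other in kept_shingles):
            continue  # fuzzy (near) duplicate of a kept document
        seen_hashes.add(digest)
        kept.append(doc)
        kept_shingles.append(doc_shingles)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog",
    "the quick  brown fox jumps over the lazy dog",       # exact after normalizing
    "The quick brown fox jumps over the lazy dog today",  # near duplicate
    "Completely different text about language models",
]
print(deduplicate(docs))
```

 The pairwise comparison here is quadratic in the number of documents, so at web scale an approximate method would be needed in place of this loop.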

 

 Table 3 below outlines the steps performed at each stage of MDR.

 

Table 3. Macrodata Refinement combines SoTA approaches with novel ones to produce high-quality web data

 

 Figure 4 below shows how the dataset changes as it passes through MDR. The dataset obtained after the Document Preparation stage is called RW-Raw; only about 48% of the data remains at this point, with most of the removal happening during language identification. The dataset obtained after the Filtering stage is called RW-Filtered; about 23% of the full dataset remains, which is roughly 50% of RW-Raw. Finally, the dataset obtained after the Deduplication stage is called RW (RefinedWeb).

 

Figure 4. The stages of Macrodata Refinement remove nearly 90% of the documents originally in CommonCrawl. Grey shows the removal rate of each stage relative to the preceding one, and the colored shade shows the share of documents kept overall.
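 The retention figures above compound multiplicatively, which the short sketch below makes explicit. The 48% and 50% keep rates are the rounded values quoted in this post. The 50% keep rate for deduplication is an assumption, chosen so that the total removal lands near the "nearly 90%" mentioned in the caption.

```python
# (stage name, fraction kept relative to the previous stage)
STAGES = [
    ("Document Preparation -> RW-Raw", 0.48),
    ("Filtering -> RW-Filtered", 0.50),
    ("Deduplication -> RW", 0.50),  # assumed keep rate, see the text above
]


def cumulative_retention(stages):
    """Fraction of the original documents remaining after each stage."""
    remaining = 1.0
    result = []
    for name, keep_rate in stages:
        remaining *= keep_rate
        result.append((name, remaining))
    return result


for name, remaining in cumulative_retention(STAGES):
    print(f"{name}: {remaining:.0%} of the original documents remain")
```

 With these rounded inputs, RW-Filtered comes out at about 24% of the original data, close to the 23% reported above.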

 

2-3. Experiments

 

Setting

 

 Rather than computing validation loss, the paper focuses on evaluating zero-shot generalization across many tasks. Evaluation is based on the Eleuther AI evaluation harness, which allows zero-shot evaluation across a wide range of tasks. As shown in Table 4, the evaluation tasks are grouped into small (for ablations) and core, main, and ext (for comparisons).

 

Table 4. To evaluate models trained on RefinedWeb and compare them with SoTA models, four aggregates are built across 18 tasks that measure zero-shot performance

 

 The paper distinguishes three levels of comparison.

 

  1. internal comparison. Models trained and evaluated in the paper's own codebase, differing only in the pre-training dataset
  2. benchmark-level comparison. Models trained with a different codebase but evaluated with the Eleuther AI harness
  3. external comparison. Comparison against PaLM and GPT-3

 

 The paper trains three autoregressive decoder-only models, at 1B, 3B, and 7B parameters. Their architecture and hyperparameters are similar to GPT-3's. The small-scale models are trained on the optimal number of tokens according to the scaling laws: the 1B model on 27B tokens and the 3B model on 60B tokens.

 

Can web data alone outperform curated corpora?

 

 To examine whether models trained on web data alone can outperform models trained on curated corpora, the paper first runs experiments with 1-3B models trained to optimality on popular web datasets and curated datasets. It then scales up to 1B and 7B models trained on 350GT and compares their zero-shot generalization with SoTA models.

 

 small scale study. To compare RW-Raw, RW-Filtered, and RW, the intermediate and final datasets of RefinedWeb, the paper compares models that share the same architecture and codebase and are evaluated in the same framework, but are trained on different pre-training datasets (Table 5). The results show that filtering and deduplication improve performance substantially.

 

Table 5. Curation is not a silver bullet for zero-shot generalization: small-scale models trained on RefinedWeb outperform both models trained on web data and models trained on curated corpora

 

 full scale models. Scaling up the previous experiment, the authors train 1B and 7B models on 350GT, and also train a 1B model on The Pile. These models are then compared with other existing models. The main-agg results appear in Figure 3, and the core-agg and ext-agg results appear in Figure 5. The results show that open models consistently underperform models trained on private curated corpora. Models trained on RefinedWeb, however, match the performance of the GPT-3 series using web data alone, even though the high-quality sources used in The Pile are excluded from RefinedWeb.

 

Figure 5. Models trained only on RefinedWeb outperform models trained on curated corpora. Left: core-agg; right: ext-agg.

 

 Finding. LMs trained on properly filtered and deduplicated web data alone can nearly match the performance of models trained on curated data.

 

Do other corpora benefit from MDR?

 

 The filtering and deduplication stages of MDR were applied independently to other pre-training datasets to study whether they generalize beyond RefinedWeb. The results are shown in Table 6 and can be summarized as follows.

 

  1. The improvements from filtering are not systematic. The removal rate of filtering is not strongly correlated with downstream accuracy.
  2. Deduplication improves performance across all datasets, and its removal rate correlates better with the change in performance.

 

 Combining filtering and deduplication yields further improvements. Although performance becomes more uniform across datasets, differences remain, which suggests that flaws in the original text extraction and processing cannot be fully compensated for.

 

Table 6. Although the improvements from filtering are not systematic across datasets, deduplication brings a steady performance boost across the board

 

 Finding. Filtering heuristics require source-dependent tuning, whereas strict deduplication consistently improves zero-shot performance across datasets.

 

 

The end of the post

 

 That wraps up this look at the Falcon models and the dataset used to train them. Since the Falcon paper has not been released yet, the details are hard to pin down, but I plan to revisit it once the paper is out. I believe many more Open LLMs like this need to be developed for the field to advance, and in that sense the release of Falcon is a contribution that will help future progress. For more on Falcon, see the model card below.

 

https://huggingface.co/tiiuae/falcon-40b

 

tiiuae/falcon-40b · Hugging Face


 

 

 

 

 

Sources

https://huggingface.co/blog/falcon

The Falcon has landed in the Hugging Face ecosystem

https://arxiv.org/abs/2306.01116

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

 
