Paper Reading 📜/Natural Language Processing

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (Paper Review)

The overview of this paper: BERT and RoBERTa have achieved new SoTA performance on sentence-pair regression tasks such as semantic textual similarity (STS). However, these tasks require both sentences to be fed into the network together, which incurs substantial computational overhead. Finding the most similar pair in a collection of 10,000 sentences with BERT requires about 50 million inference computations. This structure makes BERT unsuitable not only for semantic similarity search but also for unsupervised tasks such as clustering. The paper uses siamese and triplet networks to c…
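
As a quick sanity check on the 50-million figure: with a cross-encoder, every candidate pair must go through the network, so 10,000 sentences yield 10,000-choose-2 forward passes, while a bi-encoder like SBERT needs only one encoding pass per sentence. A minimal sketch of that count (pure arithmetic, no model involved):

```python
from math import comb

n = 10_000

# Cross-encoder setup (BERT with a sentence pair as input):
# every pair is one forward pass.
cross_encoder_passes = comb(n, 2)   # = n * (n - 1) / 2

# Bi-encoder setup (SBERT): encode each sentence once, then compare the
# cached embeddings with cosine similarity, which is cheap by comparison.
bi_encoder_passes = n

print(f"{cross_encoder_passes:,}")  # 49,995,000 -> the ~50 million in the paper
print(f"{bi_encoder_passes:,}")     # 10,000
```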

Data Augmentation methods in NLP

Deep learning currently suffers from a shortage of data: improving a model's performance requires ever more data, yet the amount of data available for this is limited. Data augmentation is a technique devised to address this problem. Briefly, it creates new data by applying small transformations or corruptions to existing data. The method is used mainly in computer vision, but after learning that data augmentation techniques also exist for NLP, I studied them and wrote this post. The post was written with reference to the following blogs: https://neptune.ai/blog/data-augmentat…
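
To make the "small transformations to existing data" idea concrete, here is a minimal sketch of random synonym replacement, one of the simplest NLP augmentation techniques (in the spirit of EDA). The tiny synonym table is a purely hypothetical stand-in for a real lexicon such as WordNet:

```python
import random

# Toy synonym table; purely illustrative. A real pipeline would use a
# lexicon such as WordNet or a paraphrase model instead.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "small": ["tiny", "little"],
}

def synonym_replace(sentence, p=0.5, seed=0):
    """Return a lightly perturbed copy of `sentence` with some words
    swapped for synonyms (random synonym replacement)."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if word.lower() in SYNONYMS and rng.random() < p:
            out.append(rng.choice(SYNONYMS[word.lower()]))
        else:
            out.append(word)
    return " ".join(out)

# With a high replacement probability most eligible words get swapped.
print(synonym_replace("the quick small dog looks happy", p=0.9))
```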

GPT-4 Technical Report Review

Introduction: GPT-4 is a large multimodal model (accepting image and text inputs and emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, on a simulated bar exam it scored in the top 10% of test takers, in contrast to GPT-3.5's bottom 10%. Over six months, GPT-4 was iteratively aligned using an adversarial testing program and lessons from ChatGPT, achieving its best results ever (though far from perfect) in factuality, steerability, and staying within guardrails. Barely a year…

BigBird: Transformers for Longer Sequences (Paper Review)

The overview of this paper: Transformer-based models such as BERT are among the most successful models in NLP. Unfortunately, one of their core limitations is that, because of the full attention mechanism, memory consumption grows quadratically with sequence length. To alleviate this, the paper proposes BigBird, which uses a sparse attention mechanism that reduces this quadratic dependency to linear. The paper shows that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. The paper…
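
To see why the sparse pattern is linear rather than quadratic: each token attends to a fixed number of positions (a local window, a few global tokens, and a few random links), so the total number of attended pairs grows as O(n) instead of O(n^2). A toy mask along those lines, with made-up sizes, as a rough sketch of the idea rather than the paper's block implementation:

```python
import numpy as np

def bigbird_mask(n, window=1, n_global=1, n_random=1, seed=0):
    """Boolean attention mask combining BigBird's three components:
    a sliding window, a few global tokens, and random connections.
    Each token attends to O(1) positions, so the total is O(n)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - window):min(n, i + window + 1)] = True  # window
        mask[i, rng.choice(n, size=n_random)] = True               # random
    mask[:n_global, :] = True   # global tokens attend everywhere
    mask[:, :n_global] = True   # and are attended to by everyone
    return mask

m = bigbird_mask(16)
print(int(m.sum()), "attended pairs vs", 16 * 16, "for full attention")
```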

Sparse Transformers: Generating Long Sequences with Sparse Transformers (Paper Review)

The overview of this paper: The Transformer is a very powerful sequence model, but its time and memory requirements grow quadratically with sequence length. This paper introduces sparse factorizations of the attention matrix, which reduce the Transformer's time complexity to $O(n \sqrt{n})$. The paper also introduces: architectural and initialization changes to train deeper networks; recomputation of attention matrices to save memory; and fast attention kernels for training. Models with these changes are called Sparse Transformers, and they can scale to hundreds of layers…
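
The $O(n \sqrt{n})$ bound can be checked with a small count: under the strided factorized pattern, each token attends to roughly $\ell$ local positions plus every $\ell$-th earlier position, and with $\ell \approx \sqrt{n}$ both terms are $O(\sqrt{n})$ per token. A quick sketch of that count (my own toy reconstruction of the strided pattern, not the paper's code):

```python
import math

def strided_positions(i, stride):
    """Positions token i attends to under the two factorized heads of a
    strided pattern: a local run of length `stride`, plus every
    stride-th earlier position (both causal)."""
    local = set(range(max(0, i - stride + 1), i + 1))
    strided = {j for j in range(i + 1) if (i - j) % stride == 0}
    return local | strided

n = 1024
stride = int(math.sqrt(n))  # l ~ sqrt(n), so per-token cost is O(sqrt(n))
avg = sum(len(strided_positions(i, stride)) for i in range(n)) / n
print(f"avg positions per token: {avg:.1f} vs {n} for full causal attention")
```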

GPT-3: Language Models are Few-Shot Learners (Paper Review)

Since ChatGPT, the world has been stirred up once more by the release of GPT-4. Having watched OpenAI's introduction video when GPT-4 first came out, I am strongly curious about its capabilities. Before reviewing GPT-4, I felt I should review GPT-3 first, hence this post. 'Language Models are Few-Shot Learners', the paper introducing GPT-3, runs to 75 pages in total, so reviewing all of it would be impractical; I reviewed only selected parts. The overview of this paper: Recent work has shown that substantial gains on many NLP tasks and benchmarks have come from pre-training on a massive text corpus and…

TinyBERT: Distilling BERT for Natural Language Understanding (Paper Review)

The overview of this paper: LM pre-training such as BERT has significantly improved performance on many NLP tasks. However, PLMs are usually computationally very expensive, which makes them hard to run in resource-constrained environments. The paper proposes a Transformer distillation method that speeds up inference and shrinks model size while maintaining accuracy. The method applies knowledge distillation (KD) to Transformer-based models, transferring the rich knowledge of a large 'teacher' BERT to a small 'student' TinyBERT…
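
For intuition on the teacher-to-student transfer, here is the classic soft-target distillation loss: the student matches the teacher's temperature-softened output distribution. Note this covers only the prediction-layer term; TinyBERT's full Transformer distillation also matches embeddings, hidden states, and attention matrices layer by layer. The toy logits are hypothetical stand-ins for real model outputs:

```python
import torch
import torch.nn.functional as F

def soft_label_kd_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target knowledge distillation: the student matches the
    teacher's temperature-softened distribution. Scaling by T^2 keeps
    gradient magnitudes comparable across temperatures."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# Toy logits standing in for real teacher/student classifier outputs.
teacher = torch.tensor([[2.0, 0.5, -1.0]])
student = torch.tensor([[1.0, 0.2, -0.5]])
print(soft_label_kd_loss(student, teacher))
```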

Pre-LN Transformer: On Layer Normalization in the Transformer Architecture (Paper Review)

The overview of this paper: The Transformer is widely used in NLP tasks. Training a Transformer, however, usually requires a carefully designed learning-rate warm-up stage. This warm-up stage has a large impact on final performance, but it slows optimization down and requires additional hyper-parameter tuning. This paper investigates why the warm-up stage is essential and studies the placement of layer normalization (LN). Specifically, the paper shows that at initialization, placing layer normalization between the residual blocks…
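
The placement question is easiest to see in code. Below is a minimal sketch of one residual sublayer under the two variants the paper compares, with a plain linear layer standing in for the attention/FFN sublayer (shapes and sizes are made up):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One residual sublayer under the two LN placements the paper compares."""
    def __init__(self, d, pre_ln):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.sublayer = nn.Linear(d, d)  # stand-in for attention / FFN
        self.pre_ln = pre_ln

    def forward(self, x):
        if self.pre_ln:
            # Pre-LN: LN inside the residual branch; the identity path is
            # untouched, gradients are well-behaved at initialization, and
            # the paper argues warm-up can then be removed.
            return x + self.sublayer(self.norm(x))
        # Post-LN: the original Transformer; LN sits on the residual path
        # itself, which the paper ties to the need for warm-up.
        return self.norm(x + self.sublayer(x))

x = torch.randn(2, 4, 16)
print(Block(16, pre_ln=True)(x).shape, Block(16, pre_ln=False)(x).shape)
```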

Longformer: The Long-Document Transformer (Paper Review)

The overview of this paper: Existing Transformer-based models cannot handle long sequences because their computation grows quadratically with sequence length. To address this limitation, the paper introduces Longformer, whose attention mechanism scales linearly with sequence length, making it easy to process documents of thousands of tokens or more. Longformer's attention mechanism is a drop-in replacement for standard self-attention and combines local windowed attention with task-motivated global attention. Previous long…
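
The linear scaling is easy to verify on a toy mask: with a fixed window size (plus a handful of global tokens), the number of attended (query, key) pairs grows in proportion to sequence length, while full attention grows with n^2. A rough count, with made-up sizes:

```python
import numpy as np

def longformer_pairs(n, window, global_idx=()):
    """Count attended (query, key) pairs for a sliding window of size
    2 * window + 1 plus a few global tokens that attend everywhere
    and are attended to by everyone."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - window):min(n, i + window + 1)] = True
    for g in global_idx:
        mask[g, :] = True
        mask[:, g] = True
    return int(mask.sum())

for n in (512, 1024, 2048):  # doubling n roughly doubles the pair count
    print(n, longformer_pairs(n, window=64, global_idx=(0,)), "vs", n * n)
```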

SpanBERT: Improving Pre-training by Representing and Predicting Spans (Paper Review)

The overview of this paper: The paper introduces SpanBERT, a pre-training method that better represents spans of text. It extends BERT as follows: 1. masking contiguous random spans rather than individual random tokens; 2. training a span boundary objective (SBO) to predict the entire content of the masked span without relying on the individual token representations inside it. SpanBERT outperforms BERT and does especially well on span selection tasks such as QA and coreference resolution. Table of Contents 1. Introduction…
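
A small sketch of the span masking step: span lengths are drawn from a geometric distribution (the paper uses p = 0.2, clipped at 10 tokens) and contiguous runs are masked until roughly 15% of tokens are covered. This is my own toy reconstruction, not the reference implementation:

```python
import numpy as np

def sample_masked_spans(n_tokens, mask_ratio=0.15, p=0.2, max_len=10, seed=0):
    """Mask contiguous spans with lengths drawn from Geometric(p), clipped
    at max_len, until about mask_ratio of all tokens are covered."""
    rng = np.random.default_rng(seed)
    budget = int(n_tokens * mask_ratio)
    masked = set()
    while len(masked) < budget:
        length = min(int(rng.geometric(p)), max_len)
        start = int(rng.integers(0, n_tokens - length + 1))
        masked.update(range(start, start + length))
    return sorted(masked)

print(sample_masked_spans(100))  # contiguous runs, not isolated tokens
```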

Cartinoe
List of posts in the 'Paper Reading 📜/Natural Language Processing' category (Page 4)