
Pre-trained Language Modeling paper reading (1) - ELMo: Deep contextualized word representations

2022. 11. 16. 22:00

Pre-trained Language Modeling paper reading

Pre-trained language modeling is a hot topic in NLP these days, so I have been reading and reviewing some of the best-known papers on it. This "Pre-trained Language Modeling paper reading" series does not end with this post; I plan to continue it over several consecutive posts, and this post opens the series. The plan for the upcoming posts is as follows.

  1. ELMo: 'Deep contextualized word representations' reading & review (this post)
  2. BERT: 'Pre-training of Deep Bidirectional Transformers for Language Understanding' reading & review
  3. GPT-1: 'Improving Language Understanding by Generative Pre-Training' reading & review

๊ทธ๋ž˜์„œ ์˜ค๋Š˜์€ ELMo ๋…ผ๋ฌธ์— ๋Œ€ํ•ด์„œ ์ฝ๊ณ  ๋ฆฌ๋ทฐ๋ฅผ ํ•ด๋ณผ ๊ฒƒ์ด๋‹ค. ELMo ๋…ผ๋ฌธ์€ ์—ฌ๊ธฐ์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

Table of Contents

1. Introduction

2. ELMo: Embeddings from Language Models

   2-1. Bidirectional language models

   2-2. ELMo

   2-3. Using biLMs for supervised NLP tasks

   2-4. Pre-trained bidirectional language model architecture

3. An Easy-to-Understand Explanation of ELMo

4. Analysis

 

 

1. Introduction

์ด ๋…ผ๋ฌธ์˜ ์ €์ž๋“ค์ด ๋งํ•˜๊ณ ์ž ํ•˜๋Š” ๋ฐ”๋Š” pre-trained word representation ์ž์ฒด๊ฐ€ ์ด๋กœ๋ถ€ํ„ฐ ์‹œ์ž‘๋˜๋Š” ๋‹ค์–‘ํ•œ NLP tasks์— ๋Œ€ํ•œ key component๋ผ๊ณ  ์ฃผ์žฅํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  high quality representation์€ ์กฐ๊ฑด๋“ค์„ ๋”ฐ๋ผ์•ผ ํ•˜๋Š”๋ฐ ๊ทธ ์กฐ๊ฑด๋“ค์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

  • ๋‹จ์–ด์˜ ๋ณต์žกํ•œ ํŠน์ง•์„ ๋ชจ๋ธ๋งํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•œ๋‹ค. (๊ตฌ๋ฌธ ๋ถ„์„ ๊ด€์ ์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๊ฒƒ & ์˜๋ฏธ ๋ถ„์„ ๊ด€์ ์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๊ฒƒ์„ ์ „๋ถ€ ๋‹ค ์ปค๋ฒ„ํ•ด์•ผ ๋จ)
  • ๋‹จ์–ด๋“ค์ด linguistic context ์ƒ์—์„œ ์„œ๋กœ ๋‹ค๋ฅด๊ฒŒ ์‚ฌ์šฉ๋  ๋•Œ, ํ•ด๋‹นํ•˜๋Š” ์‚ฌ์šฉ๋ฒ•์— ๋งž๋Š” representation์„ ํ‘œํ˜„ํ•ด์ค˜์•ผ ํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ๋‹ค์˜์–ด์ธ '๋ฐฐ'๋ผ๋Š” ๋‹จ์–ด์— ๋Œ€ํ•ด ๊ณผ์ผ์˜ ์˜๋ฏธ๋กœ ์‚ฌ์šฉ๋  ๋•Œ์™€ ์šด์†ก์ˆ˜๋‹จ์˜ ์˜๋ฏธ๋กœ ์‚ฌ์šฉ๋  ๋•Œ ์ด ๋‘ ์ƒํ™ฉ์˜ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๊ฐ€ ๋‹ฌ๋ผ์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.

ELMo์˜ ํŠน์ง•

ELMo๋Š” ๊ฐ๊ฐ์˜ token๋“ค์— ๋Œ€ํ•ด ์ „์ฒด input sentence์˜ ํ•จ์ˆ˜๋ฅผ representation์œผ๋กœ ๋ฐ›๋Š”๋‹ค. ๊ทธ๋ฆฌ๊ณ  ELMo๋Š” language model์— ๋Œ€ํ•œ ๋ชฉ์  ํ•จ์ˆ˜๋ฅผ ๊ฐ€์ง€๊ณ  ํ•™์Šต์ด ๋˜๋Š” bidirectional-LSTM์œผ๋กœ๋ถ€ํ„ฐ ์œ ๋ž˜๋˜๋Š” vector๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ๊ทธ๋ž˜์„œ ELMo ์ž์ฒด๋Š” ์ „์ฒด์˜ ์ž…๋ ฅ ๋ฌธ์žฅ์„ ์ด์šฉ์„ ํ•œ representation์ด๊ณ , ๊ทธ๋ฆฌ๊ณ  ์ด ๋ฌธ์žฅ์€ bidirectional-LSTM์„ ์ด์šฉํ•˜์—ฌ ํ•™์Šต์ด ๋œ๋‹ค. ๋‹ค์Œ์€ ELMo์˜ ํŠน์ง•์ด๋‹ค.

  • ELMo representations are quite deep, because they combine the hidden vectors of all of the bidirectional LSTM's internal layers. Rather than taking the values of one particular layer, ELMo combines the various hidden vectors that the trained bidirectional LSTM produces.
    • Information from several layers is combined for the word at each position. This improves performance considerably over representations built from the top LSTM layer alone.
    • Using representations from several levels works because the higher-level hidden states capture context-dependent aspects of meaning, while the lower-level hidden states capture syntactic features. So for a syntactic problem, the hidden states of the lower layers can be given higher weights, and for a context-dependent (semantic) problem, the hidden states of the higher layers can be given higher weights.

 

2. ELMo: Embeddings from Language Models

์ด๋ฒˆ ์žฅ์—์„œ๋Š” ELMo์˜ ๊ตฌ์กฐ์— ๋Œ€ํ•ด์„œ ์„ค๋ช…ํ•œ๋‹ค. ELMo๋Š” ๋‚ด๋ถ€ ๋„คํŠธ์›Œํฌ ์ƒํƒœ(2-2)์˜ ์„ ํ˜• ํ•จ์ˆ˜๋กœ์„œ ๋ฌธ์ž ์ปจ๋ณผ๋ฃจ์…˜(2-1)์ด ์žˆ๋Š” 2๊ณ„์ธต biLM ์œ„์—์„œ ๊ณ„์‚ฐ๋œ๋‹ค. ์ด ์„ค์ •์„ ํ†ตํ•ด biLM์ด ๋Œ€๊ทœ๋ชจ๋กœ ์‚ฌ์ „ ํ›ˆ๋ จ๋˜๊ณ (2-4) ๊ธฐ์กด์˜ ๊ด‘๋ฒ”์œ„ํ•œ neural NLP architecture(2-3)์— ์‰ฝ๊ฒŒ ํ†ตํ•ฉ๋˜๋Š” semi-supervised learning์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

2-1. Bidirectional language models

A bidirectional language model splits into two parts, a forward LM and a backward LM. As the names suggest, given a sequence, one interprets it from the front and the other from the back. Given a sequence of N tokens, (t_1, t_2, ..., t_N), a bidirectional LM proceeds as follows.

 

forward LM

First, the forward LM. A forward LM computes the probability of the sequence by modeling the probability of each token t_k given the tokens that precede it, (t_1, ..., t_{k-1}). The formula below expresses this.

 

$$p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_1, t_2, \ldots, t_{k-1})$$

์ตœ๊ทผ์˜ SOTA neural language model๋“ค์€ context-independentํ•œ token representation x^LM_k(๋…ผ๋ฌธ์—์„œ ํ™•์ธํ•˜์‹œ์˜ค)๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ  ๊ทธ๋‹ค์Œ์— ์ด๊ฒƒ์„ L layer์˜ forward LSTM๋“ค์— ํ˜๋ ค๋ณด๋‚ธ๋‹ค. ๊ฐ๊ฐ์˜ k position์—์„œ ๊ฐ๊ฐ์˜ LSTM layer๋Š” context-dependent ํ•œ representation์ธ j=1, ..., L์ธ forward  h^LM_i,j(๋…ผ๋ฌธ์—์„œ ํ™•์ธํ•˜์‹œ์˜ค)๋ฅผ ์ถœ๋ ฅํ•ด๋‚ธ๋‹ค. ๊ฐ€์žฅ ๋†’์€ LSTM์ธต์˜ ์ถœ๋ ฅ์ธ forward h^LM_k,L์€ Softmax layer์™€ ํ•จ๊ป˜ ๋‹ค์Œ ํ† ํฐ์ธ t_k+1์„ ์˜ˆ์ธกํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋œ๋‹ค.

 

backward LM

backward LM๋„ forward LM๊ณผ sequence๋ฅผ ๋ฐ˜๋Œ€ ๋ฐฉํ–ฅ๋ถ€ํ„ฐ runํ•œ๋‹ค๋Š” ์ ๋งŒ ์ œ์™ธํ•˜๋ฉด ๋น„์Šทํ•˜๋‹ค. ํ•œ ๋งˆ๋””๋กœ future context๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ ์ด์ „์˜ token์„ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ์•„๋ž˜์˜ ์ˆ˜์‹์ด ๊ทธ ๊ณผ์ •์ด๋‹ค.

 

$$p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_N)$$

์ด๊ฒƒ๋„ forward LM๊ณผ ์œ ์‚ฌํ•˜๊ฒŒ ์ ์˜๋  ์ˆ˜ ์žˆ๋Š”๋ฐ representation์„ ์ƒ์„ฑํ•˜๋Š” L layer deep model์˜ ๊ฐ backward LSTM ๊ณ„์ธต j๋กœ ์ฃผ์–ด์ง„ t_k์ธ (t_k+1, ..., t_N)์˜ backward h^LM_k,j(๋…ผ๋ฌธ์—์„œ ํ™•์ธํ•˜์‹œ์˜ค).

 

biLM

biLM์€ forward LM๊ณผ backward LM์˜ ๊ฒฐํ•ฉ์ด๋‹ค. ๋…ผ๋ฌธ์˜ ์‹์€ forward์™€ backeward direction์— ๋Œ€ํ•ด ๊ณต๋™์œผ๋กœ log likelihood๋ฅผ ๊ทน๋Œ€ํ™” ์‹œํ‚จ๋‹ค. ์•„๋ž˜ ์ˆ˜์‹์ด biLM์˜ ๊ณผ์ •์ด๋‹ค.

 

$$\sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1};\, \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s) + \log p(t_k \mid t_{k+1}, \ldots, t_N;\, \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s) \Big)$$

๋…ผ๋ฌธ์—์„œ๋Š” token representation(ฮ˜x)์™€ Softmax layer(ฮ˜s) ๋ชจ๋‘์— ๋Œ€ํ•œ parameter๋ฅผ forward์™€ backward๋กœ ๋ฌถ๊ณ  ๊ฐ ๋ฐฉํ–ฅ์—์„œ LSTM์— ๋Œ€ํ•œ ๋ณ„๋„์˜ parameter๋ฅผ ์œ ์ง€ํ•˜๊ฒŒ ํ–ˆ๋‹ค. ์ด ์‹์€ ์ด์ „์˜ ์—ฐ๊ตฌ์™€ ๋น„์Šทํ•œ ๋ชจ์Šต์„ ๋ณด์ด์ง€๋งŒ, ์™„์ „ํžˆ ๋…๋ฆฝ์ ์ธ parameter๋กœ ํ•œ ๊ฒƒ์ด ์•„๋‹Œ ๋ฐฉํ–ฅ๋“ค ์‚ฌ์ด์—์„œ ๊ฐ€์ค‘์น˜๋ฅผ ๊ณต์œ ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ์ง„ํ–‰ํ•˜์˜€๋‹ค.

 

2-2. ELMo

ELMo๋Š” biLM์˜ ์ค‘๊ฐ„ ๊ณ„์ธต๋“ค์˜ representation์˜ task specificํ•œ combination์ด๋‹ค. ๊ฐ๊ฐ์˜ token t_k์— ๋Œ€ํ•ด L-layer biLM์€ 2L+1์˜ representation์„ ๊ณ„์‚ฐํ•ด์•ผ ํ•œ๋‹ค. ์•„๋ž˜ ์ˆ˜์‹์ด ๊ทธ ๊ณผ์ •์ด๋‹ค.

 

$$R_k = \{\, x_k^{LM},\ \overrightarrow{h}_{k,j}^{LM},\ \overleftarrow{h}_{k,j}^{LM} \mid j = 1, \ldots, L \,\} = \{\, h_{k,j}^{LM} \mid j = 0, \ldots, L \,\}$$

$h_{k,0}^{LM}$ is the token layer, and for each biLSTM layer, $h_{k,j}^{LM} = [\overrightarrow{h}_{k,j}^{LM}; \overleftarrow{h}_{k,j}^{LM}]$ concatenates the forward and backward states.

For use in a downstream model, ELMo collapses all layers in R into a single vector, $\mathrm{ELMo}_k = E(R_k; \Theta_e)$. The paper computes a task-specific weighting over all biLM layers, given by the following formula.

 

$$\mathrm{ELMo}_k^{task} = E(R_k; \Theta^{task}) = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, h_{k,j}^{LM}$$

$s^{task}$ are softmax-normalized weights, learned per task (this comes up again in Section 3). $\gamma^{task}$ is a scalar parameter that allows the task model to scale the entire ELMo vector. $\gamma$ is an important aid to the optimization process: it determines how much the whole weighted sum of pre-trained vectors is scaled up or down. Since the activations of each biLM layer have different distributions, in some cases it also helps to apply layer normalization to each biLM layer before weighting.
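As an illustration of this collapse, here is a minimal "scalar mix" module, assuming the L + 1 layer representations are stacked along a leading dimension; $s^{task}$ is stored pre-softmax and $\gamma^{task}$ is a single scalar, as in the formula above.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    def __init__(self, num_layers=3):                   # L + 1 = 3 for a 2-layer biLM
        super().__init__()
        self.s = nn.Parameter(torch.zeros(num_layers))  # s^{task}, pre-softmax
        self.gamma = nn.Parameter(torch.ones(1))        # gamma^{task}

    def forward(self, layer_reps):
        # layer_reps: (num_layers, batch, N, dim), i.e., h_{k,j}^{LM} for j = 0..L
        w = torch.softmax(self.s, dim=0)                # softmax-normalized weights
        mixed = (w.view(-1, 1, 1, 1) * layer_reps).sum(dim=0)
        return self.gamma * mixed                       # ELMo_k^{task} at every position

mix = ScalarMix()
reps = torch.randn(3, 2, 7, 1024)   # 3 layers, batch 2, 7 tokens, 1024-dim states
elmo = mix(reps)                    # (2, 7, 1024)
```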

 

2-3. Using biLMs for supervised NLP tasks

Given an architecture for a supervised NLP task and a pre-trained biLM, we run the biLM to record all of the representation layers for each word, and then the end task model learns a linear combination of these representations. This process is explained in more detail below.

  1. First, consider the lowest layers of the supervised model, where the biLM is not yet involved. Because most supervised NLP models share a common architecture here, ELMo can be added in a consistent way. Given a token sequence (t_1, ..., t_N), it is standard to form a context-independent token representation x_k for each token position, using pre-trained word embeddings and, optionally, character-based representations. The model then typically forms a context-sensitive representation h_k using bidirectional RNNs, CNNs, or feed-forward networks.
  2. Next, to add ELMo to the supervised model, we first freeze the biLM weights and concatenate ELMo^task_k with x_k, passing the enhanced ELMo representation [x_k; ELMo^task_k] into the task RNN.
  3. Finally, it helps to add a moderate amount of dropout to ELMo, and in some cases to regularize the ELMo weights by adding $\lambda \lVert w \rVert^2$ to the loss. This imposes an inductive bias on the ELMo weights to stay close to an average of all biLM layers. (A sketch of these three steps follows this list.)
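A minimal sketch of these three steps, with illustrative dimensions: the biLM layer representations arrive pre-computed and are detached (the biLM stays frozen), mixed with task-specific weights, dropped out, concatenated with x_k, and fed into the task RNN; weight_penalty() approximates the $\lambda \lVert w \rVert^2$ regularizer on the mixing weights.

```python
import torch
import torch.nn as nn

class TaskModelWithELMo(nn.Module):
    def __init__(self, elmo_dim=1024, x_dim=300, hidden=256, lam=1e-3):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(3))      # s^{task} over L + 1 = 3 layers
        self.gamma = nn.Parameter(torch.ones(1))   # gamma^{task}
        self.dropout = nn.Dropout(0.5)             # a moderate amount of dropout on ELMo
        self.task_rnn = nn.LSTM(elmo_dim + x_dim, hidden,
                                batch_first=True, bidirectional=True)
        self.lam = lam

    def forward(self, x_k, bilm_reps):
        # x_k: (batch, N, x_dim) context-independent token vectors
        # bilm_reps: (3, batch, N, elmo_dim) from the frozen, pre-trained biLM
        w = torch.softmax(self.s, dim=0)
        elmo = self.gamma * (w.view(-1, 1, 1, 1) * bilm_reps.detach()).sum(0)
        h, _ = self.task_rnn(torch.cat([self.dropout(elmo), x_k], dim=-1))
        return h                                   # context-sensitive h_k for the task

    def weight_penalty(self):
        # Penalizing the pre-softmax weights toward zero pushes the softmax
        # toward uniform, i.e., toward the average of all biLM layers.
        return self.lam * (self.s ** 2).sum()
```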

 

2-4. Pre-trained bidirectional language model architecture

์ด ๋…ผ๋ฌธ์˜ pre-trained biLMs๋Š” ์ด์ „์˜ ๋…ผ๋ฌธ๋“ค๊ณผ ์œ ์‚ฌํ•œ๋ฐ, ์–‘๋ฐฉํ–ฅ์„ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด ์‚ด์ง ์ˆ˜์ •๋˜์—ˆ๊ณ , residual connection์„ LSTM layers ์‚ฌ์ด์— ์ถ”๊ฐ€ํ•˜์˜€๋‹ค. 

๋…ผ๋ฌธ์—์„œ ์‚ฌ์šฉ๋œ ์ตœ์ข… ๋ชจ๋ธ์€ L = 2์ธ biLSTM์ด๊ณ , ์ด๊ฒƒ์€ 4096๊ฐœ์˜ unit๊ณผ 512๊ฐœ์˜ dimension projection์„ ๊ฐ€์ง€๊ณ  ์žˆ๊ณ , residual connection์ด ์ฒซ ๋ฒˆ์งธ layer๋กœ๋ถ€ํ„ฐ ๋‘ ๋ฒˆ์งธ layer๊นŒ์ง€ ์žˆ๋‹ค. ๊ฒฐ๊ณผ์ ์œผ๋กœ biLM์€ purely character input์œผ๋กœ ์ธํ•ด ํ›ˆ๋ จ ์„ธํŠธ ์™ธ๋ถ€์˜ representation์„ ํฌํ•จํ•˜์—ฌ ๊ฐ input token์— ๋Œ€ํ•ด 3๊ฐœ์˜ representation layer๋ฅผ ์ œ๊ณตํ•œ๋‹ค. ๋ฐ˜๋ฉด์— ์ „ํ†ต์ ์ธ word embedding method๋“ค์€ ์˜ค์ง ํ•˜๋‚˜์˜ representation layer๋ฅผ ๊ณ ์ •๋œ vocabulary ์•ˆ์˜ tokens์— ๋Œ€ํ•ด ์ œ๊ณตํ•œ๋‹ค.

After training for 10 epochs, the average of the forward and backward perplexities was 39.7, compared to 30.0 for the forward CNN-BIG-LSTM. In general, the paper found the forward and backward perplexities to be approximately equal, with the backward value slightly lower.

 

 

3. An Easy-to-Understand Explanation of ELMo

์ด๋ฒˆ ์žฅ์—์„œ๋Š” ์ด ์‚ฌ์ดํŠธ์—์„œ ์„ค๋ช…ํ•˜๋Š” ELMo์— ๋Œ€ํ•ด ๋‹ค์‹œ ํ•œ ๋ฒˆ ์„ค๋ช…ํ•ด๋ณด์•˜๋‹ค.

ELMo๋Š” ๊ฐ๊ฐ์˜ ๋‹จ์–ด์— ๋Œ€ํ•ด embedding์„ ํ•˜๋Š” ๊ฒƒ ๋Œ€์‹ ์—, ๊ฐ๊ฐ์˜ ๋‹จ์–ด๋ฅผ embedding์— ํ• ๋‹นํ•˜๊ธฐ ์ „์— ์ „์ฒด sentence๋ฅผ ํ•œ ๋ฒˆ ๋‘˜๋Ÿฌ๋ณธ๋‹ค. ELMo๋Š” specific task์— ๋Œ€ํ•ด ํ›ˆ๋ จ๋œ bidirectional-LSTM์„ ์ด์šฉํ•˜์—ฌ ์ด๋Ÿฌํ•œ embedding์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ELMo๋Š” NLP์˜ ๋งฅ๋ฝ์—์„œ pre-training์„ ํ–ฅํ•œ ์ค‘์š”ํ•œ ๋‹จ๊ณ„๋ฅผ ์ œ๊ณตํ–ˆ๋‹ค. ELMo์˜ LSTM์€ ์–ด๋– ํ•œ ๊ด‘๋ฒ”์œ„ํ•œ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ๋ถ€ํ„ฐ ํ›ˆ๋ จ๋ผ์„œ ๋‹ค๋ฅธ ๋ชจ๋ธ์˜ ๊ตฌ์„ฑ ์š”์†Œ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. ํ•œ ๋งˆ๋””๋กœ, ๋‚ด๊ฐ€ ์‚ฌ์šฉํ•œ dataset์— ๋Œ€ํ•ด ํ›ˆ๋ จ์„ ์ง„ํ–‰ํ•˜๊ณ  ์ด ํ›ˆ๋ จ๋œ ELMo๋ฅผ ๋‹ค๋ฅธ ๋ชจ๋ธ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.

 

ELMo's secret

ELMo gains its language understanding through so-called language modeling: predicting the next word given a word sequence. The next figure shows ELMo's pre-training process. In the figure below, given the words "Let's", "stick", and "to", the model feeds each word's embedding vector through the LSTM layers and uses the embeddings of the previous words to predict the next word.

ELMo์˜ pre-training process

์œ„์˜ ๊ทธ๋ฆผ์„ ๋ณด๋ฉด ELMo ์บ๋ฆญํ„ฐ์˜ ๋จธ๋ฆฌ ๋’ค์— ์ •์ ์— ๋„๋‹ฌํ•œ ๊ฐ๊ฐ์˜ unrolled-LSTM ๋‹จ๊ณ„์˜ ์ˆจ๊ฒจ์ง„ ์ƒํƒœ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. pre-training ๋‹จ๊ณ„๊ฐ€ ๋๋‚˜๊ณ  ์ด๊ฒƒ๋“ค์€ embedding process์—์„œ ์œ ์šฉํ•˜๋‹ค. ELMo๋Š” ์ถ”๊ฐ€์ ์ธ step๋“ค์„ ์ง„ํ–‰ํ•˜๊ณ  bidirectional LSTM์„ ํ›ˆ๋ จ์‹œํ‚จ๋‹ค. ๋”ฐ๋ผ์„œ ELMo๋Š” ๋‹ค์Œ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋Šฅ๋ ฅ๋งŒ ๊ฐ€์ง€๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ์ด์ „ ๋‹จ์–ด์— ๋Œ€ํ•œ ์˜ˆ์ธก๋„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ๋œ๋‹ค. ๋‹ค์Œ ๊ทธ๋ฆผ์ด ELMo์˜ biLM์„ ๋ณด์—ฌ์ค€๋‹ค.

 

ELMo step #1

ELMo then produces a contextualized embedding by grouping the hidden states (and the initial embedding) in a particular way: concatenation followed by a weighted sum. The following figure, and the steps below, show this process.

  • stick์ด๋ผ๋Š” ๋‹จ์–ด๊ฐ€ ์กด์žฌํ•˜๋Š” ์ด ์‹œ์ ์—์„œ์˜ ์œ„์น˜๊ฐ€  forward์™€ backward์—์„œ ๋™์ผํ•˜๊ณ , ๊ฐ๊ฐ์˜ ๊ฐ™์€ ๋™์ผํ•œ ๋ ˆ๋ฒจ์— ์žˆ๋Š” hidden state vector์„ concatenaton์„ ํ•œ๋‹ค. ๋‹ค์‹œ ๋งํ•˜์ž๋ฉด, forward์˜ input๊ฐ’๊ณผ backward์˜ input๊ฐ’์„ ํ•ฉ์นœ ๊ฒƒ์ด Concatenate hidden layer์˜ ์ฒซ ๋ฒˆ์งธ ๊ฐ’์ด ๋˜๋Š” ๊ฒƒ์ด๊ณ , ์ด์™€ ๊ฐ™์€ ๊ณผ์ •์„ ๊ฑฐ์ณ ์•„๋ž˜ ๊ทธ๋ฆผ์˜ ์™ผ์ชฝ ์ƒ๋‹จ์— ์œ„์น˜ํ•ด์žˆ๋Š” Concatenate hidden layers๊ฐ€ ๋งŒ๋“ค์–ด์ง€๋Š” ๊ฒƒ์ด๋‹ค. ์ด ๋ง์ธ์ฆ‰์Šจ, forward ๋ชจ๋ธ๊ณผ backward ๋ชจ๋ธ์— ๋Œ€ํ•ด์„œ ๊ฐ™์€ ๋ ˆ๋ฒจ์— ์žˆ๋Š” hidden state๋“ค์„ ์˜†์œผ๋กœ concatenation ํ•œ ๊ฒƒ์ด๋‹ค. 
  • ๊ทธ ๋’ค์— ํŠน์ •ํ•œ task์— ๋Œ€ํ•ด์„œ ์ด 3๊ฐœ์˜ ๋ฒกํ„ฐ๋“ค์— ๋Œ€ํ•œ ์ ์ ˆํ•œ ๊ฐ€์ค‘ํ•ฉ์„ ๊ณ„์‚ฐํ•ด์„œ(s_0, s_1, s_2 ๊ณ„์‚ฐ) ๊ฐ€์ค‘ํ•ฉ์„ ํ•˜๊ฒŒ ๋˜๋ฉด ์•„๋ž˜ ๊ทธ๋ฆผ์˜ ์ขŒ์ธก ํ•˜๋‹จ์— ์œ„์น˜ํ•ด ์žˆ๋Š” ํŒŒ๋ž€์ƒ‰ ๋ฒกํ„ฐ๊ฐ€ ๋งŒ๋“ค์–ด์ง€๊ฒŒ ๋œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ด๊ฒŒ ๋ฐ”๋กœ ELMo์— ์˜ํ•ด์„œ ๋งŒ๋“ค์–ด์ง„ 'stick' ์ด๋ผ๋Š” ๋‹จ์–ด์— ๋Œ€ํ•œ embedding ๊ฐ’์ด๋‹ค.
    • ๊ฐ€์ค‘ํ•ฉ์„ ํ•  ๋•Œ s_0, s_1, s_2๋Š” task์— ๋”ฐ๋ผ์„œ ๋ณ€ํ™”ํ•˜๋Š” parameter์ด๋‹ค. ๋”ฐ๋ผ์„œ ์–ด๋–ค task๊ฐ€ ์ฃผ์–ด์ง€๋А๋ƒ์— ๋”ฐ๋ผ ํ•™์Šต์ด ํ•จ๊ป˜ ์ง„ํ–‰๋œ๋‹ค. ๋งŒ์•ฝ syntax์— ๋Œ€ํ•œ task๋ฅผ ์ง„ํ–‰ํ•˜๋ ค๋ฉด ์•„๋ž˜์— ๋ ˆ๋ฒจ์— ์œ„์น˜ํ•ด ์žˆ๋Š” ๋ฒกํ„ฐ์— ๋Œ€ํ•ด ๋” ํฐ ๊ฐ€์ค‘์น˜๋ฅผ ๊ฐ–๊ฒŒ ๋  ๊ฒƒ์ด๊ณ , contextual ํ•œ task๋ฅผ ์ง„ํ–‰ํ•˜๋ ค๋ฉด ์œ„์˜ ๋ ˆ๋ฒจ์— ์œ„์น˜ํ•ด ์žˆ๋Š” ๋ฒกํ„ฐ์— ๋Œ€ํ•ด ๋” ํฐ ๊ฐ€์ค‘์น˜๋ฅผ ๊ฐ–๊ฒŒ ๋  ๊ฒƒ์ด๋‹ค. 

 

ELMo step #2

 

 

4. Analysis

๋…ผ๋ฌธ์—์„œ๋Š” Experiment๋ฅผ ์ง„ํ–‰ํ•˜๊ณ  ๋‚˜์„œ ๋ช‡ ๊ฐ€์ง€์˜ Ablation Study๋ฅผ ์ง„ํ–‰ํ–ˆ๋Š”๋ฐ ๊ทธ ์ค‘์— ๋ช‡ ๊ฐ€์ง€๋ฅผ ์‚ดํŽด๋ณด์•˜๋‹ค.

 

Alternate layer weighting schemes

When asked how best to weight the layers, the paper answers with the figure below.

Analysis 1

์œ„์˜ ๊ทธ๋ฆผ์„ ํ•ด์„ํ•ด๋ณด๋ฉด ์ฒซ ๋ฒˆ์งธ๋กœ๋Š” task์— ๋”ฐ๋ผ์„œ ๊ฐ€์ค‘์น˜ ๊ฐ’์„ ๋‹ค๋ฅด๊ฒŒ ์ฃผ๋Š” ๊ฒƒ์ด ๊ฐ€์žฅ ์„ฑ๋Šฅ์ด ์ข‹๊ณ , ๊ทธ๋‹ค์Œ์œผ๋กœ๋Š” ๋ชจ๋“  ๋ฒกํ„ฐ๋“ค์— ๋Œ€ํ•ด ๋˜‘๊ฐ™์€ ๊ฐ€์ค‘์น˜๋ฅผ ์ฃผ๋Š” ๊ฒƒ์ด ์„ฑ๋Šฅ์ด ์ข‹์•˜๊ณ , ๊ทธ๋‹ค์Œ์œผ๋กœ๋Š” ์ œ์ผ ๋†’์€ ๋ ˆ๋ฒจ์˜ ๋ฒกํ„ฐ์— ๊ฐ€์ค‘์น˜๋ฅผ ์ฃผ๋Š” ๊ฒƒ์ด ๋‚˜์•˜๋‹ค.

 

Where to include ELMo?

This answers the question of where it is best to include ELMo. The paper answers with the figure below; note that the model shown is the downstream task model, not the LM.

Analysis 2

์œ„์˜ ๊ทธ๋ฆผ์„ ํ•ด์„ํ•ด๋ณด๋ฉด Input ๊ฐ’๊ณผ Output ๊ฐ’์— ELMo๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ๊ฐ€์žฅ ์„ฑ๋Šฅ์ด ์ข‹์•˜๊ณ , ๊ทธ ๋‹ค์Œ์œผ๋กœ๋Š” Input์—๋งŒ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด, ๊ทธ๋‹ค์Œ์—๋Š” Output ๊ฐ’์—๋งŒ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์„ฑ๋Šฅ์ด ์ข‹์•˜๋‹ค.

 

 

 

 

Sources

  • Peters et al., "Deep contextualized word representations", https://arxiv.org/abs/1802.05365
  • https://www.youtube.com/watch?v=zV8kIUwH32M
  • Jay Alammar, "The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)", https://jalammar.github.io/illustrated-bert/

 
