Lecture ๐Ÿง‘โ€๐Ÿซ/Coursera

[Machine Learning] Neural Networks

2023. 3. 20. 17:33

Model Representation I

 ์‹ ๊ฒฝ๋ง์„ ์‚ฌ์šฉํ•ด์„œ ์–ด๋–ป๊ฒŒ hypothesis function์„ ํ‘œํ˜„ํ•  ์ง€ ์ƒ๊ฐํ•ด๋ณด๋„๋ก ํ•˜์ž. ๋งค์šฐ ๊ฐ„๋‹จํ•œ ์ˆ˜์ค€์—์„œ, ๋‰ด๋Ÿฐ์€ ์ „๊ธฐ์  ์‹ ํ˜ธ๋กœ ์ž…๋ ฅ์„ ๋ฐ›์•„์„œ ์ถœ๋ ฅ์„ ์ฑ„๋„๋งํ•˜๋Š” ๊ณ„์‚ฐ ์œ ๋‹›์œผ๋กœ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋‹ค. ๋ชจ๋ธ์˜ ๊ฐœ๋…์œผ๋กœ ์ƒ๊ฐํ•ด๋ณด๋ฉด ์ž…๋ ฅ์€ feature $x_1, \cdots x_n$์ด ๋˜๊ณ , ์ถœ๋ ฅ์€ hypothesis function์˜ ๊ฒฐ๊ณผ๊ฐ€ ๋œ๋‹ค. ๋ชจ๋ธ์—์„œ $x_0$ ์ž…๋ ฅ ๋…ธ๋“œ๋Š” bias unit์œผ๋กœ ๋ถˆ๋ฆฌ๊ธฐ๋„ ํ•˜๋Š”๋ฐ, ์ด ๋…ธ๋“œ๋Š” ํ•ญ์ƒ 1์˜ ๊ฐ’์„ ๊ฐ€์ง„๋‹ค. ์‹ ๊ฒฝ๋ง์—์„œ ๋ถ„๋ฅ˜์ฒ˜๋Ÿผ ๋˜‘๊ฐ™์€ logistic function์ด๊ณ , sigmoid ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ผ๊ณ ๋„ ๋ถˆ๋ฆฌ๋Š” $\frac {1}{1+e^{-\theta^{T}x}}$์„ ์‚ฌ์šฉํ•œ๋‹ค. ์ด ์ƒํ™ฉ์—์„œ ์„ธํƒ€ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ๊ฐ€์ค‘์น˜๋ผ๊ณ ๋„ ๋ถˆ๋ฆฐ๋‹ค.

 

 ์‹œ๊ฐ์ ์œผ๋กœ ๊ฐ„๋‹จํ•˜๊ฒŒ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

 '์ž…๋ ฅ ๋ ˆ์ด์–ด'๋ผ๊ณ ๋„ ์•Œ๋ ค์ ธ ์žˆ๋Š”, ์ž…๋ ฅ ๋…ธ๋“œ$($layer 1$)$๋Š” ๋‹ค๋ฅธ ๋…ธ๋“œ$($layer 2$)$๋กœ ํ–ฅํ•˜๊ฒŒ ๋œ๋‹ค. ์ด ๋‹ค๋ฅธ ๋…ธ๋“œ๋Š” hypothesis function์˜ ์ตœ์ข… ์ถœ๋ ฅ์„ ์ถœ๋ ฅํ•˜๋Š”๋ฐ, '์ถœ๋ ฅ ๋ ˆ์ด์–ด'๋ผ๊ณ  ๋ถˆ๋ฆฐ๋‹ค. ์ด ๋‘ ๋ ˆ์ด์–ด ์ค‘๊ฐ„์— ๋‹ค๋ฅธ ๋ ˆ์ด์–ด๋“ค๋„ ์กด์žฌํ•˜๋Š”๋ฐ, ์ด๋“ค์„ 'hidden layer'๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. ์ด ์˜ˆ์‹œ์—์„œ๋Š”, ์ด ์ค‘๊ฐ„ ํ˜น์€ 'hidden' ๋ ˆ์ด์–ด ๋…ธ๋“œ $a_{0}^{2} \cdots a_{n}^{2}$๋กœ ๋ผ๋ฒจ๋งํ•˜๊ณ  'activation unit'์ด๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค.

 

 

 ๋งŒ์•ฝ ํ•˜๋‚˜์˜ hidden layer๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ˜•ํƒœ๋ฅผ ๊ฐ€์ง€๊ฒŒ ๋œ๋‹ค.

 

 

 ๊ฐ 'activation' ๋…ธ๋“œ๋“ค์˜ ๊ฐ’์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์–ป๊ฒŒ ๋œ๋‹ค.

 

 

 activation node๋Š” $3 \times 4$ ํฌ๊ธฐ์˜ ํ–‰๋ ฌ์„ ์ด์šฉํ•ด์„œ ๊ณ„์‚ฐ๋œ๋‹ค. ํŒŒ๋ผ๋ฏธํ„ฐ์˜ ๊ฐ ํ–‰์„ ์ž…๋ ฅ์— ์ ์šฉํ•ด์„œ ํ•˜๋‚˜์˜ activation node์— ๋Œ€ํ•œ ๊ฐ’์„ ์–ป๋Š”๋‹ค. hypothesis ์ถœ๋ ฅ์€ ๋…ธ๋“œ์˜ ๋‘ ๋ฒˆ์งธ ๋ ˆ์ด์–ด์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜๋ฅผ ํฌํ•จํ•˜๋Š” ๋˜ ๋‹ค๋ฅธ ๋งค๊ฐœ๋ณ€์ˆ˜ ํ–‰๋ ฌ $\theta^{2}$๋กœ ๊ณฑํ•ด์ง„ activation node ๊ฐ’์˜ ํ•ฉ์— ์ ์šฉ๋œ logistic ํ•จ์ˆ˜์ด๋‹ค.

 

 ๊ฐ ๋ ˆ์ด์–ด๋Š” ๊ฐ์ž์˜ ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ์ธ $\theta^{(j)}$์„ ๊ฐ–๋Š”๋‹ค. ์ด ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ์˜ ์ฐจ์›์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ฒฐ์ •๋œ๋‹ค.

 

layer j์—์„œ ๋„คํŠธ์›Œํฌ๊ฐ€ $s_j$ ์œ ๋‹›๊ณผ layer j+1์—์„œ $s_{j+1}$ ์œ ๋‹›์„ ๊ฐ€์ง€๋ฉด $\theta^{j}$์˜ ์ฐจ์›์€ $s_{j+1} \times (s_{j} + 1)$์„ ๊ฐ€์ง„๋‹ค.

 

 1์ด ๋”ํ•ด์ง„ ์ด์œ ๋Š” $\theta^{(j)}$์˜ bias node์ธ $x_{0}$๊ณผ $\theta_{0}^{(j)}$์˜ ์ถ”๊ฐ€๋กœ ์˜จ ๊ฒƒ์ด๋‹ค. ์ฆ‰, ์ถœ๋ ฅ ๋…ธ๋“œ์—๋Š” bias node๋ฅผ ํฌํ•จ๋˜์ง€ ์•Š์ง€๋งŒ, ์ž…๋ ฅ ๋…ธ๋“œ์—๋Š” ํฌํ•จ๋œ๋‹ค. ๋‹ค์Œ์˜ ๊ทธ๋ฆผ์€ model ํ‘œํ˜„์„ ์š”์•ฝํ•˜๊ณ  ์žˆ๋‹ค.

 

 

 ์˜ˆ๋ฅผ ๋“ค์–ด, layer 1์ด 2๊ฐœ์˜ ์ž…๋ ฅ ๋…ธ๋“œ๋ฅผ ๊ฐ–๊ณ , layer 2๊ฐ€ 4๊ฐœ์˜ activation node๋ฅผ ๊ฐ€์ง„๋‹ค๊ณ  ํ•ด๋ณด์ž. $\theta^{(1)}$์˜ ์ฐจ์›์€ $4 \times 3$์ด ๋œ๋‹ค. ์—ฌ๊ธฐ์„œ $s_j = 2$์ด๊ณ  $s_{j+1}=4$์ด๋‹ค. ๊ทธ๋ž˜์„œ $s_{j+1} \times (s_j + 1) = 4 \times 3$์ด ๋œ๋‹ค.

 

 

Model Representation II

 ๋‹ค์Œ์€ ์‹ ๊ฒฝ๋ง์˜ ์˜ˆ์‹œ์ด๋‹ค.

 

 

 ์ด ์„น์…˜์—์„œ๋Š” ์œ„ ํ•จ์ˆ˜์˜ ๋ฒกํ„ฐํ™”๋œ ๊ตฌํ˜„์„ ํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ๋‹ค. $g$ ํ•จ์ˆ˜์˜ ์•ˆ์— ์žˆ๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํฌํ•จํ•˜๋Š” ์ƒˆ๋กœ์šด ๋ณ€์ˆ˜ $z_{k}^{(j)}$์„ ์ •์˜ํ•˜๋„๋ก ํ•˜๊ฒ ๋‹ค. ์ด์ „์˜ ์˜ˆ์‹œ์—์„œ ๋ชจ๋“  ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๋ณ€์ˆ˜ $z$๋กœ ๋ฐ”๊พธ๋ฉด ๋‹ค์Œ์„ ์–ป๊ฒŒ ๋œ๋‹ค.

 

 

 ๋‹ค๋ฅธ ๋ง๋กœ ํ•˜๋ฉด, layer $j=2$์™€ ๋…ธ๋“œ $k$์— ๋Œ€ํ•ด ๋ณ€์ˆ˜ $z$๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

 $x$์™€ $z^{j}$์˜ ๋ฒกํ„ฐ ํ‘œํ˜„์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

 $x = a^{(1)}$์œผ๋กœ ์„ค์ •ํ•˜๋ฉด, ๋ฐฉ์ •์‹์„ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ฐ”๊ฟ” ์“ธ ์ˆ˜ ์žˆ๋‹ค.

 

 

 ํ–‰๋ ฌ $\theta^{(j-1)}$๊ณผ ์ฐจ์› $s_j \times (n+1)$$($์—ฌ๊ธฐ์„œ $s_j$๋Š” activation node์˜ ์ˆ˜$)$์„ ๋†’์ด $(n+1)$์ธ ๋ฒกํ„ฐ $a^{(j-1)}$๊ณผ ๊ณฑํ•œ๋‹ค. ์ด๊ฒƒ์€ ๋†’์ด๊ฐ€ $s_j$์ธ ๋ฒกํ„ฐ $z^{(j)}$๋ฅผ ์ค€๋‹ค. ์ด์ œ layer $j$์— ๋Œ€ํ•œ activation node์˜ ๋ฒกํ„ฐ๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.

 

$a^{(j)} = g(z^{(j)})$

 

 ์—ฌ๊ธฐ์„œ ํ•จ์ˆ˜ $g$๋Š” ๋ฒกํ„ฐ $z^{(j)}$์— element-wiseํ•˜๊ฒŒ ์ ์šฉ๋  ์ˆ˜ ์žˆ๋‹ค. $a^{(j)}$๋ฅผ ๊ณ„์‚ฐํ•œ ๋‹ค์Œ์— layer $j$์— bias unit์„ ๋”ํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๊ฒƒ์€ ์›์†Œ $a_{0}^{(j)}$๊ฐ€ ๋˜๊ณ , 1๊ณผ ๊ฐ™์€ ๊ฐ’์„ ๊ฐ€์ง€๊ฒŒ ๋œ๋‹ค. ์ตœ์ข… hypothesis๋ฅผ ๊ฒŒ์‚ฌํ•˜๊ธฐ ์œ„ํ•ด, ๋‹ค๋ฅธ $z$๋ฒกํ„ฐ๋ถ€ํ„ฐ ๊ณ„์‚ฐํ•ด์•ผ ํ•œ๋‹ค.

 

$z^{(j+1)} = \theta^{(j)}a^{(j)}$

 

  ๋ฐฉ๊ธˆ ์–ป์€ ๋ชจ๋“  activation node์˜ ๊ฐ’์„ $\theta^{(j-1)}$ ๋‹ค์Œ ์„ธํƒ€ ํ–‰๋ ฌ์— ๊ณฑํ•˜์—ฌ ์ตœ์ข… ๋ฒกํ„ฐ $z$๋ฅผ ์–ป๋Š”๋‹ค. ์ด ๋งˆ์ง€๋ง‰ ์„ธํƒ€ ํ–‰๋ ฌ $\theta^{(j)}$๋Š” ์˜ค์ง ํ•˜๋‚˜์˜ ํ–‰์„ ๊ฐ€์งˆ ๊ฒƒ์ด๊ณ , ์ด๋Š” ํ•˜๋‚˜์˜ ์—ด $a^{(j)}$์— ์˜ํ•ด ๊ณฑํ•ด์ ธ์„œ ๊ฒฐ๊ณผ์ ์œผ๋กœ ํ•˜๋‚˜์˜ ์ˆซ์ž๋ฅผ ๊ฐ–๊ฒŒ ๋œ๋‹ค. ์ตœ์ข…์ ์ธ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

$h_{\theta}(x) = a^{(j+1)} = g(z^{(j+1)})$
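Putting the pieces together, forward propagation for an arbitrary number of layers can be sketched as below. This is a minimal NumPy version under my own naming; the weight values are random placeholders, not anything from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward-propagate input x through a list of weight matrices."""
    a = x
    for Theta in thetas:
        a = np.concatenate(([1.0], a))  # add the bias unit a0 = 1 to this layer
        a = sigmoid(Theta @ a)          # z^(j+1) = Theta^(j) a^(j), then element-wise g
    return a

# One hidden layer: 3 features -> 3 hidden units -> 1 output.
rng = np.random.default_rng(1)
thetas = [rng.normal(size=(3, 4)),  # theta^(1): 3 x (3 + 1)
          rng.normal(size=(1, 4))]  # theta^(2): 1 x (3 + 1)
h = forward(np.array([0.1, 0.9, 0.3]), thetas)
```

Each pass through the loop is exactly the logistic-regression step described above: append the bias unit, multiply by the layer's theta matrix, and apply $g$ element-wise.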

 

 layer $j$์™€ layer $j+1$ ์‚ฌ์ด์˜ ์ด ๋งˆ์ง€๋ง‰ ๋‹จ๊ณ„์—์„œ logistic regression์—์„œ ํ–ˆ๋˜ ๊ฒƒ๊ณผ ์ •ํ™•ํžˆ ๊ฐ™์€ ์ผ์„ ํ•˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ชจ๋“  ์ค‘๊ฐ„ ๋ ˆ์ด์–ด๋“ค์„ ์‹ ๊ฒฝ๋ง์— ์ถ”๊ฐ€ํ•˜๋Š” ๊ฒƒ์€ ๋” ์„ธ๋ จ๋˜๊ณ  ํฅ๋ฏธ๋กœ์šด ์ถœ๋ ฅ์„ ํ•˜๊ณ , ๋ณต์žกํ•œ ๋น„์„ ํ˜• hypothesis๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ด์ค€๋‹ค.

'Lecture ๐Ÿง‘โ€๐Ÿซ > Coursera' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[Machine Learning] Backpropagation in Practice  (0) 2023.03.27
[Machine Learning] Cost Function & Backpropagation  (0) 2023.03.26
[Machine Learning] Solving the Problem of Overfitting  (2) 2023.03.20
[Machine Learning] Multiclass Classification  (0) 2023.03.15
[Machine Learning] Classification & Representation  (0) 2023.03.15
'Lecture ๐Ÿง‘โ€๐Ÿซ/Coursera' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€
  • [Machine Learning] Backpropagation in Practice
  • [Machine Learning] Cost Function & Backpropagation
  • [Machine Learning] Solving the Problem of Overfitting
  • [Machine Learning] Multiclass Classification
Cartinoe
Cartinoe
Welcome! I'm a student studying about deep learning(NLP) ๐Ÿ˜‰ The goal of my study is to develop a competent LLM helping people!
  • faviconinstagram
  • faviconfacebook
  • favicongithub
  • faviconLinkedIn
Cartinoe's paper review
Cartinoe
Cartinoe
Cartinoe's paper review
Cartinoe
์ „์ฒด
์˜ค๋Š˜
์–ด์ œ
  • My Posting (141)
    • Paper Reading ๐Ÿ“œ (113)
      • Natural Language Processing (67)
      • Alignment Problem of LLM (11)
      • Computer Vision (4)
      • Deep Learning (6)
      • multimodal models (17)
      • Mathematics(์„ ํ˜•๋Œ€์ˆ˜, ํ™•๋ฅ ๊ณผ ํ†ต๊ณ„, ๋ฏธ.. (8)
    • Lecture ๐Ÿง‘โ€๐Ÿซ (16)
      • Hugging Face Course (1)
      • Coursera (15)
    • Insight ๐Ÿ˜Ž (10)
    • Research & Project ๐Ÿ”ฌ (2)

์ธ๊ธฐ ๊ธ€

์ตœ๊ทผ ๊ธ€

๊ณต์ง€์‚ฌํ•ญ

  • ๋ธ”๋กœ๊ทธ ๊ณต์ง€์‚ฌํ•ญ - ๋ชจ๋ฐ”์ผ ์ˆ˜์‹ ๊นจ์ง

ํƒœ๊ทธ

  • transformer
  • Vicuna Evaluation
  • open-source model
  • ChatGPT
  • context window
  • closed-source
  • Vicuna
  • RLHF
  • proprietary model
  • closed-source model
  • scaling law
  • Open-source
  • context length
  • Evaluation Metric
  • MT-Bench
  • LLM
  • Chinchilla
  • LLAMA2
  • LM
  • GPT-4
hELLO ยท Designed By ์ •์ƒ์šฐ.
Cartinoe
[Machine Learning] Neural Networks
์ƒ๋‹จ์œผ๋กœ

ํ‹ฐ์Šคํ† ๋ฆฌํˆด๋ฐ”

๊ฐœ์ธ์ •๋ณด

  • ํ‹ฐ์Šคํ† ๋ฆฌ ํ™ˆ
  • ํฌ๋Ÿผ
  • ๋กœ๊ทธ์ธ

๋‹จ์ถ•ํ‚ค

๋‚ด ๋ธ”๋กœ๊ทธ

๋‚ด ๋ธ”๋กœ๊ทธ - ๊ด€๋ฆฌ์ž ํ™ˆ ์ „ํ™˜
Q
Q
์ƒˆ ๊ธ€ ์“ฐ๊ธฐ
W
W

๋ธ”๋กœ๊ทธ ๊ฒŒ์‹œ๊ธ€

๊ธ€ ์ˆ˜์ • (๊ถŒํ•œ ์žˆ๋Š” ๊ฒฝ์šฐ)
E
E
๋Œ“๊ธ€ ์˜์—ญ์œผ๋กœ ์ด๋™
C
C

๋ชจ๋“  ์˜์—ญ

์ด ํŽ˜์ด์ง€์˜ URL ๋ณต์‚ฌ
S
S
๋งจ ์œ„๋กœ ์ด๋™
T
T
ํ‹ฐ์Šคํ† ๋ฆฌ ํ™ˆ ์ด๋™
H
H
๋‹จ์ถ•ํ‚ค ์•ˆ๋‚ด
Shift + /
โ‡ง + /

* ๋‹จ์ถ•ํ‚ค๋Š” ํ•œ๊ธ€/์˜๋ฌธ ๋Œ€์†Œ๋ฌธ์ž๋กœ ์ด์šฉ ๊ฐ€๋Šฅํ•˜๋ฉฐ, ํ‹ฐ์Šคํ† ๋ฆฌ ๊ธฐ๋ณธ ๋„๋ฉ”์ธ์—์„œ๋งŒ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค.