Lecture ๐Ÿง‘โ€๐Ÿซ/Coursera

[Machine Learning] Bias vs Variance

2023. 3. 27. 15:40

Diagnosing Bias vs Variance

 ์ด ์„น์…˜์—์„œ๋Š” polynomial d์™€ hypothesis์˜ underfitting ํ˜น์€ overfitting์˜ ๊ด€๊ณ„์— ๋Œ€ํ•ด์„œ ์กฐ์‚ฌํ•˜์˜€๋‹ค.

 

  • ์ž˜๋ชป๋œ ์˜ˆ์ธก์— ๊ณตํ—Œํ•˜๋Š” bias์™€ variance๋ฅผ ๊ตฌ๋ถ„ํ•˜์˜€๋‹ค.
  • ๋†’์€ bias๋Š” underfitting์„ ์•ผ๊ธฐํ•˜๊ณ , ๋†’์€ variance๋Š” overfitting์„ ์•ผ๊ธฐํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ์ด ๋‘˜ ๊ฐ„์˜ ํ™ฉ๊ธˆ ํ‰๊ท ์„ ์ฐพ์•„์•ผ ํ•  ํ•„์š”๊ฐ€ ์žˆ๋‹ค.

 

 polynomial์˜ degree d๋ฅผ ์ฆ๊ฐ€์‹œํ‚ฌ ์ˆ˜๋ก training error๋Š” ๊ฐ์†Œํ•˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ๋‹ค. ๋™์‹œ๊ฐ„๋Œ€์— cross validation error๋Š” ์ผ์ • ํฌ์ธํŠธ๊นŒ์ง€๋Š” ๊ฐ์†Œํ•˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ๊ณ , ๊ทธ ๋‹ค์Œ์—๋Š” d๊ฐ’์ด ์˜ค๋ฆ„์— ๋”ฐ๋ผ ์ƒ์Šนํ•˜๋ฉด์„œ, convex curve๋ฅผ ๋งŒ๋“ค์–ด ๋‚ธ๋‹ค ((์ตœ์†Ÿ๊ฐ’ ๋˜๋Š” ์ตœ๋Œ“๊ฐ’์ด ํ•˜๋‚˜์ธ ์ปค๋ธŒ)).

 

  • High bias (underfitting): both J_train(θ) and J_CV(θ) are high, and J_CV(θ) ≈ J_train(θ).
  • High variance (overfitting): J_train(θ) is low, and J_CV(θ) is much greater than J_train(θ).
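As a rough sketch of this diagnosis, the following compares training and cross-validation error across polynomial degrees on toy data (the data, split, and degree values are my own illustration, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy quadratic (illustrative values, not from the lecture)
x = rng.uniform(-2, 2, 60)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.4, x.size)

# Simple train / cross-validation split
x_train, y_train = x[:40], y[:40]
x_cv, y_cv = x[40:], y[40:]

def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on (xs, ys)."""
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

# Fit polynomials of increasing degree d and watch the two errors:
# d too low  -> J_train and J_CV both high        (high bias / underfitting)
# d too high -> J_train low, J_CV typically above (high variance / overfitting)
for d in (1, 2, 6):
    coeffs = np.polyfit(x_train, y_train, d)
    print(f"d={d}: J_train={mse(coeffs, x_train, y_train):.3f}, "
          f"J_CV={mse(coeffs, x_cv, y_cv):.3f}")
```

On data generated from a quadratic, d=1 shows both errors high (bias), while the right degree brings the training error down near the noise level.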

 

 ์ด๊ฒƒ์„ ์š”์•ฝํ•œ ๊ทธ๋ฆผ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

Regularization and Bias/Variance

 

 ์œ„์˜ ๊ทธ๋ฆผ์—์„œ์ฒ˜๋Ÿผ ฮปฮป๊ฐ€ ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ ์ง์„ ์— ๊ฐ€๊น๊ฒŒ fit๋˜๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ ฮปฮป๊ฐ€ 0์— ๊ทผ์ ‘ํ•จ์— ๋”ฐ๋ผ ๋ฐ์ดํ„ฐ์— overfit๋˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ๋‹ค. ์–ด๋–ป๊ฒŒ ํ•˜๋ฉด ์˜ฌ๋ฐ”๋ฅธ ํŒŒ๋ผ๋ฏธํ„ฐ ฮปฮป๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์„๊นŒ? ๊ทธ๋Ÿฌ๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ชจ๋ธ๊ณผ ์ •๊ทœํ™” ํ•ญ ฮปฮป๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์„ ํƒํ•ด์•ผ ํ•œ๋‹ค.

 

  1. lambda ๋ฆฌ์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค.
  2. ์„œ๋กœ ๋‹ค๋ฅธ degree ํ˜น์€ variant๋ฅผ ์‚ฌ์šฉํ•ด์„œ ๋ชจ๋ธ์˜ ์„ธํŠธ๋ฅผ ๋งŒ๋“ ๋‹ค.
  3. ฮปฮป๋ฅผ ๋ฐ˜๋ณตํ•˜๊ณ  ๊ฐ ฮปฮป์— ๋Œ€ํ•ด ๋ชจ๋“  ๋ชจ๋ธ์„ ๊ฑฐ์ณ ์ผ๋ถ€ ฮธฮธ๋ฅผ ํ•™์Šตํ•œ๋‹ค.
  4. ์ •๊ทœํ™” ๋˜๋Š” ฮป=0ฮป=0 ์—†์ด JCV(ฮธ)JCV(ฮธ)์—์„œ ํ•™์Šต๋œ ฮธฮธ๋ฅผ ์‚ฌ์šฉํ•ด์„œ cross validation ์˜ค์ฐจ๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค.
  5. cross validation set์—์„œ ๊ฐ€์žฅ ๋‚ฎ์€ ์˜ค์ฐจ๋ฅผ ๋งŒ๋“ค์–ด๋‚ด๋Š” ์ฝค๋ณด๋ฅผ ์„ ํƒํ•œ๋‹ค.
  6. ์ตœ๊ณ ์˜ ์ฝค๋ณด ฮธฮธ์™€ ฮปฮป๋ฅผ Jtest(ฮธ)Jtest(ฮธ)์— ์ ์šฉํ•ด์„œ ๋ฌธ์ œ์˜ ์ข‹์€ ์ผ๋ฐ˜ํ™”๋ฅผ ๋ณธ๋‹ค.

 

 

Learning Curves

 ๋งค์šฐ ์ ์€ ์ˆ˜์˜ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ((์˜ˆ: 1, 2 ๋˜๋Š” 3))์—์„œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ต์œกํ•˜๋ฉด ํ•ด๋‹น ํฌ์ธํŠธ ์ˆ˜์— ์ •ํ™•ํžˆ ๋‹ฟ๋Š” 2์ฐจ ๊ณก์„ ์„ ํ•ญ์ƒ ์ฐพ์„ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์‰ฝ๊ฒŒ ์˜ค์ฐจ๊ฐ€ 0์ด ๋œ๋‹ค. ๋”ฐ๋ผ์„œ:

 

  • training set์ด ์ปค์งˆ์ˆ˜๋ก 2์ฐจ ํ•จ์ˆ˜์— ๋Œ€ํ•œ ์˜ค์ฐจ๋Š” ์ปค์ง„๋‹ค.
  • ์˜ค์ฐจ๊ฐ’์€ ํŠน์ • m ๋˜๋Š” training set ํฌ๊ธฐ๊ฐ€ ์ง€๋‚œ ํ›„์— ์•ˆ์ •์— ์ด๋ฅธ๋‹ค.

 

Experiencing high bias

 

  • Low training set size: J_train(θ) is low and J_CV(θ) is high.
  • Large training set size: both J_train(θ) and J_CV(θ) are high, with J_train(θ) ≈ J_CV(θ).

 

 ๋งŒ์•ฝ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด high bias๋ฅผ ๊ฒช๋Š”๋‹ค๋ฉด, ๋” ๋งŽ์€ training data๋ฅผ ๊ฐ€์ง€๋Š” ๊ฒƒ์ด ๋ณ„ ๋„์›€์ด ๋˜์ง€ ์•Š๋Š”๋‹ค.

 

 

Experiencing high variance

 

  • Low training set size: J_train(θ) is low and J_CV(θ) is high.
  • Large training set size: J_train(θ) increases with training set size and J_CV(θ) continues to decrease without leveling off. Also, J_train(θ) < J_CV(θ), but the difference between them remains significant.

 

 ๋งŒ์•ฝ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด high variance๋ฅผ ๊ฒช๋Š”๋‹ค๋ฉด ๋” ๋งŽ์€ training data๋ฅผ ๊ฐ€์ง€๋Š” ๊ฒƒ์ด ๋„์›€์ด ๋œ๋‹ค.

 

 

Deciding What to do Next Revisited

 ์„ ํƒ ํ”„๋กœ์„ธ์Šค๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ชผ๊ฐœ์ง„๋‹ค. ์šฐ์„  high variance์˜ ๊ฒฝ์šฐ ์ด๋Ÿฐ ๋ฐฉ๋ฒ•๋“ค์„ ์ถ”์ฒœํ•œ๋‹ค.

 

  • ๋” ๋งŽ์€ training example ๊ฐ€์ง€๊ธฐ
  • feature set๋ฅผ ์ค„์ด๊ธฐ
  • ฮป๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ

 

 ๋ฐ˜๋Œ€๋กœ high bias์˜ ๊ฒฝ์šฐ ์ด๋Ÿฐ ๋ฐฉ๋ฒ•๋“ค์„ ์ถ”์ฒœํ•œ๋‹ค.

 

  • feature ์ถ”๊ฐ€ํ•˜๊ธฐ
  • polynomial feature ์ถ”๊ฐ€ํ•˜๊ธฐ
  • ฮป๋ฅผ ๊ฐ์†Œ์‹œํ‚ค๊ธฐ

 

์‹ ๊ฒฝ๋ง ๋„คํŠธ์›Œํฌ ์ง„๋‹จ

 

  • ์ ์€ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์‹ ๊ฒฝ๋ง ๋„คํŠธ์›Œํฌ๋Š” underfitting๋  ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ๊ณ„์‚ฐ ๋น„์šฉ์€ ์ ๋‹ค.
  • ๋” ๋งŽ์€ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํฐ ์‹ ๊ฒฝ๋ง ๋„คํŠธ์›Œํฌ๋Š” overfitting๋  ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ๊ณ„์‚ฐ ๋น„์šฉ์€ ํฌ๋‹ค. ์ด ๊ฒฝ์šฐ์—๋Š” ์ •๊ทœํ™”๋ฅผ ์‚ฌ์šฉ(ฮป๊ฐ’ ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ)ํ•˜์—ฌ overfitting์„ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋‹ค.

 

 ํ•˜๋‚˜์˜ hidden layer์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ข‹์€ ์‹œ์ž‘ ๊ธฐ๋ณธ๊ฐ’์ด๋‹ค. cross validation set๋ฅผ ์‚ฌ์šฉํ•ด์„œ ์—ฌ๋Ÿฌ ๊ฐœ์˜ hidden layer์—์„œ ์‹ ๊ฒฝ๋ง ๋„คํŠธ์›Œํฌ๋ฅผ ํ•™์Šต์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. ๊ทธ ์ค‘์— ๊ฐ€์žฅ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๋Š” ๋ชจ๋ธ์„ ์„ ํƒํ•˜๋ฉด ๋œ๋‹ค.

 

๋ชจ๋ธ ๋ณต์žก๋„ ํšจ๊ณผ

 

  • Lower-order polynomials (low model complexity) have high bias and low variance. In this case, the model performs consistently poorly.
  • Higher-order polynomials (high model complexity) fit the training data very well but perform poorly on the test data: they have low bias on the training data but high variance.
  • In reality, we want to choose a model somewhere in between, one that generalizes well but also fits the data reasonably well.

'Lecture ๐Ÿง‘โ€๐Ÿซ > Coursera' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[Machine Learning] Machine Learning Algorithm Application  (0) 2023.03.28
[Machine Learning] Evaluating a Learning Algorithm  (0) 2023.03.27
[Machine Learning] Backpropagation in Practice  (0) 2023.03.27
[Machine Learning] Cost Function & Backpropagation  (0) 2023.03.26
[Machine Learning] Neural Networks  (0) 2023.03.20
'Lecture ๐Ÿง‘โ€๐Ÿซ/Coursera' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€
  • [Machine Learning] Machine Learning Algorithm Application
  • [Machine Learning] Evaluating a Learning Algorithm
  • [Machine Learning] Backpropagation in Practice
  • [Machine Learning] Cost Function & Backpropagation
Cartinoe
Cartinoe
Welcome! I'm a student studying about deep learning(NLP) ๐Ÿ˜‰ The goal of my study is to develop a competent LLM helping people!
Cartinoe's paper reviewWelcome! I'm a student studying about deep learning(NLP) ๐Ÿ˜‰ The goal of my study is to develop a competent LLM helping people!
  • faviconinstagram
  • faviconfacebook
  • favicongithub
  • faviconLinkedIn
Cartinoe's paper review
Cartinoe
Cartinoe
Cartinoe's paper review
Cartinoe
์ „์ฒด
์˜ค๋Š˜
์–ด์ œ
  • My Posting (141)
    • Paper Reading ๐Ÿ“œ (113)
      • Natural Language Processing (67)
      • Alignment Problem of LLM (11)
      • Computer Vision (4)
      • Deep Learning (6)
      • multimodal models (17)
      • Mathematics(์„ ํ˜•๋Œ€์ˆ˜, ํ™•๋ฅ ๊ณผ ํ†ต๊ณ„, ๋ฏธ.. (8)
    • Lecture ๐Ÿง‘โ€๐Ÿซ (16)
      • Hugging Face Course (1)
      • Coursera (15)
    • Insight ๐Ÿ˜Ž (10)
    • Research & Project ๐Ÿ”ฌ (2)

์ธ๊ธฐ ๊ธ€

์ตœ๊ทผ ๊ธ€

๊ณต์ง€์‚ฌํ•ญ

  • ๋ธ”๋กœ๊ทธ ๊ณต์ง€์‚ฌํ•ญ - ๋ชจ๋ฐ”์ผ ์ˆ˜์‹ ๊นจ์ง

ํƒœ๊ทธ

  • ChatGPT
  • Vicuna
  • Evaluation Metric
  • LM
  • LLM
  • closed-source
  • Open-source
  • context length
  • proprietary model
  • transformer
  • RLHF
  • context window
  • open-source model
  • Vicuna Evaluation
  • closed-source model
  • Chinchilla
  • scaling law
  • GPT-4
  • LLAMA2
  • MT-Bench
hELLO ยท Designed By ์ •์ƒ์šฐ.
Cartinoe
[Machine Learning] Bias vs Variance
์ƒ๋‹จ์œผ๋กœ

ํ‹ฐ์Šคํ† ๋ฆฌํˆด๋ฐ”

๊ฐœ์ธ์ •๋ณด

  • ํ‹ฐ์Šคํ† ๋ฆฌ ํ™ˆ
  • ํฌ๋Ÿผ
  • ๋กœ๊ทธ์ธ

๋‹จ์ถ•ํ‚ค

๋‚ด ๋ธ”๋กœ๊ทธ

๋‚ด ๋ธ”๋กœ๊ทธ - ๊ด€๋ฆฌ์ž ํ™ˆ ์ „ํ™˜
Q
Q
์ƒˆ ๊ธ€ ์“ฐ๊ธฐ
W
W

๋ธ”๋กœ๊ทธ ๊ฒŒ์‹œ๊ธ€

๊ธ€ ์ˆ˜์ • (๊ถŒํ•œ ์žˆ๋Š” ๊ฒฝ์šฐ)
E
E
๋Œ“๊ธ€ ์˜์—ญ์œผ๋กœ ์ด๋™
C
C

๋ชจ๋“  ์˜์—ญ

์ด ํŽ˜์ด์ง€์˜ URL ๋ณต์‚ฌ
S
S
๋งจ ์œ„๋กœ ์ด๋™
T
T
ํ‹ฐ์Šคํ† ๋ฆฌ ํ™ˆ ์ด๋™
H
H
๋‹จ์ถ•ํ‚ค ์•ˆ๋‚ด
Shift + /
โ‡ง + /

* ๋‹จ์ถ•ํ‚ค๋Š” ํ•œ๊ธ€/์˜๋ฌธ ๋Œ€์†Œ๋ฌธ์ž๋กœ ์ด์šฉ ๊ฐ€๋Šฅํ•˜๋ฉฐ, ํ‹ฐ์Šคํ† ๋ฆฌ ๊ธฐ๋ณธ ๋„๋ฉ”์ธ์—์„œ๋งŒ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค.