Introduction

Cost Function

the cost function of a single sample 𝑖:

(1/2)[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)]²

the cost function of all samples:

𝐽(𝜃₀, …, 𝜃_𝑘) = (1/2𝑛)𝛴_{1≤𝑖≤𝑛}[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)]²
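
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the hypothesis has the form ℎ = 𝜃₀ + 𝜃₁𝑓₁ + … + 𝜃_𝑘𝑓_𝑘 used later in the derivation; the names `hypothesis`, `cost`, `basis` and the sample data are hypothetical and not part of the original notes.

```python
from typing import Callable, List, Sequence

# A basis function f_j maps one sample (x_1, ..., x_k) to a number.
Basis = Callable[[Sequence[float]], float]


def hypothesis(theta: Sequence[float], basis: Sequence[Basis], x: Sequence[float]) -> float:
    """h(x) = theta[0] + theta[1]*f_1(x) + ... + theta[k]*f_k(x)."""
    return theta[0] + sum(t * f(x) for t, f in zip(theta[1:], basis))


def cost(theta: Sequence[float], basis: Sequence[Basis],
         xs: Sequence[Sequence[float]], ys: Sequence[float]) -> float:
    """J(theta) = (1/2n) * sum over samples of [h(x^(i)) - y^(i)]^2."""
    n = len(xs)
    return sum((hypothesis(theta, basis, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


# Illustrative data lying exactly on y = 2x, with a single basis function f_1(x) = x_1.
basis: List[Basis] = [lambda x: x[0]]
xs = [[1.0], [2.0], [3.0]]
ys = [2.0, 4.0, 6.0]

perfect_fit = cost([0.0, 2.0], basis, xs, ys)  # every residual is zero
poor_fit = cost([0.0, 1.0], basis, xs, ys)     # residuals -1, -2, -3, so J = 14/6
```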

Goal

minimize 𝐽(𝜃₀, …, 𝜃_𝑘) with respect to the regression coefficients {𝜃₀, …, 𝜃_𝑘} using gradient descent

Gradient Descent

repeat until convergence:

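𝜃₀ ← 𝜃₀ - 𝛼·(1/𝑛)𝛴_{1≤𝑖≤𝑛}[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)]
𝜃₁ ← 𝜃₁ - 𝛼·(1/𝑛)𝛴_{1≤𝑖≤𝑛}[[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)] · 𝑓₁(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖))]
…
𝜃_𝑘 ← 𝜃_𝑘 - 𝛼·(1/𝑛)𝛴_{1≤𝑖≤𝑛}[[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)] · 𝑓_𝑘(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖))]

(𝛼 is the learning rate and 𝑛 is the number of samples; the 𝜃₀ update has no 𝑓 factor because (∂/∂𝜃₀)ℎ = 1; all coefficients are updated simultaneously from the same old values)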
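
One way to turn the update rule above into code is sketched below. It is not a definitive implementation: it assumes ℎ = 𝜃₀ + 𝜃₁𝑓₁ + … + 𝜃_𝑘𝑓_𝑘, treats the constant 1 as the factor that multiplies 𝜃₀, and reads "convergence" as "every gradient component is below a tolerance", which is an assumption since the notes do not define a stopping rule. The names `features`, `gradient_descent`, `tol`, `max_iters` and the sample data are hypothetical.

```python
from typing import Callable, List, Sequence

Basis = Callable[[Sequence[float]], float]


def features(basis: Sequence[Basis], x: Sequence[float]) -> List[float]:
    """Return [1, f_1(x), ..., f_k(x)]; the leading 1 multiplies theta_0."""
    return [1.0] + [f(x) for f in basis]


def gradient_descent(basis: Sequence[Basis], xs: Sequence[Sequence[float]],
                     ys: Sequence[float], alpha: float = 0.1,
                     tol: float = 1e-10, max_iters: int = 100_000) -> List[float]:
    n = len(xs)
    rows = [features(basis, x) for x in xs]
    theta = [0.0] * (len(basis) + 1)
    for _ in range(max_iters):
        # Residual h(x^(i)) - y^(i) for every sample, using the current theta.
        residuals = [sum(t * r for t, r in zip(theta, row)) - y
                     for row, y in zip(rows, ys)]
        # Component j of the gradient: (1/n) * sum_i residual_i * f_j(x^(i)).
        grad = [sum(res * row[j] for res, row in zip(residuals, rows)) / n
                for j in range(len(theta))]
        # Simultaneous update: every theta_j uses the same old theta.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        # Assumed stopping rule: gradient is numerically zero.
        if max(abs(g) for g in grad) < tol:
            break
    return theta


# Illustrative data generated from y = 1 + 2x, with f_1(x) = x_1.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent([lambda x: x[0]], xs, ys)
```

With this data the loop should settle near 𝜃₀ = 1 and 𝜃₁ = 2. A learning rate that is too large makes the iterates diverge, so `alpha` generally needs tuning per data set.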

derivation:

generic gradient descent algorithm:

repeat until convergence:

𝜃₀ ← 𝜃₀ - 𝛼·(∂/∂𝜃₀)𝐽(𝜃₀, …, 𝜃_𝑘)

…

𝜃_𝑘 ← 𝜃_𝑘 - 𝛼·(∂/∂𝜃_𝑘)𝐽(𝜃₀, …, 𝜃_𝑘)

taking the partial derivative of 𝐽(𝜃₀, …, 𝜃_𝑘) wrt 𝜃_𝑗 (𝑗 indexes the coefficients, 𝑖 indexes the samples):

(∂/∂𝜃_𝑗)𝐽(𝜃₀, …, 𝜃_𝑘) = (∂/∂𝜃_𝑗)(1/2𝑛)𝛴_{1≤𝑖≤𝑛}[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)]²

(∂/∂𝜃_𝑗)𝐽(𝜃₀, …, 𝜃_𝑘) = (1/𝑛)𝛴_{1≤𝑖≤𝑛}[[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)] · (∂/∂𝜃_𝑗)ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖))]
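
The step between these two lines is the chain rule applied to each squared term. Spelled out, with the coefficient index written as j to keep it distinct from the sample index i, n for the number of samples as in the definition of 𝐽, and h^{(i)} as a shorthand (introduced here, not in the original notes) for ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)):

```latex
\begin{aligned}
\frac{\partial}{\partial\theta_j} J(\theta_0,\dots,\theta_k)
  &= \frac{1}{2n}\sum_{i=1}^{n} \frac{\partial}{\partial\theta_j}\bigl[h^{(i)} - y^{(i)}\bigr]^2 \\
  &= \frac{1}{2n}\sum_{i=1}^{n} 2\,\bigl[h^{(i)} - y^{(i)}\bigr]\cdot\frac{\partial}{\partial\theta_j}\bigl[h^{(i)} - y^{(i)}\bigr] \\
  &= \frac{1}{n}\sum_{i=1}^{n} \bigl[h^{(i)} - y^{(i)}\bigr]\cdot\frac{\partial h^{(i)}}{\partial\theta_j}
\end{aligned}
```

The factor 2 from the power rule cancels the 1/2 in the cost function, and 𝑦^(𝑖) drops out of the inner derivative because it does not depend on any coefficient.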

modified gradient descent algorithm:

repeat until convergence:

𝜃₀ ← 𝜃₀ - 𝛼·(1/𝑛)𝛴_{1≤𝑖≤𝑛}[[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)] · (∂/∂𝜃₀)ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖))]

…

𝜃_𝑘 ← 𝜃_𝑘 - 𝛼·(1/𝑛)𝛴_{1≤𝑖≤𝑛}[[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)] · (∂/∂𝜃_𝑘)ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖))]

partial derivative of ℎ(𝑥₁, …, 𝑥_𝑘) wrt 𝜃_𝑗:

(∂/∂𝜃_𝑗)ℎ(𝑥₁, …, 𝑥_𝑘) = (∂/∂𝜃_𝑗) [𝜃₀ + 𝜃₁𝑓₁(𝑥₁, …, 𝑥_𝑘) + … + 𝜃_𝑘𝑓_𝑘(𝑥₁, …, 𝑥_𝑘)]

(∂/∂𝜃_𝑗)ℎ(𝑥₁, …, 𝑥_𝑘) = 𝑓_𝑗(𝑥₁, …, 𝑥_𝑘) for 1 ≤ 𝑗 ≤ 𝑘

(∂/∂𝜃₀)ℎ(𝑥₁, …, 𝑥_𝑘) = 1

FINAL modified gradient descent algorithm:

repeat until convergence:

𝜃₀ ← 𝜃₀ - 𝛼·(1/𝑛)𝛴_{1≤𝑖≤𝑛}[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)]

𝜃₁ ← 𝜃₁ - 𝛼·(1/𝑛)𝛴_{1≤𝑖≤𝑛}[[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)] · 𝑓₁(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖))]

…

𝜃_𝑘 ← 𝜃_𝑘 - 𝛼·(1/𝑛)𝛴_{1≤𝑖≤𝑛}[[ℎ(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖)) - 𝑦^(𝑖)] · 𝑓_𝑘(𝑥₁^(𝑖), …, 𝑥_𝑘^(𝑖))]
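
A common sanity check for a derived gradient is to compare it with a central finite-difference approximation of 𝐽. The sketch below does that for the sum that appears in each update, assuming ℎ = 𝜃₀ + 𝜃₁𝑓₁ + … + 𝜃_𝑘𝑓_𝑘 with factor 1 for 𝜃₀ and averaging over the 𝑛 samples as in the definition of 𝐽. The function names, the step `eps`, the basis functions and the data are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence

Basis = Callable[[Sequence[float]], float]


def hypothesis(theta: Sequence[float], basis: Sequence[Basis], x: Sequence[float]) -> float:
    return theta[0] + sum(t * f(x) for t, f in zip(theta[1:], basis))


def cost(theta, basis, xs, ys) -> float:
    n = len(xs)
    return sum((hypothesis(theta, basis, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def analytic_gradient(theta, basis, xs, ys) -> List[float]:
    """(1/n) * sum_i [h(x^(i)) - y^(i)] * f_j(x^(i)), with f_0 = 1."""
    n = len(xs)
    grad = []
    for j in range(len(theta)):
        total = 0.0
        for x, y in zip(xs, ys):
            f_j = 1.0 if j == 0 else basis[j - 1](x)
            total += (hypothesis(theta, basis, x) - y) * f_j
        grad.append(total / n)
    return grad


def numeric_gradient(theta, basis, xs, ys, eps: float = 1e-6) -> List[float]:
    """Central difference [J(theta + eps*e_j) - J(theta - eps*e_j)] / (2*eps)."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, basis, xs, ys) - cost(down, basis, xs, ys)) / (2 * eps))
    return grad


# Illustrative setup: f_1(x) = x_1 and f_2(x) = x_1^2 on three samples.
basis = [lambda x: x[0], lambda x: x[0] ** 2]
xs = [[0.0], [1.0], [2.0]]
ys = [1.0, 2.0, 5.0]
theta = [0.5, -1.0, 2.0]

exact = analytic_gradient(theta, basis, xs, ys)
approx = numeric_gradient(theta, basis, xs, ys)
max_gap = max(abs(a - b) for a, b in zip(exact, approx))
```

Because 𝐽 is quadratic in the coefficients, the central difference should agree with the analytic gradient up to floating-point rounding, so `max_gap` is expected to be tiny.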

Gradient Descent Variants

／var／log marcus chiu

LR - Methods Estimating Unknown Coefficients - Method of Least Squares (Gradient Descent)
