Neural Network
Linear classifiers cannot deal with non-linear decision boundaries,
such as two concentric circles with different radii.
→ Use polar coordinates to create a feature space: \(r\) on the x-axis and \(\theta\) on the y-axis.
All points on the same circle then lie on the same vertical line (parallel to the y-axis), so the two classes become linearly separable.
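A minimal numpy sketch of this trick (hypothetical data: two circles of radii 1 and 2): after mapping \((x, y) \mapsto (r, \theta)\), a vertical line \(r = 1.5\) separates the classes perfectly.

```python
import numpy as np

# Hypothetical data: two classes of points on concentric circles (radii 1 and 2)
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=100)
inner = np.stack([1.0 * np.cos(theta), 1.0 * np.sin(theta)], axis=1)  # class 0
outer = np.stack([2.0 * np.cos(theta), 2.0 * np.sin(theta)], axis=1)  # class 1
pts = np.concatenate([inner, outer])
labels = np.concatenate([np.zeros(100, int), np.ones(100, int)])

# Feature transform: (x, y) -> (r, theta)
r = np.hypot(pts[:, 0], pts[:, 1])
t = np.arctan2(pts[:, 1], pts[:, 0])

# In (r, theta) space, the vertical line r = 1.5 separates the classes
pred = (r > 1.5).astype(int)
print((pred == labels).mean())  # 1.0
```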
(Before) Linear score function: only a small part of the pipeline, the linear classifier on top of fixed feature extraction, can adjust itself to the data.
It learns only one template per category.
(After) Neural network: maps raw image pixels directly to classification scores.
It can learn several templates per category.
Linear score function: \(f = Wx\)
2-layer neural network: \(f = W_2\max(0, W_1x)\)
\(W_2 \in \mathbb{R}^{C\times H},\; W_1 \in \mathbb{R}^{H\times D},\; x \in \mathbb{R}^D\)
\(h = W_1x = (\alpha_1, \alpha_2, \cdots, \alpha_H)^Tx\), where \(\alpha_i \in \mathbb{R}^D\) is the \(i\)-th row of \(W_1\), so \(h_i = \alpha_i^Tx\).
Element \((i,j)\) of \(W_1\) gives the effect of \(x_j\) on \(h_i\).
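A quick shape check of \(f = W_2\max(0, W_1x)\) (the dimensions \(C, H, D\) below are arbitrary example values):

```python
import numpy as np

C, H, D = 10, 100, 3072  # e.g. 10 classes, 100 hidden units, 32x32x3 image
x = np.random.randn(D)
W1 = np.random.randn(H, D)
W2 = np.random.randn(C, H)

h = np.maximum(0, W1 @ x)  # hidden activations, shape (H,)
f = W2 @ h                 # class scores, shape (C,)
print(h.shape, f.shape)    # (100,) (10,)
```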
Deep neural networks: Depth = number of layers = number of weight matrices.
Width = size of each layer.
Activation Functions:
Without an activation function, the network collapses to \(f = W_2W_1x = Wx\), which is just a linear classifier again.
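A two-line numerical check of this collapse (random matrices with assumed shapes):

```python
import numpy as np

C, H, D = 10, 100, 3072
W1, W2 = np.random.randn(H, D), np.random.randn(C, H)
x = np.random.randn(D)

# Without max(0, .), the two layers equal a single matrix W = W2 @ W1
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True
```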
| Activation Function | Expression |
|---|---|
| Sigmoid | \(\sigma(x) = \frac{1}{1+e^{-x}}\) |
| tanh | \(\tanh(x)\) |
| ReLU (a good default choice for most problems) | \(\max(0, x)\) |
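For reference, the three activations as numpy one-liners (a sketch, not a library API):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```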
A simple implementation:
```python
import numpy as np
from numpy.random import randn

# Random data: N samples, input dim Din, hidden dim H, output dim Dout
N, Din, H, Dout = 64, 1000, 100, 10
x, y = randn(N, Din), randn(N, Dout)
w1, w2 = randn(Din, H), randn(H, Dout)

for t in range(10000):
    # Forward pass: sigmoid hidden layer, linear output, L2 loss
    h = 1.0 / (1.0 + np.exp(-x.dot(w1)))
    y_pred = h.dot(w2)
    loss = np.square(y_pred - y).sum()

    # Backward pass: gradients computed by hand via the chain rule
    dy_pred = 2.0 * (y_pred - y)
    dw2 = h.T.dot(dy_pred)
    dh = dy_pred.dot(w2.T)
    dw1 = x.T.dot(dh * h * (1 - h))  # sigmoid'(z) = h * (1 - h)

    # Gradient descent step
    w1 -= 1e-4 * dw1
    w2 -= 1e-4 * dw2
```
Space warping:
A linear transform of the input cannot make non-linearly-separable points separable: in the new feature space they are still not linearly separable. A nonlinearity such as ReLU folds the space, so previously inseparable points can become linearly separable.
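A concrete instance (the classic XOR-style points with hand-picked weights, chosen here for illustration): no line separates the two classes in the input plane, but after one ReLU layer they become linearly separable.

```python
import numpy as np

# XOR-style points: class 0 = {(0,0), (1,1)}, class 1 = {(0,1), (1,0)}
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
labels = np.array([0, 0, 1, 1])

# One ReLU layer with hand-picked weights folds the plane
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
h = np.maximum(0, X @ W1.T + b1)

# In h-space, the linear score h1 - 2*h2 separates the classes at 0.5
scores = h @ np.array([1.0, -2.0])
print(scores)                                # [0. 0. 1. 1.]
print((scores > 0.5).astype(int) == labels)  # all True
```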
Universal Approximation:
Use the layer bias to shift each ReLU along the x-axis.
Sum many shifted ReLUs to approximate the target function (see the sketch after this list).
To make the sum return to 0 (or stay flat), the slope of the next ReLU must be the opposite of the accumulated slope.
Keep the coefficient of \(x\) inside each \(\max\) fixed at 1 and change only the outer scaling factor of each \(\max(0,\cdot)\).
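A minimal sketch of this construction (hypothetical target values; inner slope fixed at 1, only the biases and outer signs change): four ReLUs build one flat-topped "bump" that rises on \([0,1]\), holds 1 on \([1,2]\), and falls back to 0 on \([2,3]\). Summing many such bumps with different heights and positions approximates any continuous function.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Bump from 4 ReLUs; the coefficient of x inside each max is fixed at 1,
# only the bias (shift) and the outer sign (+1/-1) change. Each new term's
# slope cancels the accumulated slope, so the sum flattens out again.
def bump(x):
    return relu(x) - relu(x - 1) - relu(x - 2) + relu(x - 3)

xs = np.array([-1.0, 0.5, 1.5, 2.5, 4.0])
print(bump(xs))  # [0.  0.5 1.  0.5 0. ]
```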
Convex Functions:
\(f: X \subseteq \mathbb{R}^N \to \mathbb{R}\) is convex if for all \(x_1, x_2 \in X\) and \(t \in [0,1]\): \(f(tx_1 + (1-t)x_2) \leq tf(x_1) + (1-t)f(x_2)\).
Convex functions are easy to optimize: any local minimum is a global minimum. Losses of linear classifiers are convex; neural network losses are generally non-convex.
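A quick numerical probe of the definition, using \(f(x) = x^2\) as an example convex function (random sampling, not a proof):

```python
import numpy as np

# Sample random x1, x2, t and check f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2)
f = lambda x: x ** 2
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-5, 5, 1000), rng.uniform(-5, 5, 1000)
t = rng.uniform(0, 1, 1000)
lhs = f(t * x1 + (1 - t) * x2)
rhs = t * f(x1) + (1 - t) * f(x2)
print(np.all(lhs <= rhs + 1e-12))  # True
```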