Deep Learning (1): Hand-Coding a Logistic Regression Neural Network in Python

When implementing a neural network, we generally vectorize the computation for efficiency, reducing the number of explicit for loops as much as possible. Vectorization simply means replacing loops with matrix operations. Vectorizing the derivation from the previous article 《深度学习(0)Logistics正向、反向传播推导》 gives the following definitions:

  • $W=\left[w_{1}, w_{2} \dots w_{n}\right]^T$
  • $X=\left[x^{(1)}, x^{(2)} \dots x^{(m)}\right]$
  • $B=\left[b, b, \dots b\right]$
  • $Z=\left[z^{(1)}, z^{(2)} \dots z^{(m)}\right]$
  • $A=\left[a^{(1)}, a^{(2)} \dots a^{(m)}\right]$
  • $Y=\left[y^{(1)}, y^{(2)} \dots y^{(m)}\right]$
  • $dW=\left[dw_{1}, dw_{2} \dots dw_{n}\right]^T$

where each sample $x^{(i)}$ has dimension $n$ and each batch contains $m$ samples.
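As a minimal sketch of the shapes involved (the variable names and random data here are purely illustrative), the vectorized quantities can be laid out in NumPy as follows:

import numpy as np

n, m = 4, 8                          # n features per sample, m samples per batch
W = np.zeros((n, 1))                 # weight column vector, shape (n, 1)
b = 0.0                              # scalar bias, broadcast as B = [b, b, ... b]
X = np.random.randn(n, m)            # each column is one sample x^(i), shape (n, m)
Y = np.random.randint(0, 2, (1, m))  # labels, shape (1, m)

Z = np.dot(W.T, X) + b               # shape (1, m)
A = 1 / (1 + np.exp(-Z))             # shape (1, m)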

Forward propagation

$Z=W^TX+B$

import numpy as np

def affine_forward(W, X, B):
    # Linear part of the forward pass: Z = W^T X + B
    Z = np.dot(W.T, X) + B
    return Z

$A = \sigma (Z) = \operatorname{sigmoid}(Z) = \frac{1}{1+e^{-Z}}$

def sigmoid_forward(Z):
    # Element-wise sigmoid activation
    A = 1 / (1 + np.exp(-Z))
    return A
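Chaining the two functions gives the full forward pass. A quick usage sketch (the shapes and data are illustrative, reusing the $n$, $m$ convention above):

n, m = 4, 8
W = np.random.randn(n, 1) * 0.01   # small random weights, shape (n, 1)
B = np.zeros((1, m))               # bias broadcast across the batch
X = np.random.randn(n, m)          # batch of m samples

Z = affine_forward(W, X, B)        # shape (1, m)
A = sigmoid_forward(Z)             # shape (1, m), values in (0, 1)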

Loss function

$L(A,Y)=-Y \ln A-\left(1-Y\right) \ln \left(1-A\right)$, and the cost averaged over the batch is $J = \frac{1}{m}\sum_{i=1}^{m} L\left(a^{(i)}, y^{(i)}\right)$.

def logistic_loss(A, Y):
    # Binary cross-entropy, averaged over the m samples in the batch
    loss = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return loss
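As a quick sanity check (toy numbers, purely illustrative), the loss is near zero when the predictions agree with the labels and grows as they diverge:

Y = np.array([[1, 0, 1]])
A_good = np.array([[0.99, 0.01, 0.99]])
A_bad = np.array([[0.40, 0.60, 0.40]])
print(logistic_loss(A_good, Y))   # ~0.01
print(logistic_loss(A_bad, Y))    # ~0.92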

Backward propagation

The previous article 《深度学习(0)Logistics正向、反向传播推导》 has already derived $dW$ and $db$ in detail:
$$\frac{d J}{d w_{j}}=\frac{1}{m} \sum_{i=1}^{m} \frac{d L^{(i)}}{d a^{(i)}} \frac{d a^{(i)}}{d z^{(i)}} \frac{d z^{(i)}}{d w_{j}} = \frac{1}{m} \sum_{i=1}^{m}\left(a^{(i)}-y^{(i)}\right) x_{j}^{(i)}$$

$$\frac{d J}{d b}=\frac{1}{m} \sum_{i=1}^{m} \frac{d L^{(i)}}{d a^{(i)}} \frac{d a^{(i)}}{d z^{(i)}} \frac{d z^{(i)}}{d b} = \frac{1}{m} \sum_{i=1}^{m}\left(a^{(i)}-y^{(i)}\right)$$

Here is a quick re-derivation from the vectorized point of view. Taking the transpose, $Z^T = (W^TX+B)^T=X^TW+B^T$; differentiating both sides with respect to $W$ gives $dZ^T=X^TdW$, and (informally) left-multiplying both sides by $X$ and averaging over the $m$ samples gives $dW = \frac{1}{m} X dZ^T$. Here $dZ$ denotes the gradient of the loss with respect to $Z$: since $\frac{dL}{da} = -\frac{y}{a}+\frac{1-y}{1-a}$ and $\frac{da}{dz} = a(1-a)$, their product simplifies to $dZ = A-Y$. Therefore,

$dW = \frac{1}{m} X (A-Y)^T$

def dW(X, A, Y):
    # Gradient of the cost with respect to W: (1/m) * X (A - Y)^T
    m = X.shape[1]
    return np.dot(X, (A - Y).T) / m
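As a quick check that the vectorized form matches the element-wise sums derived above (toy shapes, illustrative data):

n, m = 4, 8
X = np.random.randn(n, m)
A = np.random.rand(1, m)
Y = np.random.randint(0, 2, (1, m))

dW_vec = dW(X, A, Y)                 # vectorized: (1/m) X (A - Y)^T

# element-wise sums from the derivation: dJ/dw_j = (1/m) * sum_i (a^(i) - y^(i)) * x_j^(i)
dW_loop = np.zeros((n, 1))
for j in range(n):
    dW_loop[j, 0] = np.sum((A[0] - Y[0]) * X[j]) / m

print(np.allclose(dW_vec, dW_loop))  # True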

Similarly, $db = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)}-y^{(i)}\right)$, i.e. the mean of the entries of $A-Y$.

def db(A, Y):
    # Gradient of the cost with respect to b: the mean of (A - Y)
    db = np.mean(A - Y)
    return db
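Putting the pieces together, a minimal gradient-descent loop might look like the sketch below. The synthetic data, learning rate, and iteration count are all assumptions for illustration, not part of the original derivation:

import numpy as np

np.random.seed(0)
n, m = 2, 200
X = np.random.randn(n, m)
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)   # linearly separable toy labels

W = np.zeros((n, 1))
b = 0.0
lr = 0.1                                        # learning rate (assumed)

for i in range(1000):
    Z = affine_forward(W, X, b)                 # scalar b broadcasts like B
    A = sigmoid_forward(Z)
    if i % 200 == 0:
        print(i, logistic_loss(A, Y))
    W = W - lr * dW(X, A, Y)                    # gradient descent updates
    b = b - lr * db(A, Y)

print("train accuracy:", np.mean((A > 0.5) == Y))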