CNN Architectures

Posted on 2017-11-07 Edited on 2022-09-05 In Deep Learning

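This post is based on CS231n (Spring 2017) and introduces several common CNN architectures. The networks below were all winning or top-ranked entries in the ImageNet competition (ILSVRC). Each one is described in terms of its structure, parameter count, and amount of computation.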

AlexNet
VGG
GoogLeNet
ResNet

The following networks will be added later:

NiN(Network in Network)
Wide ResNet
ResNeXt
Stochastic Depth
DenseNet
FractalNet
SqueezeNet

The main text follows.

AlexNet

Network structure

The AlexNet input size is 227*227*3. The structure and parameter settings of each layer are as follows:

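Layer Type	#Filters	Stride	Padding	Output Size	Parameters
CONV1	#96 @11*11	4	0	55*55*96	11*11*3*96
MAXPOOL1	3*3	2	0	27*27*96	0
NORM1				27*27*96	0
CONV2	#256 @5*5	1	2	27*27*256	5*5*96*256
MAXPOOL2	3*3	2	0	13*13*256	0
NORM2				13*13*256	0
CONV3	#384 @3*3	1	1	13*13*384	3*3*256*384
CONV4	#384 @3*3	1	1	13*13*384	3*3*384*384
CONV5	#256 @3*3	1	1	13*13*256	3*3*384*256
MAXPOOL3	3*3	2	0	6*6*256	0
FC6				4096	6*6*256*4096
FC7				4096	4096*4096
FC8				1000	4096*1000

The Parameters column counts weights only (filter height * filter width * input depth * number of filters, without biases) and treats every layer as a single ungrouped layer, ignoring the two-GPU split of the original AlexNet. Pooling and normalization layers have no learned weights.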

The spatial size of the output is \(\frac{N - F + 2 \times Padding}{Stride} + 1\), where \(N\) is the input size and \(F\) is the filter size.
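As a quick check of the output-size formula, here is a minimal Python sketch that traces the spatial size through the AlexNet layer settings listed above. The function and variable names are illustrative only and do not come from the course material.

```python
def conv_output_size(n, f, stride, padding):
    """Spatial output size of a conv/pool layer: (N - F + 2P) / S + 1."""
    span = n - f + 2 * padding
    if span % stride != 0:
        raise ValueError("filter does not tile the input evenly")
    return span // stride + 1

# (name, filter size, stride, padding) for the spatial layers of AlexNet
LAYERS = [
    ("CONV1", 11, 4, 0),
    ("MAXPOOL1", 3, 2, 0),
    ("CONV2", 5, 1, 2),
    ("MAXPOOL2", 3, 2, 0),
    ("CONV3", 3, 1, 1),
    ("CONV4", 3, 1, 1),
    ("CONV5", 3, 1, 1),
    ("MAXPOOL3", 3, 2, 0),
]

def trace_sizes(input_size=227):
    """Return (layer name, output size) for each layer, starting from the input."""
    sizes = []
    size = input_size
    for name, f, s, p in LAYERS:
        size = conv_output_size(size, f, s, p)
        sizes.append((name, size))
    return sizes

for name, size in trace_sizes():
    print(f"{name}: {size}x{size}")
```

The normalization layers are left out because they do not change the spatial size.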

AlexNet experiments using the MATLAB deep learning toolbox will be added later...

VGGNet

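VGG took first place in localization and second place in classification at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014.

Network structure

Small filters, deeper networks. VGG extends the 8-layer AlexNet to 16 and 19 layers. Every convolutional layer uses 3*3 filters with stride=1 and pad=1, and every pooling layer is a 2*2 MAXPOOL with stride=2. The figure below compares its structure with AlexNet.

[Figure: VGG]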

In more detail, the parameter count and memory consumption of VGG16 are as follows:

[Figure: VGG-details]
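The parameter count can be reproduced with a short Python sketch. The layer configuration below is the standard VGG16 layout, written from memory rather than read from the figure, and the 224*224*3 input size is an assumption. Biases are left out.

```python
# VGG16 configuration: numbers are conv output channels, "M" is a 2x2 max pool
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def vgg16_weight_count(input_size=224, in_channels=3, num_classes=1000):
    """Count the weights (biases excluded) of VGG16."""
    total = 0
    size, depth = input_size, in_channels
    for item in VGG16_CFG:
        if item == "M":
            size //= 2                      # 2x2 pool with stride 2 halves the resolution
        else:
            total += 3 * 3 * depth * item   # 3x3 conv with pad 1 keeps the resolution
            depth = item
    fc_in = size * size * depth             # 7*7*512 for a 224x224 input
    for fc_out in (4096, 4096, num_classes):
        total += fc_in * fc_out
        fc_in = fc_out
    return total

print(f"VGG16 weights: {vgg16_weight_count():,}")
```

Under these assumptions the total is about 138M weights, and most of them sit in the first fully connected layer (7*7*512*4096, roughly 102.8M).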

Q: Why use smaller CONV filters?
A: A stack of three 3x3 conv layers (stride 1) has the same effective receptive field as one 7x7 conv layer, but the network is deeper, has more non-linearities, and has fewer parameters: \(3\times3^2C^2\) vs. \(7^2C^2\) for C channels per layer.
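The receptive-field and parameter argument can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and C = 256 is an arbitrary channel count.

```python
def stacked_receptive_field(kernel, num_layers):
    """Effective receptive field of stride-1 conv layers stacked on each other."""
    field = 1
    for _ in range(num_layers):
        field += kernel - 1     # each stride-1 layer widens the field by kernel - 1
    return field

def conv_weights(kernel, channels, num_layers=1):
    """Weights of num_layers convs with C input and C output channels (no biases)."""
    return num_layers * kernel * kernel * channels * channels

C = 256  # hypothetical number of channels per layer

print("receptive field of three 3x3 layers:", stacked_receptive_field(3, 3))
print("weights, three 3x3 layers:", conv_weights(3, C, num_layers=3))
print("weights, one 7x7 layer:   ", conv_weights(7, C))
```

The stacked version needs 27C^2 weights against 49C^2 for the single 7x7 layer, while covering the same 7x7 input window.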

GoogLeNet

Paper: PDF
Code: CODE

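Deeper networks, with computational efficiency. GoogLeNet is the winning network of the ILSVRC'14 image classification task. It introduces the Inception module and removes the fully connected layers, which greatly reduces the number of parameters.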

22 layers (with weights)
Efficient “Inception” module
No FC layers
Only 5 million parameters! 12x fewer than AlexNet
ILSVRC’14 classification winner (6.7% top 5 error)

Inception module

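GoogLeNet is built by stacking a carefully designed local network module. This module is called Inception (design a good local network topology, a "network within a network", and then stack these modules on top of each other).

An Inception module applies several CONV filters with different receptive fields (1*1, 3*3, 5*5) and a pooling operation (3*3) in parallel, and concatenates their outputs along the depth dimension. There are two implementations: the original (naive) Inception module and the improved Inception module. The naive module is computationally very expensive. Because the pooling branch preserves the full depth of its input, the concatenated output of the CONV and MAXPOOL branches is necessarily deeper than the input.

[Figure: inception-naive]

How can this be solved? A common approach is dimensionality reduction: 1*1 CONV layers ("bottleneck" layers) are added before the expensive CONV layers (3*3 and 5*5) and after the pooling layer to reduce the depth of the feature maps. A 1*1 CONV keeps the spatial resolution of its input and reduces only the depth, by combining the feature maps across depth and projecting them onto a lower-dimensional depth. This greatly reduces the number of operations.

[Figure: inception-improve]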
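To make the saving concrete, the following Python sketch counts the multiplications in one naive module and in one module with 1*1 bottlenecks. All sizes here (a 28*28*256 input, 128/192/96 filters in the three branches, bottlenecks that reduce to depth 64) are assumed values chosen for illustration, not figures taken from this post.

```python
def conv_ops(height, width, in_depth, out_depth, kernel):
    """Multiplications of a stride-1 convolution that keeps the spatial size."""
    return height * width * out_depth * kernel * kernel * in_depth

H, W, D = 28, 28, 256          # assumed input feature map
N1, N3, N5 = 128, 192, 96      # assumed filter counts of the 1x1, 3x3, 5x5 branches
R = 64                         # assumed depth after each 1x1 bottleneck

def naive_ops():
    return (conv_ops(H, W, D, N1, 1)
            + conv_ops(H, W, D, N3, 3)
            + conv_ops(H, W, D, N5, 5))

def bottleneck_ops():
    return (conv_ops(H, W, D, N1, 1)       # plain 1x1 branch
            + conv_ops(H, W, D, R, 1)      # reduce depth before the 3x3
            + conv_ops(H, W, R, N3, 3)
            + conv_ops(H, W, D, R, 1)      # reduce depth before the 5x5
            + conv_ops(H, W, R, N5, 5)
            + conv_ops(H, W, D, R, 1))     # 1x1 after the 3x3 max pool

naive_depth = N1 + N3 + N5 + D      # pooling branch keeps all D channels
reduced_depth = N1 + N3 + N5 + R    # pooling branch is projected down to R channels

print(f"naive:      {naive_ops():,} multiplications, output depth {naive_depth}")
print(f"bottleneck: {bottleneck_ops():,} multiplications, output depth {reduced_depth}")
```

With these assumed sizes the bottleneck version needs roughly a third of the multiplications, and the output depth no longer exceeds twice the input depth.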

The full GoogLeNet architecture is as follows:

ResNet

Very deep networks built from residual connections. Kaiming He gave a fairly detailed tutorial at ICML 2016: ICML 2016 Tutorial on Deep Residual Networks. The code is available here: deep-residual-networks

Overview

152-layer model for ImageNet
ILSVRC'15 classification winner (3.57% top 5 error)
Swept all classification and detection competitions in ILSVRC'15 and COCO'15!

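Problems caused by increasing depth

[Figure: deeper-nets-problems]

The figure above shows that when more layers are added to a plain network, both the training error and the test error get worse. This does not match the usual expectation. One might reason that more layers mean more parameters, so the deeper model should overfit. Overfitting, however, shows up as a lower training error together with a higher test error, which is not what is observed here. Kaiming He's explanation: the problem is an optimization problem, deeper models are harder to optimize. Moreover, a deeper network should be at least as good as a shallower one, because a deeper network can be constructed by copying the shallow network and adding identity mappings. This solution by construction shows that the deeper network can match the performance of the shallower one.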
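The construction argument can be illustrated with a toy Python example. The "layers" here are hypothetical scalar functions rather than real convolutions. The point is only that appending identity layers leaves the output unchanged.

```python
def relu_scale(factor):
    """A toy 'layer': scale the inputs, then apply ReLU."""
    def layer(xs):
        return [max(0.0, factor * x) for x in xs]
    return layer

def identity(xs):
    """Identity mapping: returns its input unchanged."""
    return list(xs)

def forward(layers, xs):
    """Apply the layers one after another."""
    for layer in layers:
        xs = layer(xs)
    return xs

shallow = [relu_scale(0.5), relu_scale(2.0)]     # hypothetical 2-layer network
deeper = shallow + [identity] * 3                # copied layers plus 3 identity layers

x = [1.0, -2.0, 3.0]
print(forward(shallow, x) == forward(deeper, x))
```

So a solution that is no worse than the shallow network always exists for the deeper one. The difficulty is that plain optimization does not find it easily.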

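Solution

[Figure: resnet-layer]

Use network layers to fit a residual mapping instead of directly trying to fit a desired underlying mapping. The authors hypothesize that optimizing the residual mapping, which is referenced to the input x, is easier than optimizing the original, unreferenced mapping. The layers are therefore trained to fit the residual \(F(x) = H(x) - x\) instead of fitting \(H(x)\) directly, and the block outputs \(F(x) + x\).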
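A minimal sketch of such a block in plain Python follows. To keep it short, small fully connected layers stand in for the 3x3 conv layers of the real block, and all names are illustrative. It shows that when the weights of F are zero, the block reduces to the identity mapping for non-negative inputs.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(weights, xs):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, xs)) for row in weights]

def residual_block(w1, w2, xs):
    """Compute relu(F(x) + x), where F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, xs)))
    return relu([f + x for f, x in zip(fx, xs)])

zeros = [[0.0, 0.0], [0.0, 0.0]]      # F(x) = 0 for every input
x = [1.5, 0.25]                       # non-negative, e.g. the output of a previous ReLU

print(residual_block(zeros, zeros, x))
```

Pushing the residual toward zero is therefore enough to recover an identity layer, which is the intuition behind the hypothesis that the residual is easier to optimize.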

The full ResNet architecture

Stack residual blocks
Every residual block has two 3x3 conv layers
Periodically, double # of filters and downsample spatially using stride 2 (/2 in each dimension)
Additional conv layer at the beginning
No FC layers at the end (only FC 1000 to output classes)

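For the ImageNet competition, ResNet was configured with depths of 34, 50, 101, and 152 layers. The deeper networks use a “bottleneck” layer (similar to the 1*1 convolutions in GoogLeNet) to improve efficiency.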
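The effect of the bottleneck can be estimated by counting weights. The sketch below compares a block of two 3x3 convs with a 1x1, 3x3, 1x1 bottleneck block. The depths of 256 and 64 are assumed for illustration, and biases are ignored.

```python
def conv_weights(kernel, in_depth, out_depth):
    """Weights of one conv layer (no biases)."""
    return kernel * kernel * in_depth * out_depth

def basic_block_weights(depth):
    """Two 3x3 convs that keep the depth."""
    return 2 * conv_weights(3, depth, depth)

def bottleneck_block_weights(depth, reduced):
    """1x1 conv to reduce depth, 3x3 conv at the reduced depth, 1x1 conv to restore it."""
    return (conv_weights(1, depth, reduced)
            + conv_weights(3, reduced, reduced)
            + conv_weights(1, reduced, depth))

print("two 3x3 convs at depth 256:", basic_block_weights(256))
print("bottleneck 256 -> 64 -> 256:", bottleneck_block_weights(256, 64))
```

At the same input and output depth, the bottleneck block in this example uses about 17 times fewer weights, which is what makes the very deep variants affordable.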

Summary

The paper An Analysis of Deep Neural Network Models for Practical Applications compares the size, amount of computation, power consumption, and accuracy of a number of neural networks published up to 2016.

[Figure: complexity-compare]

The figure above supports the following conclusions:

- GoogLeNet: most efficient
- VGG: Highest memory, most operations
- AlexNet: Smaller compute, still memory heavy, lower accuracy
- ResNet: Moderate efficiency depending on model, highest accuracy

Other network variants

To be added later.

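Open questions

Why does ResNet allow networks to be deeper, and what is the right way to understand residual networks? What inspired He to invent this structure?
More questions will be added...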