ResNet significantly changed the view of how to parametrize functions in deep networks. DenseNet (dense convolutional network) is to some extent the logical extension of this (Huang et al., 2017). DenseNet is characterized by both the connectivity pattern where each layer connects to all the preceding layers and the concatenation operation (rather than the addition operator in ResNet) to preserve and reuse features from earlier layers. To understand how to arrive at it, let's take a small detour into mathematics.
import torch
from torch import nn
from d2l import torch as d2l
from mxnet import init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()
import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l
import tensorflow as tf
from d2l import tensorflow as d2l
8.7.1. From ResNet to DenseNet
Recall the Taylor expansion for functions. Around the point x = 0 it can be written as
(8.7.1)  f(x) = f(0) + x·[f′(0) + x·[f″(0)/2! + x·[f‴(0)/3! + …]]].
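This nesting is just a reorganization of the familiar power series. As a quick sanity check (a minimal sketch of ours, not part of the original text), we can evaluate the nested form numerically for f(x) = exp(x), where every derivative at 0 equals 1, and compare it with math.exp:

import math

def nested_taylor(x, num_terms=10):
    # Evaluate (8.7.1) from the innermost bracket outwards; for exp, the
    # k-th derivative at 0 is 1, so the coefficient at depth k is 1 / k!.
    acc = 0.0
    for k in reversed(range(1, num_terms)):
        acc = 1.0 / math.factorial(k) + x * acc
    return 1.0 + x * acc  # f(0) + x * [...]

print(nested_taylor(0.5), math.exp(0.5))  # both are approximately 1.6487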
The key point is that it decomposes a function into terms of increasingly higher order. In a similar vein, ResNet decomposes functions into
(8.7.2)  f(x) = x + g(x).
That is, ResNet decomposes f into a simple linear term and a more complex nonlinear term. What if we wanted to capture (not necessarily add) information beyond two terms? One such solution is DenseNet (Huang et al., 2017).
Fig. 8.7.1  The main difference between ResNet (left) and DenseNet (right) in cross-layer connections: use of addition and use of concatenation.
As shown in Fig. 8.7.1, the key difference between ResNet and DenseNet is that in the latter case outputs are concatenated (denoted by [,]) rather than added. As a result, we perform a mapping from x to its values after applying an increasingly complex sequence of functions:
(8.7.3)  x → [x, f1(x), f2([x, f1(x)]), f3([x, f1(x), f2([x, f1(x)])]), …].
In the end, all these functions are combined in an MLP to reduce the number of features again. In terms of implementation this is quite simple: rather than adding terms, we concatenate them. The name DenseNet arises from the fact that the dependency graph between variables becomes quite dense. The final layer of such a chain is densely connected to all previous layers. The dense connections are shown in Fig. 8.7.2.
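To make the distinction concrete, here is a small sketch (ours, not part of the original text) that contrasts the two kinds of cross-layer connections purely in terms of tensor shapes; the convolutions below simply stand in for f:

import torch
from torch import nn

X = torch.randn(4, 3, 8, 8)  # batch of 4, 3 channels, 8x8 feature maps

# ResNet-style: addition requires f(X) to keep the same channel count,
# so the output shape matches the input shape.
f_same = nn.LazyConv2d(3, kernel_size=3, padding=1)
print((X + f_same(X)).shape)                   # torch.Size([4, 3, 8, 8])

# DenseNet-style: concatenation along the channel dimension keeps X intact
# and appends the new features, so the channel count grows.
f_new = nn.LazyConv2d(10, kernel_size=3, padding=1)
print(torch.cat((X, f_new(X)), dim=1).shape)   # torch.Size([4, 13, 8, 8])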
Fig. 8.7.2  Dense connections in DenseNet. Note how the dimensionality increases with depth.
The main components that comprise a DenseNet are dense blocks and transition layers. The former define how the inputs and outputs are concatenated, while the latter control the number of channels so that it does not grow too large, since the expansion x → [x, f1(x), f2([x, f1(x)]), …] can be quite high-dimensional.
8.7.2. Dense Blocks
DenseNet uses the modified "batch normalization, activation, and convolution" structure of ResNet (see the exercise in Section 8.6). First, we implement this convolution block structure.
def conv_block(num_channels):
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=3, padding=1))
def conv_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(),
            nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))
    return blk
class ConvBlock(nn.Module):
    num_channels: int
    training: bool = True

    @nn.compact
    def __call__(self, X):
        Y = nn.relu(nn.BatchNorm(not self.training)(X))
        Y = nn.Conv(self.num_channels, kernel_size=(3, 3), padding=(1, 1))(Y)
        Y = jnp.concatenate((X, Y), axis=-1)
        return Y
class ConvBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels):
        super(ConvBlock, self).__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(
            filters=num_channels, kernel_size=(3, 3), padding='same')
        self.listLayers = [self.bn, self.relu, self.conv]

    def call(self, x):
        y = x
        for layer in self.listLayers:
            y = layer(y)
        y = tf.keras.layers.concatenate([x, y], axis=-1)
        return y
A dense block consists of multiple convolution blocks, each using the same number of output channels. In the forward propagation, however, we concatenate the input and output of each convolution block on the channel dimension. Lazy evaluation allows us to adjust the dimensionality automatically.
class DenseBlock(nn.Module):
    def __init__(self, num_convs, num_channels):
        super(DenseBlock, self).__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block(num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output of each block along the channels
            X = torch.cat((X, Y), dim=1)
        return X
class DenseBlock(nn.Block):
    def __init__(self, num_convs, num_channels):
        super().__init__()
        self.net = nn.Sequential()
        for _ in range(num_convs):
            self.net.add(conv_block(num_channels))

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output of each block along the channels
            X = np.concatenate((X, Y), axis=1)
        return X
class DenseBlock(nn.Module):
    num_convs: int
    num_channels: int
    training: bool = True

    def setup(self):
        layer = []
        for i in range(self.num_convs):
            layer.append(ConvBlock(self.num_channels, self.training))
        self.net = nn.Sequential(layer)

    def __call__(self, X):
        return self.net(X)
class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_convs, num_channels):
        super(DenseBlock, self).__init__()
        self.listLayers = []
        for _ in range(num_convs):
            self.listLayers.append(ConvBlock(num_channels))

    def call(self, x):
        for layer in self.listLayers:
            x = layer(x)
        return x
In the following example, we define a DenseBlock instance with 2 convolution blocks of 10 output channels. When using an input with 3 channels, we will get an output with 3 + 10 + 10 = 23 channels. The number of convolution block channels controls the growth in the number of output channels relative to the number of input channels. This is also referred to as the growth rate.
blk = DenseBlock(2, 10)
X = torch.randn(4, 3, 8, 8)
Y = blk(X)
Y.shape
torch.Size([4, 23, 8, 8])
blk = DenseBlock(2, 10)
X = np.random.uniform(size=(4, 3, 8, 8))
blk.initialize()
Y = blk(X)
Y.shape
(4, 23, 8, 8)
blk = DenseBlock(2, 10)
X = jnp.zeros((4, 8, 8, 3))
Y = blk.init_with_output(d2l.get_key(), X)[0]
Y.shape
(4, 8, 8, 23)
blk = DenseBlock(2, 10)
X = tf.random.uniform((4, 8, 8, 3))
Y = blk(X)
Y.shape
TensorShape([4, 8, 8, 23])
8.7.3. Transition Layers
Since each dense block increases the number of channels, adding too many of them will lead to an excessively complex model. A transition layer is used to control the complexity of the model. It reduces the number of channels by using a 1×1 convolution. Moreover, it halves the height and width via average pooling with a stride of 2.
def transition_block(num_channels):
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=1),
        nn.AvgPool2d(kernel_size=2, stride=2))
def transition_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=1),
            nn.AvgPool2D(pool_size=2, strides=2))
    return blk
class TransitionBlock(nn.Module):
    num_channels: int
    training: bool = True

    @nn.compact
    def __call__(self, X):
        X = nn.BatchNorm(not self.training)(X)
        X = nn.relu(X)
        X = nn.Conv(self.num_channels, kernel_size=(1, 1))(X)
        X = nn.avg_pool(X, window_shape=(2, 2), strides=(2, 2))
        return X
class TransitionBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels, **kwargs):
        super(TransitionBlock, self).__init__(**kwargs)
        self.batch_norm = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(num_channels, kernel_size=1)
        self.avg_pool = tf.keras.layers.AvgPool2D(pool_size=2, strides=2)

    def call(self, x):
        x = self.batch_norm(x)
        x = self.relu(x)
        x = self.conv(x)
        return self.avg_pool(x)
Apply a transition layer with 10 channels to the output of the dense block in the previous example. This reduces the number of output channels to 10, and halves the height and width.
blk = transition_block(10)
blk(Y).shape
torch.Size([4, 10, 4, 4])
blk = transition_block(10)
blk.initialize()
blk(Y).shape
(4, 10, 4, 4)
blk = TransitionBlock(10)
blk.init_with_output(d2l.get_key(), Y)[0].shape
(4, 4, 4, 10)
blk = TransitionBlock(10)
blk(Y).shape
TensorShape([4, 4, 4, 10])
8.7.4. DenseNet Model
Next, we will construct a DenseNet model. DenseNet first uses the same single convolutional layer and max-pooling layer as in ResNet.
class DenseNet(d2l.Classifier):
    def b1(self):
        return nn.Sequential(
            nn.LazyConv2d(64, kernel_size=7, stride=2, padding=3),
            nn.LazyBatchNorm2d(), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
class DenseNet(d2l.Classifier):
    def b1(self):
        net = nn.Sequential()
        net.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
                nn.BatchNorm(), nn.Activation('relu'),
                nn.MaxPool2D(pool_size=3, strides=2, padding=1))
        return net
class DenseNet(d2l.Classifier):
    num_channels: int = 64
    growth_rate: int = 32
    arch: tuple = (4, 4, 4, 4)
    lr: float = 0.1
    num_classes: int = 10
    training: bool = True

    def setup(self):
        self.net = self.create_net()

    def b1(self):
        return nn.Sequential([
            nn.Conv(64, kernel_size=(7, 7), strides=(2, 2),
                    padding='same'),
            nn.BatchNorm(not self.training),
            nn.relu,
            lambda x: nn.max_pool(x, window_shape=(3, 3),
                                  strides=(2, 2), padding='same')
        ])
class DenseNet(d2l.Classifier):
    def b1(self):
        return tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(
                64, kernel_size=7, strides=2, padding='same'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(
                pool_size=3, strides=2, padding='same')])
Then, similar to the four modules made up of residual blocks that ResNet uses, DenseNet uses four dense blocks. As with ResNet, we can set the number of convolutional layers used in each dense block. Here, we set it to 4, consistent with the ResNet-18 model in Section 8.6. Furthermore, we set the number of channels (i.e., the growth rate) for the convolutional layers in the dense blocks to 32, so 128 channels will be added to each dense block.
In ResNet, the height and width are reduced between modules by a residual block with a stride of 2. Here, we use the transition layer to halve the height and width and halve the number of channels. Similar to ResNet, a global pooling layer and a fully connected layer are connected at the end to produce the output.
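Before looking at the implementations, it may help to trace this channel bookkeeping by hand. The short sketch below (our illustration, mirroring the hyperparameters used in the code that follows) prints the channel count after every dense block and transition layer:

# Stem with 64 channels; each dense block has 4 convolutions with growth
# rate 32; every transition layer halves the channel count.
num_channels, growth_rate, arch = 64, 32, (4, 4, 4, 4)
for i, num_convs in enumerate(arch):
    num_channels += num_convs * growth_rate       # after the dense block
    print(f'dense block {i + 1}: {num_channels} channels')
    if i != len(arch) - 1:
        num_channels //= 2                        # after the transition layer
        print(f'transition layer {i + 1}: {num_channels} channels')
# Prints 192, 96, 224, 112, 240, 120, 248: the final dense block emits 248
# channels, which the global pooling and fully connected layer then consume.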
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = nn.Sequential(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add_module(f'dense_blk{i+1}', DenseBlock(num_convs,
                                                          growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add_module(f'tran_blk{i+1}', transition_block(
                num_channels))
    self.net.add_module('last', nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
        nn.LazyLinear(num_classes)))
    self.net.apply(d2l.init_cnn)
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = nn.Sequential()
    self.net.add(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add(DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add(transition_block(num_channels))
    self.net.add(nn.BatchNorm(), nn.Activation('relu'),
                 nn.GlobalAvgPool2D(), nn.Dense(num_classes))
    self.net.initialize(init.Xavier())
@d2l.add_to_class(DenseNet)
def create_net(self):
    net = self.b1()
    for i, num_convs in enumerate(self.arch):
        net.layers.extend([DenseBlock(num_convs, self.growth_rate,
                                      training=self.training)])
        # The number of output channels in the previous dense block
        num_channels = self.num_channels + (num_convs * self.growth_rate)
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(self.arch) - 1:
            num_channels //= 2
            net.layers.extend([TransitionBlock(num_channels,
                                               training=self.training)])
    net.layers.extend([
        nn.BatchNorm(not self.training),
        nn.relu,
        lambda x: nn.avg_pool(x, window_shape=x.shape[1:3],
                              strides=x.shape[1:3], padding='valid'),
        lambda x: x.reshape((x.shape[0], -1)),
        nn.Dense(self.num_classes)
    ])
    return net
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = tf.keras.models.Sequential(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add(DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add(TransitionBlock(num_channels))
    self.net.add(tf.keras.models.Sequential([
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.GlobalAvgPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes)]))
8.7.5. Training
Since we are using a deeper network here, in this section we will reduce the input height and width from 224 to 96 to simplify the computation.
model = DenseNet(lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)
model = DenseNet(lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)
model = DenseNet(lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)
trainer = d2l.Trainer(max_epochs=10)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
with d2l.try_gpu():
    model = DenseNet(lr=0.01)
    trainer.fit(model, data)
8.7.6. Summary and Discussion
The main components that comprise DenseNet are dense blocks and transition layers. For the latter, we need to keep the dimensionality under control when composing the network by adding transition layers that shrink the number of channels again. In terms of cross-layer connections, in contrast to ResNet, where inputs and outputs are added together, DenseNet concatenates inputs and outputs on the channel dimension. Although these concatenation operations reuse features to achieve computational efficiency, unfortunately they lead to heavy GPU memory consumption. As a result, applying DenseNet may require more memory-efficient implementations that may increase training time (Pleiss et al., 2017).
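One rough way to see this memory cost in practice is to record the peak allocation during a forward and backward pass. The sketch below is only a starting point, not part of the original text; it assumes the PyTorch DenseNet defined above and an available CUDA device, and the exact numbers will depend on the framework version and the convolution algorithms chosen:

import torch

model = DenseNet(lr=0.01).net.cuda()            # the Sequential defined above
X = torch.randn(128, 1, 96, 96, device='cuda')  # one Fashion-MNIST-sized batch
torch.cuda.reset_peak_memory_stats()
model(X).sum().backward()
print(f'peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB')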
8.7.7. Exercises
1. Why do we use average pooling rather than max-pooling in the transition layer?
2. One of the advantages mentioned in the DenseNet paper is that its model parameters are smaller than those of ResNet. Why is this the case?
3. One problem for which DenseNet has been criticized is its high memory consumption.
   - Is this really the case? Try to change the input shape to 224×224 to compare the actual GPU memory consumption empirically.
   - Can you think of an alternative means of reducing the memory consumption? How would you need to change the framework?
4. Implement the various DenseNet versions presented in Table 1 of the DenseNet paper (Huang et al., 2017).
5. Design an MLP-based model by applying the DenseNet idea. Apply it to the house price prediction task in Section 5.7.