PyTorch Tutorial 12.11: Learning Rate Scheduling - 电子发烧友网

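So far we have focused mainly on optimization algorithms, that is, on how to update the weight vectors, rather than on the rate at which they are updated. Nonetheless, adjusting the learning rate is often just as important as the algorithm itself. There are several aspects to consider: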

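Most obviously, the magnitude of the learning rate matters. If it is too large, optimization diverges; if it is too small, training takes too long or we end up with a suboptimal result. We saw previously that the condition number of the problem matters (see, for example, Section 12.6 for details). Intuitively, it is the ratio of the amount of change in the least sensitive direction to that in the most sensitive one.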
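Second, the rate of decay is just as important. If the learning rate remains large, we may simply end up bouncing around the minimum and never reach optimality. Section 12.5 discussed this in some detail, and we analyzed performance guarantees in Section 12.4. In short, we want the rate to decay, but probably more slowly than $\mathcal{O}(t^{-1/2})$, which would be a good choice for convex problems.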
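Another equally important aspect is initialization. This concerns both how the parameters are set initially (see Section 5.4 for details) and how they evolve at the start of training. The latter goes by the name of warmup, that is, how rapidly we start moving towards the solution initially. Large steps at the beginning might not be beneficial, particularly because the initial set of parameters is random. The initial update directions might be quite meaningless, too.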
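Lastly, there are a number of optimization variants that perform cyclical learning rate adjustment. This is beyond the scope of the current chapter. We recommend that the reader consult Izmailov et al. (2018) for details, for example on how to obtain better solutions by averaging over an entire path of parameters.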
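The first three points can be made concrete with a small, framework-free sketch. It runs plain gradient descent on the one-dimensional quadratic f(x) = x^2 (an illustrative choice that does not come from the original text) with a learning rate that is too large, too small, and moderate. It then defines a hypothetical `warmup_inv_sqrt` schedule that ramps up linearly before decaying like t^{-1/2}. All function names and constants here are assumptions chosen for illustration.

```python
import math


def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 (gradient 2*x) with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x


def warmup_inv_sqrt(t, base_lr=0.1, warmup_steps=5):
    """Hypothetical schedule: linear warmup, then decay proportional to t**-0.5."""
    if t < warmup_steps:
        return base_lr * (t + 1) / warmup_steps
    return base_lr * math.sqrt(warmup_steps / (t + 1))


# Too large: each step multiplies x by (1 - 2 * 1.1) = -1.2, so |x| grows.
print(abs(gradient_descent(lr=1.1)))
# Too small: x shrinks by only 0.2% per step and stays close to x0.
print(gradient_descent(lr=0.001))
# Moderate: x shrinks by a factor of 0.8 per step.
print(gradient_descent(lr=0.1))
# The schedule rises during warmup and decays afterwards.
print([round(warmup_inv_sqrt(t), 4) for t in range(10)])
```

On a real network the loss surface is not a simple quadratic, so the thresholds differ, but the qualitative behavior (divergence, stalling, steady progress) is the same.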

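Given how much detail is needed to manage learning rates, most deep learning frameworks provide tools to handle this automatically. In this chapter we review the effects that different schedules have on accuracy and show how they can be managed efficiently via a learning rate scheduler.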
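As a minimal sketch of such a tool, the snippet below attaches PyTorch's built-in `torch.optim.lr_scheduler.MultiStepLR` to an SGD optimizer and records the learning rate over a few epochs. The single dummy parameter, the milestones, and the decay factor are placeholder assumptions; no actual training happens here.

```python
import torch
from torch.optim import lr_scheduler

# A single dummy parameter stands in for a real model's weights.
param = torch.nn.Parameter(torch.zeros(1))
trainer = torch.optim.SGD([param], lr=0.5)

# Halve the learning rate after epochs 2 and 4 (illustrative values).
scheduler = lr_scheduler.MultiStepLR(trainer, milestones=[2, 4], gamma=0.5)

lrs = []
for epoch in range(6):
    lrs.append(trainer.param_groups[0]['lr'])
    trainer.step()     # would normally follow loss.backward()
    scheduler.step()   # advance the schedule once per epoch

print(lrs)
```

Calling `trainer.step()` before `scheduler.step()` mirrors the order used in a real training loop, where the schedule advances only after the epoch's updates have been applied.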

12.11.1. Toy Problem

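We begin with a toy problem that is cheap enough to compute easily, yet sufficiently nontrivial to illustrate some of the key aspects. For this we pick a slightly modernized version of LeNet (relu instead of sigmoid activation, MaxPooling rather than AveragePooling), applied to Fashion-MNIST. Moreover, we hybridize the network for performance. Since most of the code is standard, we introduce only the basics without further detailed discussion. See Section 7 for a refresher as needed.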
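The first fully connected layer of this network expects 16 * 5 * 5 input features. One way to see where that number comes from is to trace the spatial size of a 28 x 28 Fashion-MNIST image through the convolution and pooling layers with the standard output-size formula floor((n + 2p - k) / s) + 1. The helper `out_size` is a hypothetical name introduced only for this sketch.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


n = 28                                   # Fashion-MNIST images are 28 x 28
n = out_size(n, kernel=5, padding=2)     # Conv2d(1, 6, 5, padding=2)
n = out_size(n, kernel=2, stride=2)      # MaxPool2d(2, 2)
n = out_size(n, kernel=5)                # Conv2d(6, 16, 5)
n = out_size(n, kernel=2, stride=2)      # MaxPool2d(2, 2)

flat_features = 16 * n * n               # 16 output channels
print(n, flat_features)
```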

%matplotlib inline
import math
import torch
from torch import nn
from torch.optim import lr_scheduler
from d2l import torch as d2l


def net_fn():
    # Modernized LeNet: ReLU activations and max pooling
    model = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
        nn.Linear(120, 84), nn.ReLU(),
        nn.Linear(84, 10))

    return model

loss = nn.CrossEntropyLoss()
device = d2l.try_gpu()

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

# The code is almost identical to `d2l.train_ch6` defined in the
# LeNet section of the convolutional neural networks chapter
def train(net, train_iter, test_iter, num_epochs, loss, trainer, device,
          scheduler=None):
    net.to(device)
    animator = d2l.Animator(xlabel='epoch', xlim=[0, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])

    for epoch in range(num_epochs):
        metric = d2l.Accumulator(3)  # train_loss, train_acc, num_examples
        for i, (X, y) in enumerate(train_iter):
            net.train()
            trainer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            trainer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], d2l.accuracy(y_hat, y), X.shape[0])
            train_loss = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i + 1) % 50 == 0:
                animator.add(epoch + i / len(train_iter),
                             (train_loss, train_acc, None))

        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))

        # Update the learning rate once per epoch
        if scheduler:
            if scheduler.__module__ == lr_scheduler.__name__:
                # Using a PyTorch built-in scheduler
                scheduler.step()
            else:
                # Using a custom scheduler: a callable mapping epoch -> lr
                for param_group in trainer.param_groups:
                    param_group['lr'] = scheduler(epoch)

    print(f'train loss {train_loss:.3f}, train acc {train_acc:.3f}, '
          f'test acc {test_acc:.3f}')
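The `else` branch of `train` treats any scheduler that does not come from `torch.optim.lr_scheduler` as a plain callable that maps the epoch index to a learning rate. A minimal sketch of such a callable follows. The class name `SquareRootScheduler` and its default base rate are illustrative assumptions, and the decay follows the t^{-1/2} rate mentioned in the introduction.

```python
class SquareRootScheduler:
    """Callable schedule: lr(epoch) = base_lr * (epoch + 1) ** -0.5."""

    def __init__(self, lr=0.1):
        self.lr = lr

    def __call__(self, epoch):
        return self.lr * (epoch + 1.0) ** -0.5


scheduler = SquareRootScheduler(lr=0.1)
# The class is defined outside torch.optim.lr_scheduler, so `train` would
# take the custom branch and assign scheduler(epoch) to each parameter group.
print([round(scheduler(epoch), 4) for epoch in range(5)])
```

Such an object could be passed as the last argument of `train`, together with a trainer such as `torch.optim.SGD(net.parameters(), lr=0.1)`. Because `train` applies the schedule at the end of each epoch, the first epoch runs with the trainer's initial learning rate.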

							 

# MXNet version of the same setup
%matplotlib inline
from mxnet import autograd, gluon, init, lr_scheduler, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()

# Modernized LeNet: ReLU activations and max pooling
net = nn.HybridSequential()
net.add(nn.Conv2D(channels=6, kernel_size=5, padding=2, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(channels=16, kernel_size=5, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Dense(120, activation='relu'),
        nn.Dense(84, activation='relu'),
        nn.Dense(10))
net.hybridize()
loss = gluon.loss.SoftmaxCrossEntropyLoss()
device = d2l.try_gpu()

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

# The code is almost identical to `d2l.train_ch6` defined in the
# LeNet section of the convolutional neural networks chapter
def train(net, train_iter, test_iter, num_epochs, loss, trainer, device):
    net.initialize(force_reinit=True, ctx=device, init=init.Xavier())
    animator = d2l.Animator(xlabel='epoch', xlim=[0, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        metric = d2l.Accumulator(3)  # train_loss, train_acc, num_examples
        for i, (X, y) in enumerate(train_iter):
            X, y = X.as_in_ctx(device), y.as_in_ctx(device)
            with autograd.record():
                y_hat = net(X)
                l = loss(y_hat, y)
            l.backward()
            trainer.step(X.shape[0])
            metric.add(l.sum(), d2l.accuracy(y_hat, y), X.shape[0])
            train_loss = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i + 1) % 50 == 0:
                animator.add(epoch + i / len(train_iter),
                             (train_loss, train_acc, None))
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
    print(f'train loss {train_loss:.3f}, train acc {train_acc:.3f}, '
          f'test acc {test_acc:.3f}')

							 

%matplotlib inline
import math
import tensorflow as tf
from tensorflow.keras.callbacks import LearningRateScheduler
from d2l import tensorflow as d2l


def net():
    # ReLU activations and max pooling, matching the description of the
    # modernized LeNet and the other implementations
    return tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(filters=6, kernel_size=5, activation='relu',
                               padding='same'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Conv2D(filters=16, kernel_size=5,
                               activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(120, activation='relu'),
        tf.keras.layers.Dense(84, activation='relu'),
        tf.keras.layers.Dense(10)])


batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

# The code is almost identical to `d2l.train_ch6` defined in the
# lenet section of chapter convolutional neural networks
def train(net_fn, train_iter, test_iter, num_epochs, lr,
       device=d2l.try_gpu(), custom_callback = False):
  device_name = 
						
