因为 softmax 回归是如此基础,我们相信您应该知道如何自己实现它。在这里,我们限制自己定义模型的 softmax 特定方面,并重用线性回归部分的其他组件,包括训练循环。
import torch
from d2l import torch as d2l
from mxnet import autograd, gluon, np, npx
from d2l import mxnet as d2l
npx.set_np()
from functools import partial
import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
import tensorflow as tf
from d2l import tensorflow as d2l
4.4.1. Softmax
让我们从最重要的部分开始:从标量到概率的映射。作为复习,请回忆一下在张量中沿特定维度的求和运算符,如第 2.3.6 节和 第 2.3.7 节中所讨论的。给定一个矩阵,X
我们可以对所有元素(默认情况下)或仅对同一轴上的元素求和。该axis
变量让我们计算行和列的总和:
X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
X.sum(0, keepdims=True), X.sum(1, keepdims=True)
(tensor([[5., 7., 9.]]),
tensor([[ 6.],
[15.]]))
X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
X.sum(0, keepdims=True), X.sum(1, keepdims=True)
(array([[5., 7., 9.]]),
array([[ 6.],
[15.]]))
X = jnp.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
X.sum(0, keepdims=True), X.sum(1, keepdims=True)
(Array([[5., 7., 9.]], dtype=float32),
Array([[ 6.],
[15.]], dtype=float32))
X = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
tf.reduce_sum(X, 0, keepdims=True), tf.reduce_sum(X, 1, keepdims=True)
(<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[5., 7., 9.]], dtype=float32)>,
<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[ 6.],
[15.]], dtype=float32)>)
计算 softmax 需要三个步骤:(i)每一项取幂;(ii) 对每一行求和以计算每个示例的归一化常数;(iii) 将每一行除以其归一化常数,确保结果之和为 1。
(4.4.1)softmax(X)ij=exp(Xij)∑kexp(Xik).
分母的(对数)称为(对数)配分函数。它是在统计物理学中引入的 ,用于对热力学系综中的所有可能状态求和。实现很简单:
def softmax(X):
X_exp = torch.exp(X)
partition = X_exp.sum(1, keepdims=True)
return X_exp / partition # The broadcasting mechanism is applied here
def softmax(X):
X_exp = np.exp(X)
partition = X_exp.sum(1, keepdims=True)
return X_exp / partition # The broadcasting mechanism is applied here
def softmax(X):
X_exp = jnp.exp(X)
partition = X_exp.sum(1, keepdims=True)
return X_exp / partition # The broadcasting mechanism is applied here
def softmax(X):
X_exp = tf.exp(X)
partition = tf.reduce_sum(X_exp, 1, keepdims=True)
return X_exp / partition # The broadcasting mechanism is applied here
对于任何输入X
,我们将每个元素变成一个非负数。每行总和为 1,这是概率所要求的。注意:上面的代码对于非常大或非常小的参数并不稳健。虽然这足以说明正在发生的事情,但您不应 将此代码逐字用于任何严肃的目的。深度学习框架内置了这样的保护,我们将在未来使用内置的 softmax。
X = torch.rand((2, 5))
X_prob = softmax(X)
X_prob, X_prob.sum(1)
(tensor([[0.1560, 0.2128, 0.2260, 0.2372, 0.1680],
[0.1504, 0.2473, 0.1132, 0.2779, 0.2112]]),
tensor([1.0000, 1.0000]))
X = np.random.rand(2, 5)
X_prob = softmax(X)
X_prob, X_prob.sum(1)
(array([[0.17777154, 0.1857739 , 0.20995119, 0.23887765, 0.18762572],
[0.24042214, 0.1757977 , 0.23786479, 0.15572716, 0.19018826]]),
array([1., 1.]))
X = jax.random.uniform(jax.random.PRNGKey(d2l.get_seed()), (2, 5))
X_prob = softmax(X)
X_prob, X_prob.sum(1)
(Array([[0.17380024, 0.13607854, 0.29826194, 0.18967763, 0.20218161],
[0.24212085, 0.19360834, 0.21299706, 0.17635451, 0.17491929]], dtype=float32),
Array([1., 1.], dtype=float32))
X = tf.random.uniform((2, 5))
X_prob = softmax(X)
X_prob, tf.reduce_sum(X_prob, 1)
(<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[0.20415688, 0.19163935, 0.25970557, 0.17480859, 0.16968955],
[0.27490872, 0.21236995, 0.12360045, 0.12381317, 0.2653077 ]],
dtype=float32)>,
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 1.], dtype=float32)>)