As with word similarity and analogy tasks, we can also apply pretrained word vectors to sentiment analysis. Since the IMDb review dataset from Section 16.1 is not very big, using text representations pretrained on large-scale corpora may reduce overfitting of the model. As a concrete example illustrated in Fig. 16.2.1, we will represent each token using the pretrained GloVe model, and feed these token representations into a multilayer bidirectional RNN to obtain the text sequence representation, which will be transformed into sentiment analysis outputs (Maas et al., 2011). For the same downstream application, we will consider a different architectural choice later.
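The code below assumes the imports and the IMDb data iterators from Section 16.1. A minimal setup sketch using the d2l.load_data_imdb helper (the batch size of 64 is an assumption):

import torch
from torch import nn
from d2l import torch as d2l

batch_size = 64
train_iter, test_iter, vocab = d2l.load_data_imdb(batch_size)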
16.2.1. Representing Single Text with RNNs
In text classification tasks, such as sentiment analysis, a varying-length text sequence will be transformed into fixed-length categories. In the following BiRNN class, while each token of a text sequence gets its individual pretrained GloVe representation via the embedding layer (self.embedding), the entire sequence is encoded by a bidirectional RNN (self.encoder). More concretely, the hidden states (at the last layer) of the bidirectional LSTM at the initial and final time steps are concatenated as the representation of the text sequence. This single text representation is then transformed into output categories by a fully connected layer (self.decoder) with two outputs ("positive" and "negative"). Both the PyTorch and MXNet Gluon implementations are shown below.
class BiRNN(nn.Module):
    def __init__(self, vocab_size, embed_size, num_hiddens,
                 num_layers, **kwargs):
        super(BiRNN, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Set `bidirectional` to True to get a bidirectional RNN
        self.encoder = nn.LSTM(embed_size, num_hiddens, num_layers=num_layers,
                               bidirectional=True)
        self.decoder = nn.Linear(4 * num_hiddens, 2)

    def forward(self, inputs):
        # The shape of `inputs` is (batch size, no. of time steps). Because
        # LSTM requires its input's first dimension to be the temporal
        # dimension, the input is transposed before obtaining token
        # representations. The output shape is (no. of time steps, batch size,
        # word vector dimension)
        embeddings = self.embedding(inputs.T)
        self.encoder.flatten_parameters()
        # Returns hidden states of the last hidden layer at different time
        # steps. The shape of `outputs` is (no. of time steps, batch size,
        # 2 * no. of hidden units)
        outputs, _ = self.encoder(embeddings)
        # Concatenate the hidden states at the initial and final time steps as
        # the input of the fully connected layer. Its shape is (batch size,
        # 4 * no. of hidden units)
        encoding = torch.cat((outputs[0], outputs[-1]), dim=1)
        outs = self.decoder(encoding)
        return outs
# The same model implemented with MXNet Gluon (`nn`, `rnn`, and `np` from `mxnet`)
class BiRNN(nn.Block):
    def __init__(self, vocab_size, embed_size, num_hiddens,
                 num_layers, **kwargs):
        super(BiRNN, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Set `bidirectional` to True to get a bidirectional RNN
        self.encoder = rnn.LSTM(num_hiddens, num_layers=num_layers,
                                bidirectional=True, input_size=embed_size)
        self.decoder = nn.Dense(2)

    def forward(self, inputs):
        # The shape of `inputs` is (batch size, no. of time steps). Because
        # LSTM requires its input's first dimension to be the temporal
        # dimension, the input is transposed before obtaining token
        # representations. The output shape is (no. of time steps, batch size,
        # word vector dimension)
        embeddings = self.embedding(inputs.T)
        # Returns hidden states of the last hidden layer at different time
        # steps. The shape of `outputs` is (no. of time steps, batch size,
        # 2 * no. of hidden units)
        outputs = self.encoder(embeddings)
        # Concatenate the hidden states at the initial and final time steps as
        # the input of the fully connected layer. Its shape is (batch size,
        # 4 * no. of hidden units)
        encoding = np.concatenate((outputs[0], outputs[-1]), axis=1)
        outs = self.decoder(encoding)
        return outs
Let's construct a bidirectional RNN with two hidden layers to represent single text for sentiment analysis.
embed_size, num_hiddens, num_layers, devices = 100, 100, 2, d2l.try_all_gpus()
net = BiRNN(len(vocab), embed_size, num_hiddens, num_layers)

def init_weights(module):
    if type(module) == nn.Linear:
        nn.init.xavier_uniform_(module.weight)
    if type(module) == nn.LSTM:
        for param in module._flat_weights_names:
            if "weight" in param:
                nn.init.xavier_uniform_(module._parameters[param])

net.apply(init_weights);
16.2.2. Loading Pretrained Word Vectors
Below we load the pretrained 100-dimensional (needs to be consistent with embed_size) GloVe embeddings for tokens in the vocabulary.
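A minimal sketch of this step, assuming the d2l.TokenEmbedding helper introduced in the word embedding chapters:

glove_embedding = d2l.TokenEmbedding('glove.6b.100d')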
Downloading ../data/glove.6B.100d.zip from http://d2l-data.s3-accelerate.amazonaws.com/glove.6B.100d.zip...
Print the shape of the vectors for all the tokens in the vocabulary.
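For example, one might check the shape as follows (assuming glove_embedding and vocab from above; indexing a TokenEmbedding with a list of tokens returns the corresponding vectors):

embeds = glove_embedding[vocab.idx_to_token]
embeds.shape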
We use these pretrained word vectors to represent tokens in the reviews, and we will not update these vectors during training.
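A sketch of how this can be done in PyTorch, assuming embeds from the previous step: copy the GloVe vectors into the embedding layer and freeze its weights.

net.embedding.weight.data.copy_(embeds)
net.embedding.weight.requires_grad = False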
16.2.3. Training and Evaluating the Model
Now we can train the bidirectional RNN for sentiment analysis.
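A minimal training sketch in PyTorch, assuming the d2l.train_ch13 utility and the data iterators defined earlier; the learning rate and number of epochs are assumptions:

lr, num_epochs = 0.01, 5
trainer = torch.optim.Adam(net.parameters(), lr=lr)
loss = nn.CrossEntropyLoss(reduction="none")
d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)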
loss 0.311, train acc 0.872, test acc 0.850
574.5 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]
loss 0.428, train acc 0.806, test acc 0.791
488.5 examples/sec on [gpu(0), gpu(1)]
We define the following function to predict the sentiment of a text sequence using the trained model net.
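A sketch of such a function in PyTorch, assuming that vocab maps a list of tokens to indices as in Section 16.1 and that label 1 denotes "positive":

def predict_sentiment(net, vocab, sequence):
    """Predict the sentiment of a text sequence."""
    # Convert the tokenized sequence into a row vector of token indices
    sequence = torch.tensor(vocab[sequence.split()], device=d2l.try_gpu())
    # The model expects inputs of shape (batch size, no. of time steps)
    label = torch.argmax(net(sequence.reshape(1, -1)), dim=1)
    return 'positive' if label == 1 else 'negative'

For example, predict_sentiment(net, vocab, 'this movie is so great') returns the predicted sentiment for that review.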