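Object detection algorithms usually sample a large number of regions in the input image, determine whether these regions contain objects of interest, and adjust the boundaries of the regions so as to predict the ground-truth bounding boxes of the objects more accurately. Different models may adopt different region sampling schemes. Here we introduce one such method: it generates multiple bounding boxes with varying scales and aspect ratios centered on each pixel. These bounding boxes are called anchor boxes. We will design an object detection model based on anchor boxes in Section 14.7.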
First, let us modify the printing precision to obtain more concise output.
14.4.1. Generating Multiple Anchor Boxes
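Suppose that the input image has a height of $h$ and a width of $w$. We generate anchor boxes with different shapes centered on each pixel of the image. Let the scale be $s \in (0, 1]$ and the aspect ratio (ratio of width to height) be $r > 0$. Then the width and height of the anchor box are $hs\sqrt{r}$ and $hs/\sqrt{r}$, respectively. Note that once the center position is given, an anchor box with known width and height is fully determined.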
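As a quick numeric check of the size formulas above, the following minimal sketch computes the pixel width and height of a single anchor box. The helper name `anchor_size` and the sample values are illustrative assumptions, not part of the original text.

```python
import math


def anchor_size(h, s, r):
    """Return (width, height) in pixels of one anchor box.

    h: image height in pixels, s: scale in (0, 1],
    r: aspect ratio (width / height), r > 0.
    Hypothetical helper, for illustration only.
    """
    if not (0 < s <= 1) or r <= 0:
        raise ValueError("expected 0 < s <= 1 and r > 0")
    width = h * s * math.sqrt(r)
    height = h * s / math.sqrt(r)
    return width, height


# Illustrative values: a 100-pixel-high image, scale 0.5, aspect ratio 4
print(anchor_size(100, 0.5, 4))  # (100.0, 25.0)
```

In this example the width-to-height ratio is 4, matching $r$, and the box area is $(hs)^2 = 2500$ square pixels, which does not depend on $r$.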
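To generate multiple anchor boxes with different shapes, let us set a series of scales $s_1, \ldots, s_n$ and a series of aspect ratios $r_1, \ldots, r_m$. When all combinations of these scales and aspect ratios are used with each pixel as the center, the input image has a total of $whnm$ anchor boxes. Although these anchor boxes may cover all the ground-truth bounding boxes, the computational cost can easily become too high. In practice, we only consider the combinations that contain $s_1$ or $r_1$:

$$(s_1, r_1), (s_1, r_2), \ldots, (s_1, r_m), (s_2, r_1), (s_3, r_1), \ldots, (s_n, r_1).$$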
That is to say, the number of anchor boxes centered on the same pixel is $n + m - 1$. For the entire input image, we generate a total of $wh(n + m - 1)$ anchor boxes.
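The counting argument above can be checked with a small sketch that lists the retained (scale, ratio) pairs. The function name `anchor_shape_pairs`, the scale and ratio values, and the tiny image size are assumptions chosen only for illustration.

```python
def anchor_shape_pairs(sizes, ratios):
    """List the (scale, ratio) pairs that contain sizes[0] or ratios[0]."""
    # Every scale paired with the first ratio,
    # then the first scale paired with each remaining ratio
    return [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]


sizes = [0.75, 0.5, 0.25]  # illustrative scales s_1..s_n (n = 3)
ratios = [1, 2, 0.5]       # illustrative aspect ratios r_1..r_m (m = 3)
pairs = anchor_shape_pairs(sizes, ratios)
print(len(pairs))          # 5, i.e. n + m - 1

h, w = 4, 6                # a tiny hypothetical image
print(w * h * len(pairs))  # 120 anchor boxes in total
```

Using all $nm = 9$ combinations instead would give $6 \times 4 \times 9 = 216$ boxes for the same image, so the reduction already matters at this small size.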
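The above method of generating anchor boxes is implemented in the following multibox_prior function. We specify the input image, a list of scales, and a list of aspect ratios, and the function returns all the anchor boxes.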
# PyTorch implementation
import torch

#@save
def multibox_prior(data, sizes, ratios):
    """Generate anchor boxes with different shapes centered on each pixel."""
    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
    boxes_per_pixel = (num_sizes + num_ratios - 1)
    size_tensor = torch.tensor(sizes, device=device)
    ratio_tensor = torch.tensor(ratios, device=device)
    # Offsets are required to move the anchor to the center of a pixel. Since
    # a pixel has height=1 and width=1, we choose to offset our centers by 0.5
    offset_h, offset_w = 0.5, 0.5
    steps_h = 1.0 / in_height  # Scaled steps in y axis
    steps_w = 1.0 / in_width  # Scaled steps in x axis

    # Generate all center points for the anchor boxes
    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)

    # Generate `boxes_per_pixel` number of heights and widths that are later
    # used to create anchor box corner coordinates (xmin, ymin, xmax, ymax)
    w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),
                   sizes[0] * torch.sqrt(ratio_tensor[1:]))) \
        * in_height / in_width  # Handle rectangular inputs
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),
                   sizes[0] / torch.sqrt(ratio_tensor[1:])))
    # Divide by 2 to get half height and half width
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
        in_height * in_width, 1) / 2

    # Each center point will have `boxes_per_pixel` number of anchor boxes, so
    # generate a grid of all anchor box centers with `boxes_per_pixel` repeats
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                           dim=1).repeat_interleave(boxes_per_pixel, dim=0)
    output = out_grid + anchor_manipulations
    return output.unsqueeze(0)
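To make the coordinate arithmetic inside the function easier to follow, here is a plain-Python sketch of what it appears to compute for a single pixel: the pixel center in normalized coordinates, followed by the corner coordinates of each anchor box. The helper name `anchors_at_pixel` and the sample inputs are hypothetical, and the sketch uses ordinary floats rather than tensors.

```python
import math


def anchors_at_pixel(row, col, in_height, in_width, sizes, ratios):
    """Return normalized (xmin, ymin, xmax, ymax) anchors centered on one pixel.

    Hypothetical helper mirroring the per-pixel arithmetic, for illustration only.
    """
    # Pixel center in normalized [0, 1] coordinates (0.5 offset reaches the center)
    cx = (col + 0.5) / in_width
    cy = (row + 0.5) / in_height
    # Every scale with ratios[0], then sizes[0] with the remaining ratios
    pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
    boxes = []
    for s, r in pairs:
        # Pixel width is in_height * s * sqrt(r); dividing by in_width normalizes it
        w = s * math.sqrt(r) * in_height / in_width
        # Pixel height is in_height * s / sqrt(r); dividing by in_height normalizes it
        h = s / math.sqrt(r)
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


# Top-left pixel of a 2 x 4 image, one scale (0.5) and one ratio (1.0)
print(anchors_at_pixel(0, 0, 2, 4, [0.5], [1.0]))  # [(0.0, 0.0, 0.25, 0.5)]
```

Multiplying the normalized width 0.25 by the image width 4 and the normalized height 0.5 by the image height 2 gives a 1 x 1 pixel box, so a ratio of 1 yields a square in pixel space even though the image is rectangular. This is the effect of the `in_height / in_width` factor.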
# MXNet implementation
from mxnet import np, npx

npx.set_np()

#@save
def multibox_prior(data, sizes, ratios):
    """Generate anchor boxes with different shapes centered on each pixel."""
    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.ctx, len(sizes), len(ratios)
    boxes_per_pixel = (num_sizes + num_ratios - 1)
    size_tensor = np.array(sizes, ctx=device)
    ratio_tensor = np.array(ratios, ctx=device)
    # Offsets are required to move the anchor to the center of a pixel. Since
    # a pixel has height=1 and width=1, we choose to offset our centers by 0.5
    offset_h, offset_w = 0.5, 0.5
    steps_h = 1.0 / in_height  # Scaled steps in y-axis
    steps_w = 1.0 / in_width  # Scaled steps in x-axis

    # Generate all center points for the anchor boxes
    center_h = (np.arange(in_height, ctx=device) + offset_h) * steps_h
    center_w = (np.arange(in_width, ctx=device) + offset_w) * steps_w
    shift_x, shift_y = np.meshgrid(center_w, center_h)
    shift_x, shift_y = shift_x.reshape(-1), shift_y.reshape(-1)

    # Generate `boxes_per_pixel` number of heights and widths that are later
    # used to create anchor box corner coordinates (xmin, ymin, xmax, ymax)
    w = np.concatenate((size_tensor * np.sqrt(ratio_tensor[0]),
                        sizes[0] * np.sqrt(ratio_tensor[1:]))) \
*