CenterNet

CenterNet Paper
Github Repo

Simple Intro

One Shot

The network output directly represents the objects.

Anchor Free

Fast

Works with a very small backbone
No NMS

Head

The head outputs a 5-channel feature map with tensor shape $(Batch,Channel=5,Height,Width)$.
Ignoring the batch dimension, the 5 channel values at each position $(H_i, W_i)$ are

$(p, w, h, offset_x, offset_y)$

meaning that, with probability $p$, there is an object of width $w$ and height $h$ centered at $(W_i+offset_x,H_i+offset_y)$ (in feature map coordinates).

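offset: the backbone downsamples the input (typically by 4x), so a $128\times 128$ input becomes a $32\times 32$ feature map. Each feature map location then covers a $4\times 4$ region of the original input: feature map coordinate $(x, y)$ maps to input coordinate $(4x, 4y)$. This quantization loses the position inside the cell, so the offset is needed to recover the exact coordinate on the original input.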
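The arithmetic behind the offset can be sketched as follows. This assumes a stride of 4 and an offset expressed in feature map units, which matches the CenterNet paper's target $p/R - \lfloor p/R \rfloor$. The function names are hypothetical and only illustrate the mapping.

```python
import math


def to_feature_coords(cx, cy, stride=4):
    """Map an input-image center (cx, cy) to its feature map cell and sub-cell offset."""
    fx, fy = cx / stride, cy / stride
    ix, iy = math.floor(fx), math.floor(fy)   # integer cell on the feature map
    return (ix, iy), (fx - ix, fy - iy)        # offset in [0, 1), feature map units


def to_input_coords(ix, iy, off_x, off_y, stride=4):
    """Invert the mapping: add the offset to the cell, then scale back by the stride."""
    return (ix + off_x) * stride, (iy + off_y) * stride


cell, off = to_feature_coords(54.0, 37.0)
print(cell, off)                      # (13, 9) (0.5, 0.25)
print(to_input_coords(*cell, *off))   # (54.0, 37.0)
```

Without the offset, the center would snap back to $(52, 36)$, an error of up to one stride in each axis.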

Decoding the positions where $p$ exceeds a threshold yields the object bboxes.

All head outputs are regressed values, so predicting anything else only requires changing the number of head output channels.

For multi-class detection, simply increase the number of $p$ channels in the head, i.e. $(p_{class-1}, p_{class-2}, p_{class-3} ,\dots , p_{class-n}, w, h, offset_x, offset_y)$
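The multi-class channel layout can be sliced as below. This is a minimal sketch that assumes all channels come from a single tensor in the order $(p_1, \dots, p_n, w, h, offset_x, offset_y)$, with $w, h$ and the offset shared across classes. `split_head_output` is a hypothetical helper, not part of any library.

```python
import torch


def split_head_output(out, n_classes):
    """Split a (B, n_classes + 4, H, W) head tensor into heatmap, size and offset."""
    assert out.shape[1] == n_classes + 4, "expected n_classes + 4 channels"
    heatmap = out[:, :n_classes]                 # one probability map per class
    wh = out[:, n_classes:n_classes + 2]         # (w, h), class-agnostic
    offset = out[:, n_classes + 2:]              # (offset_x, offset_y)
    return heatmap, wh, offset


out = torch.randn(2, 3 + 4, 32, 32)   # batch of 2, 3 classes, 32x32 feature map
hm, wh, off = split_head_output(out, n_classes=3)
print(hm.shape, wh.shape, off.shape)
# torch.Size([2, 3, 32, 32]) torch.Size([2, 2, 32, 32]) torch.Size([2, 2, 32, 32])
```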

Decode

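In the output probability channel (HeatMap), the location with the highest probability is the center of a predicted box. However, the actual output is not strictly 0 or 1. Instead, it is a roughly circular blob spreading out from the center point, so only the highest point within each blob should be taken.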
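The snippet below shows why a single threshold is not enough. It renders an idealized blob as a 2D Gaussian and checks how many cells clear 0.5. The fixed `sigma=2.0` is an assumption for illustration only, since real heatmaps come from the network, and the paper's training targets use a size-dependent Gaussian radius.

```python
import torch


def gaussian_blob(h, w, cx, cy, sigma=2.0):
    """Render a 2D Gaussian peak (value 1 at (cx, cy)) on an h x w grid."""
    ys = torch.arange(h, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, -1)
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))


hm = gaussian_blob(16, 16, cx=5, cy=9)
print((hm > 0.5).sum().item())   # 21 cells score above 0.5, not just the center
flat_idx = torch.argmax(hm).item()
y, x = divmod(flat_idx, hm.shape[1])
print(x, y)                      # 5 9, the peak is the true center
```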

Filter low score

First, discard all positions whose probability is below the threshold.

NMS

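Near each peak there may be many points with similarly high scores, which would produce multiple boxes for one object, so some form of "NMS" is still needed. Here it is done by simply applying max pooling to the HeatMap: the positions where $MaxPool(hm) == hm$ are the peaks.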
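Thresholding and the max-pool "NMS" together can be sketched in a few lines of PyTorch. The 3x3 kernel follows the official repo's default, and the threshold value of 0.3 is an arbitrary assumption. `find_peaks` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def find_peaks(hm, threshold=0.3, kernel=3):
    """Return (batch, class, y, x) indices of heatmap peaks.

    hm: (B, C, H, W) probabilities. A cell is kept if it equals the max of its
    kernel x kernel neighbourhood (the max-pool "NMS") and exceeds `threshold`.
    """
    pad = (kernel - 1) // 2
    hmax = F.max_pool2d(hm, kernel_size=kernel, stride=1, padding=pad)
    keep = (hmax == hm) & (hm > threshold)
    return keep.nonzero(as_tuple=False)   # each row: (b, c, y, x)


hm = torch.zeros(1, 1, 8, 8)
hm[0, 0, 2, 2] = 0.9   # two neighbouring high scores belonging to one object
hm[0, 0, 2, 3] = 0.8
hm[0, 0, 6, 5] = 0.7   # a second, separate object
hm[0, 0, 0, 7] = 0.1   # a local max, but below the threshold
print(find_peaks(hm).tolist())   # [[0, 0, 2, 2], [0, 0, 6, 5]]
```

The 0.8 cell is suppressed because its 3x3 neighbourhood contains 0.9. No box-overlap (IoU) computation is involved, which is why the method is described as NMS-free.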

Rescale

All positions are in downsampled feature map coordinates and must be scaled back to the input image to obtain the final BBox.

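$x = (Pos_x + offset_x)\times scale, \\ y = (Pos_y + offset_y)\times scale, \\ width = w, \\ height = h$

Here $scale$ is the downsampling factor, $(Pos_x, Pos_y)$ is the peak position on the feature map, and $(x, y)$ is the box center. This assumes $w$ and $h$ are regressed directly at input-image scale. If they are regressed at feature map resolution, multiply them by $scale$ as well.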
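The whole decode step for one image can be sketched as follows. Several things here are assumptions: the offset is in feature map units, $w$ and $h$ are already at input scale, the threshold is 0.3, and there is no top-K limit (the official code also keeps only the top K peaks). `decode` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F


def decode(prob, wh, offset, scale=4, threshold=0.3):
    """Decode head outputs of a single image into boxes.

    prob: (C, H, W) class heatmaps; wh, offset: (2, H, W).
    Returns a list of (x1, y1, x2, y2, score, class_id) in input-image pixels.
    """
    # Max-pool "NMS": keep cells that are local maxima and above the threshold
    hmax = F.max_pool2d(prob.unsqueeze(0), 3, stride=1, padding=1).squeeze(0)
    keep = (hmax == prob) & (prob > threshold)
    boxes = []
    for c, y, x in keep.nonzero(as_tuple=False).tolist():
        # Rescale: add the sub-cell offset, then multiply by the stride
        cx = (x + offset[0, y, x].item()) * scale
        cy = (y + offset[1, y, x].item()) * scale
        w, h = wh[0, y, x].item(), wh[1, y, x].item()
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2,
                      prob[c, y, x].item(), c))
    return boxes


prob = torch.zeros(2, 8, 8)
prob[1, 3, 5] = 0.9                          # one class-1 object at cell (x=5, y=3)
wh = torch.zeros(2, 8, 8)
wh[:, 3, 5] = torch.tensor([20.0, 10.0])
offset = torch.zeros(2, 8, 8)
offset[:, 3, 5] = torch.tensor([0.5, 0.25])
print(decode(prob, wh, offset))   # one box: (12.0, 8.0, 32.0, 18.0, ~0.9, 1)
```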

Loss

Focal loss (the paper's penalty-reduced variant) on the heatmap channels.
L1 loss on the other channels, computed only at object centers.
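Both losses can be sketched as below, following the formulation in the CenterNet paper with $\alpha = 2$ and $\beta = 4$. The target heatmap is assumed to be exactly 1 at centers, with a Gaussian falloff around them. The mask-based L1 and the function names are illustrative assumptions, not the official repo's code, which gathers the center positions by index instead.

```python
import torch


def focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """Penalty-reduced pixel-wise focal loss on the heatmap.

    pred: probabilities (sigmoid + clamp, so log() is finite); gt: same shape,
    1 at object centers, Gaussian-decayed values nearby, 0 elsewhere.
    """
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = torch.log(pred) * (1 - pred) ** alpha * pos
    # (1 - gt)^beta reduces the penalty for negatives close to a center
    neg_loss = torch.log(1 - pred) * pred ** alpha * (1 - gt) ** beta * neg
    num_pos = pos.sum().clamp(min=1.0)   # normalize by the number of objects
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos


def masked_l1(pred, target, mask):
    """L1 loss on wh or offset maps, counted only where mask == 1.

    pred, target: (B, 2, H, W); mask: (B, 1, H, W) with 1 at object centers.
    """
    mask = mask.expand_as(pred)
    return (torch.abs(pred - target) * mask).sum() / mask.sum().clamp(min=1.0)
```

The paper combines them as $L = L_k + 0.1\,L_{size} + 1\,L_{off}$.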

Sample

```python
import torch
import torch.nn as nn


def _make_head(in_channels, inner_channels, out_channels):
    """3x3 conv -> ReLU -> 1x1 conv, the structure shared by all three heads."""
    return nn.Sequential(
        nn.Conv2d(in_channels, inner_channels, kernel_size=3, padding=1, bias=True),
        nn.ReLU(inplace=True),
        nn.Conv2d(inner_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=True),
    )


class CenterNetHead(nn.Module):
    def __init__(self, in_channels, inner_conv_channels, n_classes):
        super().__init__()
        self.head_in_channels = in_channels
        self.head_conv_channels = inner_conv_channels
        self.p_head_out_channels = n_classes
        # One heatmap (probability) channel per class
        self.p_head = _make_head(in_channels, inner_conv_channels, n_classes)
        # (w, h) of the box, shared across classes
        self.wh_head = _make_head(in_channels, inner_conv_channels, 2)
        # (offset_x, offset_y): sub-cell correction of the center
        self.offset_head = _make_head(in_channels, inner_conv_channels, 2)

    def forward(self, x):
        prob = self.p_head(x)

        # Sigmoid, then clamp away from 0 and 1 so log() in the focal loss stays finite, see
        # https://github.com/xingyizhou/CenterNet/blob/master/src/lib/models/utils.py#L9
        prob = torch.clamp(torch.sigmoid(prob), min=1e-5, max=1 - 1e-5)
        wh = self.wh_head(x)
        offset = self.offset_head(x)
        return {"prob": prob, "wh": wh, "offset": offset}
```