DBNet
Paper: https://arxiv.org/pdf/1911.08947.pdf
Intro
The major contribution in this paper is the proposed DB module that is differentiable, which makes the process of binarization end-to-end trainable in a CNN.
Mask
Detection is mask-based: targets are represented as segmentation masks.
Fast
Single-shot; the backbone can be swapped for a very small one.
Composition
Backbone
ResNet, MobileNet, etc.
DB Module
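The CNN produces two masks: a probability map and a threshold map.
Applying the threshold map to the probability map gives the binary map, which is the detection result. Extracting the contour of each region turns it into a polygon, and taking the minimum bounding rectangle gives a box.
At prediction time the threshold map can be dropped, using the probability map directly with a fixed threshold.
For multi-class problems, it is enough to add channels to the head: each class gets its own probability map, and the n per-class probability maps yield n binary maps.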
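A minimal sketch of the fixed-threshold shortcut for the multi-class case, assuming the head output is available as a nested list with one probability map per class. The name `binarize` and the default threshold are illustrative and not taken from the paper's code.

```python
def binarize(prob_maps, thresh=0.3):
    """Turn n per-class probability maps into n binary maps.

    prob_maps: list of n maps, each a list of rows of floats in [0, 1].
    thresh: fixed threshold used in place of the learned threshold map
            (0.3 is an illustrative default, not a prescribed value).
    """
    return [[[1 if p > thresh else 0 for p in row] for row in class_map]
            for class_map in prob_maps]


# Two classes, one 1x3 probability map each.
maps = [[[0.1, 0.9, 0.4]], [[0.5, 0.2, 0.8]]]
print(binarize(maps))
```

Each class is thresholded independently, so overlapping detections from different classes are possible in this sketch.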
Data
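Similar to PSENet, the mask label is generated by shrinking the bbox inward.
The bbox is shrunk inward by an offset D.
The offset D of shrinking is computed from the perimeter L and area A of the original polygon:
$$D = \frac{A(1 - r^2)}{L}$$
where r is the shrink ratio, set to 0.4 empirically.
If the text box is very long, the shrunk region can become too thin, so a minimum size should be enforced. The text-box height can serve as the reference, e.g. the shrunk region must be no smaller than 40% of the text-box height.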
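The shrink offset can be sketched in plain Python as below. The helper names are hypothetical, and `clamped_shrink_offset` is only one possible reading of the minimum-size rule (keep at least `min_ratio` of the box height after shrinking from both sides). It is not something the paper specifies.

```python
import math


def polygon_area(pts):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(pts):
    """Sum of edge lengths of a closed polygon."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def shrink_offset(pts, r=0.4):
    """D = A * (1 - r^2) / L."""
    return polygon_area(pts) * (1 - r * r) / polygon_perimeter(pts)


def clamped_shrink_offset(pts, height, r=0.4, min_ratio=0.4):
    """Assumed clamp: the shrunk height (height - 2D) stays >= min_ratio * height."""
    max_offset = (1 - min_ratio) * height / 2.0
    return min(shrink_offset(pts, r), max_offset)


# A long 100 x 20 text box.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(shrink_offset(box))              # about 7, leaving a height of about 6
print(clamped_shrink_offset(box, 20))  # about 6, leaving a height of about 8
```

For this box the unclamped offset leaves only about 30% of the original height, which is the situation the note above warns about.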
Loss
- The threshold map uses L1 loss.
- The probability map and the binary map are treated as binary classification and use BCE loss.
The loss function $L$ can be expressed as a weighted sum of the loss for the probability map $L_s$, the loss for the binary map $L_b$, and the loss for the threshold map $L_t$:
$$L = L_s + \alpha \times L_b + \beta \times L_t$$
According to the numeric values of the losses, $\alpha$ and $\beta$ are set to 1.0 and 10 respectively.
BCE loss is applied for $L_s$ and $L_b$, and L1 loss for $L_t$.
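The weighted sum can be illustrated with a small pure-Python sketch over flattened maps. The function names are hypothetical, and the sketch simply averages over all pixels, ignoring any sampling or masking a real implementation may apply.

```python
import math


def bce(pred, target, eps=1e-6):
    """Mean binary cross-entropy over flat lists of predictions and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def l1(pred, target):
    """Mean absolute error over flat lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def db_loss(prob, binary, thresh, gt_mask, gt_thresh, alpha=1.0, beta=10.0):
    """L = L_s + alpha * L_b + beta * L_t."""
    loss_s = bce(prob, gt_mask)      # probability map vs. shrunk-text mask
    loss_b = bce(binary, gt_mask)    # approximate binary map vs. same mask
    loss_t = l1(thresh, gt_thresh)   # threshold map vs. threshold label
    return loss_s + alpha * loss_b + beta * loss_t


# One-pixel toy example.
print(db_loss([0.5], [0.5], [0.5], [1], [0.3]))
```

With `beta=10`, the L1 term (0.2 here) contributes 2.0 to the total, comparable to the two BCE terms, which matches the note that the weights were chosen from the numeric scale of the losses.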
DB Function
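Deriving the binary map from the probability map with the thresh map is a hard thresholding step, which is not differentiable,
so the paper proposes the DB function (differentiable binarization):
$$\hat{B}_{i,j} = \frac{1}{1 + e^{-k(P_{i,j} - T_{i,j})}}$$
$\hat{B}$ is the (approximate) binary map, $P$ is the probability map, and $T$ is the thresh map.
$k$ is a hyperparameter, set to 50 empirically.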
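As a quick numerical illustration, the DB function is a steep sigmoid of P - T. The sketch below evaluates it for single pixel values; the function name `db` is illustrative.

```python
import math


def db(p, t, k=50.0):
    """Differentiable binarization for one pixel: 1 / (1 + exp(-k * (p - t)))."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


# A probability 0.1 above or below the threshold is already nearly binary.
for p in (0.4, 0.5, 0.6):
    print(p, round(db(p, 0.5), 4))
```

With k = 50, a margin of 0.1 between P and T already maps to values very close to 0 or 1, so the output behaves like a hard step while remaining differentiable.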
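Under cross-entropy, with $x = P_{i,j} - T_{i,j}$ and $f(x) = \frac{1}{1 + e^{-kx}}$, the losses for positive and negative labels become
$$l_+ = -\log \frac{1}{1 + e^{-kx}}, \qquad l_- = -\log\left(1 - \frac{1}{1 + e^{-kx}}\right)$$
and their derivatives follow from the chain rule:
$$\frac{\partial l_+}{\partial x} = -k f(x) e^{-kx}, \qquad \frac{\partial l_-}{\partial x} = k f(x)$$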
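The derivatives can be sanity-checked numerically. The sketch below compares the closed forms $-k f(x) e^{-kx}$ and $k f(x)$ (with $x = P - T$ and $f$ the sigmoid with slope k) against central finite differences; all names are illustrative.

```python
import math

K = 50.0


def f(x):
    """Sigmoid with slope K, i.e. the DB function of x = P - T."""
    return 1.0 / (1.0 + math.exp(-K * x))


def loss_pos(x):
    return -math.log(f(x))


def loss_neg(x):
    return -math.log(1.0 - f(x))


def grad_pos(x):
    """Closed-form derivative of loss_pos."""
    return -K * f(x) * math.exp(-K * x)


def grad_neg(x):
    """Closed-form derivative of loss_neg."""
    return K * f(x)


def numeric_grad(fn, x, h=1e-6):
    """Central finite difference."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


x = 0.02
print(grad_pos(x), numeric_grad(loss_pos, x))
print(grad_neg(x), numeric_grad(loss_neg, x))
```

Both gradients carry the factor k, so near the decision boundary (x close to 0) they are amplified roughly 50 times compared with an unscaled sigmoid.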
Predict
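Set a fixed threshold, e.g. 0.2, and keep the points of the probability map that exceed it.
Dilate each connected region (the network predicts shrunk boxes). The dilation ratio is an important parameter because it controls the size of the final text box.
The dilation offset D′ is computed as:
$$D' = \frac{A' \times r'}{L'}$$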
where A′ is the area of the shrunk polygon; L′ is the perimeter of the shrunk polygon; r′ is set to 1.5 empirically.
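To get a feel for the numbers, the sketch below shrinks an axis-aligned rectangle with the training-time offset and then dilates it with the prediction-time offset. Restricting to rectangles is a simplifying assumption for illustration (real pipelines offset arbitrary polygons), and the helper names are hypothetical.

```python
def shrink_offset(area, perimeter, r=0.4):
    """Training-time offset: D = A * (1 - r^2) / L."""
    return area * (1 - r * r) / perimeter


def unclip_offset(area, perimeter, r=1.5):
    """Prediction-time offset: D' = A' * r' / L'."""
    return area * r / perimeter


def rect_stats(w, h):
    """Area and perimeter of a w x h rectangle."""
    return w * h, 2 * (w + h)


w, h = 100.0, 20.0
d = shrink_offset(*rect_stats(w, h))
sw, sh = w - 2 * d, h - 2 * d            # shrunk box seen by the network
d2 = unclip_offset(*rect_stats(sw, sh))
rw, rh = sw + 2 * d2, sh + 2 * d2        # box recovered after dilation

print("shrunk:", sw, sh)
print("dilated:", rw, rh)
```

In this toy case the dilated box does not exactly reproduce the original 100 x 20 box, which is one reason the dilation ratio is worth tuning.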
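Personal thoughts
Why does the threshold map cover a region slightly larger than the predicted box, and why is it graded? For example, the threshold is 0.7 on the text-box border and decreases both outward and inward, reaching 0.3 at the edges.
Because the target is the probability map, a dynamic threshold like this helps polarize the probability map and keeps values from hovering around 0.5. On the outward side the threshold decreases while the label is 0, which pushes the probability down. On the inward side the threshold decreases while the label is 1, which makes the probability map easier to learn, so the loss drops quickly and the network converges. Even though the threshold inside the text box is lower, after this fast convergence the probability still tends toward higher values and does not linger near the threshold.
In experiments, the probability inside a text box ends up close to the maximum threshold value: with a threshold range of 30~80 the final probability map is mostly 80~100, and with 30~70 it is 70~100 (values on a 0~100 scale). The fixed threshold used at prediction time can then be set to 0.3.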