Paper https://arxiv.org/abs/1708.02002

Binary Cross Entropy Loss

Take the binary cross entropy as an example:

$$L_{CE}(p,y) = \begin{cases} -\log(p), & y = 1 \\ -\log(1-p), & \text{otherwise} \end{cases}$$

To simplify, let

$$p_t = \begin{cases} p, & y = 1 \\ 1-p, & \text{otherwise} \end{cases}$$

This gives $L_{CE}(p,y) = -\log(p_t)$. Here $p_t$ measures how close $p$ is to $y$: the higher $p_t$ is, the closer $p$ is to $y$ and the more accurate the classification.
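A minimal NumPy sketch of this $p_t$ formulation (the function name and sample values are illustrative, not from the paper):

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Binary cross entropy written through p_t.

    p : predicted probability of the positive class (0 < p < 1)
    y : ground-truth label, 1 = positive, 0 = negative
    """
    p_t = p if y == 1 else 1.0 - p   # p_t: how close p is to the label y
    return -np.log(p_t)

print(binary_cross_entropy(0.9, 1))  # ~0.105, confident and correct -> small loss
print(binary_cross_entropy(0.9, 0))  # ~2.303, confident but wrong   -> large loss
```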

Balanced Cross Entropy

In object detection, negative samples usually far outnumber positive samples. Cross entropy then struggles to learn the positives, and the model can simply give up and predict everything as negative.

A naive fix is to count the positive/negative ratio in the ground truth and use it as a weight on the loss. (The paper simply sets a hyperparameter $\alpha$.)

$$L_{CE}(p_t) = -\log(p_t) \times \alpha$$
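A sketch of this $\alpha$-weighted version, following the paper's convention of an $\alpha_t$ defined analogously to $p_t$ ($\alpha$ for positives, $1-\alpha$ for negatives); the default value here is illustrative:

```python
import numpy as np

def balanced_cross_entropy(p, y, alpha=0.25):
    """Alpha-balanced binary cross entropy (illustrative sketch).

    alpha     : weight for the positive class (y = 1)
    1 - alpha : weight for the negative class (y = 0)
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * np.log(p_t)

print(balanced_cross_entropy(0.9, 0))  # ~1.727: the confidently wrong negative from above, scaled by 1 - alpha
```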

Focal Loss

Whether you set a single hyperparameter $\alpha$ or compute the positive/negative ratio for every sample, the weight stays fixed, which is not adaptive enough.

Focal Loss again attacks the problem through the weighting term $\alpha$:

$$L_{FL}(p_t) = -\log(p_t) \times (1-p_t)^{\gamma}$$

$\gamma \ge 0$ is a hyperparameter; when $\gamma = 0$ the focal loss reduces to the ordinary cross entropy.
> We propose a novel loss we term the Focal Loss that adds a factor $(1 - p_t)^{\gamma}$ to the standard cross entropy criterion. Setting $\gamma > 0$ reduces the relative loss for well-classified examples ($p_t > .5$), putting more focus on hard, misclassified examples. As our experiments will demonstrate, the proposed focal loss enables training highly accurate dense object detectors in the presence of vast numbers of easy background examples.
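Putting the definition together, here is a hedged PyTorch sketch of a binary focal loss (not the official RetinaNet implementation; the defaults $\alpha=0.25$, $\gamma=2$ are the values the paper reports as working best, and the mean reduction is my own choice):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss, written from the formula above (illustrative sketch).

    logits  : raw predictions, shape (N,)
    targets : ground-truth labels in {0., 1.}, shape (N,), float tensor
    """
    p = torch.sigmoid(logits)
    # -log(p_t), computed from the logits for numerical stability
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # p if y=1 else 1-p
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # alpha if y=1 else 1-alpha
    loss = alpha_t * (1 - p_t) ** gamma * ce                 # down-weight easy examples
    return loss.mean()                                       # reduction is a sketch choice

logits = torch.tensor([2.0, -1.5, 0.3])
targets = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, targets))
```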

We add a weight that depends on $p$: since $p_t$ measures how accurate the prediction is, $1-p_t$ measures how inaccurate it is, i.e. how hard this example is to classify.

When an example is hard to classify, $1-p_t$ is high, so the loss weight is high.

When an example is easy to classify, $1-p_t$ is low, so the loss weight is low.

Since $0 < 1-p_t < 1$, the higher $\gamma$ is, the more the weight of easy examples is suppressed, while hard examples are barely affected. This can also be seen in the figure above: the curves differ most in the range $0.1 < p < 0.6$.
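A tiny script to reproduce this trend numerically (the $\gamma$ values follow the ones plotted in the paper; the $p_t$ grid is my own choice):

```python
import numpy as np

# How the modulating factor (1 - p_t)^gamma scales -log(p_t)
# for easy (high p_t) vs. hard (low p_t) examples.
for gamma in [0, 0.5, 1, 2, 5]:
    for p_t in [0.1, 0.5, 0.9, 0.99]:
        loss = -np.log(p_t) * (1 - p_t) ** gamma
        print(f"gamma={gamma:>3}, p_t={p_t:<4} -> loss={loss:.4f}")

# With gamma=2, a well-classified example (p_t=0.9) is down-weighted by 100x,
# while a hard example (p_t=0.1) still keeps ~81% of its cross entropy loss.
```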