Broadcast
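
In a broadcast, one root rank sends its buffer to every other rank. A minimal sketch, simulating ranks as Python lists rather than real processes (the function name `broadcast` is illustrative, not taken from any library):

```python
def broadcast(buffers, root):
    """Return new per-rank buffers in which every rank holds a copy of the root's data."""
    return [list(buffers[root]) for _ in buffers]

# Only rank 0 holds data before the call.
ranks = [[1, 2, 3], [], [], []]
after_broadcast = broadcast(ranks, root=0)
```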

Scatter
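
In a scatter, the root splits its buffer into equal chunks and rank i receives chunk i. A small sketch under the same list-based simulation (the name `scatter` and the equal-chunk assumption are illustrative):

```python
def scatter(data, n_ranks):
    """Split the root's data into n_ranks equal chunks; rank i receives chunk i."""
    if len(data) % n_ranks != 0:
        raise ValueError("data length must be divisible by n_ranks")
    chunk = len(data) // n_ranks
    return [data[i * chunk:(i + 1) * chunk] for i in range(n_ranks)]

scattered = scatter([0, 1, 2, 3, 4, 5, 6, 7], n_ranks=4)
```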

Gather
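
Gather is the inverse of scatter: the root collects every rank's buffer, concatenated in rank order. A sketch with ranks simulated as lists (the name `gather` is illustrative):

```python
def gather(buffers):
    """Concatenate every rank's buffer, in rank order, into one list held by the root."""
    gathered = []
    for buf in buffers:
        gathered.extend(buf)
    return gathered

gathered = gather([[0, 1], [2, 3], [4, 5], [6, 7]])
```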

Reduce
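
In a reduce, the buffers of all ranks are combined element-wise with an associative operator (sum, max, ...) and only the root receives the result. A minimal sketch (the name `reduce_to_root` is illustrative, and addition is only the assumed default operator):

```python
from functools import reduce as fold
import operator

def reduce_to_root(buffers, op=operator.add):
    """Combine all ranks' buffers element-wise with op; the result lives on the root only."""
    return [fold(op, column) for column in zip(*buffers)]

reduced = reduce_to_root([[1, 2], [3, 4], [5, 6]])
```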

All-to-All
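
In an all-to-all, every rank sends a distinct chunk to every other rank, which amounts to transposing a matrix of chunks. A sketch in which `send[i][j]` is the chunk that rank i sends to rank j (names are illustrative):

```python
def all_to_all(send):
    """send[i][j] goes from rank i to rank j; returns recv with recv[j][i] == send[i][j]."""
    n = len(send)
    return [[send[i][j] for i in range(n)] for j in range(n)]

send = [["a0", "a1", "a2"],
        ["b0", "b1", "b2"],
        ["c0", "c1", "c2"]]
recv = all_to_all(send)
```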

Reduce-Scatter
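
A reduce-scatter reduces the buffers element-wise and then leaves rank i with only the i-th chunk of the result. A sketch assuming summation and a buffer length divisible by the number of ranks (the name `reduce_scatter` is illustrative):

```python
def reduce_scatter(buffers):
    """Sum all buffers element-wise, then give rank i the i-th chunk of the sum."""
    n = len(buffers)
    total = [sum(column) for column in zip(*buffers)]
    if len(total) % n != 0:
        raise ValueError("buffer length must be divisible by the number of ranks")
    chunk = len(total) // n
    return [total[i * chunk:(i + 1) * chunk] for i in range(n)]

rs_out = reduce_scatter([[1, 2, 3, 4], [10, 20, 30, 40]])
```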

All-Gather
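
An all-gather is a gather whose result ends up on every rank instead of only on the root. A sketch with list-simulated ranks (the name `all_gather` is illustrative):

```python
def all_gather(buffers):
    """Concatenate all ranks' buffers in rank order and give every rank a full copy."""
    full = [x for buf in buffers for x in buf]
    return [list(full) for _ in buffers]

ag_out = all_gather([[11, 22], [33, 44]])
```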

All-Reduce

all-reduce = reduce-scatter + all-gather
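
The identity above can be checked with a small simulation: composing a reduce-scatter with an all-gather gives the same result as reducing everything and copying it to every rank. This is a sketch assuming summation and list-simulated ranks; all function names are illustrative.

```python
def reduce_scatter(buffers):
    """Sum element-wise, then leave rank i with the i-th chunk of the sum."""
    n = len(buffers)
    total = [sum(column) for column in zip(*buffers)]
    chunk = len(total) // n  # assumes the length is divisible by n
    return [total[i * chunk:(i + 1) * chunk] for i in range(n)]

def all_gather(chunks):
    """Give every rank the concatenation of all ranks' chunks."""
    full = [x for chunk in chunks for x in chunk]
    return [list(full) for _ in chunks]

def all_reduce_direct(buffers):
    """Reference all-reduce: every rank receives the full element-wise sum."""
    total = [sum(column) for column in zip(*buffers)]
    return [list(total) for _ in buffers]

buffers = [[1, 2, 3, 4],
           [10, 20, 30, 40],
           [100, 200, 300, 400],
           [1000, 2000, 3000, 4000]]
composed = all_gather(reduce_scatter(buffers))
direct = all_reduce_direct(buffers)
```

Both `composed` and `direct` hold `[1111, 2222, 3333, 4444]` on each of the four ranks.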

Ring All-Reduce

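In ring all-reduce, every node participates in every step. The advantage is that link bandwidth is fully utilized; the drawback is that latency becomes noticeable for small messages.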
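
A sketch of the ring schedule, simulated in a single process, may make the trade-off concrete. It assumes N ranks, a buffer of exactly N one-element chunks per rank, and summation as the reduction; the name `ring_all_reduce` is illustrative. Each rank only ever sends to its right-hand neighbour, and the whole operation takes 2(N-1) steps, which is why per-step latency dominates when the chunks are small.

```python
def ring_all_reduce(buffers):
    """Simulate ring all-reduce (sum) for n ranks holding n one-element chunks each."""
    n = len(buffers)
    data = [list(b) for b in buffers]
    if any(len(b) != n for b in data):
        raise ValueError("this sketch assumes exactly one chunk per rank")

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) % n to rank r + 1.
    for step in range(n - 1):
        # Snapshot outgoing values first so that all ranks send "simultaneously".
        sends = [((r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, (idx, value) in enumerate(sends):
            data[(r + 1) % n][idx] += value
    # Rank r now holds the fully reduced chunk (r + 1) % n.

    # Phase 2: all-gather. In step s, rank r forwards chunk (r + 1 - s) % n to rank r + 1.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, (idx, value) in enumerate(sends):
            data[(r + 1) % n][idx] = value
    return data

ring_result = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
```

After the call, every rank holds `[111, 222, 333]`.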

2D-Torus All-Reduce

Advantage: fewer steps. Disadvantage: the communicating nodes differ from step to step, so connections cannot be reused.
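
The "fewer steps" claim can be illustrated with a rough sketch. It assumes the commonly described three-phase scheme on an X-by-Y grid (reduce-scatter along each row, all-reduce along each column, all-gather along each row) and assumes a ring schedule inside each dimension when counting steps. These assumptions, and all names below, are illustrative and do not come from the notes above.

```python
def torus_all_reduce(grid):
    """grid[y][x] is the buffer (X numbers) of the node at row y, column x; returns summed buffers."""
    rows, cols = len(grid), len(grid[0])
    # Phase 1: reduce-scatter along each row; node (y, x) keeps element x of its row's sum.
    partial = [[sum(grid[y][k][x] for k in range(cols)) for x in range(cols)]
               for y in range(rows)]
    # Phase 2: all-reduce along each column; nodes in column x sum their partial values.
    col_total = [sum(partial[y][x] for y in range(rows)) for x in range(cols)]
    # Phase 3: all-gather along each row; every node collects all reduced elements.
    return [[list(col_total) for _ in range(cols)] for _ in range(rows)]

def ring_steps(n):
    """Steps of a ring all-reduce over n nodes: (n - 1) reduce-scatter + (n - 1) all-gather."""
    return 2 * (n - 1)

def torus_steps(x, y):
    """Steps under the assumed scheme: (x - 1) + 2 * (y - 1) + (x - 1)."""
    return 2 * (x - 1) + 2 * (y - 1)

grid = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]
torus_result = torus_all_reduce(grid)
```

Under these assumptions, 16 nodes arranged as a 4-by-4 grid need `torus_steps(4, 4) == 12` steps, compared with `ring_steps(16) == 30` for a single ring.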


Reference

Collective operation - Wikipedia