
In Figure 3, we replicate the results of Figure 2 with an entire ResNet-50, rather than individual operations. As shown in the left subplot, factorizing all of the convolutions and linear layers decreases the FLOP count significantly, but increases the runtime due to increased memory bandwidth usage and kernel launch overhead. However, removing the batch normalization ops (as is commonly done during inference) reduces runtime significantly, despite having almost no impact on FLOP count. A similar pattern holds for parameter count in the middle plot.
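The gap between FLOP count and wall-clock time in the left subplot is easy to probe directly. The sketch below is not the benchmark code behind Figure 3; it is a minimal PyTorch illustration, with layer sizes and iteration counts chosen arbitrarily, that compares a standard 3x3 convolution against a spatially factorized 3x1/1x3 pair. The factorized variant needs roughly two-thirds of the FLOPs but issues two kernels and moves an extra intermediate activation, so its measured time need not be lower.

```python
# Minimal sketch (illustrative only): FLOPs vs. wall-clock time for a dense
# convolution and a spatially factorized variant. Sizes are arbitrary choices.
import time
import torch
import torch.nn as nn

def conv_flops(conv: nn.Conv2d, out_h: int, out_w: int) -> int:
    """2 * MACs for a dense convolution (bias ignored)."""
    kh, kw = conv.kernel_size
    return 2 * conv.out_channels * conv.in_channels * kh * kw * out_h * out_w

def time_module(mod: nn.Module, x: torch.Tensor, iters: int = 50) -> float:
    """Average forward-pass time in seconds after a short warm-up."""
    mod.eval()
    with torch.no_grad():
        for _ in range(5):
            mod(x)
        start = time.perf_counter()
        for _ in range(iters):
            mod(x)
    return (time.perf_counter() - start) / iters

x = torch.randn(8, 256, 56, 56)

standard = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)
factorized = nn.Sequential(  # two thinner kernels -> fewer FLOPs, two launches
    nn.Conv2d(256, 256, kernel_size=(3, 1), padding=(1, 0), bias=False),
    nn.Conv2d(256, 256, kernel_size=(1, 3), padding=(0, 1), bias=False),
)

out_h = out_w = 56
flops_std = conv_flops(standard, out_h, out_w)
flops_fac = sum(conv_flops(c, out_h, out_w) for c in factorized)

print(f"standard:   {flops_std / 1e9:.2f} GFLOPs, {time_module(standard, x) * 1e3:.2f} ms")
print(f"factorized: {flops_fac / 1e9:.2f} GFLOPs, {time_module(factorized, x) * 1e3:.2f} ms")
```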

However, in Figure 3 (right), we see that measuring the size of input and output operands correctly orders the different ResNet-50 variants, though it is still not a reliable predictor of runtime.
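As a rough stand-in for that operand-size metric, one can total the bytes flowing into and out of every leaf layer with forward hooks. The sketch below assumes torchvision is available for a ResNet-50 and is only one way to approximate the measurement; it is not the paper's instrumentation.

```python
# Minimal sketch: sum input + output tensor sizes across leaf modules as a
# proxy for activation memory traffic. Assumes torchvision for ResNet-50.
import torch
import torch.nn as nn
import torchvision

def operand_bytes(model: nn.Module, x: torch.Tensor) -> int:
    """Total bytes of input and output operands over one forward pass."""
    total = 0
    handles = []

    def hook(module, inputs, output):
        nonlocal total
        for t in inputs:
            if torch.is_tensor(t):
                total += t.numel() * t.element_size()
        if torch.is_tensor(output):
            total += output.numel() * output.element_size()

    for m in model.modules():
        if len(list(m.children())) == 0:  # leaf modules only
            handles.append(m.register_forward_hook(hook))

    model.eval()
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return total

model = torchvision.models.resnet50()
x = torch.randn(1, 3, 224, 224)
print(f"input/output operands: {operand_bytes(model, x) / 1e6:.1f} MB")
```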
