
determines the amount of memory consumed by the model and optimizer state. However, like FLOPs, it does not account for most factors affecting cost and runtime. Further, different models with similar parameter counts may need different amounts of work to achieve a certain performance level; e.g., increasing the input's resolution or sequence length increases the compute requirements but does not change the number of parameters.
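To make this last point concrete, the sketch below (a hypothetical PyTorch toy model, not one from the paper) counts parameters and roughly tallies convolutional multiply-accumulates at two input resolutions: the parameter count is identical, while the compute grows with resolution.

```python
import torch
import torch.nn as nn

# A small, hypothetical CNN used only for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

# Parameter count is a property of the model alone, independent of input size.
n_params = sum(p.numel() for p in model.parameters())

def conv_macs(model, input_shape):
    """Rough multiply-accumulate count for the Conv2d layers at a given input size."""
    macs = 0
    hooks = []

    def hook(module, inputs, output):
        nonlocal macs
        k = module.kernel_size[0] * module.kernel_size[1]
        macs += output.numel() * module.in_channels * k // module.groups

    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            hooks.append(m.register_forward_hook(hook))
    with torch.no_grad():
        model(torch.zeros(1, *input_shape))
    for h in hooks:
        h.remove()
    return macs

print(f"parameters: {n_params}")
print(f"MACs at 64x64 input:   {conv_macs(model, (3, 64, 64)):,}")
print(f"MACs at 128x128 input: {conv_macs(model, (3, 128, 128)):,}")
# The parameter count is unchanged, but compute grows ~4x with the larger input.
```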

Carbon Emission. The carbon footprint of DNN training is another useful metric. However, accurately measuring carbon emissions can be challenging. This metric depends heavily on the local electricity infrastructure and current demand; therefore, one cannot easily compare results when experiments are performed in different locations, or even in the same location at different times [Schwartz et al. 2020].
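To illustrate why location matters so much, the arithmetic below multiplies a fixed, hypothetical energy measurement by a few illustrative grid carbon intensities (all numbers are placeholders, not measured values); the same training run yields very different emission estimates depending on where it ran.

```python
# Minimal sketch: identical measured energy, very different carbon estimates
# depending on the local grid's carbon intensity.
energy_kwh = 1200.0  # hypothetical measured training energy

grid_intensity_g_per_kwh = {
    "low-carbon grid": 30,    # illustrative (e.g., mostly hydro/nuclear)
    "average grid": 400,      # illustrative
    "coal-heavy grid": 800,   # illustrative
}

for grid, intensity in grid_intensity_g_per_kwh.items():
    kg_co2 = energy_kwh * intensity / 1000.0
    print(f"{grid}: ~{kg_co2:.0f} kg CO2e")
```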

Electricity Usage. A time- and location-agnostic way to quantify training efficiency is the electricity used during DNN training. GPUs and CPUs can report their current power usage, which can be used to estimate the total electricity consumed during training. Unfortunately, electricity usage is dependent on the hardware used for model training, which makes it difficult to perform a fair comparison across methods implemented by different researchers. Moreover, even for fixed hardware, it is possible to trade off power consumption and runtime [You et al. 2022].
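As a rough illustration of how reported power can be turned into an energy estimate, the sketch below samples GPU power through the NVML bindings (assuming the pynvml package is installed, e.g. via nvidia-ml-py) and integrates it over time. Coarse sampling only approximates the true energy, and this is a generic monitoring sketch, not a procedure from the paper.

```python
import time
import pynvml

def estimate_gpu_energy_kwh(duration_s, interval_s=1.0, device_index=0):
    """Sample reported GPU power every `interval_s` seconds for `duration_s`
    seconds and integrate to estimate energy. Meant to run alongside training;
    coarse sampling is only an approximation of the true integral."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    energy_j = 0.0
    try:
        elapsed = 0.0
        while elapsed < duration_s:
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
            energy_j += power_w * interval_s
            time.sleep(interval_s)
            elapsed += interval_s
    finally:
        pynvml.nvmlShutdown()
    return energy_j / 3.6e6  # joules -> kWh

# e.g., sample for 60 seconds while a training job is running on GPU 0:
# print(estimate_gpu_energy_kwh(duration_s=60))
```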

Operand Sizes. The total number of activations in a model's forward pass is a proxy for memory bandwidth consumption and can be a useful proxy for runtime [Dollár et al. 2021; Radosavovic et al. 2020]. This metric can be defined rigorously and independently of hardware for a given compute graph. It also decreases when operators are fused, which may or may not be desirable.
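A minimal PyTorch sketch of this proxy, assuming we count the output elements of every leaf module in one forward pass as a crude stand-in for the total activation count (the toy model is purely illustrative):

```python
import torch
import torch.nn as nn

def count_activations(model, input_shape):
    """Total number of output elements produced by leaf modules in one forward
    pass: a rough, hardware-independent proxy for activation memory traffic."""
    total = 0
    hooks = []

    def hook(module, inputs, output):
        nonlocal total
        if isinstance(output, torch.Tensor):
            total += output.numel()

    for m in model.modules():
        if len(list(m.children())) == 0:  # leaf modules only
            hooks.append(m.register_forward_hook(hook))
    with torch.no_grad():
        model(torch.zeros(1, *input_shape))
    for h in hooks:
        h.remove()
    return total

# Hypothetical small model for illustration.
model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10))
print(count_activations(model, (256,)))
```

Counting at the module granularity also reflects the fusion caveat above: if the ReLU were fused into the preceding linear layer, its intermediate output would drop out of the count.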
