• CriticalResist8@lemmygrad.ml

    The article mentions “Packing multiple models per GPU”, but also “using a token-level autoscaler to dynamically allocate compute as output is generated, rather than reserving resources at the request level”. I’m not sure what that means, but it may hint that there are ways to scale this down.
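
    My best guess at what that phrase could mean, as a toy sketch (the function names, the numbers, and the continuous-batching-style scheduling are my own assumptions, not anything from the article): instead of holding GPU capacity for a whole request, the scheduler re-decides what runs at every decoding step, so finished requests free capacity immediately.

    ```python
    import random

    random.seed(0)

    # Each request needs some number of output tokens.
    requests = [random.randint(5, 50) for _ in range(20)]
    SLOTS = 4  # toy "GPU capacity": how many sequences can decode per step


    def request_level(reqs, slots):
        """Reserve a slot per request until it finishes.

        A slot stays occupied for the request's whole lifetime, so short
        requests in a batch wait for the longest one before capacity frees up.
        """
        steps = 0
        queue = list(reqs)
        while queue:
            batch, queue = queue[:slots], queue[slots:]
            steps += max(batch)  # whole batch is held until the longest finishes
        return steps


    def token_level(reqs, slots):
        """Re-allocate slots at every decoding step.

        Finished requests release capacity immediately and queued requests
        join the batch mid-flight, so fewer total steps are spent idle.
        """
        pending = list(reqs)
        running = []
        steps = 0
        while pending or running:
            # Fill any freed slots from the queue before each step.
            while pending and len(running) < slots:
                running.append(pending.pop(0))
            steps += 1
            running = [r - 1 for r in running if r - 1 > 0]
        return steps


    print("request-level reservation:", request_level(requests, SLOTS), "steps")
    print("token-level allocation:   ", token_level(requests, SLOTS), "steps")
    ```

    In that toy model the token-level version finishes in far fewer steps for the same workload, which is why I read it as a possible path to running this on less hardware.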

    If not Alibaba, then other researchers will eventually get to it.