Nvidia's KV Cache Transform Coding (KVTC) compresses the LLM key-value cache by 20x without model changes, cutting GPU memory ...
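To see why compressing the KV cache matters, here is a back-of-envelope sizing sketch for a generic decoder-only transformer. The formula and the example configuration are illustrative assumptions, not details of NVIDIA's KVTC method.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Size of the key-value cache: 2 tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Example: a 7B-class model (32 layers, 32 KV heads, head dim 128)
# serving a 4096-token context at fp16.
base = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096, batch=1)
print(f"uncompressed:       {base / 2**30:.2f} GiB")   # 2.00 GiB
print(f"at 20x compression: {base / 20 / 2**30:.2f} GiB")  # 0.10 GiB
```

At these (assumed) dimensions the cache alone is 2 GiB per sequence, so a 20x reduction frees most of that memory for larger batches or longer contexts.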
Deep learning network compression techniques have emerged as a crucial research area, aiming to reduce the computational and storage requirements of neural networks without significantly compromising ...
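One representative technique from this research area is post-training weight quantization. The sketch below shows a minimal per-tensor int8 scheme; it is a simplified illustration of the general idea, not any specific toolkit's implementation.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 plus a per-tensor scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 tensor."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_int8(w)
print(w.nbytes / q.nbytes)  # 4.0: int8 needs 4x less storage than fp32
```

Rounding to int8 bounds the per-weight error by half the scale, which is why storage drops 4x with only a small loss in accuracy for many networks.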
Image compression has been one of the constantly evolving challenges in computer science. Programmers and researchers are always trying to improve current standards or create new ones to get better ...
Perceive, the AI chip startup spun out of Xperi, has released a second chip with hardware support for transformers, including large language models (LLMs) at the edge. The company demonstrated ...
Intel has disclosed a maximum severity vulnerability in some versions of its Intel Neural Compressor software for AI model compression. The bug, designated as CVE-2024-22476, provides an ...
Small changes in the large language models (LLMs) at the heart of AI applications can result in substantial energy savings, according to a report released by the United Nations Educational, Scientific ...
A generalized architectural blueprint for building efficient MLLMs. This template achieves efficiency through a combination of component choices and data flow optimization. Key strategies include: (1) ...
A team led by Cold Spring Harbor Laboratory Assistant Professor Benjamin Cowley has compressed a 60-million-parameter ...
Spanish AI company Multiverse Computing has released HyperNova 60B 2602, a compressed version of OpenAI’s gpt-oss-120B, and published it for free on Hugging Face. The new version cuts the original ...