• Apple embraces Nvidia GPUs to accelerate LLM inference via its open source ReDrafter tech

    From TechnologyDaily@1337:1/100 to All on Sunday, January 05, 2025 13:30:05
    Apple embraces Nvidia GPUs to accelerate LLM inference via its open source ReDrafter tech

    Date:
    Sun, 05 Jan 2025 13:28:00 +0000

    Description:
    Nvidia says collaboration opens up exciting possibilities for future LLM workloads.

    FULL STORY ======================================================================

    ReDrafter delivers 2.7x more tokens per second compared to traditional auto-regression
    ReDrafter could reduce latency for users while using fewer GPUs
    Apple hasn't said when ReDrafter will be deployed on rival AI GPUs from AMD and Intel

    Apple has announced a collaboration with Nvidia to accelerate large language model inference using its open source technology, Recurrent Drafter (or ReDrafter for short).

    The partnership aims to address the computational challenges of auto-regressive token generation, which is crucial for improving efficiency and reducing latency in real-time LLM applications.

    ReDrafter, introduced by Apple in November 2024, takes a speculative decoding approach by combining a recurrent neural network (RNN) draft model with beam search and dynamic tree attention. Apple's benchmarks show that this method generates 2.7x more tokens per second compared to traditional auto-regression.
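    The draft-then-verify idea behind speculative decoding can be illustrated with a small sketch. This is an illustrative toy, not Apple's ReDrafter implementation: the real system uses an RNN draft model with beam search and dynamic tree attention, and verifies all drafted tokens in a single batched forward pass of the large model; here both "models" are stand-in functions over integer token IDs.

```python
def draft_model(context, k):
    """Cheap proposer: guesses the sequence keeps counting upward."""
    last = context[-1]
    return [last + i + 1 for i in range(k)]

def target_model(context):
    """Expensive 'ground truth' model: counts upward but wraps at 5."""
    last = context[-1]
    return 0 if last >= 5 else last + 1

def speculative_decode(context, steps, k=4):
    """Generate `steps` tokens: draft k at a time, keep the accepted prefix."""
    tokens = list(context)
    while len(tokens) - len(context) < steps:
        proposal = draft_model(tokens, k)
        accepted = 0
        for tok in proposal:
            # Verify each drafted token against the target model.
            # (A real implementation checks all k in one batched forward
            # pass of the large model; that batching is the speed-up.)
            if tok == target_model(tokens):
                tokens.append(tok)
                accepted += 1
            else:
                break
        if accepted < k:
            # Draft diverged: take one token from the target model and retry.
            tokens.append(target_model(tokens))
    return tokens[len(context):len(context) + steps]
```

    When the draft model's guesses match the target model, several tokens are committed per verification step instead of one, which is where the reported throughput gain comes from.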

    Through its integration into Nvidia's TensorRT-LLM framework, ReDrafter
    extends its impact by enabling faster LLM inference on Nvidia GPUs widely
    used in production environments.

    To accommodate ReDrafter's algorithms, Nvidia introduced new operators and tweaked existing ones within TensorRT-LLM, making the tech available to any developers looking to optimize performance for large-scale models.

    In addition to the speed improvements, Apple says ReDrafter has the potential to reduce user latency while requiring fewer GPUs. This efficiency not only lowers computational costs but also lessens power consumption, a vital factor for organizations managing large-scale AI deployments.

    While the focus of this collaboration remains on Nvidia's infrastructure for now, it's possible that similar performance benefits could be extended to
    rival GPUs from AMD or Intel at some point in the future.

    Breakthroughs like this can help improve machine learning efficiency. As Nvidia says, "This collaboration has made TensorRT-LLM more powerful and more flexible, enabling the LLM community to innovate more sophisticated models
    and easily deploy them with TensorRT-LLM to achieve unparalleled performance on Nvidia GPUs. These new features open exciting possibilities, and we
    eagerly anticipate the next generation of advanced models from the community that leverage TensorRT-LLM capabilities, driving further improvements in LLM workloads."

    You can read more about the collaboration with Apple on the Nvidia Developer Technical Blog.



    ======================================================================
    Link to news story: https://www.techradar.com/pro/apple-embraces-nvidia-gpus-to-accelerate-llm-inference-via-its-open-source-tech-redrafter


    --- Mystic BBS v1.12 A47 (Linux/64)
    * Origin: tqwNet Technology News (1337:1/100)