In a milestone for personal computing, Nvidia is bringing better AI to PCs by enabling generative AI processing on Windows PCs with RTX-based graphics processing units (GPUs).
In the past year, generative AI has emerged as a transformative trend. With its rapid growth and increasing accessibility, consumers now have simplified interfaces and user-friendly tools that leverage the power of GPU-optimized AI, machine learning, and high-performance computing (HPC) software.
Nvidia has powered this AI revolution in data centers with large numbers of GPUs, and is now bringing it to the more than 100 million Windows PCs worldwide that have RTX-based GPUs. The integration of AI into core Windows applications has been a five-year journey, with dedicated AI processors called Tensor Cores, found in GeForce RTX and Nvidia RTX GPUs, building AI capabilities into Windows PCs and workstations.
Jesse Clayton, director of product management and product marketing for Windows AI at Nvidia, said in an interview with GamesBeat that we’re at a big moment.
“For AI on PCs, we think this is one of the most important moments in the history of technology. I don’t think it’s an exaggeration to say that AI offers new experiences for gamers, creators, video streamers, office workers, students, and even casual PC users. It unlocks creativity. And it makes it easier for everyone to do more. AI is embedded in every critical application. And it affects every PC user. It really fundamentally changes the way people use computers.”
Previously announced for data centers, TensorRT-LLM, an open-source library designed to speed up inference performance for large language models (LLMs), is now making its way to Windows. This library, optimized for Nvidia RTX GPUs, can improve the performance of recent LLMs such as Llama 2 and Code Llama by up to four times.
In addition, Nvidia has released tools to help developers accelerate their LLMs, including scripts that enable compatibility with TensorRT-LLM, TensorRT-optimized open-source models, and a developer reference project that demonstrates the speed and quality of LLM responses.
“What a lot of people don’t realize is that the AI use cases in computing are already firmly established. Nvidia started this five years ago in 2018,” Clayton said. “We believed that would be important. So with the introduction of so-called RTX GPUs, we also introduced AI technology for gaming.”
Stable Diffusion demo
TensorRT acceleration is integrated with Stable Diffusion in the popular web UI from the Automatic1111 distribution.
Stable Diffusion takes a text prompt and creates an image based on it. Creators use it to produce some amazing works of art. But generating each image takes time and computing resources, which means you have to wait for it to complete. Nvidia’s latest GPUs can double performance on Stable Diffusion over previous implementations, and run it more than seven times faster than Apple’s latest chips. So a machine with a GeForce RTX 4090 graphics card can produce 15 Stable Diffusion images in the time it takes an Apple machine to produce two.
DLSS is based on graphics research where AI takes a low-resolution image and upscales it to a higher resolution, increasing frame rates and helping gamers get more value out of their GPUs. Game developers can also add more visual artistry to their games. There are now over 300 DLSS games and Nvidia just released version 3.5 of the technology.
“Generative AI has reached a point where it opens up a whole new class of use cases with opportunities to bring PC AI into the mainstream,” Clayton said. “So gamers will enjoy AI-powered avatars. Office workers and students will use large language models, or LLMs, to draft documents and slides and quickly extract insights from CSV data. Developers use LLMs to help with coding and debugging. And everyday users will use LLMs to do everything from skimming web content to planning a trip and eventually using AI as a digital assistant.”
Video Super Resolution
Also, the release of RTX Video Super Resolution (VSR) version 1.5 as part of the Game Ready driver further enhances AI-powered capabilities. VSR improves the quality of streamed video content by reducing compression artifacts, sharpening edges and enhancing detail. The latest version of VSR offers even better visual quality with improved models, de-artifacting content played at native resolution, and support for both professional RTX and GeForce RTX 20 series GPUs based on Turing architecture.
This technology has been integrated into the latest Game Ready driver and will be included in the upcoming Nvidia Studio Driver scheduled for release in early November.
The combination of TensorRT-LLM acceleration and LLM capabilities opens up new possibilities in productivity, enabling LLMs to run up to four times faster on RTX-powered Windows PCs. This acceleration improves the user experience for sophisticated LLM use cases such as writing and coding assistants that provide multiple unique auto-completion results at once.
Finding Alan Wake 2
Integrating TensorRT-LLM with other technologies such as retrieval-augmented generation (RAG) allows LLMs to provide targeted responses based on specific datasets.
For example, when asked about Nvidia technology integrations in Alan Wake 2, the Llama 2 model initially responded that the game had not been announced. However, when RAG was used with recent GeForce news articles, the Llama 2 model quickly provided the correct answer, showing the speed and efficiency achieved with TensorRT-LLM acceleration.
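The retrieval step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the demo Nvidia ships: the documents are hypothetical placeholders, the function names (`retrieve`, `build_prompt`) are invented for this example, and simple word-overlap scoring stands in for the embedding-based search a real RAG pipeline would typically use. The resulting prompt would then be handed to a locally running LLM, which is omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for a private document set (e.g. news articles).
DOCUMENTS = [
    "Placeholder article: graphics technology integrations in Alan Wake 2.",
    "Placeholder article: a driver update that improves streamed video quality.",
    "Placeholder article: tips for planning a trip with a digital assistant.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Prepend the retrieved context to the question (the augmentation step)."""
    context = "\n".join(retrieve(question, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


question = "Which technology integrations does Alan Wake 2 have?"
prompt = build_prompt(question, DOCUMENTS)
print(prompt)  # this string would be passed to the LLM in place of the bare question
```

The point of the design is that the model itself is never retrained: only the prompt changes, so the private documents can stay on the local machine.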
If the data already exists in the cloud and the model is already trained on that data, it makes architectural sense to run it in the cloud, Clayton said.
However, if it’s a private data set, or a data set that only you have access to, or the model isn’t trained in the cloud, you have to find another way to do it, he said.
“Retraining models is very challenging from a computational perspective. This allows you to do it without having to go down that route. I’m paying $20 a month right now to use [AI services]. How many of these cloud services am I going to pay for if I can do a lot of work locally with a powerful GPU?”
Developers interested in TensorRT-LLM can download it from Nvidia Developer. Additionally, TensorRT-optimized open-source models and a RAG demo trained on GeForce news are available at ngc.nvidia.com and GitHub.com/NVIDIA.
Competitors such as Intel, Advanced Micro Devices, Qualcomm and Apple are using competing technologies to improve AI in PCs and smart devices. Clayton said these solutions are good for lightweight AI workloads that run on low power. These are akin to table-stakes AI, he said, and are complementary to what Nvidia’s GPUs do.
RTX GPUs have 20 to 100 times the performance of CPUs in AI workloads, which is why the technology starts with the GPU. The math at the heart of modern AI is matrix multiplication, and at the heart of Nvidia’s platform are RTX GPUs with tensor cores designed to accelerate matrix multiplication. Today’s GeForce RTX GPUs can compute 1,300 trillion tensor operations per second, making them the fastest PC AI accelerators.
“They also represent the world’s largest install base of dedicated AI hardware, with more than 100 million RTX PC GPUs worldwide,” Clayton said. “So, they really have the performance and flexibility to take on not just today’s tasks, but tomorrow’s AI use cases.”
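To make the matrix-multiplication point above concrete, here is a minimal pure-Python sketch. The toy numbers and the helper names (`matmul`, `multiply_add_count`) are assumptions chosen for illustration; real AI frameworks hand this work to GPU libraries rather than Python loops. It shows that one neural-network layer applied to a batch of inputs is a single matrix product, and that the number of multiply-add operations grows with the product of the three dimensions, which is the workload Tensor Cores are built to accelerate.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix using plain Python lists."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def multiply_add_count(m, k, n):
    """Number of multiply-add operations in an (m x k) @ (k x n) product."""
    return m * k * n


# Hypothetical toy values: a batch of 2 inputs with 3 features each,
# passed through a layer with 3 inputs and 2 outputs.
inputs = [[1, 2, 3], [4, 5, 6]]
weights = [[1, 0], [0, 1], [1, 1]]

outputs = matmul(inputs, weights)
print(outputs)                      # [[4, 5], [10, 11]]
print(multiply_add_count(2, 3, 2))  # 12
```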
Your PC can turn to the cloud for any AI tasks that are too demanding for its GPU. Today, there are over 400 AI-enabled PC apps and games.