Authors: Masaki Otsuki, Rahul Unnikrishnan Nair
As generative AI reshapes industries, robust, scalable infrastructure is critical for enterprise-grade AI workloads. neoAI, a Japanese AI startup and member of the Intel® Liftoff for Startups program, recently evaluated Intel’s Gaudi 2 AI accelerators on the Intel® Tiber™ AI Cloud. The Proof of Concept aimed to assess Gaudi 2’s performance on neoAI Chat, their Retrieval Augmented Generation (RAG)-enabled LLM platform, which supports major enterprises like Japan Post Bank and Kyushu Electric Power.
neoAI: An LLM Chatbot Solution
neoAI offers generative AI applications for enterprise businesses, including its SaaS platform, neoAI Chat, which allows companies to create AI agents without coding and connect their data with various LLMs.
PoC Objectives on Intel® Tiber™ AI Cloud
neoAI targeted three key goals:
1. Concurrency Handling: Test how many concurrent inference requests Gaudi 2 can manage compared to NVIDIA L40S and H100 GPUs.
2. Inference Speed: Benchmark the token generation rate.
3. Software Production Experience: Evaluate the ease of deploying AI workloads on Gaudi 2.
Key Findings
1. Concurrency Performance
Concurrency testing focused on identifying the elbow point: the load at which LLM throughput stops scaling and latency starts to climb. The measured elbow points for each accelerator configuration were:
- L40S (x2) reaches its elbow at 16 concurrent requests.
- H100 (x1) at 32 concurrent requests.
- Both Intel Gaudi 2 (x2) and H100 (x2) reach the elbow at 64 concurrent requests.
In this test, a dual-card Gaudi 2 setup matched dual H100s in concurrency handling, validating its scalability for high-demand enterprise AI workloads.
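For readers who want to run a similar test, the sketch below sweeps concurrency levels against a served model and reports aggregate throughput and mean latency; the elbow shows up where throughput plateaus while latency keeps rising. It assumes an OpenAI-compatible /v1/completions endpoint (as exposed by common serving stacks such as vLLM or TGI); the ENDPOINT, MODEL, and PROMPT values are placeholders, not details from neoAI's PoC.

```python
"""Concurrency sweep to locate the throughput elbow.

A minimal sketch, assuming an OpenAI-compatible /v1/completions endpoint
(e.g., a vLLM or TGI server fronting the accelerator). ENDPOINT, MODEL,
and PROMPT are placeholders, not details from neoAI's PoC.
"""
import asyncio
import time

import aiohttp

ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical server
MODEL = "my-model"                                 # placeholder model name
PROMPT = "Summarize the benefits of retrieval augmented generation."


async def one_request(session: aiohttp.ClientSession) -> tuple[int, float]:
    """Send one completion request; return (tokens generated, latency in s)."""
    payload = {"model": MODEL, "prompt": PROMPT, "max_tokens": 128}
    start = time.perf_counter()
    async with session.post(ENDPOINT, json=payload) as resp:
        body = await resp.json()
    latency = time.perf_counter() - start
    return body["usage"]["completion_tokens"], latency


async def sweep(concurrency: int) -> None:
    """Fire `concurrency` requests at once; report throughput and latency."""
    async with aiohttp.ClientSession() as session:
        start = time.perf_counter()
        results = await asyncio.gather(
            *(one_request(session) for _ in range(concurrency))
        )
        wall = time.perf_counter() - start
    total_tokens = sum(tokens for tokens, _ in results)
    mean_latency = sum(lat for _, lat in results) / concurrency
    print(f"{concurrency:>3} concurrent: {total_tokens / wall:8.1f} tokens/sec, "
          f"{mean_latency:6.2f} s mean latency")


async def main() -> None:
    # Double the load until throughput stops scaling and latency climbs:
    # that inflection is the elbow point discussed above.
    for concurrency in (1, 2, 4, 8, 16, 32, 64, 128):
        await sweep(concurrency)


if __name__ == "__main__":
    asyncio.run(main())
```

Plotting tokens/sec against concurrency from a run like this makes the elbow visible as the point where the curve flattens.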
2. Inference Speed
The inference speed comparison at 1 concurrent request is summarized below:
| Accelerator Configuration | Tokens/sec |
| --- | --- |
| L40S (x2) | 23.6 |
| H100 (x2) | 65.7 |
| Intel Gaudi 2 (x2) | 26.9 |
The table shows that while dual Gaudi 2 cards modestly outpace dual L40S cards in tokens/sec, they trail dual H100s in single-stream speed. Gaudi 2's concurrency advantage, however, makes it a strong contender for workloads that prioritize parallel throughput over single-stream speed.
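The single-stream figure can be reproduced with an even simpler check. The sketch below times one request end to end and divides generated tokens by elapsed time, against the same hypothetical OpenAI-compatible endpoint as above; ENDPOINT and MODEL remain placeholders. Note that this measurement includes prompt-processing time, so a streaming variant that excludes time-to-first-token would report a somewhat higher rate.

```python
"""Single-stream tokens/sec check (the metric in the table above).

A minimal sketch against a hypothetical OpenAI-compatible endpoint;
ENDPOINT and MODEL are placeholders, not details from neoAI's PoC.
"""
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical server
MODEL = "my-model"                                 # placeholder model name

payload = {
    "model": MODEL,
    "prompt": "Explain retrieval augmented generation in one paragraph.",
    "max_tokens": 256,
}

start = time.perf_counter()
body = requests.post(ENDPOINT, json=payload, timeout=300).json()
elapsed = time.perf_counter() - start

# Elapsed time covers prompt processing plus generation, so this is a
# slightly conservative estimate of the pure generation rate.
tokens = body["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.2f} s -> {tokens / elapsed:.1f} tokens/sec")
```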
3. Developer Experience
neoAI's team reported a smooth path to production, aided by the ready-to-use Docker images for Gaudi 2. Setup was straightforward, which kept the PoC on schedule.
"Intel® Tiber™ AI Cloud helped us achieve our goal of testing Intel Gaudi 2 AI Accelerator's ability to manage multiple concurrent requests. With 96GB of memory, Gaudi 2 performed well, and the ready-to-use Docker images made the process smooth, leading to favorable outcomes and performance."
— Masaki Otsuki, Head of R&D at neoAI.
Ready to take off?
This PoC highlights Intel Gaudi 2’s strong concurrency performance, making it a cost-effective, scalable alternative to traditional GPUs like H100 for enterprise AI workloads. The collaboration under the Intel Liftoff for Startups program enabled neoAI to explore new hardware solutions, demonstrating how Intel empowers startups to scale their AI innovations with advanced infrastructure and tailored support.
Intel® Liftoff is free, virtual, and open to early-stage AI startups worldwide. No cohorts. No equity. No limits. Apply today!
Related resources
Intel® Tiber™ AI Cloud - Cloud platform for AI development and deployment
Intel® Gaudi® 2 AI accelerator - High-performance AI training processor designed for deep learning workloads