Authors: Masaki Otsuki, Rahul Unnikrishnan Nair
As generative AI reshapes industries, robust, scalable infrastructure is critical for enterprise-grade AI workloads. neoAI, a Japanese AI startup and member of the Intel® Liftoff for Startups program, recently evaluated Intel’s Gaudi 2 AI accelerators on the Intel® Tiber™ AI Cloud. The Proof of Concept aimed to assess Gaudi 2’s performance on neoAI Chat, their Retrieval Augmented Generation (RAG)-enabled LLM platform, which supports major enterprises like Japan Post Bank and Kyushu Electric Power.
neoAI: An LLM Chatbot Solution
neoAI offers generative AI applications for enterprise businesses, including its SaaS platform, neoAI Chat, which allows companies to create AI agents without coding and connect their data with various LLMs.
PoC Objectives on Intel® Tiber™ AI Cloud
neoAI targeted three key goals:
1. Concurrency Handling: Test how many concurrent inference requests Gaudi 2 can manage compared to NVIDIA L40S and H100 GPUs.
2. Inference Speed: Benchmark the token generation rate.
3. Software Production Experience: Evaluate the ease of deploying AI workloads on Gaudi 2.
Key Findings
1. Concurrency Performance
Concurrency testing focused on identifying the elbow point: the load at which LLM throughput stops scaling and latency starts to climb. The measured elbow points for each accelerator configuration were:
- L40S (x2) reaches its elbow at 16 concurrent requests.
- H100 (x1) at 32 concurrent requests.
- Both Intel Gaudi 2 (x2) and H100 (x2) reach the elbow at 64 concurrent requests.
In this test, a dual-card Gaudi 2 setup matched dual H100s in concurrency handling, validating its scalability for high-demand enterprise AI workloads.
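For readers who want to run a similar test, the sketch below sweeps concurrency levels against a served model and reports aggregate throughput and mean latency; the elbow shows up where throughput plateaus while latency keeps rising. It assumes an OpenAI-compatible /v1/completions endpoint (as exposed by common serving stacks such as vLLM or TGI); the ENDPOINT, MODEL, and PROMPT values are placeholders, not details from neoAI's PoC.

```python
"""Concurrency sweep to locate the throughput elbow.

A minimal sketch, assuming an OpenAI-compatible /v1/completions endpoint
(e.g., a vLLM or TGI server fronting the accelerator). ENDPOINT, MODEL,
and PROMPT are placeholders, not details from neoAI's PoC.
"""
import asyncio
import time

import aiohttp

ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical server
MODEL = "my-model"                                 # placeholder model name
PROMPT = "Summarize the benefits of retrieval augmented generation."


async def one_request(session: aiohttp.ClientSession) -> tuple[int, float]:
    """Send one completion request; return (tokens generated, latency in s)."""
    payload = {"model": MODEL, "prompt": PROMPT, "max_tokens": 128}
    start = time.perf_counter()
    async with session.post(ENDPOINT, json=payload) as resp:
        body = await resp.json()
    latency = time.perf_counter() - start
    return body["usage"]["completion_tokens"], latency


async def sweep(concurrency: int) -> None:
    """Fire `concurrency` requests at once; report throughput and latency."""
    async with aiohttp.ClientSession() as session:
        start = time.perf_counter()
        results = await asyncio.gather(
            *(one_request(session) for _ in range(concurrency))
        )
        wall = time.perf_counter() - start
    total_tokens = sum(tokens for tokens, _ in results)
    mean_latency = sum(lat for _, lat in results) / concurrency
    print(f"{concurrency:>3} concurrent: {total_tokens / wall:8.1f} tokens/sec, "
          f"{mean_latency:6.2f} s mean latency")


async def main() -> None:
    # Double the load until throughput stops scaling and latency climbs:
    # that inflection is the elbow point discussed above.
    for concurrency in (1, 2, 4, 8, 16, 32, 64, 128):
        await sweep(concurrency)


if __name__ == "__main__":
    asyncio.run(main())
```

Plotting tokens/sec against concurrency from a run like this makes the elbow visible as the point where the curve flattens.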
2. Inference Speed
The inference speed comparison at 1 concurrent request is summarized below:
| Accelerator Configuration | Tokens/sec |
| --- | --- |
| L40S (x2) | 23.6 |
| H100 (x2) | 65.7 |
| Intel Gaudi 2 (x2) | 26.9 |
The table shows that while dual Gaudi 2 cards modestly outpace dual L40S cards in tokens/sec, they trail dual H100s in single-stream speed. Gaudi 2's concurrency advantage, however, makes it a strong contender for workloads that prioritize parallel throughput over single-stream speed.
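The single-stream figure can be reproduced with an even simpler check. The sketch below times one request end to end and divides generated tokens by elapsed time, against the same hypothetical OpenAI-compatible endpoint as above; ENDPOINT and MODEL remain placeholders. Note that this measurement includes prompt-processing time, so a streaming variant that excludes time-to-first-token would report a somewhat higher rate.

```python
"""Single-stream tokens/sec check (the metric in the table above).

A minimal sketch against a hypothetical OpenAI-compatible endpoint;
ENDPOINT and MODEL are placeholders, not details from neoAI's PoC.
"""
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical server
MODEL = "my-model"                                 # placeholder model name

payload = {
    "model": MODEL,
    "prompt": "Explain retrieval augmented generation in one paragraph.",
    "max_tokens": 256,
}

start = time.perf_counter()
body = requests.post(ENDPOINT, json=payload, timeout=300).json()
elapsed = time.perf_counter() - start

# Elapsed time covers prompt processing plus generation, so this is a
# slightly conservative estimate of the pure generation rate.
tokens = body["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.2f} s -> {tokens / elapsed:.1f} tokens/sec")
```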
3. Developer Experience
neoAI's team reported a smooth path to production, aided by the ready-to-use Docker images for Gaudi 2. Setup was straightforward, which kept the PoC on schedule.
"Intel® Tiber™ AI Cloud helped us achieve our goal of testing Intel Gaudi 2 AI Accelerator's ability to manage multiple concurrent requests. With 96GB of memory, Gaudi 2 performed well, and the ready-to-use Docker images made the process smooth, leading to favorable outcomes and performance."
— Masaki Otsuki, Head of R&D at neoAI.
Ready to take off?
This PoC highlights Intel Gaudi 2’s strong concurrency performance, making it a cost-effective, scalable alternative to traditional GPUs like H100 for enterprise AI workloads. The collaboration under the Intel Liftoff for Startups program enabled neoAI to explore new hardware solutions, demonstrating how Intel empowers startups to scale their AI innovations with advanced infrastructure and tailored support.
Intel® Liftoff is free, virtual, and open to early-stage AI startups worldwide. No cohorts. No equity. No limits. Apply today!
Related resources
Intel® Tiber™ AI Cloud - Cloud platform for AI development and deployment
Intel® Gaudi® 2 AI accelerator - High-performance AI training processor designed for deep learning workloads