How Penguin Solutions Solves Infrastructure Complexity in the Hot AI Market
According to recent survey data, more than 40% of large-scale enterprises actively use AI, and another 40% are exploring their options. Challenges remain, however: 38% of IT professionals cite a lack of technology infrastructure as their biggest barrier to AI success.
In a recent Tech Disruptors podcast, host Woo Jin Ho sat down with Penguin Solutions' Chief Technology Officer Phil Pokorny and VP of Global Marketing Mark Seamans to explore how companies can cool infrastructure complexity in a red-hot AI market.
Hot Topics: Price, Performance, and Power
Price
The challenge: AI infrastructure is expensive.
"AI can cost 10-20X the price of a standard server," says Mark. To effectively manage AI spend, two considerations are critical: cost per query and value per workload. Cost per query includes capital spending on servers plus ongoing costs such as power and cooling. Value per workload, meanwhile, speaks to the benefits of solutions such as AI-powered business process automation (BPA) or in-depth data analysis.
The solution: Tailored, proven solutions.
"Penguin Solutions focuses exclusively on the design, build, deployment, and management of AI," says Mark. "While these systems look like standard computers, the entire process of designing and building is different than generic IT." Penguin Solutions' end-to-end approach delivers tailored AI solutions that help ensure alignment between cost and value.
Performance
The challenge: Balancing effort and output.
According to Phil, "Deploying AI can be a steep learning curve for IT shops, and how an AI model can benefit companies may not be obvious." He notes that reliable AI output requires multiple GPUs working together using a finely tuned InfiniBand fabric, while Mark highlights the need for infrastructure expandability to manage growing capacity needs. "Development time between new processors used to be 2-4 years," he says. "Now it's compressed down to half a year."
The solution: Bringing in the experts.
"One of the key services we provide is our factory," says Phil. "We don't ship you raw parts. We do the racking and stacking and testing in our facility." He also points to Penguin's work with Meta. Rather than ask their production team to deploy and manage an HPC cluster, Meta selected Penguin Solutions. Five years and 16,000 GPUs later, the partnership remains strong.
Power
The challenge: Too much and too little.
AI-enabled systems require significant amounts of power and generate substantial amounts of heat. As Phil points out, "The typical data center uses 120-volt power, but this isn't enough for an HPC rack. Companies need to plan for upgrades to 240- or 277-volt." Even with those upgrades, however, he notes that companies may find it difficult to purchase the power they need, depending on the capacity of local utility infrastructure.
The heat generated by all of that power draw, meanwhile, must be managed effectively to limit the risk of hardware damage.
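The arithmetic behind that voltage jump is simple to sketch. The snippet below assumes a hypothetical 40 kW rack (an illustrative figure, not a Penguin Solutions specification) and shows how the required current falls as supply voltage rises, which is why standard 120-volt circuits quickly become impractical for dense AI racks.

```python
# Illustrative rack power math: current draw at common data center voltages.
# The 40 kW rack figure is an assumed example, not a vendor specification.

RACK_POWER_W = 40_000  # hypothetical dense GPU rack (40 kW)

for voltage in (120, 208, 240, 277, 415):
    amps = RACK_POWER_W / voltage
    print(f"{RACK_POWER_W / 1000:.0f} kW rack at {voltage} V -> {amps:,.0f} A")
```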
The solution: Penguin AI Factory Validated Solutions.
Deploying AI and HPC infrastructure based on Penguin's AI Factory validated designs allows organizations to adapt the delivery approach to match the power capabilities of their data center while still getting top performance from the solution. Penguin Solutions is also at the forefront of oil-based immersion cooling, which enables very dense server deployments while reducing overall system power consumption.
Keeping Your Cool
As the AI market heats up, executives and IT managers are under pressure to implement intelligent solutions and deliver consistent value. Penguin Solutions can help cool the complexity of AI infrastructure with solutions that deliver on price, performance, and power. Let's get started.