【Blog】Standardization for Advancing Heterogeneous AI Computing Platforms

08 November 2021

New specification provides current system compatibility and a framework for mixed accelerator hardware applications

Alan Chang, Vice President of Technical Operations at Inspur Information

Today, one need only watch a major sporting event or popular television show to be inundated with commercials touting the benefits and future potential that artificial intelligence (AI) holds for humanity. Applications that could hardly be envisioned a short time ago are now becoming commonplace, and the AI field is expected to keep growing by leaps and bounds. That said, achieving the promise of AI requires building computing platforms that deliver high performance, robustness and scalability while embracing openness to overcome interoperability challenges and set the stage to respond more quickly and cost-effectively to market demands.

To help assure interoperability and aid manufacturers in meeting the growing demand for AI systems with enhanced capabilities, the Open Compute Project (OCP) engages numerous partners committed to advancing AI computing technology through open specifications. Its latest project is the Open Accelerator Infrastructure (OAI). Building on experience with previous open hardware and software projects, the organization draws participants from all areas of the computing ecosystem, and its most recent efforts have focused on advancing accelerator technologies to offer more elegant, streamlined and accessible open specifications for AI computing platforms.

A recent roundtable discussion with leaders from OCP and Baidu explored the development and value proposition of OAI in depth and reached some interesting conclusions.

According to Archna Haylock, Community Director, Open Compute Foundation, “Companies today are facing numerous challenges, whether it comes to data center infrastructure, hardware acceleration, or hardware management from the facilities to the rack down to the nodes. What OCP brings to the table is an environment of collaboration to meet these challenges and find a common solution that works across the board and that provides economies of skill to achieve improved efficiencies and cost savings.”

Clearly, a key objective for OAI participants, Baidu and Inspur among them, was to simplify the design of the accelerator module. The specification resulting from these efforts is in and of itself a technical solution: manufacturers can design their own products based on the OAI specification without having to start from scratch. Much as with open source software such as Hadoop, GFS, Linux and others, users can download the specification directly and pursue individual development efforts.

In effect, the specification promotes the convergence of different accelerator technologies, such as ASIC, GPU, and FPGA, overcoming incompatibility issues and enabling these technologies to operate under a unified hardware standard. In this way, users can swap chips freely, bringing more options to manufacturers and simplifying the supply side of the accelerator industry. The key technological advantages of OAI are:

  • Comprehensive compatibility, supporting current AI accelerators such as FPGAs, GPUs, and ASICs, as well as future generations of heterogeneous technologies.
  • Support for both 12V and 54V power supplies. The maximum power is 300W on the 12V supply and 450W-500W on the 54V supply.
  • Support for four interconnect topologies: HCM (for 8-port and 6-port OAM), FC, combined FC/HCM, and 4D Hypercube.
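To give a feel for how the interconnect topologies in the last point differ, here is a minimal Python sketch that models two of them as plain graphs and counts links. It is an illustration only, not taken from the OAI specification. It assumes "FC" means fully connected and models it with 8 modules, and it assumes the 4D hypercube spans 16 modules. The function names are hypothetical, and HCM is left out because its wiring is not described here.

```python
from itertools import combinations


def fully_connected(n):
    """Links of a fully connected topology: every pair of the n modules is linked."""
    return {frozenset(pair) for pair in combinations(range(n), 2)}


def hypercube(dim):
    """Links of a dim-dimensional hypercube over 2**dim modules.

    Two modules are linked when their indices differ in exactly one bit.
    """
    links = set()
    for node in range(2 ** dim):
        for bit in range(dim):
            links.add(frozenset((node, node ^ (1 << bit))))
    return links


def degree(links, node):
    """Number of links that terminate at the given module."""
    return sum(1 for link in links if node in link)


# Assumed sizes, for illustration only: 8 modules for FC, 16 for the 4D hypercube.
fc8 = fully_connected(8)
cube4 = hypercube(4)

print("FC, 8 modules:", len(fc8), "links,", degree(fc8, 0), "per module")
print("4D hypercube, 16 modules:", len(cube4), "links,", degree(cube4, 0), "per module")
```

Under these assumptions, a fully connected group needs one port per peer on every module, while a hypercube keeps the per-module port count at the number of dimensions, at the cost of multi-hop paths between some modules.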

One of the first product offerings to benefit from the specification's development is the Baidu X-MAN 4.0, a system jointly developed with Inspur and the latest AI computing system from a company that continues to be a leading proponent of openness in its product development. The evolution of the OAI specification started with the OCP Accelerator Module (OAM) specification, with contributions from Facebook, Microsoft and Baidu. From that point, it became clear that there was a desire to expand the specification to an infrastructure in which the whole rack and system could perform with increased interoperability. Working under the auspices of OCP, the OAI subgroup focused on how best to support diversified accelerators. As a result, manufacturers are provided greater choice in an open ecosystem that will ultimately bring notable benefits to developers and end users of AI applications.

Richard Ding, AI System Architect from Baidu, also commented: “OCP is a very good platform for the people and users and system integrators, as well as chip providers to work on one stage. For Baidu, OCP was the platform where we could better identify our requirements, discover how we could work together with our partners, even sometimes our competitors, and define a kind of standard that can benefit the entire ecosystem. Overall it was a positive experience, resulting in the development of our latest full-rack AI computing product, X-MAN 4.0.”

The scope of the OAI subgroup's work included defining the physical modules, covering electrical, mechanical, thermal, management, hardware security and physical serviceability aspects, to produce solutions that are compatible with existing operating systems and that allow for the creation of frameworks for running heterogeneous accelerator applications. Moving forward, there is growing industry consensus that encouraging the specification's adoption, along with further practical application testing, will allow ongoing advancements in the AI ecosystem to be achieved through standardization.

Conclusion:

The OAI project is built around the concept of a modular architecture that can readily support different accelerators and scale-up interconnect communication across multiple systems. The task ahead is to promote its application and garner increased support from industry in order to achieve scale both across the high-performance computing ecosystem and in vertical markets. As the standard gains practical significance, real deployments can test the strengths and weaknesses of the specification so that the standard's technology can be upgraded to meet the needs of real-world AI computing scenarios. Inspur is committed to the continued advancement of the OAI standard's scalability and to supporting its broader market adoption.
