Caffe-MPI: Caffe with support for MPI parallel programming techniques

Product Overview:

Dr. Jia Yangqing, a graduate of UC Berkeley, first released stand-alone Caffe as open source on GitHub in December 2013.

Since then, Caffe has been updated and maintained mainly by contributors from the Berkeley Vision and Learning Center (BVLC).

With strong versatility, high performance and good code readability, Caffe can be applied in face recognition, image classification, object identification, and other image processing tasks, making it one of the most popular deep learning frameworks in the world.

However, as training models grow more complicated and training sets grow rapidly, stand-alone operation is no longer sufficient.

To solve this problem, Inspur took the lead in parallelizing the Caffe computing framework and developed a version that trains across multiple machines and multiple GPU cards. At the GPU Technology Conference (GTC) 2015, Inspur launched the first Caffe-MPI deep-learning framework and released all of its code as open source (Open-source address:

The Inspur-developed Caffe-MPI is a cluster-parallel version of Caffe with support for MPI parallel programming techniques, built on the foundation of the BVLC Caffe framework.

Designed around the capacity that large-scale image training with Berkeley's Caffe requires, Caffe-MPI aims to maximize Caffe's training performance through parallel data processing and multi-tasking. Inspur Caffe-MPI can run on large-scale cluster platforms, including GPU, KNL (Intel Knights Landing), and CPU clusters.
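One common reading of "parallel data processing" is data-parallel training: each worker computes a gradient on its own shard of the batch, and the gradients are averaged before the weights are updated. The sketch below simulates that idea in plain Python so it runs without an MPI installation. It is an illustration under stated assumptions, not Caffe-MPI's actual implementation: the function names (gradient, split, allreduce_mean, data_parallel_step) and the toy one-parameter model are all hypothetical.

```python
# Simulated data-parallel SGD step in plain Python (no MPI required).
# All names are illustrative; none of them are Caffe-MPI APIs.

def gradient(w, samples):
    """Mean gradient of 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in samples) / len(samples)

def split(samples, n_workers):
    """Partition samples into n_workers equal contiguous shards.

    Assumes len(samples) is divisible by n_workers; any remainder
    would be dropped.
    """
    size = len(samples) // n_workers
    return [samples[i * size:(i + 1) * size] for i in range(n_workers)]

def allreduce_mean(values):
    """Stand-in for an MPI allreduce (sum) divided by the worker count."""
    return sum(values) / len(values)

def data_parallel_step(w, samples, n_workers, lr):
    """One SGD step in which each simulated worker handles one shard."""
    shards = split(samples, n_workers)
    local_grads = [gradient(w, shard) for shard in shards]
    return w - lr * allreduce_mean(local_grads)

# Toy data set: y = 3x, 8 samples.
samples = [(float(x), 3.0 * x) for x in range(1, 9)]
w_single = data_parallel_step(0.0, samples, n_workers=1, lr=0.01)
w_multi = data_parallel_step(0.0, samples, n_workers=4, lr=0.01)
```

With equal-sized shards, the averaged shard gradients match the full-batch gradient, so w_single and w_multi agree up to floating-point rounding. On a real cluster the averaging step would be a collective operation across processes rather than a local function call.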

With sound inheritance and usability, Caffe-MPI keeps the characteristics of the original Caffe while offering high performance and scalability.



Main features

1. Complete HPC system solution

The hardware, composed of Lustre storage, an InfiniBand (IB) network, and GPU clusters, provides high I/O throughput, high-speed IB interconnection, and large-scale parallel GPU training.


2. High performance and scalability

Training on multiple machines with multiple GPU cards at once can substantially improve performance compared to a single machine with only one GPU.

It can be deployed on large-scale training platforms to train on large data sets. For the GoogLeNet model, Caffe-MPI is 10 times faster than the single-GPU version.
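A speedup figure is easier to interpret alongside the number of GPUs used, which the text above does not state. The short sketch below shows how speedup and parallel efficiency are usually computed. The timings and the GPU count of 16 are hypothetical placeholders chosen only to reproduce a 10x speedup; they are not measurements from Caffe-MPI.

```python
# Speedup and parallel efficiency from wall-clock training times.
# The numbers used below are hypothetical, not Caffe-MPI measurements.

def speedup(t_single, t_parallel):
    """How many times faster the parallel run is than the single-GPU run."""
    return t_single / t_parallel

def parallel_efficiency(speedup_value, n_gpus):
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup_value / n_gpus

s = speedup(100.0, 10.0)          # hypothetical timings giving a 10x speedup
e = parallel_efficiency(s, 16)    # hypothetical cluster of 16 GPUs
```

Under these assumed numbers the efficiency is 0.625, meaning the cluster would deliver 62.5% of ideal linear scaling. The same 10x speedup on fewer GPUs would indicate higher efficiency.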


3. Good inheritance and usability

Caffe-MPI has kept all the characteristics of the original Caffe, which is user-friendly, fast, modular, and open, providing the best experience for users. Caffe-MPI is currently attracting wide attention from companies and research institutions in China, India, and the US.

With the rapid development of artificial intelligence, some scenarios depicted in science fiction movies are coming closer to reality. If algorithms can exceed man-made limits in interacting with the world we live in, real artificial intelligence will become a reality. In this context, deep learning and supercomputers will play a key role.