Inspur AS10000 Powers the PB-Level Storage Project of the State Administration of Press, Publication, Radio, Film and Television
Opportunities and Challenges for China’s Radio and Television Industry from the Integration of the 3 Networks
The State Administration of Press, Publication, Radio, Film and Television (SAPPRFT) is responsible for censoring programs broadcast on TV, the internet, new media and radio throughout the country. The material under review is so varied and so voluminous that unified data management is very difficult. Integrating material on this scale for storage, management and sharing has long been a technical problem in radio- and television-related industries. The Inspur Mass Storage System is the first in the field to solve this problem, clearing a stubborn hurdle that had obstructed the development of mass data storage management in the radio and television industry. The successful application of mass storage technology is expected to set off another technical revolution in the industry.
In the second half of 2011, SAPPRFT set up a new censoring agency, with about a dozen sub-agencies deployed in different parts of the country. Its purposes are to better censor programs on media such as cable TV, the internet and new media; to establish, improve and supervise a safe broadcasting system; and to handle management, resource redeployment and emergency response across the nationwide radio and television transmission networks in the event of emergencies. The sub-agencies work in concert with SAPPRFT to monitor radio, cable TV, satellite TV, wireless radio, internet and new media nationwide. They are the “eyes and ears” of China’s propaganda enterprise.
With the integration of the networks and the fast development of satellite TV, internet programs and new media, demand for mass storage systems is huge. Take the IPTV broadcasting platform as an example. The platform connects with integrated downstream sub-platforms located in various pilot areas and distributes content, products, value-added services, EPG information and so on. The storage system receives and stores content submitted by the content platform and collects operating data from the integrated broadcasting sub-platforms in the pilot areas. The stored data volume is already 500TB and data access bandwidth has reached 1GB, and this comes only from live-stream recording and video-on-demand downloads on about 10 broadcasting platforms. At the beginning of 2012, 42 more cities, including 2 municipalities and 22 provincial capitals, were brought under surveillance. With such a large surveillance volume, it is small wonder that a system with greater data processing and storage capability is urgently needed. The Inspur Mass Storage System, with its large capacity, high bandwidth and 99.999% reliability, addresses the problems that have been plaguing the radio and television industry: a large number of operational systems, a heavy surveillance load, massive concurrent access, and data sharing between different business systems. It is a response to the call of the era and is expected to bring another revolution in the informatization of the radio and television industry.
Storage Pressure from Massive Data Volumes
Since the integration of the 3 networks, the radio and television industry has faced an unprecedented opportunity for development. Its business is expanding from traditional cable TV, satellite TV and wireless radio into emerging areas such as internet TV, IPTV and new media. The themes of the industry’s current development are digitalization, networking and high definition. For the surveillance agencies of the radio and television industry, the highest hurdle in developing censoring technology is PB-level data processing and storage. To quote a person in charge at SAPPRFT, “the data volume generated since the integration of the 3 networks is even bigger than everything generated in the past 50 years combined. How to store and manage such a huge volume of data is something we will need to focus on for a fairly long time to come.”
Difficulty in Data Management due to Coexisting but Separate Business Systems
In recent years, new media businesses such as IPTV, internet TV, mobile TV and mobile multimedia radio and television have been booming. Every provincial-level TV station, the 7 broadcasting platforms approved by SAPPRFT, and local new media TV operators are all building their own new media businesses in an effort to gain a foothold in this emerging market. The industry is full of business systems that are complex in composition, separate from one another, and scattered across the country. To exert effective censoring power over the emerging new media, SAPPRFT has had no choice but to build a separate censoring system for each of them. However, a large part of the data from the different business systems overlaps, which wastes data resources and makes data sharing and unified management between business systems difficult.
Mass Storage Technology for Big Data
After in-depth research into the existing storage systems used by SAPPRFT, Inspur found widespread problems: undersized storage capacity, difficulty sharing data between different business systems, poor scalability, inefficient space utilization, high data security risk, and cumbersome data management. A mass storage system with large capacity, excellent scalability, low data security risk and high data access bandwidth was urgently needed to support business development.
The SAPPRFT Mass Storage System for Censoring Agencies consists of 3 parts: an online storage system, a database system, and a backup system. It stores data for business systems such as the radio surveillance system, new media system, resources sharing system, information platform business system, satellite TV data system, and broadcasting system. The detailed system diagram is as below:
The deployment of the Mass Storage System makes “information islands” a thing of the past. It enables data sharing between different business systems, organizes data from different systems in an orderly way for convenient storage, improves the security of data storage and management, and meets the growing data needs of SAPPRFT’s censoring agencies for years to come. During Stage 1 of construction in 2012, the PB-level online storage space and the offline backup system were installed successfully. As new media business developed rapidly in the radio and television industry, SAPPRFT launched Stage 2 of the Mass Storage System in June 2013. Building on Stage 1, the second stage expanded capacity by nearly 1PB, and total data transmission bandwidth reached 5GB. The completed Mass Storage System addresses problems the radio and television industry has long suffered from: big data storage, big data management, and difficulty sharing data between different operational systems. It gives the fast-developing industry an extra pair of wings.
Take the censoring system for TV series as an example. Once the Mass Storage System was deployed and running, the need for large capacity to store the TV series submitted for censorship each year was met immediately. The large storage capacity and strong data-sharing capability solved the problems of basic storage and sharing between users, headaches that had vexed users for years. Users have begun to shift their focus to more practical, higher-level services. High bandwidth and fast read-write speed have considerably raised the efficiency of the censoring process, reducing manpower and communication costs. For the system itself, the distributed deployment method, which separates the business module from the data module, has made development and integration much easier: the system is simple to build and easy to understand. Scalability has also improved.
Analysis of Solution Value and Advantages
This solution provides an AS10000 storage system with 10 redundant controllers. With this platform in place, the system delivers a stable storage bandwidth of 3.5-4GB, and its performance and capacity can be expanded seamlessly by adding AS10000 controllers and storage units.
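As a rough illustration of the expansion claim, the sketch below scales the quoted 10-controller bandwidth range to other controller counts. The linear-scaling model, the function name and its parameters are assumptions made for this example, not published AS10000 specifications; real scaling depends on the network and the workload.

```python
def estimate_bandwidth_gb(controllers, base_controllers=10, base_range_gb=(3.5, 4.0)):
    """Scale the quoted bandwidth range with controller count.

    Assumption: bandwidth grows linearly with the number of controllers.
    The baseline (10 controllers, 3.5-4GB) is the figure quoted in the text.
    """
    if controllers <= 0:
        raise ValueError("controllers must be positive")
    low, high = base_range_gb
    factor = controllers / base_controllers
    return (low * factor, high * factor)


if __name__ == "__main__":
    for n in (10, 12, 14):
        low, high = estimate_bandwidth_gb(n)
        print(f"{n} controllers: about {low:.2f}-{high:.2f}GB (assumed linear scaling)")
```

An estimate like this is only a planning aid; measured bandwidth after an expansion should replace it.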
The architecture of the AS10000 is entirely modular. The links between controllers are all designed to be redundant and distributed. The storage uses a stable and reliable dual-controller design. The connection between each controller and data storage unit has no single point of failure in either its interfaces or its cabling, so both the data processing units and the storage units can be upgraded online and modules can be added as needed.
A subsequent upgrade of the system will make it easy to back up important data. If data is lost because of a software or hardware fault, an attack by a virus or hackers, or an operator error, the important data can be retrieved from the backup equipment.
Building the back-end storage platform on AS10000 distributed storage achieves the following objectives:
1. Secure, Independent and Controllable: The high prices charged by foreign manufacturers are a key obstacle to China’s effort to build, quickly and at low cost, a storage platform for data management and security. At the same time, important data bearing on national resources and security, such as ecological monitoring, mineral resource prospecting, disaster control and radio and television applications, would be at great risk if not held by China itself. After years of research and development, Inspur now offers the AS10000 Mass Storage System, which has solved numerous technical problems such as multi-controller coordination and a global shared cache. This makes China the third country in the world, after the US and Japan, to possess core high-end storage technology, giving data autonomy, security and controllability.
2. Open Architecture (high scalability): The system is open in the internal composition of its distributed storage. Generally speaking, distributed storage consists of 3 components: controllers, a front-end network and a back-end network. Each component can readily adopt the latest technology in the field without changing the architecture of the distributed storage, and expanding storage capacity is as easy as adding a Lego block. This is especially helpful for users who cannot predict how their needs will develop: they can purchase only a few controllers and storage units to start with and expand when needed, without interrupting normal business operation.
3. Self-developed Distributed Operating System: The distributed operating system is the soul of the AS10000 storage system. Every request to the storage is gathered by the distributed operating system, which coordinates and redirects it to the appropriate AS10000 controllers. The advantage of the distributed operating system is that controllers do not differ in priority or function: all controllers perform exactly the same role, which allows optimal performance. Furthermore, because the system is fully distributed, the failure of any one controller has an almost negligible effect on the AS10000 storage as a whole, and the rest of the system remains functional.
4. Uniform Namespace: The idea of a uniform namespace is not new to many storage manufacturers. In the context of distributed storage, the emphasis is on a uniform namespace within a single file system. At PB-level capacity, if the uniform namespace is achieved by mounting several capacity-limited volumes under 1 root directory, efficiency and performance when storage hotspots appear will be much lower than when the PB-level capacity is placed in a single file system.
5. Easy Management: Currently, storage in the industry is managed either through a management tool provided by the manufacturer or through a Web interface, and clients must install extra software to access the storage space. As storage capacity grows, so do the complexity of managing it and the manpower required. Distributed storage instead offers an integrated, simple way to manage data. It has no impact on clients, which access the distributed storage through the field’s standard access protocols. Moreover, as the capacity of the distributed storage increases, the customer does not need to assign extra manpower to management, saving time and effort.
6. Load Balancing: Through the distributed operating system, the distributed storage achieves load balancing at both the front end and the back end. At the front end, accesses are distributed across the controllers of the distributed storage under several load balancing strategies, effectively reducing the load on each controller. At the back end, data is distributed to all the controllers for storage and reading through the open architecture and back-end network. Every read or write operation is served by more than 1 disk, which substantially improves read-write performance.
7. High Performance: Discussion of high performance currently centers on high-bandwidth, high-concurrency workloads, where distributed storage clearly outperforms traditional storage architectures. However, today’s applications also include other categories, such as high-IOPS, random-access, small-file, and backup and archiving workloads. Distributed storage should provide high-performance solutions for these categories as well.
8. Data Processing Capability: The AS10000 supports 3 access options: 40Gb InfiniBand (IB), 10Gb Ethernet and 1Gb Ethernet. In particular, deploying IB or 10Gb Ethernet effectively improves the read-write responsiveness of front-end servers. The system can satisfy storage access requests from different business systems regardless of file size.
9. High Data Reliability: The Inspur AS10000 storage system uses a storage architecture different from that of traditional storage. Its distributed layout reflects a design focus on data reliability. The system provides multi-level data protection and failover, with protection ranging from the hard disk to the controller. At the hard-disk level, RAID1, RAID5 or RAID6 ensures that the failure of 1 hard disk does not cause any data loss. Inside the storage controllers, the RAID algorithm is chosen according to object size: RAID5/RAID6 for large files (over 1MB) and RAID1 for smaller files (not over 1MB). The controllers follow a redundant multi-controller design, so if several controllers fail, the remaining available controllers take over their work. The system networking is fully redundant: both the operational links between the storage and the business systems and the links between controllers inside the storage system use redundant networking to prevent a single point of failure.
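The front-end behaviour described in items 3 and 6 above can be pictured with a small sketch: all controllers are equivalent, each request goes to one of them, and a failed controller is simply skipped. The class name, the least-loaded strategy and the controller names are assumptions for illustration; the source says only that "several load balancing strategies" are used and does not describe the AS10000 implementation.

```python
class ControllerPool:
    """Toy front-end balancer over symmetric controllers (illustrative only)."""

    def __init__(self, names):
        # Every controller starts with zero outstanding requests.
        self.load = {name: 0 for name in names}
        self.healthy = set(names)

    def mark_failed(self, name):
        """Remove a controller from rotation; the others keep serving."""
        self.healthy.discard(name)

    def dispatch(self):
        """Send one request to the least-loaded healthy controller."""
        if not self.healthy:
            raise RuntimeError("no healthy controllers available")
        # Sorting first makes tie-breaking deterministic (alphabetical order).
        target = min(sorted(self.healthy), key=lambda n: self.load[n])
        self.load[target] += 1
        return target


if __name__ == "__main__":
    pool = ControllerPool(["ctrl-a", "ctrl-b", "ctrl-c"])
    print([pool.dispatch() for _ in range(3)])
    pool.mark_failed("ctrl-b")
    print([pool.dispatch() for _ in range(4)])
```

Because no controller is special, removing one from rotation needs no reconfiguration of the others, which is the point the text makes about a fully distributed design.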
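The size-based protection rule in item 9 above can be written down as a short sketch. The 1MB threshold and the RAID levels come from the text; the function name, the binary definition of 1MB, and the flag that picks between RAID5 and RAID6 are assumptions for illustration, since the source does not say how that choice is made.

```python
MB = 1024 * 1024  # assumption: 1MB is taken as 2**20 bytes


def choose_protection(size_bytes, prefer_double_parity=False):
    """Pick a RAID level for an object by size, following the rule in the text.

    Files over 1MB use parity RAID (RAID5 or RAID6); files not over 1MB
    use RAID1 mirroring. `prefer_double_parity` is a hypothetical switch.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > MB:
        return "RAID6" if prefer_double_parity else "RAID5"
    return "RAID1"


if __name__ == "__main__":
    for size in (4 * 1024, MB, 50 * MB):
        print(size, choose_protection(size))
```

Mirroring small objects avoids parity read-modify-write overhead on small writes, while parity RAID keeps the capacity overhead low for large files; this is a common rationale for such a split, not a statement from the source.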