This example explains how the Reference Model is used in a pilot project that investigates big data strategies for the EISCAT 3D research infrastructure. The Reference Model serves as a knowledge base to guide various research activities.
EISCAT, the European Incoherent Scatter Scientific Association, was established to conduct research on the lower, middle and upper atmosphere and ionosphere using the incoherent scatter radar technique. This technique is the most powerful ground-based tool for these research applications. A next generation incoherent scatter radar system, EISCAT 3D, is being designed. The multi-static radars to be used will be a tool to carry out plasma physics experiments in the natural environment, a novel atmospheric monitoring instrument for climate and space weather studies, and an essential element in multi-instrument campaigns to study the polar ionosphere and magnetosphere. It will be a world-leading international research infrastructure, using the incoherent scatter technique to study how the Earth's atmosphere is coupled to space.
The design of EISCAT 3D opens up opportunities for physicists to explore many new research fields. However, it also introduces significant challenges in handling the large volumes of experimental data that will be generated at high speed. During its first operation stage in 2018, EISCAT 3D will produce 5 PB of data per year, and the total data volume will rise to 40 PB per year in its full operations stage in 2023. This challenge is typically referred to as a big data problem and requires solutions beyond the capabilities of conventional database technologies.
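To put these annual volumes in perspective, the short sketch below converts them into the average sustained data rate they imply. It assumes decimal SI units (1 PB = 10^15 bytes), a 365-day year and data arriving evenly over the year; real instruments ingest in bursts, so peak rates will be higher than these averages.

```python
# Convert annual data volumes into the average sustained rate they imply.
# Assumptions: decimal SI units (1 PB = 10**15 bytes), a 365-day year,
# and data spread evenly over the whole year (no bursts).

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s
PETABYTE = 10**15                   # bytes, decimal SI


def sustained_rate_gb_per_s(petabytes_per_year: float) -> float:
    """Average rate in GB/s (10**9 bytes/s) needed to ingest the annual volume."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR / 10**9


for stage, volume_pb in [("first operation stage", 5), ("full operations stage", 40)]:
    rate = sustained_rate_gb_per_s(volume_pb)
    print(f"{stage}: {volume_pb} PB/year ~ {rate:.2f} GB/s")
# first operation stage: 5 PB/year ~ 0.16 GB/s
# full operations stage: 40 PB/year ~ 1.27 GB/s
```

In other words, the full operations stage corresponds to well over a gigabyte per second of sustained ingest, around the clock, before any peaks are considered.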
EISCAT is currently considering the use of e-Science technologies to deliver strategies for handling its big data products. Advanced e-Science infrastructure projects such as EGI and PRACE, and their enabling technologies, are making large-scale computational capacities more accessible to researchers of all scientific disciplines. Emerging infrastructures, such as the cloud systems proposed by the Helix Nebula project and the EGI Federated Cloud Task Force, or the data infrastructure to be provided by EUDAT, will extend these possibilities even further.
As a potential e-Science partner for EISCAT, we present EGI. EGI was established in 2010 as a Europe-wide federation of national computing and storage resources. The EGI collaboration is coordinated by EGI.eu, a not-for-profit foundation created to manage the infrastructure on behalf of its participants: National Grid Initiatives (NGIs) and European Intergovernmental Research Organisations. Resources in EGI are provided by about 350 resource centres from the NGIs, distributed across 55 countries in Europe, the Asia-Pacific region, Canada and Latin America. These providers operate more than 370,000 logical CPUs, 248 PB of disk and 176 PB of tape capacity (June 2013 statistics) to drive research and innovation in Europe and beyond.
Since February 2013, a pilot project within ENVRI has established a partnership between EISCAT, EGI and EUDAT, aiming to identify and provide solutions that directly benefit EISCAT 3D and that can also be reused in other ESFRI projects involved in ENVRI. ENVRI WP3 has been involved in this investigation and uses the Reference Model to guide various research activities, including:
Through these tasks, the Reference Model is proving useful as a knowledge base that can be consulted when conducting various system analysis and design activities.
In the following, we describe how the Reference Model is used to conduct several system analysis tasks.
The initial challenge for the pilot project is to understand the EISCAT 3D data infrastructure. The existing design documents of EISCAT 3D have focused on incoherent scatter radar technologies. As shown in Figure 1, the data infrastructure is embedded within the overall design of the observatory system, in a way that is difficult to understand for computer scientists and technologists with little background in physics.
Figure 1: The original design of EISCAT 3D data infrastructure is embedded within the overall observatory system design
We use the ENVRI Common Subsystem framework to decompose the computational elements and to clarify the boundary between the radar network and the data infrastructure, which results in Figure 2. This diagram, rather than Figure 1, is now frequently used in presentations and discussions of the EISCAT 3D data infrastructure.
Figure 2: Using the five ENVRI Common Subsystems to interpret the EISCAT 3D data infrastructure makes it easier to communicate with computer scientists and technologists
Figure 2 illustrates that the EISCAT 3D functional components can be placed into two ENVRI common subsystems: data acquisition (subsys_acq) and data curation (subsys_cur). Briefly, in the acquisition subsystem, raw signal voltage data will be generated by the antenna Receivers at a rate of 125 TB/h and temporarily stored in a Ring buffer. A second stream of RF signal voltages will be passed to a Beam-former to generate beam-formed data (1 MHz). The beam-formed data will then be processed by a Correlator to generate correlation analysis data using standard methods, and the correlation data will be delivered to a Fitter to produce the fitted data (1 GB/year). To support different user requirements, EISCAT 3D will allow users to access and process the raw voltage data in the Ring buffer and to generate specialised products using their own analysis algorithms. Both raw data and their products will be stored in Intermediate storage (11 PB/year), from which they will be delivered to the central site within the curation subsystem.
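The rates quoted above explain why raw voltages can only sit temporarily in the Ring buffer. The back-of-envelope sketch below compares the Receiver output with the volume written to Intermediate storage. It assumes decimal units, a 365-day year and continuous, year-round recording; the actual operating duty cycle is not given here, so if the radar records less than full time, the retained fraction is correspondingly higher.

```python
# Compare the raw Receiver rate with the volume kept in Intermediate storage,
# using the figures quoted in the text (125 TB/h and 11 PB/year).
# Assumptions: decimal units, 365-day year, continuous 24/7 recording.

HOURS_PER_YEAR = 365 * 24        # 8,760 h
RAW_RATE_TB_PER_HOUR = 125       # raw voltage data from the antenna Receivers
INTERMEDIATE_PB_PER_YEAR = 11    # raw data plus products in Intermediate storage

raw_tb_per_second = RAW_RATE_TB_PER_HOUR / 3600
raw_pb_per_year = RAW_RATE_TB_PER_HOUR * HOURS_PER_YEAR / 1000  # TB -> PB
retained_fraction = INTERMEDIATE_PB_PER_YEAR / raw_pb_per_year

print(f"raw rate: {raw_tb_per_second * 1000:.1f} GB/s")
print(f"raw volume if recorded continuously: {raw_pb_per_year:.0f} PB/year")
print(f"fraction kept in Intermediate storage: {retained_fraction:.2%}")
# raw rate: 34.7 GB/s
# raw volume if recorded continuously: 1095 PB/year
# fraction kept in Intermediate storage: 1.00%
```

Under these assumptions, roughly 99% of the raw voltage stream must be discarded or reduced (beam forming, correlation, fitting) close to the Receivers, which is consistent with the Ring buffer being only a temporary store.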
In the curation subsystem, Long-Term Storage will preserve the raw voltage data and their products. A High Performance Computer will be used for data searching and processing (e.g., beam forming, lag profiling or other correlation, and parameter fitting). Searching facilities will enable users to search over all data products and to identify significant data signatures. A Multi-static fitter will be installed to process the stored raw voltage data and generate 3D plasma parameters, which will then be stored back in Long-Term Storage. A complete copy of the Long-Term Storage data will be established at mirror sites, together with the related data processing and searching tools.
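For readers without a radar background, the Correlator and the lag profiling step both reduce complex voltage samples to correlation estimates at a series of time lags. The sketch below is a generic, textbook lag-profile (autocorrelation) estimator, included only to make that step concrete; it is an illustrative assumption, not the EISCAT 3D correlator, whose methods and parameters are not described here.

```python
import cmath


def lag_profile(samples: list[complex], max_lag: int) -> list[complex]:
    """Estimate R[k] = mean over n of x[n + k] * conj(x[n]) for k = 0 .. max_lag - 1.

    Generic unbiased autocorrelation estimator on complex voltage samples:
    each lag k is averaged over the N - k sample pairs available for it.
    """
    n = len(samples)
    if not 0 < max_lag <= n:
        raise ValueError("max_lag must be between 1 and len(samples)")
    profile = []
    for k in range(max_lag):
        pairs = n - k
        acc = sum(samples[i + k] * samples[i].conjugate() for i in range(pairs))
        profile.append(acc / pairs)
    return profile


# Usage: a pure tone that advances by a quarter turn per sample.
x = [cmath.exp(1j * cmath.pi / 2 * n) for n in range(64)]
r = lag_profile(x, 4)
# Up to rounding, r[k] equals exp(i * pi/2 * k): 1, 1j, -1, -1j
```

The output is much smaller than the input (a handful of lags instead of every sample), which is the kind of reduction that lets correlation and fitted products be kept long term while raw voltages are not.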
While the design specification clearly covers two of the five common subsystems described in the ENVRI Reference Model, the functionalities of the other three subsystems are currently missing, most likely because of resource limitations. However, these three subsystems are crucial for a big data system such as EISCAT 3D. Without services to support data discovery, access, processing and the user community, the value of EISCAT 3D big data cannot be unlocked, and expensively generated and archived scientific data will be useless.
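Once each component is assigned to a common subsystem, finding the gaps amounts to a set difference. The minimal sketch below assumes the five common subsystems of the ENVRI Reference Model are data acquisition, curation, access, processing and community support, and assigns the EISCAT 3D components exactly as described above; the dictionary is an illustrative data structure, not an ENVRI artefact.

```python
# Gap analysis at the level of the five ENVRI common subsystems.
# The component-to-subsystem assignment follows the description of Figure 2;
# the data structure itself is a hypothetical illustration.

ENVRI_SUBSYSTEMS = {
    "Data Acquisition", "Data Curation", "Data Access",
    "Data Processing", "Community Support",
}

EISCAT3D_COMPONENTS = {
    "Receivers": "Data Acquisition",
    "Ring buffer": "Data Acquisition",
    "Beam-former": "Data Acquisition",
    "Correlator": "Data Acquisition",
    "Fitter": "Data Acquisition",
    "Intermediate storage": "Data Acquisition",
    "Long-Term Storage": "Data Curation",
    "High Performance Computer": "Data Curation",
    "Searching facilities": "Data Curation",
    "Multi-static fitter": "Data Curation",
    "Mirror sites": "Data Curation",
}


def missing_subsystems(components: dict[str, str]) -> set[str]:
    """Return the common subsystems to which no component is assigned."""
    used = set(components.values())
    unknown = used - ENVRI_SUBSYSTEMS
    if unknown:
        raise ValueError(f"not an ENVRI common subsystem: {unknown}")
    return ENVRI_SUBSYSTEMS - used


print(sorted(missing_subsystems(EISCAT3D_COMPONENTS)))
# ['Community Support', 'Data Access', 'Data Processing']
```

The check only works at subsystem granularity: it shows which subsystems are empty, not which individual functions inside a covered subsystem are still missing.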
Using the Reference Model as an analysis tool, we identified the missing pieces of the design specification, which set the direction for future investigation.
We need to understand the functionalities of EGI services and how to integrate them to support the EISCAT 3D requirements.
The EGI e-Infrastructure enables a set of generic services, including:
As shown in Table 1, by examining the functionalities of the EGI services and mapping them to the computational objects of the ENVRI Reference Model, we find that these services fall into two ENVRI common subsystems: Curation and Community Support.
Table 1: Mapping EGI Services to the Reference Model Elements (from Computational Viewpoint perspective)
ENVRI-RM Computational Objects
ENVRI Common Subsystem
AMGA Metadata catalogue
LFC File catalogue
File Transfer Service
Portal for application development & hosting
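Placing both sides in the same framework turns the first integration question into a simple set comparison. The minimal sketch below restates the subsystem coverage given in the text (EISCAT 3D: acquisition and curation; EGI services in Table 1: curation and community support) and computes where the two overlap and what EGI adds; it works only at the coarse granularity of subsystems and says nothing about individual services or interfaces.

```python
# Subsystem-level view of where EGI services and EISCAT 3D meet.
# Inputs restate the text; the set names are illustrative.

EISCAT3D = {"Data Acquisition", "Data Curation"}
EGI_SERVICES = {"Data Curation", "Community Support"}

shared = EISCAT3D & EGI_SERVICES          # subsystems where both must interoperate
added_by_egi = EGI_SERVICES - EISCAT3D    # capabilities EGI could contribute

print("shared:", sorted(shared))
print("added by EGI:", sorted(added_by_egi))
# shared: ['Data Curation']
# added by EGI: ['Community Support']
```

The overlap in curation suggests where the two infrastructures would connect, while community support is a capability EGI could contribute that the EISCAT 3D design does not yet cover.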
The above analysis suggests a solution for integrating EGI technologies into the EISCAT 3D data infrastructure. As depicted in Figure 3, a secondary CV Data Curation (seen as the mirror site of the EISCAT 3D central archive in Figure 1) can be established using the EGI infrastructure and its services. Data from the EISCAT 3D central archive (or the acquisition subsystem) can be staged into EGI storage and managed using the LFC File Catalogue and the AMGA Metadata Catalogue. At the front end, an EISCAT science gateway, seen as part of a CV Data Use, can be established to provide access control (e.g., authentication, authorisation and single sign-on) and application portals (e.g., into which processing and data-mining applications from EISCAT 3D can be plugged).
Figure 3: An integrated infrastructure of EGI and EISCAT 3D
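In grid terms, a file catalogue such as LFC maps a logical file name to the physical replicas of that file, while a metadata catalogue such as AMGA attaches queryable attributes to catalogue entries. The in-memory sketch below illustrates how staged EISCAT 3D data could be registered and found under that split of responsibilities. It is hypothetical throughout: the class, method names, logical file name and storage URLs are invented for illustration and do not use or mimic the real LFC or AMGA APIs.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One logical file: its replicas (file-catalogue role) and metadata (metadata-catalogue role)."""
    replicas: set[str] = field(default_factory=set)
    metadata: dict[str, str] = field(default_factory=dict)


class ToyCatalogue:
    """Hypothetical in-memory stand-in for a file catalogue plus a metadata catalogue."""

    def __init__(self) -> None:
        self.entries: dict[str, CatalogueEntry] = {}

    def register_replica(self, logical_name: str, location: str) -> None:
        """Record that a copy of the logical file exists at the given location."""
        self.entries.setdefault(logical_name, CatalogueEntry()).replicas.add(location)

    def set_metadata(self, logical_name: str, **attributes: str) -> None:
        """Attach attributes to an already registered file (KeyError if unknown)."""
        self.entries[logical_name].metadata.update(attributes)

    def find(self, **criteria: str) -> list[str]:
        """Return logical names whose metadata matches every given attribute."""
        return sorted(
            name for name, entry in self.entries.items()
            if all(entry.metadata.get(k) == v for k, v in criteria.items())
        )


# Usage: stage a file from the central archive to EGI storage and describe it.
cat = ToyCatalogue()
lfn = "/eiscat3d/fitted/2018/run-001.h5"                    # hypothetical logical name
cat.register_replica(lfn, "central-archive://run-001.h5")   # hypothetical locations
cat.register_replica(lfn, "egi-storage://site-a/run-001.h5")
cat.set_metadata(lfn, product="fitted", year="2018")
print(cat.find(product="fitted"))
# ['/eiscat3d/fitted/2018/run-001.h5']
```

Keeping replica locations and descriptive metadata in separate structures mirrors the division of labour between the two catalogues: users search by scientific attributes, then resolve the logical name to whichever replica is nearest.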
Using the Reference Model, functional elements of both EISCAT 3D and EGI can be placed into a uniform framework, which provides a way of thinking about the construction of the integrated infrastructure.
Using the common framework provided by the Reference Model, we can compare the generic service infrastructures of EGI and EUDAT with the requirements of a domain-specific data infrastructure such as EISCAT 3D. This comparison reveals significant gaps between them, including but not limited to:
In this example, we have shown that the Reference Model can be used to conduct various system analysis tasks. Using the Reference Model, we have:
We have shown that the Reference Model offers a research infrastructure: