
1. Background

1.1 Short description

Particle formation is an atmospheric process whereby, at specific spatial locations, aerosol particles form and grow in diameter over the course of a few hours. Particle formation is studied for its role in climate change and human respiratory health.


The use case aims primarily to (1) harmonize the information describing particle formation; (2) represent this information, specifically the meaning of data, using an appropriate computer language; and (3) acquire and curate the information in infrastructure.

1.2 Contact

Background | Contact Person | Organization | Contact email
ICT | Markus Stocker | TIB,
RI-Domain | Jaana Bäck | University of
e-Infrastructure | Yann Le
ICT | Robert Huber | UniHB,

1.3 Use case type

Test Case

1.4 Scientific domain and communities

Scientific domain

  • Data Use, Data Acquisition (primarily)
  • Data Publication (possibly)

Relevant Data Use Community Behaviors


  • Data Publication: Performed by the RI, this behavior provides information describing particle formation by following specified publication and sharing policies.
  • Semantic Harmonisation: To ensure that the information describing particle formation is semantically harmonized, this behavior is performed by software agents of the scientific workflow before Data Collection. Community agreement on the ontology design pattern is assumed.
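To make the harmonisation behavior concrete, the sketch below shows how a workflow agent might rename one research group's local variable names to community-agreed terms before Data Collection. All names and term IRIs here are illustrative assumptions, not the community-agreed vocabulary.

```python
# Sketch of a vocabulary harmonisation step run by a workflow software agent.
# AGREED_TERMS and LOCAL_TO_AGREED are hypothetical; the actual vocabulary
# would come out of the community workshop described in this use case.

AGREED_TERMS = {
    "event_start": "http://example.org/pf/vocab#eventStart",  # hypothetical IRIs
    "event_end": "http://example.org/pf/vocab#eventEnd",
    "growth_rate": "http://example.org/pf/vocab#growthRate",
}

# Mapping from one research group's local column names to the agreed terms.
LOCAL_TO_AGREED = {
    "start_time": "event_start",
    "end_time": "event_end",
    "GR": "growth_rate",
}

def harmonise(record):
    """Rename local keys to agreed keys; drop keys with no agreed mapping."""
    return {LOCAL_TO_AGREED[k]: v for k, v in record.items() if k in LOCAL_TO_AGREED}

print(harmonise({"start_time": "09:30", "GR": 2.7, "note": "clear sky"}))
# → {'event_start': '09:30', 'growth_rate': 2.7}
```

Each group would maintain only its own `LOCAL_TO_AGREED` mapping; downstream Data Collection then sees a single shared vocabulary.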


Relevant Data Use Community Roles


  • Data Originator: A passive role, an RI component that provides information describing particle formation to be made available for (public) access.
  • Data Repository: A passive role, an RI component that is the facility for the deposition of published information describing particle formation.
  • Data Publisher: An active role, a person of the RI in charge of supervising the information publishing processes.
  • Data Publishing Subsystem: A passive role of the RI that enables the discovery and retrieval of information describing particle formation.

2. Detailed description

Section 1.1 provides a summary of the primary aims of this use case. We begin this section with a more detailed description of these aims. Where applicable, we discuss how they align with the FAIR Principles (Wilkinson et al., 2016). Aims marked optional will be addressed if time permits.


  1. As a community effort, harmonize the information describing particle formation. Specifically, harmonize the used vocabulary. This aim addresses the following FAIR principles: Data and metadata use vocabularies that follow FAIR principles (I2); Data and metadata are richly described with a plurality of accurate and relevant attributes (R1); Data and metadata meet domain-relevant community standards (R1.3).
  2. Link information describing particle formation with other relevant information, specifically external vocabularies (e.g., for time and space) as well as related descriptions (e.g., locations as provided by a gazetteer such as GeoNames[2]). This aim addresses the following FAIR principle: Data and metadata include qualified references to other data or metadata (I3).
  3. Represent information describing particle formation using a computer language for information representation. Specifically, represent meaning (in addition to data). This aim addresses the following FAIR principle: Data and metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation (I1). The Web Ontology Language[3] is considered the language of choice in this use case.
  4. Implement and publish an ontology design pattern that reflects the harmonized description, information linking, and information representation proposed in aims 1-3. It is proposed for this pattern to be part of the Environment Ontology[4].
  5. Adopting the solutions proposed in aims 1-4, implement the scientific workflow in Jupyter[5] (Perez and Granger, 2007) and deploy the implementation on e-Infrastructure. Specifically, expose the scientific workflow as a service used by the particle formation research community, thereby connecting the research community to infrastructure.
  6. Systematically acquire and curate information describing particle formation in infrastructure. Specifically, research infrastructures, e-Infrastructures, data centers such as PANGAEA[6], or similar. If not achievable, provide a concept for systematic acquisition and curation on institutional systems (possibly including individual workstations).
  7. (Optional) Support computing summary statistics (or other processing) on curated information describing particle formation. Implemented in Jupyter.
  8. (Optional) Represent, acquire, and curate summary statistics in infrastructure. The Statistics Ontology[7] is the specialized language of choice in this use case for representing summary statistics.
  9. (Optional) Represent, acquire, and curate provenance relating (summary statistics) to information describing particle formation and to particle size distribution observational data, as well as the involved agents (e.g., researchers) and activities (e.g., data interpretation). This aim addresses the following FAIR principle: Data and metadata are associated with detailed provenance (R1.2). The PROV Ontology[8] is the specialized language of choice in this use case for representing provenance.
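To illustrate aims 2, 3, and 9 together, the sketch below formats one hypothetical particle formation event as RDF triples in N-Triples syntax, linking it to a GeoNames feature and attributing it to a research group via PROV-O. The class and property IRIs in the `EX` namespace, the event identifier, and the GeoNames feature IRI are all assumptions for illustration; only the RDF and PROV-O namespaces are standard. A real implementation would use the community-agreed ontology design pattern and an RDF library rather than manual string formatting.

```python
# Sketch: one particle formation event as RDF triples (N-Triples syntax),
# using only the standard library. EX terms and the GeoNames IRI below are
# hypothetical placeholders, not the community-agreed pattern.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
PROV = "http://www.w3.org/ns/prov#"                  # PROV-O namespace
EX = "http://example.org/pf/"                        # hypothetical namespace

def triple(s, p, o):
    """Format one triple in N-Triples syntax (IRI terms only, for brevity)."""
    return f"<{s}> <{p}> <{o}> ."

event = EX + "event-2013-04-04"
triples = [
    # Typing: hypothetical class, proposed for inclusion in ENVO (aim 4).
    triple(event, RDF + "type", EX + "ParticleFormationEvent"),
    # Provenance: PROV-O attribution to the originating group (aim 9).
    triple(event, PROV + "wasAttributedTo", EX + "uef-aerosol-group"),
    # Linking: qualified reference to a hypothetical GeoNames feature (aim 2).
    triple(event, EX + "occurredAt", "https://sws.geonames.org/656130/"),
]

nt = "\n".join(triples)
print(nt)
```

Because the meaning is carried by shared IRIs rather than by plot or table conventions, software agents can query and integrate such descriptions across research groups.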

Objective and Impact

There exist multiple, institutionally and geographically distributed, research groups that perform the scientific task of interpreting particle size distribution observational data to detect and characterize the occurrence of particle formation at determinate spatiotemporal locations. Two groups well-known to the authors of this use case are the Atmospheric Aerosol Physics[9] research group at the University of Eastern Finland and the Aerosol Cloud Climate Interactions[10] research group at the University of Helsinki.


The second impact is the possible systematic acquisition and curation of explicit and formal (i.e., machine actionable) meaning of data (in addition to the data themselves). Rather than merely acquiring data products in the form of, e.g., visualizations such as maps or plots (whose implicit information content is not available to machines), this use case aims to set an example for how infrastructures can systematically acquire and curate truthful, meaningful, well-formed data (i.e., information) whereby meaning is explicit and formal. Furthermore, we expect that harmonized information generated by distributed research groups will be easier for infrastructure to acquire, and thus to curate and possibly publish. As such, the use case contributes to advancing infrastructures from the current data systems to information and knowledge-based systems (Stocker, 2017) that manage information about natural worlds and their phenomena of interest (in addition to information about people, organizations, instrumentation, publications, etc.).


A key challenge is to bring together representatives of the research community studying particle formation and come to an agreement for how to harmonize the information describing particle formation. It is unclear whether such agreement is desired and achievable. At this stage it is also unclear whether the required people can be motivated to attend the planned workshop.


A third difficulty is that it remains unclear whether it is possible for infrastructure to systematically acquire, curate, and potentially publish the information describing particle formation as envisioned in this use case.

Detailed scenarios

In the basic scenario, research groups, specifically individual researchers, of the atmospheric aerosol particle formation research community are served with a service that implements a scientific workflow for the interpretation of particle size distribution observational data and for the systematic acquisition, curation, and possible publishing of the resulting information describing particle formation.


Of interest to advanced scenarios is also the possibility to openly publish information describing particle formation as well as the support for functionality relevant to data publishing, such as persistent identification and citation of information describing particle formation.

Technical status and requirements

The required components are Jupyter, the implementation of the scientific workflow as a Jupyter Notebook, an RDF database with a SPARQL endpoint, as well as a Python library with specialized functions. Figure 2 shows a visualization of the prototype implementation of the scientific workflow. The components are containerized using Docker and can easily be deployed on infrastructures such as EGI; indeed, such a deployment has already been tested. Recently, we have adopted JupyterHub[12] in order to support authentication of multiple users and management of individual notebooks.
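The sketch below shows how a notebook might retrieve curated event descriptions from the RDF database using the standard SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter and a JSON results `Accept` header). The endpoint URL and the query pattern are assumptions for illustration; the request is built but deliberately not sent, since the actual endpoint depends on the deployment.

```python
import urllib.parse
import urllib.request

# Sketch: building a SPARQL Protocol request against the workflow's RDF
# database. The endpoint URL below is a hypothetical local deployment.
ENDPOINT = "http://localhost:3030/pf/sparql"

def build_query_request(endpoint, query):
    """Build (but do not send) an HTTP request for a SPARQL SELECT query."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        endpoint + "?" + params,
        # Ask for results in the standard SPARQL JSON results format.
        headers={"Accept": "application/sparql-results+json"},
    )

# Hypothetical query: list curated particle formation events.
query = "SELECT ?event WHERE { ?event a ?type } LIMIT 10"
req = build_query_request(ENDPOINT, query)
print(req.full_url)
```

Sending the request (e.g., with `urllib.request.urlopen(req)`) and parsing the JSON bindings would give the notebook direct access to the curated information for further processing, such as the optional summary statistics of aim 7.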


Overall, the use case is arguably already in a fairly advanced stage. While further technical advances are possible, the more critical advancements now rely on collaborative work with the research community, such as achieving agreement on representing information describing particle formation and adoption of the scientific workflow as a service.

Implementation plan and timetable

We envision the following implementation plan. First, we plan to organize the aforementioned workshop during Q1 2018 and hold the workshop during Q2 2018, possibly in April ahead of the next ENVRIweek, which would allow for presenting results on aims 1-3 during ENVRIweek. The successful execution of the workshop is a milestone for this use case.


Finally, linking the scientific workflow with the ENVRIplus Knowledge Base in order to support selection of observational data sources and, possibly, automated retrieval of data required in workflow execution also relates to Theme 2 activities and the implementation may serve as a demonstrator in this context.

Expected output and evaluation of output

The use case expects the following (primary) outputs:

  • 1.A community-agreed ontology design pattern for information describing particle formation published as a concept of the Environment Ontology. The pattern can be extended to be adopted in the proposed solution to represent information. The output is deemed a success if the community-agreed ontology design pattern is published by the Environment Ontology.
  • 2.Jupyter implementation and deployment on infrastructure of the scientific workflow as a service to be used by the research community. The output is deemed a success if (1) a functioning implementation is deployed as a service on infrastructure and (2) at least two groups of the research community are using the service.
  • 3.A report that discusses which infrastructure is best suited to acquire, curate, and possibly publish the information describing particle formation, as well as derived summary statistics (if applicable). This output is deemed a success if the report is published.
  • 4.A demonstrator that shows how the proposed solution enables integrated (e.g., statistical) analysis of information describing particle formation, generated by distributed research groups. This output is deemed a success if such a demonstrator is delivered.

