
SUBMERSE Project Advances Data Management with Dynamic Planning

To manage a complex data flow, the SUBMERSE project takes an iterative approach to FAIRification and Data Management Plan creation, one that evolves with the project's growing understanding of the data and its dynamics.
13/06/2024 12:06
SUBMERSE under water
Photo: iStock

The SUBMERSE project is making significant strides in the collection and management of oceanic data. By utilizing submerged fiber-optic network cables, the project streams and processes enormous amounts of initially unintelligible data from the ocean floor. This data, which includes sensitive information such as ship and submarine movements, is continuously analyzed, filtered, and managed in real time. Most of it is discarded, some stored temporarily, and a tiny fraction preserved in FAIR (Findable, Accessible, Interoperable, Reusable) repositories.
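
The three-way outcome described above (discard, store temporarily, preserve) can be pictured as a simple triage step. The sketch below is purely illustrative and is not SUBMERSE's actual pipeline: the relevance score, both thresholds, and all names are hypothetical assumptions introduced for the example.

```python
from enum import Enum
from typing import Dict, Iterable


class Fate(Enum):
    DISCARD = "discard"    # most of the stream
    BUFFER = "buffer"      # stored temporarily
    PRESERVE = "preserve"  # the tiny fraction sent on to a FAIR repository


# Hypothetical thresholds on a hypothetical relevance score in [0, 1].
BUFFER_THRESHOLD = 0.5
PRESERVE_THRESHOLD = 0.95


def triage(score: float) -> Fate:
    """Decide what happens to one chunk of data, given its relevance score."""
    if score >= PRESERVE_THRESHOLD:
        return Fate.PRESERVE
    if score >= BUFFER_THRESHOLD:
        return Fate.BUFFER
    return Fate.DISCARD


def summarize(scores: Iterable[float]) -> Dict[Fate, int]:
    """Count how many chunks end up with each fate."""
    counts = {fate: 0 for fate in Fate}
    for score in scores:
        counts[triage(score)] += 1
    return counts
```

In practice the decision would rest on domain-specific analysis rather than a single number; the point of the sketch is only that every chunk is routed to exactly one of the three outcomes.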

"This type of data collection is a great example for the need of interdisciplinary work and collaboration both on the researcher’s side, where the same data is used by multiple different domains, but also on the technical side. The project includes specialists for network, computing, data management and security", says Hannah Mihai, Data Management Consultant in DeiC.

The Necessity of a Dynamic Data Management Plan

To handle this complex data flow, SUBMERSE requires a Data Management Plan (DMP) that evolves with the project. As data understanding improves, the DMP is updated to reflect better management practices. This process involves representatives from various organizations within SUBMERSE who collaborate to transform raw data into intelligible "data products" such as rapid earthquake detections, tsunami warnings, whale migration patterns, and non-military shipping intelligence.

Iterative FAIRification and DMP Development through Workshops

To support good data management practice, the SUBMERSE project engages in FAIRification: enhancing the Findability, Accessibility, Interoperability, and Reusability of its digital assets, that is, its data. Because making all of the data FAIR is impractical, the focus is on gradually increasing the use of valuable data by FAIRifying the meaningful portions before they are submitted to repositories.

FAIRification and DMP creation in SUBMERSE are iterative processes involving continuous debate and redrafting based on changes in data understanding and flow. This iterative approach was highlighted in two key workshops in summer and autumn 2023.

The first workshop provided a platform for researchers to articulate their understanding of the instrumentation and their data needs while exploring strategic possibilities. Discussions emphasized the importance of secure data storage and the challenges of data management, particularly the scrubbing of sensitive data. Balancing data security with accessibility emerged as a key consideration.

In the second workshop, feedback was gathered to refine the DMP further. This session initiated discussions on potential uses of SUBMERSE instrumentation and the development of "SUBMERSE products" that the DMP should focus on. The collaborative effort led to the submission of the first SUBMERSE DMP by the end of November 2023, demonstrating a collective commitment to effective data management practices.

Key Findings and Insights

The refined DMP emphasizes secure data storage, short-term buffering, and evolving the scrubbing methodology to filter out sensitive information late in the data flow to maximize research potential. The DMP also recognizes the diversity of metadata standards and file formats across varied research communities, ensuring inclusivity and usability.
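
As a rough illustration of how short-term buffering and late scrubbing can fit together, the sketch below keeps recent data segments in a time-limited buffer and filters out sensitive ones only at export time. It is not taken from the SUBMERSE DMP: the `Segment` fields, the `sensitive` flag, and the retention window are hypothetical, and segments are assumed to arrive in time order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Segment:
    timestamp: float  # seconds since some reference time
    channel: int      # hypothetical sensing-channel index
    sensitive: bool   # hypothetical flag set by an upstream analysis step


class ShortTermBuffer:
    """Holds recent segments and drops those older than the retention window."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._segments: Deque[Segment] = deque()

    def add(self, segment: Segment) -> None:
        # Assumes segments arrive in non-decreasing timestamp order.
        self._segments.append(segment)
        self._expire(now=segment.timestamp)

    def _expire(self, now: float) -> None:
        # Remove segments that have aged out of the retention window.
        while (self._segments
               and now - self._segments[0].timestamp > self.retention_seconds):
            self._segments.popleft()

    def export_scrubbed(self) -> List[Segment]:
        # Scrubbing happens late: only at export are sensitive segments removed.
        return [s for s in self._segments if not s.sensitive]

    def __len__(self) -> int:
        return len(self._segments)
```

Keeping unscrubbed data inside the secured buffer and filtering only on the way out is one way to read "late in the data flow"; a real methodology would need a far more careful definition of what counts as sensitive.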

Future Directions and Ongoing Collaboration

Looking ahead, the Data Lifecycle task will continue to foster synergies with other project efforts. Collaboration with the Ethics and Security Task and the Security Advisory Board is prioritized to ensure that researchers' needs remain central, despite security considerations. The task will monitor scientific and technical developments to offer support and guidance on data management issues, adjusting the DMP as data transitions from unintelligible noise to valuable information.

As the project progresses, the iterative development of a robust DMP, continuously updated until the project's end, is essential to ensure that collected data is FAIR and can be used efficiently.