Session Description
Access-optimized metadata, exemplified by formats such as DMR++ and Kerchunk, presents a powerful strategy for improving data access, analysis workflows, and software performance. These metadata structures describe how data is organized within a file or object, enabling tools to efficiently retrieve only the required subsets of data, regardless of whether they reside on local disks or in cloud object stores like S3.
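As a rough illustration of the idea, the sketch below maps chunk keys to byte ranges in a file and reads only the bytes for one chunk. The `[path, offset, length]` layout loosely mirrors Kerchunk-style references. The stand-in file, the `temperature/...` keys, and the `read_chunk` helper are hypothetical and exist only for this example. A real implementation would issue HTTP range requests against an object store rather than seeking in a local file.

```python
import os
import tempfile


def read_chunk(refs, key):
    """Return the raw bytes of one chunk from a [path, offset, length] reference."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)          # jump straight to the chunk, skipping everything before it
        return f.read(length)   # read only the bytes that belong to this chunk


# Stand-in for a legacy file: a 16-byte header followed by two 8-byte chunks.
header = b"H" * 16
chunk0 = bytes(range(8))
chunk1 = bytes(range(8, 16))
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(header + chunk0 + chunk1)
    legacy_path = tmp.name

# Hypothetical access-optimized metadata: chunk key -> [path, byte offset, byte length].
refs = {
    "temperature/0": [legacy_path, 16, 8],
    "temperature/1": [legacy_path, 24, 8],
}

# Fetch the second chunk without reading the header or the first chunk.
second_chunk = read_chunk(refs, "temperature/1")
os.remove(legacy_path)
```

The reader never parses the file format itself. Everything it needs to locate a chunk comes from the metadata, which is why the original file can stay in its legacy format.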
Recent developments and performance benchmarks demonstrate that software capable of interpreting access-optimized metadata (e.g., VirtualiZarr) can significantly reduce access time for data stored in legacy formats, without requiring the costly and time-consuming reformatting of entire archives. This creates new opportunities to modernize data access patterns while maintaining compatibility with older data formats.
Despite this promise, the community lacks a shared understanding or specification—formal or informal—of what constitutes access-optimized metadata. This absence of coordination has limited interoperability and awareness across projects and tools.
During this session we will aim to:
- Demonstrate high-performance access to data in legacy formats stored on S3, enabled by DMR++ metadata.
- Examine the structural and semantic characteristics that make DMR++ and Kerchunk effective.
- Explore how these metadata models might converge into a unified and extensible framework for access-optimized metadata.
The goal is to initiate a community-wide discussion about formalizing and standardizing this emerging class of metadata. The intent is to enable interoperable, independent implementations of software that uses access-optimized metadata and to foster broader adoption across scientific computing environments.
Value to Session Participants
Participants in this session will gain awareness of, and provide feedback on, practices related to access-optimized metadata, such as chunking and data compression. Feedback on best practices is still lacking, as is community awareness of what access-optimized metadata is, how to create it, and how scientific workflows can make use of it. Finally, we hope that participants will be able to differentiate between domain-based community standards, such as the Climate and Forecast (CF) conventions, and domain-neutral access-optimized metadata.
Recommended Ways to Prepare for this Session
No recommendations provided