
GlueX is an experiment at the Thomas Jefferson National Accelerator Facility (JLab) in Newport News, Virginia, that studies how particles called mesons behave to learn more about the strong force—the force that holds atomic nuclei together. The dataset from GlueX comes from millions of collisions between high-energy photons and protons. GlueX uses the OSDF to distribute inputs to its data simulations and is exploring using OSDF for reprocessing.
GlueX is supported by the US Department of Energy.
Experiments related to the Virtual Data Collaboratory at the Scientific Computing and Imaging Institute at the University of Utah.
These cyberinfrastructure experiments include activities like running automated workflows on the OSPool triggered on alerts from the EarthScope Consortium.
AWS Open Data hosts publicly accessible datasets covering areas such as earth science, climate, genomics, machine learning, transportation, and economics. The collection includes contributions from a range of organizations, including government agencies, academic institutions, and private companies.
There are currently nearly 700 datasets, totaling over 100 petabytes of data.
Browse the full catalog at the Registry of Open Data on AWS.
The AWS Open Data datasets are publicly accessible and are integrated with the OSDF, allowing users to stage the data closer to nationally-funded computing resources via the OSDF’s hardware infrastructure. This enables fusion between AWS Open Data and other data sources accessible via the OSDF.
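As a rough illustration of this integration, the sketch below browses and reads an AWS Open Data object through the OSDF using the pelicanfs fsspec client. The namespace prefix and object name are placeholders, not confirmed paths; consult the OSDF catalog entry for the actual namespaces.

```python
# Minimal sketch: browse and stream an AWS Open Data object via the OSDF.
# Assumes the public osg-htc.org federation; the prefix below is hypothetical.
from pelicanfs.core import PelicanFileSystem

pelfs = PelicanFileSystem("pelican://osg-htc.org")   # connect to the OSDF federation

prefix = "/aws-opendata/us-east-1/example-bucket"    # placeholder namespace prefix
print(pelfs.ls(prefix))                              # list objects under the prefix

# Stream the first kilobyte of one object without downloading the whole file.
with pelfs.open(prefix + "/example-granule.nc", "rb") as f:
    header = f.read(1024)
```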
The Center for Applied Internet Data Analysis (CAIDA) runs a “Network Telescope”, collecting packets sent to a cross-section of the public Internet similarly to how a telescope collects stray light.
This dataset is made available to scientists attempting to understand how activity, such as malware, is moving across the Internet.
The CAIDA integration with the OSDF aims to stage the most recent subset of the recorded data, making it available for large-scale analysis.
Staging data for CHTC collaborations with University of Wisconsin-Madison research groups. Currently serving data specifically for the Joao Dorea group.
The Center for High Throughput Computing (CHTC), established in 2006, aims to bring the power of High Throughput Computing to all fields of research, and to allow the future of HTC to be shaped by insight from all fields.
Beyond advancing HTC technologies and innovation through projects like HTCondor, the CHTC operates general-purpose clusters for the UW-Madison campus. CHTC allows researchers to stage their research data to an object store connected to the OSDF and then process and analyze the data using the OSDF with on-campus resources or the OSPool.
This data is organized as “working datasets” representing running workloads, not permanent scientific outputs.
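As a hedged sketch of the staging-and-compute pattern described above (not an official CHTC recipe), the following uses the HTCondor Python bindings to submit a job whose input is pulled from an OSDF namespace. The namespace path, script name, and resource requests are hypothetical; the CHTC and OSPool guides have the authoritative versions.

```python
# Sketch: submit an HTCondor job that fetches its input via an osdf:// URL.
import htcondor

job = htcondor.Submit({
    "executable": "analyze.sh",                                        # user-provided script
    "arguments": "dataset.csv",
    "transfer_input_files": "osdf:///chtc/staging/example-user/dataset.csv",  # hypothetical path
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    "request_cpus": "1",
    "request_memory": "2GB",
    "request_disk": "2GB",
})

schedd = htcondor.Schedd()   # the Access Point's scheduler
schedd.submit(job)           # queue one job; its input is delivered through the OSDF
```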
The Electron-Ion Collider (EIC) is a facility under construction at Brookhaven National Laboratory. Experiments at the facility include the ePIC detector. The computing for the EIC is a joint collaboration with the Jefferson National Laboratory; the datasets connected to the OSDF include input files and other information necessary to help with simulations of the detector’s behavior.
The South Florida region is home to nearly 10 million people, and the population is growing. The region faces several challenges, such as rising sea levels and flooding, harmful algal blooms, water contamination, and wildlife habitat loss, which affect the economy and the welfare of its population. Florida International University (FIU) runs the EnviStor project, a centrally managed, petabyte-scale storage system that also serves as a clearinghouse for supporting interdisciplinary research and modeling involving both built and natural environments in South Florida. EnviStor provides opportunities for students and faculty to enhance their knowledge of database management, focusing on interoperability.
The datasets kept in EnviStor can be accessed via the OSDF; work is ongoing to provide new computing workflows and AI-based dataset discovery that will help users utilize the data.
The EnviStor activity and underlying storage is funded through the NSF Campus Cyberinfrastructure program under Award # 2322308.
Simulation data used for the Einstein Telescope Mock Data Challenge.
The Einstein Telescope (ET) is a proposed next-generation gravitational wave observatory, aiming to detect gravitational waves with much higher sensitivity than either the LIGO or VIRGO instruments.
As part of the studies and the design proposal for the ET instrument, the mock data challenge is being run in 2024 and 2025 to better understand how the future data may be distributed and analyzed. An example tutorial for using the data can be found on GitHub.
The Fusion Data Platform (FDP) provides a modern, Python-based data framework for analyzing data from magnetic fusion experiments.
Using data from the DIII-D National Fusion Facility, users can leverage the FDP software to stream data via the OSDF services for their fusion data analysis.
The FDP is funded by the DOE under award DE-SC0024426.
Public gravitational wave data from the international gravitational-wave network, including data from LIGO, VIRGO, and KAGRA. This data can be used in the detection and study of black holes throughout the universe.
These datasets are the calibrated readouts from the corresponding interferometers. Also included are mirrors of data analysis products released to Zenodo to accompany publications.
The IceCube repository integrates data from the IceCube Neutrino Observatory, a cubic-kilometer detector embedded deep in Antarctic ice near the South Pole. IceCube records when high-energy neutrinos interact with the ice.
Using over 5,000 optical sensors deployed between 1,450 and 2,450 meters below the surface, the observatory captures detailed information about these events, including their timing, location, and intensity. The data is used to study cosmic neutrinos and the astrophysical phenomena that produce them, such as black holes, supernovae, and gamma-ray bursts.
The IceCube collaboration is supported by multiple funding agencies including the NSF. The dataset is maintained by the Wisconsin IceCube Particle Astrophysics Center.
User-managed data by members of the LIGO Scientific Collaboration, the Virgo Collaboration, and the KAGRA Collaboration. These data are created and used within individual users’ workflows as they analyze gravitational-wave data in order to detect black hole collisions and other cosmic phenomena. This origin is hosted at Caltech.
This data is not public; it is in support of in-progress computational workflows.
Gravitational wave data collected by the KAGRA interferometer, a scientific device for detecting gravitational waves in the Gifu prefecture in Japan. KAGRA collaborates closely with the LIGO detectors in the US to provide more accurate detection of gravitational waves.
This is the data not yet released to the public.
Gravitational wave data collected by the LIGO interferometer detectors in Hanford, Washington and Livingston, Louisiana and hosted by LIGO Laboratory at Caltech. Gravitational wave data is used to detect black hole collisions and other cosmic phenomena and is one piece of the NSF’s multi-messenger astronomy initiatives.
This is the data not yet released to the public.
Curated datasets used by members of the LIGO Scientific Collaboration, the Virgo Collaboration, and the KAGRA Collaboration in the combined analysis of data collected from their detectors. These data consist of gravitational-wave data collected at any of the four interferometers but with simulated signals, as well as some other datasets, used for data analysis purposes in detecting black hole collisions and other cosmic phenomena as part of the NSF’s multi-messenger astronomy initiatives.
These data are not yet released to the public.
This is a test repository utilized by staff of the LIGO Laboratory at Caltech to test new versions of the Pelican software and configuration, to ensure that upcoming changes do not disrupt ongoing data analysis on any of the production origins. This test origin specifically tests the software and configuration of user-managed data analogous to that served in /igwn/cit.
This data is private.
This is a test namespace utilized by staff of the LIGO Laboratory at Caltech to test new versions of Pelican software and configuration, to ensure that upcoming changes do not disrupt ongoing data analysis on any of the production origins.
This data is private.
Gravitational wave data collected by the VIRGO interferometer, a scientific device for detecting gravitational waves near Pisa, Italy. VIRGO collaborates closely with the LIGO detectors in the US to provide more accurate detection of gravitational waves.
This is the data not yet released to the public.
Jessica Kendall-Bar leads a research group that integrates engineering, data science, ecology, and visual storytelling/public communication to explore the behavior and physiology of marine life.
Her visual data work has appeared in various media platforms—from UC San Diego news to national outlets like The New York Times and The Atlantic—and has contributed to global policy efforts in areas such as marine mammal protection and coral reef recovery.
Jessica Kendall-Bar leads a research group that integrates engineering, data science, and ecology to explore the behavior and physiology of marine life. The data stored on the OSDF includes high-resolution multimodal data such as video, GPS, and electrophysiology.
The OSDF data is catalogued on the National Data Platform, enabling textual, conceptual, and map-based spatiotemporal search capabilities.
The NDP project is using this dataset as inputs for a data challenge planned for Fall 2025. It also powers an application running on the National Research Platform at https://lifeinthedeep.nrp-nautilus.io/.
The Jefferson National Laboratory (JLab) operates particle accelerator facilities and associated detectors for experiments like GlueX.
JLab connects its storage to the OSDF to allow large-scale data simulation and reprocessing on the PATh-operated OSPool resources and JLab-provided capacity.
This repository enables faculty and students at Kennesaw State University to use their NSF Campus Cyberinfrastructure (CC*) funded storage (Award #2430289) with their local HPC cluster via OSDF.
The Knight Lab uses and develops state-of-the-art computational and experimental techniques to ask fundamental questions about the evolution of the composition of biomolecules, genomes, and communities in different ecosystems, including the complex microbial ecosystems of the human body.
The MeerKAT Absorption Line Survey (MALS) consists of 1,655 hours of observatory time on the MeerKAT radio telescope at the South African Radio Astronomy Observatory. The survey aims to carry out the most sensitive search for HI and OH absorption lines at 0 < z < 2, the redshift range over which most of the cosmic evolution in the star formation rate density takes place.
The MALS dataset is replicated to the OSDF to allow collaborators at the NRAO to participate in the scientific study of the data.
General namespace for University of Missouri OSStore contribution.
Research in machine learning methods like deep learning neural networks, computer vision and morphological neural networks.
The LLC4320 ocean dataset is the product of a 14-month simulation of ocean circulation and dynamics using the Massachusetts Institute of Technology’s General Circulation Model on a lat-lon-cap grid. Comprising extensive scalar data such as temperature, salinity, heat flux, radiation, and velocity, the dataset exceeds 4 PB and can potentially improve our understanding of global ocean circulation and its role in Earth’s climate system.
In order to make this dataset more accessible and easier to visualize, the National Science Data Fabric has processed the raw data into the ViSUS data format using their OpenViSUS toolsuite.
It will be used in the 2026 IEEE SciVis Contest to demonstrate cutting-edge technologies for working with petascale climate data provided by NASA.
The NASA C1440-LLC2160 dataset is the simulation output from research into coupling two models: a global atmospheric model and a global ocean model that were originally designed to be run separately. The atmospheric model is a C1440 configuration of the Goddard Earth Observing System (GEOS) atmospheric model running on a cubed-sphere grid. The global ocean model is an LLC2160 configuration of the MITgcm model that uses a lat-lon-cap grid. Each model was run for over 10,000 hourly timesteps covering more than 14 simulation months. With more than 10,000 time steps and multiple scalar fields, the dataset totals approximately 1.8 PB.
In order to make this dataset more accessible and easier to visualize, the National Science Data Fabric has processed the raw data into the ViSUS data format using their OpenViSUS toolsuite.
It will be used in the 2026 IEEE SciVis Contest to demonstrate cutting-edge technologies for working with petascale climate data provided by NASA.
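As a hedged sketch of how a ViSUS-formatted field from these datasets might be read with the OpenViSUS Python toolsuite, the snippet below opens a dataset and pulls one timestep at reduced resolution. The endpoint URL and field are placeholders, not the actual NSDF locations.

```python
# Sketch: open a ViSUS-converted dataset and read a coarse view of one timestep.
from OpenVisus import LoadDataset

url = "http://example-nsdf-endpoint/mod_visus?dataset=llc_example"  # placeholder URL
db = LoadDataset(url)

# Negative quality requests a coarser (and much smaller) read of the field.
data = db.read(time=0, quality=-9)
print(data.shape)
```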
NCAR provides a wide range of atmospheric and Earth system science datasets, including observational data from airborne and ground-based instruments, outputs from community weather models, and large-scale reanalysis and simulation data. These datasets support research on weather patterns, the water cycle, and extreme weather events. They are used by researchers, educators, and policymakers across the US.
Integrated with the OSDF is NCAR’s Research Data Archive (RDA), the centrally managed archive of the laboratory’s atmospheric and Earth system datasets. When downloading data from the web interface, users are automatically redirected to the OSDF cyberinfrastructure.
Example notebooks that analyze data from these datasets can be found in the NCAR OSDF Examples repository and are part of the NCAR effort to utilize the OSDF.
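In the spirit of those notebooks, the following is a minimal sketch (not an official NCAR recipe) of streaming an RDA file over the OSDF with pelicanfs and xarray. The object path is hypothetical, and the example assumes a NetCDF4/HDF5 file readable by the h5netcdf engine.

```python
# Sketch: open an RDA NetCDF file over the OSDF without downloading it first.
import xarray as xr
from pelicanfs.core import PelicanFileSystem

pelfs = PelicanFileSystem("pelican://osg-htc.org")   # public OSDF federation

path = "/ncar/rda/d999999/sample.nc"                 # hypothetical dataset path
with pelfs.open(path, "rb") as f:
    ds = xr.open_dataset(f, engine="h5netcdf")       # stream the file contents
    print(ds)
```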
Visualizations produced from these datasets include climate data over South America on October 10, 2020, and ocean temperature on January 16, 2014.
The integration between NCAR and the OSDF is part of the Pathfinders collaboration, an effort among five initiatives to develop science-led pathways through the NSF cyberinfrastructure landscape. This work is funded by NSF award 1852977.
A century of suppressing wildfires has created a dangerous accumulation of flammable vegetation on landscapes, contributing to megafires that risk human life and destroy ecosystems. Prescribed burns can dramatically reduce the risk of large fires that are uncontrollable by decreasing this buildup of fuels. BurnPro3D is a science-driven, decision-support platform to help the fire management community understand risks and tradeoffs quickly and accurately when planning and conducting prescribed burns.
NOAA collects and uses active acoustic (or sonar) data for a variety of mapping requirements. Water column sonar data focus on the area from near the surface of the ocean to the seafloor. Primary uses of these specific sonar data include 3-D mapping of fish schools and other mid-water marine organisms; assessing biological abundance; species identification; and habitat characterization. Other uses include mapping underwater gas seeps and remotely monitoring undersea oil spills. NCEI archives water column sonar data collected by NOAA line offices, academia, industry, and international institutions.
Radio astronomy data from the Very Large Array Sky Survey (VLASS).
As described on the VLASS homepage, VLASS is a survey of the universe carried out with the Very Large Array (VLA) in New Mexico. The VLA is one of the most sensitive telescopes in the radio band and can provide more sensitive images of the universe than any other radio telescope in the world. This, however, requires processing large volumes of data and supercomputer-class computing resources. VLASS is designed to produce a large collection of radio data available to a wide range of scientists within the astronomical community. VLASS’s science goal is to produce a radio all-sky survey that will benefit the entire astronomical community. As VLASS completes its three scans of the sky, separated by approximately 32 months, new developments in data processing techniques will allow scientists an opportunity to download data instantly on potentially millions of astronomical radio sources.
The data in this data origin consists of interferometric visibilities stored in Measurement Set (MS) format. Each dataset contains calibrated visibilities for one of the sixteen spectral windows of the VLA and covers an area of 4 square degrees (2 degrees x 2 degrees) in the sky. All sixteen spectral windows are combined to generate a single image, so the data contained in this data origin can be used to make images of approximately 70 regions in the sky, each image covering 4 square degrees. The LibRA software package is used to transform visibilities to images. The architecture and design considerations for LibRA are shown in this presentation.
Teams of scientists at the National Radio Astronomy Observatory (NRAO) in Socorro, NM and the Center for High Throughput Computing (CHTC) have used the PATh and NRP facilities of the OSG to make the deepest image in the radio band of the Hubble Ultra-Deep Field (HUDF). Similarly, the [COSMOS HI Large Extragalactic Survey (CHILES)](http://chiles.astro.columbia.edu/) project has 1000 hours of integration with the VLA on the COSMOS field. Imaging the CHILES data using PATh and NRP facilities delivered the deepest radio image of this region of the sky, at an unmatched data processing throughput. Similarly to the VLASS data stored in this data origin, the data for HUDF and CHILES is stored in the PATh facility data origin. These recent large-scale imaging achievements made possible through the use of OSG resources are reported in this [NRAO Newsletter article](https://science.nrao.edu/enews/17.3/index.shtml#deepimaging) and this press release.
Namespace used by Fabio for ongoing CheckMK testing of NRP caches
The XENON Dark Matter Project is a scientific collaboration organized around the XENONnT dark matter detector at the INFN Gran Sasso National Laboratory in Gran Sasso, Italy.
This repository is used to store data and simulations from the XENONnT experiment to aid in its computing workloads.
Scripps Institution of Oceanography scientists conduct fundamental research to understand and protect the planet, and investigate our oceans, Earth, and atmosphere to find solutions to our greatest environmental challenges.
Datasets for use in OSDF usage tutorials by Pelican Platform facilitation team.
This repository supports the education and workforce development mission of the Pelican Project.
Staging area for PATh-operated Access Points located at the University of Chicago.
The PATh project allows researcher teams to stage their research data to an object store connected to the OSDF and then process and analyze the data using the OSDF via the OSPool. Any US-based open science team can utilize the PATh services for distributed High Throughput Computing workflows.
This data is organized as “working datasets” representing running workloads, not permanent scientific outputs.
Staging area for PATh-operated Access Points located at the University of Wisconsin-Madison.
The PATh project allows researcher teams to stage their research data to an object store connected to the OSDF and then process and analyze the data using the OSDF via the OSPool. Any US-based open science team can utilize the PATh services for distributed High Throughput Computing workflows.
This data is organized as “working datasets” representing running workloads, not permanent scientific outputs.
Staging area for PATh-operated collaboration services located at the University of Chicago.
The PATh project allows multi-institutional collaborations to stage their experimental data and simulation outputs to an object store connected to the OSDF and then process and analyze the data using the OSDF via the OSPool or other capacity dedicated to their experiment.
This data is organized as “working datasets” representing running workloads, not permanent scientific outputs.
Data staging area for OSPool projects with public data
Staging area for data used in the PATh Facility. The PATh Facility is a distributed computing resource spanning 5 sites, from San Diego, California to Syracuse, New York, that provides NSF-funded researchers with compute credits for High Throughput Computing workflows.
This repository enables these NSF projects to stage their research data outputs to an object store connected to the OSDF and then process and analyze the data using the OSDF via both the PATh Facility computing hardware and the OSPool.
This data is organized as “working datasets” representing active workloads from researchers, not permanent scientific outputs.
Special projects data in the PATh facility.
To avoid redundancy, focus on /path-facility/data instead.
A namespace for the Pelican Platform facilitation team to use for a variety of facilitation purposes.
Testing and Validation Origin
The Dark Energy Survey (DES) will probe the origin of the accelerating universe and help uncover the nature of dark energy by measuring the 14-billion-year history of cosmic expansion with high precision. A 570-megapixel camera, the DECam, is being built for this project and comprehensive tests were successfully accomplished at Fermilab’s telescope simulator. As we count down to DECam’s first light, workload and excitement increase among our collaborators. Starting in late 2011 and continuing for five years, DES will survey a large swath of the southern sky out to vast distances in order to provide new clues to this most fundamental of questions.
DES uses the OSDF to deliver common data inputs for large-scale simulation jobs distributed across the US.
The Deep Underground Neutrino Experiment is an international flagship experiment to unlock the mysteries of neutrinos. DUNE scientists will paint a clearer picture of the universe and how it works. Their research may even give us the key to understanding why we live in a matter-dominated universe — in other words, why we are here at all.
DUNE will pursue three major science goals: find out whether neutrinos could be the reason the universe is made of matter; look for subatomic phenomena that could help realize Einstein’s dream of the unification of forces; and watch for neutrinos emerging from an exploding star, perhaps witnessing the birth of a neutron star or a black hole.
DUNE uses the OSDF to deliver common data inputs for large-scale simulation jobs distributed across the US.
The ICARUS neutrino detector measures 65 feet long and weighs 760 tons. It began its life in Gran Sasso Laboratory in Italy, seeking out elusive particles using pioneering technology. It later spent two years undergoing upgrades at CERN, the European particle physics laboratory and home of the Large Hadron Collider. It moved to Fermilab in 2017 and was installed in its detector hall in 2018, where along with the new Cosmic Ray Tagger it forms the far detector for the Short-Baseline Neutrino program.
The ICARUS collaboration is investigating signs of physics that may point to a new kind of neutrino called the sterile neutrino. Other experiments have made measurements that suggest a departure from the standard three-neutrino model. ICARUS is also investigating the various probabilities of a neutrino interacting with different types of matter as well as neutrino-related astrophysics topics.
ICARUS uses the OSDF to deliver common data inputs for large-scale simulation jobs distributed across the US.
MINERvA (Main Injector Neutrino ExpeRiment to study ν-A interactions) is the first neutrino experiment in the world to use a high-intensity beam to study neutrino reactions with five different nuclei, creating the first self-contained comparison of interactions in different elements. While this type of study has previously been done using beams of electrons, this is a first for neutrinos.
MINERvA is providing the world’s best high-precision measurements of neutrino interactions on various nuclei in the 1 to 10 GeV energy range. MINERvA’s results are being used as inputs to current and future experiments seeking to study neutrino oscillations, or the ability of neutrinos to change their type.
MINERvA uses the OSDF to deliver common data inputs for large-scale simulation jobs distributed across the US.
The NOvA (NuMI Off-axis νe Appearance) experiment is shedding light on one of nature’s most elusive particles: neutrinos. Since the late 1990s, physicists have known that neutrinos exhibit a quantum mechanical behavior called oscillations. But this behavior is not predicted by the Standard Model of particle physics. NOvA is working to better understand these strange particles through precision measurements of their oscillation properties.
NOvA uses the OSDF to deliver common data inputs for large-scale simulation jobs distributed across the US.
The international Short-Baseline Neutrino Program at Fermilab examines the properties of neutrinos, specifically how the flavor of a neutrino changes as it moves through space and matter. The program emerged from a joint proposal, submitted by three scientific collaborations, to use particle detectors to perform sensitive searches for νe appearance and νμ disappearance in the Booster Neutrino Beam. All of the detectors are types of liquid-argon time projection chambers, and each contributes to the development of this particle detection technology for the long-baseline Deep Underground Neutrino Experiment (DUNE).
SBN uses the OSDF to deliver common data inputs for large-scale simulation jobs distributed across the US.
The Short-Baseline Near Detector (SBND) is a 112-ton active mass liquid argon time projection chamber (LArTPC) neutrino detector that sits only 110 meters from the target of the Booster Neutrino Beam (BNB) at Fermilab. SBND is the near detector in the Short-Baseline Neutrino Program. ICARUS is the far detector in the program, and MicroBooNE ran previously in the same beam.
SBND will record over a million neutrino interactions per year. By providing such a high statistics measurement of the un-oscillated content of the BNB, SBND plays a critical role in performing searches for neutrino oscillations at the SBN Program. The large data sample will also allow studies of neutrino-argon interactions in the GeV energy range with unprecedented precision. The physics of these interactions is an important element of future neutrino experiments that will employ the LArTPC technology, such as the long-baseline Deep Underground Neutrino Experiment, DUNE.
SBND uses the OSDF to deliver common data inputs for large-scale simulation jobs distributed across the US.
MicroBooNE is a large 170-ton liquid-argon time projection chamber (LArTPC) neutrino experiment located on the Booster neutrino beamline at Fermilab. The experiment first started collecting neutrino data in October 2015.
MicroBooNE investigates the low-energy excess events observed by the MiniBooNE experiment, measures a suite of low-energy neutrino cross sections, and investigates astroparticle physics.
MicroBooNE uses the OSDF to deliver common data inputs for large-scale simulation jobs distributed across the US.
General namespace for Purdue University OSStore contribution.
The RouteViews dataset provides a map of the Internet, as seen by participating sites. The information, collected from the BGP tables of routers, includes both current and historic “snapshots”. This allows operators of major Internet services to detect changes to the map in near-real time and for researchers to understand the historical evolution of the Internet.
The RouteViews dataset is funded by the University of Oregon’s Advanced Network Technology Center, and by grants from the National Science Foundation, Cisco Systems, the Defense Advanced Research Projects Agency, Juniper Networks, Sprint Advanced Technology Laboratories, Catchpoint, and the providers who graciously provide their BGP views.
The Sage project provides a platform for AI computing at the edge. It operates a nationwide infrastructure of distributed sensors - from urban landscapes to remote mountainsides - that collect, process using AI techniques, and aggregate data.
With over 100 Sage nodes deployed across 17 states, including fire-prone regions in the Western U.S., the platform supports rapid-response science and sustained observation of ecological systems, agriculture, urban environments, and weather-related hazards.
Sage uploads its data into NSF CC* funded storage systems connected to the OSDF. Data access requires a Sage account; more information can be found in the Sage documentation and tutorials.
The SPIn4D project (Spectropolarimetric Inversion in Four Dimensions with Deep Learning) develops neural networks to help prepare for the huge amount of solar data coming from the NSF-funded Inouye Solar Telescope, the most powerful solar telescope in the world.
SPIn4D’s first data release comprises 109 TB of simulations of small-scale dynamo action, accompanying the project’s first paper. A corresponding Jupyter notebook illustrates how to access and use the data via the OSDF using the Pelican clients. The dataset is also accessible via the National Data Platform.
For more information, see the accompanying spotlight article.
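As a rough, hedged sketch of the access pattern shown in that notebook, the snippet below downloads one SPIn4D data product over the OSDF with the pelicanfs client. The namespace and filename are placeholders; the notebook and the National Data Platform entry list the real object paths.

```python
# Sketch: copy one SPIn4D object from the OSDF to the local working directory.
from pelicanfs.core import PelicanFileSystem

pelfs = PelicanFileSystem("pelican://osg-htc.org")     # public OSDF federation

remote = "/spin4d/data-release-1/example_cube.fits"    # hypothetical object path
pelfs.get(remote, "example_cube.fits")                 # equivalent to a Pelican client download
```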
The KoaStore repository is a high performance and scalable parallel file system storage solution that can be used by University of Hawai’i faculty and staff. KoaStore was funded through the NSF Campus Cyberinfrastructure program through award #2232862.
KoaStore users make datasets such as SPIn4D accessible via the OSDF.