The Pelican Project: Building a universal plug for scientific data-sharing
By: Brian Mattmiller
November 16, 2023
From its founding, the Morgridge Institute for Research has driven the idea that open sharing of research computing resources will be a great enabler of scientific discovery, powering everything from black hole astronomy to stem cell biology.
Increasingly, the principle of sharing is being applied not only to computing resources, but to the wealth of data those projects are producing. Resources such as high-throughput computing and the OSG Consortium have been incorporating more tools for scientists to share their raw data for further exploration.
This principle is now gaining traction on a national policy scale. The White House Office of Science and Technology Policy (OSTP) established new requirements in 2022 that any research supported by federal funds must be made available to the public without embargoes or paywalls.
This mandate applies not only to published findings, but to the core data those findings are based upon. Within the scientific community, the approach is referred to as the “FAIR” principles, which hold that scientific data should be “findable, accessible, interoperable and reusable.”
Obviously, applying this new standard to data is as much a technical challenge as it is a cultural one. A new project at the Morgridge, led by research computing investigators Brian Bockelman and Miron Livny, is working toward creating a software platform that can facilitate the sharing of diverse research datasets.
Nicknamed “Pelican,” the project is supported through a $7 million grant from the National Science Foundation (NSF). The award (OAC-2331489) aims to make data produced by researchers, from single-investigator labs to international collaborations, more accessible both to computing resources and to remote clients for viewing. Pelican supports and extends the work Bockelman and Livny have been doing as part of the OSG Consortium for over a decade.
Bockelman says that public research data-sharing has been a growing movement over the past decade, but the COVID-19 pandemic served as a potent catalyst. The pandemic made the benefits of sharing abundantly clear, including the development of a vaccine at an unprecedented pace: six months, compared to a typical multi-year process.
“Our philosophy is that not only should your research paper be public and readable, but your data should be as well,” Bockelman says. “If scientists just say, ‘here are the results in a pretty graph,’ and don’t share the underlying dataset, we lose a lot of value when others can’t access the data, can’t interpret it, or use it for their own research.”
Bockelman says other core benefits may come from the open science push. Making data more readily accessible should improve the reproducibility of experiments and potentially reduce scientific fraud. It can also narrow the gap between the “haves” and “have-nots” in the research world by providing data access regardless of institutional resources.
Bockelman likens the Pelican project to developing a “universal adapter plug” that can accommodate many different types of data. Just as homes have standard outlets that work for all kinds of household appliances, the same approach should help individual scientists plug into a sharable data platform regardless of the nature of their data.
One of the first proving grounds for Pelican will be its participation in the National Discovery Cloud for Climate, an effort to bring together compute, data, and network resources to democratize access and advance climate-related science and engineering. Bockelman says the Pelican project will help optimize this data-sharing effort with the climate science community and provide a proof of concept for other research areas.
But ultimately, the greatest benefit may be enhancing public trust in high-impact science.
“Even for people who may not go digging into the data, they want to know that science has been done responsibly, especially for fields where it directly affects their lives,” Bockelman says. “Climate is a great example of where the science can really drive regulations that affect people. Getting data out as open and following the FAIR principles … is part of that relationship between the scientific community and the society at large.”
Bockelman says making data accessible is more than just downloading from a webserver. Pelican works to establish approaches that help people utilize the data effectively from anywhere in the nation’s computing infrastructure — essential so anyone from a tribal college to the largest university can understand and interpret the climate data.
The original memo was written in 2022 by then-OSTP Director Alondra Nelson, and today the “Nelson memo” is viewed as a watershed document in federal research policy.
“When research is widely available to other researchers and the public, it can save lives, provide policy makers with the tools to make critical decisions, and drive more equitable outcomes across every sector of society,” Nelson wrote. “The American people fund tens of billions of dollars of cutting-edge research annually. There should be no delay or barrier between the American public and the returns on their investments in research.”