30 Mar 2022
Approved in 2021 by the GA4GH Standards Steering Committee (SSC), GA4GH BED v1.0 establishes a concrete set of guidelines for utilising the format.
Genomic features — such as genes, regulatory elements, and repeated sequences, as well as RNA — can have consequences for human health and disease. To better understand disease-causing genes, we must clearly document these features. Investigators use a process called genome annotation to identify what genomic features are present in a DNA sequence, where they are located, and what they do.
Over the past two decades, the Browser Extensible Data (BED) file format has become a popular method of capturing the location of genomic features and associated annotations.
Established by Jim Kent at the University of California Santa Cruz (UCSC) Genomics Institute, the BED file format was first used during the Human Genome Project. Since then, numerous genomics projects, analysis software, and visualisation tools — including UCSC’s Genome Browser application — have adopted the format as well.
“The BED format was developed in service of making genomic annotations visualisable on a graphical viewer, or browser,” said Robert Kuhn, associate director of the UCSC Genome Browser. “Over time, due to the format’s simplicity, flexibility, and conciseness, it became the de facto standard.”
Despite its widespread use, the BED file format has lacked a formal specification. The Large Scale Genomics Work Stream at the Global Alliance for Genomics and Health (GA4GH) set about addressing this need.
With guidance from UCSC, the Work Stream built upon the UCSC BED description to produce GA4GH BED v1.0.
“Thousands of users worldwide work with the BED format on a daily basis,” said Aaron Quinlan, co-maintainer of the new specification and professor of human genetics and biomedical informatics at the University of Utah.
“Formalising the intended use and structure of this fundamental format — from how to name chromosomes to what whitespace delimiter is preferred — will facilitate more reproducible research, thus saving time and improving accuracy,” Quinlan said.
Approved in 2021 by the GA4GH Standards Steering Committee (SSC), GA4GH BED v1.0 fills in the gaps in the existing documentation and establishes a concrete set of guidelines for utilising the format.
“The BED format is an interesting project since we’re trying to formalise a standard that has a variety of ‘flavours’ and is widely used today,” said Oliver Hofmann, co-lead of the Large Scale Genomics Work Stream and professor at the University of Melbourne. “By bringing the format to GA4GH, we’re able to leverage community input to tighten up the ambiguous aspects and ensure interoperability.”
In essence, BED is a simple, plain-text file format in which each line consists of a series of fields. The first three fields are mandatory: they name the chromosome and give the start and end positions of a genomic feature on a linear chromosome. Nine further optional fields defined by the format provide additional information, such as the feature name and display attributes like colour. Custom fields allow many other data types to be added.
While the format is seemingly straightforward, the lack of conventions has led to a plethora of ways to fill in and structure the fields. The new specification aims to define a numerical range for each specified BED field and provide semantics for whitespace, sorting, default values, and other details that were previously left undefined.
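As a rough illustration of that layout, the Python sketch below reads one BED line into a small record. It is not taken from the specification: the `BedRecord` class and `parse_bed_line` function are hypothetical names, the field names follow the widely used UCSC description (`chrom`, `chromStart`, `chromEnd`, `name`, `score`, `strand`), and the sketch assumes tab-delimited input. Coordinates are treated as 0-based and half-open, so a feature's length is simply end minus start.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BedRecord:
    """One BED line (hypothetical helper type, for illustration only)."""
    chrom: str
    chrom_start: int               # 0-based, inclusive
    chrom_end: int                 # 0-based, exclusive
    name: Optional[str] = None     # optional field 4
    score: Optional[int] = None    # optional field 5
    strand: Optional[str] = None   # optional field 6
    extra: Tuple[str, ...] = ()    # remaining optional or custom fields, unparsed

    @property
    def length(self) -> int:
        # Half-open interval: the end position is not part of the feature.
        return self.chrom_end - self.chrom_start


def parse_bed_line(line: str) -> BedRecord:
    """Split a tab-delimited BED line (an assumption of this sketch)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, chromStart, chromEnd")
    return BedRecord(
        chrom=fields[0],
        chrom_start=int(fields[1]),
        chrom_end=int(fields[2]),
        name=fields[3] if len(fields) > 3 else None,
        score=int(fields[4]) if len(fields) > 4 else None,
        strand=fields[5] if len(fields) > 5 else None,
        extra=tuple(fields[6:]),
    )


record = parse_bed_line("chr7\t127471196\t127472363\tPos1\t0\t+")
print(record.chrom, record.chrom_start, record.chrom_end, record.length)
```

A file with only the three mandatory columns parses the same way, with the optional attributes left as `None`.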
A formal specification is essential for developing software that works with the file format. Otherwise, various tools may read elements differently or simply reject the file altogether.
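To make the idea concrete, here is a minimal sketch of the kind of strict check a formal specification makes possible. The function name `check_bed_fields` is hypothetical, and the particular limits used (chromosome names of 1 to 255 alphanumeric or underscore characters, coordinates that fit in an unsigned 64-bit integer, scores from 0 to 1000, strand as `+`, `-`, or `.`) are assumptions chosen for illustration; the GA4GH BED specification itself is the authority on the normative values.

```python
import re
from typing import List

# Illustrative limits only; consult the specification for normative values.
CHROM_RE = re.compile(r"[A-Za-z0-9_]{1,255}")
UINT64_MAX = 2**64 - 1
STRANDS = {"+", "-", "."}


def is_uint(text: str, maximum: int) -> bool:
    """True if text is a plain decimal integer in [0, maximum]."""
    return text.isascii() and text.isdigit() and int(text) <= maximum


def check_bed_fields(fields: List[str]) -> List[str]:
    """Return a list of human-readable problems; empty means no problem found."""
    if len(fields) < 3:
        return ["fewer than 3 fields"]
    problems = []
    chrom, start, end = fields[:3]
    if not CHROM_RE.fullmatch(chrom):
        problems.append(f"chrom {chrom!r} has unexpected characters or length")
    start_ok = is_uint(start, UINT64_MAX)
    end_ok = is_uint(end, UINT64_MAX)
    if not start_ok:
        problems.append(f"chromStart {start!r} is not an integer in range")
    if not end_ok:
        problems.append(f"chromEnd {end!r} is not an integer in range")
    if start_ok and end_ok and int(start) > int(end):
        problems.append("chromStart is greater than chromEnd")
    if len(fields) > 4 and not is_uint(fields[4], 1000):
        problems.append(f"score {fields[4]!r} is not an integer from 0 to 1000")
    if len(fields) > 5 and fields[5] not in STRANDS:
        problems.append(f"strand {fields[5]!r} is not one of + - .")
    return problems


print(check_bed_fields(["chr7", "127471196", "127472363", "Pos1", "0", "+"]))
print(check_bed_fields(["chr7", "200", "100"]))
```

A reader and a writer that agree on checks like these can reject a malformed line with a clear message instead of silently interpreting it in different ways.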
“Interoperability of tools would enable output from one tool to be used as input into other tools — if the formats are well-defined and predictable,” said Kuhn.
“By standardising the BED format, we can reduce any misinterpretation when using the format, minimise issues when interoperating between software tools, and ultimately avoid errors and inconsistencies in scientific results,” said Michael Hoffman, co-maintainer of the specification, senior scientist at University Health Network, and associate professor of medical biophysics and computer science at the University of Toronto.
To test and verify the performance of software packages that analyse BED files, contributors have developed quality assurance tools, such as the UCSC Genome Browser’s Kent tools, which can validate the output of tools that write BED files. Additionally, Hoffman’s team built a tool to screen for correct behaviour across software tools that read BED files. These efforts are reported in the preprint “Assessing and assuring interoperability of a genomics file format,” available on bioRxiv.
In the future, the Large Scale Genomics Work Stream plans to continue iterating on the GA4GH BED specification.
“The 1.0 specification was a formalisation effort,” said Hoffman. “For future specifications, the next steps are gathering stakeholders together to determine the best way to encode new metadata and genome annotations within the format itself. Important metadata include the version of the genome assembly used, and the definition of custom data types that might differ from BED file to BED file.”