The Bioframe Python library is a tool for operations on and analysis of genomic intervals. This project aimed to develop comprehensive tutorials using Jupyter notebooks to demonstrate the usage of Bioframe, focusing on real-world use cases with publicly available genomic data. The tutorials cover a diverse range of genomic analysis scenarios, including SNV analysis, CNV analysis, transcription start site enrichment, and an introduction to Bioframe for GenomicRanges users.
Open2C is an organization dedicated to developing open-source tools for analyzing 3D genomic data. Their core tools—Bioframe, pairtools, and cooltools—are specifically designed for analyzing data generated from Hi-C and related technologies. The organization’s goals focus on ease of use, flexibility, and versatility to support the active development of novel analytical methods, while ensuring scalability to handle the most up-to-date and extensive datasets. One of the primary tools is Bioframe, a Python library designed for the operations and analysis of genomic intervals. Bioframe allows users to load genomic datasets (e.g., genome annotations or experimental data) and perform essential tasks. It is particularly useful for researchers handling large-scale genomic data, offering flexibility, scalability, and community-driven development.
The project aimed to develop comprehensive tutorials using Jupyter notebooks to demonstrate the usage of the Bioframe Python library, focusing on real-world use cases with publicly available genomic data from in vitro experiments. Together, the tutorials cover a diverse range of genomic analysis scenarios, including but not limited to SNV analysis, CNV analysis, transcription start site enrichment of scATAC-seq peaks, and a tutorial for GenomicRanges users. They also show how Bioframe can streamline genomic data manipulation. Below are the key contributions from the project.
The following tutorials were developed as part of this project:
1. SNV analysis: This tutorial demonstrates how to use Bioframe to
identify and annotate single nucleotide variants (SNVs) within specific
protein domains. It walks users through extracting domain information
for genes associated with cancer and mapping these domains to genomic
coordinates.
* Tutorial
Link
2. CNV analysis: This tutorial focuses on the analysis
of copy number variations (CNVs) using Bioframe. It includes annotating
CNV regions and mapping them to the corresponding genes.
* Tutorial
Link
3. Transcription Start Site Enrichment of scATAC-seq
peaks: This tutorial demonstrates how to use Bioframe to
analyze transcription start site enrichment of single-cell ATAC-seq
peaks, showcasing how Bioframe can be used to handle
single-cell chromatin accessibility data.
* Tutorial
Link
4. A Python Alternative for GenomicRanges Users: This
tutorial provides a comparison between Bioframe and GenomicRanges,
highlighting the benefits of using Bioframe for genomic interval
analysis. Aimed at users familiar with the GenomicRanges R package,
it provides an alternative workflow in Python, leveraging Bioframe
to perform similar operations on genomic intervals.
* Tutorial
Link
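
The central step of the CNV tutorial, mapping CNV regions to the genes they overlap, is an interval overlap join. The sketch below illustrates that idea in plain Python. It is not Bioframe's implementation or the tutorial's code: the `Interval` type, the helper names, and all coordinates are hypothetical and chosen only for illustration. In Bioframe, the corresponding join on DataFrames is provided by `bioframe.overlap`.

```python
# Plain-Python sketch of an interval overlap join (toy data, hypothetical names).
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open, BED-style)
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def map_regions_to_genes(regions, genes):
    """Return {region name: [overlapping gene names]} via a brute-force scan."""
    return {
        region.name: [gene.name for gene in genes if overlaps(region, gene)]
        for region in regions
    }


# Toy CNV calls and gene annotations; coordinates are invented for illustration.
cnvs = [
    Interval("chr1", 1_000, 5_000, "cnv_gain_1"),
    Interval("chr2", 200, 800, "cnv_loss_1"),
]
genes = [
    Interval("chr1", 4_500, 6_000, "GENE_A"),
    # Starts exactly where cnv_gain_1 ends, so half-open intervals do not overlap.
    Interval("chr1", 5_000, 7_000, "GENE_B"),
    Interval("chr2", 100, 300, "GENE_C"),
]

cnv_to_genes = map_regions_to_genes(cnvs, genes)
print(cnv_to_genes)
```

The brute-force scan compares every region with every gene, which is fine for a toy example but scales poorly. For real datasets, a vectorized library routine such as `bioframe.overlap` is the more practical choice.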
The project achieved its objectives by delivering four comprehensive Jupyter notebooks that cover a wide range of genomic analysis tasks using Bioframe. The tutorials provide clear guidance, practical examples, and visualizations to help users effectively utilize Bioframe for their research.
- Dataset Acquisition: Publicly available datasets
were acquired from repositories to demonstrate real-world
applications.
- Tutorial Development: Step-by-step guidance was
provided in each tutorial, covering different use cases such as SNV and
CNV analysis, transcription start site enrichment, and genomic interval
manipulation in Python.
- Community Engagement: Throughout the project,
feedback was gathered from the Bioframe mentors to ensure that the
tutorials address real user needs and challenges.
- Collaboration: Close collaboration with mentors and
the Open2C community was maintained to resolve any issues and ensure the
tutorials were aligned with the project’s goals.
The project developed four comprehensive tutorials for the Bioframe Python library, covering a range of genomic analysis scenarios. These tutorials provide step-by-step guidance, from data loading to advanced interval-based operations, ensuring users of all skill levels can easily follow along and understand the usage of Bioframe.
In the future, I plan to finalize the tutorials, improve them based on feedback from users and the community, and expand them to cover additional use cases and advanced features of the Bioframe library. I also plan to continue contributing to the Bioframe library and to help with its development and maintenance. Further integration with other bioinformatics tools, for example for visualizing results, will also be explored to enhance the tutorials. Finally, I plan to publish the Jupyter notebooks using MyST and GitHub Pages.
I would like to thank my mentors, Ilya Flyamer and Geoff Fudenberg, for their guidance, support, and valuable feedback throughout the project. I am also grateful to the Open Chromosome Collective for providing me with the opportunity to contribute to this project and learn from their expertise.