Analysis Pipeline for RNA-Binding Protein Interacting Transcripts Based on POND-seq data

👤 Author: Gang Xie, PhD Candidate
🏫 Affiliation: PKU-THU-NIBS Joint Graduate Program, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
✉️ Email: gangx1e@stu.pku.edu.cn
📅 Date: May 28th, 2026
✅ Version: 1.0

Overview

This repository contains the custom analysis pipeline and scripts used in the manuscript: "Longitudinal monitoring of cytoplasmic RBP-RNA interactions and transcriptome in living cells by engineered protein nanocages".

It provides a workflow for analyzing transcripts that interact with RNA-binding proteins (RBPs), as profiled by our omics technology, POND-seq. The pipeline runs from preprocessing and alignment of raw high-throughput sequencing data through to downstream bioinformatics and statistical analyses.

System Requirements

OS: Linux (Tested on Ubuntu 20.04 / CentOS 7)
Memory: At least 64 GB RAM is recommended for processing the full dataset.
Languages: R (>= 4.2.0), Python (>= 3.9)
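
As a quick pre-flight check of the memory recommendation above, the following shell sketch reads total RAM from /proc/meminfo on Linux. The function names and the 60 GB default threshold are illustrative assumptions and are not part of the repository; the threshold sits slightly below 64 because the kernel usually reports a little less than the installed capacity.

```shell
# Print total system memory in whole GB, read from a meminfo-style file
# (defaults to /proc/meminfo, which exists on Linux only).
total_mem_gb() {
    meminfo_file="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo_file"
}

# Succeed when total memory is at least the given threshold in GB.
# The default of 60 is an assumption chosen for illustration.
has_enough_memory() {
    required_gb="${1:-60}"
    detected_gb=$(total_mem_gb "${2:-/proc/meminfo}")
    [ -n "$detected_gb" ] && [ "$detected_gb" -ge "$required_gb" ]
}

if has_enough_memory; then
    echo "Memory check passed: $(total_mem_gb) GB detected."
else
    echo "Warning: less memory than recommended for full dataset processing."
fi
```

The file argument exists mainly so the check can be exercised against a saved copy of /proc/meminfo from another machine.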

Key Dependencies

Command-line Tools: trim_galore, STAR, subread (featureCounts), samtools, bowtie, bedtools, umi_tools
Python Packages: pandas, numpy, matplotlib, seaborn
R Packages: DESeq2, clusterProfiler, Mfuzz
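
Once the environment is installed and activated, it can help to confirm that every command-line tool listed above is actually on the PATH. The sketch below is one way to do that; the helper name find_missing_tools is hypothetical, and the executable names are the usual ones for these tools (subread provides featureCounts).

```shell
# Print the name of each requested tool that cannot be found on PATH.
find_missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names for the command-line dependencies of the pipeline.
missing_tools=$(find_missing_tools trim_galore STAR featureCounts samtools bowtie bedtools umi_tools)

if [ -n "$missing_tools" ]; then
    printf 'Missing tools:\n%s\n' "$missing_tools"
else
    echo "All command-line tools found."
fi
```

This only checks that each executable can be found; it does not verify versions or the R packages.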

Environment Setup

We highly recommend using Miniconda to manage your software environments. You can quickly configure the required environment for command-line tools and Python packages by running:

# Clone the repository
git clone https://github.com/WangLabPKU/POND-seq.git
cd POND-seq

# Create and activate the conda environment
conda env create -f POND_environment.yml
conda activate POND_env

⚠️ Important Note: The POND_environment.yml file configures the Python and command-line tool dependencies only. It does NOT install the R packages. Install DESeq2, clusterProfiler, Mfuzz, and any other R packages you need manually in your R environment (e.g., via Bioconductor).

Reference Genomes & Annotations

To ensure accurate alignment and reproducibility, this pipeline relies on the following specific reference genomes, annotations, and spike-in sequences:

Human Reference:
- Genome Assembly: hg38
- Annotation: GENCODE Human v40 (Download Link)
Mouse Reference:
- Genome Assembly: mm10
- Annotation: GENCODE Mouse vM25 (Download Link)
Spike-in Controls:
- ERCC Spike-in sequences (Download ZIP)
Small RNA References:
- miRNA: Homo sapiens mature miRNA sequences obtained from the miRBase database.
- snoRNA: Homo sapiens noncoding RNA sequences (excluding long noncoding RNAs) obtained from the ENSEMBL database.

Data Availability

The raw sequencing data (FASTQ format) and processed count matrices generated in this study have been deposited in the NCBI Gene Expression Omnibus (GEO) and are publicly accessible under accession number: GSE293919.

Usage & Repository Structure

The scripts provided in this repository are intended as a reference for reproducing the analysis described in our study. The repository is organized as follows:

script/: Contains example shell scripts for upstream sequence processing, quality control, and read alignment.
plot/: Contains scripts for downstream analyses, including target enrichment evaluation, comparative transcriptomics, and functional profiling (e.g., GO/KEGG analysis).

Additional Information: The code provided here is a representative framework rather than a complete record of every step. Additional custom scripts and code for specific intermediate processing steps are available upon reasonable request via email. If you have questions, encounter bugs, or need help running the pipeline, please open an issue or contact the author directly.

Citation

If you find this code, the POND-seq methodology, or our datasets useful for your research, please cite our paper:

#Hu LF, #Xie G, Wu YX, Li YX, Wan ZL, Mi L, Wang JZ, *Wang Y. "Longitudinal monitoring of cytoplasmic RBP-RNA interactions and transcriptome in living cells by engineered protein nanocages." Molecular Cell, 2026 (Accepted).

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0) - see the LICENSE page for details.
