NODE: Networked Orchestration of Distal Elements

NODE is an R package for identifying transcription factor (TF) target genes by integrating TF binding sites, 3D chromatin interactions (Hi-C), and gene annotations. It builds a genomic graph to find and classify regulatory paths from distal elements to gene promoters.

🚀 Getting Started

Installation

```
# install.packages("remotes")
remotes::install_github("Novartis/NODE")
```

📖 Core Concepts

NODE classifies each TF-to-gene link by the shortest path that connects the TF peak to the gene in the genomic graph.

| Path Type | Description | Graph Structure |
|---|---|---|
| 👑 `direct` | TF peak overlaps a gene promoter. | `TF -> Transcript -> Gene` |
| 🔗 `hic` | TF peak connects to a promoter via one Hi-C loop. | `TF -> Region A -> Region B -> Transcript -> Gene` |
| 🧬 `hic_hopped` | TF peak connects to a promoter via two linked Hi-C loops. | `TF -> Region A -> Region B -> Region C -> Transcript -> Gene` |
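
As an illustration of the classification above, the sketch below builds a toy graph with igraph and labels a TF-to-gene link by the length of its shortest path. The node names and the `classify_path()` helper are hypothetical and are not part of NODE's API; the mapping from path length to label is simply read off the Graph Structure column, so treat it as a sketch rather than NODE's actual implementation.

```r
library(igraph)

# Toy edge list (hypothetical node names) following the graph structures above.
edges <- data.frame(
  from = c("peak_1", "tx_A", "peak_2", "region_1", "region_2", "tx_B"),
  to   = c("tx_A", "gene_A", "region_1", "region_2", "tx_B", "gene_B")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Hypothetical helper: label a link by the number of edges on the shortest path.
classify_path <- function(graph, tf, gene) {
  n_edges <- distances(graph, v = tf, to = gene, mode = "out")[1, 1]
  if (is.infinite(n_edges)) return(NA_character_)  # no path from TF to gene
  switch(as.character(n_edges),
    "2" = "direct",      # TF -> Transcript -> Gene
    "4" = "hic",         # TF -> Region A -> Region B -> Transcript -> Gene
    "5" = "hic_hopped",  # one additional region on the path
    "other"
  )
}

classify_path(g, "peak_1", "gene_A")
classify_path(g, "peak_2", "gene_B")
```

In this toy graph `peak_1` reaches `gene_A` in two edges and `peak_2` reaches `gene_B` in four, while `peak_1` has no path to `gene_B`.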

✨ Features

- Integrated Network Analysis: Builds a single igraph object from multiple genomic data types.
- Path-Based Classification: Categorizes links as `direct`, `hic`, or `hic_hopped`.
- Input Validation: Pre-flight checks validate all inputs before execution.
- Optional Cis-Regulatory Filtering: Restricts the analysis to regions that overlap known regulatory elements.
- Reproducible by Design: Creates a self-contained project directory with run metadata.
- HPC-Ready: Scales to large analyses with batchtools and SGE support.

🛠️ Configuration & Inputs

Input Data Requirements

- TF Peaks: A named `GRanges` object of TF ChIP-seq peaks.
- Gene List: A character vector of target gene IDs (e.g., ENSEMBL).
- Hi-C Loops: A tab-delimited Hi-C loop file in BEDPE-like format.
- Chromosome Sizes: A tab-delimited file with chromosome names and lengths.
- Annotations: A `TxDb` and an `OrgDb` annotation object.
- Configuration: Parameters for project, cluster, and analysis settings.

Hi-C File Format

NODE expects a tab-delimited file (plain or gzipped) with at least six columns describing the two interacting anchors. The file may have a header line, a commented header line, or no header at all.

Required Columns:

```
# Columns 1-6 are standardized internally
chrom1  start1  end1    chrom2  start2  end2
chr1    10000   20000   chr1    50000   60000
```

Additional columns from tools like Juicer HiCCUPS are allowed but ignored.
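
To inspect a loop file before passing it to NODE, something like the following base R sketch may help. It reads a plain or gzipped file, skips commented lines, drops an uncommented header if one is present, and keeps the first six columns. The `read_loops()` helper and the temporary example file are hypothetical; this is not NODE's internal parser.

```r
# Hypothetical helper: read a BEDPE-like loop file into a six-column data frame.
read_loops <- function(path) {
  lines <- readLines(path)  # gzipped files are detected automatically
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  stopifnot(length(lines) > 0)

  # Treat the first line as a header if its second field is not numeric.
  first <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]
  if (is.na(suppressWarnings(as.numeric(first[2])))) lines <- lines[-1]

  loops <- read.table(text = lines, sep = "\t", header = FALSE,
                      stringsAsFactors = FALSE)
  stopifnot(ncol(loops) >= 6)
  loops <- loops[, 1:6]  # extra columns (e.g. from HiCCUPS) are dropped
  names(loops) <- c("chrom1", "start1", "end1", "chrom2", "start2", "end2")
  loops
}

# Usage with a small gzipped example file that has a header and a 7th column.
tmp <- tempfile(fileext = ".bedpe.gz")
con <- gzfile(tmp, "w")
writeLines(c("chrom1\tstart1\tend1\tchrom2\tstart2\tend2\tfdr",
             "chr1\t10000\t20000\tchr1\t50000\t60000\t0.01"), con)
close(con)
loops <- read_loops(tmp)
print(loops)
```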

HPC and `batchtools` Configuration

For large analyses on an HPC cluster, point the batchtools directories to a shared scratch space for better performance.

- `batchtools_registry_dir`: Stores temporary job files.
- `batchtools_template_dir`: Stores the SGE template file.

```
node_results <- run_NODE(
    # ... other parameters ...
    batchtools_registry_dir = "/path/to/cluster/scratch/registry",
    batchtools_template_dir = "/path/to/cluster/scratch/templates",
    # ... other parameters ...
)
```

📂 Project Structure & Outputs

NODE creates a self-contained project directory for each run, ensuring results are organized and reproducible.

```
my_tf_analysis/
├── 📁 data_prepped/  (Intermediate data files)
├── 📝 metadata/      (Run parameters and session info)
└── 📊 results/       (Final result tables)
    ├── NODE_results.rds
    └── NODE_results.tsv
```

The output table includes TF peaks, gene information, path classifications, and the genomic elements forming the link.
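
A finished run can be reloaded from the project directory without rerunning NODE. The sketch below assumes the layout shown above and that `NODE_results.tsv` is tab-delimited with a header row (an assumption, since the column names are not listed here), so it only inspects the structure rather than relying on specific columns.

```r
# Paths follow the project layout above; adjust the project directory as needed.
results_dir <- file.path("my_tf_analysis", "results")
rds_path <- file.path(results_dir, "NODE_results.rds")
tsv_path <- file.path(results_dir, "NODE_results.tsv")

if (file.exists(rds_path)) {
  node_results <- readRDS(rds_path)
} else if (file.exists(tsv_path)) {
  # Assumption: the TSV has a header row.
  node_results <- read.delim(tsv_path, stringsAsFactors = FALSE)
} else {
  node_results <- NULL
  message("No NODE results found under ", results_dir)
}

# Inspect whatever columns the run produced.
if (!is.null(node_results)) str(node_results)
```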

Minimal Example

```
library(NODE)
library(GenomicRanges)
library(TxDb.Hsapiens.GENCODE.v46.hg38)
library(org.Hsapiens.eg.db)

# 1. Define inputs
my_tf_peaks <- GRanges(
    seqnames = c("chr1", "chr1"),
    ranges = IRanges::IRanges(start = c(100000, 250000), end = c(100500, 250500))
)
names(my_tf_peaks) <- c("peak_1", "peak_2")

my_gene_list <- c("ENSG00000123456", "ENSG00000654321")
hic_file <- "/path/to/your/loops.bedpe.gz"
chrom_file <- "/path/to/your/hg38.chrom.sizes"

# 2. Run NODE
node_results <- run_NODE(
    project_dir = "./my_tf_analysis",
    tfPeaks = my_tf_peaks,
    geneList = my_gene_list,
    hic_bedpe_file = hic_file,
    chrom_sizes_file = chrom_file,
    txdb = TxDb.Hsapiens.GENCODE.v46.hg38::TxDb.Hsapiens.GENCODE.v46.hg38,
    orgdb = org.Hsapiens.eg.db::org.Hsapiens.eg.db,
    hic_bins = 10000,
    orgdb_keytype = "ENSEMBLTRANS",
    orgdb_columns = c("ENSEMBL", "SYMBOL"),
    ensembl_column = "ENSEMBL",
    ucsc_genome_build = "hg38",
    batchtools_cores = 2,
    batchtools_mem_gb = 8,
    batchtools_chunks = 100
)

# 3. Explore results
head(node_results)
```

🆘 Troubleshooting

Common Issues & Solutions

tfPeaks must be named: Ensure your GRanges object has unique names.
```
names(my_tf_peaks) <- paste0("peak_", seq_along(my_tf_peaks))
```
Chromosome names do not match: Use consistent chromosome naming (e.g., chr1, chr2) across all input files and annotation objects.
No paths found: This can happen if:
- Genomic regions in your input files do not physically overlap.
- hic_bins does not match the resolution of your Hi-C data.
- Genome builds are mismatched between files.
- Optional cis-regulatory filters are too restrictive.
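
Two of the causes above can be checked quickly in base R before a run. The sketch below reports chromosome names that are missing from the chromosome sizes table and the fraction of loop anchors whose width equals `hic_bins`. Both helpers are hypothetical, not NODE functions, and the width check assumes that each anchor spans exactly one bin, as in the Hi-C example above.

```r
# Hypothetical helper: chromosome names used by peaks or loops that are
# absent from the chromosome sizes table (first column = chromosome name).
check_chrom_names <- function(peak_chroms, loops, chrom_sizes) {
  known <- chrom_sizes[[1]]
  list(
    peaks_unknown = setdiff(unique(peak_chroms), known),
    loops_unknown = setdiff(unique(c(loops$chrom1, loops$chrom2)), known)
  )
}

# Hypothetical helper: fraction of loop anchors whose width equals hic_bins.
anchor_bin_match <- function(loops, hic_bins) {
  widths <- c(loops$end1 - loops$start1, loops$end2 - loops$start2)
  mean(widths == hic_bins)
}

# Toy inputs; in practice, read these from your own files.
chrom_sizes <- data.frame(chrom = c("chr1", "chr2"),
                          size = c(248956422, 242193529))
loops <- data.frame(chrom1 = "chr1", start1 = 10000, end1 = 20000,
                    chrom2 = "chr1", start2 = 50000, end2 = 60000)

# "1" illustrates a peak that uses a different naming style than "chr1".
name_check <- check_chrom_names(c("1", "chr1"), loops, chrom_sizes)
bin_match <- anchor_bin_match(loops, hic_bins = 10000)
print(name_check)
print(bin_match)
```

For a `GRanges` object, the peak chromosome names can be obtained with `as.character(seqnames(my_tf_peaks))` once GenomicRanges is attached. A `bin_match` value well below 1 suggests that `hic_bins` does not reflect the resolution of the loop file.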

📜 License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
