rMATS-ISO
What is rMATS-ISO?
rMATS-ISO is a generalization of the rMATS statistical framework and the first event-based tool that can detect differential alternative splicing (AS) in splicing modules with complex splicing patterns using replicate RNA-seq data. Its statistical model uses a hierarchical framework to account for both the estimation uncertainty of PSI values in individual replicates and the variation among replicates.
Table of Contents
- What is rMATS-ISO?
- Installation
  - Operating system
  - Cloning and building rMATS-ISO pipeline
- Components
- Getting started with test example in test_data
- Output folder
- Contact
Installation
Operating system
rMATS-ISO currently can only be built and run on Linux/Unix systems.
Cloning and building rMATS-ISO pipeline
git clone https://github.com/Xinglab/rMATS-ISO.git --recursive
cd rMATS-ISO
make
export PATH=$PATH:$PWD/lr2rmats/bin # To permanently modify your PATH, you need to add it to your ~/.profile or ~/.bashrc file.
Python 3 is required to install the lr2rmats module, which uses Snakemake. After the build finishes, add the path rMATS-ISO/lr2rmats/bin to the PATH environment variable.
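To confirm that the PATH change took effect, a small check like the one below can help. It is only a sketch: the helper name `dir_on_path` is made up for illustration and is not part of rMATS-ISO, and it assumes you run it from the rMATS-ISO directory.

```python
import os


def dir_on_path(directory, path=None):
    """Return True if `directory` is one of the entries of PATH.

    `path` defaults to the current PATH environment variable.
    This helper is hypothetical and not shipped with rMATS-ISO.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    target = os.path.realpath(directory)
    # Compare resolved paths so that symlinks and trailing slashes do not matter.
    return any(
        os.path.realpath(entry) == target
        for entry in path.split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    bin_dir = os.path.join(os.getcwd(), "lr2rmats", "bin")
    print("lr2rmats/bin on PATH:", dir_on_path(bin_dir))
```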
Components
rMATS-ISO consists of the following components: IsoModule and rMATS-EM. More functions will be available in future versions.
IsoModule: Alternative splicing module detection.
python rMATS-ISO.py module --gtf <annotation.gtf> --bam <bam_list> -o <output_dir>
rMATS-EM: Statistical test for differential splicing.
python rMATS-ISO.py stat --bam <bam_list> -o <output_dir>
Both components are downloaded and built automatically by the make command.
Getting started with test example in test_data
python rMATS-ISO.py module --gtf ./test_data/gtf/PC3E_GS689.gtf --bam ./test_data/PC3E_GS689_short_read_bam_input.list -o ./output2/
python rMATS-ISO.py stat --bam ./test_data/PC3E_GS689_short_read_bam_input.list -o ./output2/
All output files and intermediate files are generated in the folder given with -o (./output2/ in the commands above).
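If you prefer to drive the two steps from a script, the commands above can be wrapped as follows. This is a minimal sketch, not part of rMATS-ISO: the function names `build_commands` and `run_pipeline` are hypothetical, and only the sub-commands and flags shown above (`module`, `stat`, `--gtf`, `--bam`, `-o`) are used.

```python
import subprocess
import sys


def build_commands(gtf, bam_list, out_dir, script="rMATS-ISO.py"):
    """Build the argument lists for the module and stat steps (hypothetical helper)."""
    module_cmd = [sys.executable, script, "module",
                  "--gtf", gtf, "--bam", bam_list, "-o", out_dir]
    stat_cmd = [sys.executable, script, "stat",
                "--bam", bam_list, "-o", out_dir]
    return [module_cmd, stat_cmd]


def run_pipeline(gtf, bam_list, out_dir, script="rMATS-ISO.py"):
    """Run module detection, then the statistical test, stopping on the first failure."""
    for cmd in build_commands(gtf, bam_list, out_dir, script):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(
        "./test_data/gtf/PC3E_GS689.gtf",
        "./test_data/PC3E_GS689_short_read_bam_input.list",
        "./output2/",
    )
```

Passing `check=True` makes the script stop if the module step fails, so the stat step never runs on incomplete output.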
Output folder
The test output folder will contain the following sub-folders:
ISO_module/
EM_out/
The ISO_module folder contains the detected splicing modules in files ending with IsoExon. The first line of each module contains the module ID, the number of exons in the module, the number of isoforms, the strand, the chromosome, and the gene name and ID. The second line contains the exon coordinates. Each remaining line defines one isoform: which exons of the module it includes and how those exons are connected.
ASM#0 4 3 + chr2 MYO1B ENSG00000128641 A
192265108,192265194 192265475,192265561 192267358,192267444 192272841,192272915
4 0 1 2 3
3 0 2 3
2 0 3
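The record above can be read with a few lines of Python. The sketch below is an illustration only, not the parser used by rMATS-ISO. It assumes, based on the example, that each isoform row is the number of exons in the isoform followed by zero-based exon indices, and that fields are whitespace-separated. The names `Module` and `parse_iso_exon_record` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Module:
    module_id: str
    strand: str
    chrom: str
    gene_name: str
    exons: List[Tuple[int, int]]   # (start, end) per exon
    isoforms: List[List[int]]      # exon indices per isoform (assumed zero-based)


def parse_iso_exon_record(lines):
    """Parse one module record from an IsoExon file (assumed layout, see above)."""
    header = lines[0].split()
    module_id = header[0]
    n_exons, n_isoforms = int(header[1]), int(header[2])
    strand, chrom, gene_name = header[3], header[4], header[5]

    # Second line: one "start,end" pair per exon.
    exons = []
    for pair in lines[1].split():
        start, end = pair.split(",")
        exons.append((int(start), int(end)))
    if len(exons) != n_exons:
        raise ValueError("exon count does not match header")

    # Remaining lines: exon count followed by exon indices, one isoform per line.
    isoforms = []
    for line in lines[2:2 + n_isoforms]:
        fields = [int(x) for x in line.split()]
        count, indices = fields[0], fields[1:]
        if count != len(indices):
            raise ValueError("isoform exon count does not match its index list")
        isoforms.append(indices)
    if len(isoforms) != n_isoforms:
        raise ValueError("isoform count does not match header")

    return Module(module_id, strand, chrom, gene_name, exons, isoforms)


EXAMPLE = [
    "ASM#0 4 3 + chr2 MYO1B ENSG00000128641 A",
    "192265108,192265194 192265475,192265561 192267358,192267444 192272841,192272915",
    "4 0 1 2 3",
    "3 0 2 3",
    "2 0 3",
]

if __name__ == "__main__":
    print(parse_iso_exon_record(EXAMPLE))
```

Under these assumptions, the three isoforms in the example include all four exons, skip the second exon, and skip both middle exons, respectively.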
The sample-specific read counts for each module are in files ending with IsoMatrix. The first line of each module contains the module ID, the number of overlap patterns (the distinct combinations of isoforms that a read can overlap), the number of isoforms, and the lengths added to the left and right sides of the module so that reads are fully covered.
Note that for a total of N isoforms, the maximum number of overlap patterns is 2^N - 1, one for each non-empty combination of isoforms that a read can overlap.
ASM#0 5 3 219 135
101 776 0 0 1
101 157 0 1 0
101 133 1 0 0
101 101 1 1 0
101 1802 1 1 1
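The following sketch reads one IsoMatrix record and checks it against the 2^N - 1 bound. It is an illustration under stated assumptions, not the rMATS-ISO implementation: it assumes that the last N columns of each row are the 0/1 isoform-compatibility pattern, and it keeps the leading columns unchanged (in the example they look like a read length and a read count, but the description above does not say so). The names `max_overlap_patterns` and `parse_iso_matrix_record` are hypothetical.

```python
def max_overlap_patterns(n_isoforms):
    """Upper bound on overlap patterns: every non-empty subset of isoforms."""
    return 2 ** n_isoforms - 1


def parse_iso_matrix_record(lines):
    """Parse one module record from an IsoMatrix file (assumed layout, see above)."""
    header = lines[0].split()
    module_id = header[0]
    n_patterns, n_isoforms, left_pad, right_pad = (int(x) for x in header[1:5])
    if n_isoforms < 1:
        raise ValueError("a module needs at least one isoform")
    if n_patterns > max_overlap_patterns(n_isoforms):
        raise ValueError("more patterns than 2^N - 1")

    rows = []
    for line in lines[1:1 + n_patterns]:
        fields = [int(x) for x in line.split()]
        # Assumption: the trailing n_isoforms columns form the 0/1 pattern.
        leading = fields[:-n_isoforms]
        pattern = tuple(fields[-n_isoforms:])
        if set(pattern) - {0, 1} or not any(pattern):
            raise ValueError("pattern must be a non-empty 0/1 vector")
        rows.append((leading, pattern))
    if len(rows) != n_patterns:
        raise ValueError("pattern count does not match header")

    return {
        "module_id": module_id,
        "n_isoforms": n_isoforms,
        "left_pad": left_pad,
        "right_pad": right_pad,
        "rows": rows,
    }


EXAMPLE = [
    "ASM#0 5 3 219 135",
    "101 776 0 0 1",
    "101 157 0 1 0",
    "101 133 1 0 0",
    "101 101 1 1 0",
    "101 1802 1 1 1",
]

if __name__ == "__main__":
    record = parse_iso_matrix_record(EXAMPLE)
    for leading, pattern in record["rows"]:
        print(leading, pattern)
```

In the example, 5 of the 7 possible patterns for 3 isoforms are present.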