Liu Lab at Huazhong University of Science and Technology


CPPred: Coding Potential Prediction based on the global description of RNA sequence

CPPred is a coding potential prediction tool that uses a support vector machine (SVM) to distinguish ncRNAs from coding RNAs based on sequence features such as ORF length, ORF coverage, ORF integrity, Fickett score, hexamer score, isoelectric point (pI), Gravy, instability index and CTD features.
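To make the ORF-based features concrete, the following is a minimal sketch of how ORF length, ORF coverage and ORF integrity could be computed for one transcript. It is not CPPred's own feature code: the function name `longest_orf`, the forward-strand-only scan, the choice to count the stop codon in the length, and the 0/1 encoding of integrity are all assumptions made for illustration.

```python
# Illustrative sketch only; not taken from the CPPred source.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (length_nt, coverage, integrity) for the longest ATG-initiated ORF.

    Assumptions (ours): forward strand only, all three frames, length includes
    the stop codon when present, integrity is 1 if an in-frame stop codon
    closes the ORF and 0 if the ORF runs off the end of the transcript.
    """
    seq = seq.upper().replace("U", "T")
    best_len, best_integrity = 0, 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if length > best_len:
                    best_len, best_integrity = length, 1
                start = None  # look for the next ORF in this frame
        if start is not None:
            # ORF without a stop codon: keep only whole codons
            length = ((len(seq) - start) // 3) * 3
            if length > best_len:
                best_len, best_integrity = length, 0
    coverage = best_len / len(seq) if seq else 0.0
    return best_len, coverage, best_integrity


print(longest_orf("CCAUGAAAUAGCC"))
```

Coverage here is simply the ORF length divided by the transcript length, so a short transcript that is almost entirely ORF scores close to 1.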

CPPred is available for download:
  • CPPred (CPPred.tar.gz, 3.10M)
  • Training sets:
      • Human-Training: coding RNAs (33360, 117M) + noncoding RNAs (24163, 24M)
      • Integrate-Training: coding RNAs (52530, 112M) + noncoding RNAs (27600, 18M)
  • Testing sets:
      • Human-Testing: 8557 coding RNAs + 8241 noncoding RNAs
      • Human-sORF-Testing: 641 coding RNAs + 641 noncoding RNAs
      • Mouse-Testing: 31102 coding RNAs + 19930 noncoding RNAs
      • Mouse-sORF-Testing: 846 coding RNAs + 1000 noncoding RNAs
      • Zebrafish-Testing: 15594 coding RNAs + 10662 noncoding RNAs
      • Zebrafish-sORF-Testing: 387 coding RNAs + 500 noncoding RNAs
      • Fruit-fly-Testing: 17400 coding RNAs + 4098 noncoding RNAs
      • Fruit-fly-sORF-Testing: 381 coding RNAs + 381 noncoding RNAs
      • S.cerevisiae-Testing: 6713 coding RNAs + 413 noncoding RNAs
      • S.cerevisiae-sORF-Testing: 505 coding RNAs + 413 noncoding RNAs
      • Integrate-Testing: 13903 coding RNAs + 13903 noncoding RNAs
      • Integrate-sORF-Testing: 11634 coding RNAs + 11634 noncoding RNAs
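After downloading a data set, one quick sanity check is to count its sequences and compare the number with the counts listed above. The helper below is a small sketch for that purpose; the function name `count_fasta_records` is ours, and it assumes an uncompressed FASTA file in which every record starts with a `>` header line.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Hypothetical usage; substitute the path of the file you downloaded:
#   n = count_fasta_records("path/to/downloaded_set.fa")
#   print(n)  # compare with the count listed for that data set
```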

  • Installation and usage of CPPred:

    1. Install Biopython.

      pip install biopython==1.75 --user

    2. Download the CPPred package.

    (1) Type "tar -zxvf CPPred.tar.gz" to uncompress the package

    (2) Type "cd CPPred/bin" to change the current directory

    (3) Run "python CPPred.py -i input_RNA.fa -hex Hexamer.tsv -r range -mol model -spe species -o result" to predict. Here, "input_RNA.fa" is the input RNA file in FASTA format, "Hexamer.tsv" is a pre-built hexamer frequency table, "range" is the pre-built training range file, "model" is the pre-built training model, and "species" selects which species model to use (Human or Integrated). The "result" file contains the final result for each prediction.

    Example:

    If your data comes from a single species, use the examples below:

    python CPPred.py -i ../data/Human_coding_RNA_test.fa -hex ../Hexamer/Human_Hexamer.tsv -r ../Human_Model/Human.range -mol ../Human_Model/Human.model -spe Human -o Human_coding.result

    python CPPred.py -i ../data/Human_ncRNA_test.fa -hex ../Hexamer/Human_Hexamer.tsv -r ../Human_Model/Human.range -mol ../Human_Model/Human.model -spe Human -o Human_ncRNA.result

    For other species, the Human model is used. In the two commands below, "Mouse(Zebrafish,S.cerevisiae or Fruit_fly)" is a placeholder; replace it with a single species name before running:

    python CPPred.py -i ../data/Mouse(Zebrafish,S.cerevisiae or Fruit_fly)_coding_RNA_test.fa -hex ../Hexamer/Human_Hexamer.tsv -r ../Human_Model/Human.range -mol ../Human_Model/Human.model -spe Human -o Mouse(Zebrafish,S.cerevisiae or Fruit_fly)_coding.result

    python CPPred.py -i ../data/Mouse(Zebrafish,S.cerevisiae or Fruit_fly)_ncRNA_test.fa -hex ../Hexamer/Human_Hexamer.tsv -r ../Human_Model/Human.range -mol ../Human_Model/Human.model -spe Human -o Mouse(Zebrafish,S.cerevisiae or Fruit_fly)_ncRNA.result

    If your data contains multiple species, use the examples below:

    python CPPred.py -i ../data/Integrated_coding_RNA_test.fa -hex ../Hexamer/Integrated_Hexamer.tsv -r ../Integrated_Model/Integrated.range -mol ../Integrated_Model/Integrated.model -spe Integrated -o Integrated_coding.result

    python CPPred.py -i ../data/Integrated_ncRNA_test.fa -hex ../Hexamer/Integrated_Hexamer.tsv -r ../Integrated_Model/Integrated.range -mol ../Integrated_Model/Integrated.model -spe Integrated -o Integrated_ncRNA.result
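    The example commands above all follow one pattern, so they can be assembled programmatically. The sketch below is a hypothetical helper (`build_cppred_command` is our name, not part of CPPred) that uses only the flags and relative paths shown in the examples. The per-species input file names in the loop are inferred from the placeholder in the examples and may differ from the files actually shipped in the package.

```python
import shlex


def build_cppred_command(input_fasta, output, species="Human"):
    """Assemble a CPPred command line as an argument list.

    Paths mirror the examples above and assume the command is run from
    inside CPPred/bin.
    """
    if species not in ("Human", "Integrated"):
        raise ValueError("species must be 'Human' or 'Integrated'")
    return [
        "python", "CPPred.py",
        "-i", input_fasta,
        "-hex", f"../Hexamer/{species}_Hexamer.tsv",
        "-r", f"../{species}_Model/{species}.range",
        "-mol", f"../{species}_Model/{species}.model",
        "-spe", species,
        "-o", output,
    ]


# Expand the species placeholder into one command per species, all scored
# with the Human model. File names here are assumed from the examples.
for name in ("Mouse", "Zebrafish", "S.cerevisiae", "Fruit_fly"):
    cmd = build_cppred_command(f"../data/{name}_coding_RNA_test.fa",
                               f"{name}_coding.result")
    print(shlex.join(cmd))
    # To actually run it from CPPred/bin:
    #   import subprocess; subprocess.run(cmd, check=True)
```

    Building the command as a list rather than a single string avoids shell quoting problems if an input path contains spaces or parentheses.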

    Programs and modules used by CPPred:

  • LIBSVM: A Library for Support Vector Machines (Version 3.22, December 2016). It was downloaded from http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/libsvm.cgi?+http://www.csie.ntu.edu.tw/~cjlin/libsvm+tar.gz. For more information, please see https://www.csie.ntu.edu.tw/~cjlin/libsvm/
  • FrameKmer.py and fickett.py: These Python scripts from CPAT (Version 1.2.2) were downloaded from https://sourceforge.net/projects/rna-cpat/files/v1.2.2/. For more information, please see http://rna-cpat.sourceforge.net/
  • Contact us:

    For any questions about CPPred, please email liushiyong@gmail.com.

    Reference:

    If you use CPPred, please cite:

    1. Xiaoxue Tong and Shiyong Liu. CPPred: Coding Potential Prediction based on the global description of RNA sequence. Nucleic Acids Research, 1 February 2019.



    Last modified: Tues. Feb. 12 17:00:00 CST 2019