RSSearch (RNA Similarity Search) is a deep learning method for efficient and accurate RNA remote homology detection.
Install RNA-FM, which generates the RNA embeddings that RSSearch takes as input:
git clone https://github.com/ml4bio/RNA-FM.git
cd RNA-FM
conda env create -f environment.yml
conda activate RNA-FM
In the RNA-FM environment, change to the redevelop directory and run the preprocessing script:
cd ./redevelop
python launch/predict.py \
--config="pretrained/extract_embedding.yml" \
--data_path="/RSSearch/example/example_sequences.fasta" \
--save_dir="/RSSearch/example/embeddings" \
--save_frequency 1 \
--save_embeddings \
--device="cpu"
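To check what RNA-FM produced before moving on, the embeddings can be loaded in Python. This is a minimal sketch assuming RNA-FM writes one NumPy `.npy` file per sequence into the `representations/` subdirectory of `--save_dir` (the directory later passed to RSSearch via `--npy_path`), each named after its sequence ID and holding a per-nucleotide matrix (RNA-FM's hidden size is 640). The file naming and the `load_embeddings` helper are assumptions for illustration.

```python
from pathlib import Path

import numpy as np


def load_embeddings(rep_dir):
    """Load every .npy file in rep_dir into a dict keyed by file stem.

    Assumed (hypothetical) layout: one file per sequence, named <id>.npy,
    each holding a (sequence_length, hidden_size) embedding matrix.
    """
    embeddings = {}
    for path in sorted(Path(rep_dir).glob("*.npy")):
        embeddings[path.stem] = np.load(path)
    return embeddings
```

For example, `load_embeddings("/RSSearch/example/embeddings/representations/")` should return one (length, 640) array per sequence in `example_sequences.fasta` if the assumed layout holds; printing `{k: v.shape for k, v in reps.items()}` is a quick sanity check.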
This produces the RNA-FM embeddings of the example sequences.
Then set up the RSSearch environment:
cd RSSearch
conda env create -f environment.yml
conda activate RNA
In the RNA environment, run the similarity prediction:
python /RSSearch/code/predict.py \
--model_path="/RSSearch/model/model_3_1.pth" \
--test_csv="/RSSearch/example/example_testing_data.csv" \
--npy_path="/RSSearch/example/embeddings/representations/" \
--output_csv="/RSSearch/example/results/example_similarity.csv"
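Because the two steps need different conda environments, one way to script them is `conda run -n <env>`, which runs a command inside an environment without activating it. The sketch below builds both command lines with exactly the flags shown above and runs them in order. The helper names and the `dry_run` switch are hypothetical; the RNA-FM step is given the `RNA-FM/redevelop` directory as its working directory because `launch/predict.py` is a relative path.

```python
import shlex
import subprocess


def rnafm_embed_cmd(fasta, save_dir):
    """Step 1: RNA-FM embedding extraction, run inside the RNA-FM env."""
    return [
        "conda", "run", "-n", "RNA-FM",
        "python", "launch/predict.py",
        "--config=pretrained/extract_embedding.yml",
        f"--data_path={fasta}",
        f"--save_dir={save_dir}",
        "--save_frequency", "1",
        "--save_embeddings",
        "--device=cpu",
    ]


def rssearch_predict_cmd(model, test_csv, npy_dir, output_csv):
    """Step 2: RSSearch similarity prediction, run inside the RNA env."""
    return [
        "conda", "run", "-n", "RNA",
        "python", "/RSSearch/code/predict.py",
        f"--model_path={model}",
        f"--test_csv={test_csv}",
        f"--npy_path={npy_dir}",
        f"--output_csv={output_csv}",
    ]


def run_example(rnafm_redevelop_dir, dry_run=True):
    """Run both steps on the example data; dry_run=True only prints the commands."""
    steps = [
        (rnafm_embed_cmd("/RSSearch/example/example_sequences.fasta",
                         "/RSSearch/example/embeddings"),
         rnafm_redevelop_dir),  # launch/predict.py is resolved relative to redevelop/
        (rssearch_predict_cmd("/RSSearch/model/model_3_1.pth",
                              "/RSSearch/example/example_testing_data.csv",
                              "/RSSearch/example/embeddings/representations/",
                              "/RSSearch/example/results/example_similarity.csv"),
         None),
    ]
    for cmd, cwd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so step 2 never runs after a failed step 1.
            subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_example("RNA-FM/redevelop")` prints both commands; pass `dry_run=False` to execute them. Passing each flag as its own list element also sidesteps the line-continuation mistakes that are easy to make in multi-line shell commands.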
This writes the RNA similarity scores predicted by the RSSearch model to the file given by --output_csv.
To search against the RNAcentral database (DB_clu_rep.fasta), first preprocess all of its RNA sequences with RNA-FM and save the generated RNA embeddings to /RSSearch/RNAcentral_search/embeddings/.
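Embedding a whole database takes far longer than the example, so it can help to confirm that every sequence in `DB_clu_rep.fasta` received an embedding before building the index. The sketch below assumes one `<id>.npy` file per sequence, named after the first token of its FASTA header; that naming convention and both helper functions are hypothetical.

```python
from pathlib import Path


def fasta_ids(fasta_path):
    """Yield the record ID (first whitespace-separated token) of each FASTA header."""
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip malformed empty headers
                    yield fields[0]


def missing_embeddings(fasta_path, rep_dir):
    """Return FASTA IDs that have no matching <id>.npy file in rep_dir."""
    have = {p.stem for p in Path(rep_dir).glob("*.npy")}
    return [seq_id for seq_id in fasta_ids(fasta_path) if seq_id not in have]
```

For example, `missing_embeddings("DB_clu_rep.fasta", "/RSSearch/RNAcentral_search/embeddings/representations/")` returns an empty list once preprocessing is complete (assuming RNA-FM again writes into a `representations/` subdirectory).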
cd RSSearch/code
python search_in_RNAcentral_database.py --mode build \
--model_path="/RSSearch/model/model_3_1.pth" \
--input_dir="/RSSearch/RNAcentral_search/embeddings/representations/" \
--index_output="/RSSearch/RNAcentral_search/faiss_index.fa" \
--hdf5_output="/RSSearch/RNAcentral_search/rna_vectors.h5" \
--query_file="/RSSearch/RNAcentral_search/queries" \
--results_dir="/RSSearch/RNAcentral_search/results"
Description of key parameters:
--mode build: Set to build mode.
--model_path: Path to the trained RSSearch model.
--input_dir: Input directory containing the RNA embeddings.
--index_output: Output file for the FAISS index.
--hdf5_output: Output file for the RNA vector data.
Once the index is built, run the same script in query mode to search it:
python search_in_RNAcentral_database.py --mode query \
[Other parameters remain unchanged]
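Conceptually, build mode turns each database sequence into a fixed-size vector and indexes it, and query mode returns the database entries most similar to each query. The sketch below illustrates that idea with exact cosine-similarity search in NumPy as a plain stand-in for FAISS. It assumes the per-sequence vectors have already been computed (presumably by the trained model given in `--model_path`) and skips writing the index and HDF5 files. The function names are hypothetical, and the FAISS index type RSSearch actually uses is not stated in this README.

```python
import numpy as np


def l2_normalise(vectors):
    """Scale each row to unit length so that inner product equals cosine similarity."""
    vectors = np.atleast_2d(np.asarray(vectors, dtype=np.float32))
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)  # guard against all-zero rows


def build_index(vectors):
    """Build-mode stand-in: the 'index' is just the normalised vector matrix."""
    return l2_normalise(vectors)


def query_index(index, ids, queries, k=5):
    """Query-mode stand-in: top-k (id, cosine score) pairs for each query vector."""
    q = l2_normalise(queries)
    scores = q @ index.T  # shape (n_queries, n_database)
    k = min(k, index.shape[0])
    top = np.argsort(-scores, axis=1)[:, :k]  # indices of the highest scores first
    return [[(ids[j], float(scores[i, j])) for j in row] for i, row in enumerate(top)]
```

With FAISS installed, `faiss.IndexFlatIP` over L2-normalised vectors performs the same exact search; approximate index types trade some accuracy for speed, which matters at the scale of RNAcentral.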
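Mode switching instructions:
--mode build: Build the FAISS index (--index_output) and the RNA vector file (--hdf5_output) from the embeddings in --input_dir.
--mode query: Search the queries in --query_file against the built index; results are saved to /RSSearch/RNAcentral_search/results/query_results.txt.
Switch between the two modes by changing only the --mode parameter.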
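Since both runs share the same flags, the mode switch amounts to one entry point that branches on `--mode`. The following is a hypothetical argparse sketch of that structure using the flags documented above; it is not the source of `search_in_RNAcentral_database.py`, and its branches only report where output would go.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical CLI mirroring the documented flags; only --mode is required here."""
    parser = argparse.ArgumentParser(description="RNAcentral search (illustrative sketch)")
    parser.add_argument("--mode", choices=["build", "query"], required=True)
    parser.add_argument("--model_path")
    parser.add_argument("--input_dir")
    parser.add_argument("--index_output")
    parser.add_argument("--hdf5_output")
    parser.add_argument("--query_file")
    parser.add_argument("--results_dir")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if args.mode == "build":
        # Build: encode embeddings from --input_dir, write --index_output and --hdf5_output.
        return f"build index -> {args.index_output}"
    # Query: search --query_file against the saved index, write into --results_dir.
    return f"query results -> {args.results_dir}/query_results.txt"
```

Any value other than `build` or `query` makes argparse print an error and exit, which is a cheap guard against typos in the mode name.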