Liu Lab at Huazhong University of Science and Technology

RSSearch: RNA remote homology detection based on deep learning methods

RSSearch (RNA Similarity Search) is a deep learning method for efficient and accurate RNA remote homology detection.


Installation

1. RNA-FM preprocessing of RNA sequences

Clone RNA-FM and create its conda environment (see https://github.com/ml4bio/RNA-FM).
	git clone https://github.com/ml4bio/RNA-FM.git
	cd RNA-FM
	conda env create -f environment.yml
	conda activate RNA-FM
In the RNA-FM environment, change into the redevelop directory.
	cd ./redevelop
Run the preprocessing script to extract the embeddings.
	python launch/predict.py \
		--config="pretrained/extract_embedding.yml" \
		--data_path="/RSSearch/example/example_sequences.fasta" \
		--save_dir="/RSSearch/example/embeddings" \
		--save_frequency 1 \
		--save_embeddings \
		--device="cpu"
The RNA-FM embeddings are written under the representations/ subdirectory of --save_dir (here /RSSearch/example/embeddings/representations/).
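
Before moving on to RSSearch, it can help to confirm that every sequence in the FASTA file has a matching embedding file. The following is a minimal sketch and not part of RSSearch. It assumes that RNA-FM names each output file <sequence_id>.npy, where the ID is the first whitespace-delimited token of the FASTA header; the function names are hypothetical.

```python
from pathlib import Path


def fasta_ids(fasta_path):
    """Return the sequence IDs (first token of each '>' header line)."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.append(header.split()[0])
    return ids


def missing_embeddings(fasta_path, npy_dir):
    """List FASTA IDs that have no <id>.npy file in npy_dir (assumed naming)."""
    npy_dir = Path(npy_dir)
    return [
        seq_id
        for seq_id in fasta_ids(fasta_path)
        if not (npy_dir / f"{seq_id}.npy").is_file()
    ]


# Example call with the paths used above:
# missing_embeddings(
#     "/RSSearch/example/example_sequences.fasta",
#     "/RSSearch/example/embeddings/representations/",
# )
```

An empty list means every sequence has an embedding. If IDs are reported as missing, check the file names RNA-FM actually produced, since the naming convention here is an assumption.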

2. Computing RNA similarity with RSSearch

Download RSSearch (7.04 GB), then create and activate the RNA conda environment.
	cd RSSearch
	conda env create -f environment.yml
	conda activate RNA
In the RNA environment, run the prediction script. Example:
	python /RSSearch/code/predict.py \
		--model_path="/RSSearch/model/model_3_1.pth" \
		--test_csv="/RSSearch/example/example_testing_data.csv" \
		--npy_path="/RSSearch/example/embeddings/representations/" \
		--output_csv="/RSSearch/example/results/example_similarity.csv"
The RNA similarity scores predicted by the RSSearch model are written to the file given by --output_csv.

RSSearch database construction and query process

RSSearch supports RNA structure similarity search in two main stages: database construction and similarity query. The complete workflow is described below, using the RNAcentral database as an example.

1. Data preprocessing

Obtain the RNA sequences in FASTA format (for example, DB_clu_rep.fasta), preprocess all sequences with RNA-FM, and save the generated RNA embeddings to the directory /RSSearch/RNAcentral_search/embeddings/.

2. Database Construction

To build a custom database, run the following commands.
	cd RSSearch/code
	python search_in_RNAcentral_database.py --mode build \
		--model_path="/RSSearch/model/model_3_1.pth" \
		--input_dir="/RSSearch/embeddings/representations/" \
		--index_output="/RSSearch/RNAcentral_search/faiss_index.fa" \
		--hdf5_output="/RSSearch/RNAcentral_search/rna_vectors.h5" \
		--query_file="/RSSearch/RNAcentral_search/queries" \
		--results_dir="/RSSearch/RNAcentral_search/results"
Description of key parameters:
--mode build: Run in build mode.
--model_path: Path to the trained RSSearch model.
--input_dir: Input directory containing the RNA embeddings.
--index_output: Output file for the FAISS index.
--hdf5_output: Output file for the RNA vector data.

3. Similarity query

Once the database has been built, run the similarity search in query mode.
	python search_in_RNAcentral_database.py --mode query \
	[Other parameters remain unchanged]
Mode switching instructions:
Build the database: --mode build
Query nearest neighbors: --mode query
The final search results will be saved to /RSSearch/RNAcentral_search/results/query_results.txt.
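
Conceptually, query mode returns the database vectors that lie closest to each query vector. The sketch below illustrates nearest-neighbor search with a brute-force scan in plain Python. It is a stand-in for illustration only and not the FAISS index that RSSearch builds; the cosine-similarity metric, the function names, and the toy vectors are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, database, k=3):
    """Return the k (rna_id, score) pairs with the highest similarity to query.

    database maps a hypothetical RNA identifier to its vector.
    """
    scored = [
        (rna_id, cosine_similarity(query, vector))
        for rna_id, vector in database.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy two-dimensional vectors; real vectors would have many more dimensions.
toy_database = {
    "rna_a": [1.0, 0.0],
    "rna_b": [0.0, 1.0],
    "rna_c": [1.0, 1.0],
}
hits = nearest_neighbors([1.0, 0.1], toy_database, k=2)
print([rna_id for rna_id, _ in hits])
```

A brute-force scan compares the query against every stored vector, which is why a dedicated index such as FAISS is typically preferred for databases the size of RNAcentral.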

Precautions
1) If you only want to search the RNAcentral database, you don't need to rebuild the database.
2) The complete build process is only required when creating a custom database.
3) Ensure that the same parameter configuration is used during the build and query phases (except for --mode).
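
One way to follow precaution 3 is to keep the parameters in a single place and generate both commands from it, so that only --mode differs. This is a minimal sketch that reuses the flags and paths from the build command above; the helper name build_command is hypothetical.

```python
import shlex

# Parameters shared by the build and query phases (copied from the build command).
SHARED_ARGS = {
    "--model_path": "/RSSearch/model/model_3_1.pth",
    "--input_dir": "/RSSearch/embeddings/representations/",
    "--index_output": "/RSSearch/RNAcentral_search/faiss_index.fa",
    "--hdf5_output": "/RSSearch/RNAcentral_search/rna_vectors.h5",
    "--query_file": "/RSSearch/RNAcentral_search/queries",
    "--results_dir": "/RSSearch/RNAcentral_search/results",
}


def build_command(mode):
    """Return the argument list for the given mode ('build' or 'query')."""
    if mode not in ("build", "query"):
        raise ValueError(f"unsupported mode: {mode!r}")
    argv = ["python", "search_in_RNAcentral_database.py", "--mode", mode]
    for flag, value in SHARED_ARGS.items():
        argv.append(f"{flag}={value}")
    return argv


# Print both commands; they differ only in the value after --mode.
print(shlex.join(build_command("build")))
print(shlex.join(build_command("query")))
```

The returned list can also be passed to subprocess.run from the RSSearch/code directory instead of being printed.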

Contact

If you have questions, please contact liushiyong@gmail.com.

Citation

Danyang Li, Xudong Liu, Chengeng Liu, Yuchen Zhu and Shiyong Liu. RNA remote homology detection based on deep learning methods. (2025)

License

This source code is licensed under the MIT license found in the LICENSE file.