Liu Lab at Huazhong University of Science and Technology

RSSearch: an RNA remote homology detection approach based on deep learning and RNA language model

RSSearch (RNA Similarity Search) is a deep learning method for efficient and accurate RNA remote homology detection.


Installation

1. RNA-FM preprocessing of RNA sequences

Clone RNA-FM and create its conda environment (see https://github.com/ml4bio/RNA-FM).
	git clone https://github.com/ml4bio/RNA-FM.git
	cd RNA-FM
	conda env create -f environment.yml
	conda activate RNA-FM
With the RNA-FM environment active, change into the redevelop directory.
	cd ./redevelop
Run the embedding extraction script.
	python launch/predict.py \
		--config="pretrained/extract_embedding.yml" \
		--data_path="/RSSearch/example/example_sequences.fasta" \
		--save_dir="/RSSearch/example/embeddings" \
		--save_frequency 1 \
		--save_embeddings \
		--device="cpu"
This produces the RNA-FM embeddings of the input sequences under the directory given by --save_dir.
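
Before moving on, it can help to confirm that every sequence in the FASTA file has a saved embedding. The sketch below is not part of RSSearch or RNA-FM. It assumes (not confirmed by the source) that one `<sequence_id>.npy` file is written per record into the `representations/` subdirectory of `--save_dir`, and that the sequence ID is the first whitespace-delimited token of the FASTA header. The function names `fasta_ids` and `missing_embeddings` are hypothetical.

```python
from pathlib import Path


def fasta_ids(fasta_path):
    """Return the first whitespace-delimited token of every FASTA header line."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.append(header.split()[0])
    return ids


def missing_embeddings(fasta_path, embedding_dir):
    """List sequence IDs that have no <id>.npy file in embedding_dir.

    Assumption: one .npy file per sequence, named after the sequence ID.
    """
    embedding_dir = Path(embedding_dir)
    return [
        seq_id
        for seq_id in fasta_ids(fasta_path)
        if not (embedding_dir / f"{seq_id}.npy").is_file()
    ]


# Example call with the paths used in this README:
# missing_embeddings("/RSSearch/example/example_sequences.fasta",
#                    "/RSSearch/example/embeddings/representations/")
```

An empty list means every sequence has an embedding file under the assumed naming scheme. If the list is not empty, check the actual file names in the output directory before rerunning RNA-FM, since the naming assumption may simply not hold.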

2. RSSearch calculates RNA similarity

Download RSSearch (7.04 GB) and create the RNA conda environment.
	cd RSSearch
	conda env create -f environment.yml
	conda activate RNA
With the RNA environment active, run the example prediction script.
	python /RSSearch/code/predict.py \
		--model_path="/RSSearch/model/model_3_1.pth" \
		--test_csv="/RSSearch/example/example_testing_data.csv" \
		--npy_path="/RSSearch/example/embeddings/representations/" \
		--output_csv="/RSSearch/example/results/example_similarity.csv"
The RNA similarity predicted by the RSSearch model is written to the file given by --output_csv.

RSSearch database construction and query process

RSSearch supports RNA structure similarity search in two main stages: database construction and similarity query. Taking the RNAcentral database as an example, the complete workflow is as follows.

1. Data preprocessing

Obtain the RNA sequences in FASTA format (for example, DB_clu_rep.fasta), preprocess all sequences with RNA-FM as described under Installation, and save the generated RNA embeddings to the directory /RSSearch/RNAcentral_search/embeddings/.

2. Database construction

To build a custom database, run the following commands.
	cd RSSearch/code
	python search_in_RNAcentral_database.py --mode build \
		--model_path="/RSSearch/model/model_3_1.pth" \
		--input_dir="/RSSearch/embeddings/representations/" \
		--index_output="/RSSearch/RNAcentral_search/faiss_index.fa" \
		--hdf5_output="/RSSearch/RNAcentral_search/rna_vectors.h5" \
		--query_file="/RSSearch/RNAcentral_search/queries" \
		--results_dir="/RSSearch/RNAcentral_search/results"
Description of key parameters:
--mode build: Set to build mode.
--model_path: The trained RSSearch model path.
--input_dir: RNA embeddings input directory.
--index_output: FAISS index output file.
--hdf5_output: RNA vector data file.

3. Similarity query

Once the database has been built, run a similarity search.
	python search_in_RNAcentral_database.py --mode query \
	[Other parameters remain unchanged]
Mode switching instructions:
Build the database: --mode build
Query nearest neighbor: --mode query
The final search results will be saved to /RSSearch/RNAcentral_search/results/query_results.txt.
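
Conceptually, the build stage stores one vector per database RNA, and the query stage returns the stored vectors closest to a query vector. The toy sketch below illustrates that idea with a brute-force search in plain Python. It is not the RSSearch implementation: RSSearch uses a FAISS index, and the distance metric (Euclidean here), the 2-dimensional toy vectors, and the names `build_database` and `query_nearest` are assumptions made only for illustration.

```python
import math


def build_database(named_vectors):
    """'Build' stage: keep IDs and vectors in parallel lists."""
    ids = list(named_vectors)
    vectors = [list(named_vectors[seq_id]) for seq_id in ids]
    return ids, vectors


def query_nearest(database, query, k=2):
    """'Query' stage: return the k closest (id, distance) pairs, nearest first."""
    ids, vectors = database
    scored = [(math.dist(vec, query), seq_id) for seq_id, vec in zip(ids, vectors)]
    scored.sort()
    return [(seq_id, dist) for dist, seq_id in scored[:k]]


# Toy 2-dimensional vectors standing in for the vectors RSSearch would store.
db = build_database({
    "rna_A": [0.0, 0.0],
    "rna_B": [1.0, 0.0],
    "rna_C": [5.0, 5.0],
})
hits = query_nearest(db, [0.9, 0.1], k=2)
print(hits)  # rna_B is listed first, followed by rna_A
```

A brute-force scan compares the query against every stored vector, which is why a dedicated index such as FAISS is preferable for a database the size of RNAcentral.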

Precautions
1) If you only want to search the RNAcentral database, you don't need to rebuild the database.
2) The complete build process is only required when creating a custom database.
3) Ensure that the same parameter configuration is used during the build and query phases (except for --mode).
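
One way to follow precaution 3 is to generate both command lines from a single set of parameters, so that only --mode can differ. The sketch below is a hypothetical helper, not part of RSSearch. The flags and paths are copied from the database construction example, and the names `SHARED_PARAMS` and `make_command` are assumptions.

```python
import shlex

# Shared parameters, copied from the database construction example.
SHARED_PARAMS = {
    "model_path": "/RSSearch/model/model_3_1.pth",
    "input_dir": "/RSSearch/embeddings/representations/",
    "index_output": "/RSSearch/RNAcentral_search/faiss_index.fa",
    "hdf5_output": "/RSSearch/RNAcentral_search/rna_vectors.h5",
    "query_file": "/RSSearch/RNAcentral_search/queries",
    "results_dir": "/RSSearch/RNAcentral_search/results",
}


def make_command(mode, params=SHARED_PARAMS):
    """Return the argument list for one run; only --mode varies between runs."""
    if mode not in ("build", "query"):
        raise ValueError(f"unknown mode: {mode}")
    command = ["python", "search_in_RNAcentral_database.py", "--mode", mode]
    command += [f"--{name}={value}" for name, value in params.items()]
    return command


build_cmd = make_command("build")
query_cmd = make_command("query")
print(shlex.join(build_cmd))
print(shlex.join(query_cmd))
# To execute a command from RSSearch/code instead of printing it:
#   import subprocess; subprocess.run(build_cmd, check=True)
```

Printing the commands first makes it easy to inspect them before anything is run.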

Contact us

If you have questions, please contact liushiyong@gmail.com.

Citation

Danyang Li, Xudong Liu, Chengeng Liu, Yuchen Zhu and Shiyong Liu. RSSearch: an RNA remote homology detection approach based on deep learning and RNA language model. (2025)

License

This source code is licensed under the MIT license found in the LICENSE file.