Boosting Prediction of Protein-Protein Interactions using Word Embedding Techniques

Abstract

Understanding protein-protein interactions (PPIs) helps to identify protein functions and supports important applications such as drug development and the identification of protein-disease relationships. Machine learning methods have been developed for PPI prediction to reduce the cost and time required by experimental methods. In this paper, we study a method for determining PPIs that combines deep learning with protein sequence representation learning. In our method, a word embedding technique is used to learn protein sequence representations. This technique captures the semantic relationships between amino acids in protein sequences. These relationships are then fed as input into a neural network to help it recognize the interaction signature of the input protein pair. Unlike previous studies, we integrate the protein sequence embedding mechanism into the neural network model itself. As a result, the sequence embedding is better tailored to PPI prediction by the neural network model. We evaluate our method on benchmark datasets including Yeast, Human, and eight independent sets, and we conduct an extensive comparison with existing methods. Our results show that the proposed method outperforms existing methods and achieves high performance in predicting cross-species PPIs. The dataset and our source code are available at https://github.com/thnhub/BoostPPIP.git.
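
The idea of placing the amino acid embedding inside the prediction network can be illustrated with a minimal sketch. It is not the architecture from the paper or the BoostPPIP repository: the mean pooling, the single logistic output unit, the embedding size, and all names (init_params, encode, predict_interaction) are assumptions made for illustration, and only an untrained forward pass is shown. In a trained model the embedding table would be updated by the same loss as the output layer, which is what ties the representation to the PPI task.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def init_params(embed_dim=8, seed=0):
    """Create an embedding table and a logistic output layer (hypothetical sizes)."""
    rng = random.Random(seed)
    embedding = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                 for _ in AMINO_ACIDS]
    weights = [rng.uniform(-0.1, 0.1) for _ in range(2 * embed_dim)]
    return {"embedding": embedding, "weights": weights, "bias": 0.0}


def encode(sequence, embedding):
    """Mean-pool the embedding vectors of the standard residues in one sequence."""
    rows = [embedding[AA_INDEX[aa]] for aa in sequence if aa in AA_INDEX]
    if not rows:
        raise ValueError("sequence has no standard amino acids")
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


def predict_interaction(seq_a, seq_b, params):
    """Return a score in (0, 1) for a protein pair (forward pass only)."""
    # Both proteins share one embedding table; their pooled vectors are concatenated.
    features = encode(seq_a, params["embedding"]) + encode(seq_b, params["embedding"])
    logit = sum(w * x for w, x in zip(params["weights"], features)) + params["bias"]
    return 1.0 / (1.0 + math.exp(-logit))


params = init_params()
# The two sequences below are made-up examples, not entries from the benchmark datasets.
score = predict_interaction("MKTAYIAKQR", "GSHMLEDPVA", params)
print(f"untrained interaction score: {score:.3f}")
```

Because the parameters are random, the printed score carries no biological meaning; the sketch only shows how a sequence pair flows through a shared embedding into a single interaction score.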

https://doi.org/10.26459/hueunijtt.v132i2B.7084

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2023