Feature importance assessment for the activation probability measure in topic diffusion prediction


In this study, we aim to estimate the coefficient σ used in the activation probability calculation for the topic diffusion prediction problem. In our previous studies, we proposed an aggregated activation probability that combines meta-path and textual information, in which σ is the characteristic coefficient of the interest similarity based on textual content. That is, σ controls how much the meta-path-based activation probability and the interest similarity each contribute to the aggregated activation probability. Previously, we assumed that the meta-path and the textual information were equally important, setting σ = 0.5. However, the appropriate value of this coefficient differs across datasets, depending on the meaning of the meta-path and of the textual information. Here, we continue to investigate how σ affects the effectiveness of topic diffusion prediction on a bibliographic network. We propose using two of the most common feature selection methods, the ANOVA test and mutual information, to measure the significance of the two features: MP (meta-path) and IS (interest similarity based on textual information). The experimental results show that estimating σ with these feature selection methods is reliable and improves the predictive performance of topic diffusion prediction compared with the default assignment of σ = 0.5.
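To make the idea concrete, the following is a minimal sketch of how feature importance scores could be turned into a value of σ. It is not the exact procedure of this study. Three things are assumptions introduced for illustration: the aggregated activation probability is taken to be the convex combination (1 − σ)·MP + σ·IS, σ is taken to be the share of the IS score in the sum of the two scores, and the data are synthetic. The helper names (`anova_f`, `mutual_information`, `estimate_sigma`, `aggregate`) are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = fmean(values)
    ss_between = ss_within = 0.0
    for g in groups.values():
        m = fmean(g)
        ss_between += len(g) * (m - grand_mean) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def mutual_information(values, labels, bins=10):
    """Histogram (plug-in) estimate of I(feature; label) in nats."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(binned, labels))
    p_bin, p_label = Counter(binned), Counter(labels)
    return sum(
        c / n * math.log(c * n / (p_bin[b] * p_label[y]))
        for (b, y), c in joint.items()
    )


def estimate_sigma(score_mp, score_is):
    """ASSUMPTION: sigma is the share of IS in the total importance."""
    return score_is / (score_mp + score_is)


def aggregate(p_mp, p_is, sigma):
    """ASSUMPTION: aggregated probability is a convex combination."""
    return (1.0 - sigma) * p_mp + sigma * p_is


# Synthetic data (not from the paper): label 1 means the node became active.
# IS is constructed to separate the classes more strongly than MP does.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(1000)]
mp_scores = [0.3 + 0.1 * y + rng.gauss(0, 0.15) for y in labels]
is_scores = [0.3 + 0.3 * y + rng.gauss(0, 0.15) for y in labels]

sigma_anova = estimate_sigma(anova_f(mp_scores, labels),
                             anova_f(is_scores, labels))
sigma_mi = estimate_sigma(mutual_information(mp_scores, labels),
                          mutual_information(is_scores, labels))

print(f"sigma from ANOVA F scores:       {sigma_anova:.3f}")
print(f"sigma from mutual information:   {sigma_mi:.3f}")
print(f"aggregated probability (example): {aggregate(0.2, 0.8, sigma_mi):.3f}")
```

Because IS is the more informative feature in this synthetic setup, both estimates come out above 0.5, shifting weight toward the textual information. On a dataset where the meta-path carries more signal, the same computation would push σ below 0.5.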

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.