Evaluate the integration performance of text-to-speech artificial intelligence systems for supporting visually impaired students in the smart university model


artificial intelligence system
text-to-speech
inclusive education
smart university
digital transformation
visually impaired students


People with disabilities continue to face barriers to inclusive education, especially at the undergraduate level. In recent years, the development and adoption of the smart university model, driven by advances in science, technology, and engineering, has gradually opened up learning opportunities for people with disabilities. This study evaluated text-to-speech systems and ran performance experiments on how well they integrate into the smart university model, with the aim of better supporting visually impaired students at Vietnam’s universities. It also proposes a suitable roadmap for integrating text-to-speech artificial intelligence systems into the smart university model at Vietnam’s universities.




This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright (c) 2023