Amharic Language Visual Speech Recognition using Hybrid Features

Authors

  • Zelalem Tamrie Department of Computer Science, Kombolcha Institute of Technology, Wollo University, Kombolcha, Ethiopia
* Corresponding author: zelalem.t8@gmail.com

DOI:

https://doi.org/10.20372/ajst.2021.6.2.271

Abstract

Lip motion reading is the process of recognizing spoken words from a video, with or without an audio signal, by observing the motion of the speaker's lips. In previous studies, accuracy was limited by the lack of appropriate image enhancement methods and by the algorithms used for feature extraction and feature vector generation. In the present study, we propose an automatic visual speech recognition approach for Amharic lip motion reading based on machine learning and computer vision techniques. The objective of the study is to improve existing Amharic lip motion reading and the performance of speech recognition systems operating in noisy environments. We collected videos of Amharic speech by recording directly with mobile devices. We recorded 14 Amharic words that are frequently spoken by patients or health professionals in hospitals. The total number of patients used for the study was 1260 (945 for training and 315 for testing the proposed model). To extract features, we used Convolutional Neural Networks (CNN), Histogram of Oriented Gradients (HOG), and their combination. We fed these features to a random forest classifier, both independently and in combination, to recognize the spoken word. Each feature set was evaluated using precision, recall, and F1-score to measure the performance of our model and to compare its accuracy with previous related work. Our system achieves 66.03%, 75.24%, and 76.51% accuracy with HOG, CNN, and combined features (random forest), respectively.
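As a rough illustration of the combination and evaluation steps named in the abstract, the sketch below fuses two feature vectors and computes per-class precision, recall, and F1-score. It assumes that "combination" means simple concatenation of the HOG and CNN vectors, which the abstract does not state. The function names, toy vectors, and word labels (`w1`, `w2`, `w3`) are hypothetical, and the random forest classifier itself is not reproduced here.

```python
from collections import Counter


def fuse_features(hog_vec, cnn_vec):
    """Concatenate two per-sample feature vectors (assumed fusion rule)."""
    return list(hog_vec) + list(cnn_vec)


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy vectors standing in for real HOG and CNN features.
hog = [0.1, 0.4]
cnn = [0.9, 0.2, 0.7]
fused = fuse_features(hog, cnn)

# Hypothetical labels for three of the spoken words.
y_true = ["w1", "w1", "w2", "w2", "w3", "w3"]
y_pred = ["w1", "w2", "w2", "w2", "w3", "w1"]
scores = per_class_scores(y_true, y_pred)
acc = accuracy(y_true, y_pred)

# The split reported in the abstract: 945 of 1260 for training.
train_fraction = 945 / 1260
```

In this sketch the fused vector is simply longer than either input, so a classifier trained on it sees both feature types at once. The reported split of 945 training and 315 testing samples corresponds to a 75/25 division.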

Keywords:

Convolutional Neural Network, Histogram of Oriented Gradients, Lip motion reading, Random Forest.

Published

2021-12-31

How to Cite

Tamrie, Z. (2021). Amharic Language Visual Speech Recognition using Hybrid Features. Abyssinia Journal of Science and Technology, 6(2), 42–50. https://doi.org/10.20372/ajst.2021.6.2.271

Section

Original Research Articles

License

Copyright (c) 2022 Abyssinia Journal of Science and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.