Group |
2022-12 |
Status |
completed |
Title |
An American Sign Language/Speech to Text Conversation Mediator |
Supervisor |
H. Rivaz, T. Fevens (CSSE) |
Description |
American Sign Language (ASL) enables deaf and mute people to communicate
using hand gestures and actions that represent English words and letters. Today,
only a fraction of deaf people are able to communicate through sign language, owing to
the low frequency of interaction with other deaf people and a lack of teaching resources
for the language. The project aims to raise awareness of the difficulties mute and/or deaf
people face without a reliable or widespread communication system. To achieve this, an
application will be developed that facilitates the learning of ASL, both for those who need
it and for those who do not. It will also translate ASL to text and speech to text over a
video call, allowing for a seamless conversation between a deaf person and a hearing person.
The project will translate ASL to text using an action-recognition neural network,
developed with PyTorch, that determines which gesture is being performed. The neural
network will be trained and tested on a scikit-learn train/test split of multiple videos
for each sign, each sign representing one word, so the signs can be categorized and recognized.
A holistic model built with MediaPipe will detect the placement of the hands and
extract the points of interest. Alongside MediaPipe, OpenCV will be used to capture the
hand and face landmarks of the person performing the action, both in the training videos
and in the live feed. These data points, extracted from the pre-recorded sign language
video dataset, will in turn be used to build and test the neural network architecture.
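As an illustration of this extraction step, the sketch below runs the MediaPipe Holistic solution over a pre-recorded sign video with OpenCV and flattens the pose, face, and hand landmarks into one feature vector per frame. The helper names and the 1662-value keypoint layout are illustrative assumptions, not the project's final design.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def _flat(landmark_list, n_points, values_per_point):
    """Flatten a MediaPipe landmark list, or return zeros when it is missing."""
    if landmark_list is None:
        return np.zeros(n_points * values_per_point)
    if values_per_point == 4:  # pose landmarks also carry a visibility score
        return np.array([[p.x, p.y, p.z, p.visibility] for p in landmark_list.landmark]).flatten()
    return np.array([[p.x, p.y, p.z] for p in landmark_list.landmark]).flatten()

def extract_keypoints(results):
    """One feature vector per frame: pose (33x4), face (468x3), two hands (21x3 each)."""
    return np.concatenate([
        _flat(results.pose_landmarks, 33, 4),
        _flat(results.face_landmarks, 468, 3),
        _flat(results.left_hand_landmarks, 21, 3),
        _flat(results.right_hand_landmarks, 21, 3),
    ])  # 1662 values in total

def landmarks_from_video(path):
    """Run MediaPipe Holistic over every frame of a pre-recorded sign video."""
    frames = []
    cap = cv2.VideoCapture(path)
    with mp_holistic.Holistic(min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as holistic:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
            frames.append(extract_keypoints(holistic.process(rgb)))
    cap.release()
    return np.stack(frames) if frames else np.empty((0, 1662))
```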
The web application will serve as a text conversation service in which the signer's
actions and the speaker's speech are both translated into text, forming a shared text
conversation. This allows both parties to understand what the other is communicating in a
hands-off manner (no texting required). It will also offer a learning tool feature, which
prompts the user to sign a word and scores the user based on the performed action.
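As a rough illustration of how such a score could be produced, the sketch below rates an attempt by the trained model's softmax probability for the prompted word. The scoring rule, the function signature, and the expected input shape are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F

def score_attempt(model, landmark_sequence: torch.Tensor, prompted_word_index: int) -> float:
    """Score a learner's attempt as the model's confidence in the prompted word.

    `landmark_sequence` is assumed to be a (frames, features) tensor of per-frame
    landmarks; the percentage-style score is an illustrative choice.
    """
    model.eval()
    with torch.no_grad():
        logits = model(landmark_sequence.unsqueeze(0))  # add a batch dimension
        probs = F.softmax(logits, dim=1)
    return float(probs[0, prompted_word_index]) * 100.0
```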
The neural network will determine the word associated with each action and
display it on screen; sentences can be formed from multiple continuous actions. Both the
model and the web application will be hosted and deployed on a Google Cloud Platform
(GCP) Virtual Machine instance. The Speech-to-Text service offered by Google Cloud will
be used to convert speech into text. The web application will run in a web browser on a
laptop, with each party requiring their own.
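A minimal sketch of the speech side is shown below, assuming the google-cloud-speech Python client; the audio encoding, sample rate, and per-chunk request pattern are illustrative choices rather than the project's final design.

```python
from google.cloud import speech

def transcribe_chunk(audio_bytes: bytes) -> str:
    """Send one short audio chunk to Google Cloud Speech-to-Text and return the transcript."""
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(content=audio_bytes)
    response = client.recognize(config=config, audio=audio)
    return " ".join(result.alternatives[0].transcript for result in response.results)
```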
The deliverables consist of:
1) Data collection and extraction:
a) Acquire a dataset of varied words signed in ASL, with multiple variations
of each to be processed and used for training.
b) Develop software to process the training videos into data that a neural
network can learn from, using OpenCV and MediaPipe to extract the ASL
gesture landmark points on the hands and face.
2) A neural network built with PyTorch, capable of quickly and accurately determining
the words being signed by a user in real time from a video feed (see the sketch after this list)
3) A real-time web application capable of:
a) Facilitating a one-on-one conversation through a video and audio feed
using socket programming.
b) Processing the video feed on a cloud instance running the neural network
model, which determines the signed words and responds with their textual
equivalent; the response is fed back into the conversation as text.
c) Providing a learning tool for ASL that prompts the user to sign a specific
word or phrase; a video feed captures the gesture, and the application scores
its correctness against the trained model.
4) Hosting the web application and neural network model on Google Cloud
Platform instances, enabling the use of several AI and machine learning APIs.
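As referenced in deliverable 2, the sketch below shows one plausible shape for the PyTorch model: an LSTM over per-frame landmark vectors followed by a word classifier, evaluated on a scikit-learn train/test split. The layer sizes, sequence length, and placeholder data are assumptions for illustration.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

class SignClassifier(nn.Module):
    """LSTM over per-frame landmark vectors (batch, frames, features) -> word logits."""
    def __init__(self, num_words: int, input_size: int = 1662, hidden_size: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_words)

    def forward(self, x):
        out, _ = self.lstm(x)          # (batch, frames, hidden_size)
        return self.fc(out[:, -1, :])  # classify from the final time step

# Placeholder data standing in for the extracted landmark sequences:
# 100 clips of 30 frames, each frame a 1662-value keypoint vector.
X = np.random.rand(100, 30, 1662).astype(np.float32)
y = np.random.randint(0, 2000, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = SignClassifier(num_words=2000)
logits = model(torch.from_numpy(X_train[:4]))  # predict a small batch
predicted_word_indices = logits.argmax(dim=1)
```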
|
Student Requirement |
● Knowledge of full-stack development for the web application
● Relevant coursework (completed or currently enrolled): COEN 366, COEN
424, COMP 472
● Experience with neural networks (machine learning) using Python for gesture
recognition |
Tools |
● Technologies for AI: OpenCV, MediaPipe, PyTorch, scikit-learn (Python)
● Technologies for Web Application: MongoDB, Express, React, Node.js, GCP
● Dataset: 2,000 words, comprising approximately 19,000 videos of signed words |
Number of Students |
6 |
Students |
N. Harris, N. Kawwas, M. Sklivas, A. Turkman, A. Mirza, T. Elango |
Comments: |
|
Links: |
|