Deaf AI Translator

Introduction

The communication gap between deaf and hearing individuals in Thailand has reached a critical point. Following the suspension of the government-funded Thai Telecommunication Relay Service (TTRS), the deaf community lost vital access to real-time video interpretation for school, work, and emergencies.

Translating between Thai Sign Language (TSL) and written Thai is uniquely complex because TSL does not follow standard Thai grammar. Furthermore, the lack of high-quality public datasets makes training AI models difficult. This project introduces an automated, AI-driven bidirectional translation system designed to provide a 24/7, scalable, and cost-effective communication bridge.

Key Features

  • Bidirectional Translation: A dual-component system featuring Sign-to-Text (recognizing gestures) and Text-to-Sign (generating visual representations).
  • MediaPipe Integration: Utilizes high-performance body tracking to extract 543 distinct landmarks (hands, face, and pose), converting complex video into lightweight numerical data.
  • Smart Grammar Reordering: A specialized language model that transforms standard Thai (SVO) into the TSL-standard Object-Subject-Verb (OSV) structure.
  • Mobile-First Design: A streamlined, single-screen interface that combines the webcam feed, chat history, and translation output to minimize cognitive load during conversations.
  • Privacy-Conscious Processing: By using pose landmarks instead of raw video pixels, the system preserves essential motion data while reducing the processing of sensitive visual information.
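The 543 landmarks above match MediaPipe Holistic's standard output: 33 pose, 468 face, and 21 per hand. As a minimal sketch (the project's actual preprocessing code is not published), the function below flattens one frame's landmarks into a fixed-size array, zero-filling components that MediaPipe fails to detect, such as a hand that is out of frame:

```python
import numpy as np

# Landmark counts from MediaPipe Holistic:
POSE, FACE, HAND = 33, 468, 21
TOTAL = POSE + FACE + 2 * HAND  # 543 landmarks per frame

def flatten_frame(pose, face, left_hand, right_hand):
    """Stack per-frame landmarks into one (543, 3) array of (x, y, z).

    Components MediaPipe did not detect (passed as None) are
    zero-filled so every frame has the same shape, which keeps
    downstream sequence models simple.
    """
    parts = []
    for arr, n in ((pose, POSE), (face, FACE),
                   (left_hand, HAND), (right_hand, HAND)):
        parts.append(np.zeros((n, 3)) if arr is None
                     else np.asarray(arr, dtype=float))
    return np.concatenate(parts, axis=0)

# Example: a frame where the left hand was not detected.
frame = flatten_frame(np.random.rand(33, 3), np.random.rand(468, 3),
                      None, np.random.rand(21, 3))
print(frame.shape)  # (543, 3)
```

A sequence of such frames (one array per video frame) is the lightweight numerical representation that replaces raw pixels, which is also what makes the privacy-conscious processing possible.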

Development and Innovation

The project innovates by combining computer vision with sequential machine learning architectures to handle the temporal nature of sign language.

Advanced Neural Architectures

The development team experimented with two primary approaches to capture the "rhythm" of TSL:

  1. Temporal Convolutional Networks (TCN): These apply convolutions along the temporal dimension to learn hand motion trajectories and the speed of signing.
  2. LSTM & Siamese Networks: Long Short-Term Memory (LSTM) networks model the internal "memory" of a gesture; their outputs are then fed into a Siamese Neural Network that learns similarities between different signs through Contrastive Learning.
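The project description does not publish the exact architectures, but the contrastive objective used by the Siamese approach is standard. The sketch below, a minimal NumPy implementation assuming the classic formulation (Hadsell et al., 2006), shows how paired sign embeddings are pulled together when they represent the same sign and pushed at least a margin apart otherwise:

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_sign, margin=1.0):
    """Contrastive loss on one pair of sign embeddings.

    If the two clips show the same sign, the loss grows with their
    distance (pulling them together). If they show different signs,
    the loss is zero once they are at least `margin` apart
    (pushing them away from each other).
    """
    d = np.linalg.norm(np.asarray(emb_a) - np.asarray(emb_b))
    if same_sign:
        return 0.5 * d ** 2
    return 0.5 * max(margin - d, 0.0) ** 2

# Identical embeddings of the same sign incur no loss:
print(contrastive_loss(np.ones(8), np.ones(8), same_sign=True))   # 0.0
# Well-separated embeddings of different signs also incur no loss:
print(contrastive_loss(np.zeros(8), np.full(8, 2.0), same_sign=False))  # 0.0
```

In training, this per-pair loss would be averaged over batches of clip pairs produced by the LSTM encoder; the margin controls how far apart distinct signs must sit in the embedding space.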

The "Text-to-Sign" Pipeline

To generate signs, the system uses the open-source Qwen3-8B model, fine-tuned to act as a translator that reorders Thai sentences into TSL grammar. For example:

  • Spoken Thai: "I eat rice" (Subject-Verb-Object)
  • TSL Structure: "Rice I eat" (Object-Subject-Verb)

Once reordered, the system retrieves and displays the corresponding hand-sign video segments.

Impact and Future Directions

This AI-based solution aims to restore independence to the Thai deaf community by providing a tool that is not reliant on human availability or government funding.

  • 24/7 Accessibility: Unlike human interpreters, the AI system can serve thousands of users simultaneously at any time of day.
  • Scientific Visualization: The project uses t-SNE and PCA to visualize the "Embedding Space," allowing developers to see exactly how the AI clusters similar signs and where it confuses one sign for another.
  • Standardization: By focusing on the TSL taught in formal deaf education, the project promotes a standardized communication tool that can be used across different regions of Thailand.
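As a minimal sketch of the visualization step (the project's plotting code is not published), PCA can project high-dimensional sign embeddings to two dimensions via an SVD; t-SNE would be applied similarly, typically via scikit-learn, to reveal non-linear cluster structure:

```python
import numpy as np

def pca_2d(embeddings):
    """Project sign embeddings to 2-D with PCA (via NumPy SVD).

    Centers the data, then projects onto the two right-singular
    vectors with the largest singular values, i.e. the directions
    of greatest variance in the embedding space.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T

# 50 hypothetical sign embeddings of dimension 128 -> 2-D points.
points = pca_2d(np.random.rand(50, 128))
print(points.shape)  # (50, 2)
```

Scatter-plotting these 2-D points, colored by sign label, is what lets developers see which signs cluster cleanly and which overlap and are likely to be confused.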

Future Roadmap:

  • Expanding Vocabulary: Continuing to fine-tune models on larger datasets to include technical and emergency-specific terminology.
  • Non-Manual Marker Enhancement: Further developing the 468-point facial landmark tracking to better interpret the emotional nuances and grammatical modifiers found in facial expressions.


Research Team member(s)

Sippapas Pronanun
Undergraduate Student
Saritwatt Khanthakamolmart
Undergraduate Student
Jirat Kositchaiwat
Undergraduate Student
Jirapat Kulruchakorn
Undergraduate Student