SignesTrad is a revolutionary edge AI solution that bridges communication gaps between the deaf community and hearing individuals by translating French Sign Language (LSF) to text in real-time. Leveraging the powerful STM32N6 board and computer vision technology, our proposed device will capture hand gestures through its integrated camera, process them using optimized neural networks, and instantly display the translated French text. This portable, low-latency solution will work independently, without requiring an internet connection.

 Vision and Purpose

Communication barriers between deaf individuals who use French Sign Language (LSF) and those who don't understand it create significant social and practical challenges. While professional interpreters provide excellent support, they're not always available, and existing digital solutions often require stable internet connections or powerful hardware. SignesTrad addresses these limitations by creating an affordable, portable device that performs real-time LSF interpretation directly on the edge, without dependency on cloud services.

 Technical Implementation

SignesTrad harnesses the full potential of the STM32N6 Discovery Kit, particularly its Neural-ART Accelerator NPU, to run complex AI models efficiently at the edge. Our solution consists of:

  1. Data Acquisition System: We utilize the MIPI camera interface of the STM32N6 board to capture high-quality video input at 30fps, with the camera positioned to clearly view the signer's hands and upper body movements.

  2. AI Processing Pipeline: The core of our solution is a two-stage deep learning approach (sketched just after this list):

    • A modified MobileNetV3 model for hand and pose detection that identifies and tracks key points on the hands, arms, and face
    • A temporal GRU (Gated Recurrent Unit) network that analyzes the resulting keypoint sequences to recognize signs and grammatical structures specific to LSF
  3. Optimization for Edge Deployment: We plan to employ several techniques to ensure optimal performance on the STM32N6 (a quantization example also follows the list):

    • Model quantization to 8-bit precision 
    • Layer fusion to reduce memory transfers 
    • Custom activation functions optimized for the Neural-ART architecture 
    • Hardware-specific memory allocation to minimize data transfer bottlenecks 
  4. User Interface: A clean, intuitive interface displays the translated text on the integrated LCD screen. The system includes:

    • Real-time text display with minimal latency (<250 ms from gesture to text)
    • Confidence indicators for ambiguous interpretations 
    • Simple controls for adjusting sensitivity and language preferences 
    • Battery status and system diagnostics 
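
To make the two-stage pipeline in item 2 concrete, the sketch below shows one plausible shape for the two models in Keras. It is illustrative only: the keypoint count, sequence length, layer sizes, and vocabulary size are placeholder assumptions, not final design values.

```python
# Illustrative two-stage model sketch; all sizes are placeholder assumptions.
import tensorflow as tf

NUM_KEYPOINTS = 50   # assumed total tracked points (hands, arms, face)
SEQ_LEN = 30         # assumed window of frames (~2 s at 15 FPS)
VOCAB_SIZE = 550     # assumed initial vocabulary (500-600 words)

# Stage 1: MobileNetV3-based regressor mapping one frame to (x, y) keypoints.
backbone = tf.keras.applications.MobileNetV3Small(
    input_shape=(224, 224, 3), include_top=False, weights=None, pooling="avg"
)
frame_in = tf.keras.Input(shape=(224, 224, 3))
keypoints = tf.keras.layers.Dense(
    NUM_KEYPOINTS * 2, activation="sigmoid"
)(backbone(frame_in))
keypoint_model = tf.keras.Model(frame_in, keypoints, name="keypoint_stage")

# Stage 2: GRU over a sliding window of keypoint vectors, classifying the sign.
seq_in = tf.keras.Input(shape=(SEQ_LEN, NUM_KEYPOINTS * 2))
logits = tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax")(
    tf.keras.layers.GRU(128)(seq_in)
)
sequence_model = tf.keras.Model(seq_in, logits, name="gru_stage")
```

Running the lightweight keypoint stage once per frame and feeding only compact keypoint vectors into the GRU keeps per-frame compute bounded, which is what makes our frame-rate targets plausible.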
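
Likewise, the 8-bit quantization planned in item 3 could start from standard post-training quantization, shown here with the TensorFlow Lite converter. representative_frames() is a hypothetical calibration generator, and the quantized model would then pass through ST's Neural-ART conversion tooling.

```python
# Post-training int8 quantization sketch, continuing from keypoint_model above.
import tensorflow as tf

def representative_frames():
    """Hypothetical calibration generator; in practice, yield real frames."""
    for _ in range(100):
        yield [tf.random.uniform((1, 224, 224, 3), dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(keypoint_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_frames
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("keypoint_stage_int8.tflite", "wb") as f:
    f.write(converter.convert())
```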

 Hardware Configuration

The SignesTrad prototype will utilize nearly all the key features of the STM32N6 Discovery Kit:

  • The powerful STM32N6 microcontroller serves as the system's brain, coordinating all operations 
  • The Neural-ART NPU accelerates neural network inference by up to 30x compared to CPU-only execution 
  • The MIPI connector interfaces with our custom camera module for high-quality video input 
  • The 32 MB Hexadeca-SPI PSRAM provides sufficient memory for our model's activation maps and intermediate results (a rough sanity check follows this list) 
  • The onboard LCD display presents translated text to the user 
  • The SD card slot stores our model weights and, optionally, recorded sessions for future system improvement 
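
As a rough sanity check on the memory budget, the arithmetic below uses the published parameter count of MobileNetV3-Small (~2.5M); everything else is an assumption to be verified on hardware.

```python
# Rough memory sanity check; all figures are estimates, not measurements.
params = 2_500_000           # approx. MobileNetV3-Small parameter count
weight_bytes = params * 1    # int8 quantization: one byte per weight
frame_bytes = 224 * 224 * 3  # one RGB input frame at 8 bits per channel

print(f"weights: {weight_bytes / 1e6:.1f} MB, frame: {frame_bytes / 1024:.0f} KB")
# ~2.5 MB of weights and ~147 KB per frame leave ample headroom in 32 MB of PSRAM.
```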

 Proposed Software Architecture

Our software will be structured in a modular fashion to enable easy maintenance and future expansion (a minimal prototype sketch follows the list):

  1. Camera Interface Module: Will handle video capture, preprocessing, and frame management 
  2. AI Inference Engine: Will coordinate the execution of our neural networks on the Neural-ART NPU 
  3. Sign Language Processing: Will post-process network outputs to handle linguistic features of LSF 
  4. User Interface Manager: Will control display output and user input 
  5. System Management: Will handle power, connectivity, and resource allocation 
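
As a host-side illustration of how these modules would fit together, the sketch below wires them into a single loop. All class and method names are hypothetical placeholders for the eventual firmware, not an existing API.

```python
# Hypothetical host-side prototype of the module structure; names are
# placeholders for the eventual C firmware, not an existing API.
from collections import deque

class CameraInterface:
    def get_frame(self):
        """Capture and preprocess one frame (module 1)."""
        raise NotImplementedError

class InferenceEngine:
    def keypoints(self, frame):
        """Run the keypoint network on one frame (module 2)."""
        raise NotImplementedError

    def classify(self, window):
        """Run the GRU over a window of keypoint vectors (module 2)."""
        raise NotImplementedError

class SignLanguageProcessor:
    def to_text(self, sign_id):
        """Map a recognized sign to French text, handling LSF grammar (module 3)."""
        raise NotImplementedError

class DisplayManager:
    def show(self, text, confidence):
        """Render text plus a confidence indicator on the LCD (module 4)."""
        raise NotImplementedError

def main_loop(cam, engine, processor, ui, seq_len=30):
    window = deque(maxlen=seq_len)  # sliding window of per-frame keypoints
    while True:  # module 5 (system management) would add power checks here
        window.append(engine.keypoints(cam.get_frame()))
        if len(window) == seq_len:
            sign_id, confidence = engine.classify(list(window))
            ui.show(processor.to_text(sign_id), confidence)
```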

 Expected Performance Metrics

Based on our research and preliminary design, we project the following performance targets:

  • Sign recognition accuracy: 85-90% for isolated signs
  • Sentence comprehension: 75-80% for simple sentences
  • Processing latency: <250 ms from gesture to display
  • Frame rate: 12-15 FPS, sufficient for fluid interpretation
  • Power consumption: ~1.5 W estimated during active use
  • Initial vocabulary size: 500-600 words in the first implementation
  • Boot time: ~5 s from power-on to ready state

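As a sanity check on these targets, here is a back-of-the-envelope latency budget; every per-stage timing is an assumption to be replaced by measurements on the STM32N6:

```python
# Back-of-the-envelope latency budget; all timings are assumptions.
capture_ms = 10   # assumed frame capture and preprocessing
keypoint_ms = 40  # assumed keypoint network inference on the NPU
gru_ms = 10       # assumed GRU inference on the buffered window
display_ms = 5    # assumed text rendering on the LCD

per_frame_ms = capture_ms + keypoint_ms + gru_ms + display_ms
print(f"per-frame pipeline: {per_frame_ms} ms -> {1000 / per_frame_ms:.1f} FPS")
# 65 ms per frame (~15 FPS) leaves headroom within the <250 ms end-to-end budget.
```
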
 Proposed Implementation Timeline

We anticipate a development process spanning approximately six months:

  • Month 1-2: Dataset collection and model training 
  • Month 3: Initial algorithm implementation and optimization 
  • Month 4: Hardware integration and testing 
  • Month 5: User interface development and performance tuning 
  • Month 6: Final testing, validation, and documentation 

 Why This Project Should Be Selected

SignesTrad represents a perfect match for the STM32 Edge AI Contest for several reasons:

  1. Social Impact: Our project addresses a real-world accessibility challenge faced by approximately 300,000 deaf individuals in France who use LSF as their primary means of communication.

  2. Technical Innovation: We push the boundaries of what's possible with edge AI on microcontrollers, demonstrating how complex computer vision and natural language processing can be optimized for embedded systems.

  3. Complete Utilization of STM32N6 Capabilities: Our solution leverages virtually all the key features of the STM32N6 Discovery Kit, from its Neural-ART NPU to its camera interface, memory resources, and display capabilities.

  4. Practical Implementation: SignesTrad is designed to be usable in everyday situations, with careful attention to user experience, battery life, and real-world performance.

  5. Future Potential: The project establishes a foundation for expanded capabilities and could lead to commercial applications that bring tangible benefits to the deaf community.

By selecting SignesTrad, the judges would support not only technical innovation but also meaningful social inclusion through accessible technology.