Offline Voice-Activated Note-Taking Application


The Offline Speech-to-Text, Image-to-Text, and Text-to-Speech Web Application converts spoken words and images into text, and converts text back into speech, all without requiring an internet connection. The project integrates Vosk for speech recognition, Tesseract OCR for image text extraction, and a text-to-speech (TTS) engine. The system uses a Flask-based backend with a web frontend written in HTML, CSS, and JavaScript. Users can record speech through a microphone and have it transcribed into text in real time. They can also upload images containing text, which are processed with OCR to extract the text.
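As an illustration of the speech path described above, the following is a minimal sketch of offline transcription with Vosk, not the project's actual code. The function names (`join_results`, `transcribe_wav`) and the `model_dir` parameter are hypothetical. The sketch assumes a Vosk model has already been downloaded to a local directory and that the recording is a mono 16-bit PCM WAV file.

```python
import json
import wave


def join_results(json_chunks):
    """Merge Vosk result strings (JSON objects with a "text" field) into one transcript."""
    texts = (json.loads(chunk).get("text", "") for chunk in json_chunks)
    return " ".join(t for t in texts if t)


def transcribe_wav(wav_path, model_dir):
    """Transcribe a mono 16-bit PCM WAV file offline (hypothetical helper)."""
    # Imported lazily so join_results() can be used without Vosk installed.
    from vosk import KaldiRecognizer, Model

    chunks = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if recognizer.AcceptWaveform(data):
                chunks.append(recognizer.Result())
        # Flush whatever the recognizer is still holding at end of file.
        chunks.append(recognizer.FinalResult())
    return join_results(chunks)
```

In a Flask backend, a function like `transcribe_wav` could be called from the route that receives the uploaded recording. For live transcription, the same recognizer loop can be fed audio chunks as they arrive instead of reading from a file.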

Key Technologies Used:

  • Backend and Core Processing: Python 3, Flask, Vosk, Tesseract OCR, pyttsx3, Pydub
  • Frontend: HTML, CSS, JavaScript, HTML5 Audio API
  • Supporting Tools: Pillow (PIL)
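
The libraries listed above could be combined roughly as follows. This is a hedged sketch rather than the project's implementation. All function names are hypothetical, and the `pytesseract` wrapper is an assumption, since the source names only Tesseract OCR. The sketch also assumes a 16 kHz sample rate, that the Tesseract binary is installed, and that ffmpeg is available to Pydub for non-WAV input.

```python
def normalize_text(raw):
    """Collapse the runs of whitespace and blank lines that OCR output often contains."""
    return " ".join(raw.split())


def to_mono_wav(src_path, dst_path, sample_rate=16000):
    """Convert a recording to mono 16-bit PCM WAV with Pydub (hypothetical helper)."""
    from pydub import AudioSegment  # lazy import; needs ffmpeg for non-WAV input

    audio = AudioSegment.from_file(src_path)
    audio = audio.set_channels(1).set_frame_rate(sample_rate).set_sample_width(2)
    audio.export(dst_path, format="wav")
    return dst_path


def image_to_text(image_path):
    """Extract text from an image with Tesseract via pytesseract and Pillow."""
    import pytesseract  # assumed wrapper around the Tesseract binary
    from PIL import Image

    with Image.open(image_path) as img:
        return normalize_text(pytesseract.image_to_string(img))


def text_to_speech_file(text, out_path):
    """Synthesize speech offline with pyttsx3 and save it to an audio file."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)
    engine.runAndWait()  # blocks until the queued synthesis has finished
    return out_path
```

The third-party imports are placed inside the functions so that each feature fails only when its own dependency is missing, which suits a modular application where OCR, audio conversion, and TTS are independent features.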

Applications:

Project Features:

Conclusion:

The Offline Voice-Activated Note-Taking Application is a privacy-focused solution that integrates three AI domains (speech recognition, OCR, and text-to-speech) in a single offline web app. Because all processing happens locally, it is intended to be secure, accessible, and efficient for users with diverse needs, especially in connectivity-constrained environments. The project demonstrates real-time data processing, a modular architecture, and the use of AI toolkits in assistive offline applications.