Offline Voice

This web application converts speech and images into text and converts text back into speech, all without an internet connection. It combines Vosk for speech recognition, Tesseract OCR for extracting text from images, and pyttsx3 for text-to-speech (TTS). A Flask backend handles the processing, and the frontend is built with HTML, CSS, and JavaScript. Users can record speech through a microphone and see it transcribed in real time, upload images containing text for OCR extraction, and have recognized or typed text read aloud.
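As a rough illustration of how a Flask backend could expose the OCR step to the browser, here is a minimal sketch. The `/ocr` route, the `image` form field, the `is_allowed_image` helper, and the extension whitelist are all assumptions made for this example and are not taken from the project; only the Flask, Pillow, and pytesseract calls are real library APIs. Tesseract itself must be installed locally for `pytesseract` to work.

```python
from pathlib import Path

# Hypothetical whitelist of image types accepted for OCR.
ALLOWED_IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tiff"}


def is_allowed_image(filename: str) -> bool:
    """Return True if the file name has an accepted image extension."""
    return Path(filename).suffix.lower() in ALLOWED_IMAGE_SUFFIXES


def create_app():
    """Build the Flask app.

    Third-party imports are deferred so this module can still be imported
    on a machine where Flask, Pillow, or pytesseract is not installed.
    """
    from flask import Flask, jsonify, request
    from PIL import Image
    import pytesseract

    app = Flask(__name__)

    @app.route("/ocr", methods=["POST"])  # hypothetical route name
    def ocr():
        upload = request.files.get("image")  # hypothetical form field
        if upload is None or not is_allowed_image(upload.filename or ""):
            return jsonify(error="expected an image file in the 'image' field"), 400
        # Pillow reads directly from the uploaded file stream.
        text = pytesseract.image_to_string(Image.open(upload.stream))
        return jsonify(text=text.strip())

    return app


if __name__ == "__main__":
    # Bind to localhost only; nothing here needs network access.
    create_app().run(host="127.0.0.1", port=5000)
```

The frontend could send the image with a multipart form POST and display the `text` field of the JSON response. Binding to 127.0.0.1 keeps the server reachable only from the local machine, which fits the offline goal.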

Key Technologies Used:

Backend and Core Processing: Python 3, Flask, Vosk, Tesseract OCR, pyttsx3, Pydub.
Frontend: HTML, CSS, JavaScript, HTML5 Audio API.
Supporting Tools: Pillow (PIL).

Applications:

Accessibility Tools - Helps visually impaired users transcribe speech and listen to text offline.
Educational Use - Students and educators can record lectures, transcribe notes, and generate study material.
Assistive Technology - Supports users with disabilities in note-taking.
Secure Offline Use Cases - Useful in organizations or institutions where internet access is restricted or privacy-sensitive.

Project Features:

Offline STT: Converts microphone input to text using the Vosk engine.
Image OCR: Extracts text from an uploaded image (e.g., a printed or handwritten note) using Tesseract.
Text-to-Speech (TTS): Reads recognized or manually entered text aloud using pyttsx3.
Live Transcription: Displays recognized speech in real time with minimal delay.
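The speech features listed above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the project's actual code: the function names (`text_from_result`, `transcribe_wav`, `speak`), the chunk size, and the idea of reading from a WAV file rather than a live microphone stream are all hypothetical. It also assumes a Vosk model has already been downloaded to a local directory and that the audio is 16-bit mono PCM.

```python
import json
import wave


def text_from_result(result_json: str) -> str:
    """Extract the transcript from a Vosk result string.

    Vosk returns JSON with a "text" key for finished utterances and a
    "partial" key for interim results, which suits a live display.
    """
    payload = json.loads(result_json)
    return payload.get("text") or payload.get("partial") or ""


def transcribe_wav(wav_path: str, model_dir: str) -> str:
    """Transcribe a 16-bit mono PCM WAV file with a local Vosk model."""
    from vosk import KaldiRecognizer, Model  # deferred third-party import

    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        pieces = []
        while True:
            chunk = wav.readframes(4000)  # chunk size is an arbitrary choice
            if not chunk:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(chunk):
                pieces.append(text_from_result(recognizer.Result()))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(text_from_result(recognizer.FinalResult()))
    return " ".join(piece for piece in pieces if piece)


def speak(text: str) -> None:
    """Read text aloud through the local speech engine via pyttsx3."""
    import pyttsx3  # deferred third-party import

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()  # blocks until speech has finished
```

For live transcription, the same recognizer loop could be fed microphone chunks as they arrive, with `recognizer.PartialResult()` passed through `text_from_result` to update the display between completed utterances.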

Conclusion:

The application is a privacy-focused solution that combines speech recognition, OCR, and text-to-speech in a single offline web app. Because all processing happens locally, it supports secure and accessible use for people with diverse needs, especially in connectivity-constrained environments. The project demonstrates real-time processing and a modular architecture built on existing speech and OCR toolkits, and it contributes to assistive, AI-driven offline applications.