Speech Recognition Polyfill (STT) by apersongithub

Adds setup-free speech recognition (speech to text) to websites such as Google Translate and Duolingo, and is highly configurable. Choose between OpenAI's Whisper or Vosk models running locally, or optionally use AssemblyAI's API on the server side.

5 (1 review)

39 users

About this extension

Test the extension by using speech color changer and don't forget to leave a review!

The extension's code is open source and viewable on GitHub.

⭐ Key Features:

Works out of the box if your first language is English; other languages are one option away.
Extension uses WASM, no external client or app needed.
Privacy first: Vosk (local) is the default model.
Decent Web Speech API Support through Polyfill (obviously)
Customizable Keybind for Speech to Text anywhere, alongside the polyfill
Many Local Models to choose from (Vosk/Whisper)
Realtime Streaming/Word-to-Word Continuous Speech like Chrome (Vosk/AssemblyAI)
Local & Offline Transcription Support (Vosk/Whisper..see info part)
Cloud/Server Transcription Support
Extensive Per-Site and Default Customization Options
Icon Status and Notification toasts to guide you (toasts aren't enabled by default)
Exporting and Importing Settings
Auto Language Detection based on the language the site requests and, with some speech models, on the model itself
Many languages to choose from using their respective language/country codes (support depends on the engine used).
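
For readers wondering what "Web Speech API support through a polyfill" means for a page, here is a minimal TypeScript sketch of the feature detection a site typically performs. It assumes the polyfill exposes the recognizer under one of the two standard global names (SpeechRecognition or webkitSpeechRecognition); the listing does not state which. The interfaces and the helper names findRecognitionCtor and createRecognizer are made up for illustration and are not part of the extension.

```typescript
// Minimal shape of the Web Speech API members used in this sketch.
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  start(): void;
}

type RecognitionCtor = new () => RecognitionLike;

// The two global names under which browsers (or a polyfill) may expose the API.
interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Return whichever constructor is available, or undefined if there is none.
function findRecognitionCtor(scope: SpeechGlobals): RecognitionCtor | undefined {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
}

// Create a recognizer configured for continuous, streaming results,
// or return null when the page has no speech recognition support.
function createRecognizer(scope: SpeechGlobals, lang: string): RecognitionLike | null {
  const Ctor = findRecognitionCtor(scope);
  if (!Ctor) return null;
  const recognizer = new Ctor();
  recognizer.lang = lang;
  recognizer.continuous = true;
  recognizer.interimResults = true;
  return recognizer;
}
```

In a page script, the browser's global object would be passed as scope (with a suitable type cast) and start() called on the result. If the helper returns null, the site sees no speech support at all, which is the situation a polyfill is meant to fix.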

📝 Info:

On first install, this extension opens the options page. The default model language is English, but this is easy to change. The extension allows per-site customization and offers a multitude of models for recognizing speech. It also has a keybind for Speech to Text (default: Alt + A). Keep in mind that this is not a complete solution: the API doesn't have full support, but it's pretty close. Speech detection is nearly instantaneous (depending on the model), but not as accurate as Google Chrome's cloud API. The extension icon's color/indicator changes depending on the process, so pin it to your toolbar to verify the extension is working as intended. A red mic/error icon does not necessarily mean your mic isn't working; the speech may have been cancelled by user input, a cloud API key may be missing, or the speech may have been unintelligible (usually it's the latter).
Make sure you are using the correct mic and speak loudly, slowly, and clearly; otherwise your voice may not be detected or may be unintelligible. If you experience problems with voice recognition, change the default model to the cloud one, or to a slightly larger or different local one (this may impair performance). You can also try enabling "boost microphone gain" if you are a soft speaker. If you regularly process audio longer than 1-2 minutes, disable the ultimatum processing timeout feature in the settings.
If you're using Duolingo or a similar site and want to do speaking practice in the language you are learning, it is recommended to set the extension's language to that language (navigate to the site -> click the extension icon -> set the language, then click "save for site"). This isn't required, but it will significantly improve recognition accuracy since the model then knows exactly which language you are trying to speak. (This isn't necessary for every site; one example is Google Translate, which tells the extension the exact language in use through the input box's data, so auto-detect works fine.) Look at the images for more help.
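
The interplay between a saved per-site language, the language a site requests, and the default can be pictured with a short TypeScript sketch. The precedence shown (saved site setting first, then the site's requested language, then the default) is an assumption based on the description above, not taken from the extension's code. LanguageSettings and resolveLanguage are hypothetical names.

```typescript
// Hypothetical settings shape: one default plus optional per-site overrides.
interface LanguageSettings {
  defaultLanguage: string;          // e.g. "en-US"
  perSite: Map<string, string>;     // hostname -> language saved via "save for site"
}

// Pick the language to recognize for a given site.
// Assumed precedence: saved per-site language, then the language the site
// itself requested (auto-detect), then the global default.
function resolveLanguage(
  settings: LanguageSettings,
  hostname: string,
  requestedLang?: string,
): string {
  const saved = settings.perSite.get(hostname);
  if (saved) return saved;

  const requested = requestedLang?.trim();
  if (requested) return requested;

  return settings.defaultLanguage;
}
```

Under these assumptions, a Duolingo user who saved Spanish for the site gets Spanish even if the page asks for something else, while a site like Google Translate that declares its language is handled without any saved setting.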
You can also use my Speech Recognition Polyfill Userscript as an alternative for simplicity, accuracy, or cross-browser compatibility, but the extension is better overall and isn't that complicated to use anyway. With the "v1" provider set in the userscript config, the userscript produces results one-to-one equivalent to Google Chrome's backend implementation (I think Chrome uses this exact endpoint that I reverse engineered lol). The "v2" provider is more accurate and supports grammar, though it's only SLIGHTLY slower due to technical limitations. Language support with this userscript is not fully known but likely wide-ranging.

❗ Caveats:

8GB of RAM is the minimum requirement, since larger models can easily take up a decent chunk of it.
A modern CPU/GPU is recommended.
An internet connection is required overall, but especially before using offline mode since the model needs to cache itself first.
Not being illiterate and reading the FAQ before making a review.
Understanding that implementing the Web Speech API perfectly isn't feasible and that this extension has more support than anything else available.

Even though the model runs locally, the extension re-downloads it either when idle or after closing the tab/opening a new one that utilizes the extension (for memory preservation purposes). For the time being, this is better than packaging the large models within the extension, and for most models the download will be near instant for most users. There is also an option in settings to keep the default model cached without re-downloading it every time. It's basically local and offline as long as you use the caching option and don't close the browser or switch models. Besides running locally, you can use the cloud-based model, which is less hardware intensive.
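
As a rough illustration of the "unload when idle, unless caching is enabled" behavior described above, here is a small TypeScript sketch. It is not the extension's actual memory management: the class name ModelSlot, the idle limit, and the synchronous load callback are all simplifying assumptions (a real model download would be asynchronous).

```typescript
// Hypothetical holder for one loaded model that is dropped after an idle period.
class ModelSlot<T> {
  private model: T | null = null;
  private lastUsedMs = 0;
  private readonly idleLimitMs: number;
  private readonly keepCached: boolean;

  constructor(idleLimitMs: number, keepCached: boolean) {
    this.idleLimitMs = idleLimitMs;
    this.keepCached = keepCached;
  }

  // Return the model, calling `load` again if it was never loaded or was evicted.
  get(nowMs: number, load: () => T): T {
    this.evictIfIdle(nowMs);
    if (this.model === null) {
      this.model = load();
    }
    this.lastUsedMs = nowMs;
    return this.model;
  }

  // Drop the model when it has been unused for at least the idle limit.
  // With keepCached enabled, the model is never dropped.
  evictIfIdle(nowMs: number): boolean {
    if (this.keepCached || this.model === null) return false;
    if (nowMs - this.lastUsedMs < this.idleLimitMs) return false;
    this.model = null;
    return true;
  }
}
```

The trade-off is the one the description names: dropping the model frees memory at the cost of loading it again later, while the caching option keeps memory in use but avoids the reload.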

The extension will take ~1GB of RAM on normal/cloud models and up to ~7GB if you use the biggest model (you don't need to use the biggest model lol). I've implemented decent memory management to compensate.

Required permissions:

Data collection: