FOSS voice to text

Iapar@feddit.de · 1 year ago

FOSS voice to text

The Hobbyist@lemmy.zip · 1 year ago

OpenAI’s Whisper model is a really great one. It supports many languages as well as translation, and is available both as a pretrained model (https://github.com/openai/whisper) which can be self-hosted and as an API (https://openai.com/pricing).
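
For anyone wanting to try the self-hosted route, here is a minimal sketch based on the Python usage shown in the Whisper README. It assumes the `openai-whisper` package and `ffmpeg` are installed; the helper name `transcribe_file` and the file name `recording.mp3` are just placeholders for illustration.

```python
def transcribe_file(audio_path, model_name="base", translate=False):
    """Return the text Whisper recognises in an audio file.

    Hypothetical helper, not part of Whisper itself.
    Assumes: pip install -U openai-whisper, and ffmpeg on the PATH.
    """
    # Imported lazily so the module can be loaded without Whisper installed.
    import whisper

    # Downloads the model weights on first use, then loads them from cache.
    model = whisper.load_model(model_name)

    # "transcribe" keeps the spoken language; "translate" outputs English.
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return result["text"].strip()


# Example call (placeholder file name):
# print(transcribe_file("recording.mp3"))
# print(transcribe_file("recording.mp3", translate=True))
```

Smaller models such as `tiny` or `base` are probably the easiest starting point on modest hardware; the larger ones are slower but tend to be more accurate.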

doodimus@beehaw.org · 1 year ago

I just found out yesterday that this was FOSS, and I really want to start playing with it on my system. Is it GPU-agnostic? I have an AMD card and don’t see any mention of GPU support or a CUDA requirement in the GitHub docs.

The Hobbyist@lemmy.zip · 1 year ago

AMD has ROCm, which is available on AMD Radeon Instinct GPUs (server GPUs) and some consumer GPUs. You’d need to double-check whether your GPU supports ROCm.

It seems there is some discussion happening here on the use of ROCm with Whisper: https://github.com/openai/whisper/discussions/105 And here (showing it might be possible?): https://github.com/openai/whisper/discussions/55
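
If you already have a ROCm build of PyTorch installed (Whisper runs on PyTorch), a rough way to check whether it can see your card is the sketch below. It relies on the fact that ROCm builds of PyTorch expose the GPU through the usual `torch.cuda` interface and set `torch.version.hip`; the function name `describe_gpu_backend` is made up for this example, and a positive result does not guarantee that Whisper itself will run well.

```python
def describe_gpu_backend():
    """Report which GPU backend, if any, the installed PyTorch can use.

    Hypothetical helper for illustration only.
    """
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # ROCm builds reuse the torch.cuda API, so this covers AMD cards too.
    if not torch.cuda.is_available():
        return "no GPU visible to PyTorch, falling back to CPU"

    # torch.version.hip is set on ROCm builds and is None on CUDA builds.
    backend = "ROCm/HIP" if getattr(torch.version, "hip", None) else "CUDA"
    return f"{backend}: {torch.cuda.get_device_name(0)}"


print(describe_gpu_backend())
```

If this reports a ROCm device, passing `device="cuda"` to `whisper.load_model` should be worth a try, since that is the device name PyTorch uses for ROCm GPUs as well.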

doodimus@beehaw.org · 1 year ago

Thanks for that, I’ve been able to get Stable Diffusion running locally with ROCm so it looks like it should be possible then.

The Hobbyist@lemmy.zip · 1 year ago

I also found this which could be of interest:

MLC-LLM, a project that aims to “enable everyone to develop, optimize and deploy AI models natively on everyone’s devices.”

Here used to deploy Llama-2-13B on the RX 7900 XTX:

https://blog.mlc.ai/2023/08/09/Making-AMD-GPUs-competitive-for-LLM-inference?ref=upstract.com

realslef@fedia.io · 1 year ago

It’s not great yet, and not even good for my accent, but Dicio is the only FOSS one I’ve got working on Android so far.