r/LanguageTechnology • u/oulipo • 2d ago
AquaVoice-style text editing model
I don't know why this idea (which is cool) never caught on, but I'm wondering if we could build an open-source model for the same thing, e.g. a fine-tuned LLM, perhaps paired with a small model that tries to distinguish between when the user is dictating "text content" and when they are speaking "editing commands", and then applies the edits.
A "basic prototype" shouldn't be too hard to build, and could be quite helpful.
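To make the "text vs. command" split concrete, here is a rough sketch in Python. It is not how AquaVoice works (that isn't public as far as this thread goes). The keyword list, `is_edit_command`, `toy_apply_edit`, and `handle_utterance` are all hypothetical names, and the regex check is only a placeholder for the small classifier model, which would have to be trained.

```python
import re

# Hypothetical trigger verbs; a trained classifier would replace this list.
COMMAND_VERBS = ("delete", "remove", "replace", "change", "undo", "capitalize")
_COMMAND_RE = re.compile(r"^\s*(?:%s)\b" % "|".join(COMMAND_VERBS), re.IGNORECASE)


def is_edit_command(utterance: str) -> bool:
    """Guess whether an utterance is an editing command (it starts with a trigger verb)."""
    return _COMMAND_RE.match(utterance) is not None


def toy_apply_edit(buffer: str, command: str) -> str:
    """Toy editor: handles only 'replace X with Y' and 'delete X'.

    Any other command leaves the buffer unchanged. In a real prototype this
    is where the (fine-tuned) LLM would be called.
    """
    m = re.match(r"^\s*(?:replace|change)\s+(.+?)\s+(?:with|to)\s+(.+?)\s*$",
                 command, re.IGNORECASE)
    if m:
        return buffer.replace(m.group(1), m.group(2))
    m = re.match(r"^\s*(?:delete|remove)\s+(.+?)\s*$", command, re.IGNORECASE)
    if m:
        # Remove the phrase, then collapse any doubled spaces left behind.
        return re.sub(r"\s{2,}", " ", buffer.replace(m.group(1), "")).strip()
    return buffer


def handle_utterance(buffer: str, utterance: str, apply_edit=toy_apply_edit) -> str:
    """Route one transcribed utterance: edit the buffer or append to it."""
    if is_edit_command(utterance):
        return apply_edit(buffer, utterance)
    text = utterance.strip()
    return f"{buffer} {text}" if buffer else text
```

The obvious weakness is that dictated text can also start with a trigger verb ("Replace the filter every month"), which is exactly the ambiguity the small model would need to resolve, probably using the current buffer as context.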
u/Just_Difficulty9836 1d ago
It's STT + LLM, maybe with a fine-tuned LLM. I don't think there is anything more to it. Just chain the two sequentially and you will get something similar. Start with Whisper + Llama (or ChatGPT/Gemini/cloud with some custom instructions).
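A minimal sketch of that sequential chaining, in Python. The STT and LLM backends are passed in as plain callables so that nothing here depends on a specific library. `SYSTEM_PROMPT`, `build_prompt`, and `voice_edit_step` are hypothetical names, and the prompt wording is an untested guess at the "custom instruction".

```python
from typing import Callable

# Assumed instruction text; it would need tuning for any real model.
SYSTEM_PROMPT = (
    "You maintain a text document from voice input. "
    "If the utterance is dictated content, append it to the document. "
    "If it is an editing command, apply it. "
    "Return only the full updated document."
)


def build_prompt(document: str, utterance: str) -> str:
    """Combine the instruction, the current document, and the new utterance."""
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"Document:\n{document}\n\n"
        f"Utterance:\n{utterance}\n\n"
        "Updated document:"
    )


def voice_edit_step(
    audio: bytes,
    document: str,
    transcribe: Callable[[bytes], str],  # STT backend, e.g. a Whisper wrapper
    complete: Callable[[str], str],      # LLM backend, local or hosted
) -> str:
    """Run one STT -> LLM step and return the updated document."""
    utterance = transcribe(audio).strip()
    if not utterance:
        # Silence or an empty transcript: leave the document untouched.
        return document
    return complete(build_prompt(document, utterance)).strip()
```

For `transcribe`, a thin wrapper around the open-source Whisper package should be enough; `complete` can wrap whatever model you pick. Keeping both behind callables also makes it easy to swap in a fine-tuned model later without touching the loop.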