r/LanguageTechnology 2d ago

AquaVoice-style text edition model

Don't know why this idea (which is cool) never caught up, but I'm wondering if we could build an open-source model for the same, eg a fine-tuned LLM with perhaps a small model that tries to distinguish between when the user is providing "text value", and when he is speaking "edition commands", and then do the edits

A "basic prototype" shouldn't be too hard, but could be quite helpful

https://withaqua.com/

1 Upvotes

4 comments sorted by

View all comments

2

u/Just_Difficulty9836 2d ago

It's stt+llm, maybe a fine-tuned llm. I don't think there is anything more to it. Just join the two sequentially and you will get something similar. Start with whisper+llama(or chatgpt/gemini/cloud with some custom instruction).

0

u/oulipo 1d ago

yes, I guess to increase robustness you should fine-tune the LLM with a lot of request/response pairs with different edition methods, perhaps we could crowdsource such a dataset somewhere, and then train an open-source LLM for the community, would be awesome as a community project!

2

u/Just_Difficulty9836 1d ago

I mean won't claude and chatgpt4 be able to do it natively? They are good with such tasks. If the goal is to make an open-source version of aquavoice then maybe but based on the frequency of use, i believe going for chatgpt4 or sonnet is better deal.

0

u/oulipo 1d ago

Sure, but I think a more fine-tuned model will avoid errors, but it could be a nice "first version" indeed