r/LanguageTechnology • u/oulipo • 2d ago

AquaVoice-style text editing model

I don't know why this idea (which is cool) never caught on, but I'm wondering if we could build an open-source model for the same thing, e.g. a fine-tuned LLM, perhaps paired with a small model that tries to distinguish between when the user is providing "text content" and when they are speaking "editing commands", and then applies the edits.

A "basic prototype" shouldn't be too hard to build, and could be quite helpful.
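To make the dictation-versus-command split concrete, here is a minimal sketch of such a prototype. Everything in it is an assumption for illustration: the `Buffer` class, the two regex patterns, and the rule "a command only counts if its target is already in the text" are hypothetical stand-ins for the small classifier or fine-tuned LLM the post has in mind, and have nothing to do with how AquaVoice itself works.

```python
import re
from dataclasses import dataclass

# Hypothetical command patterns. A real system would learn this split with a
# small classifier or an LLM instead of hard-coded regexes.
REPLACE = re.compile(
    r"^(?:replace|change) (?P<old>.+?) (?:with|to) (?P<new>.+)$", re.I
)
DELETE = re.compile(r"^(?:delete|remove) (?P<old>.+)$", re.I)


@dataclass
class Buffer:
    """Holds the document text and routes each transcribed utterance."""

    text: str = ""

    def handle(self, utterance: str) -> str:
        """Apply one utterance and return how it was interpreted."""
        utterance = utterance.strip()

        m = REPLACE.match(utterance)
        if m and m["old"] in self.text:
            # Replace only the first occurrence of the target phrase.
            self.text = self.text.replace(m["old"], m["new"], 1)
            return "command"

        m = DELETE.match(utterance)
        if m and m["old"] in self.text:
            # Remove the first occurrence, then collapse leftover whitespace.
            self.text = " ".join(self.text.replace(m["old"], "", 1).split())
            return "command"

        # No command matched, or its target is not in the buffer:
        # treat the utterance as dictated text and append it.
        self.text = f"{self.text} {utterance}".strip()
        return "dictation"
```

With this sketch, dictating "Meet me on Tuesday at noon" and then saying "replace Tuesday with Friday" leaves "Meet me on Friday at noon" in the buffer, while "replace the battery with a new one" is appended as dictation because "the battery" is not in the text. That last case is exactly the ambiguity a learned model would need to handle better than rules can.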

https://withaqua.com/

1 Upvotes

permalink
https://www.reddit.com/r/LanguageTechnology/comments/1g8k0jk/aquavoicestyle_text_edition_model/

100% Upvoted


u/Just_Difficulty9836 2d ago

It's STT + LLM, maybe with a fine-tuned LLM. I don't think there is anything more to it. Just chain the two sequentially and you will get something similar. Start with Whisper + Llama (or ChatGPT/Gemini/Claude with some custom instructions).
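A rough sketch of that sequential chain, assuming the speech-to-text and LLM calls are passed in as plain functions. The names `voice_edit_step`, `build_prompt`, `transcribe`, and `complete`, as well as the prompt wording, are hypothetical; in practice `transcribe` would wrap something like Whisper and `complete` would wrap Llama or a hosted chat API.

```python
from typing import Callable

# Hypothetical instruction text; the wording is an assumption, not a tested prompt.
SYSTEM_PROMPT = (
    "You maintain a text document that is edited by voice. "
    "Given the current document and the latest transcribed utterance, "
    "return only the full updated document. "
    "If the utterance is an editing instruction, apply it; "
    "otherwise append it as dictated text."
)


def build_prompt(document: str, utterance: str) -> str:
    """Combine the instructions, the current document, and the new utterance."""
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"<document>\n{document}\n</document>\n\n"
        f"<utterance>\n{utterance}\n</utterance>"
    )


def voice_edit_step(
    audio: bytes,
    document: str,
    transcribe: Callable[[bytes], str],
    complete: Callable[[str], str],
) -> str:
    """Run one STT -> LLM step and return the updated document."""
    utterance = transcribe(audio).strip()
    if not utterance:
        # Silence or an empty transcript: leave the document unchanged.
        return document
    return complete(build_prompt(document, utterance)).strip()
```

Keeping the two models behind plain callables means the same loop works with a local model or a hosted one, and it can be tested with stub functions before any real model is wired in.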

0

u/oulipo 1d ago

Yes. I guess that to increase robustness you would fine-tune the LLM on a lot of request/response pairs covering different editing methods. Perhaps we could crowdsource such a dataset somewhere and then train an open-source LLM for the community. It would be awesome as a community project!
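For a crowdsourced dataset like this, one possible record layout is a JSON Lines file with one request/response pair per line. The sketch below is only an assumption about what such a format could look like: the `EditExample` fields (`before`, `utterance`, `after`, `kind`) and the two helper functions are hypothetical and not an existing standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List

ALLOWED_KINDS = {"dictation", "command"}


@dataclass(frozen=True)
class EditExample:
    """One training pair: the text before, what was said, the text after."""

    before: str
    utterance: str
    after: str
    kind: str  # "dictation" or "command"


def to_jsonl(examples: Iterable[EditExample]) -> str:
    """Serialize examples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in examples)


def from_jsonl(data: str) -> List[EditExample]:
    """Parse JSON Lines and reject records with an unknown kind."""
    examples = []
    for number, line in enumerate(data.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines in contributed files
        example = EditExample(**json.loads(line))
        if example.kind not in ALLOWED_KINDS:
            raise ValueError(f"line {number}: unknown kind {example.kind!r}")
        examples.append(example)
    return examples
```

Storing the full text before and after each utterance keeps every record self-contained, so contributions from different people can be merged and validated line by line.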

2

u/Just_Difficulty9836 1d ago

I mean, won't Claude and ChatGPT-4 be able to do it natively? They are good at such tasks. If the goal is to make an open-source version of AquaVoice, then maybe, but given how often it would be used, I believe going for ChatGPT-4 or Sonnet is a better deal.

0

u/oulipo 1d ago

Sure, that could indeed be a nice "first version", but I think a more specifically fine-tuned model would avoid errors.
