ADR and Post workflows. Transcription by Assistant / ChatGPT
Hi there,
Some of us were talking about the new "virtual assistant" using voice input with AI tools such as ChatGPT or others.
In particular, we thought of some possibilities for assisting ADR and post-production workflows, such as creating and storing on-the-fly transcriptions for "speech search" or "script validation" tasks.
Do others have ideas of where AI voice modelling and speech-to-text or large language models might be useful for SoundFlow users?
I hope you are well!
- Christian Scheuer @chrscheuer 2023-06-06 09:10:43.570Z
Thanks for sharing, Brenden. Could you elaborate a bit on your ideas in terms of ADR etc.? Would love to hear them and how you'd like this to work.
- In reply to @nednednerb: Brenden @nednednerb
Hi Christian,
Well, for myself, I do not do much dialogue replacement for video. However, I do work with voiceover, audiobook, and podcast material.
Quite often, I will receive a spreadsheet or a PDF of a book that has all the text that should have been recorded.
The course of my work involves editing each voiceover to match each row or cell in the spreadsheet. At a certain point in my workflow, I actually rely on an API call to Microsoft Machine Learning servers that a friend wrote for me. It uses his server and his account because there was no easy way to do speech-to-text in Pro Tools or Premiere to get each file or clip transcribed into its own row or cell of a CSV.
I do this workflow because it's just a convenient way to scan text visually rather than literally listen to each second of audio (more than a couple of times by the end of a job). If there are problematic noises or under-spoken pronunciation, that often shows up as a weird transcription, or I can see other issues.
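The "one transcript per clip, one row per clip" step described above could be sketched in plain JavaScript. This is a hypothetical helper, not the actual script from the workflow; the STT backend that fills in each transcript is out of scope here:

```javascript
// Sketch: collect per-clip transcripts into CSV rows (filename, transcript).
// `toCsv` and the sample data are hypothetical illustrations.
function toCsv(rows) {
  // RFC 4180-style quoting: wrap fields in quotes, double any embedded quotes
  const escape = (s) => `"${String(s).replace(/"/g, '""')}"`;
  return rows.map((r) => [escape(r.file), escape(r.text)].join(',')).join('\n');
}

const rows = [
  { file: 'clip_001.wav', text: 'Chapter one begins here.' },
  { file: 'clip_002.wav', text: 'He said, "hello" and left.' },
];

console.log(toCsv(rows));
```

The resulting text can be saved with a .csv extension and opened next to the source spreadsheet for visual scanning.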
I noted that the new "virtual assistant" was going to receive and process voice input. My thoughts quickly hopped over to the idea of the assistant receiving voice input from an audio clip in Pro Tools and then being able to give back that text in useful formats (such as CSV, as I describe above).
Also, sometimes I just want to find, "Where in this file is this specific keyword?" I find it annoying to scroll back and forth, playing, stopping, and navigating. I'd like to run STT across a timeline, then search for a text keyword and jump straight to that point on the audio timeline.
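A keyword-to-timestamp search like that could work against timestamped STT output such as an .srt subtitle file. A minimal sketch, assuming a hypothetical `findKeyword` helper and a made-up sample transcript:

```javascript
// Sketch: find the start timestamp of the first .srt cue containing a keyword.
// `findKeyword` and the sample transcript below are hypothetical illustrations.
function findKeyword(srtText, keyword) {
  // SRT cues are separated by blank lines
  const blocks = srtText.trim().split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split('\n');
    // lines[0] = cue index, lines[1] = "HH:MM:SS,mmm --> HH:MM:SS,mmm", rest = text
    const text = lines.slice(2).join(' ');
    if (text.toLowerCase().includes(keyword.toLowerCase())) {
      return lines[1].split(' --> ')[0]; // start time of the matching cue
    }
  }
  return null; // keyword not found
}

const srt = `1
00:00:01,000 --> 00:00:04,000
Welcome to the audiobook.

2
00:00:04,500 --> 00:00:08,000
Chapter one begins here.`;

console.log(findKeyword(srt, 'chapter one')); // "00:00:04,500"
```

The returned timestamp could then be converted to a session timecode to navigate the playhead, though that last step is editor-specific.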
I believe that various users could use this voice, speech, and text assistance in a variety of applications, but what I mentioned above would be useful for my regular work.
- Brandon Jiaconia @Brandon_Jiaconia
Hey Brenden - Pretty wild that you asked this question exactly a year ago today! I was looking into this today and was able to get STT working using OpenAI's Whisper. https://openai.com/index/whisper/
It's pretty great: it transcribes the audio and can output several text formats (.txt etc.).

I'm able to get STT working in Terminal, but for some reason SoundFlow is not running the shell command when I use something like:

```javascript
sf.ui.finder.selectedPaths.map(path => {
    log(`Now converting ${path}...`);
    sf.system.exec({
        commandLine: `whisper "${path}" --model small --language English --output_format txt`
    });
});
```
Even the simple command line that I can run in Terminal just doesn't run from SF. It may be a permissions or SIP issue (I'm on a work computer), but I have other command-line SF scripts that all work perfectly.
Christian Scheuer @chrscheuer 2024-06-06 22:01:53.531Z
It's most likely because you'll need to specify the full path to the whisper binary.
Type `which whisper` in Terminal to get the full path.
- In reply to @nednednerb: Brandon Jiaconia @Brandon_Jiaconia
Thanks, Christian. I was doing just that in an earlier version. For example:

```javascript
sf.system.exec({
    commandLine: `/opt/homebrew/bin/whisper "/Users/brandon.jiaconia/Desktop/jia_02.aif" --model small --language English --output_format txt`
});
```
The SoundFlow icon is blue for about 5 seconds, but nothing happens; the process usually takes a while in Terminal. I also tried an AppleScript triggered from SF, but that didn't work either. I'm going to keep testing. If I can get something working I'll post it!
Christian Scheuer @chrscheuer 2024-06-07 09:42:08.717Z
You could tell it to tee its output to a log file, so you could better see what's going on.
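The tee approach amounts to appending something like `2>&1 | tee /tmp/whisper.log` to the commandLine string, so both stdout and stderr land in a file while still streaming live. A stand-in demonstration of the pattern (`echo` substituted for the real whisper invocation so it runs anywhere; the file path and messages are made up):

```shell
# Capture both stdout and stderr to a log file while still printing them.
# The braces group two commands standing in for whisper's normal and error output.
{ echo "Detected language: English"; echo "example warning" >&2; } 2>&1 | tee /tmp/whisper_demo.log

# The log now holds both streams for later inspection
cat /tmp/whisper_demo.log
```

In SoundFlow this whole pipeline would go inside the `commandLine` string passed to `sf.system.exec`.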
- Brandon Jiaconia @Brandon_Jiaconia
The log is telling me that it's failing to find ffmpeg, which is a requirement for Whisper. But I have ffmpeg installed at the standard /opt/homebrew/bin/ffmpeg; I use it all the time for converting videos etc. Here is the info from the log:

```
"/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'
```

It still works directly in Terminal, so I'm confused. I'm going to keep working on it, but if you have any ideas I'm all ears!
Christian Scheuer @chrscheuer 2024-06-07 23:48:56.286Z
It's probably because your PATH environment variable isn't correctly specified in SoundFlow's environment, so you'd have to set it yourself to include the directory of your ffmpeg binary.
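The mechanism can be demonstrated in isolation: prefixing PATH for a single invocation changes which binaries a child process can resolve. This sketch uses a throwaway stand-in `ffmpeg` in a temp directory (all paths and the script contents are made up for the demo) so it runs anywhere:

```shell
# Create a stand-in ffmpeg in a temporary directory
mkdir -p /tmp/fakebin
printf '#!/bin/sh\necho found-ffmpeg\n' > /tmp/fakebin/ffmpeg
chmod +x /tmp/fakebin/ffmpeg

# Prefix PATH for just this one invocation; the bare name `ffmpeg`
# now resolves through the prefixed directory first.
PATH=/tmp/fakebin:$PATH ffmpeg   # prints "found-ffmpeg"
```

This is the same `PATH=/opt/homebrew/bin:$PATH ...` prefix trick that makes whisper's own ffmpeg subprocess call succeed from inside SoundFlow.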
- Brandon Jiaconia @Brandon_Jiaconia
Just getting back to this - your suggestion was correct, thank you! I'm now able to run OpenAI Whisper with SoundFlow to get a transcription from an audio or video file. It runs on the file selected in Finder. Here is the script:
```javascript
// Get the path of the selected file in Finder
var fullPath = decodeURIComponent(sf.appleScript.finder.selection.getItem(0).asItem.path);

// Get the directory of the selected file
var directoryPath = fullPath.substring(0, fullPath.lastIndexOf('/'));

// Run whisper with PATH prefixed so its ffmpeg subprocess can be found
sf.system.exec({
    commandLine: `PATH=/opt/homebrew/bin:$PATH /opt/homebrew/bin/whisper "${fullPath}" --model small --language English --output_format txt --output_dir "${directoryPath}"`
});
```