Maybe this is just my phone (and laptop), but volume control is irritating when some tracks are configured so that I need to set the volume to 70-80% and some tracks are so “naturally loud” that the lowest setting (5% ish for my phone) is distractingly loud.
On some of my tracks (especially the classical ones), I need to change the volume from 20% to 80% within the same track, depending on which part I am listening to, if I want to hear everything without killing my eardrums.
I get that it would be difficult to do anything about this for streaming or live audio since the phone doesn’t know in advance what the input will be, but for a pre-recorded mp3 file, couldn’t my phone do some digital signal processing?
Do I just have terrible electronics, or is this an issue other people experience too? Or is this problem just harder to solve than I am expecting?
I have a semi-related question if you don’t mind. People often complain about the voice tracks in movies being hard to hear, especially if you don’t have a speaker for the center channel (but even then I have trouble).
Why haven’t they solved this problem by packaging the voice track separately on the Blu-ray/stream so you can turn up the volume of the voices only without blowing your ears out when the music hits?
I don’t know why they don’t. I work in music rather than TV/film, but it infuriates me too! Give me a voice volume control! It would be technically very easy to implement as a standard, but the powers that be just haven’t come together and done it!
I’m glad to hear I’m not the only one thinking it!
Do you think it could be done by diffing a few of the different language tracks?
Unfortunately no, audio files are actually really dumb in that they’re basically just a file of 44100 (or 48000 or 96000 etc) amplitude numbers per second.
So there’s nothing really to diff because it’s basically just a squiggly line, set of squiggly lines or, when compressed, a mathematical expression that when decompressed, recreates a squiggly line.
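To make the "file of amplitude numbers" idea concrete, here is a rough sketch using only Python's standard library. It writes one second of a test tone as 16-bit mono PCM into an in-memory WAV and reads it back; the 440 Hz tone and the variable names are just my assumptions for illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # amplitude numbers per second (CD rate)

# One second of a 440 Hz sine, scaled to half of the 16-bit range.
samples = [int(32767 * 0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
           for i in range(SAMPLE_RATE)]

# Write the numbers into an in-memory WAV (mono, 16-bit).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes = 16 bits per sample
    w.setframerate(SAMPLE_RATE)
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read it back: the "audio" is just the same list of integers.
buf.seek(0)
with wave.open(buf, "rb") as r:
    n_frames = r.getnframes()
    decoded = list(struct.unpack("<%dh" % n_frames, r.readframes(n_frames)))

print(n_frames, decoded[:5])
```

Nothing in that list says which part is voice and which is music; everything is already summed into one number per sample.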
You could isolate the dialog if you got ahold of a version with no dialog at all: invert the polarity of that version and sum it with the original, and everything except the dialog cancels out. But it’s unlikely you’ll find a version without any vocals.
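Here is a toy sketch of that polarity trick in plain Python. The two sine tones are stand-ins I made up for "music" and "dialog", and it assumes the dialog-free version is sample-aligned and otherwise identical to the full mix, which real releases usually aren't.

```python
import math

SAMPLE_RATE = 44100  # samples per second

def tone(freq_hz, n_samples, amplitude):
    """Return n_samples of a sine wave as plain floats."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

def invert_polarity(samples):
    """Flip the sign of every sample."""
    return [-s for s in samples]

def mix(a, b):
    """Sum two equal-length tracks sample by sample."""
    return [x + y for x, y in zip(a, b)]

n = SAMPLE_RATE // 10             # a tenth of a second is enough here
music = tone(220.0, n, 0.5)       # hypothetical music-and-effects track
dialog = tone(1000.0, n, 0.2)     # hypothetical dialog track
full_mix = mix(music, dialog)     # what ships on the disc

# Sum the full mix with the inverted dialog-free version.
recovered = mix(full_mix, invert_polarity(music))

# What is left should match the dialog up to floating-point rounding.
max_error = max(abs(r - d) for r, d in zip(recovered, dialog))
print(max_error)
```

If the two versions are off by even one sample, or one was lossy-compressed differently, the cancellation stops being clean.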
Machine learning vocal isolation tools are probably going to be the best way to go about it as a DIY approach. Ultimate Vocal Remover 5 with the demucs 4 algo is great FOSS software to extract vocals and you could sum that with the original track and adjust the gain to get louder dialogue… it would be a lot of work though…
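The "sum it with the original and adjust the gain" step could look roughly like this, assuming you already have the extracted vocal stem as a list of floats in the range -1.0 to 1.0, aligned with the original. The function names are hypothetical and this is not part of Ultimate Vocal Remover; it is only the mixing arithmetic.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)

def boost_dialog(original, vocals, vocal_gain_db):
    """Add the extracted vocal stem on top of the original mix.

    vocal_gain_db is the gain applied to the stem before summing.
    The result is hard-clipped to [-1.0, 1.0] so it stays a valid signal.
    """
    gain = db_to_linear(vocal_gain_db)
    boosted = []
    for o, v in zip(original, vocals):
        s = o + gain * v
        boosted.append(max(-1.0, min(1.0, s)))
    return boosted

# Tiny made-up example: two samples of "original" and "vocal stem".
louder = boost_dialog([0.1, -0.2], [0.1, 0.1], vocal_gain_db=0.0)
print(louder)
```

Hard clipping is the crudest possible safety net; in practice you would turn the whole mix down a bit or use a limiter so the boosted dialog doesn't distort.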
I don’t really understand still but thanks for trying all the same.