split stereo by channel and silence

jlnichols
i have stereo wav files in which each channel is a different speaker in a conversation.  i'm trying to figure out how best to split a stereo file by both its channel and silence, but still know the order the files should be played in to hear the conversation as a whole. i don't want to merge the 2 channels because often 1 channel has more background noise than the other, and sometimes the speakers will speak over each other; keeping them separate will make them easier to understand.

what i currently have
  sox stereo.wav L.wav remix 1
  sox stereo.wav R.wav remix 2

  sox R.wav R..wav silence 1 0.05 0.2%  1 0.8 0.2% pad 0.5 : newfile : restart
  sox L.wav L..wav silence 1 0.05 0.2%  1 0.8 0.2% pad 0.5 : newfile : restart

this gets me
R.001.wav
R.002.wav
...
R.020.wav
L.001.wav
L.002.wav
...
L.024.wav

the problem is that for playback i should sometimes play multiple R.###.wav files in a row, or multiple L.###.wav files, and i have no way of knowing when to do this with my current setup.

instead of just having an incrementing counter in the name, is there a way to have it use the starting time (in seconds or whatever) of that segment within the file? that way i'd have the files below and could just sort by the number to get the play order.
R.000.wav
L.030.wav
L.043.wav
R.078.wav
...

if not, any other suggestions?

thanks for the help

Re: split stereo by channel and silence

Jeremy Nicoll - ml sox users
On 2017-06-27 01:07, jlnichols wrote:
> i have stereo wav files, which each channel is different speakers in a
> conversation.  trying to figure out how best to split a stereo file by both
> its channel and silence, but still know the order the files should be played
> in to hear the conversation as a whole.

If you leave it as a stereo file and split by silence you'll get a sequence of
smaller files in play order, divided up whenever neither person is talking.  So
essentially

    both001.wav
    both002.wav
    both003.wav

(for that your sox command is going to have to contain: "both%3n.wav" I think,
though you might need %5n or %7n or something if there's going to be a huge
number of these files created).
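A minimal sketch of that stereo split, assuming the `silence` settings from the original post carry over unchanged (they will probably need tuning for a two-channel file). The function name `split_stereo_by_silence` is made up for illustration.

```shell
# Split a stereo file into numbered chunks, keeping both channels in each chunk.
# %3n in the output name is replaced by a 3-digit counter: both001.wav, both002.wav, ...
# The thresholds are copied from the original post; treat them as a starting point only.
split_stereo_by_silence() {
    src=$1       # e.g. stereo.wav
    prefix=$2    # e.g. both
    sox "$src" "${prefix}%3n.wav" \
        silence 1 0.05 0.2% 1 0.8 0.2% : newfile : restart
}

# Usage: split_stereo_by_silence stereo.wav both
```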

Most of these files should only have one person speaking in them, but clearly
there's going to be some with both.


I'd then run the sox  stats  effect on each of those files, piping the output
into a script/program.  Stats tells you the sound level in each channel of a
file.  You'd need (by experiment, probably) to find out for yourself what the
levels are in a file where one or other person is silent.  It should then be
possible for your script/program to decide if that stereo file contains only
the left channel person speaking, or only the right, or both.
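One way such a script might look, as a rough sketch. It assumes the two-channel layout of the `stats` table, where the "RMS lev dB" row lists overall, left and right values in that order, and it uses a made-up default threshold of -40 dB that you would replace after experimenting. The function names are illustrative only.

```shell
# Print "LEFT RIGHT" RMS levels (dB) from `sox FILE -n stats` output read on stdin.
# Assumes the stereo table layout: "RMS lev dB  <overall> <left> <right>".
rms_levels() {
    awk '$1 == "RMS" && $2 == "lev" { print $5, $6 }'
}

# Classify a chunk from its two RMS levels: L, R, B (both) or N (neither).
# $3 is a hypothetical threshold in dB, to be found by experiment.
classify_levels() {
    awk -v l="$1" -v r="$2" -v t="${3:--40}" '
        # sox reports "-inf" for digital silence; map it to a very low number
        function db(x) { return (x ~ /inf/) ? -1000 : x + 0 }
        BEGIN {
            l = db(l); r = db(r); t = db(t)
            if (l > t && r > t) print "B"
            else if (l > t)     print "L"
            else if (r > t)     print "R"
            else                print "N"
        }'
}

# stats writes its report to stderr, hence 2>&1
classify_file() {
    levels=$(sox "$1" -n stats 2>&1 | rms_levels)
    classify_levels $levels "${2:--40}"
}

# Usage: classify_file both103.wav -45
```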

I would use that info to rename each of the 'both' files, so eg

    both103.wav

could become

    voice103L.wav  or  voice103R.wav  or  voice103B.wav

Then you'd have a set of files like

    voice001L.wav
    voice002L.wav
    voice003R.wav
    voice004L.wav
    voice005L.wav
    voice006B.wav
    voice007L.wav
    voice008R.wav
    voice009L.wav
    ...
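The renaming step could be sketched like this, assuming the chunks are called bothNNN.wav as above and that the letter (L, R or B) comes from whatever classification you settled on. `voice_name` is a hypothetical helper.

```shell
# Build the new name for a chunk: both103.wav + L -> voice103L.wav
voice_name() {
    num=${1#both}      # strip the "both" prefix -> 103.wav
    num=${num%.wav}    # strip the extension     -> 103
    printf 'voice%s%s.wav\n' "$num" "$2"
}

# Usage:
#   mv both103.wav "$(voice_name both103.wav L)"
```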

To listen to just the lefthand person, you'd want to copy the files with "L" and "B"
in their names elsewhere, to get:

    voice001L.wav
    voice002L.wav
    voice004L.wav
    voice005L.wav
    voice006B.wav
    voice007L.wav
    voice009L.wav
    ...

To listen to just the righthand person, you'd want to copy the "R" and "B" files:

    voice003R.wav
    voice006B.wav
    voice008R.wav
    ...

To listen to the whole thing with both people, just listen to the whole 'voice' set of
files.
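Making those per-speaker copies might look like the sketch below. It assumes the voiceNNNx.wav naming shown above; the function name and directory arguments are illustrative.

```shell
# Copy one speaker's chunks, plus the "both" chunks, into a separate directory.
# $1 = L or R, $2 = source directory, $3 = destination directory
copy_speaker_files() {
    mkdir -p "$3"
    for f in "$2"/voice*"$1".wav "$2"/voice*B.wav; do
        [ -e "$f" ] && cp "$f" "$3"/    # skip patterns that matched nothing
    done
    return 0
}

# Usage: copy_speaker_files L . left_only
```

Because the chunk number comes first in each name, a plain directory listing of the copy is still in play order.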


Now, if you think about the "just lefthand person" COPY of the voicennnx.wav files, ie:

    voice001L.wav
    voice002L.wav
    voice004L.wav
    voice005L.wav
    voice006B.wav
    voice007L.wav
    voice009L.wav
     ...

obviously although these files contain all of the left person's contributions to the
discussion, the "B" files do also contain the right person interrupting.  If that was
annoying then ON THIS SET OF COPIED FILES ONLY you could run on each of the B files
the command like:

   sox voice006B.wav voice006L.wav remix 1

to get a left-channel only copy of what was in the B file.  So then you'd have

    voice001L.wav
    voice002L.wav
    voice004L.wav
    voice005L.wav
    voice006B.wav        chunk 6, both people
    voice006L.wav        chunk 6, left person only
    voice007L.wav
    voice009L.wav
     ...

and you could delete this copy of the voice006B file if you were sure you didn't need
it.


Does that help?


--
Jeremy Nicoll - my opinions are my own

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Sox-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/sox-users

Re: split stereo by channel and silence

jlnichols
i hadn't thought it all the way through like you have, so that does help some, thanks. this will end up as a script for multiple files, but currently i'm just playing with one. i did try to split by silence first, but background noise on the left channel (mostly left, though the combined noise might have contributed some too) meant i got only 3 files. if i increase the noise % it breaks the file up better, but then i start to chop off words from the right side. if i instead split on silence after i split the channels, i get 2 left files (unless i increase the noise %, which doesn't end up hurting this side for this file; not sure about other files) and 15 right files.

another thing i just thought of: is it possible to split on the silence of just 1 channel while the file is still stereo? then if i stop chopping off the silence, i'd still get the 15 right files, plus a lot of little files for the left side between them. i'd then just combine the files in between what gets determined to be good right-side files, and that should be the left side.

thanks again for the help

Re: split stereo by channel and silence

Jeremy Nicoll - ml sox users
On 2017-06-27 15:23, Jon Nichols wrote:
> i hadn't thought it all the way through like you have so that does help
> some thanks, this will end up a script for multiple files, but currently
> i'm just playing with one. I did try to split by silence first but
> background noise on the left channel(mostly left, might have been the
> combined noise some too) made it so there was only 3 files...

You maybe need to start by using  stat  or  stats  to find things out about
sound levels in the files.  I'm sure in the past I ran some code that chopped
a long file into (shall we say) a series of 5-second mini files, then ran
stat/stats on each one, to build up a picture of levels all the way through
the thing.
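That kind of level survey could be sketched as follows. The 5-second length is just the figure mentioned above, the `chunkNNN.wav` names and function names are made up, and the start-time arithmetic assumes sox numbers its output files from 001.

```shell
# Chop a long file into fixed-length chunks: chunk001.wav, chunk002.wav, ...
chop_fixed() {
    sox "$1" "chunk%3n.wav" trim 0 "${2:-5}" : newfile : restart
}

# Start time in seconds of chunk number N: (N - 1) * chunk length.
chunk_start() {
    echo $(( ($1 - 1) * ${2:-5} ))
}

# Print the start time and the "RMS lev dB" row of stats for every chunk.
# stats writes its report to stderr, hence 2>&1.
level_profile() {
    n=1
    for f in chunk[0-9][0-9][0-9].wav; do
        [ -e "$f" ] || continue
        printf '%ss: ' "$(chunk_start "$n" "${1:-5}")"
        sox "$f" -n stats 2>&1 | grep 'RMS lev dB'
        n=$((n + 1))
    done
}

# Usage: chop_fixed stereo.wav 5 && level_profile 5
```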

If background noise is a problem I think you need to clean that up first,
perhaps splitting the channels then cleaning them, then perhaps compressing
them (ie reducing dynamic range) - I'm not sure - then joining them back
into a stereo file.  Then again, increasing dynamic range might make the
split easier, even if you compress the audio again later.
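A rough sketch of the split, clean and rejoin idea, using sox's `noiseprof` and `noisered` effects for the cleaning step. It rests on two assumptions that need checking by ear: that the first second of each channel contains only background noise, and that 0.21 is a reasonable starting amount for `noisered`. The compression step is left out, and the intermediate file names are arbitrary.

```shell
# Clean each channel separately, then rebuild a two-channel file.
# $1 = stereo input, $2 = cleaned stereo output
clean_and_rejoin() {
    src=$1; out=$2
    sox "$src" L.wav remix 1                  # left channel only
    sox "$src" R.wav remix 2                  # right channel only
    for ch in L R; do
        # profile the noise from the first second (assumed to be speech-free)
        sox "$ch.wav" -n trim 0 1 noiseprof "$ch.prof"
        # reduce noise matching that profile; 0.21 is only a starting value
        sox "$ch.wav" "$ch.clean.wav" noisered "$ch.prof" 0.21
    done
    sox -M L.clean.wav R.clean.wav "$out"     # merge: first input -> channel 1
}

# Usage: clean_and_rejoin stereo.wav stereo.clean.wav
```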

Do you actually have stereo, or double mono?  Can any of the sound of the
person on left channel be heard on right channel, and vice versa?


> another thing i just thought of, is it possible to split on the silence of
> just 1 channel while the file is still stereo?

Don't know... but even if it's not, you could split the whole file into L & R
then split L into L fragments, and R into R fragments, and doing that you'd
find out (from the sizes/lengths of each fragment) where each channel's split
points were.  You could then (though I don't see how it would help) merge those
two lists of split points into one list and apply those to the original stereo
file.
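Turning fragment lengths into split points is a running sum, sketched below. One caveat: this only gives true offsets if the fragments together still contain all of the original audio. A `silence` effect that discards the quiet stretches, as in the original commands, shortens the fragments and breaks the arithmetic. `start_offsets` is a made-up name.

```shell
# Read one duration in seconds per line and print each fragment's start offset.
start_offsets() {
    awk '{ printf "%.3f\n", t; t += $1 }'
}

# With real files the durations could come from soxi, for example:
#   for f in L.[0-9]*.wav; do soxi -D "$f"; done | start_offsets
```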



--
Jeremy Nicoll - my opinions are my own

Re: split stereo by channel and silence

jlnichols
> Do you actually have stereo, or double mono?  Can any of the sound of the
> person on left channel be heard on right channel, and vice versa?

i guess double mono might be the right term. it's a phone conversation, so each side is completely separate mono audio.


Re: split stereo by channel and silence

Jan Stary
In reply to this post by jlnichols
On Jun 26 17:07:31, [hidden email] wrote:
> i have stereo wav files, which each channel is different speakers in a
> conversation.  trying to figure out how best to split a stereo file by both
> its channel and silence, but still know the order the files should be played
> in to hear the conversation as a whole.

Why do you want to do this?

> i don't want to merge the 2
> channels because often 1 channel has more background noise than the other
> and sometimes speakers will speak over each other and keeping them separate
> will make it easier to understand them.

You can play the one and then play the other, or just the parts
where they speak over each other.

> the problem is, for play back sometimes i should play multiple R.###.wav
> files in a row, or multi L.###.wav files and i have no way of knowing when i
> should do this with my current setup.

If you play the L and R files in a sequence (whether one-by-one
or with an occasional cluster of L or R as you describe), it will
not be the conversation that happened, exactly in the places
where they spoke over each other.

> instead of just having an increment counter for the name, is there a way to
> have it use the starting time( in seconds or whatever) for that segment
> of the file? that way i'd have the below files and could just sort by the
> number for the play order.

First please describe _why_ you are doing this.
Are the parts when they both speak so unintelligible
that you need to separate them into two mono streams
to actually hear what each is saying?

        Jan


Re: split stereo by channel and silence

Jan Stary
In reply to this post by jlnichols
On Jun 27 09:23:59, [hidden email] wrote:
> another thing i just thought of, is it possible to split on the silence of
> just 1 channel while the file is still stereo?

I don't think SoX can do that,
but it sounds like a useful feature!

        Jan


Re: split stereo by channel and silence

jlnichols
In reply to this post by Jan Stary
the reason is that i'm trying to use an ASR (Kaldi, to be exact) to transcribe the audio. it seems to work better on short audio clips, which is why i split on silence, and keeping the channels separate makes it easy to know who the speaker is. plus, the audio was unintelligible to my model when they were speaking over each other in a single mono file.

i'm still very new to figuring out how to use Kaldi, so there could easily be a better way within that tool to handle this.

Re: split stereo by channel and silence

Dave Graff

According to the online docs for Kaldi (http://kaldi-asr.org/doc/tools.html), you should find a utility called "extract-segments", which will take either a 1- or 2-channel wav file as input and will produce as output a listing of speech segments with their time stamps. It looks like using it on single-channel data is easier/better, and it makes sense to do it this way: because the time stamps refer to the original data, "silence" regions are not deleted from the data, so portions of interest in the two separate channels retain their original alignment relative to each other. Each speech segment can be handled independently of the others, and has a unique identifier to keep track of its position in the overall timeline of the original recording.


I haven't used Kaldi at all myself, but this approach to speech detection (using a listing of time offsets, while preserving the full content of the original recording) is a pretty common procedure.
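To illustrate the time-offset approach on the sox side, here is a minimal sketch that turns a segment listing into `sox ... trim` commands. The listing format ("CHANNEL START END", one segment per line, times in seconds) is hypothetical and not something Kaldi is known to emit in this form. The output names put the zero-padded start time first, so sorting the file names gives the play order asked about at the top of the thread.

```shell
# Read lines of "CHANNEL START END" (seconds) and print one sox command per
# segment. $1 is the original stereo file, which is never modified.
# Assumes channel L is channel 1 and anything else is channel 2, and that
# the source file name contains no spaces.
segment_commands() {
    awk -v src="$1" '{
        ch = ($1 == "L") ? 1 : 2
        # trim takes a start position and a length, so pass END - START
        printf "sox %s %07.2f.%s.wav trim %s %s remix %d\n",
               src, $2, $1, $2, $3 - $2, ch
    }'
}

# Usage: review the commands first, then run them
#   segment_commands stereo.wav < segments.txt | sh
```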


   Dave Graff


