Processing Telephone Messages

Colin Sheaff-2
I'm trying to batch process telephone messages I'm getting as 8 bit
WAV files into mp3s for web consumption on a Linux webserver. We want
this script to be fully automated and robust enough to handle both
poor and great line/recording quality.

Current process is SoundForge using a combination of DX and VST
plugins (Sony Wave Hammer, NR, Clipped Peak Restoration) on a Windows
machine, and I'm trying to recreate that (or something about as good)
with SoX, mostly using compand. However current settings are not
sounding nearly as good. I used the language101.com script as a
starting off point and haven't found any better compand settings.
Currently the script looks like:

$SOX "|$SOX $1.wav -r 44100 -p gain -h \
  remix - \
  highpass 100 \
  gain -n \
  compand 0.05,0.2 6:-54,-90,-36,-36,-24,-24,0,-12 0 -90 0.1 \
  gain -0.5 \
  vad -T 0.6 -p 0.2 -t 5 \
  fade 0.1 \
  reverse \
  vad -T 0.6 -p 0.2 -t 5 \
  fade 0.1 \
  reverse" "$2.mp3"

The vad settings for trimming the end are sometimes a bit aggressive
and I'm losing more than just silence. But the deal breaker is that
the compand settings just are nowhere near as robust across the
quality range we get from telephony audio, and don't improve the
signal as much as the SoundForge plugins do.
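A possible adjustment for the over-aggressive end trim is sketched below as a hypothetical helper. The name `trim_silence` and the 0.5 s default are assumptions and do not come from the thread. vad's `-p` option sets how much audio is preserved before the trigger point, so raising it from the script's 0.2 s keeps more of what vad would otherwise cut. Because the end is trimmed by running vad on reversed audio, the same `-p` value also protects the tail.

```shell
#!/bin/sh
# Hypothetical helper: trim leading and trailing silence with vad, keeping
# a larger pre-trigger margin (-p) than the original script's 0.2 s.
# vad trims from the start only, so the audio is reversed to trim the end.
SOX="${SOX:-sox}"

trim_silence() {
  infile=$1 outfile=$2 pre=${3:-0.5}   # 0.5 s is an assumed default
  $SOX "$infile" "$outfile" \
    vad -T 0.6 -p "$pre" -t 5 fade 0.1 \
    reverse \
    vad -T 0.6 -p "$pre" -t 5 fade 0.1 \
    reverse
}
```

For example, `trim_silence msg.wav trimmed.wav 0.8` keeps an even wider margin if 0.5 s still clips syllables.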

I'm happy to share more specifics about the SoundForge workflow, but
mostly I'm looking for suggestions about improving the above SoX
script (with or without other plugins) given the above constraints.

Thanks so much for your time,
Colin

------------------------------------------------------------------------------
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
_______________________________________________
Sox-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/sox-users

Re: Processing Telephone Messages

Jan Stary
On Nov 02 15:45:50, Colin Sheaff wrote:
> I'm trying to batch process telephone messages I'm getting as 8 bit
> WAV files into mp3s for web consumption on a Linux webserver.

Why did you decide to use the mp3 format?

The original voice stream (or, as original as you get your hands on it),
is probably GSM; so someone decodes GSM for you into WAV, and you encode
it into MP3, right? That means a quality loss. You could use GSM right away.

It is true that SoX decodes everything into linear PCM before
processing, so processing GSM files would also mean decoding and
encoding, but at least it would be the same decoding/encoding
algorithm on both ends. I believe it would improve the sound quality.

You could also consider some dedicated *speech* codec,
such as Speex (http://www.speex.org/). But I agree that users
are most used to MP3.
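If the output format were negotiable, Jan's GSM suggestion could look like the minimal sketch below. It assumes a SoX build with GSM support, and `to_gsm` is a hypothetical helper name. GSM 06.10 is defined only for 8 kHz mono, which these messages already are. Speex is not covered here.

```shell
#!/bin/sh
# Sketch: store the message in a telephony codec (GSM 06.10) instead of MP3.
# GSM requires 8 kHz mono, so rate and channels are forced explicitly.
SOX="${SOX:-sox}"

to_gsm() {
  # "${1%.wav}.gsm" keeps the basename and swaps the extension
  $SOX "$1" -r 8000 -c 1 "${1%.wav}.gsm"
}
```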

> We want
> this script to be fully automated and robust enough to handle both
> poor and great line/recording quality.

That's not necessarily easy. But at least all the input files are uniform
in their format: 8bit mono WAVs, right? What is the sample rate? Could you
please put a typical good one and a typical bad one somewhere for download
(or send off-list), if privacy allows?

> Current process is SoundForge using a combination of DX and VST
> plugins (Sony Wave Hammer, NR, Clipped Peak Restoration) on a Windows
> machine, and I'm trying to recreate that (or something about as good)
> with SoX, mostly using compand.

This interests me, just to show that SoX can do at least as well
as some $400 Windows audio software :-)

I did this some years ago at a company that did VoIP call
recordings (for banks, insurance companies and call centres, typically).
The codec was G.729 I think. We used SoX and compand, but I no longer
have the code.

I remember the biggest problem was that what you hear in the earpiece
(or the headphones) of the hardware phone is much better than
what actually goes across the wire / across the radio. The telephones
themselves do some nice improvements to the signal: jitter correction,
reassembly of out-of-order packets, silence replacement, etc. If you only
sniff the wire (say, with tcpdump capturing the UDP packets that carry
the GSM data), then the GSM stream you restore from that is much uglier
than what the user hears from his phone. You want to do some of that in
software; most importantly, make sure that the voice packets are
correctly time-aligned.

> However current settings are not
> sounding nearly as good. I used the language101.com script as a

What script is that?

> starting off point and haven't found any better compand settings.
> Currently the script looks like:
>
> $SOX "|$SOX $1.wav -r 44100 -p gain -h \
>   remix - \
>   highpass 100 \
>   gain -n \
>   compand 0.05,0.2 6:-54,-90,-36,-36,-24,-24,0,-12 0 -90 0.1 \
>   gain -0.5 \
>   vad -T 0.6 -p 0.2 -t 5 \
>   fade 0.1 \
>   reverse \
>   vad -T 0.6 -p 0.2 -t 5 \
>   fade 0.1 \
>   reverse" "$2.mp3"
>
> The vad settings for trimming the end are sometimes a bit aggressive
> and I'm losing more than just silence. But the deal breaker is that
> the compand settings just are nowhere near as robust across the
> quality range we get from telephony audio, and don't improve the
> signal as much as the SoundForge plugins do.

Well, what exactly do the SoundForge plugins do?
I don't use SoundForge myself, but it probably has
some options to be set for the individual plugins
(such as the compander).


> I'm happy to share more specifics about the SoundForge workflow, but
> mostly I'm looking for suggestions about improving the above SoX
> script (with or without other plugins) given the above constraints.

How exactly does the signal get to you? Who provides the 8bit WAV
stream, and how does he get the 8bit WAV stream from the original voice
stream? In other words, what is the signal path from the actual phones
down to your software?

        Jan



Re: Processing Telephone Messages

Colin Sheaff-2
Thanks so much for the feedback, Jan. Jste Čech? (Are you Czech?)

On Thu, Nov 3, 2011 at 10:03 AM, Jan Stary <[hidden email]> wrote:
> Why did you decide to use the mp3 format?

Mostly as it has decent compression and is compatible with the
in-browser player we chose.
We may switch players at some point, but for now this meets our needs.

> The original voice stream (or, as original as you get your hands on it),
> is probably GSM; so someone decodes GSM for you into WAV, and you encode
> it into MP3, right? That means a quality loss. You could use GSM right away.

Our 3rd party voicemail provider (SimpleVoiceBox) supplies our audio
to us as wav. We're a small start-up, so there's very little we can
afford to do 100% ourselves yet.

>> We want
>> this script to be fully automated and robust enough to handle both
>> poor and great line/recording quality.
>
> That's not necessarily easy. But at least all the input files are uniform
> in their format: 8bit mono WAVs, right? What is the sample rate? Could you
> please put a typical good one and a typical bad one somewhere for download
> (or send off-list), if privacy allows?

soxi says the following about our wav files:
Channels       : 1
Sample Rate    : 8000
Precision      : 14-bit
Bit Rate       : 64.0k
Sample Encoding: 8-bit u-law
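Since every file is expected to arrive in this format, a batch script could verify it before processing. The sketch below uses `sox --i` (equivalent to `soxi`) with its `-r` and `-c` flags. The helper name `check_format` and the policy of skipping mismatched files are assumptions.

```shell
#!/bin/sh
# Sketch: skip files that are not 8 kHz mono, as reported by SoX's
# info mode (sox --i is equivalent to soxi).
SOX="${SOX:-sox}"

check_format() {
  rate=$($SOX --i -r "$1") || return 1
  chans=$($SOX --i -c "$1") || return 1
  if [ "$rate" = 8000 ] && [ "$chans" = 1 ]; then
    return 0
  fi
  echo "skipping $1: ${rate} Hz, ${chans} channel(s)" >&2
  return 1
}
```

In a loop this becomes `check_format "$f" || continue` before any effects run.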

I've put three examples - loud/distorted, quiet, and mixed quiet and loud at:
http://selfsimilar.net/sox/


>> However current settings are not
>> sounding nearly as good. I used the language101.com script as a
>
> What script is that?

http://sox.sourceforge.net/Docs/Scripts - voice-cleanup.sh

> Well, what exactly do the SoundForge plugins do?
> I don't use SoundForge myself, but it probably has
> some options to be set for the individual plugins
> (such as the compander).

SoundForge workflow (which, admittedly, is overkill):
1. resample to 44.1
2. Sony Wave Hammer[1] - limit at 6db and maximize
3. Sony Noise Reduction (clips and pops)
4. Sony Clipped Peak Restoration
5. Insert 2 sec of silence
6. Sony Noise Reduction: (general nr using a standard noise profile)
  i.   reduce noise by 12db
  ii.  attack speed fast - 90
  iii. release speed 50
7. Sony Wave Hammer - volume maximizer - threshold set at -6db with
limiter to avoid clipping

[1] - Sony Wave Hammer is a compressor/volume maximizer plugin. Best
info I could get was here: http://vimeo.com/12129360

Really we just need compression/maximize and some limiting. Maybe
noise reduction, but not really necessary.

Anyways, thanks again for all the help. Děkuji moc. Učim se česky,
protože moje žena je z Prahy. (:
(Translation: Thank you very much. I'm learning Czech because my wife is from Prague.)

Colin


Re: Processing Telephone Messages

Jan Stary
> soxi says the following about our wav files:
> Channels       : 1
> Sample Rate    : 8000
> Precision      : 14-bit
> Bit Rate       : 64.0k
> Sample Encoding: 8-bit u-law

Note that the u-law encoding involves some dynamic compression already.
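To illustrate the point above: the continuous μ-law characteristic (ITU-T G.711 actually uses a segmented, piecewise-linear approximation of it, with μ = 255) maps a normalized sample x to

$$F(x) = \operatorname{sgn}(x)\,\frac{\ln(1 + \mu\,|x|)}{\ln(1 + \mu)}, \qquad -1 \le x \le 1,\quad \mu = 255.$$

For a quiet sample at x = 0.01 (-40 dBFS), F(x) = ln 3.55 / ln 256 ≈ 0.23, so quiet material gets a disproportionate share of the 8-bit code range. The decoder applies the inverse curve when SoX converts the samples back to linear PCM; this logarithmic coding is also why soxi reports 14-bit precision for 8-bit u-law samples.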

> I've put three examples - loud/distorted, quiet, and mixed quiet and loud at:
> http://selfsimilar.net/sox/
>
> > Well, what exactly do the SoundForge plugins do?
> > I don't use SoundForge myself, but it probably has
> > some options to be set for the individual plugins
> > (such as the compander).
>
> SoundForge workflow (which, admittedly, is overkill):
> 1. resample to 44.1

I don't think you need this for the processing itself,
but the final mp3 encoder might work better with
higher sample rates.


> 2. Sony Wave Hammer[1] - limit at 6db and maximize

This is easily done with 'gain -n -6'.
But the SWH also does some compression, right?
What exactly are the parameters of this compression?
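Jan's suggestion for step 2 can be wrapped as a reusable command. The sketch uses a hypothetical `maximize` helper. `gain -n` scales the whole file so that its peak lands at the given level in dB (-6 by default here), and the global `-G` option guards the effects chain against clipping.

```shell
#!/bin/sh
# Sketch of "limit at 6 dB and maximize" as peak normalization.
# Optional third argument overrides the -6 dB target peak level.
SOX="${SOX:-sox}"

maximize() {
  $SOX -G "$1" "$2" gain -n "${3:--6}"
}
```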

> 3. Sony Noise Reduction (clips and pops)

SoX has the 'noiseprof' and 'noisered' effects, which by definition
(see the manpage) do not perform uniformly, but rather specifically
to the given signal. What does the Sony NR actually do, signal-wise?
Set a compander attack so that the (short) clicks and pops don't
make it through?
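For reference, the usual two-pass noiseprof/noisered workflow is sketched below. It assumes that the first 0.3 s of each message contains only line noise, and the 0.21 amount is only a starting point (noisered takes a value between 0 and 1; higher removes more noise but also risks removing wanted signal). `denoise`, `NOISE_SECS` and `AMOUNT` are hypothetical names. A profile file built once from a representative noise-only clip could instead be passed as the third argument and reused across all messages.

```shell
#!/bin/sh
# Sketch of profile-based noise reduction with SoX's noiseprof/noisered.
# ASSUMPTION: the first NOISE_SECS seconds contain only noise, no speech.
SOX="${SOX:-sox}"

denoise() {
  infile=$1 outfile=$2 prof=${3:-noise.prof}
  # Pass 1: analyze the noise-only stretch; -n means "no audio output".
  $SOX "$infile" -n trim 0 "${NOISE_SECS:-0.3}" noiseprof "$prof" || return 1
  # Pass 2: reduce that noise profile across the whole file.
  $SOX "$infile" "$outfile" noisered "$prof" "${AMOUNT:-0.21}"
}
```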

> 4. Sony Clipped Peak Restoration

SoX does not have such capability.

> 5. Insert 2 sec of silence

Where?

> 6. Sony Noise Reduction: (general nr using a standard noise profile)

What standard noise profile is that? Can you get it into a noise profile
file?  Then you could use it with 'noisered'.

>   i.   reduce noise by 12db
>   ii.  attack speed fast - 90
>   iii. release speed 50

> 7. Sony Wave Hammer - volume maximizer - threshold set at -6db with
> limiter to avoid clipping

gain -l -6

> [1] - Sony Wave Hammer is a compressor/volume maximizer plugin. Best
> info I could get was here: http://vimeo.com/12129360
>
> Really we just need compression/maximize and some limiting. Maybe
> noise reduction, but not really necessary.

It is trivial to maximize/limit the signal with the 'gain' effect.
What breaks the uniformity is the compression, because the different
recordings will differ very much in their dynamic range. So: what
are the compression parameters of the Sony Wave Hammer (that sound
good to you)?


> Anyways, thanks again for all the help. Děkuji moc. Učim se česky,
> protože moje žena je z Prahy. (:

Vaše čeština je velmi dobrá :-) (Your Czech is very good.)

        Jan




Re: Processing Telephone Messages

Jan Stary
In reply to this post by Colin Sheaff-2
On Nov 04 21:04:27, Colin Sheaff wrote:
> I've put three examples - loud/distorted, quiet, and mixed quiet and loud at:
> http://selfsimilar.net/sox/

Could you please also upload the versions (the MP3s)
processed with your current processing chain? I am
fiddling with various compand settings, obviously,
and would like to have some comparison to what
is "good" for you with SoundForge.


Re: Processing Telephone Messages

Colin Sheaff-2
On Sunday, November 6, 2011, Jan Stary <[hidden email]> wrote:
> On Nov 04 21:04:27, Colin Sheaff wrote:
>> I've put three examples - loud/distorted, quiet, and mixed quiet and loud at:
>> http://selfsimilar.net/sox/
>
> Could you please also upload the versions (the MP3s)
> processed with your current processing chain? I am
> fiddling with various compand settings, obviously,
> and would like to have some comparison to what
> is "good" for you with SoundForge.
>
One thing to clarify - we currently just do an unprocessed convert
from wav to mp3 for the website. It's only for the CD that we send to
our customers at the end that we do the SoundForge processing. But we
want to mimic that for the web. So I'll upload processed wav files
shortly so you can compare.

Colin

Re: Processing Telephone Messages

Colin Sheaff-2
In reply to this post by Jan Stary
I've added processed versions of the example files:
http://selfsimilar.net/sox/

Also, just to take a step back - the process we currently use is for
the CD, and it's probably overkill even for that. All we really want
is to make all the voice messages the same loudness, so that people
don't have to adjust the volume on their computer. This is basically
gain/compression/normalization with the trick of avoiding
clipping/distortion. Noise reduction would be nice, but it's not
absolutely necessary.

To address your points:

>> 2. Sony Wave Hammer[1] - limit at 6db and maximize
>
> This is easily done with 'gain -n -6'.
> But the SWH also does some compression, right?
> What exactly are the parameters of this compression?

We disabled compression in SWH.

>> 3. Sony Noise Reduction (clips and pops)
>
> SoX has the 'noiseprof' and 'noisered' effects, which by definition
> > (see the manpage) do not perform uniformly, but rather specifically
> to the given signal. What does the Sony NR actually do, signal-wise?
> Set a compander attack so that the (short) clicks and pops don't
> make it through?

Yes - very similar to the effect built in to Audacity, if you're familiar.

>> 4. Sony Clipped Peak Restoration
>
> SoX does not have such capability.

We mostly introduced the clipped peaks ourselves when boosting the
gain. If we can avoid clipping, we won't need to worry about this.

>> 5. Insert 2 sec of silence
>
> Where?

It's just for the CD so never mind.

>> 6. Sony Noise Reduction: (general nr using a standard noise profile)
>
> What standard noise profile is that? Can you get it into a noise profile
> file?  Then you could use it with 'noisered'.

We use profile #2. Sony baked the profiles into the effect, so without
digging into the binary I don't think we can extract their profile. I
guess we'll have to find or create our own noise profile for general
telephony.

> It is trivial to maximize/limit the signal with the 'gain' effect.
> What breaks the uniformity is the compression, because the different
> recordings will differ very much in their dynamic range. So: what
> are the compression parameters of the Sony Wave Hammer (that sound
> good to you)?

Hopefully the processed file will give you an idea of the touch we're going for.

Thanks again,
Colin


Re: Processing Telephone Messages

Colin Sheaff-2
Also, to your point earlier about showing that SoX can be just as good
as expensive software: if we can get SoX to meet our needs, we'd be
more than happy to publicly endorse SoX, and we'll definitely donate
to the PayPal fund. Everyone can use some extra beer money. :-)

Colin


Re: Processing Telephone Messages

Fmiser
In reply to this post by Colin Sheaff-2
> Colin Sheaff wrote:

> Also, just to take a step back - the process we currently use
> is for the CD, and it's probably overkill even for that. All
> we really want is to make all the voice messages the same
> loudness, so that people don't have to adjust the volume on
> their computer.

This requirement can get tricky.

It's easy to calculate peaks, but we don't perceive "loudness"
or "volume" based on peaks.  It tends to be a frequency-weighted
level average.  This makes it difficult to use software to
adjust the level to what a human would call consistent
volume.  For speech, especially with limited audio bandwidth like
telephone, comparing RMS levels will probably work okay.

So you may want to calculate the average RMS level to adjust
the gain for a particular track.  I would do it after all the
compression is done.
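Philip's suggestion can be sketched as: read each file's overall RMS level from SoX's `stats` effect (which reports on stderr), then apply the gain that moves it to a common target. The -20 dB target (`TARGET_DB`) and the helper names are hypothetical. `gain -l` engages SoX's limiter so that a large boost cannot clip, and the parsing assumes mono input, where `stats` prints a single value column.

```shell
#!/bin/sh
# Sketch: match every message to a common average (RMS) level.
SOX="${SOX:-sox}"

# Overall RMS level in dB, parsed from the 'stats' report (stderr).
rms_db() {
  $SOX "$1" -n stats 2>&1 | awk '/^RMS lev dB/ { print $4 }'
}

# Gain in dB that moves level $2 to target $1.
gain_for_target() {
  awk -v t="$1" -v r="$2" 'BEGIN { printf "%.2f\n", t - r }'
}

level_match() {
  infile=$1 outfile=$2
  level=$(rms_db "$infile")
  [ -n "$level" ] || return 1
  $SOX "$infile" "$outfile" gain -l "$(gain_for_target "${TARGET_DB:--20}" "$level")"
}
```

As Philip notes, this step would run last, after any compression.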

--   Philip


Re: Processing Telephone Messages

Jan Stary
In reply to this post by Colin Sheaff-2
> I've put three examples - loud/distorted, quiet, and mixed quiet and loud at:
> http://selfsimilar.net/sox/

Could you also please upload a typical 'good' one?

        Jan


Re: Processing Telephone Messages

Jan Stary
In reply to this post by Colin Sheaff-2
On Nov 06 21:20:23, Colin Sheaff wrote:

> I've added processed versions of the example files:
> http://selfsimilar.net/sox/
>
> Also, just to take a step back - the process we currently use is for
> the CD, and it's probably overkill even for that. All we really want
> is to make all the voice messages the same loudness, so that people
> don't have to adjust the volume on their computer. This is basically
> gain/compression/normalization with the trick of avoiding
> clipping/distortion. Noise reduction would be nice, but it's not
> absolutely necessary.

I have listened to the unprocessed and processed examples,
and frankly, I don't find the processed ones any 'better',
except they are obviously normalized not to clip (and much
bigger, as they are 16bit PCM @ 44100).

I have processed them with a simple
sox -G in.wav -e signed -b 16 out.wav gain -n -6 rate -h 44100
The output can be downloaded at http://stare.cz/~hans/.tmp/colin/
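To run that one-liner over a whole batch, it could be wrapped in a loop like the sketch below. The helper name `process_dir` and the source/destination directory layout are hypothetical. The effects are exactly Jan's: 16-bit signed PCM output, peak normalized to -6 dB, resampled to 44.1 kHz with the high-quality `rate -h`.

```shell
#!/bin/sh
# Sketch: apply Jan's command to every .wav in a source directory.
SOX="${SOX:-sox}"

process_dir() {
  src=$1 dst=$2
  mkdir -p "$dst" || return 1
  for f in "$src"/*.wav; do
    [ -e "$f" ] || continue   # the glob matched nothing
    $SOX -G "$f" -e signed -b 16 "$dst/$(basename "$f")" \
      gain -n -6 rate -h 44100 || return 1
  done
}
```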

This output is IMHO no worse than your processed examples.
Are you sure there actually *is* some noise reduction
and peak restoration etc involved?

> We disabled compression in SWH.

Then the recordings with both soft and loud passages will
still have soft and loud passages. I will also try to do
some compression to the examples.
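A possible starting point for that compression step is sketched below; the values are untuned guesses, not Jan's settings, and `compress_voice` is a hypothetical name. The transfer function maps -60 dB in to -40 dB out, -30 to -20, and 0 to -10 (roughly 2:1 over that range, with a 6 dB soft knee), so quiet passages are raised relative to loud ones. The attack/decay of 20 ms / 200 ms and the 20 ms look-ahead delay are also assumptions. `gain -n -3` then re-normalizes the peak.

```shell
#!/bin/sh
# Starting-point sketch (untuned values) of dynamic compression for speech.
# compand args: attack,decay  knee:in,out,...  gain  initial-volume  delay
SOX="${SOX:-sox}"

compress_voice() {
  $SOX -G "$1" "$2" \
    compand 0.02,0.2 6:-60,-40,-30,-20,0,-10 0 -90 0.02 \
    gain -n -3
}
```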

> >> 3. Sony Noise Reduction (clips and pops)
> >
> > SoX has the 'noiseprof' and 'noisered' effects, which by definition
> > (see the manpage) do not perform uniformly, but rather specifically
> > to the given signal. What does the Sony NR actually do, signal-wise?
> > Set a compander attack so that the (short) clicks and pops don't
> > make it through?
>
> Yes - very similar to the effect built in to Audacity, if you're familiar.

There is a loud pop at the end of 'quiet.wav'
which is still there in 'quiet-processed.wav'.
Could you please upload an example with some
clicks and pops listenably processed out?

> >> 4. Sony Clipped Peak Restoration
> >
> > SoX does not have such capability.
>
> We mostly introduced the clipped peaks ourselves when boosting the
> gain. If we can avoid clipping, we won't need to worry about this.

Well, -G makes sure your effects keep enough headroom not to clip.

> >> 6. Sony Noise Reduction: (general nr using a standard noise profile)
> >
> > What standard noise profile is that? Can you get it into a noise profile
> > file?  Then you could use it with 'noisered'.
>
> We use profile #2. Sony baked the profiles into the effect, so without
> digging into the binary I don't think we can extract their profile. I
> guess we'll have to find or create our own noise profile for general
> telephony.

Is there any information about what actually is Sony's
"noise profile #2"? Isn't that just white noise / pink noise / whatever?

> Hopefully the processed file will give you an idea
> of the touch we're going for.

As I said, the processed examples sound to me almost exactly like
the 'raw' ones, except they don't clip. If I didn't know the story
behind, I would say they are just normalized.

        Jan



Re: Processing Telephone Messages

Colin Sheaff-2
Jan,

Thanks for all the help in this. I haven't had time to write back since things have been very busy for the business. Which is a good thing :)

At any rate, I wanted to let you know that we settled on a very simple 'sox --norm in.wav out.mp3'. We're hoping to improve this as time goes on with leading/trailing silence trim and maybe some other tricks, but this is a huge step up from where we were at before. So thanks for working with us on this. Very appreciated, even if the end result is pretty basic.
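For the leading/trailing trim mentioned above, one option is SoX's `silence` effect (a different effect from the `vad` used earlier in the thread), combined with the same `--norm` command. The sketch assumes a 1% threshold and a 0.1 s minimum duration of sound, both of which would need tuning, and `to_web_mp3` is a hypothetical name. MP3 output requires a SoX build with LAME (libmp3lame) support.

```shell
#!/bin/sh
# Sketch: normalize, then trim silence at both ends.
# 'silence 1 0.1 1%' drops audio from the start until 0.1 s of sound above
# 1% of full scale is found; reverse/silence/reverse does the same at the end.
SOX="${SOX:-sox}"

to_web_mp3() {
  $SOX --norm "$1" "${1%.wav}.mp3" \
    silence 1 0.1 1% reverse silence 1 0.1 1% reverse
}
```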

Now we just have to get sox working on our shared hosting webserver. Fun, fun, fun.</sarcasm>

Cheers,
Colin Sheaff
