sound to sox to .txt to chatscript

sound to sox to .txt to chatscript

paul fellows
Thanks to Eric Wong; hopefully it will be readable this time.

I was looking for a SoX forum; I tried Audacity and was directed here.
I do not know the etiquette of this site, so I will just start begging for help.

The project I am trying to build is a speech to text system.
The system has 2 parts: first, a SoX filter to convert the mic input into a binary file with a .txt label. The .txt file is the input to another beautiful and free program called ChatScript. ChatScript is designed to let people make their own chat bots; I am intending to use its most basic property: if the pattern of the input matches the pattern of the rule, then the output is the rule's response. Or, less abstractly, if the input pattern matches the rule for P, then it prints the letter P.

What I need SoX to do is:
A) Use the keyboard space bar to mute and unmute the mic.
B) Take 32 bit samples of the sound.
C) Use no clobber at first, so I can generate the files to go into the rules to be matched against.
D) Maximize the gain without clipping.
E) No dither or other noise.
F) Multi-thread, to process the tracks in parallel.
G) Divide this into 2 tracks, if possible one the inverse of the other.
H) Using the echo function (not echos), add multiple echoes to each track. These echoes should be at the same volume as the track and have no decay. It is these echoes, superimposed on their original track, that carry the information about how the sound is changing with time.
I) One track would have delays of 11, 17, 23, 31, 47 milliseconds, and the other 13, 19, 29, 37, 53 milliseconds.
J) These 2 tracks are then converted into a single binary file by using SoX's multiplication option.
K) This is then given a .txt file name (name.txt).

I may have made a mistake in saying no decay for the echo. What I want from the 11 millisecond echo is for it to happen once, at its original volume, after 11 milliseconds, and then not at all at 22 milliseconds.
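A minimal sketch of the single-echo behaviour described above, in Python rather than SoX. It assumes an 8000 Hz sample rate (the rate settled on later in the thread) and unit-gain delay taps with no feedback. As far as I understand the SoX manual, its `echo` effect also feeds each delay from the input, whereas `echos` chains each echo off the previous one, but check `man sox` before relying on that. The function names are hypothetical.

```python
RATE = 8000  # samples per second (assumed; the rate chosen later in the thread)


def ms_to_samples(ms: int, rate: int = RATE) -> int:
    """Convert a delay in whole milliseconds to a whole number of samples."""
    return ms * rate // 1000


def multitap_echo(x: list[float], delays_ms: list[int]) -> list[float]:
    """Return y[n] = x[n] + sum of x[n - d] over the delay taps d.

    Every tap reads from the input, never from the output, so there is
    no feedback: an 11 ms tap gives one echo at 11 ms and none at 22 ms.
    """
    taps = [ms_to_samples(d) for d in delays_ms]
    y = []
    for n in range(len(x)):
        acc = x[n]
        for d in taps:
            if n - d >= 0:
                acc += x[n - d]  # unit gain, no decay
        y.append(acc)
    return y


# Impulse response: one click at n = 0 followed by 60 ms of silence.
impulse = [1.0] + [0.0] * (ms_to_samples(60) - 1)
track_a = multitap_echo(impulse, [11, 17, 23, 31, 47])
nonzero = [n for n, v in enumerate(track_a) if v != 0.0]
print(nonzero)  # [0, 88, 136, 184, 248, 376]
```

With six unit-gain copies summed, the output can reach six times the input peak, so in practice the taps or the overall gain would have to be scaled down to avoid clipping (requirement D).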

If you listened to this as a sound file it would probably sound bad, but hopefully it contains enough parametric information to allow for pattern matching.


------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
_______________________________________________
Sox-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/sox-users

Re: sound to sox to .txt to chatscript

Fmiser
> paul wrote:
>
> The project I am trying to build is a speech to text system.

You want the characters "b" "o" "o" "k" to be generated from the
sound of a voice saying "book"?

> The system has 2 parts: first, a SoX filter to convert the mic
> input into a binary file with a .txt label.

SoX does not have any voice recognition capability.

> The .txt file is the input to another beautiful and free
> program called ChatScript. ChatScript is designed to let people
> make their own chat bots; I am intending to use its most basic
> property: if the pattern of the input matches the pattern of the
> rule, then the output is the rule's response. Or, less abstractly, if
> the input pattern matches the rule for P, then it prints the
> letter P.

What I know as a chat bot, and what Wikipedia reports as a chat bot,
is an "artificial" user on an IRC channel.  Somehow I don't think
that is what you are wanting to do.  I went to ChatScript's website
and found no help as to what it is doing.

> what I need sox to do is....

As Jan implied, SoX can do the _audio_ processing.  The rest of the
stuff that isn't manipulating audio files is beyond its capability.

> A) Use the keyboard space bar to mute and unmute the mic.
> B) Take 32 bit samples of the sound.
> C) Use no clobber at first, so I can generate the files to
>    go into the rules to be matched against.
> D) Maximize the gain without clipping.
> E) No dither or other noise.
> F) Multi-thread, to process the tracks in parallel.
> G) Divide this into 2 tracks, if possible one the inverse of the
>    other.
> H) Using the echo function (not echos), add multiple echoes to
>    each track. These echoes should be at the same volume
>    as the track and have no decay. It is these echoes, superimposed
>    on their original track, that carry the information about how
>    the sound is changing with time.
> I) One track would have delays of 11, 17, 23, 31, 47 milliseconds,
>    and the other 13, 19, 29, 37, 53 milliseconds.
> J) These 2 tracks are then converted into a single binary file
>    by using SoX's multiplication option.
> K) This is then given a .txt file name (name.txt).

SoX is an "engine" - possibly like how you expect to use
chatscript.  It appears to me you need something else to tie it all
together. Python, Perl, even Javascript might work.  But managing
the mute/un-mute, file name manipulation, maybe "no clobber" (I'm
not sure what you mean by that.  No overwrite?  Generate a new name
rather than overwriting an existing?) are the sort of things that
are better handled by general programming.
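As an example of the general-purpose programming suggested above, here is a small sketch of "no clobber" naming. It assumes "no clobber" means never overwriting an earlier file; the `claim_new_file` helper and the `book-0001.txt` naming scheme are hypothetical. Opening with mode `"x"` makes the existence check and the file creation a single step, so two recordings cannot grab the same name.

```python
import os
import tempfile


def claim_new_file(directory: str, stem: str, ext: str = ".txt") -> str:
    """Create and return the first free name stem-0001.txt, stem-0002.txt, ...

    Mode "x" fails if the file already exists, so an existing file is
    never overwritten ("no clobber").
    """
    n = 1
    while True:
        path = os.path.join(directory, f"{stem}-{n:04d}{ext}")
        try:
            with open(path, "x"):
                return path
        except FileExistsError:
            n += 1


with tempfile.TemporaryDirectory() as d:
    first = claim_new_file(d, "book")
    second = claim_new_file(d, "book")  # book-0001.txt is taken now
    print(os.path.basename(first), os.path.basename(second))
    # book-0001.txt book-0002.txt
```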

-- fm

Re: sound to sox to .txt to chatscript

paul fellows
In reply to this post by paul fellows
Sorry Jan, I should have replied quicker.
You wrote “Does that mean you want the *.txt file to be a text transcript of the speech?”
My answer is: almost, but not quite. I want the *.txt file to be a ... pattern transcript of the speech.
I want SoX to transcribe a twentieth-of-a-second length of microphone input into a 32 bit binary pattern in a .txt file.

By "divide" I mean splitting the mono input from the mic into stereo tracks that can be processed separately. Sorry if my switching between talking about tracks and files as though they were the same thing caused any confusion.

Fmiser asked, “You want the characters "b" "o" "o" "k" to be generated from the
sound of a voice saying "book"?” My answer is no: I want to say 'book' and have it output “b” “oo” “c”, after ChatScript has done its work.

Fmiser, you are right that chat bots are far more complex than the simple pattern matching I am proposing, but if you strip away all of this extra stuff, what you are left with is pattern matching.

Jan said “The signal itself carries that information. Why do you need the echo, exactly?”
I will try to explain it like this. The sound coming in from the mic has all of the information about the sound at that instant. If you were to make an image of that instant of sound, and stack it next to images of all of the preceding instants, you would have a picture of how the sound had changed over time. A conventional speech-to-text system does something like that, but without the pretty picture, and uses statistical analysis to try to understand the sound. By adding the multiple echoes I will be adding in the information about how the sound has changed. Imagine just 100 samples per second, each 32 bits in size and containing the information about how the sound had changed. Each of these 32 bit files should unambiguously pass the information on to the ChatScript side of the project, to print to the screen the letter or letters that match the sounds.

To look at this another way: a sample contains its own information, but nothing about the samples that preceded it. The echoes smear the information across samples.

Sorry if that is still as clear as mud. What I am trying to do with the echo effect is to convert the samples to parametric time, as opposed to real time. If you are not familiar with the idea, this will sound like technobabble; if you are, I have done both of us a disservice by underestimating you.

The reference to no clobber comes from the SoX PDF. I read it and thought I was understanding it until I got to the bit that said that the effect that stops the train has to be first. (???) I can hammer out from the SoX PDF what the bits of code might be, but does anyone know of a good SoX tutorial on how the bits go together?

Re: sound to sox to .txt to chatscript

Jan Stary
On Dec 30 22:57:32, [hidden email] wrote:
> Sorry Jan, I should have replied quicker.
> You wrote “Does that mean you want the *.txt file to be a text transcript of the speech?”
> My answer is: almost, but not quite. I want the *.txt file to be a ... pattern transcript of the speech.
> I want SoX to transcribe a twentieth-of-a-second length of microphone input into a 32 bit binary pattern in a .txt file.

I still don't know what you want. What kind of 32bit pattern?
Individual 32bit samples (assuming it's a 32bit recording)?
What does it have to do with a "txt" file, whatever that means?

For example, here is a twentieth of a second of sound:
$ sox -c 1 -r 48000 -n file.wav synth 0.05 sine 440
What would the corresponding "32bit pattern" be?
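For scale, here is a plain-Python sketch of roughly what that command describes: 0.05 s at 48000 samples per second. It assumes each sample is stored as a full-scale signed 32-bit integer and does not reproduce SoX's exact amplitude or output format. The point is that a twentieth of a second is thousands of 32-bit words, not one.

```python
import math

RATE = 48000            # -r 48000
DURATION = 0.05         # synth 0.05, a twentieth of a second
FREQ = 440.0            # sine 440
FULL_SCALE = 2**31 - 1  # largest signed 32-bit value (assumed scaling)

n_samples = round(RATE * DURATION)
samples = [
    round(FULL_SCALE * math.sin(2 * math.pi * FREQ * n / RATE))
    for n in range(n_samples)
]

print(n_samples)       # 2400 separate 32-bit samples
print(n_samples * 4)   # 9600 bytes of raw sample data
```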

> By "divide" I mean splitting the mono input from the mic into stereo tracks
> that can be processed separately. Sorry if my switching between talking about tracks and files as though they were the same thing caused any confusion.

Splitting a mono input to stereo tracks surely is confusing.

> Jan said “The signal itself carries that information. Why do you need the echo, exactly?”
> I will try to explain it like this. The sound coming in from the mic has all of the information about the sound at that instant. If you were to make an image of that instant of sound, and stack it next to images of all of the preceding instants, you would have a picture of how the sound had changed over time.

Simply, the soundwave. Now what?

> A conventional speech-to-text system does something like that, but without the pretty picture, and uses statistical analysis to try to understand the sound. By adding the multiple echoes I will be adding in the information about how the sound has changed.

No you won't. The original soundwave already contains that information.

> Imagine just 100 samples per second each 32 bits in size

With a samplerate of 100Hz, you cannot record speech.
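Why 100 Hz fails can be shown numerically: a sample rate of R can only represent frequencies below R/2, and anything higher folds back (aliases). The sketch below assumes ideal sampling of pure sine waves; at 100 samples per second, a 440 Hz tone produces the same samples as a 40 Hz tone, since 440 = 4 × 100 + 40.

```python
import math

RATE = 100          # samples per second, as in the message above
nyquist = RATE / 2  # highest frequency this rate can represent


def sampled_sine(freq: float, rate: int, count: int) -> list[float]:
    """Sample sin(2*pi*freq*t) at t = n / rate for n = 0 .. count - 1."""
    return [math.sin(2 * math.pi * freq * n / rate) for n in range(count)]


# One second of each tone: the recordings are indistinguishable.
a = sampled_sine(440, RATE, 100)
b = sampled_sine(40, RATE, 100)
same = all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b))
print(nyquist, same)  # 50.0 True
```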

> and containing the information about how the sound had changed.
> Each of these 32 bit files should unambiguously pass the information
> on to the ChatScript side of the project, to print to the screen the letter or letters that match the sounds.

This just means passing the sequence of individual samples,
i.e. the soundwave, to some speech-to-text program. I still
don't know why you want to cut it to individual samples first.

> To look at this another way: a sample contains its own information, but nothing about the samples that preceded it. The echoes smear the information across samples.

Riiight. So make 100 echoes for a second of your 100Hz recording.
That way, the last sample will contain all of the information.

> The reference to no clobber comes from the SoX PDF. I read it and thought I was understanding it until I got to the bit that said that the effect that stops the train has to be first.

The only reference to "clobber" I can see in the manpage
are the --clobber and --no-clobber options, meaning do (do not, resp.)
overwrite output files without asking.

Re: sound to sox to .txt to chatscript

paul fellows
In reply to this post by paul fellows

Happy New Year, Jan and all.

Let me start by thanking Jan.
She is right: 100 Hz is way too low. I made that mistake by thinking about the low rate of spoken words, typically less than 3 words per second. If I had gone on as I was, it would not have worked. Somewhere in the SoX documentation it says that phone companies use 8000 Hz, so that will be my sample rate from now on.

I will try to explain again from scratch. I use '32 bit word' to mean the content of the particular sample I am talking about, 'input sample' to mean an individual 1/8000th of a second of the ongoing microphone input, recorded as a 32 bit word, and 'output sample' to mean an individual 1/8000th of a second of modified input, recorded as a 32 bit word.

There are two uses of the word "sample" in SoX, so to be clear: I mean that 10 seconds of recording is 80,000 samples, if recorded at 8000 samples per second.
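The arithmetic behind that definition, as a small sketch assuming mono, headerless 32-bit samples (a real .wav file adds a header on top of this):

```python
RATE = 8000    # samples per second
BITS = 32      # bits per sample: one "32 bit word"
SECONDS = 10

samples = SECONDS * RATE
raw_bytes = samples * BITS // 8
samples_per_ms = RATE // 1000

print(samples, raw_bytes, samples_per_ms)  # 80000 320000 8
```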

1) the microphone picks up the sound and the hardware converts it to digital.
2) sox then tells the computer to treat the input as 32 bit words, one to describe each 1/8000 of a second.
3) these input samples are then processed by sox to produce the output samples.
4) after 54 milliseconds these output samples will contain information from the current input sample plus those from 11, 13, 17, 19, 23, 29, 31, 37, 47 and 53 milliseconds earlier.

That is what the echoes are for.
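Converting the delays to whole samples at the 8000 Hz rate chosen above makes step 4 concrete. This sketch assumes whole-millisecond delays, which at 8000 Hz are always an exact number of samples. Note that each output sample on one track is still a single number: the sum of the current input and five delayed inputs, and a sum alone does not record which values were added.

```python
RATE = 8000  # samples per second, as chosen above
TRACK_A_MS = [11, 17, 23, 31, 47]
TRACK_B_MS = [13, 19, 29, 37, 53]


def to_samples(delays_ms: list[int], rate: int = RATE) -> list[int]:
    """Delay in whole milliseconds -> delay in whole samples."""
    return [d * rate // 1000 for d in delays_ms]


a = to_samples(TRACK_A_MS)
b = to_samples(TRACK_B_MS)
print(a)  # [88, 136, 184, 248, 376]
print(b)  # [104, 152, 232, 296, 424]

# The oldest input an output sample can depend on is the largest delay back,
# so the window it spans is that many samples plus the current one.
window = max(a + b) + 1
print(window, window * 1000 / RATE)  # 425 53.125
```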

5) each output sample is re-branded as *.txt

Processing as described in the original submission.
What I need SoX to do is:
A) Use the keyboard space bar to mute and unmute the mic.
B) Take 32 bit samples of the sound.
C) Use no clobber at first, so I can generate the files to go into the rules to be matched against.
D) Maximize the gain without clipping.
E) No dither or other noise.
F) Multi-thread, to process the tracks in parallel.
G) Divide this into 2 tracks, if possible one the inverse of the other.
H) Using the echo function (not echos), add multiple echoes to each track. These echoes should be at the same volume as the track and have no decay. It is these echoes, superimposed on their original track, that carry the information about how the sound is changing with time.
I) One track would have delays of 11, 17, 23, 31, 47 milliseconds, and the other 13, 19, 29, 37, 53 milliseconds.
J) These 2 tracks are then converted into a single binary file by using SoX's multiplication option.
K) This is then given a .txt file name.

Subsequent modifications:
I may have made a mistake in saying no decay for the echo. What I want from the 11 millisecond echo is for it to happen once, at its original volume, after 11 milliseconds, and then not at all at 22 milliseconds.

I can hammer out from the SoX PDF what the bits of code might be, but does anyone know of a good SoX tutorial on how the bits go together?

Re: sound to sox to .txt to chatscript

Jan Stary
On Jan 01 19:52:24, [hidden email] wrote:
> Let me start by thanking Jan.
> She is right: 100 Hz is way too low.

he.

> 1) the microphone picks up the sound and the hardware converts it to digital.
> 2) sox then tells the computer to treat the input as 32 bit words, one to describe each 1/8000 of a second.
> 3) these input samples are then processed by sox to produce the output samples.

Up to here, that's a pretty standard recording scenario.
rec -b 32 -r 8k
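To see what such a 32-bit recording looks like as data, here is a sketch that decodes headerless signed 32-bit little-endian samples. It assumes the audio was saved raw (something like `-t raw -e signed-integer -b 32 -L` on the SoX side; check the manual for your version) and uses made-up sample values rather than a real capture. The `decode_s32le` helper is hypothetical.

```python
import struct

# Made-up raw bytes standing in for a capture: four mono samples,
# signed 32-bit, little-endian (assumed format).
raw = struct.pack("<4i", 0, 2**31 - 1, -(2**31), 12345)


def decode_s32le(data: bytes) -> list[int]:
    """Split raw bytes into signed 32-bit little-endian samples."""
    if len(data) % 4:
        raise ValueError("length is not a whole number of 32-bit samples")
    return [value for (value,) in struct.iter_unpack("<i", data)]


samples = decode_s32le(raw)
print(samples)  # [0, 2147483647, -2147483648, 12345]

# The same samples as fractions of full scale, in the range [-1.0, 1.0).
print([s / 2**31 for s in samples])
```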

> 4) after 54 milliseconds these output samples will contain information from the current input sample plus those from 11, 13, 17, 19, 23, 29, 31, 37, 47 and 53 milliseconds earlier.
> That is what the echoes are for.

I still have no idea why you want to do this at all,
let alone why you want it to be precisely 11, 13, etc.

> 5) each output sample is re-branded as *.txt

What do you even mean by a sample being "rebranded as *.txt", exactly?

> Processing as described in the original submission.
> What I need SoX to do is:
> A) Use the keyboard space bar to mute and unmute the mic.

SoX has no say over the spacebar.

> B) Take 32 bit samples of the sound.

rec -b 32

> C) Use no clobber at first, so I can generate the files
> to go into the rules to be matched against.

What clobber?
What rules?

> D) Maximize the gain without clipping.

gain -n
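Peak normalisation, which is what `gain -n` is documented to do (normalise to 0 dBFS), can be sketched on floating-point samples in the range -1.0 to 1.0 as below. The helper is hypothetical. Because the peak has to be known before any gain is chosen, this needs the whole clip up front, which sits awkwardly with a live microphone stream.

```python
def normalize_peak(x: list[float], target: float = 1.0) -> list[float]:
    """Scale x so its largest absolute sample equals target (1.0 = full scale).

    This is the largest gain that does not clip.
    """
    peak = max((abs(v) for v in x), default=0.0)
    if peak == 0.0:
        return list(x)  # silence: no gain makes it louder
    gain = target / peak
    return [v * gain for v in x]


y = normalize_peak([0.1, -0.25, 0.2])
print(y)  # [0.4, -1.0, 0.8]
```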

> E) No dither or other noise.

-D

> F) Multi-thread, to process the tracks in parallel.

What "tracks", and why does that imply "multi-threaded"?

> G) Divide this into 2 tracks,

Divide _what_ into 2 tracks?

> if possible one the inverse of the other.

Why?

> H) Using the echo function (not echos), add multiple echoes to each track.
> These echoes should be at the same volume as the track and have no decay. It is these echoes, superimposed on their original track, that carry the information about how the sound is changing with time.

The original soundwave already contains precisely that information,
without superimposing any echo.

> I) One track would have delays of 11, 17, 23, 31, 47 milliseconds,
> and the other 13, 19, 29, 37, 53 milliseconds.

Why?

> J) These 2 tracks are then converted into a single binary file
> by using SoX's multiplication option.

What "multiplication option"?
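Sample-by-sample multiplication of two tracks can be sketched as follows. As far as I know, SoX can combine input files this way with `--combine multiply` (short form `-T`), but check `man sox` for your version. The sketch also shows one consequence of requirement G: if the second track is an exact inverse of the first, with no echoes applied, every product is -x², so a positive and a negative sample of the same size give the same result. The helper names are hypothetical.

```python
def invert(x: list[float]) -> list[float]:
    """Polarity inversion: every sample multiplied by -1."""
    return [-v for v in x]


def multiply(a: list[float], b: list[float]) -> list[float]:
    """Sample-by-sample product of two tracks of equal length."""
    if len(a) != len(b):
        raise ValueError("tracks must be the same length")
    return [p * q for p, q in zip(a, b)]


x = [0.5, -0.5, 0.25, 0.0]
product = multiply(x, invert(x))
print(product)  # [-0.25, -0.25, -0.0625, -0.0]
```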

> K) This is then given a .txt file name.

You can give any file any name you want.
What does the "txt" have to do with anything?


I can see very little point in trying to help you further
unless you answer all of these questions in detail.

        Jan

