sox to chatscript

sox to chatscript

paul fellows
The idea as I said before is to use sox as one half of a speech to text system.
The other half will be written in a program / language called chatscript.
Chatscript can read an input from a .txt file; normally the input would be something like “I love you.” and the output might be “Do you?”, normal conversational chat. My intention is to use the .txt file that sox will produce when it hears the sound /p/ to output the letter “p”.

> 1) the microphone picks up the sound and the hardware converts it to digital.
> 2) sox then tells the computer to treat the input as 32 bit words, one to describe each 1/8000 of a second.
> 3) these input samples are then processed by sox to produce the output samples.

Up to here, that's a pretty standard recording scenario.
rec -b 32 -r 8k

As the intention is to process and then output the input, not to keep it, and working from the help that you gave me previously and the sox pdf, I think it might start something like:
-b 23 -r 8k -e filename.raw

> 4) after 54 milliseconds these output samples will contain information from the current input sample plus those from 11, 13, 17, 19, 23, 29, 31, 37, 47 and 53 milliseconds earlier.
> That is what the echoes are for.

I still have no idea why you want to do this at all,

This is where we have been getting ourselves confused.
You have been using the word sample in the sense that you might take a 30-second sample of birdsong, and from that point of view I can see why the echoes made absolutely no sense. I have been thinking in terms of the last input from the mic, the latest of 8000 samples this second. And I could not understand why you seemed to be saying that this 8000th of a second somehow had information about what happened up to 400 samples previously. The reason for the echoes is to add those earlier inputs to the current one, to make the output. There are 2 reasons for not just recording a twentieth of a second of sound and using that as the input to chatscript. 1] A twentieth at -b 32 -r 8k is 400 samples of 32 bits each, giving 2^12800 possible combinations of 1s and 0s, which is far too many to find the match. I would have the big number there but my calculator lies and says it's infinite. A single 32 bit word has 2^32 = 4294967296 combinations. 2] What about the part of the sound that crosses between samples? It would not be matched.
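
The counting argument in point 1] can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming 8000 samples per second and 32 bits per sample as in the rec options above; the variable names are made up for illustration.

```python
import math

SAMPLE_RATE = 8000      # samples per second (-r 8k)
BITS_PER_SAMPLE = 32    # bits per sample (-b 32)
WINDOWS_PER_SECOND = 20 # one window is a twentieth of a second

# Number of samples and bits in one twentieth of a second.
samples_per_window = SAMPLE_RATE // WINDOWS_PER_SECOND
bits_per_window = samples_per_window * BITS_PER_SAMPLE

# Distinct values of one 32-bit word.
single_sample_patterns = 2 ** BITS_PER_SAMPLE

# 2**bits_per_window is too large for a pocket calculator, but its
# number of decimal digits follows from log10(2).
digits_in_window_patterns = math.floor(bits_per_window * math.log10(2)) + 1

print(samples_per_window, bits_per_window)
print(single_sample_patterns)
print(digits_in_window_patterns)
```

Python integers have arbitrary precision, so `2 ** bits_per_window` itself can also be computed exactly; only its digit count is printed here to keep the output short.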

let alone why you want it to be precisely 11, 13, etc.

The twentieth of a second is a best to fit with the rate of changes of sounds within spoken words, and may well have to change. The actually quoted numbers are just prime numbered intervals within that.

> 5) each output sample is re-branded as *.txt

What do you even mean by a sample being "rebranded as *.txt", exactly?

From browsing this forum, I came across “convert mp3 to .txt file” started by John S Higgs. Ulrich Klauer suggested:
        sox in.mp3 -t dat out.txt
later on in the conversation Jan Stary says

“You can take any string of integers and make a WAV of them.
But yes, you can of course convert DAT <-> WAV both ways.”
So the idea of having sox give .txt output is not impossible. If I have to use wav as my input file type, so be it, but I need .txt output to feed chatscript.

> processing as describe in the original submission.
> what I need sox to do is.
> A) use the key board space bar to mute and unmute the mic.

SoX has no say over the spacebar.

But can the space bar control sox? It does not matter if it cannot, it's just an extra.

> B) to take 32 bit samples of the sound.

rec -b 32

> C) to contain no clobber at first so I can generate the files
> to go into the rules to be matched against.

What clobber?
What rules?

From the sox pdf, clobber overwrites the output file without asking. No clobber does not overwrite it; I am still not clear whether it extends the output file or writes multiple output files.

> D) maximize The gain without clipping.

gain -n

> E) no dither or other noise.

-D

I thank you for D & E.

> F)  multi thread, to process the tracks in parallel.

What "tracks", and why does that imply "multi-threaded"?

> G) divide this into 2 tracks,

Divide _what_ into 2 tracks?

> if possible one the inverse of the other.

Why?

Actually, following from our discussion I can see that this was me unnecessarily complicating things, and that putting all of the echoes in a line would work just as well:
        echo 11 1 13 1 17 1 19 1 23 1 29 1 31 1 37 1 47 1 53 1
This will add the selected previous input samples to the current input sample to make the output sample.
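
For illustration only, the arithmetic described here (each output sample is the current input plus the inputs from 11, 13, ... 53 ms earlier, all at unit gain) can be sketched in Python. This is a sketch of the stated intent, not of what SoX's echo effect does with those arguments; the 8 kHz rate is taken from the thread, and the function name `delay_sum` is hypothetical.

```python
SAMPLE_RATE = 8000  # samples per second, as in "-r 8k"
DELAYS_MS = [11, 13, 17, 19, 23, 29, 31, 37, 47, 53]


def delay_sum(samples, delays_ms=DELAYS_MS, rate=SAMPLE_RATE):
    """Add to each sample the samples from the given delays earlier."""
    # Convert each delay from milliseconds to a whole number of samples.
    offsets = [d * rate // 1000 for d in delays_ms]
    out = []
    for n, current in enumerate(samples):
        total = current
        for off in offsets:
            if n >= off:  # nothing to add before the input has started
                total += samples[n - off]
        out.append(total)
    return out


# A single unit impulse shows where the delayed copies land.
impulse = [1] + [0] * 499
response = delay_sum(impulse)
peaks = [n for n, v in enumerate(response) if v != 0]
print(peaks)
```

Because eleven values are summed at unit gain, the output can reach up to eleven times the input peak, so it would need scaling to stay within the 32-bit range.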

> H) using the echo function (not echoes) add multiple echoes to each track,
> these echoes should be at the same volume as the track and have no decay. It is these echoes superimposed on their original track that carry the information about how the sound is changing with time.

The original soundwave already contains precisely that information,
without superimposing any echo.

As mentioned above, a wav file will contain all of the information that was recorded to make it; the individual samples that make up that wav file do not contain the information about other samples.

> I) one track would have delays of 11, 17, 23, 31, 47 milliseconds.
> And the other would be 13, 19, 29, 37, 53 milliseconds.

Why?

to add the selected previous input samples to the current input sample and make the output sample.


> J) these 2 tracks are then converted into a single binary file
> by using sox's multiplication option.

What "multiplication option"?

Merging files by multiplication is a built-in option for sox; see the sox pdf. I do not need it now.

> K) this is then given the .txt file name.

You can give any file any name you want.
What does the "txt" have to do with anything?

I need the output to be .txt because chatscript needs .txt for its input file.


I can see very little point in trying to help you further
unless you answer all of these questions in detail.

        Jan

Although there has been much misunderstanding between us, you have helped me a lot.
        -d gain -n -D -b 23 -r 8k -e filename.raw dat root/chatin.txt echo 11 1 13 1 17 1 19 1 23 1 29 1 31 1 37 1 47 1 53 1

or something like this, thanks
                                        paul fellows

------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
_______________________________________________
Sox-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/sox-users

Re: sox to chatscript

Jan Stary
On Jan 03 16:57:31, [hidden email] wrote:
> The idea as I said before is to use sox as one half of a speech to text system.
> The other half will be written in a program / language called chatscript.
> Chatscript can read an input from a .txt file; normally the input would be something like “I love you.” and the output might be “Do you?”, normal conversational chat. My intention is to use the .txt file that sox will produce when it hears the sound /p/ to output the letter “p”.

SoX cannot do anything like that.

> > 1) the microphone picks up the sound and the hardware converts it to digital.
> > 2) sox then tells the computer to treat the input as 32 bit words, one to describe each 1/8000 of a second.
> > 3) these input samples are then processed by sox to produce the output samples.
>
> Up to here, that's a pretty standard recording scenario.
> rec -b 32 -r 8k
>
> As the intention is to process and then output the input, not to keep it, and working from the help that you gave me previously and the sox pdf, I think it might start something like:
> -b 23 -r 8k -e filename.raw
>
> > 4) after 54 milliseconds these output samples will contain information from the current input sample plus those from 11, 13, 17, 19, 23, 29, 31, 37, 47 and 53 milliseconds earlier.
> > That is what the echoes are for.
>
> I still have no idea why you want to do this at all,
>
> This is where we have been getting ourselves confused.
> You have been using the word sample in the sense that you might take a 30-second sample of birdsong, and from that point of view I can see why the echoes made absolutely no sense.

No. I have been using the word 'sample'
in the obvious audio-related sense. A number.

> I have been thinking in terms of the last input from the mic, the latest of 8000 samples this second. And I could not understand why you seemed to be saying that this 8000th of a second somehow had information about what happened up to 400 samples previously.

Again, no; that's what you were saying would happen if you superimposed
the echoes of the previous samples onto that last sample.

> The reason for the echoes is to add those earlier inputs to the current one, to make the output. There are 2 reasons for not just recording a twentieth of a second of sound and using that as the input to chatscript.

I am not suggesting anything like that.

> 1] A twentieth at -b 32 -r 8k is 400 samples of 32 bits each, giving 2^12800 possible combinations of 1s and 0s, which is far too many to find the match. I would have the big number there but my calculator lies and says it's infinite. A single 32 bit word has 2^32 = 4294967296 combinations. 2] What about the part of the sound that crosses between samples? It would not be matched.
>
> let alone why you want it to be precisely 11, 13, etc.
>
> The twentieth of a second is a best to fit with the rate of changes
> of sounds within spoken words, and may well have to change.

I am having a hard time even parsing that sentence.
I have no idea why exactly a twentieth of a second is
"a best to fit the rate of change of sounds" (whatever that means).
Also, I don't think you have any idea either.

> The actually quoted numbers are just prime numbered intervals within that.

What on earth do "prime numbered intervals" have to do with any of that?
Again, you don't have a clue, do you.

> > 5) each output sample is re-branded as *.txt
>
> What do you even mean by a sample being "rebranded as *.txt", exactly?
>
> From browsing this forum, I came across “convert mp3 to .txt file” started by John S Higgs. Ulrich Klauer suggested:
> sox in.mp3 -t dat out.txt
> later on in the conversation Jan Stary says
>
> “You can take any string of integers and make a WAV of them.
> But yes, you can of course convert DAT <-> WAV both ways.”
> So the idea of having sox give .txt output is not impossible. If I have to use wav as my input file type, so be it, but I need .txt output to feed chatscript.

Sigh. What do you _mean_ by "txt output"? What _format_?
Example: record yourself with a microphone, saying "bullshit".
That's a recorded sound, right? What exactly would be the desired
"txt output" you are imagining to come out?


> > F)  multi thread, to process the tracks in parallel.
>
> What "tracks", and why does that imply "multi-threaded"?
>
> > G) divide this into 2 tracks,
>
> Divide _what_ into 2 tracks?
>
> > if possible one the inverse of the other.
>
> Why?
>
> Actually, following from our discussion I can see that this was me unnecessarily complicating things, and that putting all of the echoes in a line would work just as well:
> echo 11 1 13 1 17 1 19 1 23 1 29 1 31 1 37 1 47 1 53 1
> This will add the selected previous input samples to the current input sample to make the output sample.

No it won't. Apparently, you haven't even looked at the syntax
of the echo effect, and you still haven't explained what you even
expect from it and why.

> > H) using the echo function (not echoes) add multiple echoes to each track,
> > these echoes should be at the same volume as the track and have no decay. It is these echoes superimposed on their original track that carry the information about how the sound is changing with time.
>
> The original soundwave already contains precisely that information,
> without superimposing any echo.
>
> As mentioned above, a wav file will contain all of the information that was recorded to make it; the individual samples that make up that wav file do not contain the information about other samples.

Yes. So what?

>
> > I) one track would have delays of 11, 17, 23, 31, 47 milliseconds.
> > And the other would be 13, 19, 29, 37, 53 milliseconds.
>
> Why?
>
> to add the selected previous input samples to the current input sample and make the output sample.

WHY?

> > J) these 2 tracks are then converted into a single binary file
> > by using sox's multiplication option.
>
> What "multiplication option"?
>
> Merging files by multiplication is a built-in option for sox; see the sox pdf. I do not need it now.

The only occurrence of 'multiplication' in the SoX manpage
is in the description of the hilbert phase-shifter. So
what 'multiplication option' (and what 'SoX pdf') are you talking about?

> > K) this is then given the .txt file name.
>
> You can give any file any name you want.
> What does the "txt" have to do with anything?
>
> I need the output to be .txt because chatscript needs .txt for its input file.

The "txt" most probably does not mean what you think it means.
In particular, it does not mean it is text.

> Although there has been much misunderstanding between us, you have helped me a lot.
> -d gain -n -D -b 23 -r 8k -e filename.raw dat root/chatin.txt echo 11 1 13 1 17 1 19 1 23 1 29 1 31 1 37 1 47 1 53 1
> or something like this, thanks

This is, of course, not even valid syntax of a SoX command.

Let's wrap it up: SoX cannot do what you want it to do.
PLEASE take this elsewhere. Don't forget your medicine.




Re: sox to chatscript

Mike Hamilton-6
In reply to this post by paul fellows

Like many others, I have been baffled by what Paul Fellows was hoping to achieve. The key is where he writes:

 

> From browsing this forum, I came across “convert mp3 to .txt file” started by John S Higgs. Ulrich Klauer suggested:

>             sox in.mp3 -t dat out.txt

 

The ".txt" extension has misled Paul. He assumes that given an audio input file, SoX will produce a transcription (speech to text).

 

The example, of course, merely writes a text file containing the value of each sample contained within “in.mp3”. Paul, that file will not help you in the slightest with your chatbot project. SoX can do many things, but it cannot, repeat *CANNOT*, convert speech to text.

Please try the example "sox in.mp3 -t dat out.txt" and load "out.txt" into a text editor. All should then become clear - and Jan Stary's blood pressure will improve!
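
As a rough idea of what that file holds, here is a minimal Python sketch that reads text in the layout SoX's dat format is documented to use: two header lines starting with ";" giving the sample rate and channel count, then one line per sample with the time in seconds followed by one value per channel. The sample values embedded below are invented for illustration, and `parse_dat` is a hypothetical helper, not part of SoX.

```python
# Invented example of dat-style text; real files are much longer.
SAMPLE_DAT = """\
; Sample Rate 8000
; Channels 1
               0                 0
        0.000125      0.0123291016
         0.00025      0.0245361328
"""


def parse_dat(text):
    """Return (header, rows); each row is (time_seconds, [channel values])."""
    header = {}
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(";"):
            # Header comment such as "; Sample Rate 8000".
            key, _, value = line[1:].strip().rpartition(" ")
            header[key] = int(value)
            continue
        fields = [float(field) for field in line.split()]
        rows.append((fields[0], fields[1:]))
    return header, rows


header, rows = parse_dat(SAMPLE_DAT)
print(header)
for time_s, values in rows:
    print(time_s, values)
```

Every row is just a timestamp and an amplitude: numbers describing the waveform, with nothing in them that identifies a word or a letter.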

 



Re: sox to chatscript

paul fellows
In reply to this post by paul fellows

Like many others, I have been baffled by what Paul Fellows was hoping to
achieve. The key is where he writes:

 

> From browsing this forum, I came across “convert mp3 to .txt file” started
by John S Higgs. Ulrich Klauer suggested:

>             sox in.mp3 -t dat out.txt

 

The ".txt" extension has misled Paul. He assumes that given an audio input
file, SoX will produce a transcription (speech to text).

>>>>>No!
 

The example of course, merely writes a text file containing the value of
each sample contained within "in.mp3".

>>>>>>>That is what I want sox to do. Take an input from the microphone, modify it, then output it as a pattern of 32 ones and zeros in a .txt file, to be fed into a pattern matching program.
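
Writing a sample as 32 ones and zeros takes only a couple of lines once the sample is available as an integer. The sketch below is done outside SoX and assumes signed 32-bit integer samples in two's complement; `to_bits` is a hypothetical helper and the sample values are arbitrary.

```python
def to_bits(sample, width=32):
    """Two's-complement bit string of a signed integer sample."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= sample <= high:
        raise ValueError("sample does not fit in %d bits" % width)
    # Masking maps negative values onto their two's-complement form.
    return format(sample & ((1 << width) - 1), "0%db" % width)


samples = [0, 1, -1, 2147483647]
lines = [to_bits(s) for s in samples]
for line in lines:
    print(line)
```

This only changes how each number is written down; it does not make two recordings of the same sound produce the same numbers.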

Paul, that file will not help you in
the slightest with your chatbot project. SoX can do many things, but it
cannot, repeat *CANNOT* convert speech to text.

>>>>>>>I do not expect it to!

Please try the example "sox in.mp3 -t dat out.txt" and load "out.txt" into a
text editor. All should then become clear - and Jan Stary's blood pressure
will improve!

>>>>>>>>The nonsense that this would output is not a problem, provided that the sound /p/ as spoken by me produces the same set of patterns each time, and no other sound produces the same patterns. Then it will do what I want. Chatscript will print the letter p.


Re: sox to chatscript

Chris Angelico
On Mon, Jan 5, 2015 at 3:02 AM, paul fellows <[hidden email]> wrote:
> The nonsense that this would output is not a problem, provided that the sound /p/ as spoken by me produces the same set of patterns each time, and no other sound produces the same patterns. Then it will do what I want. Chatscript will print the letter p.
>

What you're asking for is the entire pattern recognition code. Perhaps
it would be easier to describe this using written text, taking the
whole question of speech recog out of the picture. Would you expect
the pngtopnm program to be able to produce a file which has a unique
and consistent pattern for the handwritten letter 'p'? No, because
optical character recognition is not a part of pngtopnm's job. Plus,
handwriting is inherently messy (in the computing sense; though mine
is messy in every other sense too), so it's extremely difficult to
produce a perfect transcription.

You're asking for some fairly magical features here, and you'll do far
FAR better to look for a dedicated speech-to-text tool and see if you
can do it with that.

ChrisA


Re: sox to chatscript

Mike Hamilton-6
In reply to this post by paul fellows
> The nonsense that this would output is not a problem, provided that the
sound /p/ as spoken by me produces the same set of patterns each time, and
no other sound produces the same patterns.

(sigh) You've been told AGAIN and AGAIN and AGAIN that this infantile
approach to speech recognition WILL NOT WORK and CAN NEVER WORK, but you're
so convinced that you're better than 60+ years of research into speech to
text, so you just won't listen.

> provided that the sound /p/ as spoken by me produces the same set of
patterns each time, and no other sound produces the same patterns.

This WILL *NOT NOT NOT* produce the same "set of patterns each time". And
(in English) how would you distinguish between the "p" in "pear" and the "p"
in "phone"?

I'm not sure whether you're (a) a troll (b) a 13 year old kid on a Commodore
64 or (c) just incredibly obtuse.

WHAT YOU REALLY WANT is Microsoft's (free) speech to text API (SAPI) at e.g.
http://msdn.microsoft.com/en-us/library/ms720151%28v=vs.85%29.aspx . All
you'll have to learn is C++, which admittedly is slightly harder than the
DOS command line interface to SoX.

I would feel really bad if you really are a starry-eyed kid and I'm crushing
your dreams ... but someone had to say it.



Re: sox to chatscript

Fmiser
In reply to this post by paul fellows
> paul wrote:
>
> The nonsense that this would output is not a problem,
> provided that the sound /p/ as spoken by me produces the
> same set of patterns each time, and no other sound produces
> the same patterns. Then it will do what I want.  Chatscript
> will print the letter p.

It won't.

I'm still not sure I fully understand all you are scheming.
However, I can confidently say that the audio sound of the
letter "p", as it shows up in any word, will NOT "translate"
to the same pattern of numbers every time - even if it is only
one person saying the words.  Speech is just too complex.  

If you do precision gain matching, and the person talks slowly
and clearly, and you do statistical analysis on the numbers,
you might be able to determine that a particular set of
numbers has a high probability of being a "p".
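
A small numerical experiment illustrates the point, under heavy simplification. The Python sketch below generates the same 200 Hz tone twice, differing only in where the waveform starts (a phase offset of 0.3 radians), and compares the two sets of 32-bit samples. The tone, the offset, and the function names are assumptions chosen for illustration; real speech varies far more than this.

```python
import math

RATE = 8000             # samples per second
N = 400                 # one twentieth of a second
FULL_SCALE = 2**31 - 1  # largest signed 32-bit value


def tone(phase, freq=200.0, amp=0.5):
    """Integer samples of a sine tone starting at the given phase."""
    return [
        round(amp * FULL_SCALE * math.sin(2 * math.pi * freq * n / RATE + phase))
        for n in range(N)
    ]


def rms(samples):
    """Root-mean-square level, a simple statistic of the whole window."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


a = tone(0.0)
b = tone(0.3)

# Exact, position-by-position comparison of the raw numbers.
exact_matches = sum(1 for p, q in zip(a, b) if p == q)

# A statistic that ignores where the waveform happened to start.
rms_ratio = rms(a) / rms(b)

print(exact_matches, "of", N, "samples are identical")
print("RMS ratio:", rms_ratio)
```

Exact comparison finds no identical samples even though both windows hold the same sound, while the RMS level is essentially unchanged. Matching has to work on statistics of the signal rather than on the raw numbers.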

What I think you are trying to do is build a transcription
tool, or speech recognition.  SoX can process the audio for
you - but if you are hoping a simple pattern matching will be
able to identify all the letters associated with the sounds,
it won't work.  Speech is just too complex. :)

There are a few very mature projects working on speech
recognition and/or transcription.  Maybe you should look at
some of the challenges they have dealt with to give you a
better idea of the task.

http://en.wikipedia.org/wiki/List_of_speech_recognition_software 
