too many open files


too many open files

Peter P.
Dear Sox list,

I am stumbling over a (known) problem:

I am trying to concatenate many audio files with

sox "*.wav" out.wav

and after some time am getting the error message

sox FAIL formats: can't open input file 'whatever.wav': Too many open files

I have found other people's postings about the same issue, e.g.
http://sox.10957.n7.nabble.com/Sox-Too-many-open-files-concatenating-from-playlists-with-a-large-number-of-entries-td5117.html

The maximum number of files that can be concatenated this way seems to
vary between operating systems. Does anyone know exact numbers, or how
to determine the maximum on a given OS?

Now I understand that sox keeps them all open while concatenating,
which might be necessary in other cases, such as when normalizing them
all to a common level. Sadly this seems to impose the described limit.
Or is there another reason why all files have to be kept open?

I tried to work around this by repeatedly concatenating each input file
onto a common (growing) output file using a simple bash until loop. The
performance penalty of this is enormous, as the same output file has to
be opened, read, and rewritten every time.

I am curious what a possible workaround could be, other than dividing
the above task among multiple calls of sox, each with a reduced number
of files.

Thank you for your suggestions and comments,
Peter


Re: too many open files

Erich Eckner
Hi Peter,

I would suggest converting all files to a raw (headerless) format -
with identical sample rate, bit depth, and number of channels -
concatenating them via bash, and afterwards encoding the result in
your desired format (e.g. wav). Like so:

for file in inputs/*.wav
do
  sox "$file" -t raw -   # decode each input to headerless raw on stdout
done |
  sox -t raw -r 44100 -b 16 -e signed-integer -c 2 - output.wav

As mentioned, the raw stream carries no header, so the final sox needs
the sample rate, bit depth, encoding, and channel count spelled out
(the -r/-b/-e/-c values above are only examples; adjust them to match
your inputs).

cheers,
Erich


Re: too many open files

Eric Wong
In reply to this post by Peter P.
"Peter P." <[hidden email]> wrote:

> The maximum number of files that can be concatenated this way seems to
> vary between operating systems. Does anyone know exact numbers, or how
> to determine the maximum on a given OS?

"ulimit -n" shows open files on *nix based systems.
You can change it (given appropriate permissions) by setting it
to a higher number (e.g. "ulimit -n 16384")
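
For instance (the numbers here are illustrative; the default and the
permitted maximum vary by system and shell):

        $ ulimit -n             # show the current per-process limit
        1024
        $ ulimit -n 16384       # raise it for this shell and its children
        $ sox *.wav out.wav     # more inputs can now be open at once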

> Now I understand that sox keeps them all open while concatenating,
> which might be necessary in other cases, such as when normalizing
> them all to a common level. Sadly this seems to impose the described
> limit. Or is there another reason why all files have to be kept open?

I haven't checked, but I can't think of a reason offhand why
they must be kept open.  I suppose there could be some reasons
for wanting to seek around...

I'm too sleepy and barely awake to think straight now :<
Will check the code when I've more time.

> I tried to work around this by repeatedly concatenating each input file
> onto a common (growing) output file using a simple bash until loop. The
> performance penalty of this is enormous, as the same output file has to
> be opened, read, and rewritten every time.
>
> I am curious what a possible workaround could be, other than dividing
> the above task among multiple calls of sox, each with a reduced number
> of files.

Similar to what Erich said, maybe something like this works by
utilizing sox's pipe inputs to avoid temporary files:

        FMT="-ts32 -c2 -r48000" # adjust format to match your data
        sox $FMT "|for i in *.wav; do sox \"\$i\" $FMT -; done" out.wav
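
(The leading "|" makes sox run the quoted command through the shell and
read its standard output as the input file, so only one inner sox runs
at a time and the open-file limit never comes into play.)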


Re: too many open files

Fmiser
In reply to this post by Peter P.
> Peter wrote:

> sox FAIL formats: can't open input file 'whatever.wav': Too many
> open files

> I tried to work around this by repeatedly concatenating each
> input file onto a common (growing) output file using a simple
> bash until loop. The performance penalty of this is enormous, as
> the same output file has to be opened, read, and rewritten every
> time.
>
> I am curious what a possible workaround could be, other than
> dividing the above task among multiple calls of sox, each with a
> reduced number of files.

I can't help with the SoX source.  But what I would try first if I
were facing that problem is to concatenate in bunches.

For example, take the first 1000 and concatenate them into 01.wav,
then the second 1000 into 02.wav, then finally concatenate 01.wav,
02.wav, etc.

Maybe put the initial groupings into subdirectories to be sure
each file is included only once.

$ mv -nv $(ls path/name/*.wav | head -n 1000) subdir/  # assumes no whitespace in filenames
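
Tying that together, a full run might look something like this (a rough
sketch; the subdir-01/, subdir-02/, ... names are made up, and it
assumes filenames without whitespace):

        i=0
        for batch in subdir-*/          # one directory per batch of 1000
        do
                i=$((i+1))
                sox "$batch"*.wav "$(printf '%02d.wav' "$i")"
        done
        sox [0-9][0-9].wav all.wav      # finally join the batch outputs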



Re: too many open files

Peter P.
In reply to this post by Erich Eckner
* Erich Eckner <[hidden email]> [2016-11-09 11:20]:

> I would suggest converting all files to a raw (headerless) format
> [...]

Thank you for this nice solution Erich, and thank you Eric for your
comments on ulimit.

I managed to get things to work this way. Nevertheless, Eric, if you
have a moment to peek into the code and see why sox keeps files open,
that would be fantastic and could prevent similar questions from
coming up again.

In order to provide sox with the information necessary to interpret
the raw data, I query the input file for its properties using soxi or
sox --i. I discovered that sox --i -e yields
        Signed Integer PCM
while the -e flag for setting the encoding expects a different
wording (from the manpage):
        signed-integer

I wonder if the encoding reported by soxi should be made identical to
the one expected by sox's -e option, in order to ease setting it in
scripts?
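
For scripting around the mismatch, something like this might work (a
rough sketch; "first.wav" stands for any one of the inputs, and the
encoding strings other than "Signed Integer PCM" are from memory, so
they may need checking):

        rate=$(soxi -r first.wav)   # sample rate, e.g. 44100
        bits=$(soxi -b first.wav)   # bits per sample, e.g. 16
        chans=$(soxi -c first.wav)  # number of channels, e.g. 2
        enc=$(soxi -e first.wav)    # e.g. "Signed Integer PCM"
        case "$enc" in              # map soxi's wording to sox's -e wording
                "Signed Integer PCM")   enc=signed-integer ;;
                "Unsigned Integer PCM") enc=unsigned-integer ;;
                "Floating Point PCM")   enc=floating-point ;;
        esac
        sox -t raw -r "$rate" -b "$bits" -e "$enc" -c "$chans" - output.wav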

Thank you both again,
best, Peter


Re: too many open files

Eric Wong
"Peter P." <[hidden email]> wrote:
> I managed to get things to work this way. Nevertheless, Eric, if you
> have a moment to peek into the code and see why sox keeps files open,
> that would be fantastic and could prevent similar questions from
> coming up again.

Most of the sox input methods are parallel (mix/merge/etc), so
those files all need to be kept open for the duration of the
invocation.
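
For example, a mix like

        sox -m a.wav b.wav c.wav mixed.wav

has to read all three inputs simultaneously, sample by sample, so none
of them can be closed before the output is finished.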

Based on a quick look, it seems the serial methods
(concatenate/sequence) could close files as they go just fine;
it might make the code a little more complex, though.

Off the top of my head, "gain -n" needs rewindability, but
that seems to use a temporary file anyway (since sox is
equipped to handle pipes).

One major issue with delaying fopen() is that files may disappear
from the filesystem right after sox is started; so delaying
opening them cannot be the default behavior, and would need
to be made an option instead...

I know I often unlink files ASAP after opening them when I'm
testing different effects on a small filesystem (tmpfs).


But yeah, with enough files even the command-line arguments can
exceed your OS's command-size limit (quite apart from the nofile
limit, ulimit -n), so I would favor find + xargs instead of
globbing; something like:

        export FMT='-ts32 -c2 -r48000'  # adjust format to match your data
        # find emits files in no particular order; sort -z (GNU) fixes that
        find "$DIR" -name '*.part*.wav' -type f -print0 |
                sort -z |
                xargs -0 -n1 sh -c 'sox "$@" $FMT -' -- |
                sox $FMT - out.wav

This avoids nofile limits as well as any command-line length
limitations the OS might have.

(I'm embarrassed for not thinking of this in my original reply :x)
