[nzlug] bash help - for loop listing dirs

Martin D Kealey martin at kurahaupo.gen.nz
Thu Feb 22 20:20:35 NZDT 2007


On Thu, 22 Feb 2007, Howard wrote:
> I need to recursively list dirs, then do some stuff.


> Some dirs have spaces in their names, and these are listed as separate
> words in the loop... causing the dir names to be incorrect (breaks in for
> loop occur at spaces).



> I have briefly googled and had a suggestion to use IFS or maybe tr to
> changes the spaces to something else first... and not being a skilled
> basher can someone give me an example please of an 'accepted' way to do
> it?

I'm glad you said "an" accepted way - there are many ways of combining the
elements of the shell toolkit to achieve this.

There are two main factors to consider:
 1. what does "some stuff" comprise, and how thoroughly do you want to
search?
 2. do you want to make a one-off hack that works now, or a more general
tool that you can re-use later?

Firstly, the search:

If "`ls -1`" really does what you want, then it'd be quicker and less
error-prone to use "*" -- let the shell take care of matching all the files.
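
As a minimal sketch of the "*" approach, assuming bash or any POSIX shell:
a pattern with a trailing slash matches only directories, and quoting "$d"
keeps names with spaces in one piece. The sandbox directory and the names in
it are made up for illustration, and this covers a single level only, not
the recursive case.

```shell
# Hypothetical sandbox so the sketch can be run as-is.
top=$(mktemp -d)
mkdir -p "$top/plain" "$top/with space" "$top/two  spaces"
touch "$top/a file.txt"

count=0
names=""
for d in "$top"/*/; do          # trailing slash: directories only
    [ -d "$d" ] || continue     # the pattern stays literal if nothing matches
    d=${d%/}                    # strip the trailing slash
    names="$names[${d##*/}]"    # collect the last path component
    count=$((count + 1))
done
echo "$count dirs: $names"
rm -rf "$top"
```

The plain file is skipped and each directory name arrives as one word,
however many spaces it contains.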

If not then we're down to using some other program -- ls or find are obvious
choices -- and reading the output from that command split into lines. You
can do it several ways:
 1. set IFS='<newline>' and then get the shell to interpolate with $(...)
    (btw, I STRONGLY recommend you use that syntax rather than `...`)
 2. use "read" in a loop
 3. use xargs
 4. pipe it through a filter that generates the actual commands to run, and
    pipe that into a shell
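
The four approaches can be sketched roughly as below. This is an
illustration under assumptions, not a recommendation of any one of them: the
sandbox directories are made up, "ls -d" stands in for "some stuff",
-print0 and -0 are extensions to find and xargs (widely available, but not
universal), and none of this copes with newlines inside names.

```shell
# Hypothetical sandbox so the sketch can be run as-is.
top=$(mktemp -d)
mkdir -p "$top/a b/c d" "$top/e"

# 1. IFS set to a newline, so $(...) is split into one word per line.
old_ifs=$IFS
IFS='
'
set -f                                  # keep * and ? in names from globbing
n_ifs=0
for d in $(find "$top" -type d); do
    if [ -d "$d" ]; then n_ifs=$((n_ifs + 1)); fi
done
set +f
IFS=$old_ifs                            # put the normal value back

# 2. "read" in a loop; the count is printed from inside the pipeline.
n_read=$(find "$top" -type d | {
    n=0
    while IFS= read -r d; do
        if [ -d "$d" ]; then n=$((n + 1)); fi
    done
    echo "$n"
})

# 3. xargs handing NUL-separated names to a single external command.
n_xargs=$(find "$top" -type d -print0 | xargs -0 ls -d | wc -l | tr -d ' ')

# 4. A filter (sed here) writes one quoted command per line for a shell.
n_gen=$(find "$top" -type d |
    sed -e "s/'/'\\\\''/g" -e "s/.*/ls -d '&'/" |
    sh | wc -l | tr -d ' ')

echo "IFS=$n_ifs read=$n_read xargs=$n_xargs generated=$n_gen"
rm -rf "$top"
```

With the four directories created above (the sandbox itself, "a b",
"a b/c d" and "e"), each count should come out as 4, which shows that no
name was broken at a space.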

Now we get to the tricky part: each of these methods has advantages and
drawbacks, and which ones work depends on what "some stuff" you might want
to do.

 * setting IFS is OK if you're invoking one external command; if you're
running a composite shell command you might simultaneously need it to have
its normal value, which is obviously somewhere between tricky and impossible.

 * using "read" works quite nicely *except* that the commands inside the
loop can't read from the original stdin (unless you take very contorted
measures) and the loop (sometimes but not always!) runs in a sub-shell which
means you can't alter the environment and expect it to stay altered.

 * xargs is an external command, which means altering the environment is
right out. And you can only give it a single command -- if you want a
pipeline or something tricky it has to be made into a separate script.

 * automagically generating commands and then executing them is like
juggling nitroglycerine bottles while snorting acid: it's mind-bending and
very easy to make really serious errors. On the other hand, you can generate
them and NOT execute them straight away -- just put them into a file and
look at them first to make sure they're OK.
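
Two of these points are easy to see in a small experiment. The sketch below
assumes bash (other shells differ on the sub-shell behaviour) and a made-up
sandbox; feeding the loop through descriptor 3 from a here-document is just
one possible workaround, and the "ls -ld" commands are placeholders.

```shell
# Hypothetical sandbox so the sketch can be run as-is.
top=$(mktemp -d)
mkdir -p "$top/a b/c d" "$top/e"

# Loop on the end of a pipe: bash runs it in a sub-shell, so the
# count is gone once the loop finishes.
piped=0
find "$top" -type d | while IFS= read -r d; do
    piped=$((piped + 1))
done

# Feeding the loop on descriptor 3 keeps it in the current shell and
# leaves stdin free for commands inside the loop.
kept=0
while IFS= read -r d <&3; do
    kept=$((kept + 1))
done 3<<EOF
$(find "$top" -type d)
EOF

echo "piped=$piped kept=$kept"

# Generated commands go to a file for review before anything runs.
cmds=$(mktemp)
find "$top" -type d |
    sed -e "s/'/'\\\\''/g" -e "s/.*/ls -ld '&'/" > "$cmds"
n_cmds=$(grep -c '^ls -ld ' "$cmds")
cat "$cmds"          # read this over first
sh "$cmds"           # run it only once the list looks right

rm -rf "$top" "$cmds"
```

In bash with default settings "piped" stays at 0 while "kept" reaches 4. The
sed expression wraps each name in single quotes and escapes any single quote
inside it, so the generated lines are safe to hand to sh as long as no name
contains a newline.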

There are also issues around speed; shell scripts have never been
particularly quick, which you won't notice if you're only processing a few
hundred files and a few hundred MB of data, but throw in a thousand files
comprising 1 GB accessed over NFS and you'll be wishing you could make it
run faster.

Anything you can do within the shell without resorting to an external
command is a win for speed, even if you have to fork sub-shells to do it
right: exec plus ld.so (the dynamic linker) is a real speed-killer.
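
One common case, sketched with a made-up path: splitting a file name with
parameter expansion instead of calling basename and dirname. The results
match for an ordinary path like this one; they differ for edge cases such as
trailing slashes or a name with no slash in it at all.

```shell
path="/srv/music/My Album/track 01.ogg"   # made-up path for illustration

# External commands: a fork, an exec and dynamic linking for each call.
base_ext=$(basename "$path")
dir_ext=$(dirname "$path")

# Parameter expansion: done inside the shell, no new process.
base_sh=${path##*/}     # remove everything up to the last slash
dir_sh=${path%/*}       # remove the last slash and what follows it

echo "$base_ext | $base_sh"
echo "$dir_ext | $dir_sh"
```

Inside a loop over thousands of names the second form saves two process
launches per iteration.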

Failing that, make sure you pipeline, rather than invoke a separate instance
of the command on each file. That's where the IFS and "read" methods really
have an edge. But beware you might actually run out of memory if your
data-set is large enough.
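
A rough illustration of the difference, with a made-up sandbox and a trivial
sed edit standing in for the real work: the first form starts one sed per
name, the second starts one sed for the whole list.

```shell
# Hypothetical sandbox so the sketch can be run as-is.
top=$(mktemp -d)
mkdir -p "$top/a b/c d" "$top/e"

# Slow: a separate sed process for every name.
slow=$(find "$top" -type d | while IFS= read -r d; do
    printf '%s\n' "$d" | sed 's/ /_/g'
done | sort)

# Faster: a single sed process handles the whole stream.
fast=$(find "$top" -type d | sed 's/ /_/g' | sort)

if [ "$slow" = "$fast" ]; then echo "same result"; fi
rm -rf "$top"
```

Both forms give the same list; the saving only becomes noticeable once there
are many names. Note that capturing everything with $(...) holds the whole
output in memory, which is where a very large data-set can hurt.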

If you actually want examples of these methods let me know which way you're
leaning and I'll sort something out.

-Martin



