Re: The first 10 files
On 1/26/2013 6:21 PM, Peter Duniho wrote:
> On Sat, 26 Jan 2013 17:06:07 -0500, Eric Sosman wrote:
>> On 1/26/2013 4:15 PM, Robert Klemme wrote:
>>> On 26.01.2013 19:26, Arne Vajhøj wrote:
>>>> But I am a bit skeptical about whether a String[] with 30K elements
>>>> is really the bottleneck.
>>>> If the real bottleneck is the OS calls to get the next file, then
>>>> a filter like this will not help.
>>> Why?
>> Because the listFiles() method will fetch the information
>> for all 30K files from the O/S, will construct 30K File objects
>> to represent them, and will submit all 30K File objects to the
>> FileFilter, one by one. The FileFilter will (very quickly)
>> reject 29.99K of the 30K Files, but ...
> Will it?
Necessarily. As far as listFiles() knows, the FileFilter
might accept the very last File object given to it. Therefore,
listFiles() cannot fail to present that very last File -- and
every other File -- for inspection.
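
To make that concrete, here is roughly what listFiles(FileFilter)
has to do -- a simplified sketch of the documented contract, not
the actual JDK source:

    import java.io.File;
    import java.io.FileFilter;
    import java.util.ArrayList;
    import java.util.List;

    class ListFilesSketch {
        // Every name is fetched from the O/S, wrapped in a File, and
        // offered to the filter before anything can be returned.
        static File[] listFiles(File dir, FileFilter filter) {
            String[] names = dir.list();      // fetches ALL 30K names
            if (names == null)
                return null;                  // not a directory, or I/O error
            List<File> accepted = new ArrayList<File>();
            for (String name : names) {
                File f = new File(dir, name); // 30K File objects, regardless
                if (filter == null || filter.accept(f))
                    accepted.add(f);
            }
            return accepted.toArray(new File[accepted.size()]);
        }
    }

The filter can reject entries cheaply, but it has no way to stop
the enumeration itself.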
> It is plausible that the implementation of listFiles() uses an OS API that
> enumerates files one at a time. On Windows, getting the first file of the
> enumeration is faster than asking for all the files at once.
Meh.
> Indeed, I suppose one could throw an exception from the FileFilter accept()
> method to interrupt enumeration, if that's how listFiles() is implemented.
> That would avoid the need to enumerate more than the needed number of
> actual files.
It would also avoid the burden of returning anything from
listFiles() -- like, say, the array of accepted files ...
A seriously hackish approach might be to do the processing
of the files within the FileFilter itself, treating it as a
"visit this File" callback instead of as a predicate. Then if
the FileFilter threw an exception after processing the first N
files -- well, they'd already have been processed, and you were
going to ignore the listFiles() return value anyhow, so ...
But, as I said, that's pretty seriously hackish.
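
For concreteness, the hack might look something like this (a
sketch only -- FirstNHack, EnoughException, and process() are
names invented for the illustration):

    import java.io.File;
    import java.io.FileFilter;

    class FirstNHack {
        // Unchecked exception thrown purely to abort the enumeration.
        static class EnoughException extends RuntimeException {}

        static void processFirstN(File dir, final int n) {
            try {
                dir.listFiles(new FileFilter() {
                    private int seen = 0;
                    public boolean accept(File f) {
                        process(f);        // real work, inside the "predicate"
                        if (++seen >= n)
                            throw new EnoughException();
                        return false;      // return value ignored anyhow
                    }
                });
            } catch (EnoughException expected) {
                // First n files processed; enumeration cut short.
            }
        }

        static void process(File f) { /* application-specific work */ }
    }

Note that even then the names have already been fetched from the
O/S in one gulp; the exception only skips the leftover File
construction and filter calls.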
> Of course, this is all implementation-dependent and since it's not
> explicitly documented, could change at any time anyway.
The performance implications of retrieving information on 30K
files from the O/S are undocumented, true. But the necessity of
retrieving that information is deducible from what *is* documented.
> But unless you've
> actually examined the implementation details for listFiles(), it's not a
> foregone conclusion that the technique of using a FileFilter offers no way
> to improve latency.
Maybe this is the disconnect: I understood the O.P.'s concern as
"It's doing three thousand times too much work," not as "It takes
three thousand times as long as it should just to get to the first
File instance." Either way, though, I think a FileFilter (used in a
non-hackish way) cannot reduce either the total work or the latency.
Observe that listFiles() cannot return anything at all until it has
built the entire array of accepted files; Java's arrays have no way
to say "I hold five elements now, but might grow."
> All that said, I think John Matthews' comment about the question of what
> 30K files are doing in a single directory in the first place is perhaps one
> of the more useful points in this topic. One doesn't always have control
> over that, of course...but if one does, it's certainly worth rethinking
> that aspect of the design. There are reasons other than code latency to
> avoid so many files in a single directory.
Yeah. The O.P. said something about external processes dumping
files into the directory, possibly dumping many between
(widely-spaced?) executions of his program. That seems odd to me,
though: if there's a backlog of thirty thousand, why reduce it by
only ten ...
If he's stuck with this overall design, though, I think the
walkFileTree() method of java.nio.file.Files would be a cleaner way
to proceed. His FileVisitor could return FileVisitResult.TERMINATE
after it had seen ten files, and that would be that. No hacks.
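
Something like this, say -- assuming Java 7's java.nio.file is
available (class and method names other than the API's own are
invented for the example):

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.EnumSet;

    public class FirstTen {
        public static void main(String[] args) throws IOException {
            final int limit = 10;
            Path dir = Paths.get(args[0]);
            // maxDepth 1: visit the directory's own entries, no recursion.
            Files.walkFileTree(dir,
                EnumSet.noneOf(FileVisitOption.class), 1,
                new SimpleFileVisitor<Path>() {
                    private int seen = 0;
                    @Override
                    public FileVisitResult visitFile(Path file,
                            BasicFileAttributes attrs) {
                        process(file);
                        return (++seen >= limit)
                            ? FileVisitResult.TERMINATE
                            : FileVisitResult.CONTINUE;
                    }
                });
        }

        static void process(Path file) {
            System.out.println(file);   // stand-in for the real work
        }
    }

Files.newDirectoryStream() would do, too: iterate, take the first
ten entries, and close the stream.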
--
Eric Sosman
esosman@comcast-dot-net.invalid