Re: Problem with FileLock
Gordon Beaton wrote:
On Fri, 13 Jul 2007 14:34:14 -0700, alejandrina wrote:
We need to update a file (on a file server) from many different
machines. To synchronize the updates I am using FileLock.
Everything works as advertised (i.e. if a machine gets the lock, it
writes to the file while the other machines wait, everything is
written in the proper order). The same test, using Linux machines,
fails: the file is clobbered (meaning one update gets on top of
another). No exceptions are thrown; in fact, the debug statements
indicate that all the machines are acquiring the exclusive lock.
If you are accessing the files over NFS, then I believe (but am not
sure) that
- the locks may be unknown to the server, and consequently to other
NFS clients.
- each client might be caching its updates, which are not guaranteed
to be visible to other clients without a close/open sequence in
between (if even then). This could result in corruption even if the
locking is in fact working as you expect it to.
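To make the second point concrete, here is a minimal Java sketch (not taken from the original poster's code) of an update bracketed by open, lock, write, force and close. The names LockedAppend and appendLine are hypothetical. It only shows the sequence; over NFS it still depends on the lock actually reaching the server, so treat it as something to test rather than as a fix.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LockedAppend {
    /** Appends one line under an exclusive lock, opening and closing the file for each update. */
    static void appendLine(Path file, String line) throws IOException {
        // A fresh channel per update, so every update sits between an open and a close.
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) {      // blocks until the exclusive lock is granted
            ch.position(ch.size());            // look up end of file only once the lock is held
            ByteBuffer buf = ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);                    // request a flush of data and metadata before unlocking
        }                                      // lock is released first, then the channel is closed
    }
}
```

Note that within a single JVM a second overlapping lock() on the same file throws OverlappingFileLockException, so this is meant for one writer per process.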
What happens when you run two instances of your application on the
*same* host?
I would recommend that you update the file from a single (server)
process that does so on behalf of the other (client) processes, so the
updates are atomic and strictly serialized, without any need for
locking.
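A minimal sketch of that idea in Java, assuming every update is a line to be appended. SingleWriter is a hypothetical name, and how the clients reach this process (a socket, RMI, whatever) is left out; only the serialization is shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: funnels all updates to one file through a single thread. */
class SingleWriter implements AutoCloseable {
    private final Path file;
    // One worker thread, so updates are applied one at a time in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    SingleWriter(Path file) {
        this.file = file;
    }

    /** Queues one line to be appended; the Future completes once it has been written. */
    Future<?> submit(String line) {
        return worker.submit(() -> {
            byte[] bytes = (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
            Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return null;
        });
    }

    /** Stops accepting updates and waits (up to 10 s) for queued ones to be written. */
    @Override
    public void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("updates", ".log");
        try (SingleWriter writer = new SingleWriter(file)) {
            writer.submit("update from client A");
            writer.submit("update from client B");
        }
        System.out.println(Files.readAllLines(file));
    }
}
```

Since only this one process ever opens the file, nothing depends on NFS lock semantics or on what the other clients have cached.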
For NFS, rpc.lockd must be running on the server and must be reachable via
RPC (i.e. the dynamic RPC port it is registered on must not be blocked by a
firewall), and each client needs to co-operate by sending the lock requests
and waiting for the lock responses. Even then it might not work.
NFS was designed to be stateless, principally so that it could survive a server
re-boot or network drop-out. File locking is inherently stateful (hence the
need for the additional service rpc.lockd) and the two don't mix well.
Basically, don't attempt file locking over NFS unless you have plenty of time
to spare to debug it thoroughly. There are many failure modes and you should
not rely on it without rigorous testing of how your code reacts to each failure
mode.
--
Nigel Wade, System Administrator, Space Plasma Physics Group,
University of Leicester, Leicester, LE1 7RH, UK
E-mail : nmw@ion.le.ac.uk
Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555