Nfs-server failing after restart

I have had nfs-server running fine on my server for a while now, but I recently had to reboot and now it fails to start.

The units that are failing are: rpcbind.service, rpc-statd.service and nfs-mountd.service.

systemctl status rpcbind
● rpcbind.service - RPC bind service
    Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: disabled)
    Active: failed (Result: core-dump) since Sat 2020-11-07 19:19:24 GMT; 7min ago
TriggeredBy: ● rpcbind.socket
    Process: 345 ExecStart=/sbin/rpcbind -w (code=exited, status=0/SUCCESS)
Main PID: 350 (code=dumped, signal=ABRT)

Nov 07 19:19:24 pythagoras systemd[1]: Starting RPC bind service...
Nov 07 19:19:24 pythagoras rpcbind[350]: svc_run returned unexpectedly
Nov 07 19:19:24 pythagoras systemd[1]: Started RPC bind service.
Nov 07 19:19:24 pythagoras systemd[1]: rpcbind.service: Main process exited, code=dumped, status=6/ABRT
Nov 07 19:19:24 pythagoras systemd[1]: rpcbind.service: Failed with result 'core-dump'.

$ systemctl status rpc-statd
● rpc-statd.service - NFS status monitor for NFSv2/3 locking.
    Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static)
    Active: failed (Result: exit-code) since Sat 2020-11-07 19:19:31 GMT; 2min 13s ago
    Process: 487 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE)

Nov 07 19:19:31 pythagoras systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Nov 07 19:19:31 pythagoras rpc.statd[489]: Version 2.4.3 starting
Nov 07 19:19:31 pythagoras rpc.statd[489]: Flags:
Nov 07 19:19:31 pythagoras rpc.statd[489]: Failed to read /run/nfs/state: Success
Nov 07 19:19:31 pythagoras rpc.statd[489]: Initializing NSM state
Nov 07 19:19:31 pythagoras rpc.statd[489]: Running as root.  chown /run/nfs to choose different user
Nov 07 19:19:31 pythagoras rpc.statd[489]: unable to register (statd, 1, udp).
Nov 07 19:19:31 pythagoras systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE
Nov 07 19:19:31 pythagoras systemd[1]: rpc-statd.service: Failed with result 'exit-code'.
Nov 07 19:19:31 pythagoras systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..


$ systemctl status nfs-mountd
● nfs-mountd.service - NFS Mount Daemon
    Loaded: loaded (/usr/lib/systemd/system/nfs-mountd.service; static)
    Active: failed (Result: exit-code) since Sat 2020-11-07 19:19:31 GMT; 1min 59s ago
    Process: 485 ExecStart=/usr/sbin/rpc.mountd (code=exited, status=1/FAILURE)

Nov 07 19:19:31 pythagoras systemd[1]: Starting NFS Mount Daemon...
Nov 07 19:19:31 pythagoras rpc.mountd[485]: unable to register (mountd, 1, udp).
Nov 07 19:19:31 pythagoras rpc.mountd[485]: Cannot register service: RPC: Unable to receive; errno = Connection refused
Nov 07 19:19:31 pythagoras systemd[1]: nfs-mountd.service: Control process exited, code=exited, status=1/FAILURE
Nov 07 19:19:31 pythagoras systemd[1]: nfs-mountd.service: Failed with result 'exit-code'.
Nov 07 19:19:31 pythagoras systemd[1]: Failed to start NFS Mount Daemon.

Core dump log:

Nov 07 19:33:25 pythagoras systemd[1]: Starting RPC bind service...
Nov 07 19:33:25 pythagoras systemd[1]: Started RPC bind service.
Nov 07 19:33:25 pythagoras sudo[1049]: pam_unix(sudo:session): session closed for user root
Nov 07 19:33:25 pythagoras rpcbind[1053]: svc_run returned unexpectedly
Nov 07 19:33:25 pythagoras systemd[1]: Started Process Core Dump (PID 1056/UID 0).
Nov 07 19:33:25 pythagoras systemd[1]: rpcbind.service: Main process exited, code=dumped, status=6/ABRT
Nov 07 19:33:25 pythagoras systemd[1]: rpcbind.service: Failed with result 'core-dump'.
Nov 07 19:33:25 pythagoras systemd-coredump[1057]: Process 1053 (rpcbind) of user 0 dumped core.
                                                
                                                Stack trace of thread 1053:
                                                #0  0x00007ff238efdd65 __GI_raise (libc.so.6 + 0x41d65)
                                                #1  0x00007ff238ee1889 __GI_abort (libc.so.6 + 0x25889)
                                                #2  0x0000557606f79d86 main (rpcbind + 0x3d86)
                                                #3  0x00007ff238ee3dbd __libc_start_main (libc.so.6 + 0x27dbd)
                                                #4  0x0000557606f7a37e _start (rpcbind + 0x437e)
Nov 07 19:33:25 pythagoras systemd[1]: systemd-coredump@5-1056-0.service: Succeeded.

What is the output of

find /usr/lib/systemd -name 'rpcbind*'

also which os version are you on? And what happens if you run

sudo systemctl stop rpcbind
sudo -i rpcbind -i -w
sudo systemctl start rpcbind
sudo systemctl start nfs-server

or if you do sudo systemctl edit --full nfs-mountd.service and add the -i flag (for debugging, I do not recommend you leave the insecure flag)

$ find /usr/lib/systemd -name 'rpcbind*'
/usr/lib/systemd/system/rpcbind.socket
/usr/lib/systemd/system/rpcbind.service
/usr/lib/systemd/system/rpcbind.target

Version is 33930

$ sudo systemctl stop rpcbind
Warning: Stopping rpcbind.service, but it can still be activated by:
rpcbind.socket
$ sudo -i rpcbind -i -w
$ sudo systemctl start rpcbind
Job for rpcbind.service failed because the control process exited with error code.
See "systemctl status rpcbind.service" and "journalctl -xe" for details.
$ sudo systemctl start nfs-server
A dependency job for nfs-server.service failed. See 'journalctl -xe' for details.

I am having the same problem with nfs-server/rpcbind on my Clearlinux host on 33930. It was working for a while and just stopped working after rebooting the host today.

It looks like rpcbind is broken on newer versions of clearlinux. Downgraded to 33920 and still broken. I went back 1 month back when I remember it was working fine after a reboot and downgraded all the way back to 33820.

I don’t knoew in which version rpcbind started to break.

Is it just me or it’s becoming frequent that things brake lately after the latest update? Before this I have a CL host failed to reboot after an update. Forgot which version specifically but it was more than a month ago. I had to rebuild my server. Now this.

I turned off automatic updates for now.

The trouble with rolling release is things break upstream and it’s impossible to catch it all. The problem is when they are then fixed upstream we don’t always get the fix right away, or even in the following release. Some of our rpm’s are right at the edge of upsteam’s latest, and yet some are several bugfixes behind. These are just crazy times I guess and I’m sure more community support and donated cycles on the development end would certainly help, but autospec can be a tricky beast and is unfamiliar to most, as is the stateless architecture.

Remember none of the packages build scripts can be run as-is, the build and install process for all of them has to be refactored to work with the architecture, and some are simplified with new flags and optimized as well. And then of course any actual source code changes would have to been sent back upstream as per GPL . . . it’s a process.

This happens with other rolling distros as well. My other Linux machines run Arch and I “expect the unexpected” :slightly_smiling_face:. I never run automatic updates on anything, including Windows, BSDs (including my Macs and iPhone/Watch).

The system rollback feature that Clear offers is great and is very useful in situations like this. Thanks for reporting the issue. I’m planning to set up a few NFS mounts and hopefully it will be fixed soon. Please open an issue on GitHub.

How do you downgrade the system? I couldn’t find the answer in the swupd documentation.

The obscurity of this feature is intentional, but it is all there in the man page. Please see: https://github.com/clearlinux/swupd-client/issues/456 for an explanation and possible issues while using this feature.

rpcbind is now working in 33960, although nfsdcld is now failing (this doesn’t prevent NFS from serving files). I’m not sure if this is a configuration error on my part.

    Nov 17 22:20:49 pythagoras systemd[1]: Starting NFSv4 Client Tracking Daemon...
    Nov 17 22:20:49 pythagoras systemd[1]: Started NFSv4 Client Tracking Daemon.
    Nov 17 22:20:49 pythagoras nfsdcld[8379]: cld_inotify_setup: inotify_add_watch failed: No such file or directory
    Nov 17 22:20:49 pythagoras systemd[1]: nfsdcld.service: Main process exited, code=exited, status=1/FAILURE
    Nov 17 22:20:49 pythagoras systemd[1]: nfsdcld.service: Failed with result 'exit-code'.

Tried creating the default storagedir for nfsdcld /var/lib/nfs/nfsdcld, but that did not help.

Running with debug=1:

    Nov 17 22:33:22 pythagoras nfsdcld[8539]: attaching /run/nfs/nfsdcltrack/main.sqlite
    Nov 17 22:33:22 pythagoras nfsdcld[8539]: detaching database
    Nov 17 22:33:22 pythagoras nfsdcld[8539]: sqlite_copy_cltrack_records: returning 0
    Nov 17 22:33:22 pythagoras nfsdcld[8539]: sqlite_prepare_dbh: num_cltrack_records = 1
    Nov 17 22:33:22 pythagoras nfsdcld[8539]: Unable to stat /var/lib/nfs/v4recovery: 2
    Nov 17 22:33:22 pythagoras nfsdcld[8539]: sqlite_prepare_dbh: num_legacy_records = 0
    Nov 17 22:33:22 pythagoras nfsdcld[8539]: cld_pipe_init: init pipe handlers
    Nov 17 22:33:22 pythagoras nfsdcld[8539]: cld_inotify_setup: inotify_add_watch failed: No such file or directory
    Nov 17 22:33:22 pythagoras systemd[1]: nfsdcld.service: Main process exited, code=exited, status=1/FAILURE
    Nov 17 22:33:22 pythagoras systemd[1]: nfsdcld.service: Failed with result 'exit-code'.