Ticket #3987 (closed defect: fixed)

Opened 2 years ago

Last modified 4 months ago

Midnight Commander loses content of the current directory

Reported by: memy Owned by: andrew_b
Priority: major Milestone: 4.8.26
Component: mc-vfs Version: master
Keywords: Cc:
Blocked By: Blocking:
Branch state: merged Votes for changeset: committed-master

Description

When I am browsing a mounted SMB directory Midnight Commander sometimes loses directory content.
When I hit Ctrl-r the content of the directory reappears.

It started today after upgrading of the kernel and samba.
All other means of browsing file system work perfectly fine: shell, Thunar, File Open dialogs, etc.
I have shares mounted on different servers:

  1. Synology NAS
  2. Raspberry Pi 4.19.42-1-ARCH #1 SMP PREEMPT Wed May 15 00:39:47 UTC 2019 armv7l GNU/Linux
  3. Dell mini 5.1.2-arch1-1-ARCH #1 SMP PREEMPT Wed May 15 00:09:47 UTC 2019 x86_64 GNU/Linux

MC demonstrates the same behavior on all of the shares.

When I browse to a directory on a mounted SMB share MC occasionally shows empty directory. Re-read immediately restores the panel. The same happens when I am returning back to the parent directory.

The shares are mounted at boot time using systemd like this but the problem persists with any other way of mounting the SMB share:

[Unit]
Description=Mount Share at boot
Requires=network-online.target
After=network-online.target
Wants=network-online.target

[Mount]
What=//host/share
Where=/mnt/SMB/share
Options=uid=<my-uid>,credentials=/home/username/secret.txt,iocharset=utf8,rw,x-systemd.automount
Type=cifs
TimeoutSec=30

[Install]
WantedBy=multi-user.target

I run Arch Linux.
Kernel version: 5.1.2-arch1-1-ARCH #1 SMP PREEMPT Wed May 15 00:09:47 UTC 2019 x86_64 GNU/Linux
samba version: 4.10.3-1 upgraded from 4.10.2-1

LC_MESSAGES=C mc -V

GNU Midnight Commander 4.8.22
Built with GLib 2.58.3
Using the S-Lang library with terminfo database
With builtin Editor
With subshell support as default
With support for background operations
With mouse support on xterm and Linux console
With support for X11 events
With internationalization support
With multiple codepages support
Virtual File Systems: cpiofs, tarfs, sfs, extfs, ftpfs, sftpfs, fish, smbfs
Data types: char: 8; int: 32; long: 64; void *: 64; size_t: 64; off_t: 64;

LC_MESSAGES=C mc -F

Home directory: /home/username
Profile root directory: /home/username

[System data]
    Config directory: /etc/mc/
    Data directory:   /usr/share/mc/
    File extension handlers: /usr/lib/mc/ext.d/
    VFS plugins and scripts: /usr/lib/mc/
	extfs.d:        /usr/lib/mc/extfs.d/
	fish:           /usr/lib/mc/fish/

[User data]
    Config directory: /home/username/.config/mc/
    Data directory:   /home/username/.local/share/mc/
	skins:          /home/username/.local/share/mc/skins/
	extfs.d:        /home/username/.local/share/mc/extfs.d/
	fish:           /home/username/.local/share/mc/fish/
	mcedit macros:  /home/username/.local/share/mc/mc.macros
	mcedit external macros: /home/username/.local/share/mc/mcedit/macros.d/macro.*
    Cache directory:  /home/username/.cache/mc/

mc --configure-options

 '--prefix=/usr' '--libexecdir=/usr/lib' '--sysconfdir=/etc' '--enable-vfs-smb' '--with-screen=slang' '--with-x' 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt' 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now' 'CPPFLAGS=-D_FORTIFY_SOURCE=2'

Change History

comment:1 Changed 2 years ago by andrew_b

  • Component changed from mc-core to mc-vfs
  • Blocked By 1 added

comment:2 Changed 2 years ago by memy

Something interesting happened.
I continued experimenting and all of the sudden everything started to work as expected.
I immediately opened another instance of MC in a new terminal. That new instance did not work though.
So now I have two instances side by side and one of them works consistently while another occasionally shows me empty directory.
I am doing very simple test. I have "Lynx-like motion" option turned on, so I constantly pressing left and right arrow keys.
One instance of MC works as a clockwork, while another shows empty directory from time to time.
Unfortunately I don't know what "fixed" the first instance. I tried to repeat in the second instance the actions I remember doing in the first one, but the second instance continues to fail.

comment:3 follow-up: ↓ 8 Changed 2 years ago by ravenexp

I have the same problem with SMB shares mounted via systemd mount units.
It also happened after the recent kernel and samba updates.

Other file managers like Gnome Nautilus do not suffer from this issue.

comment:4 Changed 23 months ago by memy

After downgrading linux kernel to 5.0.13 everything went back to normal.

comment:5 Changed 20 months ago by andrew_b

  • Status changed from new to closed
  • Resolution set to invalid
  • Milestone Future Releases deleted

comment:6 Changed 20 months ago by memy

I wonder why this ticket has been closed?
The problem still persists while running Linux 5.1+
This is obviously a MC bug, because all other ways to get directory content work flawlessly (ls, other file managers, open and save dialogs, etc).
Also, "Blocked By" field may be incorrect. I am using SMB shares mounted by systemd/mount, not SMB links so I suspect that built-in SMB library has nothing to do with the problem.

Please advise how could I provide more information to help with bug triaging.

The problem is really annoying, I have to hit Ctrl-r almost every time I descend to a folder.

comment:7 Changed 19 months ago by memy

I finally found some time to debug the problem.
I found out that vfs/local/local.c/local_readdir()/readdir() sometimes returns error EINTR while reading directories of the mounted SMB share.
Retrying the filemanager/dir.c/dir_list_load() helps to solve this problem.

comment:8 in reply to: ↑ 3 ; follow-up: ↓ 9 Changed 19 months ago by memy

Replying to ravenexp:

I have the same problem with SMB shares mounted via systemd mount units.
It also happened after the recent kernel and samba updates.

Other file managers like Gnome Nautilus do not suffer from this issue.

If you are still interested, I have implemented quick and dirty patch: https://github.com/myelsukov/mc/commit/66016a03ddeba115444c41783b30f92719c28b46
It works fine for me even though it is not a most beautiful piece of code :)

comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 19 months ago by andrew_b

Replying to memy:

If you are still interested, I have implemented quick and dirty patch:

Why you patch dir_list_load() instead of local_readdir() or even mc_readdir()?

comment:10 in reply to: ↑ 9 Changed 19 months ago by memy

Replying to andrew_b:

Why you patch dir_list_load() instead of local_readdir() or even mc_readdir()?

My experiments (including several months of using mc with kernel 5.1+) showed that EINTR was raised by the very first readdir(). In other words, I was always getting empty directory list not an incomplete directory list.
I started with retrying readdir() in the local_readdir(). Unfortunately, readdir() would stop (NULL result with errno == 0) after returning "." and ".." directory entries in that case. This left me with a single choice: retry the whole dir_list_load().

I have no idea why no other program (ls, find, grep, different "File Open" and "File Save" dialogs, numerous file managers) demonstrated this "bad" behavior though...

comment:12 Changed 12 months ago by andrew_b

Ticket #4074 has been marked as a duplicate of this ticket.

comment:13 follow-up: ↓ 14 Changed 7 months ago by trtracer

This is still open, right? I have the same problem with 4.8.25

comment:14 in reply to: ↑ 13 ; follow-up: ↓ 18 Changed 7 months ago by andrew_b

Replying to trtracer:

This is still open, right?

No. It's closed as invalid.

I can't say why readdir(3) sometimes returns NULL instead of next file.
Reading of directory twice each time is not an our way.

comment:15 Changed 7 months ago by zaytsev

Why wouldn't you guys take it up with kernel developers?

comment:16 follow-up: ↓ 17 Changed 7 months ago by trtracer

I took a closer look at it and it seems a part of getdents (the syscall behind readdir) can be interrupted by SIGCHLD and this does not get restarted - even if you specify SA_RESTART when calling sigaction().

See this BT from GDB:

Catchpoint 1 (signal SIGCHLD), __getdents64 (fd=12, buf=buf@entry=0x7ffff7106040 "", nbytes=1048576) at ../sysdeps/unix/sysv/linux/getdents64.c:27
27      ../sysdeps/unix/sysv/linux/getdents64.c: No such file or directory.
(gdb) bt
#0  __getdents64 (fd=12, buf=buf@entry=0x7ffff7106040 "", nbytes=1048576) at ../sysdeps/unix/sysv/linux/getdents64.c:27
#1  0x00007ffff7844ea4 in __GI___readdir64 (dirp=0x7ffff7106010) at ../sysdeps/posix/readdir.c:65
#2  0x000055555560390a in local_readdir (data=0x555555709660) at local.c:111
#3  0x0000555555588a8f in mc_readdir (dirp=dirp@entry=0x55555573bd30) at interface.c:500
[...]

The documentation of signal(7) does not mention getdents. Neither that it eventually would be restarted nor that it wouldn't. But maybe because this only happens when accessing SMB-Shared, the getdents syscall (or a part of it) falls in the category of network functions that are never restarted.

Anyway, why not simply cover this condition by changing local_readdir like this:

local_readdir (void *data)
{
    void *dir;

    do {
        errno=0;
        dir = readdir (*(DIR **) data);
    }while (dir == NULL && errno == EINTR);
    return dir;
}

This works for me and I don't see this has a negative impact.

comment:17 in reply to: ↑ 16 Changed 7 months ago by memy

Replying to trtracer:

Anyway, why not simply cover this condition by changing local_readdir like this:

I tried this and it didn't work for me, readdir always returned null after the EINTR. In my experiments I had to call opendir again to recover.
May be something had changed in the kernel after that...

comment:18 in reply to: ↑ 14 Changed 5 months ago by drolevar

  • Status changed from closed to reopened
  • Resolution invalid deleted

Replying to andrew_b:

No. It's closed as invalid.

I can't say why readdir(3) sometimes returns NULL instead of next file.
Reading of directory twice each time is not an our way.

Well, apparently it is a change introduced in the kernel here:
https://github.com/torvalds/linux/commit/b30c74c73c787d853ecb9fcf5c59511a09a4ec59

And I see it's being fixed in a variety of other projects, like Go, for example:
https://github.com/golang/go/issues/38836

The way it was fixed is exactly repeating it if EINTR is returned.
That's the way it's implemented in the kernel since 5.1 and I think it's by design.

comment:21 Changed 5 months ago by andrew_b

  • Status changed from reopened to accepted
  • Owner set to andrew_b
  • Branch state changed from no branch to on review
  • Milestone set to 4.8.26

Branch: 3987_smb_dir_content
changeset:0343e7b65092921ffd47554c7f1d1aaaf222bc97

Please test.

comment:22 Changed 5 months ago by andrew_b

  • Blocked By 1 removed

comment:23 Changed 4 months ago by andrew_b

  • Votes for changeset set to andrew_b
  • Branch state changed from on review to approved

comment:24 Changed 4 months ago by andrew_b

  • Status changed from accepted to testing
  • Votes for changeset changed from andrew_b to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

comment:25 Changed 4 months ago by andrew_b

  • Status changed from testing to closed
Note: See TracTickets for help on using tickets.