Ticket #4071 (new defect)

Opened 3 months ago

Last modified 2 months ago

Sometimes mc hangs on directory change

Reported by: olfway
Owned by:
Priority: major
Milestone: Future Releases
Component: mc-core
Version: 4.8.24
Keywords:
Cc:
Blocked By:
Blocking:
Branch state: no branch
Votes for changeset:

Description

I use mc 4.8.24 on macOS 10.15.3.

❯ env LC_MESSAGES=C /opt/mc/bin/mc -V
GNU Midnight Commander unknown
Built with GLib 2.64.1
Using the S-Lang library with terminfo database
With builtin Editor
With subshell support as default
With support for background operations
With mouse support on xterm
With internationalization support
With multiple codepages support
Virtual File Systems: cpiofs, tarfs, sfs, extfs, ftpfs, fish
Data types: char: 8; int: 32; long: 64; void *: 64; size_t: 64; off_t: 64;
❯ /opt/mc//bin/mc --configure-options
 '--prefix' '/opt/mc' '--without-x' '--with-screen=slang' '--disable-doxygen-html' '--disable-doxygen-dot' '--disable-doxygen-doc' 'CFLAGS=-O0 -g -ggdb' 'LDFLAGS=-L/usr/local/opt/gettext/lib -L/usr/local/opt/gettext/lib' 'CPPFLAGS=-I/usr/local/opt/gettext/include -I/usr/local/opt/gettext/include'

fish shell, version 3.1.0

Sometimes, when I press Enter to change directory, mc just hangs.
After that, there is also a zombie kill process.

Backtrace from mc:


(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff73a3a3b2 libsystem_kernel.dylib`__sigsuspend + 10
    frame #1: 0x00000001002f9394 mc`synchronize at common.c:497:9
    frame #2: 0x00000001002f81ba mc`feed_subshell(how=0, fail_on_error=0) at common.c:609:13
    frame #3: 0x00000001002f87e8 mc`do_subshell_chdir(vpath=0x00007fa164c0f8b0, update_prompt=0) at common.c:1345:5
    frame #4: 0x0000000100276299 mc`subshell_chdir(vpath=0x00007fa164c0f8b0) at panel.c:3234:9
    frame #5: 0x00000001002745f6 mc`_do_panel_cd(panel=0x00007fa164c0ec70, new_dir_vpath=0x00007fa164c0f570, cd_type=cd_exact) at panel.c:3275:5
    frame #6: 0x00000001002744e3 mc`do_panel_cd(panel=0x00007fa164c0ec70, new_dir_vpath=0x00007fa164c0f570, cd_type=cd_exact) at panel.c:4628:9
    frame #7: 0x00000001002758be mc`do_cd(new_dir_vpath=0x00007fa164c0f570, exact=cd_exact) at panel.c:5028:11
    frame #8: 0x0000000100279e76 mc`do_enter_on_file_entry(fe=0x00000001008670b8) at panel.c:2795:14
    frame #9: 0x000000010027865e mc`do_enter(panel=0x00007fa164c0ec70) at panel.c:2855:12
    frame #10: 0x00000001002765c1 mc`panel_execute_cmd(panel=0x00007fa164c0ec70, command=1) at panel.c:3446:9
    frame #11: 0x00000001002763f4 mc`panel_key(panel=0x00007fa164c0ec70, key=10) at panel.c:3608:20
    frame #12: 0x0000000100272655 mc`panel_callback(w=0x00007fa164c0ec70, sender=0x0000000000000000, msg=MSG_KEY, parm=10, data=0x0000000000000000) at panel.c:3688:16
    frame #13: 0x000000010023075a mc`send_message(w=0x00007fa164c0ec70, sender=0x0000000000000000, msg=MSG_KEY, parm=10, data=0x0000000000000000) at widget-common.h:216:15
    frame #14: 0x0000000100231a16 mc`dlg_key_event(h=0x00007fa164f05d90, d_key=10) at dialog.c:489:19
    frame #15: 0x0000000100231439 mc`dlg_process_event(h=0x00007fa164f05d90, key=10, event=0x00007ffeef9f5640) at dialog.c:1134:9
    frame #16: 0x0000000100231d48 mc`frontend_dlg_run(h=0x00007fa164f05d90) at dialog.c:545:9
    frame #17: 0x0000000100231b8e mc`dlg_run(h=0x00007fa164f05d90) at dialog.c:1167:5
    frame #18: 0x000000010026d8ed mc`do_nc at midnight.c:1836:16
    frame #19: 0x000000010020bb92 mc`main(argc=1, argv=0x00007ffeef9f5808) at main.c:405:21
    frame #20: 0x00007fff738d97fd libdyld.dylib`start + 1
    frame #21: 0x00007fff738d97fd libdyld.dylib`start + 1
(lldb) frame variable
(lldb) up
frame #1: 0x00000001002f9394 mc`synchronize at common.c:497:9
   494
   495 	    /* Wait until the subshell has stopped */
   496 	    while (subshell_alive && !subshell_stopped)
-> 497 	        sigsuspend (&old_mask);
   498
   499 	    if (subshell_state != ACTIVE)
   500 	    {
(lldb) frame variable
(sigset_t) sigchld_mask = 524288
(sigset_t) old_mask = 0

Backtrace from fish (partial):

frame #2: 0x00000001012380fe fish`exec_external_command(parser=0x00007fb2e2d02030, j=std::__1::shared_ptr<job_t>::element_type @ 0x00007fb2e2f0f3c8 strong=2 weak=1, p=0x00007fb2e2f0f530, proc_io_chain=0x00007ffeeeb43930) at exec.cpp:573:17
   570 	            // We successfully made the attributes and actions; actually call
   571 	            // posix_spawn.
   572 	            int spawn_ret =
-> 573 	                posix_spawn(&pid, actual_cmd, &actions, &attr, const_cast<char *const *>(argv),
   574 	                            const_cast<char *const *>(envv));
   575
   576 	            // This usleep can be used to test for various race conditions

(const char *) actual_cmd = 0x00007ffeeeb435d1 "/bin/kill"

Change History

comment:1 Changed 2 months ago by olfway

It happened again: mc hangs.

   495 	    /* Wait until the subshell has stopped */
   496 	    while (subshell_alive && !subshell_stopped) {
-> 497 	        sigsuspend (&old_mask);

I checked with lldb: mc is waiting in synchronize() at common.c:497.

Current values:
subshell_alive = 1
subshell_stopped = 1

The fish subshell has actually stopped.

I'm not sure how this is possible.

comment:2 Changed 2 months ago by olfway

I'm able to reproduce it like this:

Go to the ~/Library folder (I guess any folder with lots of subfolders will work)
Move the cursor to the last subfolder
Press Up, Enter, Up, Enter, Up, ... as fast as possible
Usually mc hangs; sometimes I have to go back to the last subfolder and start again

comment:3 Changed 2 months ago by olfway

I tried rewriting synchronize() to use nanosleep() instead of the sig* functions, and it works without issues:

     /* Wait until the subshell has stopped */
     while (subshell_alive && !subshell_stopped) {
-        sigsuspend (&old_mask);
+        // sigsuspend (&old_mask);
+        ts.tv_nsec = 1000 * 1000;
+        nanosleep(&ts, NULL);
     }

(and commented out other sig* calls in synchronize function)

So it seems something is wrong with this while loop:

The subshell has already stopped (subshell_stopped = 1), yet sigsuspend() keeps waiting for a signal, blocking mc.
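For comparison, the comment:3 workaround amounts to a polling loop like the one below. This is a minimal sketch, not the reporter's exact patch: the flag child_stopped and the function wait_stopped_polling() are hypothetical stand-ins for subshell_stopped and the rewritten loop. Because it re-reads the flag every millisecond, it does not depend on a SIGCHLD being delivered while the process sits in sigsuspend(). The price is up to about 1 ms of extra latency and periodic wakeups, and any unsynchronized accesses to the flag are still there, only harder to hit. The diff only shows tv_nsec being assigned, so the sketch sets tv_sec explicitly too.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag standing in for subshell_stopped; a SIGCHLD
 * handler (not shown) would set it to 1 when the subshell stops. */
static volatile sig_atomic_t child_stopped = 0;

/* Re-check the flag every millisecond instead of sleeping in
 * sigsuspend(). Returns true if the flag is seen set within about
 * max_ms milliseconds, false on timeout. */
static bool wait_stopped_polling(long max_ms)
{
    /* Both fields are set explicitly; the comment:3 diff only shows
     * tv_nsec, so tv_sec must be initialized where ts is declared. */
    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000L * 1000L };

    assert(max_ms >= 0);
    for (long waited = 0; waited < max_ms; waited++) {
        if (child_stopped)
            return true;
        nanosleep(&tick, NULL);   /* a signal just shortens this tick */
    }
    return child_stopped != 0;
}
```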

comment:4 Changed 2 months ago by ossi

uh-oh, this whole file is full of race conditions. all access to the two volatile variables (subshell_alive and subshell_stopped) while the subshell is running needs to happen with SIGCHLD blocked; the only function that gets it right is synchronize().
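The discipline described above can be illustrated with a minimal sketch. It assumes a child that stops itself with raise(SIGSTOP) in place of the fish subshell and uses SIGCONT as the resume step; child_alive, child_stopped, child_pid, start_child() and resume_and_wait() are hypothetical names, not mc's code. The point is the ordering in resume_and_wait(): SIGCHLD is blocked before the flag is cleared and the child resumed, and it is only unblocked atomically inside sigsuspend(). If the flag were instead cleared with SIGCHLD unblocked, the handler could record the stop first and the late clear would erase it, leaving the wait loop asleep with no further signal coming.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for mc's subshell_alive / subshell_stopped. */
static volatile sig_atomic_t child_alive = 0;
static volatile sig_atomic_t child_stopped = 0;
static pid_t child_pid = -1;

/* SIGCHLD handler: records stop/exit events reported by waitpid(). */
static void on_sigchld(int sig)
{
    int saved_errno = errno, status;

    (void) sig;
    while (waitpid(child_pid, &status, WUNTRACED | WNOHANG) > 0) {
        if (WIFSTOPPED(status))
            child_stopped = 1;
        else if (WIFEXITED(status) || WIFSIGNALED(status))
            child_alive = 0;
    }
    errno = saved_errno;
}

/* Block SIGCHLD; the previous mask is saved in *old. */
static void block_sigchld(sigset_t *old)
{
    sigset_t chld;
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, old);
}

/* Must be called with SIGCHLD blocked: the flags are only tested while
 * the handler cannot run, and sigsuspend() unblocks it atomically. */
static bool wait_for_stop(const sigset_t *old)
{
    while (child_alive && !child_stopped)
        sigsuspend(old);   /* assumes *old does not block SIGCHLD */
    return child_stopped != 0;
}

/* Fork a child that stops itself in a loop, mimicking a subshell that
 * stops after every command; returns true once the first stop is seen. */
static bool start_child(void)
{
    struct sigaction sa = { .sa_handler = on_sigchld };
    sigset_t old;
    bool stopped;

    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);

    block_sigchld(&old);
    child_pid = fork();
    if (child_pid == 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        for (;;)
            raise(SIGSTOP);
    }
    child_alive = child_pid > 0;
    stopped = wait_for_stop(&old);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return stopped;
}

/* Clear the flag and resume the child with SIGCHLD blocked, so the next
 * stop cannot be recorded (and then overwritten) before we wait for it. */
static bool resume_and_wait(void)
{
    sigset_t old;
    bool stopped;

    assert(child_pid > 0);
    block_sigchld(&old);
    child_stopped = 0;
    kill(child_pid, SIGCONT);
    stopped = wait_for_stop(&old);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return stopped;
}
```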

this is a déjà vu from the SIGWINCH episode: we're again left with the choice of using pselect() or the self-pipe trick, preferably again opting for the latter in expectation of using the glib event loop at some point (there is also forkfd() since linux 5.4).
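The self-pipe trick mentioned above can be sketched as follows. This is a minimal sketch under assumptions, not a proposed patch: sigchld_pipe, selfpipe_init() and selfpipe_wait() are hypothetical names. The SIGCHLD handler does nothing but write one byte to a non-blocking pipe, and the waiting code watches the read end with select(). Any SIGCHLD that arrives before or during the wait leaves a byte behind, so it cannot be lost the way a flag check followed by a separate wait can. In a complete version, the waitpid() call and the flag updates would presumably move out of the handler into the code that drains the pipe, and the read end could join whatever descriptor set the program already waits on (or, as suggested above, a glib main loop later).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical self-pipe: the handler writes to [1], the waiter reads [0]. */
static int sigchld_pipe[2] = { -1, -1 };

/* Async-signal-safe handler: it only writes one byte. If the pipe is
 * full, a wakeup is already pending, so EAGAIN is harmless. */
static void on_sigchld(int sig)
{
    int saved_errno = errno;
    char byte = 'c';
    ssize_t r = write(sigchld_pipe[1], &byte, 1);

    (void) sig;
    (void) r;
    errno = saved_errno;
}

/* Create the non-blocking pipe and install the SIGCHLD handler. */
static bool selfpipe_init(void)
{
    struct sigaction sa = { .sa_handler = on_sigchld };

    if (pipe(sigchld_pipe) != 0)
        return false;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(sigchld_pipe[i], F_GETFL);
        fcntl(sigchld_pipe[i], F_SETFL, fl | O_NONBLOCK);
        fcntl(sigchld_pipe[i], F_SETFD, FD_CLOEXEC);
    }
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL) == 0;
}

/* Wait up to timeout_ms for a SIGCHLD notification. Returns true (after
 * draining the pipe) if at least one arrived, false on timeout or error. */
static bool selfpipe_wait(int timeout_ms)
{
    fd_set rfds;
    char buf[64];
    int rc;

    assert(sigchld_pipe[0] >= 0);   /* selfpipe_init() must run first */
    for (;;) {
        /* Rebuild both arguments: after an error the timeout value is
         * unspecified, so the wait simply restarts on EINTR. */
        struct timeval tv = { .tv_sec = timeout_ms / 1000,
                              .tv_usec = (timeout_ms % 1000) * 1000 };
        FD_ZERO(&rfds);
        FD_SET(sigchld_pipe[0], &rfds);
        rc = select(sigchld_pipe[0] + 1, &rfds, NULL, NULL, &tv);
        if (rc >= 0 || errno != EINTR)
            break;
    }
    if (rc <= 0)
        return false;
    while (read(sigchld_pipe[0], buf, sizeof buf) > 0)
        ;   /* one wakeup covers all signals received so far */
    return true;
}
```

Raising SIGCHLD by hand is enough to exercise it: after selfpipe_init(), a call to raise(SIGCHLD) makes the next selfpipe_wait() return true immediately, and the following call times out because the pipe was drained. The pselect() alternative would instead keep SIGCHLD blocked and pass an unblocking mask to pselect(), which is the same atomic unblock-and-wait idea that sigsuspend() provides.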

apropos of nothing, i found this while reading the code:

diff --git a/src/subshell/common.c b/src/subshell/common.c
index 06699233c..ee6309900 100644
--- a/src/subshell/common.c
+++ b/src/subshell/common.c
@@ -354,5 +354,5 @@ init_subshell_child (const char *pty_name)
     /* Attach all our standard file descriptors to the pty */

-    /* This is done just before the fork, because stderr must still      */
+    /* This is done just before the exec, because stderr must still      */
     /* be connected to the real tty during the above error messages; */
     /* otherwise the user will never see them.                   */