Ticket #117 (new defect)

Opened 11 years ago

Last modified 6 years ago

savannah: consecutive resize events not handled correctly

Reported by: egmont Owned by:
Priority: minor Milestone: Future Releases
Component: mc-core Version: master
Keywords: Cc: ossi, zaytsev, egmont@…
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description (last modified by ossi) (diff)

Original: http://savannah.gnu.org/bugs/?17822

Submitted by:Egmont Koblinger <egmont>Submitted on:Fri 22 Sep 2006 01:04:44 PM UTC
Category:Screen outputSeverity:3 - Normal
Status:NonePrivacy:Public
Assigned to:NoneOpen/Closed:Open
Release:4.6.1Operating System:GNU/Linux

Original submission:

Modern desktop environments and window managers usually resize the
windows in an opaque way: plenty of resize events are sent to the
application while the edge of the window is being dragged. This
gives users a much better feedback than the ancient method (only
showing where the edges of the window will be, and resizing when
the mouse button is released), though it definitely requires much
more cpu resources.

Unfortunately mc is unable to properly handle when the terminal is
resized opaquely. It receives tons of resize events (sigwinch),
most likely ignores the new ones while processing an older one.
Quite often when I finish resizing my window, mc draws its panels
with a different size. Hence if I enlarge my terminal, I might get
black columns/rows on its right/bottom part. If I shrink the
window, the whole screen can be garbled.

Then when I press one key, or click somewhere in the terminal with
the mouse, suddenly mc's screen is repainted with the right size,
and everything goes on perfectly.

It would be nice to eliminate these race conditions and make mc
resize correctly to the final terminal size, no matter how many
resize events were sent in a very short time. 

Bonus (reported in #368): if you start mc in an xterm and resize the terminal before mc shows the panels, it will not be adjusted to the terminal's size until you resize again once the panels are there.

Comment 1 by Egmont Koblinger <egmont> at Tue 10 Oct 2006 03:27:46 PM UTC:

Normally mc stays in a select() call in key.c:get_event() which is
called by dialog.c:frontend_run_dlg().

frontend_run_dlg() checks if the window size has changed and calls
the proper functions in this case. Then it does a lot of other
things, then calls get_event() which yet again does a lot of other
things and finally arrives at that select().

A subsequent window resize event (SIGWINCH) causes this select to
return with -1 EINTR which is later handled correctly.

The problem is caused by the lot of code lines executed after
checking for a window size change, but before entering the select
call. If another window size change event occurs while executing
these commands, it will not be handled until select() exits due to
a keypress for example.

In key.c:get_event(), I placed the following line above the
"flag = select (...)" statement:
if (winch_flag) return EV_NONE;
and then my resizing problems were gone. So now I check again for
a resize event right before entering the select function.

Though this modification seems to solve my bug, another bug
appeared. Press F5 on ".." so an error dialog box appears. Resize
the window now. mc enters an infinite loop consuming 100% CPU time
and not reacting on any keypress. I don't yet know how to fix this
new bug. Swapping the new statement and the enable_interrupt_key()
call right before it doesn't help either.

On the other hand, even if we didn't have any side effects, my
patch still doesn't fix my original problem, only decreases the
possibility for this to occur. Still there is a chance that
SIGWINCH arrives right after checking for winch_flag but before
entering the select() call.

I'm just curious how to solve this problem 100% perfectly, without
any race condition, without the slightest possibility to misbehave.
Now I'm interested in the theory first, not in the implementation
details in mc. Let's suppose a single application that only wants
to do two things: process data arriving from several file
descriptors, and always correctly display the terminal size from
its main loop (i.e. not from a signal handler, since doing complex
things from a signal handler is just plain wrong). My only idea is
the following: create a pipe, the sigwinch handler writes a byte
into it, and the select() call checks for its reading end in
addition to the other file descriptors it is interested in. This
way the select() call immediately exits if there's an unprocessed
sigwinch event, no matter if it occured before or during this
select call. And of course we have to set the close-on-exec flag on
these fd's so that subprocesses don't inherit them.

Comment 2 by Egmont Koblinger <egmont> at Tue 10 Oct 2006 03:40:51 PM UTC:

Oh, I just found the pselect() call which is supposed to solve this
problem. However, its manpage says the Linux kernel supports this
only since 2.6.16 which is a quite new piece. It says that glibc
had an emulation which is vulnerable to the race condition I just
mentioned and pselect was just introduced for.
The manpage also mentions and recommends the self-pipe trick which
is more portable.

Comment 3 by Egmont Koblinger <egmont> at Tue 10 Oct 2006 04:22:16 PM UTC:

Added a resize.patch. This does not suffer from the new bug, though
I don't understand what made a difference.

Still a rewrite to using pipes is needed to fill a minor race condition.

Comment 4 by Leonard den Ottolander <leonardjo> at Fri 03 Nov 2006 07:14:55 PM UTC:

Shall I commit this patch or should I wait for the piped version?

Comment 5 by Egmont Koblinger <egmont> at Mon 06 Nov 2006 12:01:58 PM UTC:

Please commit it if it seems to be okay for you (I've been using mc
4.6.1 with this patch for a month, and found no side effects, while
it makes mc much better when resizing the terminal). But please
don't yet close this bugreport to remind us that a proper fix is
still needed. This patch only makes it much better, but still
there's a small chance for the bug to appear. Until we have
a proper fix, it's good to apply a temporary hack to heavily
decrease the chance for the bug.

Comment 6 by Leonard den Ottolander <leonardjo> at Mon 06 Nov 2006 10:24:35 PM UTC:

As I understand your patch/hack, eliminating the timeouts on a
window resize event significantly reduces the chance mc is still
waiting inside the get_event() loop when a new resize event occurs.

Comment 7 by Leonard den Ottolander <leonardjo> at Wed 08 Nov 2006 01:38:24 PM UTC:

Committed. Quite an improvement in behaviour. Thank you.

Comment 8 by Denis Vlasenko <vda> at Mon 01 Oct 2007 03:21:44 PM UTC:

If you run without mouse support (mc -d), mc doesn't detect resizes.

You need to press a key for mc to resize itself to new window after
window is maximized or resized (press up/down arrow, Ctrl-O, almost
anything will work).

Tested on 4.6.2-pre1.

Comment 9 by Oswald Buddenhagen <ossi> at Mon 01 Oct 2007 09:03:01 PM UTC:

when a resize is sent before mc is completely up, it doesn't get
the geometry right, either - and yes, this happens about every time
an xterm -e mc is restored by session management on my system. so
the signal handler should be set up before the initial geometry is
queried.

Comment 10 by Pavel Tsekov <ptsekov> at Tue 02 Oct 2007 09:18:19 AM UTC:

Please, do not steal existing bug reports.

Attachments

10939-resize.patch (1.3 KB) - added by slavazanko 11 years ago.

Change History

comment:1 Changed 50 years ago by andrew_b

  • Blocking 2198 added

(In #2198) Replying to Spinal:

That's probably the bug introduced in new-generation of midnight commander, developed by Slavaz.

Definetely, no.

I don't have the same bug in mc-4.6.2-pre1 version used on my Debian Lenny server.

Strange. I've tried 4.6.1 from repo, and it doesn't work also. It seems, subshell resizing in NCurses-based mc didn't work ever.

Replying to egmont:

I've tested it with the following 4x2 combinations, on Mac OS 10.5:
{ 4.6.1, 4.6.2, 4.7.0.5, 4.7.2 } x { slang, ncurses }

It's always the ncurses build that doesn't resize properly, with any of those mc versions.

Correct, it is an NCurses-related bug only. But implementation of window resize handling in S-Lang-based mc is not quite correct.

In general, the implementation of SIGWINCH signal handling in mc should be rewritten. One of possible way is described in #117.

Changed 11 years ago by slavazanko

comment:1 Changed 10 years ago by styx

  • Milestone set to future releases

comment:2 Changed 10 years ago by andrew_b

  • severity set to no branch
  • Description modified (diff)

comment:3 Changed 9 years ago by ossi

i had a quick look at the code, and it is the very definition of "mess". :)

what should be used here is the so-called signal self-pipe trick which is employed to serialize signals into the regular event loop.
a pretty minimal qt-based example can be seen at http://websvn.kde.org/trunk/KDE/kdebase/workspace/kscreensaver/libkscreensaver/main.cpp?r1=737514&r2=737513&pathrev=737514&view=patch

that would also remove the need for "minimal" resize handling and special-casing of slang vs. ncurses. the current code isn't signal-safe anyway - you may still get a random crash during resizing.

the same also applies to the sigchld handler (or rather, the two of them, which is a problem in itself). so maybe it would be appropriate to re-purpose this bug to a general "clean up signal handling".

comment:4 Changed 9 years ago by ossi

  • Cc ossi@… added

comment:5 Changed 9 years ago by zaytsev

  • Cc zaytsev added
  • Version changed from 4.6.1 to master
  • Description modified (diff)
  • Milestone changed from Future Releases to 4.7

Moved an extra bit from #368 to this report.

comment:6 Changed 8 years ago by andrew_b

  • Branch state set to no branch
  • Milestone changed from 4.7 to Future Releases

comment:7 Changed 7 years ago by andrew_b

  • Blocking 2198 removed

comment:8 follow-up: ↓ 9 Changed 6 years ago by ossi

to fix this bug properly, one needs to make some improvements to the main loop.
what do you guys think of switching to the glib mainloop and using proper socket notifiers and timers instead of the current handcrafted solution?

comment:9 in reply to: ↑ 8 Changed 6 years ago by andrew_b

Replying to ossi:

what do you guys think of switching to the glib mainloop?

Yes, we think about that.

comment:10 Changed 6 years ago by ossi

  • Cc ossi added; ossi@… removed
  • Description modified (diff)
  • Reporter changed from slavazanko to egmont

comment:11 Changed 6 years ago by ossi

  • Description modified (diff)

comment:12 Changed 6 years ago by egmont

  • Cc egmont@… added
  • Priority changed from major to minor

Now that pselect() is widely available, one quick and easy way to properly fix the problem is to change the main select() call to a pselect() which blocks/unblocks sigwinch. Glib mainloop would of course be much nicer, but probably more work. For what it's worth, the bug never occurred to me since it was "almost fixed", the race condition window is so small.

comment:13 Changed 6 years ago by ossi

i don't really like that - the critical sections would have to be quite large, and it's a nightmare to make sure everything is covered.

note that the glib mainloop is no hard precondition. i just wouldn't want to think of hacking the existing mainloop(s) if a cleaner solution is planned anyway.

Note: See TracTickets for help on using tickets.