Ticket #3984 (new defect)

Opened 4 months ago

Last modified 4 months ago

mcdiff Segmentation fault in ydiff.c when calling g_utf8_offset_to_pointer()

Reported by: Bernward Owned by:
Priority: critical Milestone: Future Releases
Component: mcdiff Version: 4.8.22
Keywords: openbsd Cc: robert@…, egmont, howaboutsynergy@…
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description

The use of mcdiff, from command line as well as from the menu, often leads to a crash of mc with the message "Segmentation fault (core dumped)". This happens sooner with bigger windows and bigger files. I append a test case of two files a and b, which crashes always immediately at the command line "mcdiff a b" in an xterm W:1264 H:992.

Back Trace:

(gdb) bt
#0  0x000009234c7dafad in g_utf8_offset_to_pointer (str=Variable "str" is not available.
) at ../glib-2.58.3/glib/gutf8.c:358
#1  0x00000920d374b036 in dview_str_utf8_offset_to_pos (text=0x92397f3efe0 "aaaaaaaaaa", length=103) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/diffviewer/ydiff.c:622
#2  0x00000920d374a4d5 in dview_display_file (dview=Variable "dview" is not available.
) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/diffviewer/ydiff.c:2661
#3  0x00000920d3745966 in dview_callback (w=0x923cfe20400, sender=Variable "sender" is not available.
) at charsets.h:91
#4  0x00000920d36d36d6 in dlg_init (h=0x922d9e6f900) at widget-common.h:210
#5  0x00000920d36d4055 in dlg_run (h=0x922d9e6f900) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/lib/widget/dialog.c:1192
#6  0x00000920d3744dbf in dview_diff_cmd (f0=Variable "f0" is not available.
) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/diffviewer/ydiff.c:3484
#7  0x00000920d36f0809 in do_nc () at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/filemanager/midnight.c:999
#8  0x00000920d36b77f1 in main (argc=3, argv=0x7f7ffffce828) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/main.c:409
(gdb) quit

In https://developer.gnome.org/glib/stable/glib-Unicode-Manipulation.html#g-utf8-offset-to-pointer, I found the following:

Note that this function doesn't abort when reaching the end of str . Therefore you should be sure that offset is within string boundaries before calling that function. Call g_utf8_strlen() when unsure.

So the call of function g-utf8-offset-to-pointer() could be a problem. I made some test about this:

  1. Try: I replaced the body of dview_str_utf8_offset_to_pos() by {return length;}. This worked without crashes, but the lines in the diff view were wrongly adjusted by the count of 2-byte-characters in the line.
  1. My conclusion is, that the task of dview_str_utf8_offset_to_pos() is to return a kind of line-adjusting factor. This factor should be a little bit greater than the variable 'length', just by the count of 2-byte-characters in the line. As the glib manual as cited above recommends the use of g_utf8_strlen(), I tried to use it to avoid the crash. But then there is no more need to use dview_str_utf8_offset_to_pos(), because the line-adjusting factor may be calculated by:
    { return length + ( length - g_utf8_strlen (text, length) ); }
    
    I integrated this as in the patch appended, it works with correct line lengths and without crashes.
  1. In ydiff.c are 2 calls to g-utf8-offset-to-pointer(), both in dview_str_utf8_offset_to_pos(), at lines 622 and 640. My patch replace the second occurrence similar as the first, but I do not really understand what happens there and I did not test anything of the second replacement at line 640.
  1. The potentially risky function "g_utf8_offset_to_pointer" is also used in lib/strutil/strutilutf8.c at lines 897 and 905. Maybe, that should be checked too.

I am using OpenBSD 6.5 on GIGABYTE GB-BXBT-2807, here some infos from my system:

$ LC_MESSAGES=C mc -V
GNU Midnight Commander 4.8.22
Built with GLib 2.58.3
Using the S-Lang library with terminfo database
With builtin Editor
With subshell support as default
With support for background operations
With mouse support on xterm
With support for X11 events
With internationalization support
With multiple codepages support
Virtual File Systems: cpiofs, tarfs, sfs, extfs, ftpfs, sftpfs, fish, smbfs
Data types: char: 8; int: 32; long: 64; void *: 64; size_t: 64; off_t: 64;

$ uname -a
OpenBSD Brix 6.5 GENERIC.MP#0 amd64

$ LC_MESSAGES=C mc -F
Home directory: /home/bernward
Profile root directory: /home/bernward

[System data]
    Config directory: /etc/mc/
    Data directory:   /usr/local/share/mc/
    File extension handlers: /usr/local/libexec/mc/ext.d/
    VFS plugins and scripts: /usr/local/libexec/mc/
        extfs.d:        /usr/local/libexec/mc/extfs.d/
        fish:           /usr/local/libexec/mc/fish/

[User data]
    Config directory: /home/bernward/.config/mc/
    Data directory:   /home/bernward/.local/share/mc/
        skins:          /home/bernward/.local/share/mc/skins/
        extfs.d:        /home/bernward/.local/share/mc/extfs.d/
        fish:           /home/bernward/.local/share/mc/fish/
        mcedit macros:  /home/bernward/.local/share/mc/mc.macros
        mcedit external macros: /home/bernward/.local/share/mc/mcedit/macros.d/macro.*
    Cache directory:  /home/bernward/.cache/mc/

$ mc --configure-options
 '--with-subshell' '--enable-vfs-sftp' '--enable-vfs-smb' '--enable-charset' '--with-slang-includes=/usr/local/include' 'CONFIGURE_ENV=' 'LOCALBASE=/usr/local' 'LIBS=-lm' '--prefix=/usr/local' '--sysconfdir=/etc' '--mandir=/usr/local/man' '--infodir=/usr/local/info' '--localstatedir=/var' '--disable-silent-rules' '--disable-gtk-doc' 'CC=cc' 'CFLAGS=-O2 -pipe'

Best regards, Bernward.

Attachments

a (2 bytes) - added by Bernward 4 months ago.
test file a
b (704 bytes) - added by Bernward 4 months ago.
test file b
patch-src_diffviewer_ydiff_c_proposal (672 bytes) - added by Bernward 4 months ago.
patch proposal for src/diffviewer/ydiff.c

Change History

Changed 4 months ago by Bernward

  • Attachment a added

test file a

Changed 4 months ago by Bernward

  • Attachment b added

test file b

Changed 4 months ago by Bernward

patch proposal for src/diffviewer/ydiff.c

comment:1 Changed 4 months ago by egmont

  • Cc egmont added

Thanks for the report!

I'm sorry guys that I don't have time to take a closer look, but I'd like to point out that:

just by the count of 2-byte-characters in the line

UTF-8 characters can be longer than 2 bytes. What about them, are they correctly addressed?

+ return 2 * length - result;

This is extremely fishy to me. What's the exact semantic of the variables that require such a weird formula? I can't easily imagine it.

Also, a variable shouldn't be called "result" if its value is not the result (or extremely closely related to, as in the code line that gets removed in the patch).

comment:2 Changed 4 months ago by howaboutsynergy

  • Cc howaboutsynergy@… added
Note: See TracTickets for help on using tickets.