Ticket #3984 (new defect)
mcdiff Segmentation fault in ydiff.c when calling g_utf8_offset_to_pointer()
Reported by: | Bernward | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | Future Releases |
Component: | mcdiff | Version: | 4.8.22 |
Keywords: | openbsd | Cc: | robert@…, egmont, howaboutsynergy@… |
Blocked By: | Blocking: | ||
Branch state: | no branch | Votes for changeset: |
Description
The use of mcdiff, from command line as well as from the menu, often leads to a crash of mc with the message "Segmentation fault (core dumped)". This happens sooner with bigger windows and bigger files. I append a test case of two files a and b, which crashes always immediately at the command line "mcdiff a b" in an xterm W:1264 H:992.
Back Trace:
(gdb) bt #0 0x000009234c7dafad in g_utf8_offset_to_pointer (str=Variable "str" is not available. ) at ../glib-2.58.3/glib/gutf8.c:358 #1 0x00000920d374b036 in dview_str_utf8_offset_to_pos (text=0x92397f3efe0 "aaaaaaaaaa", length=103) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/diffviewer/ydiff.c:622 #2 0x00000920d374a4d5 in dview_display_file (dview=Variable "dview" is not available. ) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/diffviewer/ydiff.c:2661 #3 0x00000920d3745966 in dview_callback (w=0x923cfe20400, sender=Variable "sender" is not available. ) at charsets.h:91 #4 0x00000920d36d36d6 in dlg_init (h=0x922d9e6f900) at widget-common.h:210 #5 0x00000920d36d4055 in dlg_run (h=0x922d9e6f900) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/lib/widget/dialog.c:1192 #6 0x00000920d3744dbf in dview_diff_cmd (f0=Variable "f0" is not available. ) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/diffviewer/ydiff.c:3484 #7 0x00000920d36f0809 in do_nc () at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/filemanager/midnight.c:999 #8 0x00000920d36b77f1 in main (argc=3, argv=0x7f7ffffce828) at /usr/ports/pobj/mc-4.8.22/mc-4.8.22/src/main.c:409 (gdb) quit
In https://developer.gnome.org/glib/stable/glib-Unicode-Manipulation.html#g-utf8-offset-to-pointer, I found the following:
Note that this function doesn't abort when reaching the end of str . Therefore you should be sure that offset is within string boundaries before calling that function. Call g_utf8_strlen() when unsure.
So the call of function g-utf8-offset-to-pointer() could be a problem. I made some test about this:
- Try: I replaced the body of dview_str_utf8_offset_to_pos() by {return length;}. This worked without crashes, but the lines in the diff view were wrongly adjusted by the count of 2-byte-characters in the line.
- My conclusion is, that the task of dview_str_utf8_offset_to_pos() is to return a kind of line-adjusting factor. This factor should be a little bit greater than the variable 'length', just by the count of 2-byte-characters in the line. As the glib manual as cited above recommends the use of g_utf8_strlen(), I tried to use it to avoid the crash. But then there is no more need to use dview_str_utf8_offset_to_pos(), because the line-adjusting factor may be calculated by:
{ return length + ( length - g_utf8_strlen (text, length) ); }
I integrated this as in the patch appended, it works with correct line lengths and without crashes.
- In ydiff.c are 2 calls to g-utf8-offset-to-pointer(), both in dview_str_utf8_offset_to_pos(), at lines 622 and 640. My patch replace the second occurrence similar as the first, but I do not really understand what happens there and I did not test anything of the second replacement at line 640.
- The potentially risky function "g_utf8_offset_to_pointer" is also used in lib/strutil/strutilutf8.c at lines 897 and 905. Maybe, that should be checked too.
I am using OpenBSD 6.5 on GIGABYTE GB-BXBT-2807, here some infos from my system:
$ LC_MESSAGES=C mc -V GNU Midnight Commander 4.8.22 Built with GLib 2.58.3 Using the S-Lang library with terminfo database With builtin Editor With subshell support as default With support for background operations With mouse support on xterm With support for X11 events With internationalization support With multiple codepages support Virtual File Systems: cpiofs, tarfs, sfs, extfs, ftpfs, sftpfs, fish, smbfs Data types: char: 8; int: 32; long: 64; void *: 64; size_t: 64; off_t: 64; $ uname -a OpenBSD Brix 6.5 GENERIC.MP#0 amd64 $ LC_MESSAGES=C mc -F Home directory: /home/bernward Profile root directory: /home/bernward [System data] Config directory: /etc/mc/ Data directory: /usr/local/share/mc/ File extension handlers: /usr/local/libexec/mc/ext.d/ VFS plugins and scripts: /usr/local/libexec/mc/ extfs.d: /usr/local/libexec/mc/extfs.d/ fish: /usr/local/libexec/mc/fish/ [User data] Config directory: /home/bernward/.config/mc/ Data directory: /home/bernward/.local/share/mc/ skins: /home/bernward/.local/share/mc/skins/ extfs.d: /home/bernward/.local/share/mc/extfs.d/ fish: /home/bernward/.local/share/mc/fish/ mcedit macros: /home/bernward/.local/share/mc/mc.macros mcedit external macros: /home/bernward/.local/share/mc/mcedit/macros.d/macro.* Cache directory: /home/bernward/.cache/mc/ $ mc --configure-options '--with-subshell' '--enable-vfs-sftp' '--enable-vfs-smb' '--enable-charset' '--with-slang-includes=/usr/local/include' 'CONFIGURE_ENV=' 'LOCALBASE=/usr/local' 'LIBS=-lm' '--prefix=/usr/local' '--sysconfdir=/etc' '--mandir=/usr/local/man' '--infodir=/usr/local/info' '--localstatedir=/var' '--disable-silent-rules' '--disable-gtk-doc' 'CC=cc' 'CFLAGS=-O2 -pipe'
Best regards, Bernward.
Attachments
Change History
Changed 6 years ago by Bernward
- Attachment patch-src_diffviewer_ydiff_c_proposal added
patch proposal for src/diffviewer/ydiff.c
comment:1 Changed 6 years ago by egmont
- Cc egmont added
Thanks for the report!
I'm sorry guys that I don't have time to take a closer look, but I'd like to point out that:
just by the count of 2-byte-characters in the line
UTF-8 characters can be longer than 2 bytes. What about them, are they correctly addressed?
+ return 2 * length - result;
This is extremely fishy to me. What's the exact semantic of the variables that require such a weird formula? I can't easily imagine it.
Also, a variable shouldn't be called "result" if its value is not the result (or extremely closely related to, as in the code line that gets removed in the patch).
test file a