Ticket #1629 (closed defect: fixed)

Opened 10 years ago

Last modified 8 years ago

[Patch] Problems displaying UTF-8 manual pages

Reported by: egmont Owned by: slavazanko
Priority: major Milestone: 4.8.0-pre1
Component: mcview Version: 4.7.0-pre3
Keywords: Cc:
Blocked By: Blocking:
Branch state: no branch Votes for changeset: commited-master commited-stable

Description

Formatted manual pages using UTF-8 charset are handled incorrectly by mc-4.7-pre2.

Problem 1: Accented characters in bold (yellow) text are displayed incorrectly: the letter, followed by a dot, then the letter once again, all in white.

Problem 2: Search does not find text that are bold or underlined (yellow or red) and contain accented letters. Exception: it finds if only the first character is accented, but highlights the match incorrectly.

I attach such a formatted manpage (emitted by groff-utf8) for your convenience - you might have to press F9 in mcview.

Attachments

chsh-man-hu-formatted.txt (1.7 KB) - added by egmont 10 years ago.
Formatted UTF-8 manual page
chsh-man-hu-formatted.zip (1.0 KB) - added by egmont 10 years ago.
Same in .zip so that you can download it
mc-4.7-pre2-utf8-nroff-proof-of-concept.patch (1.2 KB) - added by egmont 10 years ago.
proof of concept only - not a ready patch
mc-4.7.0.2-utf8-nroff-proof-of-concept.patch (1.2 KB) - added by egmont 10 years ago.

Change History

Changed 10 years ago by egmont

Formatted UTF-8 manual page

Changed 10 years ago by egmont

Same in .zip so that you can download it

comment:1 Changed 10 years ago by andrew_b

  • Component changed from mc-core to mcview

comment:2 Changed 10 years ago by egmont

I attach a proof of concept patch. This patch fixes displaying UTF-8 nroff pages, but probably break non-utf8 ones. It doesn't address the issues with search either.

So this patch is not to be applied in its current form, it needs more work. It just shows where the bugs are: "from - 2" obviously assumes single byte locales, so does mcview_get_byte to get c_next, as well as g_ascii_isprint (there's a fixme entry about it somewhere else in that file).

Changed 10 years ago by egmont

proof of concept only - not a ready patch

comment:3 Changed 10 years ago by slavazanko

  • Status changed from new to accepted
  • Owner set to slavazanko
  • severity changed from no branch to on rework
  • Milestone changed from 4.7 to 4.7.0-pre4

created branch 1629_nroff_utf8 (parent: master)

changeset:7adcdc570a07353ad64bc017352eb245dff3753b

Please, test it. I'm tested on:

*) with option --enable-charset in utf-8 codepage
*) with option --enable-charset in one-byte codepage
*) with option --disable-charset in utf-8 codepage (old behavior like no patch)
*) with option --disable-charset one-byte codepage (old behavior like no patch)

With '--disable-charset' option mans looks as no patch present.

Keep here your comments about this, please.

comment:4 Changed 10 years ago by egmont

Haven't tested, but the patch looks good. One question, though:

What's the point in reverting c_prev's assignment for the "no charset" case? I think my version is cleaner here: the source is shorter (you save one mcview_get_byte() call and one #ifdef), plus, the charset and the no charset code becomes much more similar to each other. I think having two different code paths for setting c_prev makes the code harder to understand and maintain.

comment:5 Changed 10 years ago by slavazanko

  • Version changed from 4.7.0-pre2 to 4.7.0-pre3
  • severity changed from on rework to on review

Branch rebased. Also, fixed c_prev's assignment

initial changeset:95311b5e5a0cc8d0c4b79a93dabe906319fe533c

Review, Please.

comment:6 Changed 10 years ago by egmont

I cannot compile&run right now, but thoroughly looking at the patch it looks good. If you've tested, please go ahead and apply, thanks!

Please don't forget that it only addresses the first issue in this bug, not the second one. That one's still open.

comment:7 Changed 10 years ago by slavazanko

  • severity changed from on review to on rework

yep, i forgot about second part of bugreport :)

comment:8 Changed 10 years ago by slavazanko

  • Milestone changed from 4.7.0-pre4 to 4.7

comment:9 Changed 10 years ago by egmont

Friendly ping... What happened to the changesets created by slavazanko? They are no longer accessible.

For your convenience I attach an updated version of my patch (rebased on 4.7.0.2).

Even though this patch only fixes the first part of the bug, I'd appreciate if you went ahead and submitted the fix. Later we can build on top of this and fix searching in manpages too.

Changed 10 years ago by egmont

comment:10 Changed 9 years ago by egmont

  • Summary changed from Problems displaying UTF-8 manual pages to [Patch] Problems displaying UTF-8 manual pages

Ping...

You nearly submitted the patch 6 months ago, and since then nothing happened. Moreover, your changesets are no longer available - why?

I'd be happy to work on the second part of the story, but only if the first part is already submitted - I don't have time to waste for creating patches if they are ignored by the developers. Thanks for your understanding.

comment:11 Changed 8 years ago by slavazanko

  • severity changed from on rework to on review
  • Milestone changed from 4.7 to 4.8.0-pre1

Recreated branch changeset:1629_nroff_utf8.

Initial changeset:f5055c78684694a7a81575c30ee9815927b81d4c

Branch have resolutions of two issues:

  • show UTF-8 symbols in nroff'ed text and
  • UTF-8 search over nroff'ed text.

Review, please

comment:12 Changed 8 years ago by slavazanko

  • Keywords stable-candidate added

comment:13 Changed 8 years ago by slavazanko

  • Blocking 265 added

comment:14 Changed 8 years ago by angel_il

  • Votes for changeset set to angel_il

comment:15 Changed 8 years ago by andrew_b

  • Votes for changeset changed from angel_il to angel_il andrew_b
  • severity changed from on review to approved

comment:16 Changed 8 years ago by slavazanko

  • Status changed from accepted to testing
  • Votes for changeset changed from angel_il andrew_b to commited-master
  • Resolution set to fixed
  • Blocking 265 removed
  • severity changed from approved to merged

Merge changeset:d7fd84507c063adb79b02b544d07203b73518831

Commits:

git log --pretty=oneline 7d52669..79660a7

comment:17 Changed 8 years ago by slavazanko

  • Keywords stable-candidate removed
  • Status changed from testing to closed
  • Votes for changeset changed from commited-master to commited-master commited-stable
  • Branch state set to no branch

cherry-picked in stable branch:

git log --pretty=oneline 84cb81e5..52047442
Note: See TracTickets for help on using tickets.