Ticket #1539 (closed defect: fixed)

Opened 10 years ago

Last modified 5 years ago

Dealing with utf-8 man pages in view/open

Reported by: dmartina
Owned by: slavazanko
Priority: minor
Milestone: 4.8.13
Component: mcview
Version: 4.7.0-pre1
Keywords: utf8 man
Cc: dmartina@…, egmont@…
Blocked By:
Blocking:
Branch state: merged
Votes for changeset: committed-master

Description

Weird characters are displayed when viewing/opening man page files.

Attachments

mc-1539-man-pipeline.patch (790 bytes) - added by egmont 5 years ago.
Demo fix
mc-1539-man-pipeline-v2.patch (931 bytes) - added by egmont 5 years ago.
Demo fix v2

Change History

comment:1 Changed 10 years ago by dmartina

  • Cc dmartina@… added

nroff filter was run with -Tlatin1. "man" could do the job by itself:

... { zsoelim %f 2>/dev/null || cat %f; } | man -l -Tutf8 - ;; esac

(extensions file, tested in Ubuntu 8.04)

Changes to the autoconf scripts are needed, as this solution may not be portable to other systems.

Last edited 6 years ago by andrew_b

comment:2 Changed 10 years ago by angel_il

  • Milestone changed from 4.7.0-pre3 to 4.7.0-pre4

comment:3 Changed 9 years ago by slavazanko

  • Milestone changed from 4.7.0-pre4 to 4.7

comment:4 Changed 7 years ago by andrew_b

  • Branch state set to no branch
  • Milestone changed from 4.7 to Future Releases

comment:5 Changed 6 years ago by lemzwerg

Ticket #2922 gives a better solution which seems to be portable.

comment:6 Changed 5 years ago by egmont

  • Cc egmont@… added

Friendly ping :)

After 5 years, this is still an issue.

On Ubuntu 14.04, standing in <mc_source>/doc/man/ru, typing "man ./mc.1" brings up the manual correctly in less, but "mcview mc.1" (or F3 in mc) does something quite broken.

Since the ticket was created, UTF-8 has become far more widely adopted and is definitely the standard by now. Also, systems (at least Linuxes) have upgraded their groff package to a new version that properly supports UTF-8.

You can just type "man mc" or similar on the command line, and all the accents appear correctly, at least for those languages that are supported by all graphical terminal emulators nowadays: left-to-right languages without combining characters (e.g. Latin, Cyrillic, Greek, CJK scripts).

This should work equally well, out of the box, in mc in UTF-8 environments. (With other locales or legacy systems, it's a nice bonus if we can get them to work, but that is way less important than UTF-8 and is getting less important day by day.)

comment:7 Changed 5 years ago by egmont

See also ticket #3243 comment 1.

Last edited 5 years ago by egmont

Changed 5 years ago by egmont

Demo fix

comment:8 Changed 5 years ago by egmont

This is a demo fix that works for me and fixes the accents on Ubuntu Trusty (man-db 2.6.7.1), in UTF-8 environment when you press F3 on a manual page file.

The whole man-zsoelim-tbl-eqn-troff-nroff-idontknowwhat pipeline is terribly complicated (I don't understand it at all), and IMO it is one of the worst parts of the Unix system and should have died out decades ago. It didn't, so we have to live with this...

But understanding the pipeline and starting in the middle leads to something that probably no one understands and has other subtle bugs (e.g. #2921).

So, in my opinion the best we can do is not to care about any of the internals, just use the most user-facing frontend: the "man" command. This is the command that knows how to take care of everything: invoking the correct filters, handling the charset correctly, etc.

Luckily "man" has an option ("-l") to take a local file rather than looking up the manpage along the standard manpath.

When the output is not a tty (which is the case here), "man" seems to ignore the pager and remove all formatting by default. The option "-P cat" is hence totally useless, but it's a nice safeguard against possible different man implementations, to make sure they don't mess up anything if they invoke the pager.

The environment variable MAN_KEEP_FORMATTING forces "man" to keep the formatting sequences for bold and underlined, even if the output is not a tty.

I don't know if all "man" implementations support the "-l" flag. If not, we need ugly conditions in configure. If yes, we should probably remove the check for nroff from configure, and remove manual invocations of nroff throughout the source (that is, change all the code following the current patch's spirit).

We should check if we should pass -D to man to make it more robust (ignore MANOPT). Also, we should find the option that guarantees that it produces the old-fashioned codes for bold and underlined (as it does by default) rather than real ANSI color escape sequences (which it can somehow be configured to do -- but for mc we should force it not to).
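For illustration, the invocation described in this comment could look roughly like the sketch below. It is not the attached patch: the function name render_man_page is made up, and it assumes a man-db style "man" that supports "-l", "-P" and MAN_KEEP_FORMATTING.

```shell
# Hypothetical helper; the name is not taken from the mc sources.
# Assumes a man-db style "man" that supports -l (local file) and -P (pager).
render_man_page() {
    # MAN_KEEP_FORMATTING keeps the bold/underline sequences even though
    # stdout is not a tty.
    # -P cat is a safeguard for implementations that would start a pager anyway.
    MAN_KEEP_FORMATTING=1 man -P cat -l "$1"
}

# Usage (requires man to be installed):
#   render_man_page ./mc.1 > /tmp/mc.1.formatted
```

The charset handling is left entirely to "man", which is the point of the approach.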

comment:9 follow-up: ↓ 12 Changed 5 years ago by egmont

Note that a very similar patch in #3243 causes the manpage to be formatted to match the terminal's width there, whereas in this ticket the manpage is formatted for 80 columns. I don't know why.

comment:10 follow-up: ↓ 11 Changed 5 years ago by lemzwerg

Regarding the troff pipeline: This is the very reason why there exists the groff program: It constructs the necessary calls of the pipeline in the right order.

Basically, using man seems to be a good option. On the other hand, it's an additional dependency, but I guess that people who are going to look for man pages do have man installed...

comment:11 in reply to: ↑ 10 Changed 5 years ago by egmont

Replying to lemzwerg:

This is the very reason why there exists the groff program: It constructs the necessary calls of the pipeline in the right order.

I'm open to any solution that's better than mine :) If you could come up with a patch using groff rather than man, that would be great.

(This whole man pipeline has always been a mystery to me and I'm not planning to get any more familiar with it than absolutely necessary to find one working solution.)

comment:12 in reply to: ↑ 9 Changed 5 years ago by egmont

Replying to egmont:

[...] whereas in this ticket the manpage is formatted for 80 columns.

So, with my patch, pressing F3 on a compressed manpage formats it to 80 columns, pressing F3 on an uncompressed manpage formats it according to the terminal's width.

It seems that "man" tries to figure out the width by first looking at $COLUMNS; if that is not set, it queries its stdin's tty settings, and finally defaults to 80.

The solution is either to modify my patch to uncompress to a temporary file and pass that file to man rather than feeding it on its stdin, or to modify mc to set $COLUMNS for its child processes.

Anyway, it's a really minor issue compared to the original bug.
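The lookup order described in this comment (environment variable, then the tty on stdin, then 80) can be sketched as follows. This only illustrates the behaviour as observed here, not man's actual code; guess_width is a made-up name, and the variable is assumed to be COLUMNS.

```shell
# Hypothetical illustration of the width lookup order described above.
guess_width() {
    if [ -n "${COLUMNS:-}" ]; then
        # 1. An explicit width from the environment wins.
        echo "$COLUMNS"
    elif [ -t 0 ] && size=$(stty size 2>/dev/null); then
        # 2. stdin is a tty: "stty size" prints "rows cols", keep the cols.
        echo "${size#* }"
    else
        # 3. stdin is a pipe or file: fall back to 80 columns.
        echo 80
    fi
}
```

With the page fed through a pipe, stdin is not a tty, so step 2 is skipped and the result is 80 unless the variable is set. That matches the compressed versus uncompressed difference seen with the first patch.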

comment:13 Changed 5 years ago by egmont

Actually, "man" can take care of uncompressing the given file. This leads to the simplest possible solution for the width discrepancy, see the updated patch.

Changed 5 years ago by egmont

Demo fix v2

comment:14 Changed 5 years ago by egmont

Patch updated to make it work on Fedora 20 too. Unlike Ubuntu, Fedora's man uses the new-style ANSI color escape sequences for bold/underline rather than the backspace-overwrite sequence. To revert to the old-style backspace-overwrite sequence which is understood by mcview, a "-c" has to be passed to *roff.
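For readers unfamiliar with the old-style sequence: bold is a character, a backspace, and the same character again; underline is an underscore, a backspace, and the character. The sketch below removes such pairs to recover plain text. strip_overstrike is a made-up name, and the pattern is byte-oriented, so it is only reliable for ASCII input or in a UTF-8 locale.

```shell
# Hypothetical helper: drop every "X<backspace>" pair, keeping the
# character that was printed last at that position.
strip_overstrike() {
    bs=$(printf '\b')      # a literal backspace character
    sed "s/.${bs}//g"
}

# "Bold" in overstruck bold, then an underlined "u":
#   printf 'B\bBo\bol\bld\bd _\bu\n' | strip_overstrike
# prints: Bold u
```

mcview interprets these pairs itself, so nothing needs to be stripped there; the helper only shows what the byte stream looks like.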

comment:15 Changed 5 years ago by slavazanko

  • Owner set to slavazanko
  • Status changed from new to accepted

comment:16 Changed 5 years ago by slavazanko

  • Branch state changed from no branch to on review

Created branch 1539_utf8_man
initial changeset:6229a775353a2e0bfca8fcc402dbf8d2630df459.

Last edited 5 years ago by andrew_b

comment:17 Changed 5 years ago by slavazanko

  • Votes for changeset set to slavazanko
  • Branch state changed from on review to approved

comment:18 Changed 5 years ago by slavazanko

  • Status changed from accepted to testing
  • Votes for changeset changed from slavazanko to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

comment:19 Changed 5 years ago by slavazanko

  • Status changed from testing to closed
  • Milestone changed from Future Releases to 4.8.13

comment:20 follow-up: ↓ 21 Changed 5 years ago by egmont

Hi Slava,

Could you please also take care of #3243? It's a very similar problem, with an identical fix to this one.

There's also some configure check that verifies if nroff supports -c, I haven't paid attention to that. Maybe the hardcoded -c could be replaced by some @NROFF_WHATEVER@. Unfortunately I can't verify my patch on systems other than Ubuntu and Redhat (especially non-Linuxes).

comment:21 in reply to: ↑ 20 ; follow-up: ↓ 22 Changed 5 years ago by andrew_b

Replying to egmont:

There's also some configure check that verifies if nroff supports -c, I haven't paid attention to that. Maybe the hardcoded -c could be replaced by some @NROFF_WHATEVER@.

We already have a check for nroff and its flags in configure.ac (lines 62..110).

comment:22 in reply to: ↑ 21 Changed 5 years ago by egmont

Replying to andrew_b:

We already have a check for nroff and its flags in configure.ac (lines 62..110).

Yup, but I'm not using its result in my patch :( I haven't completed those bits, sorry.

That's why I think "-c" should be replaced by some placeholder in that patch. I'm not sure, I'm not an autoconf/automake magician.
