Ticket #1539 (closed defect: fixed)
Dealing with utf-8 man pages in view/open
Reported by: | dmartina | Owned by: | slavazanko |
---|---|---|---|
Priority: | minor | Milestone: | 4.8.13 |
Component: | mcview | Version: | 4.7.0-pre1 |
Keywords: | utf8 man | Cc: | dmartina@…, egmont@… |
Blocked By: | | Blocking: | |
Branch state: | merged | Votes for changeset: | committed-master |
Description
Weird characters are displayed when viewing/opening man page files.
Attachments
Change History
comment:4 Changed 13 years ago by andrew_b
- Branch state set to no branch
- Milestone changed from 4.7 to Future Releases
comment:5 Changed 12 years ago by lemzwerg
Ticket #2922 gives a better solution which seems to be portable.
comment:6 Changed 10 years ago by egmont
- Cc egmont@… added
Friendly ping :)
After 5 years, this is still an issue.
On Ubuntu 14.04, standing in <mc_source>/doc/man/ru, typing "man ./mc.1" brings up the manual correctly in less, but "mcview mc.1" (or F3 in mc) does something quite broken.
Since creating the ticket, UTF-8 became way more adopted and is definitely the standard by now. Also, systems (at least Linuxes) have upgraded their groff package to a new version that properly supports UTF-8.
You can just type "man mc" or similar on the command line, and all the accents appear correctly, at least for those languages that are supported by all graphical terminal emulators nowadays: left-to-right languages without combining characters (e.g. Latin, Cyrillic, Greek, CJK scripts).
This should work equally well, out of the box, in mc in UTF-8 environments. (With other locales or legacy systems, it's a nice bonus if we can get them to work, but that is far less important than UTF-8 and is getting less important day by day.)
comment:8 Changed 10 years ago by egmont
This is a demo fix that works for me and fixes the accents on Ubuntu Trusty (man-db 2.6.7.1), in UTF-8 environment when you press F3 on a manual page file.
The whole man-zsoelim-tbl-eqn-troff-nroff-idontknowwhat pipeline is terribly complicated (I don't understand it at all), and IMO it is one of the worst parts of Unix systems and should have died out decades ago. It didn't, so we have to live with this...
But, understanding the pipeline and starting in the middle leads to something that probably no one understands and that has other subtle bugs (e.g. #2921).
So, in my opinion the best we can do is not to care about any of the internals, just use the most user-facing frontend: the "man" command. This is the command that knows how to take care of everything: invoking the correct filters, handling the charset correctly, etc.
Luckily "man" has an option ("-l") to take a local file rather than looking up the manpage along the standard manpath.
When the output is not a tty (which is the case here), "man" seems to ignore the pager and remove all formatting by default. The option "-P cat" is hence totally useless, but it's a nice safeguard against possible different man implementations, to make sure they don't mess up anything if they invoke the pager.
The environment variable MAN_KEEP_FORMATTING forces "man" to keep the formatting sequences for bold and underlined, even if the output is not a tty.
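The invocation described so far can be sketched as follows. This is only an illustration in Python, not the patch itself (in mc the command would live in the extension file as a shell command); the helper names `build_man_command` and `render_man_page` are hypothetical. It uses only the pieces named above: "-l" for a local file, "-P cat" as a pager safeguard, and MAN_KEEP_FORMATTING in the environment.

```python
import os
import subprocess


def build_man_command(path, base_env=None):
    """Return (argv, env) for formatting a local man page file.

    Hypothetical helper: it only assembles the options discussed in
    this ticket and does not run anything.
    """
    # Copy the environment so the caller's mapping is not modified.
    env = dict(os.environ if base_env is None else base_env)
    # Keep bold/underline sequences even though stdout is not a tty.
    env["MAN_KEEP_FORMATTING"] = "1"
    # -P cat: never start an interactive pager.
    # -l:     treat the argument as a local file, not a page name.
    argv = ["man", "-P", "cat", "-l", path]
    return argv, env


def render_man_page(path):
    """Run man on the file and return its formatted output as bytes."""
    argv, env = build_man_command(path)
    result = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

Calling `render_man_page("./mc.1")` assumes a "man" that supports "-l", which is exactly the portability question raised below.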
I don't know if all "man" implementations support the "-l" flag. If not, we need ugly conditions in configure. If yes, we should probably remove the check for nroff from configure, and remove manual invocations of nroff throughout the source (that is, change all the code in the spirit of the current patch).
We should check if we should pass -D to man to make it more robust (ignore $MANOPT). Also, we should find the option that guarantees that it produces the old-fashioned codes for bold and underlined (as it does by default) rather than real ANSI color escape sequences (which it can somehow be configured to do -- but for mc we should force it not to).
comment:9 follow-up: ↓ 12 Changed 10 years ago by egmont
Note that a very similar patch in #3243 causes the manpage to be formatted to match the terminal's width there, whereas in this ticket the manpage is formatted for 80 columns. I don't know why.
comment:10 follow-up: ↓ 11 Changed 10 years ago by lemzwerg
Regarding the troff pipeline: This is the very reason why there exists the groff program: It constructs the necessary calls of the pipeline in the right order.
Basically, using man seems to be a good option. On the other hand, it's an additional dependency, but I guess that people who are going to look for man pages do have man installed...
comment:11 in reply to: ↑ 10 Changed 10 years ago by egmont
Replying to lemzwerg:
This is the very reason why there exists the groff program: It constructs the necessary calls of the pipeline in the right order.
I'm open to any solution that's better than mine :) If you could come up with a patch using groff rather than man, that would be great.
(This whole man pipeline has always been a mystery to me and I'm not planning to get any more familiar with it than absolutely necessary to find one working solution.)
comment:12 in reply to: ↑ 9 Changed 10 years ago by egmont
Replying to egmont:
[...] whereas in this ticket the manpage is formatted for 80 columns.
So, with my patch, pressing F3 on a compressed manpage formats it to 80 columns, pressing F3 on an uncompressed manpage formats it according to the terminal's width.
Seems that "man" tries to figure out the width by first looking at $COLUMNS; if it's not set, it queries its stdin's tty settings, and finally defaults to 80.
The solution is either to modify my patch to uncompress to a temporary file and pass that file to man rather than feeding it on its stdin, or to modify mc to set $COLUMNS for its child processes.
Anyway, it's a really minor issue compared to the original bug.
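The lookup order guessed at above can be modeled in a few lines. This is a sketch of the behavior as observed in this comment, not man's actual code; the function name `guess_width` and its parameters are hypothetical, and the standard COLUMNS variable is assumed.

```python
import os


def guess_width(environ, stdin_fd=0, default=80):
    """Model the observed width lookup: COLUMNS, then the tty, then 80."""
    value = environ.get("COLUMNS", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    try:
        # Raises OSError when the descriptor is not a terminal,
        # e.g. when the page is fed through a pipe.
        return os.get_terminal_size(stdin_fd).columns
    except (OSError, ValueError):
        return default
```

With a pipe on stdin and no COLUMNS set, this falls through to 80, which matches what happens when a decompressed page is piped into man.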
comment:13 Changed 10 years ago by egmont
Actually, "man" can take care of uncompressing the given file. This leads to the simplest possible solution for the width discrepancy, see the updated patch.
comment:14 Changed 10 years ago by egmont
Patch updated to make it work on Fedora 20 too. Unlike Ubuntu, Fedora's man uses the new-style ANSI color escape sequences for bold/underline rather than the backspace-overwrite sequence. To revert to the old-style backspace-overwrite sequence which is understood by mcview, a "-c" has to be passed to *roff.
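For readers unfamiliar with the old-style sequence: bold is conventionally written as a character, a backspace, and the same character again, while underline is an underscore, a backspace, and the character. The decoder below is a minimal sketch of that convention, not mcview's implementation; `decode_overstrike` is a hypothetical name, and repeated overstrikes are not handled.

```python
def decode_overstrike(text):
    """Split backspace-overwrite text into (character, style) pairs.

    Style is "bold", "underline" or "plain". Works on decoded text
    (Python str), so multi-byte UTF-8 characters count as one unit.
    """
    cells = []
    i = 0
    n = len(text)
    while i < n:
        # Look for the three-element pattern: X, backspace, Y.
        if i + 2 < n and text[i + 1] == "\b":
            first, second = text[i], text[i + 2]
            if first == "_":
                # "_\bY" underlines Y (an ambiguous "_\b_" lands here too).
                cells.append((second, "underline"))
            elif first == second:
                # "Y\bY" prints Y twice in place, i.e. bold.
                cells.append((second, "bold"))
            else:
                # Unknown overstrike: keep the character that ends up visible.
                cells.append((second, "plain"))
            i += 3
        else:
            cells.append((text[i], "plain"))
            i += 1
    return cells
```

Operating on decoded characters rather than bytes matters here: a viewer that pairs up single bytes around the backspace would split multi-byte UTF-8 characters.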
comment:15 Changed 10 years ago by slavazanko
- Status changed from new to accepted
- Owner set to slavazanko
comment:16 Changed 10 years ago by slavazanko
- Branch state changed from no branch to on review
Created branch 1539_utf8_man initial changeset:6229a775353a2e0bfca8fcc402dbf8d2630df459.
Review please.
comment:17 Changed 10 years ago by slavazanko
- Votes for changeset set to slavazanko
- Branch state changed from on review to approved
comment:18 Changed 10 years ago by slavazanko
- Status changed from accepted to testing
- Votes for changeset changed from slavazanko to committed-master
- Resolution set to fixed
- Branch state changed from approved to merged
Merged to master. Merge changeset:903c5c926ddbad7b2d2716e5d169dad34422f4c8
comment:19 Changed 10 years ago by slavazanko
- Status changed from testing to closed
- Milestone changed from Future Releases to 4.8.13
comment:20 follow-up: ↓ 21 Changed 10 years ago by egmont
Hi Slava,
Could you please also take care of #3243? It's a very similar problem, with identical fix to this one.
There's also some configure check that verifies if nroff supports -c; I haven't paid attention to that. Maybe the hardcoded -c could be replaced by some @NROFF_WHATEVER@. Unfortunately I can't verify my patch on systems other than Ubuntu and Red Hat (especially non-Linuxes).
comment:21 in reply to: ↑ 20 ; follow-up: ↓ 22 Changed 10 years ago by andrew_b
Replying to egmont:
There's also some configure check that verifies if nroff supports -c, I haven't paid attention to that. Maybe the hardcoded -c could be replaced by some @NROFF_WHATEVER@.
We already have a check of nroff and its flags in configure.ac (lines 62..110).
comment:22 in reply to: ↑ 21 Changed 10 years ago by egmont
Replying to andrew_b:
We already have a check of nroff and its flags in configure.ac (lines 62..110).
Yup, but I'm not using its result in my patch :( I haven't completed those bits, sorry.
That's why I think "-c" should be replaced by some placeholder in that patch. I'm not sure, I'm not an autoconf/automake magician.
(extensions file, tested in Ubuntu 8.04)
Changes in autoconf scripts are needed as this solution may not be portable to other systems