Ticket #2681 (closed defect: invalid)

Opened 8 years ago

Last modified 8 years ago

Hungarian translation seems to be partially double latin2->utf8 converted

Reported by: nice0051
Owned by:
Priority: minor
Milestone:
Component: translations
Version: master
Keywords:
Cc:
Blocked By:
Blocking:
Branch state: no branch
Votes for changeset:

Description

The mc menu seems to be correct; however, the man page and hints seem to be double converted. For example:

The Hungarian translation of "Force black and white display" in the man page looks like "Fekete-fehĂŠr megjelenĂtĂŠs kĂŠrĂŠse" instead of "Fekete-fehér megjelenítés kérése". A UTF-8 -> Latin-2 conversion could probably help in this situation.
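The garbled string is consistent with UTF-8 bytes being read as Latin-2 and then converted to UTF-8 a second time. A minimal sketch that reproduces the effect, assuming an iconv that knows ISO-8859-2 (the function name double_convert is made up for illustration and is not part of mc or of any package):

```shell
# Hypothetical helper: treat the UTF-8 input as if it were Latin-2 and
# convert it to UTF-8 again, i.e. the suspected double conversion.
double_convert() {
    iconv -f ISO-8859-2 -t UTF-8
}

# "fehér" written with octal escapes so the script itself stays ASCII;
# \303\251 is the UTF-8 encoding of "é".
printf 'feh\303\251r\n' | double_convert
```

On a UTF-8 terminal this prints "fehĂŠr", which matches the pattern seen in the man page.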

Change History

comment:1 Changed 8 years ago by nice0051

BTW I'm running mc 4.7.5 on openSUSE and my env contains this variable:

LANG=hu_HU.UTF-8

comment:2 follow-up: ↓ 4 Changed 8 years ago by andrew_b

Do you build mc yourself or use the package from the openSUSE repo?

comment:3 Changed 8 years ago by egmont

A couple of years ago I spent some time studying how accents (and especially UTF-8) were supposed to work with manpages. The situation was utterly broken, and I don't expect it to be much better now.

Manpages of mc-4.8.0 are properly encoded in UTF-8. I haven't checked older versions.

Unfortunately, it is unclear what charset applications (i.e. manpage viewers) expect for manpages. There are at least two competing command-line "man" applications, and distributions pick one of them more or less arbitrarily. The two work differently. At least one of them, maybe both, is backed by "groff" and friends, which didn't support UTF-8 for a long time. Some might try to guess the charset based on the file's content. Others might derive it from the directory name, e.g. /usr/share/man/hu contains files in Latin-2 while /usr/share/man/hu.utf8 contains UTF-8 files, or something along these lines. And so far we haven't even talked about other (graphical) manpage viewers.

The situation is so complicated that there's basically no way for MC to install the manpages properly in an automatic way. Installing them properly depends on a very complex setup of man/groff applications, their implementations, version numbers, config options etc., so this should be the job of distribution maintainers. They know what encodings their man system expects, and their build system could run automatic checks (e.g. automatically check for double UTF-8 and force the package maintainer to fix it).

There's one thing MC could do:

Beginning with groff-1.20.1 (or so), groff recognizes the codeset if it is declared in the manpage; namely, either the first or the second line has to be this:

.\" -*- mode: troff; coding: utf-8 -*-

and other conditions have to be met (such as command-line options to groff that make sure it invokes preconv); see the manpage of "preconv" in the groff distribution for details. MC could add this line; this would fix the characters of manpages on /some/ setups.
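A rough sketch of how such a tag could be checked for and added, without tagging a page twice. The helper names has_coding_tag and with_coding_tag are hypothetical, and the pattern only looks for an Emacs-style "coding:" tag in the first two lines; it does not verify that the named charset is actually correct:

```shell
# Hypothetical helper: succeed if one of the first two lines of the
# given man page source carries an Emacs-style "coding:" tag.
has_coding_tag() {
    head -n 2 "$1" | grep -q -e '-\*-.*coding:.*-\*-'
}

# Hypothetical helper: print the page with a UTF-8 tag prepended,
# unless some coding tag is already present.
with_coding_tag() {
    if ! has_coding_tag "$1"; then
        printf '%s\n' '.\" -*- mode: troff; coding: utf-8 -*-'
    fi
    cat "$1"
}
```

For example, `with_coding_tag mc.1 > mc_tagged.1` would write a tagged copy. Whether the tag has any effect still depends on preconv being invoked; when calling groff directly, the `-k` option asks it to run preconv first.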

(nice0051: just out of curiosity, could you please verify whether adding this as the first line of your manpage fixes the problem for you? köszi :))

comment:4 in reply to: ↑ 2 ; follow-up: ↓ 7 Changed 8 years ago by nice0051

Replying to andrew_b:

Do you build mc yourself or use the package from the openSUSE repo?

I use the standard openSUSE packages.

comment:5 Changed 8 years ago by nice0051

OK, I did this:

cd /usr/share/man/hu/man1
echo '.\" -*- mode: troff; coding: utf-8 -*-' > mc_b.1
gzip -dc mc.1.gz >> mc_b.1
gzip mc_b.1
man -l mc_b.1.gz

But it didn't help.

comment:6 Changed 8 years ago by nice0051

However, doing this definitely helps:

gzip -dc mc.1.gz | iconv -f utf-8 -t iso-8859-2 > mc_b.1
gzip mc_b.1
man -l mc_b.1.gz

comment:7 in reply to: ↑ 4 Changed 8 years ago by andrew_b

Replying to nice0051:

Replying to andrew_b:

Do you build mc yourself or use the package from the openSUSE repo?

I use the standard openSUSE packages.

Please check whether the extra conversion is made in the spec file.

comment:8 follow-up: ↓ 9 Changed 8 years ago by egmont

Could you please check whether the actual mc.1 file is encoded in UTF-8 or in some erroneous "double UTF-8"? It would be nice to know whether the file itself is mangled, or whether the runtime man display pipeline (groff etc.) mangles it.
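One way to run such a check is to undo a single assumed Latin-2 -> UTF-8 step and see whether the result is still valid UTF-8. The sketch below is a heuristic under that assumption, not a proof; the function name classify_encoding and its output labels are made up for illustration, and it relies on iconv rejecting invalid input with a non-zero exit status:

```shell
# Hypothetical helper: classify a file as "ascii", "utf-8",
# "double-utf-8" or "not-utf-8". This is a heuristic only.
classify_encoding() {
    file=$1
    # Count bytes outside the 7-bit ASCII range.
    high=$(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c)
    if [ "$high" -eq 0 ]; then
        echo ascii
        return
    fi
    # Not decodable as UTF-8 at all: possibly a legacy charset.
    if ! iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
        echo not-utf-8
        return
    fi
    # Undo one assumed Latin-2 -> UTF-8 step. If the result is still
    # valid UTF-8, the file was most likely converted twice.
    once=$(mktemp)
    if iconv -f UTF-8 -t ISO-8859-2 "$file" > "$once" 2> /dev/null &&
       iconv -f UTF-8 -t UTF-8 "$once" > /dev/null 2>&1; then
        echo double-utf-8
    else
        echo utf-8
    fi
    rm -f "$once"
}
```

Running it on the unpacked page, e.g. `gzip -dc mc.1.gz > mc.1 && classify_encoding mc.1`, would indicate whether the installed file is already mangled or whether the damage happens later in the display pipeline.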

Anyway, this sounds very much like something you should report to the openSUSE folks, and they should fix it. Converting manpages back from modern UTF-8 to legacy crappy Latin-x just to please broken manpage viewers is IMO not something mainstream MC should do.

comment:9 in reply to: ↑ 8 Changed 8 years ago by nice0051

Replying to egmont:

Could you please check whether the actual mc.1 file is encoded in UTF-8 or in some erroneous "double UTF-8"? It would be nice to know whether the file itself is mangled, or whether the runtime man display pipeline (groff etc.) mangles it.

You're right. mc.spec is full of iconv commands. It's a SUSE-specific problem.

https://bugzilla.novell.com/show_bug.cgi?id=733686

comment:10 Changed 8 years ago by andrew_b

  • Keywords hungarian utf8 removed
  • Status changed from new to closed
  • Resolution set to invalid
  • Milestone Future Releases deleted

Thanks!

Closed.
