Ticket #2975 (closed defect: invalid)

Opened 12 years ago

Last modified 8 years ago

Accents in .html file are badly shown with internal viewer

Reported by: albator Owned by:
Priority: major Milestone:
Component: mc-core Version: master
Keywords: Cc:
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description

1) I create a file named something.html (the .html extension matters for this bug). mcedit will then know the file contains HTML when I use the F3 editor.
2) I press F4 and type some accented letters into the file:
<html>
<head>
<title>test</title>
</head>
<body>
test échange paraître l'ouïe
</body>
</html>

3) I press F3 to view the file.
4) I see "test A(c)change paraA(R)tre l'ouA-e" on the screen.

I should see "test échange paraître l'ouïe", not "test A(c)change paraA(R)tre l'ouA-e".

It seems that the HTML interpreter doesn't work well with the F3 editor.

david@ordinateur ~/Documents/logiciels/COMPILIBRE $ mc -V
GNU Midnight Commander 4.8.3
Built with GLib 2.34.0
Using the S-Lang library with terminfo database
With builtin Editor
With subshell support as default
With support for background operations
With mouse support on xterm and Linux console
With support for X11 events
With internationalization support
With multiple codepages support
Virtual File Systems: cpiofs, tarfs, sfs, extfs, ext2undelfs, ftpfs, fish
Data types: char: 8; int: 32; long: 64; void *: 64; size_t: 64; off_t: 64;

Attachments

mcedit_bug_2.png (18.7 KB) - added by albator 12 years ago.
mcedit_bug_1.png (21.1 KB) - added by albator 12 years ago.

Change History

Changed 12 years ago by albator

Changed 12 years ago by albator

comment:1 Changed 12 years ago by andrew_b

  • Keywords mcedit html F3 F4 accent removed
  • Milestone changed from 4.8 to Future Releases
  • Version changed from 4.8.3 to master
  • Component changed from mcedit to mc-core
  • Summary changed from Accents in .html file are badly shown with F3 editor to Accents in .html file are badly shown with internal viewer

F3 calls the viewer, not the editor.

Duplicate of #.

Version 0, edited 12 years ago by andrew_b

comment:2 Changed 12 years ago by andrew_b

Try the latest release.

comment:3 in reply to: ↑ description Changed 12 years ago by andrew_b

Replying to albator:

> 3) I press F3 to view the file.
> 4) I see "test A(c)change paraA(R)tre l'ouA-e" on the screen.
>
> I should see "test échange paraître l'ouïe", not "test A(c)change paraA(R)tre l'ouA-e".
>
> It seems that the HTML interpreter doesn't work well with the F3 editor.

Press Shift-F3 on your html file. What do you see: correct text or not?

MC doesn't include its own HTML interpreter. It uses a console web browser to dump HTML to plain text. See how HTML files are handled in mc.ext (for mc < 4.8.4):

# html
regex/\.([hH][tT][mM][lL]?)$
        View=%view{ascii} links -dump %f 2>/dev/null || w3m -dump %f 2>/dev/null || lynx -dump -force_html %f

or the same in /usr/libexec/mc/ext.d/web.sh:

    case "${filetype}" in
    html)
        links -dump "${MC_EXT_FILENAME}" 2>/dev/null || \
            w3m -dump "${MC_EXT_FILENAME}" 2>/dev/null || \
            lynx -dump -force_html "${MC_EXT_FILENAME}"
        ;;

If your browser processes Unicode text incorrectly, try another one.

comment:4 Changed 12 years ago by albator

When I press Shift-F3, I see the whole text, including the HTML tags.

In /etc/mc/mc.ext, I have:

regex/\.([hH][tT][mM][lL]?)$
        Open=(if test -n "" && test -n "$DISPLAY"; then ( file://%d/%p &) 1>&2; else links %f || lynx -force_html %f
        View=%view{ascii} links -dump %f 2>/dev/null || w3m -dump %f 2>/dev/null || lynx -dump -force_html %f

lynx wasn't installed on my GNU/Linux system, so I installed it: apt-get install lynx.
I still have the same problem. I will try another HTML editor.

comment:5 Changed 12 years ago by egmont

This is not really an MC bug, it's rather a user error.

MC works with the system's default codeset, which is most likely UTF-8 on your system, but can be overridden with Alt+E. The file is saved in this encoding, and is displayed in it when Shift+F3 is pressed.
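
One way to confirm that the file on disk really is in the codeset MC uses is to compare `locale charmap` with a decoding test. A minimal sketch, assuming a glibc-style iconv(1) that exits non-zero on malformed input; `is_valid_utf8` and the demo file names are hypothetical, not part of MC.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# Relies on iconv(1) exiting non-zero on an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Codeset of the current locale (MC's default unless changed with Alt+E).
locale charmap

# Demo files: the same word saved as UTF-8 and as Latin-1.
printf 'test \303\251change\n' > demo-utf8.txt   # e-acute = 0xC3 0xA9 in UTF-8
printf 'test \351change\n' > demo-latin1.txt     # e-acute = 0xE9 in Latin-1

for f in demo-utf8.txt demo-latin1.txt; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

If the locale reports UTF-8 but the file fails the check (or the other way round), the file and MC disagree about the encoding.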

HTML files, when rendered, are to be interpreted as Latin-1 unless the HTML file itself (or the HTTP header, which we don't have here) overrides this. When F3 is pressed, the external lynx or w3m takes this into account and converts the accents accordingly, in this case from Latin-1 (the default, since no charset is declared) to UTF-8 (the system's locale setting). Note that if you view your file in any graphical browser using the file:/ URL scheme, it should also show incorrect accents ("test Ã©change paraÃ®tre l'ouÃ¯e"), clearly proving that your HTML file is incorrect.
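
The garbling can be reproduced without any browser: take the UTF-8 bytes of the text and decode them as Latin-1, which is what happens when no charset is declared. A minimal sketch with iconv(1); the octal escapes spell out the UTF-8 bytes so the result does not depend on how the script itself is encoded.

```shell
#!/bin/sh
# UTF-8 bytes of the reporter's accented letters:
#   e-acute = \303\251   i-circumflex = \303\256   i-diaeresis = \303\257
# Decode them as ISO-8859-1 (Latin-1) and re-encode to UTF-8 for display:
# each two-byte UTF-8 letter turns into two Latin-1 characters.
printf "test \303\251change para\303\256tre l'ou\303\257e\n" |
    iconv -f ISO-8859-1 -t UTF-8
# prints: test Ã©change paraÃ®tre l'ouÃ¯e
```

The reporter's "A(c)", "A(R)" and "A-" appear to be ASCII approximations of these same Ã©, Ã® and Ã¯ pairs.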

The HTML file needs to specify its actual encoding correctly, using a line such as

<meta http-equiv="content-type" content="text/html; charset=UTF-8">

and this should match the one you set with Alt+E (and since we're in the 21st century, I don't see any reason to choose anything other than UTF-8).
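
For illustration, the reporter's test page with the declaration added could be written as follows. This is a sketch assuming the script itself is saved as UTF-8, so the accented letters end up in the file as UTF-8 bytes.

```shell
#!/bin/sh
# Recreate the test page from the report, now declaring its encoding.
# The quoted 'EOF' delimiter prevents any shell expansion inside the page.
cat > something.html <<'EOF'
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>test</title>
</head>
<body>
test échange paraître l'ouïe
</body>
</html>
EOF
```

Viewing something.html with F3 should then show the accents correctly, provided the external browser honours the declaration.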

Unfortunately, "links" is not smart enough to figure out the system's encoding, so it drops the accents by default. What MC could/should do is either (1) favor "elinks", "lynx" or "w3m", since they all handle this correctly, or (2) pass a -codepage parameter to "links", somehow magically putting the actual codeset there, or use some other trick such as "links -dump -codepage UTF-8 %f | iconv -f UTF-8 -c", which converts to the system's default charset without needing an argument placeholder that is substituted with the charset's name.
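
Suggestion (1) can be sketched as a small shell function that tries the browsers in the recommended order. `html_dump` is a hypothetical name, not something MC ships; the lynx and w3m options are the ones already used in mc.ext above, and elinks is called with its standard `-dump` option.

```shell
#!/bin/sh
# Hypothetical helper: dump an HTML file to plain text with the first
# console browser found, preferring those that honour the locale's codeset.
html_dump() {
    if command -v elinks >/dev/null 2>&1; then
        elinks -dump "$1"
    elif command -v lynx >/dev/null 2>&1; then
        lynx -dump -force_html "$1"
    elif command -v w3m >/dev/null 2>&1; then
        w3m -dump "$1"
    elif command -v links >/dev/null 2>&1; then
        # Last resort: links may drop the accents.
        links -dump "$1"
    else
        echo "html_dump: no console browser found" >&2
        return 127
    fi
}
```

Usage: `html_dump something.html`. Returning 127 when nothing is installed mirrors the shell's usual "command not found" status.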

comment:6 Changed 12 years ago by slavazanko

  • Status changed from new to closed
  • Resolution set to invalid

comment:7 Changed 8 years ago by andrew_b

  • Milestone Future Releases deleted