Ticket #3529 (new defect)

Opened 4 years ago

Search does not always respect the chosen codepage

Reported by: egmont Owned by:
Priority: major Milestone: Future Releases
Component: mc-search Version: master
Keywords: Cc:
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description

Create a file that uses the Latin-1 or Latin-2 encoding and contains the letter "é" a couple of times (among other characters as well). Note that "é"'s codepoint is the same in Latin-1 and Latin-2.
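One way to produce such a file is sketched below. The sample text and the helper names (`build_sample`, `write_sample`) are made up for illustration; any content containing "é" works. The sketch also shows that "é" is the byte 0xE9 in both charsets, so the Latin-1 and Latin-2 versions of the file are byte-identical.

```python
# Hypothetical sample text; any text containing "é" will do.
SAMPLE_TEXT = "café, résumé, été\n"


def build_sample(encoding: str) -> bytes:
    """Encode the sample text in the given single-byte charset."""
    return SAMPLE_TEXT.encode(encoding)


def write_sample(path: str, encoding: str = "iso8859_1") -> None:
    """Write the encoded sample to disk as raw bytes."""
    with open(path, "wb") as f:
        f.write(build_sample(encoding))


# "é" maps to the same byte in ISO-8859-1 and ISO-8859-2.
assert "é".encode("iso8859_1") == b"\xe9"
assert "é".encode("iso8859_2") == b"\xe9"
```

For example, `write_sample("sample-latin.txt")` creates a file (the name is arbitrary) that can then be opened in mcview.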

Change your terminal's charset to Latin-1. Start "LC_ALL=en_US bash". Run "locale charmap" to verify that the charset is indeed ISO-8859-1. (If you don't have the en_US locale installed, you might need to run "sudo locale-gen en_US" or something similar depending on your distro.)

Start mcview with the file created above. Press Alt-E to verify that mcview assumes the file is encoded in UTF-8. Accordingly, each "é" is visually replaced by a dot.

Use F7 to search for "é". No match, as expected.

Change the file's charset to ISO 8859-1. The é's appear in the file.

Search (F7) for "é". No match. Expected: match the é's.

Change the file's charset to ISO 8859-2. The é's remain unchanged.

Search (F7) for "é". Matches, as expected.

Change the file's charset back to ISO-8859-1. The é's remain unchanged.

Search (F7) for "é". Matches, as expected (although two steps earlier the same search did not match under the same circumstances).

Change the file's charset to ISO 8859-5. The é's are replaced by dots.

Search (F7) for "é". Matches the corresponding dot. Expected: not to match.

Change the file's charset back to UTF-8. é's are still replaced by dots.

Search again (this time with the 'n' key): Matches the dots. Expected: not to match.

Search with F7: Does not match, as expected.

---

It looks to me as if there are perhaps two underlying bugs:

  • The file's selected codepage is not always taken into account. The behavior even depends on which codepage was selected previously.
  • Pressing 'n' does not re-convert the internal search pattern after a charset change (this one is also reproducible with a UTF-8 locale and terminal); opening the F7 dialog does.
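The second suspected bug can be illustrated with a toy model. This is not mc's code: the class `ToyViewer` and all its methods are hypothetical. It only assumes that the typed pattern is converted to bytes once and cached, which would explain why 'n' keeps matching after the charset changes.

```python
from typing import Optional


class ToyViewer:
    """Toy model of a viewer search; hypothetical, not mc's implementation."""

    def __init__(self, data: bytes, codepage: str) -> None:
        self.data = data
        self.codepage = codepage
        self.pattern: Optional[str] = None
        self.needle: Optional[bytes] = None  # cached converted pattern

    def _convert(self) -> None:
        # Convert the typed pattern to bytes in the file's current codepage.
        try:
            self.needle = self.pattern.encode(self.codepage)
        except UnicodeEncodeError:
            self.needle = None  # pattern cannot be represented at all

    def _find(self) -> bool:
        return self.needle is not None and self.needle in self.data

    def search_f7(self, pattern: str) -> bool:
        # Like the F7 dialog: always converts with the current codepage.
        self.pattern = pattern
        self._convert()
        return self._find()

    def search_next_stale(self) -> bool:
        # Models the reported 'n' behavior: reuse the cached bytes as-is.
        return self._find()

    def search_next_fixed(self) -> bool:
        # Re-convert before searching, as the ticket expects.
        if self.pattern is not None:
            self._convert()
        return self._find()


viewer = ToyViewer(b"caf\xe9\n", "iso8859_1")
found_first = viewer.search_f7("é")       # converted as Latin-1
viewer.codepage = "utf-8"                 # user switches the file's charset
found_stale = viewer.search_next_stale()  # still uses the old bytes
found_fixed = viewer.search_next_fixed()  # pattern re-encoded as UTF-8
```

In this model the stale variant still reports a match after the switch to UTF-8, while the re-converting variant does not.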

Expected behavior: in all cases, searching should happen according to the selected codepage, that is, according to how the file currently looks.
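The expected outcomes for the steps above can be written down as a small reference model. This is only a sketch of the desired semantics, with a hypothetical function name (`expected_match`); it decodes the file bytes in the selected codepage and searches the resulting text, which corresponds to searching what is shown on screen.

```python
def expected_match(file_bytes: bytes, pattern: str, codepage: str) -> bool:
    """Return True if the pattern occurs in the file as decoded in codepage.

    Undecodable bytes become U+FFFD, mirroring the dots shown by the viewer.
    """
    text = file_bytes.decode(codepage, errors="replace")
    return pattern in text


data = "café, résumé, été\n".encode("iso8859_1")

results = {
    cp: expected_match(data, "é", cp)
    for cp in ("utf-8", "iso8859_1", "iso8859_2", "iso8859_5")
}
```

Under this model only the ISO-8859-1 and ISO-8859-2 views match. In ISO-8859-5 the byte 0xE9 is the Cyrillic letter "щ", and in UTF-8 a lone 0xE9 is invalid, so neither should match "é".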
