Ticket #3529 (new defect)
Search does not always respect the chosen codepage
| Reported by: | egmont | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | Future Releases |
| Component: | mc-core | Version: | master |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Branch state: | no branch | Votes for changeset: | |
Description
Create a file that uses the Latin-1 or Latin-2 encoding and contains the letter "é" a couple of times (among other characters as well). Note that "é" is encoded as the same byte (0xE9) in Latin-1 and Latin-2.
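For a reproducible starting point, the test file could be generated with a short script. This is only a minimal sketch: the file name `latin-test.txt` and the sample text are placeholders, not part of the original report.

```python
# Write a small Latin-1 test file containing several "é" characters.
# The file name and the sample text are illustrative placeholders.
sample = "café, résumé, été\nplain ASCII line\n"

with open("latin-test.txt", "wb") as f:
    f.write(sample.encode("iso8859_1"))

# "é" is the single byte 0xE9 in both Latin-1 and Latin-2,
# so the same file reads identically in either encoding.
assert "é".encode("iso8859_1") == "é".encode("iso8859_2") == b"\xe9"
```

The resulting file can then be opened with mcview as described in the following steps.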
Change your terminal's charset to Latin-1. Start "LC_ALL=en_US bash". Run "locale charmap" to verify that the charset is indeed ISO-8859-1. (If you don't have the en_US locale installed, you might need to run "sudo locale-gen en_US" or something similar depending on your distro.)
Start mcview with the file created above. Press Alt-E to verify that mcview assumes the file is encoded in UTF-8. Accordingly, each "é" is visually replaced by a dot.
Use F7 to search for "é". No match, as expected.
Change the file's charset to ISO 8859-1. The é's appear in the file.
Search (F7) for "é". No match. Expected: match the é's.
Change the file's charset to ISO 8859-2. The é's remain unchanged.
Search (F7) for "é". Matches, as expected.
Change the file's charset back to ISO-8859-1. The é's remain unchanged.
Search (F7) for "é". Matches, as expected (although it did not match two steps earlier under the same circumstances).
Change the file's charset to ISO 8859-5. The é's are replaced by dots.
Search (F7) for "é". Matches the corresponding dot. Expected: not to match.
Change the file's charset back to UTF-8. é's are still replaced by dots.
Search again (this time with the 'n' key): Matches the dots. Expected: not to match.
Search with F7: Does not match, as expected.
---
It looks to me that there are perhaps two underlying bugs:
- The file's selected codepage is not always taken into account. The behavior even depends on which codepage was selected previously.
- Pressing 'n' does not convert the internal search pattern according to the charset change (this one is also reproducible with a UTF-8 locale and terminal); opening the F7 dialog does.
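The second suspected bug can be illustrated with a toy model. This is not mc's actual code: the class `ViewerModel` and its methods are hypothetical, and the sketch only assumes that the viewer caches the already-converted pattern and that 'n' reuses it without re-converting.

```python
class ViewerModel:
    """Toy model of a viewer that caches the converted search pattern."""

    def __init__(self, data: bytes, codepage: str):
        self.data = data
        self.codepage = codepage
        self.pattern_bytes = None  # cached pattern, already converted

    def search_dialog(self, pattern: str) -> int:
        # Like F7: convert the typed pattern using the current codepage.
        try:
            self.pattern_bytes = pattern.encode(self.codepage)
        except UnicodeEncodeError:
            self.pattern_bytes = None
            return -1
        return self.data.find(self.pattern_bytes)

    def search_next(self) -> int:
        # Like 'n': reuse the cached bytes without re-converting them.
        # (For simplicity this toy always searches from the start.)
        if self.pattern_bytes is None:
            return -1
        return self.data.find(self.pattern_bytes)


viewer = ViewerModel(b"caf\xe9\n", "iso8859_1")
first = viewer.search_dialog("é")   # pattern converted to b"\xe9"
viewer.codepage = "utf-8"           # file charset changed back to UTF-8
stale = viewer.search_next()        # stale Latin-1 bytes still match
fresh = viewer.search_dialog("é")   # re-converted pattern does not match
print(first, stale, fresh)          # prints: 3 3 -1
```

Under this assumption, the stale match after the charset change corresponds to the 'n' step in the description, while reopening the dialog gives the expected result.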
Expected behavior: in all cases, searching should happen according to the selected codepage, that is, the file's current look.
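The expected behavior could be sketched as follows, assuming a plain byte-level search in which the pattern is converted to the currently selected codepage before every search. The function `find_all` is hypothetical and does not reflect how mc implements searching.

```python
def find_all(data: bytes, pattern: str, codepage: str) -> list[int]:
    """Return the byte offsets where pattern occurs under the given codepage."""
    try:
        needle = pattern.encode(codepage)
    except UnicodeEncodeError:
        # The pattern cannot be represented in this codepage: no match.
        return []
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


# File contents as stored on disk (Latin-1 bytes).
data = "café été\n".encode("iso8859_1")

for codepage in ("utf-8", "iso8859_1", "iso8859_2", "iso8859_5"):
    print(codepage, find_all(data, "é", codepage))
# utf-8 []
# iso8859_1 [3, 5, 7]
# iso8859_2 [3, 5, 7]
# iso8859_5 []
```

With this approach the result depends only on the file's bytes and the currently selected codepage, not on which codepage was selected earlier.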