Ticket #388 (closed defect: fixed)
problem with input charset in new files (internal editor)
Reported by: | mnk | Owned by: | slavazanko |
---|---|---|---|
Priority: | major | Milestone: | 4.7 |
Component: | mcedit | Version: | master |
Keywords: | commited-master | Cc: | pahan@… |
Blocked By: | | Blocking: | |
Branch state: | | Votes for changeset: | |
Description
While things work in utf8 locale as they should with F4,
they don't with F14.
But the fix seems rather trivial.
However, I noticed something else, and I'm not sure
whether or not it should count as broken: a problem with
double-byte locales, e.g. ja_JP.
While things get displayed correctly in ja_JP.utf8,
ja_JP is treated as a single-byte locale, that is,
mc prints each byte of a two-byte character separately.
Attachments
Change History
Changed 16 years ago by mnk
- Attachment new-file-fix.patch added
comment:1 Changed 16 years ago by andrew_b
- Version changed from 4.6.2 to master
- Component changed from mc-core to mcedit
comment:2 Changed 16 years ago by angel_il
You need to select the codeset in the editor (Ctrl-T), then save the settings in the editor menu.
comment:3 Changed 16 years ago by mnk
You either failed to understand me or
I've been too vague. When I said
"things work in utf8 locale as they should with F4",
I meant "both the internal editor and viewer work correctly
with already existing files (as codepage is set correctly)",
but for new files, this bug happens.
comment:4 in reply to: ↑ description Changed 16 years ago by slyfox
Replying to mnk:
While things work in utf8 locale as they should with F4,
they don't with F14.
But the fix seems rather trivial.
Yes, default input is definitely broken in this mode.
However, I noticed something else, and I'm not sure
whether or not it should count as broken: a problem with
double-byte locales, e.g. ja_JP.
While things get displayed correctly in ja_JP.utf8,
ja_JP is treated as a single-byte locale, that is,
mc prints each byte of a two-byte character separately.
Can you attach a sample text file in the default encoding of your ja_JP locale?
And please post the output of the get_locale.sh program.
Thanks!
comment:5 follow-up: ↓ 6 Changed 16 years ago by mnk
I think I used too few words in that case too.
On an unrelated note: there's no such script in the mc git tree
and it's not on my system, so you're probably talking
about something specific to your distro.
Obviously, the non-utf8 default for ja_JP is EUC-JP (well, perhaps
it's EUC-JP-MS, but I doubt it). I was not talking about edit/view
this time (though that may be broken too - didn't test it yet),
it was about menu names and e.g. dates.
comment:6 in reply to: ↑ 5 Changed 16 years ago by slyfox
Replying to mnk:
I think I used too few words in that case too.
On an unrelated note: there's no such script in the mc git tree
and it's not on my system, so you're probably talking
about something specific to your distro.
All is fine. I meant panel display too.
Forgot to mention, I've just attached it to this bug: http://midnight-commander.org/attachment/ticket/388/get_locale.sh
I use Gentoo and I'd like to generate the same locale.
Obviously, the non-utf8 default for ja_JP is EUC-JP (well, perhaps
it's EUC-JP-MS, but I doubt it). I was not talking about edit/view
this time (though that may be broken too - didn't test it yet),
it was about menu names and e.g. dates.
Currently we have a hardcoded list of multibyte encodings in the source:
src/strutil.c:
... static const char *str_utf8_encodings[] = { ...
You might try adding your charset there and see what happens.
comment:7 Changed 16 years ago by mnk
Well, I use Gentoo too and I simply let it generate all
locales (yes, it takes a while, so what?)
'LANG=ja_JP locale -k charmap' gives EUC-JP.
I'll think about str_utf8_encodings. After all,
I don't even know Japanese, so it's not a priority for me.
But it would be nice if it worked.
On a different unrelated note: as you've already dropped
glib1, perhaps the next thing to consider would be bumping
that dependency and using the g_option* stuff instead of
the embedded popt?
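For reference, the charmap that 'locale -k charmap' reports can also be queried from C. The following is a minimal sketch, assuming a POSIX system; current_codeset is a hypothetical helper name, not a function from the mc tree.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not part of mc): return the character set name of
   the locale selected by the environment, e.g. "EUC-JP" or "UTF-8". */
const char *
current_codeset (void)
{
    const char *codeset;

    /* Adopt LANG / LC_* from the environment. If that locale is not
       installed, setlocale() fails and the "C" locale stays active. */
    (void) setlocale (LC_ALL, "");

    codeset = nl_langinfo (CODESET);
    assert (codeset != NULL);
    return codeset;
}
```

Run with LANG=ja_JP on a system where that locale is generated, this should report the same charmap as the shell command above.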
comment:8 Changed 16 years ago by mnk
It doesn't look like that could work - everything related
to str_utf8_encodings is meant for utf8 only.
EUC-JP is double-byte, but not utf8 -
adding it there would just break things horribly.
Encodings like EUC-JP, GB2312, BIG5 need a special case
for them (probably one case for all, not for each).
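The "one special case for all" idea could be sketched roughly as below. This is only an illustration: the enum, the name lists and classify_charset are hypothetical, and the real contents of str_utf8_encodings are not shown in this ticket.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>            /* strcasecmp (POSIX) */

/* Hypothetical classification, not mc's actual data structures. */
typedef enum
{
    CS_SINGLE_BYTE,             /* default: one byte per character */
    CS_UTF8,                    /* handled by the existing utf8 code path */
    CS_LEGACY_MULTIBYTE         /* EUC-JP, EUC-KR, GB2312, BIG5, ... */
} charset_kind_t;

/* Assumed spellings; a real list would need every alias a locale can report. */
static const char *const utf8_names[] = { "utf-8", "utf8", NULL };
static const char *const legacy_mb_names[] = {
    "euc-jp", "eucjp", "euc-kr", "euckr", "gb2312", "big5", NULL
};

/* Case-insensitive lookup in a NULL-terminated list of names. */
static int
name_in_list (const char *name, const char *const *list)
{
    size_t i;

    for (i = 0; list[i] != NULL; i++)
        if (strcasecmp (name, list[i]) == 0)
            return 1;
    return 0;
}

charset_kind_t
classify_charset (const char *name)
{
    assert (name != NULL);

    if (name_in_list (name, utf8_names))
        return CS_UTF8;
    if (name_in_list (name, legacy_mb_names))
        return CS_LEGACY_MULTIBYTE;
    return CS_SINGLE_BYTE;
}
```

Keeping the legacy multibyte names in their own list means they never reach the utf8-only routines, which is the breakage described above.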
comment:9 Changed 16 years ago by slavazanko
Hm... interesting case. How should we work with these encodings?
For example: getting the string length in characters rather than bytes, getting the width of one char in bytes, moving forward, moving backward... We need to know how to implement all the interface functions described in the src/strutil.c file.
If this gets done, we will have full support for these encodings.
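As a rough illustration of those operations for one such encoding, here is a sketch for EUC-JP only. The function names are hypothetical and this is not the interface from the mc sources. It assumes the usual EUC-JP layout: ASCII in one byte, JIS X 0208 in two bytes (0xA1..0xFE each), half-width katakana as 0x8E plus one byte, and JIS X 0212 as 0x8F plus two bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the EUC-JP character starting at s (s must not point at
   the terminating NUL). Malformed or truncated sequences count as one
   byte, so a scan always advances. */
size_t
eucjp_char_len (const unsigned char *s)
{
    assert (s != NULL && s[0] != '\0');

    if (s[0] == 0x8F)           /* SS3 prefix: JIS X 0212, three bytes */
        return (s[1] >= 0xA1 && s[1] <= 0xFE
                && s[2] >= 0xA1 && s[2] <= 0xFE) ? 3 : 1;
    if (s[0] == 0x8E)           /* SS2 prefix: half-width katakana, two bytes */
        return (s[1] >= 0xA1 && s[1] <= 0xDF) ? 2 : 1;
    if (s[0] >= 0xA1 && s[0] <= 0xFE)   /* JIS X 0208, two bytes */
        return (s[1] >= 0xA1 && s[1] <= 0xFE) ? 2 : 1;
    return 1;                   /* ASCII or a stray byte */
}

/* String length in characters, not bytes. */
size_t
eucjp_strlen (const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    size_t n = 0;

    assert (str != NULL);
    while (*s != '\0')
    {
        s += eucjp_char_len (s);        /* move forward by one character */
        n++;
    }
    return n;
}

/* Start of the character that precedes pos (returns start if pos == start).
   pos must lie inside the string. Trail bytes look like lead bytes in
   EUC-JP, so this rescans from the beginning instead of stepping back. */
const char *
eucjp_prev_char (const char *start, const char *pos)
{
    const unsigned char *s = (const unsigned char *) start;
    const unsigned char *end = (const unsigned char *) pos;
    const unsigned char *prev = s;

    assert (start != NULL && pos != NULL && start <= pos);
    while (s < end)
    {
        prev = s;
        s += eucjp_char_len (s);
    }
    return (const char *) prev;
}
```

For the six-byte string "\xC6\xFC\xCB\xDC" "go" (two kanji followed by "go"), eucjp_strlen returns 4. The backward move is linear in the distance from the start of the string; GB2312 and BIG5 would need their own byte ranges.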
comment:10 Changed 16 years ago by mnk
The best I can do here is to direct you to
http://en.wikipedia.org/wiki/Extended_Unix_Code
with added note:
ja_JP - EUC-JP
ko_KR - EUC-KR
zh_CN - GB2312 (EUC-CN compliant)
zh_TW - BIG5 (not EUC-TW compliant - separate page about it)
Though, I think better pages could be found about this problem.
comment:11 follow-up: ↓ 12 Changed 16 years ago by slavazanko
- Keywords review added
- Status changed from new to accepted
- Owner set to slavazanko
created branch: 388_charset_in_new_files (parent: master)
initial commit: changeset:559e2138e02ddb211b57c7145cdd2410a522fcb1
Review, pls.
P.S. Native EUC support should be moved to a separate ticket.
comment:12 in reply to: ↑ 11 Changed 16 years ago by andrew_b
Replying to slavazanko:
initial commit: changeset:559e2138e02ddb211b57c7145cdd2410a522fcb1
No need to call get_codepage_id() twice. :)
comment:13 Changed 16 years ago by slavazanko
branch rebased. New initial commit: changeset:5e032e27d23e7ee368bf2dc4fe42f8e6784fb56b
Review.
comment:14 follow-up: ↓ 15 Changed 16 years ago by mnk
And what about the embedded popt stuff?
Rejected/delayed/accepted?
comment:15 in reply to: ↑ 14 Changed 16 years ago by andrew_b
comment:19 Changed 16 years ago by slavazanko
- Status changed from accepted to testing
- Keywords commited-master added; vote-slyfox vote-andrew_b approved removed
- Resolution set to fixed
fix for F14 problem