Ticket #388 (closed defect: fixed)

Opened 10 years ago

Last modified 9 years ago

problem with input charset in new files (internal editor)

Reported by: mnk
Owned by: slavazanko
Priority: major
Milestone: 4.7
Component: mcedit
Version: master
Keywords: commited-master
Cc: pahan@…
Blocked By:
Blocking:
Branch state:
Votes for changeset:

Description

While things work in utf8 locale as they should with F4,
they don't with F14.
But the fix seems rather trivial.

However, I noticed something that I'm not sure
should count as broken: a problem with
double-byte locales, e.g. ja_JP.
While things get displayed correctly in ja_JP.utf8,
ja_JP is treated as a single-byte locale, that is,
mc prints each byte of a two-byte char separately.

Attachments

new-file-fix.patch (344 bytes) - added by mnk 10 years ago.
fix for F14 problem
get_locale.sh (411 bytes) - added by slyfox 10 years ago.
locale detector

Change History

Changed 10 years ago by mnk

fix for F14 problem

comment:1 Changed 10 years ago by andrew_b

  • Version changed from 4.6.2 to master
  • Component changed from mc-core to mcedit

comment:2 Changed 10 years ago by angel_il

You need to select the codeset in the editor (ctrl-t), then save the settings in the editor menu.

comment:3 Changed 10 years ago by mnk

You either failed to understand me or
I've been too vague. When I said
"things work in utf8 locale as they should with F4",
I meant "both the internal editor and viewer work correctly
with already existing files (as the codepage is set correctly)",
but for new files, this bug happens.

Changed 10 years ago by slyfox

locale detector

comment:4 in reply to: ↑ description Changed 10 years ago by slyfox

Replying to mnk:

While things work in utf8 locale as they should with F4,
they don't with F14.
But the fix seems rather trivial.

Yes, default input is definitely broken in this mode.

However, I noticed something that I'm not sure
should count as broken: a problem with
double-byte locales, e.g. ja_JP.
While things get displayed correctly in ja_JP.utf8,
ja_JP is treated as a single-byte locale, that is,
mc prints each byte of a two-byte char separately.

Can you attach a sample text file in the default encoding of your ja_JP locale?
And please post the output of the get_locale.sh script.

Thanks!

comment:5 follow-up: ↓ 6 Changed 10 years ago by mnk

I think I used too few words in that case too.
On an unrelated note: there's no such script in the mc git tree
and it's not on my system, so you're probably talking
about something specific to your distro.

Obviously, the non-utf8 default for ja_JP is EUC-JP (well, perhaps
it's EUC-JP-MS, but I doubt it). I was not talking about edit/view
this time (though that may be broken too, I haven't tested it yet);
it was about menu names and e.g. dates.

comment:6 in reply to: ↑ 5 Changed 10 years ago by slyfox

Replying to mnk:

I think I used too few words in that case too.
On an unrelated note: there's no such script in the mc git tree
and it's not on my system, so you're probably talking
about something specific to your distro.

All is fine. I meant panel display too.

Forgot to mention, I've just attached it to this bug: http://midnight-commander.org/attachment/ticket/388/get_locale.sh
I use Gentoo and I'd like to generate the same locale.

Obviously, the non-utf8 default for ja_JP is EUC-JP (well, perhaps
it's EUC-JP-MS, but I doubt it). I was not talking about edit/view
this time (though that may be broken too, I haven't tested it yet);
it was about menu names and e.g. dates.

Currently we have a hardcoded list of multibyte encodings in the source:
src/strutil.c:

...
static const char *str_utf8_encodings[] = {
...

You might try adding your charset there and see what happens.
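
For illustration only, the kind of lookup that a hardcoded list like this implies could be sketched as below. The list contents and the names example_utf8_encodings and example_is_utf8_codeset are hypothetical; they are not taken from src/strutil.c, whose real contents are not shown in this ticket.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>            /* strcasecmp (POSIX) */

/* Hypothetical, NULL-terminated list of codeset names treated as UTF-8.
   This is NOT the real str_utf8_encodings array. */
static const char *example_utf8_encodings[] = {
    "utf-8",
    "utf8",
    NULL
};

/* Return 1 if codeset matches a list entry (case-insensitive), else 0. */
static int
example_is_utf8_codeset (const char *codeset)
{
    size_t i;

    assert (codeset != NULL);
    for (i = 0; example_utf8_encodings[i] != NULL; i++)
        if (strcasecmp (codeset, example_utf8_encodings[i]) == 0)
            return 1;
    return 0;
}
```

With a check of this shape, a codeset such as EUC-JP simply falls through to whatever the non-UTF-8 path is, which would be consistent with the byte-by-byte output described above.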

comment:7 Changed 10 years ago by mnk

Well, I use Gentoo too and I simply let it generate all
locales (yes, it takes a while, so what?).
'LANG=ja_JP locale -k charmap' gives EUC-JP.
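
The same information can be queried from C with the POSIX calls setlocale() and nl_langinfo(CODESET). The sketch below is only an illustration; the helper name example_locale_codeset is hypothetical and is not an mc function.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Switch the process to the named locale and return its codeset name,
   or NULL if that locale cannot be selected (e.g. it is not generated).
   Note: this changes the global locale; the returned string is owned by libc. */
static const char *
example_locale_codeset (const char *locale_name)
{
    assert (locale_name != NULL);
    if (setlocale (LC_ALL, locale_name) == NULL)
        return NULL;
    return nl_langinfo (CODESET);
}
```

On a system where the ja_JP locale has been generated, example_locale_codeset ("ja_JP") would be expected to report the same charmap as the command above.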

I'll think about str_utf8_encodings. After all,
I don't even know Japanese, so it's not a priority for me.
But it would be nice if it worked.

On a different unrelated note: as you've already dropped
glib1, perhaps the next thing to consider would be bumping
that dependency and using the g_option* stuff instead of
the embedded popt?

comment:8 Changed 10 years ago by mnk

It doesn't look like that could work: everything related
to str_utf8_encodings is meant for utf8 only.
EUC-JP is double-byte, but not utf8;
adding it there would just break things horribly.
Encodings like EUC-JP, GB2312, BIG5 need a special case
(probably one case for all of them, not one for each).
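
To show why a UTF-8 code path cannot simply be reused, here is a minimal sketch of how the byte length of one EUC-JP character could be determined. It assumes the usual EUC-JP layout (ASCII as single bytes, lead bytes 0xA1-0xFE for two-byte characters, 0x8E and 0x8F as single-shift prefixes) and validates trailing bytes only loosely. The function name example_eucjp_char_len is hypothetical and is not part of mc.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte length of the EUC-JP character starting at s,
   or 0 if the sequence is invalid or truncated.
   avail is the number of bytes readable at s. */
static size_t
example_eucjp_char_len (const unsigned char *s, size_t avail)
{
    size_t need, i;

    assert (s != NULL);
    if (avail == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;               /* ASCII */
    if (s[0] == 0x8E)
        need = 2;               /* SS2 prefix + 1 byte (half-width katakana) */
    else if (s[0] == 0x8F)
        need = 3;               /* SS3 prefix + 2 bytes (JIS X 0212) */
    else if (s[0] >= 0xA1 && s[0] <= 0xFE)
        need = 2;               /* two-byte JIS X 0208 character */
    else
        return 0;               /* not a valid lead byte */
    if (avail < need)
        return 0;               /* truncated sequence */
    for (i = 1; i < need; i++)
        if (s[i] < 0xA1 || s[i] > 0xFE)
            return 0;           /* trailing byte out of range */
    return need;
}
```

The lead-byte ranges here are different from the UTF-8 ones, so feeding EUC-JP text to UTF-8 routines would misjudge character boundaries.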

comment:9 Changed 10 years ago by slavazanko

Hm... interesting case. How should we work with these encodings?
For example: getting the string length in characters rather than bytes, getting the width of one char in bytes, moving forward, moving backward... We need to know how to implement all the interface functions described in the src/strutil.c file.
If this gets done, we will have full support for these encodings.
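
As one possible starting point, and not what mc does, the C library can already walk a string in whatever multibyte encoding the current locale uses. The sketch below counts characters rather than bytes with the standard mbrlen() function; each call also yields the byte width of one character, which is what a "move forward" operation needs. The name example_mb_strlen is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count characters (not bytes) in s, decoded according to the current
   LC_CTYPE locale. Returns (size_t) -1 on an invalid or incomplete sequence. */
static size_t
example_mb_strlen (const char *s)
{
    mbstate_t state;
    size_t bytes_left, count = 0;

    assert (s != NULL);
    memset (&state, 0, sizeof (state));
    bytes_left = strlen (s);
    while (bytes_left > 0)
    {
        /* n is the byte width of the character starting at s */
        size_t n = mbrlen (s, bytes_left, &state);

        if (n == (size_t) -1 || n == (size_t) -2)
            return (size_t) -1;
        if (n == 0)
            break;              /* embedded null character */
        s += n;                 /* move forward by one character */
        bytes_left -= n;
        count++;
    }
    return count;
}
```

This only works after the program has selected the user's locale, for example with setlocale (LC_CTYPE, ""). Moving backward is harder: in EUC-style encodings lead and trailing bytes share the same range, so a boundary generally has to be found by scanning forward from a known character start.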

comment:10 Changed 10 years ago by mnk

The best I can do here is to direct you to
http://en.wikipedia.org/wiki/Extended_Unix_Code
with an added note:
ja_JP - EUC-JP
ko_KR - EUC-KR
zh_CN - GB2312 (EUC-CN compliant)
zh_TW - BIG5 (not EUC-TW compliant - separate page about it)

Though I think better pages on this problem could be found.

comment:11 follow-up: ↓ 12 Changed 10 years ago by slavazanko

  • Keywords review added
  • Status changed from new to accepted
  • Owner set to slavazanko

created branch: 388_charset_in_new_files (parent: master)
initial commit: changeset:559e2138e02ddb211b57c7145cdd2410a522fcb1

Review, pls.

P.S. The native EUC support stuff needs to be moved to a separate ticket.

comment:12 in reply to: ↑ 11 Changed 10 years ago by andrew_b

Replying to slavazanko:

initial commit: changeset:559e2138e02ddb211b57c7145cdd2410a522fcb1

No need to call get_codepage_id() twice. :)

comment:13 Changed 10 years ago by slavazanko

branch rebased. New initial commit: changeset:5e032e27d23e7ee368bf2dc4fe42f8e6784fb56b

Review.

comment:14 follow-up: ↓ 15 Changed 10 years ago by mnk

And what about the embedded popt stuff?
Rejected/delayed/accepted?

comment:15 in reply to: ↑ 14 Changed 10 years ago by andrew_b

Replying to mnk:

And what about the embedded popt stuff?
Rejected/delayed/accepted?

#390

comment:16 Changed 10 years ago by slyfox

  • Keywords vote-slyfox added

comment:17 Changed 10 years ago by Hubbitus

  • Cc pahan@… added

comment:18 Changed 10 years ago by andrew_b

  • Keywords vote-andrew_b approved added; review removed

comment:19 Changed 10 years ago by slavazanko

  • Keywords commited-master added; vote-slyfox vote-andrew_b approved removed
  • Status changed from accepted to testing
  • Resolution set to fixed

comment:20 Changed 10 years ago by slavazanko

  • Status changed from testing to closed