Ticket #1730 (closed defect: fixed)

Opened 15 years ago

Last modified 13 years ago

troubles in Viewer with utf8 and widechars

Reported by: egmont
Owned by: angel_il
Priority: major
Milestone: 4.8.1
Component: mcview
Version: master
Keywords:
Cc: galtgendo@…
Blocked By:
Blocking:
Branch state: merged
Votes for changeset: committed-master committed-stable

Description

  • floating wrapping: if you scroll through long CJK text (that is, text with
    nearly no spaces and almost entirely double-width characters), the line
    break of the first line moves as you scroll (hard to describe, see for
    yourself); IIRC, it was that way even before the regression

  • 4096-byte break: every 4096 bytes there's a chance the boundary falls in
    the middle of a UTF-8 char (I'm not sure whether this can happen with
    non-double-width chars, and I don't know yet where this number comes
    from), so valid char(s) are treated as unprintable
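
The second item can be checked mechanically. Below is a minimal Python sketch, not mc code: the 4096-byte block size is taken from the report, the function name `split_offsets` is made up for illustration, and the input is assumed to be valid UTF-8. It lists the block boundaries that land inside a multi-byte sequence.

```python
def split_offsets(data: bytes, block: int = 4096) -> list:
    """Return block-boundary offsets that fall inside a multi-byte UTF-8 char.

    Assumes `data` is valid UTF-8. In valid UTF-8 every byte of the form
    0b10xxxxxx is a continuation byte, so a boundary whose first byte is a
    continuation byte cuts a character in two.
    """
    return [
        off
        for off in range(block, len(data), block)
        if data[off] & 0xC0 == 0x80
    ]


# "á" is two bytes (0xC3 0xA1) and single-width, yet it can still be split.
sample = b"x" * 4095 + "á".encode("utf-8") + b"x\n"
print(split_offsets(sample))  # prints [4096]
```

The check depends only on byte patterns, so any character encoded in more than one byte can be hit, regardless of its display width.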

Attachments

mc-4.7.5.5-mcviewutf8fix.patch (936 bytes) - added by sergem 13 years ago.
fix, based on patch from #2372

Change History

comment:1 Changed 15 years ago by mnk

  • Cc galtgendo@… added

comment:2 Changed 15 years ago by mnk

Just a little note:
I suspect that some of those problems can be reproduced with any UTF-8
text outside the ASCII range (so Cyrillic may be affected too).

comment:3 Changed 15 years ago by mnk

  • Summary changed from troubles in Viewer with CJK to troubles in Viewer with utf8 and widechars

FYI: this problem affects ANY UTF-8 text.
While in single-byte locales it's OK to break
file content into arbitrarily sized blocks, that's
the wrong thing to do with UTF-8.

As soon as a multi-byte UTF-8 char lands on
a block border, you get the second problem.

It looks like there's already something that could help with
fixing that (g_utf8_get_char_validated returns -2 for a so-far valid
but not yet complete sequence), but I haven't figured out
the loading part yet.
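
To illustrate the idea of holding back an incomplete sequence until the next block arrives, here is a hedged Python sketch. It is not the fix that was merged for this ticket and it does not use GLib; it relies on Python's standard incremental UTF-8 decoder, which buffers a trailing incomplete sequence much as a caller could after seeing -2 from g_utf8_get_char_validated. The function names and the 4096 default are assumptions made for the example.

```python
import codecs


def decode_per_block(data: bytes, block: int = 4096) -> str:
    """Decode each block on its own; a char split by a boundary is lost."""
    return "".join(
        data[i:i + block].decode("utf-8", errors="replace")
        for i in range(0, len(data), block)
    )


def decode_carrying_tail(data: bytes, block: int = 4096) -> str:
    """Decode block by block, carrying an incomplete tail into the next block."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = []
    for i in range(0, len(data), block):
        chunk = data[i:i + block]
        # final=True only on the last block, so that a sequence truncated at
        # end of input is reported instead of being buffered forever.
        is_last = i + block >= len(data)
        parts.append(decoder.decode(chunk, final=is_last))
    return "".join(parts)


data = b"x" * 4095 + "á".encode("utf-8") + b"x\n"
print(decode_per_block(data).count("\ufffd"))   # replacement chars produced
print("á" in decode_carrying_tail(data))
```

With this input the per-block variant turns the split character into two replacement characters, one for each half, which is consistent with the "two dots" symptom reported later in this ticket.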

comment:4 Changed 13 years ago by egmont

Friendly ping :-) Now that I see work going on again, I'm wondering if anyone feels like taking it.

I was just recently bitten by the bug that UTF-8 characters crossing a 4kB boundary are handled incorrectly in mcview.

Try:
{ for i in $(seq 1 4095); do echo -n x ; done ; echo -e '\0303\0241x\n' ; } > /tmp/x
mcview /tmp/x
enable line wrapping

Expected: see an "á" near the end of the file.
Actual: something else, or two dots.
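
Since `echo -n` and `echo -e` behave differently across shells, the same file can also be produced from Python. This is only a sketch of the byte layout the command above produces under bash (4095 "x" bytes, then 0xC3 0xA1, then "x" and two newlines); the helper names are hypothetical.

```python
import os
import tempfile


def repro_bytes() -> bytes:
    """Bytes of the test file: the two-byte "á" straddles offset 4096."""
    return b"x" * 4095 + b"\xc3\xa1" + b"x\n\n"


def write_repro(path: str) -> None:
    """Write the test file in binary mode so no newline translation occurs."""
    with open(path, "wb") as f:
        f.write(repro_bytes())


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "x")
    write_repro(target)
    print(target)
```

Open the resulting file in mcview and enable line wrapping, as in the steps above.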

Last edited 13 years ago by egmont

comment:5 Changed 13 years ago by slavazanko

> Friendly ping :-)

Friendly pong :)

> Now that I see work going on again, I'm wondering if anyone feels like taking it.

Yep, sorry for the very late answer; all our effort went into the VFS restructuring...

> Expected: see an "á" near the end of the file.
> Actual: something else, or two dots.

Thanks for the detailed test case.

Changed 13 years ago by sergem

fix, based on patch from #2372

comment:6 Changed 13 years ago by andrew_b

  • Branch state set to no branch
  • Milestone changed from 4.7 to Future Releases

comment:7 Changed 13 years ago by angel_il

  • Owner set to angel_il
  • Status changed from new to accepted

comment:8 Changed 13 years ago by angel_il

  • Branch state changed from no branch to on review

branch: 1730_mcview_utf8_fix (parent: master)

Please review

comment:9 Changed 13 years ago by andrew_b

  • Votes for changeset set to andrew_b

comment:10 Changed 13 years ago by slavazanko

  • Votes for changeset changed from andrew_b to andrew_b slavazanko
  • Branch state changed from on review to approved

comment:11 Changed 13 years ago by angel_il

  • Status changed from accepted to testing
  • Votes for changeset changed from andrew_b slavazanko to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

comment:12 Changed 13 years ago by angel_il

  • Status changed from testing to closed

comment:13 Changed 13 years ago by andrew_b

  • Keywords stable-candidate added

comment:14 Changed 13 years ago by angel_il

cherry-picked:
commit 03a2c310ac969826bc64f31355692521d554047c (fixup)
commit 0b8409aa85893613aac4c82e0b85f41ec66fc99c (const UTF8_CHAR_LEN)

Version 0, edited 13 years ago by angel_il

comment:15 Changed 13 years ago by andrew_b

  • Keywords stable-candidate removed
  • Votes for changeset changed from committed-master to committed-master committed-stable

comment:16 Changed 13 years ago by andrew_b

  • Milestone changed from Future Releases to 4.8.1