Ticket #1730 (closed defect: fixed)
troubles in Viewer with utf8 and widechars
Reported by: | egmont | Owned by: | angel_il |
---|---|---|---|
Priority: | major | Milestone: | 4.8.1 |
Component: | mcview | Version: | master |
Keywords: | Cc: | galtgendo@… | |
Blocked By: | | Blocking: | |
Branch state: | merged | Votes for changeset: | committed-master committed-stable |
Description
- floating wrapping: if you scroll through long CJK text
(that is, text with almost no spaces, made up almost entirely of double-width characters),
you'll see the line break of the first line move as you scroll
(hard to describe, see for yourself); IIRC, it was like that even before the regression
- 4096-byte break: every 4096 bytes there's a chance that the break falls in the middle
of a UTF-8 character (I'm not sure whether this can happen with non-double-width characters, and
I don't know yet where this number comes from), causing valid characters
to be treated as unprintable
Attachments
Change History
comment:2 Changed 15 years ago by mnk
Just a little note:
I suspect that some of these problems can be reproduced with any UTF-8
text outside the ASCII range (so Cyrillic may be affected too).
comment:3 Changed 15 years ago by mnk
- Summary changed from troubles in Viewer with CJK to troubles in Viewer with utf8 and widechars
FYI: this problem affects ANY UTF-8 text.
In single-byte locales it's fine to split
file content into arbitrarily sized blocks, but that's
the wrong thing to do with UTF-8.
Whenever a UTF-8 character longer than one byte straddles
a block boundary, you'll get the second problem.
There already seems to be something that could help with
fixing that (g_utf8_get_char_validated() returns (gunichar) -2 for a sequence
that is valid so far but not yet complete), but I haven't yet figured out
the loading part.
comment:4 Changed 13 years ago by egmont
Friendly ping :-) Now that I see work going on again, I'm wondering if anyone feels like taking it.
I was recently bitten by this bug: UTF-8 characters crossing a 4 kB boundary are handled incorrectly in mcview.
Try:
{ for i in $(seq 1 4095); do echo -n x ; done ; echo -e '\0303\0241x\n' ; } > /tmp/x
mcview /tmp/x
enable line wrapping
Expected: see an "á" near the end of the file.
Actual: something else, or two dots.
comment:5 Changed 13 years ago by slavazanko
> Friendly ping :-)
Friendly pong :)
> Now that I see work going on again, I'm wondering if anyone feels like taking it.
Yep, sorry for the very late answer; all our effort went into the VFS restructuring...
> Expected: see an "á" near the end of the file.
> Actual: something else, or two dots.
Thanks for the detailed test case.
Changed 13 years ago by sergem
- Attachment mc-4.7.5.5-mcviewutf8fix.patch added
fix, based on patch from #2372
comment:6 Changed 13 years ago by andrew_b
- Branch state set to no branch
- Milestone changed from 4.7 to Future Releases
comment:7 Changed 13 years ago by angel_il
- Owner set to angel_il
- Status changed from new to accepted
comment:8 Changed 13 years ago by angel_il
- Branch state changed from no branch to on review
branch: 1730_mcview_utf8_fix (parent: master)
Please review
comment:10 Changed 13 years ago by slavazanko
- Votes for changeset changed from andrew_b to andrew_b slavazanko
- Branch state changed from on review to approved
comment:11 Changed 13 years ago by angel_il
- Status changed from accepted to testing
- Votes for changeset changed from andrew_b slavazanko to committed-master
- Resolution set to fixed
- Branch state changed from approved to merged
comment:14 Changed 13 years ago by angel_il
cherry-picked to 4.7.5-stable:
commit 03a2c310ac969826bc64f31355692521d554047c (fixup)
commit 0b8409aa85893613aac4c82e0b85f41ec66fc99c (const UTF8_CHAR_LEN)
comment:15 Changed 13 years ago by andrew_b
- Keywords stable-candidate removed
- Votes for changeset changed from committed-master to committed-master committed-stable