Ticket #2145 (new defect)
JavaScript highlighting bug in regexp.
Reported by: | baryluk | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | Future Releases |
Component: | mcedit | Version: | master |
Keywords: | Cc: | ||
Blocked By: | #2142 | Blocking: | |
Branch state: | no branch | Votes for changeset: |
Description
Consider this simple example file.js:
function f(a) { var r = /'/; return r.test(a); }
This is correct javascript code which search for occurrence of ' or returns null if there is no such.
mcedit incorrectly highlights ' (green) in the /'/, and assumes it is string which continues beyond end of regexp (including comments, and probably to the end of file).
All examples here: http://smp.if.uj.edu.pl/~baryluk/mcedit-js-highlight-bug/
Thanks.
Change History
comment:1 Changed 15 years ago by andrew_b
- Version changed from version not selected to master
- Component changed from mc-config-ini to mcedit
comment:3 Changed 15 years ago by baryluk
I don't think it is duplicate.
It is different syntax highlighting file and language.
And different problem. It looks that current JS highlighting rules doesn't
include case for regexps at all. Finding regular expression isn't hard,
scan non-comments and non-strings for /\/.+(?<!\\)\//[igmy]* regexp.
As i understand problem is that / can be both division operator and beginig of the regexp. Simples difference is that before / in regexp we need to have: { or ( or [ or , or ; or = or other oprator (+ - / & && || | ^ ~). But befor division operator we need to have ) or digit or letter or dot. (We inore of course whitespaces in the match)
.+ becuase 'var a = //;' is illegal as this is empty regexp (would always match), and actually it is a commened ';' string
comment:4 Changed 15 years ago by baryluk
Regexp should be
/\/.+(?<!\\)\/[igmy]*/
With negative lookbehind and ignore space, to distinguish from the division operator followed by comment:
/(?<=[{(\[,:=\-+\/&|~])[\s\n]*(\/.+(?<!\\)\/[igmy])*/
After further refinmentens (like allowing / in after [ (character class), and then disallowing after [ if it was escaped \[.
/(?<=[{(\[,:=\-+\/&|~!])[\s\n]*(\/(\\\/|[^\[]|(\[.*\/\])|\\\[)+?\/[igmy]*)/
this will match to
var a = /"d"/;
but not to
var a = x / y; //
PS. Some regexp engine have limited lookbehind capabilities, but this shouldn't be probleme here, as we are only matching only single char before possible regexp (ignoring whitespaces).
But perl gives me correctly some comments (which should are processed on its own, so will not be matched in real life:
cat *.js | perl -n -e 'if (/(?<=[{(\[,:=\-+\/&|~!])[\s\n]*(\/(\\\/|[^\[]|(\[.*\/\])|\\\[)+?\/[igmy]*)/mg) {print $1,"\n";}'
output
/^0+/ /Apple.*Mobile.*Safari/ /^\/ /^\[object\s(.*)\]$/ /\/\/ /\s+/g /^\s+/ /::/ /_/ /"/g /'/g /\\./g /&/g /</g /(^|.|\r|\n)(#\{(.*?)\})/ /\s+/ /Konqueror|Safari|KHTML/ /Gecko\/(\d{4})/ /^\s*(text|application)\/(x-)?(java|ecma)script(;.*)?\s*$/i /^\s*https?:\/\/[^\/]*/ /\S/ /opacity:\s*(\d?\.?\d*)/ /alpha\(opacity=(.*)\)/ /rv:1\.8\.0/ /^(?:object|applet|embed)$/i /\S/ /\S/ /\S/ /\S/ /^(\d+)$/ /"/g /^\s*~\s*/ /^\s*>\s*/ /^\s*\+\s*/ /^\s/ /^\d+$/ /^(?:button|reset|submit)$/i /\s/ /\s/ /\s+/ / 2/ /213/ /213/ /213/ /213/ /213/ /213/ /\/asd\/asd/ /[a/]*/ /\[a\/\]*/ ...
looks very well. "..." have some false positives, but it is ONLY becuase there was comments there of the form:
sadasdasd = asdsad ; // cadasdsad http://google.com.
comment:5 Changed 15 years ago by baryluk
Changing not needed groups to be noncapturing (?: ...):
/(?<=[{(\[,:=\-+\/&|~!])[\s\n]*(\/(?:\\\/|[^\[]|(?:\[.*\/\])|\\\[)+?\/[igmy]*)/
brings slight more efficient matching and gives exactly same matche.
Is it possible to use this somehow in the syntax files? Or something more simple and not based on the regexp need to be used?