Ticket #2145 (new defect)

Opened 15 years ago

Last modified 5 years ago

JavaScript highlighting bug in regexp.

Reported by: baryluk Owned by:
Priority: minor Milestone: Future Releases
Component: mcedit Version: master
Keywords: Cc:
Blocked By: #2142 Blocking:
Branch state: no branch Votes for changeset:

Description

Consider this simple example file.js:

function f(a) {
   var r = /'/;
  return r.test(a);
}

This is correct javascript code which search for occurrence of ' or returns null if there is no such.

mcedit incorrectly highlights ' (green) in the /'/, and assumes it is string which continues beyond end of regexp (including comments, and probably to the end of file).

http://smp.if.uj.edu.pl/~baryluk/mcedit-js-highlight-bug/Def_030.png

All examples here: http://smp.if.uj.edu.pl/~baryluk/mcedit-js-highlight-bug/

Thanks.

Change History

comment:1 Changed 15 years ago by andrew_b

  • Version changed from version not selected to master
  • Component changed from mc-config-ini to mcedit

comment:2 Changed 15 years ago by zaytsev

Duplicate: #2142 ?

comment:3 Changed 15 years ago by baryluk

I don't think it is duplicate.

It is different syntax highlighting file and language.
And different problem. It looks that current JS highlighting rules doesn't
include case for regexps at all. Finding regular expression isn't hard,
scan non-comments and non-strings for /\/.+(?<\\)\[igmy]* regexp.

As i understand problem is that / can be both division operator and beginig of the regexp. Simples difference is that before / in regexp we need to have: { or ( or [ or , or ; or = or other oprator (+ - / & &&
| ~). But befor division operator we need to have ) or digit or letter or dot. (We inore of course whitespaces in the match)

.+ becuase 'var a = ;' is illegal as this is empty regexp (would always match), and actually it is a commened ';' string

Version 0, edited 15 years ago by baryluk (next)

comment:4 Changed 15 years ago by baryluk

Regexp should be

/\/.+(?<!\\)\/[igmy]*/

With negative lookbehind and ignore space, to distinguish from the division operator followed by comment:

/(?<=[{(\[,:=\-+\/&|~])[\s\n]*(\/.+(?<!\\)\/[igmy])*/

After further refinmentens (like allowing / in after [ (character class), and then disallowing after [ if it was escaped \[.

/(?<=[{(\[,:=\-+\/&|~!])[\s\n]*(\/(\\\/|[^\[]|(\[.*\/\])|\\\[)+?\/[igmy]*)/

this will match to

var a = /"d"/;

but not to

var a = x / y; // 

PS. Some regexp engine have limited lookbehind capabilities, but this shouldn't be probleme here, as we are only matching only single char before possible regexp (ignoring whitespaces).

But perl gives me correctly some comments (which should are processed on its own, so will not be matched in real life:

cat *.js | perl -n -e 'if (/(?<=[{(\[,:=\-+\/&|~!])[\s\n]*(\/(\\\/|[^\[]|(\[.*\/\])|\\\[)+?\/[igmy]*)/mg) {print $1,"\n";}'

output

/^0+/
/Apple.*Mobile.*Safari/
/^\/
/^\[object\s(.*)\]$/
/\/\/
/\s+/g
/^\s+/
/::/
/_/
/"/g
/'/g
/\\./g
/&/g
/&lt;/g
/(^|.|\r|\n)(#\{(.*?)\})/
/\s+/
/Konqueror|Safari|KHTML/
/Gecko\/(\d{4})/
/^\s*(text|application)\/(x-)?(java|ecma)script(;.*)?\s*$/i
/^\s*https?:\/\/[^\/]*/
/\S/
/opacity:\s*(\d?\.?\d*)/
/alpha\(opacity=(.*)\)/
/rv:1\.8\.0/
/^(?:object|applet|embed)$/i
/\S/
/\S/
/\S/
/\S/
/^(\d+)$/
/"/g
/^\s*~\s*/
/^\s*>\s*/
/^\s*\+\s*/
/^\s/
/^\d+$/
/^(?:button|reset|submit)$/i
/\s/
/\s/
/\s+/
/ 2/
/213/
/213/
/213/
/213/
/213/
/213/
/\/asd\/asd/
/[a/]*/
/\[a\/\]*/
...

looks very well. "..." have some false positives, but it is ONLY becuase there was comments there of the form:

sadasdasd = asdsad ; // cadasdsad http://google.com.

comment:5 Changed 15 years ago by baryluk

Changing not needed groups to be noncapturing (?: ...):

/(?<=[{(\[,:=\-+\/&|~!])[\s\n]*(\/(?:\\\/|[^\[]|(?:\[.*\/\])|\\\[)+?\/[igmy]*)/

brings slight more efficient matching and gives exactly same matche.

Is it possible to use this somehow in the syntax files? Or something more simple and not based on the regexp need to be used?

comment:6 Changed 13 years ago by andrew_b

  • Branch state set to no branch
  • Milestone changed from 4.7 to Future Releases

comment:7 Changed 12 years ago by andrew_b

  • Blocked By 2142 added
Note: See TracTickets for help on using tickets.