Ticket #3830 (new task)

Opened 7 years ago

Last modified 7 years ago

mcedit: create a corpus of sample files in various syntaxes for testing purposes

Reported by: mooffie Owned by:
Priority: major Milestone: Future Releases
Component: mcedit Version: master
Keywords: Cc:
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description (last modified by mooffie)

(Henceforth, "syntax" == "syntax highlighting".)

Our editor has many syntax definitions (*.syntax files).

If we ever fix things on the C side of mcedit, or modify a syntax definition file, we'll have a problem: since we don't have a collection of sample files, in the various syntaxes, to test our fixes against, we (the maintainers) would have to create these sample files ourselves. And we'd have to create good files: ones that exercise every nook and cranny of the syntax definitions.

This is a lot of work, so I suggest we start small: have just one or two sample files for now, close this ticket, and add more sample files as time goes by.

To alleviate this burden we ought to make a rule:

Any new syntax definition must be contributed together with one or more sample files. The people writing the syntax files know their language best, so they're the ones who should provide the samples.

Change History

comment:1 Changed 7 years ago by mooffie

Some random notes:

  • The corpus can be placed in tests/src/editor/corpus in the meantime.
  • For documentation purposes, we should also collect files that show imperfections/bugs in our syntax definitions (we could embed the string ".fail." in their filenames). This does not imply that we're intending to fix these imperfections.
  • This ticket does not deal with the testing code itself (regression test; this can be very easily implemented when/if mc has scripting support; mc2 proves this).
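A minimal sketch of how the bookkeeping around such a corpus could look, written in Python purely for illustration (the ticket does not prescribe any tooling). It separates regular samples from the ".fail." samples and lists syntax definitions that have no sample yet. The function names and the rule that a sample for "foo.syntax" has a file name starting with "foo." are assumptions made for this example, not something the ticket defines.

```python
from pathlib import Path


def classify_corpus(corpus_dir):
    """Split corpus files into regular samples and known-imperfection samples.

    A file counts as a known-imperfection sample when its name contains
    the string ".fail.", as suggested in the notes above.
    """
    regular, known_bad = [], []
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        (known_bad if ".fail." in path.name else regular).append(path.name)
    return regular, known_bad


def syntaxes_without_samples(syntax_dir, corpus_dir):
    """Return the names of *.syntax files that have no sample in the corpus.

    ASSUMPTION (not from the ticket): a sample belongs to "foo.syntax"
    when its file name starts with "foo.".
    """
    samples = [p.name for p in Path(corpus_dir).iterdir() if p.is_file()]
    missing = []
    for syntax in sorted(Path(syntax_dir).glob("*.syntax")):
        prefix = syntax.stem + "."
        if not any(name.startswith(prefix) for name in samples):
            missing.append(syntax.name)
    return missing
```

Both directories are passed in as arguments, so the sketch does not depend on where the corpus ends up living. A check like syntaxes_without_samples() could also back the proposed rule that every new syntax definition arrives with a sample.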

comment:2 Changed 7 years ago by mooffie

  • Description modified

comment:3 follow-up: ↓ 4 Changed 7 years ago by zaytsev

I have to say that I'm always on the side of more tests, but in this particular case I can't help asking if we have a larger problem.

What I mean by that is that the last time I had a look at the syntax highlighter code, I almost had a heart attack. It's compact and ingenious, and it has actually been working for a very long time, but it's anything but easily understandable, well documented, and properly tested. On top of that, it has some genetic deficiencies, like the nested quoting bug. We also ended up having a whole library of highlighting rules, which, as you correctly mention, are not tested, but are also not really maintained.

I've been thinking about it for quite a while, and my thoughts are that we aren't the first project attacking this problem, and there are tons of libraries for that purpose. To name only a few I have personally used in the past:

Of course, there are good arguments against introducing a dependency on a syntax highlighting library, but maybe there is some middle ground, such as implementing a minimalist engine and automatically generating syntax files from, e.g., the Pygments collection...

Just thought I'd raise the point before you invest a substantial amount of time in testing the existing highlighter, even though a test corpus would be useful irrespective of whether the highlighter gets replaced or not.

comment:4 in reply to: ↑ 3 Changed 7 years ago by andrew_b

comment:5 follow-up: ↓ 6 Changed 7 years ago by zaytsev

... not that I'm a huge fan of colorer myself.

It's not very well maintained (it seems that Igor gave up on it a long time ago), there are not many syntax definitions available, most of them are out of date, the definition syntax is blood-chilling, and the engine is written in C++ with a dependency on Apache Xerces.

I would really rather look in the direction of GtkSourceView and/or Scintilla.

comment:6 in reply to: ↑ 5 Changed 7 years ago by andrew_b

Replying to zaytsev:

... not that I'm a huge fan of colorer myself.

It's not very well maintained (it seems that Igor gave up on it a long time ago), there are not many syntax definitions available, most of them are out of date, the definition syntax is blood-chilling, and the engine is written in C++ with a dependency on Apache Xerces.

So can we close #2931 as wontfix?

comment:7 Changed 7 years ago by zaytsev

I don't know, I personally would rather re-purpose it to integrate an alternative syntax highlighter without specifically naming colorer.

comment:8 Changed 7 years ago by zaytsev

So the cool kids get cool libraries, and all we get is crap:

:-(

comment:9 follow-up: ↓ 10 Changed 7 years ago by teresaejunior

comment:10 in reply to: ↑ 9 Changed 7 years ago by andrew_b

Replying to teresaejunior:

What about this one? http://www.andre-simon.de/doku/highlight/en/changelog.php

This is C++.

comment:11 Changed 7 years ago by zaytsev

It being C++ isn't even the worst part of it :-/ For one, I couldn't find any embedding documentation / API for that one, and it doesn't seem to support incremental highlighting, etc. Apparently it's really geared towards whole-file colorization, so I don't think it's suitable for integration with the editor. At best, one could try to use it in the viewer to generate a colorized version using ANSI output...

comment:12 Changed 7 years ago by ossi

so it seems that re-implementing the kate highlighting engine (now KSyntaxHighlighting) is popular - apart from the haskell-based skylighting mentioned above (and its predecessor highlighting-kate), qt creator also did it (with c++ again), as did Syntax::Highlight::Engine::Kate in perl. with so much code around to rip off from, er, be inspired by, it would be a shame not to re-implement it again, this time in plain c. :D
