# $Id: README,v 1.1 2007/06/03 02:35:28 ssttoo Exp $

Introduction
============

Text_Highlighter is a class for syntax highlighting. The main idea is to
simplify creation of subclasses implementing syntax highlighting for a
particular language. Subclasses do not implement any new functionality; they
just provide syntax highlighting rules. The rule sources are in XML format.
To create a highlighter for a language, there is no need to code a new class
manually. Simply describe the rules in an XML file and use
Text_Highlighter_Generator to create a new class.


This document does not contain a formal description of the API. The API is
very simple, and I believe providing some code examples is sufficient.


Highlighter XML source
======================

Basics
------

Creating a new syntax highlighter begins with describing the highlighting
rules. There are two basic elements: block and region. A block is just a
portion of text matching a regular expression and highlighted with a single
color. A keyword is an example of a block. A region is defined by two regular
expressions: one for the start of the region, and another for the end. The
main difference from a block is that a region can contain blocks and regions
(including same-named regions). An example of a region is a group of
statements enclosed in curly brackets (this is used in many languages, for
example PHP and C). Also, characters matching the start and end of a region
may be highlighted with their own color, and the region contents with another.
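
The difference between a block and a region can be sketched in plain PHP.
This is only an illustration and not the code of the package: splitRegion()
is a hypothetical helper, and it does not handle nested regions. It shows a
region as two regular expressions (start and end) and a block as a single one.

```php
<?php
// Hypothetical helper, not part of Text_Highlighter: splits the first
// region found in $text into start delimiter, contents and end delimiter.
function splitRegion($text, $startRe, $endRe)
{
    // Find where the region starts.
    if (!preg_match($startRe, $text, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $start     = $m[0][0];
    $innerFrom = $m[0][1] + strlen($start);

    // Find where the region ends, searching only after the start delimiter.
    if (!preg_match($endRe, $text, $m, PREG_OFFSET_CAPTURE, $innerFrom)) {
        return null;
    }
    return array(
        'start' => $start,
        'inner' => substr($text, $innerFrom, $m[0][1] - $innerFrom),
        'end'   => $m[0][0],
    );
}

// A region: a C-style comment, delimited by two regular expressions.
$region = splitRegion('x = 1; /* note */ y = 2;', '/\/\*/', '/\*\//');

// A block: a single regular expression, here matching a number.
preg_match('/\d+/', 'x = 42;', $block);
```

The 'start' and 'end' parts correspond to what a region may color with its
delimiter color, and 'inner' to the region contents.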

Blocks and regions may be declared as contained. Contained blocks and regions
can only appear inside regions. If a region or a block is not declared as
contained, it can appear both at the top level and inside regions. A block or
region declared as not-contained can only appear at the top level.

For any region, a list of blocks and regions that can appear inside this
region can be specified.

In this document, the term "color group" is used. Chunks of text assigned to
the same color group will be highlighted with the same color. Note that in
versions prior to 0.5.0 color groups were referred to as CSS classes, but
since 0.5.0 HTML is no longer the only supported output, so "color group" is
the more appropriate term.

Elements
--------

The top-level element is <highlight>. The attribute lang is required and
denotes the name of the language. Its value is used as a part of the generated
class name, and must only contain letters, digits and underscores. The
optional attribute case, when given the value yes, makes the language case
sensitive (the default is case insensitive). Allowed subelements are:

* <authors>: Information about the authors of the file.
    <author>: Information about a single author of the file. (May be used
    multiple times, one per author.)
        - name="...": Author's name. Required.
        - email="...": Author's email address. Optional.

* <default>: Default color group.
    - innerGroup="...": color group name. Required.

* <region>: Region definition
    - name="...": Region name. Required.
    - innerGroup="...": Default color group of region contents. Required.
    - delimGroup="...": color group of start and end of region. Optional,
      defaults to value of innerGroup attribute.
    - start="...", end="...": Regular expressions matching start and end
      of region. Required. Regular expression delimiters are optional, but
      if you need to specify a delimiter, use /. Delimiters are only
      needed when specifying regular expression modifiers, such as m or U.
      Examples: \/\* or /$/m.
    - contained="yes": Marks region as contained.
    - never-contained="yes": Marks region as not-contained.
    - <contains>: Elements allowed inside this region.
        - all="yes" Region can contain any other region or block
          (except not-contained). May be used multiple times.
            - <but> Do not allow certain regions or blocks.
                - region="..." Name of region not allowed within
                  current region.
                - block="..." Name of block not allowed within
                  current region.
        - region="..." Name of region allowed within current region.
        - block="..." Name of block allowed within current region.
    - <onlyin> Only allow this region within certain regions. May be
      used multiple times.
        - region="..." Name of parent region

* <block>: Block definition
    - name="...": Block name. Required.
    - innerGroup="...": color group of block contents. Optional. If not
      specified, color group of parent region or default color group will be
      used. One would only want to omit this attribute if there are
      keyword groups (see below) inherited from this block, and no special
      highlighting should apply when the block does not match the keyword.
    - match="..." Regular expression matching the block. Required.
      Regular expression delimiters are optional, but if you need to
      specify a delimiter, use /. Delimiters are only needed when
      specifying regular expression modifiers, such as m or U.
      Examples: #|\/\/ or /$/m.
    - contained="yes": Marks block as contained.
    - never-contained="yes": Marks block as not-contained.
    - <onlyin> Only allow this block within certain regions. May be used
      multiple times.
        - region="..." Name of parent region
    - multiline="yes": Marks block as multi-line. By default, whole
      blocks are assumed to reside in a single line. This makes things
      faster. If you need to declare a multi-line block, use this
      attribute.
    - <partGroup>: Assigns another color group to a part of the block that
      matched a subpattern.
        - index="n": Subpattern index. Required.
        - innerGroup="...": color group name. Required.

      This is an example from the CSS highlighter: the measure is matched as
      a whole, but the measurement units are highlighted with a different
      color.

      <block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
          innerGroup="number" contained="yes">
          <onlyin region="property"/>
          <partGroup index="1" innerGroup="string" />
      </block>

* <keywords>: Keyword group definition. Keyword groups are useful when you
  want to highlight some words that match a condition for a block with a
  different color. Keywords are defined with literal match, not regular
  expressions. For example, you have a block named identifier matching a
  general identifier, and want to highlight reserved words (which match
  this block as well) with a different color. You inherit a keyword group
  "reserved" from the "identifier" block.
    - name="...": Keyword group name. Required.
    - ifdef="...", ifndef="...": Conditional declaration. See
      "Conditions" below.
    - inherits="...": Inherited block name. Required.
    - innerGroup="...": color group of keyword group. Required.
    - case="yes|no": Overrides case-sensitivity of the language.
      Optional, defaults to global value.
    - <keyword>: Single keyword definition.
        - match="..." The keyword. Note: this is not a regular
          expression, but literal match (possibly case insensitive).
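
The way a keyword group refines a block can be modelled with a few lines of
PHP. This is a sketch of the rule described above, not the package's own
implementation: colorGroupFor() and the sample word list are hypothetical.
A word that already matched the inherited block gets the keyword group's
color only if it literally equals one of the keywords.

```php
<?php
// Hypothetical helper: returns the color group for a word that has
// already matched the inherited block.
function colorGroupFor($word, array $keywords, $group, $blockGroup, $caseSensitive)
{
    foreach ($keywords as $keyword) {
        // Literal comparison, optionally ignoring case (no regular expressions).
        $same = $caseSensitive
            ? strcmp($word, $keyword) === 0
            : strcasecmp($word, $keyword) === 0;
        if ($same) {
            return $group;
        }
    }
    // Not a keyword: keep the color group of the block itself.
    return $blockGroup;
}

$reserved = array('if', 'else', 'while');
$a = colorGroupFor('while',   $reserved, 'reserved', 'identifier', true);
$b = colorGroupFor('WHILE',   $reserved, 'reserved', 'identifier', true);
$c = colorGroupFor('WHILE',   $reserved, 'reserved', 'identifier', false);
$d = colorGroupFor('counter', $reserved, 'reserved', 'identifier', false);
```

Here $a and $c end up in the "reserved" group, while $b and $d stay in
"identifier".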

Note that for backward compatibility (BC) reasons, the element partClass is
an alias for partGroup, and the attributes innerClass and delimClass are
aliases of innerGroup and delimGroup, respectively.
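
The subpattern index used by <partGroup> in the "measure" example corresponds
to a capture group of the regular expression. The following sketch checks
this with preg_match(); the sample input string is made up, and the / 
delimiters are added only because preg_match() requires them.

```php
<?php
// The pattern from the "measure" block, wrapped in / delimiters.
$pattern = '/\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)/';

preg_match($pattern, 'margin: 12.5px;', $m);

$whole = $m[0]; // the whole block, colored with innerGroup="number"
$unit  = $m[1]; // subpattern 1, recolored by <partGroup index="1">

// The part of the block that keeps the block's own color group.
$number = substr($whole, 0, strlen($whole) - strlen($unit));
```

With this input, the whole block is "12.5px", subpattern 1 is "px", and the
remaining "12.5" keeps the "number" color group.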


Conditions
----------

Conditional declarations allow enabling or disabling certain highlighting
rules at runtime. For example, the Java highlighter has a very big list of
keywords matching Java standard classes. Finding a match in this list can take
a long time. For that reason, the corresponding keyword group is declared with
the "ifdef" attribute:

    <keywords name="builtin" inherits="identifier" innerClass="builtin"
              case="yes" ifdef="java.builtins">
        <keyword match="AbstractAction" />
        <keyword match="AbstractBorder" />
        <keyword match="AbstractButton" />
        ...
        ...
        <keyword match="_Remote_Stub" />
        <keyword match="_ServantActivatorStub" />
        <keyword match="_ServantLocatorStub" />
    </keywords>

This keyword group will only be enabled when "java.builtins" is passed as an
element of the "defines" option:

    $options = array(
        'defines' => array(
            'java.builtins',
        ),
        'numbers' => HL_NUMBERS_TABLE,
    );
    $highlighter =& Text_Highlighter::factory('java', $options);

The "ifndef" attribute has the reverse meaning.

Currently, the "ifdef" and "ifndef" attributes are only supported for the
<keywords> tag.
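
The effect of "ifdef" and "ifndef" can be summarized by a small model. This is
only a sketch of the rule described above: keywordGroupEnabled() is a
hypothetical helper, and the actual check is internal to the package.

```php
<?php
// Hypothetical helper: decides whether a keyword group is active for a
// given list of 'defines'.
function keywordGroupEnabled(array $attrs, array $defines)
{
    if (isset($attrs['ifdef'])) {
        // Enabled only when the name was passed in 'defines'.
        return in_array($attrs['ifdef'], $defines, true);
    }
    if (isset($attrs['ifndef'])) {
        // Reverse meaning: enabled only when the name was NOT passed.
        return !in_array($attrs['ifndef'], $defines, true);
    }
    return true; // no condition, always enabled
}

$builtin = array('name' => 'builtin', 'ifdef' => 'java.builtins');

$on  = keywordGroupEnabled($builtin, array('java.builtins'));
$off = keywordGroupEnabled($builtin, array());
$rev = keywordGroupEnabled(
    array('ifndef' => 'java.builtins'),
    array('java.builtins')
);
```

$on is true, while $off and $rev are false.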



Class generation
================

Creating the XML description of the highlighting rules is the most complicated
part of the process. To generate the class, you need just a few lines of code:

    <?php
    require_once 'Text/Highlighter/Generator.php';
    $generator = new Text_Highlighter_Generator('php.xml');
    $generator->generate();
    $generator->saveCode('PHP.php');
    ?>



Command-line class generation tool
==================================

The example from the previous section looks pretty simple, but it does not
handle any errors which may occur during parsing of the XML source. The
package provides a command-line script that makes generating classes even
simpler and takes care of possible errors. It is called generate (on
Unix/Linux) or generate.bat (on Windows). This script is able to process
multiple files in one run, and also to process XML from standard input and
write generated code to standard output.

    Usage:
    generate options

    Options:
        -x filename, --xml=filename
            source XML file. Multiple input files can be specified, in
            which case each -x option must be followed by -p unless -d is
            specified. Defaults to stdin
        -p filename, --php=filename
            destination PHP file. Defaults to stdout. If specified multiple
            times, each -p must follow -x
        -d dirname, --dir=dirname
            Default destination directory. File names will be taken from
            XML input ("lang" attribute of <highlight> tag)
        -h, --help
            This help

Examples

    Read from php.xml, write to PHP.php

        generate -x php.xml -p PHP.php

    Read from php.xml, write to standard output

        generate -x php.xml

    Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php

        generate -x php.xml -p PHP.php -x xml.xml -p XML.php

    Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write
    to /some/dir/XML.php (assuming that xml.xml contains
    <highlight lang="xml">, and php.xml contains <highlight lang="php">)

        generate -x php.xml -x xml.xml -d /some/dir/



Renderers
=========

Introduction
------------

Text_Highlighter supports renderers. Using renderers, you can get output in
different formats. Two renderers are included in the package:

    - HTML renderer. Generates HTML output. A style sheet should be linked to
      the document to display colored text

    - Console renderer. Can be used to output highlighted text to
      color-capable terminals, either directly or through less -r


Renderers API
-------------

Renderers are subclasses of Text_Highlighter_Renderer. A renderer should
override at least two methods - acceptToken and getOutput. Overriding other
methods is optional, depending on the nature of the renderer's output and
details of implementation.

    string reset()
        resets renderer state. This method is called every time before a new
        source file is highlighted.

    string preprocess(string $code)
        preprocesses code. Can be used, for example, to normalize whitespace
        before highlighting. Returns preprocessed string.

    void acceptToken(string $group, string $content)
        the core method of the renderer. Highlighter passes chunks of text to
        this method in $content, and color group in $group

    void finalize()
        signals the renderer that no more tokens are available.

    mixed getOutput()
        returns generated output.
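
The following sketch shows the shape of a renderer and the way its methods
cooperate. Several things here are assumptions made for illustration: the
class name BracketRenderer, the sample color group names, and the exact order
in which the methods are called. A real renderer would extend
Text_Highlighter_Renderer; the base class is left out here so that the sketch
runs without the package installed.

```php
<?php
// Standalone sketch of a renderer. In real use this class would be
// declared as "extends Text_Highlighter_Renderer".
class BracketRenderer
{
    private $output = '';

    // Called before a new source is highlighted.
    public function reset()
    {
        $this->output = '';
    }

    // Normalizes line endings before highlighting.
    public function preprocess($code)
    {
        return str_replace("\r\n", "\n", $code);
    }

    // Receives one chunk of text together with its color group.
    public function acceptToken($group, $content)
    {
        $this->output .= '[' . $group . ':' . $content . ']';
    }

    // Nothing needs to be flushed in this sketch.
    public function finalize()
    {
    }

    public function getOutput()
    {
        return $this->output;
    }
}

// Drive the renderer by hand with made-up tokens, in an order a
// highlighter could plausibly use.
$renderer = new BracketRenderer();
$renderer->reset();
$code = $renderer->preprocess("echo 'hi';\r\n");
$renderer->acceptToken('reserved', 'echo');
$renderer->acceptToken('code', ' ');
$renderer->acceptToken('string', "'hi'");
$renderer->finalize();
$result = $renderer->getOutput();
```

Because the renderer only sees color group names and text chunks, the same
highlighter can feed any output format.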


Setting renderer options
------------------------

Renderers accept an optional argument to their constructor - an options array.
Elements of this array are renderer-specific.

HTML renderer
-------------

The HTML renderer produces HTML output with optional line numbering. The
renderer itself does not provide information about the actual colors of
highlighted text. Instead, <span class="hl-XXX"> is used, where XXX is
replaced with the color group name (hl-var, hl-string, etc.). It is up to you
to create a CSS stylesheet. If the 'use_language' option is passed with a
value evaluating to true, class names will be formatted as "hl-LANG-XXX",
where LANG is the language name as defined in the highlighter XML source
("lang" attribute of <highlight> tag) in lower case.
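
The class naming rule above can be written down as a small function, which is
also handy for generating a stylesheet. This is a sketch only: hlClassName()
is a hypothetical helper that mirrors the rule, and the colors are arbitrary.

```php
<?php
// Hypothetical helper mirroring the naming rule of the HTML renderer.
function hlClassName($group, $lang = null)
{
    if ($lang === null) {
        return 'hl-' . $group;
    }
    // With 'use_language', the lower-cased language name is inserted.
    return 'hl-' . strtolower($lang) . '-' . $group;
}

$plain  = hlClassName('string');        // without 'use_language'
$scoped = hlClassName('string', 'PHP'); // with 'use_language'

// Build a tiny stylesheet for two color groups.
$css = '';
foreach (array('string' => '#800000', 'var' => '#000080') as $group => $color) {
    $css .= '.' . hlClassName($group) . ' { color: ' . $color . '; }' . "\n";
}
```

$plain is "hl-string" and $scoped is "hl-php-string"; $css holds one rule per
color group, ready to be written to a .css file.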

There are 3 special CSS classes:

    hl-main   - this class applies to the whole output or the right table
                column, depending on the 'numbers' option
    hl-gutter - applies to the left column in the table
    hl-table  - applies to the whole table

The HTML renderer accepts the following options (each being optional):

    * numbers - line numbering style.

        HL_NUMBERS_LI    - use <ol></ol> for line numbering
        HL_NUMBERS_TABLE - create a 2-column table, with line numbers in the
                           left column and highlighted text in the right
                           column

    * tabsize - tabulation size. Defaults to 4

Example:

    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);

Console renderer
----------------

The console renderer produces output for displaying on a color-capable
terminal, either directly or through less -r, using ANSI escape sequences. By
default, this renderer only highlights the most common color groups.
Additional colors can be specified using the 'colors' option. This renderer
also accepts the 'numbers' option (a boolean value) and the 'tabsize' option.

Example:

    require_once 'Text/Highlighter/Renderer/Console.php';
    $colors = array(
        'prepro' => "\033[35m",
        'types'  => "\033[32m",
    );
    $options = array(
        'numbers' => true,
        'tabsize' => 8,
        'colors'  => $colors,
    );
    $renderer = new Text_Highlighter_Renderer_Console($options);


ANSI color escape sequences have the following format:

    ESC[#;#;....;#m

where ESC is the character with ASCII code 27 (033 octal, 0x1B hexadecimal).
# is one of the following:

     1    bold on
     4    underline (mono only)
     5    blink on
     7    reverse video on
     8    nondisplayed (invisible)
    30    black foreground
    31    red foreground
    32    green foreground
    33    yellow foreground
    34    blue foreground
    35    magenta foreground
    36    cyan foreground
    37    white foreground
    40    black background
    41    red background
    42    green background
    43    yellow background
    44    blue background
    45    magenta background
    46    cyan background
    47    white background
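
Sequences in this format can be assembled from the numeric codes instead of
being typed by hand. The helper below is a sketch; ansiSequence() is a
hypothetical name and not part of the package.

```php
<?php
// Builds an ANSI color escape sequence from a list of numeric codes.
function ansiSequence(array $codes)
{
    // "\033" is the ESC character (octal 033, decimal 27).
    return "\033[" . implode(';', $codes) . 'm';
}

$boldRed      = ansiSequence(array(1, 31));  // bold, red foreground
$yellowOnBlue = ansiSequence(array(33, 44)); // yellow on blue background

// The same values as in the 'colors' example for the console renderer.
$colors = array(
    'prepro' => ansiSequence(array(35)),
    'types'  => ansiSequence(array(32)),
);
```

The resulting $colors array can be passed as the 'colors' option shown in the
console renderer example.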


How to use Text_Highlighter class
=================================

Creating a highlighter object
-----------------------------

To create a highlighter for a certain language, use the
Text_Highlighter::factory() static method:

    require_once 'Text/Highlighter.php';
    $hl =& Text_Highlighter::factory('php');


Setting a renderer
------------------

Actual output is produced by a renderer.

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl =& Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);

Note that for BC reasons, it is possible to use the highlighter without
setting a renderer. If no renderer is set, the HTML renderer will be used by
default. In this case, you should pass the options as the second parameter to
the factory method. The following example works exactly as the previous one:

    require_once 'Text/Highlighter.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $hl =& Text_Highlighter::factory('php', $options);


Getting output
--------------

And finally, do the highlighting and get the output:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl =& Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);
    $html = $hl->highlight(file_get_contents('example.php'));

# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */