Blame | Letzte Änderung | Log anzeigen | RSS feed
# $Id: README,v 1.1 2007/06/03 02:35:28 ssttoo Exp $Introduction============Text_Highlighter is a class for syntax highlighting. The main idea is tosimplify creation of subclasses implementing syntax highlighting forparticular language. Subclasses do not implement any new functioanality, theyjust provide syntax highlighting rules. The rules sources are in XML format.To create a highlighter for a language, there is no need to code a new classmanually. Simply describe the rules in XML file and use Text_Highlighter_Generatorto create a new class.This document does not contain a formal description of API - it is verysimple, and I believe providing some examples of code is sufficient.Highlighter XML source======================Basics------Creating a new syntax highlighter begins with describing the highlightingrules. There are two basic elements: block and region. A block is just aportion of text matching a regular expression and highlighted with a singlecolor. Keyword is an example of a block. A region is defined by two regularexpressions: one for start of region, and another for the end. The maindifference from a block is that a region can contain blocks and regions(including same-named regions). An example of a region is a group ofstatements enclosed in curly brackets (this is used in many languages, forexample PHP and C). Also, characters matching start and end of a region may behighlighted with their own color, and region contents with another.Blocks and regions may be declared as contained. Contained blocks and regionscan only appear inside regions. If a region or a block is not declared ascontained, it can appear both on top level and inside regions. Block or regiondeclared as not-contained can only appear on top level.For any region, a list of blocks and regions that can appear inside thisregion can be specified.In this document, the term "color group" is used. Chunks of text assigned tosame color group will be highlighted with same color. Note that in versionsprior 0.5.0 color goups were refered as CSS classes, but since 0.5.0 not onlyHTML output is supported, so "color group" is more appropriate term.Elements--------The toplevel element is <highlight>. Attribute lang is required and denotesthe name of the language. Its value is used as a part of generated class name,and must only contain letters, digits and underscores. Optional attributecase, when given value yes, makes the language case sensitive (default is caseinsensitive). Allowed subelements are:* <authors>: Information about the authors of the file.<author>: Information about a single author of the file. (May be usedmultiple times, one per author.)- name="...": Author's name. Required.- email="...": Author's email address. Optional.* <default>: Default color group.- innerGroup="...": color group name. Required.* <region>: Region definition- name="...": Region name. Required.- innerGroup="...": Default color group of region contents. Required.- delimGroup="...": color group of start and end of region. Optional,defaults to value of innerGroup attribute.- start="...", end="...": Regular expression matching start and endof region. Required. Regular expression delimiters are optional, butif you need to specify delimiter, use /. The only case when thedelimiters are needed, is specifying regular expression modifiers,such as m or U. Examples: \/\* or /$/m.- contained="yes": Marks region as contained.- never-contained="yes": Marks region as not-contained.- <contains>: Elements allowed inside this region.- all="yes" Region can contain any other region or block(except not-contained). May be used multiple times.- <but> Do not allow certain regions or blocks.- region="..." Name of region not allowed withincurrent region.- block="..." Name of block not allowed withincurrent region.- region="..." Name of region allowed within current region.- block="..." Name of block allowed within current region.- <onlyin> Only allow this region within certain regions. May beused multiple times.- block="..." Name of parent region* <block>: Block definition- name="...": Block name. Required.- innerGroup="...": color group of block contents. Optional. If notspecified, color group of parent region or default color group will beused. One would only want to omit this attribute if there arekeyword groups (see below) inherited from this block, and no specialhighlighting should apply when the block does not match the keyword.- match="..." Regular expression matching the block. Required.Regular expression delimiters are optional, but if you need tospecify delimiter, use /. The only case when the delimiters areneeded, is specifying regular expression modifiers, such as m or U.Examples: #|\/\/ or /$/m.- contained="yes": Marks block as contained.- never-contained="yes": Marks block as not-contained.- <onlyin> Only allow this block within certain regions. May be usedmultiple times.- block="..." Name of parent region- multiline="yes": Marks block as multi-line. By default, wholeblocks are assumed to reside in a single line. This make the thingsfaster. If you need to declare a multi-line block, use thisattribute.- <partgroup>: Assigns another color group to a part of the block thatmatched a subpattern.- index="n": Subpattern index. Required.- innerGroup="...": color group name. Required.This is an example from CSS highlighter: the measure is matched asa whole, but the measurement units are highlighted with differentcolor.<block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"innerGroup="number" contained="yes"><onlyin region="property"/><partGroup index="1" innerGroup="string" /></block>* <keywords>: Keyword group definition. Keyword groups are useful when youwant to highlight some words that match a condition for a block with adifferent color. Keywords are defined with literal match, not regularexpressions. For example, you have a block named identifier matching ageneral identifier, and want to highlight reserved words (which matchthis block as well) with different color. You inherit a keyword group"reserved" from "identifier" block.- name="...": Keyword group. Required.- ifdef="...", ifndef="..." : Conditional declaration. See"Conditions" below.- inherits="...": Inherited block name. Required.- innerGroup="...": color group of keyword group. Required.- case="yes|no": Overrides case-sensitivity of the language.Optional, defaults to global value.- <keyword>: Single keyword definition.- match="..." The keyword. Note: this is not a regularexpression, but literal match (possibly case insensitive).Note that for BC reasons element partClass is alias for partGroup, andattributes innerClass and delimClass are aliases of innerGroup anddelimGroup, respectively.Conditions----------Conditional declarations allow enabling or disabling certain highlightingrules at runtime. For example, Java highlighter has a very big list ofkeywords matching Java standard classes. Finding a match in this list can takemuch time. For that reason, corresponding keyword group is declared with"ifdef" attribute :<keywords name="builtin" inherits="identifier" innerClass="builtin"case="yes" ifdef="java.builtins"><keyword match="AbstractAction" /><keyword match="AbstractBorder" /><keyword match="AbstractButton" />......<keyword match="_Remote_Stub" /><keyword match="_ServantActivatorStub" /><keyword match="_ServantLocatorStub" /></keywords>This keyword group will be only enabled when "java.builtins" is passed as anelement of "defines" option:$options = array('defines' => array('java.builtins',),'numbers' => HL_NUMBERS_TABLE,);$highlighter =& Text_Highlighter::factory('java', $options);"ifndef" attribute has reverse meaning.Currently, "ifdef" and "ifndef" attributes are only supported for <keywords>tag.Class generation================Creating XML description of highlighting rules is the most complicated part ofthe process. To generate the class, you need just few lines of code:<?phprequire_once 'Text/Highlighter/Generator.php';$generator =& new Text_Highlighter_Generator('php.xml');$generator->generate();$generator->saveCode('PHP.php');?>Command-line class generation tool==================================Example from previous section looks pretty simple, but it does not handle anyerrors which may occur during parsing of XML source. The package provides acommand-line script to make generation of classes even more simple, and takescare of possible errors. It is called generate (on Unix/Linux) or generate.bat(on Windows). This script is able to process multiple files in one run, andalso to process XML from standard input and write generated code to standardoutput.Usage:generate optionsOptions:-x filename, --xml=filenamesource XML file. Multiple input files can be specified, in whichcase each -x option must be followed by -p unless -d is specifiedDefaults to stdin-p filename, --php=filenamedestination PHP file. Defaults to stdout. If specied multiple times,each -p must follow -x-d dirname, --dir=dirnameDefault destination directory. File names will be taken from XML input("lang" attribute of <highlight> tag)-h, --helpThis helpExamplesRead from php.xml, write to PHP.phpgenerate -x php.xml -p PHP.phpRead from php.xml, write to standard outputgenerate -x php.xmlRead from php.xml, write to PHP.php, read from xml.xml, write to XML.phpgenerate -x php.xml -p PHP.php -x xml.xml -p XML.phpRead from php.xml, write to /some/dir/PHP.php, read from xml.xml, write to/some/dir/XML.php (assuming that xml.xml contains <highlight lang="xml">, andphp.xml contains <highlight lang="php">)generate -x php.xml -x xml.xml -d /some/dir/Renderers=========Introduction------------Text_Highlighter supports renderes. Using renderers, you can get output indifferent formats. Two renderers are included in the package:- HTML renderer. Generates HTML output. A style sheet should be linked tothe document to display colored text- Console renderer. Can be used to output highlighted text tocolor-capable terminals, either directly or trough less -rRenderers API-------------Renderers are subclasses of Text_Highlighter_Renderer. Renderer shouldoverride at least two methods - acceptToken and getOutput. Overriding othermethods is optional, depending on the nature of renderer's output and detailsof implementation.string reset()resets renderer state. This method is called every time before a newsource file is highlighted.string preprocess(string $code)preprocesses code. Can be used, for example, to normalize whitespacebefore highlighting. Returns preprocessed string.void acceptToken(string $group, string $content)the core method of the renderer. Highlighter passes chunks of text tothis method in $content, and color group in $groupvoid finalize()signals the renderer that no more tokens are available.mixed getOutput()returns generated output.Setting renderer options--------------------------------Renderers accept an optional argument to their constructor - options array.Elements of this array are renderer-specific.HTML renderer-------------HTML renderer produces HTML output with optional line numbering. The rendereritself does not provide information about actual colors of highlighted text.Instead, <span class="hl-XXX"> is used, where XXX is replaced with color groupname (hl-var, hl-string, etc.). It is up to you to create a CSS stylesheet.If 'use_language' option with value evaluating to true was passed, class nameswill be formatted as "hl-LANG-XXX", where LANG is language name as defined inhighlighter XML source ("lang" attribute of <highlight> tag) in lower case.There are 3 special CSS classes:hl-main - this class applies to whole output or right table column,depending on 'numbers' optionhl-gutter - applies to left column in tablehl-table - applies to whole tableHTML renderer accepts following options (each being optional):* numbers - line numbering style.0 - no numbering (default)HL_NUMBERS_LI - use <ol></ol> for line numberingHL_NUMBERS_TABLE - create a 2-column table, with line numbers in leftcolumn and highlighted text in right column* tabsize - tabulation size. Defaults to 4Example:require_once 'Text/Highlighter/Renderer/Html.php';$options = array('numbers' => HL_NUMBERS_LI,'tabsize' => 8,);$renderer =& new Text_Highlighter_Renderer_HTML($options);Console renderer----------------Console renderer produces output for displaying on a color-capable terminal,either directly or through less -r, using ANSI escape sequences. By default,this renderer only highlights most common color groups. Additional colorscan be specified using 'colors' option. This renderer also accepts 'numbers'option - a boolean value, and 'tabsize' option.Example :require_once 'Text/Highlighter/Renderer/Console.php';$colors = array('prepro' => "\033[35m",'types' => "\033[32m",);$options = array('numbers' => true,'tabsize' => 8,'colors' => $colors,);$renderer =& new Text_Highlighter_Renderer_Console($options);ANSI color escape sequences have the following format:ESC[#;#;....;#mwhere ESC is character with ASCII code 27 (033 octal, 0x1B hexadecimal). # isone of the following:0 for normal display1 for bold on4 underline (mono only)5 blink on7 reverse video on8 nondisplayed (invisible)30 black foreground31 red foreground32 green foreground33 yellow foreground34 blue foreground35 magenta foreground36 cyan foreground37 white foreground40 black background41 red background42 green background43 yellow background44 blue background45 magenta background46 cyan background47 white backgroundHow to use Text_Highlighter class=================================Creating a highlighter object-----------------------------To create a highlighter for a certain language, use Text_Highlighter::factory()static method:require_once 'Text/Highlighter.php';$hl =& Text_Highlighter::factory('php');Setting a renderer------------------Actual output is produced by a renderer.require_once 'Text/Highlighter.php';require_once 'Text/Highlighter/Renderer/Html.php';$options = array('numbers' => HL_NUMBERS_LI,'tabsize' => 8,);$renderer =& new Text_Highlighter_Renderer_HTML($options);$hl =& Text_Highlighter::factory('php');$hl->setRenderer($renderer);Note that for BC reasons, it is possible to use highlighter without setting arenderer. If no renderer is set, HTML renderer will be used by default. Inthis case, you should pass options as second parameter to factory method. Thefollowing example works exactly as previous one:require_once 'Text/Highlighter.php';$options = array('numbers' => HL_NUMBERS_LI,'tabsize' => 8,);$hl =& Text_Highlighter::factory('php', $options);Getting output--------------And finally, do the highlighting and get the output:require_once 'Text/Highlighter.php';require_once 'Text/Highlighter/Renderer/Html.php';$options = array('numbers' => HL_NUMBERS_LI,'tabsize' => 8,);$renderer =& new Text_Highlighter_Renderer_HTML($options);$hl =& Text_Highlighter::factory('php');$hl->setRenderer($renderer);$html = $hl->highlight(file_get_contents('example.php'));# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */