<html>
<head>
<title>API description</title>
<link rel="stylesheet" type="text/css" media="all" title="Default" href="css/help.css"/>
<style type="text/css">
div.note {margin: 0.5em 0;}
div.class {margin: 0.5em 0 0.5em 2em;}
div.interface {margin: 1em 0 0.5em 0; padding: 2px 5px; background-color: #f0f0f0;}
span.interface_name {font-weight: bold;}
span.method_name {font-weight: bold;}
</style>
</head>
<body>
<h1>Beware: GLOBALS!</h1>
<p>At the moment, the layout/conversion engine makes use of several global variables:</p>
<ul>
<li>the $g_config array (in particular, the $g_config['renderforms'], $g_config['renderlinks'], $g_config['renderimages'], $g_config['debugbox'], $g_config['mode'], $g_config['cssmedia'] and $g_config['draw_page_border'] elements for all output methods, plus $g_config['ps2pdf'] and $g_config['transparency_workaround'] for the 'fastps' output method)</li>
<li>$g_px_scale</li>
<li>$g_pt_scale</li>
</ul>
<p>Please take this into account when using the API. We plan to get rid of these globals eventually; for now, you may initialize them using the code from the samples.</p>
<p>Also, there are some global variables that the script initializes itself:</p>
<ul>
<li>$g_box_uid</li>
<li>$g_colors</li>
<li>$__g_css_manager</li>
<li>$__g_css_handler_set</li>
<li>$g_encoding_aliases</li>
<li>$g_frame_level</li>
<li>$g_font_resolver</li>
<li>$g_font_resolver_pdf</li>
<li>$g_html_entities</li>
<li>$g_image_cache</li>
<li>$g_last_assigned_font_id</li>
<li>$g_manager_encodings</li>
<li>$g_media</li>
<li>$g_predefined_media</li>
<li>$g_stylesheet_title</li>
<li>$g_tag_attrs</li>
<li>$g_unicode_glyphs</li>
<li>$g_utf8_converters</li>
</ul>
<p>There's no need to initialize or modify these variables; just don't accidentally overwrite them. Some of them are here for "historical" reasons and will eventually be removed. 
Some are here due to the lack of static class variables in older PHP versions.</p>
<h1>Conversion pipeline</h1>
<div><b>PipelineFactory</b> is a simple factory class that simplifies building <b>Pipeline</b> instances; <b>create_default_pipeline()</b> builds a simple ready-to-run conversion pipeline. Using <b>PipelineFactory</b> is not required; you may create the <b>Pipeline</b> object and fill in the appropriate fields manually.
<pre class="code">class PipelineFactory {
  function create_default_pipeline();
}</pre>
</div>
<div><b>Pipeline</b> class describes the conversion process as a whole; it holds references to the objects described below and is responsible for calling them in the correct order and for handling errors.
<pre class="code">class Pipeline {
  var $fetchers;           // one or more Fetcher objects, tried one by one
  var $data_filters;       // zero or more DataFilter objects
  var $parser;             // one Parser object
  var $pre_tree_filters;   // zero or more PreTreeFilter objects
  var $layout_engine;      // one LayoutEngine object
  var $post_tree_filters;  // zero or more PostTreeFilter objects
  var $output_driver;      // one OutputDriver object
  var $output_filter;      // OutputFilter applied to the generated file
  var $destination;        // one Destination object

  function Pipeline();
  function configure($options);
  function process($data_id, &$media);
  function process_batch($data_id_array, &$media);
  function error_message();
  function &get_dispatcher();
}</pre>
</div>
<h1>Description of interfaces and classes</h1>
<div class="note">Almost all interfaces described below include an <span class="method_name">error_message</span> method. It should return a user-readable description of the error. This description MAY contain HTML tags, but should remain readable if the tags are removed.</div>
<div class="interface"><p><span class="interface_name">Fetcher</span> interface provides a method of fetching the data required to build a document tree. Normally, classes implementing this interface fetch an HTML/XHTML string from somewhere (e.g. from a remote HTTP server, a local file or a database). Nevertheless, a fetcher MAY fetch ANY data, provided that this data can be understood by the parser. 
The pipeline object may contain several fetcher objects; in this case they're tried one by one until one of them returns a non-null value.</p>
<p>If you need to get data from non-standard places (e.g. from a template engine or a database), you are expected to implement the <span class="interface_name">Fetcher</span> interface in your own class.</p>
<p>Note that the <b>get_data</b> method returns a <b>FetchedData</b> object (or one of its descendants) rather than an HTML string!</p>
</div>
<img src="UML/Fetchers.PNG"/>
<dl>
<dt>get_data($data_id)</dt>
<dd>Fetches the URL and returns the page content and supplementary information.
<ul><li>$data_id – URI identifying the page location</li></ul></dd>
<dt>get_base_url()</dt>
<dd>Returns the URL to be used as the base URL when resolving relative links.</dd>
</dl>
<div class="class"><b>FetcherURL</b> reads a remote HTML page via HTTP or HTTPS.</div>
<div class="class"><b>FetcherLocalFile</b> reads a local file; in this case $data_id should contain the path to the file to be read.</div>
<div class="interface"><b>DataFilter</b> interface describes the filters that modify the raw input data. The main purpose of these filters is to fix the raw data so that it can be processed by the parser without errors.</div>
<img src="UML/Data_filters.PNG"/>
<dl>
<dt>process($data)</dt>
<dd>Processes the FetchedData object and returns another FetchedData object with (possibly) modified content.
<ul><li>$data – FetchedData object</li></ul></dd>
</dl>
<div class="class"><b>DataFilterDoctype</b> tries to detect the mode this document should be rendered in (HTML, XHTML, QUIRKS).</div>
<div class="class"><b>DataFilterHTML2XHTML</b>: a precise description of this filter's actions is beyond the scope of this document. In general, it turns the input document into a well-formed XML document (possibly throwing out invalid parts along the way). 
Note that this is achieved by extensive use of regular expressions; no XML/HTML parsers are involved at this stage.</div>
<div class="class"><b>DataFilterXHTML2XHTML</b> does some additional XHTML processing required by the script; for example, it removes comments and SCRIPT tags and performs some other steps that simplify document processing.</div>
<div class="class"><b>DataFilterUTF8</b> converts content from the source encoding to UTF-8. It is a good idea to use this filter if your documents are not limited to ASCII.</div>
<div class="interface"><b>Parser</b> interface provides a method of building the DOM tree from the filtered data.</div>
<img src="UML/Parsers.PNG"/>
<dl>
<dt>process($data)</dt>
<dd>Processes the FetchedData object and returns the document tree object (somewhat similar to DOM).
<ul><li>$data – FetchedData object</li></ul></dd>
</dl>
<div class="class"><b>ParserXHTML</b></div>
<div class="interface"><b>PreTreeFilter</b> interface describes a document tree transformation executed before the layout engine starts.</div>
<img src="UML/Pre_filters.PNG"/>
<dl>
<dt>process($data)</dt>
<dd>Modifies the document tree in place before the layout engine runs.
<ul><li>$data – Document tree object</li></ul></dd>
</dl>
<div class="class" id="filter-pre-html2ps-fields"><b>PreTreeFilterHTML2PSFields</b> handles the processing of special fields (such as date, page count, page number, etc.).</div>
<div class="class"><b>PreTreeFilterHeaderFooter</b> adds the script-generated header and footer to the document tree.</div>
<div class="interface"><b>LayoutEngine</b> interface describes a class that processes the document tree and calculates the positions of page elements. 
In theory, different implementations of this interface will allow us to use "lightweight" layout engines when full HTML/CSS support is not needed.</div>
<img src="UML/Layout_engines.PNG"/>
<dl>
<dt>process($data)</dt>
<dd>Runs the layout process (the document tree object is modified in place).
<ul><li>$data – Document tree object</li></ul></dd>
</dl>
<div class="class"><b>LayoutEngineDefault</b> is the standard layout engine used by HTML2PS.</div>
<div class="interface"><b>PostTreeFilter</b> interface describes a document tree transformation executed after the layout engine completes.</div>
<img src="UML/Post_filters.PNG"/>
<dl>
<dt>process($data)</dt>
<dd>Modifies the document tree in place after the layout engine has run.
<ul><li>$data – Document tree object</li></ul></dd>
</dl>
<div class="interface"><b>OutputDriver</b> interface contains device-specific functions: drawing, movement, font selection, etc. In general, a description of this interface is beyond the scope of this document, as users are not intended to implement this interface themselves. 
Instead, they should use the predefined output drivers described below.</div>
<img src="UML/Output_drivers.PNG"/>
<div class="class"><b>OutputDriverPDFLIB</b> outputs PDF using PDFLIB.</div>
<div class="class"><b>OutputDriverFPDF</b> outputs PDF using FPDF.</div>
<div class="class"><b>OutputDriverFastPS</b> handles PostScript Level 3 output.</div>
<div class="class"><b>OutputDriverFastPSLevel2</b> handles PostScript Level 2 output.</div>
<div class="interface"><b>OutputFilter</b> interface describes the filter applied to the generated PS or PDF file.</div>
<img src="UML/Output_filters.PNG"/>
<div class="class"><b>OutputFilterPS2PDF</b> runs the PS2PDF utility on the generated file.</div>
<div class="class"><b>OutputFilterGZIP</b> compresses the generated file using ZLIB.</div>
<div class="interface"><b>Destination</b> interface describes the "channel" object that determines where the final output file should be placed.</div>
<img src="UML/Destinations.PNG"/>
<div class="class"><b>DestinationBrowser</b> outputs the generated file directly to the browser.</div>
<div class="class"><b>DestinationDownload</b> outputs the generated file directly to the browser. Unlike <b>DestinationBrowser</b>, this class sends headers that prevent the file from being opened directly in the browser window.</div>
<div class="class"><b>DestinationFile</b> saves the generated file on the server side.</div>
<h2>Implementing your own fetcher class</h2>
<p>Sometimes you may need to convert HTML code taken from a database or from other non-standard sources. In this case you should implement the <b>Fetcher</b> interface yourself, returning the data to be converted (wrapped in a <b>FetchedData</b> object) from the <span class="method_name">get_data</span> method. 
Additional parameters (like database connection settings, template variables, etc.) may be supplied as globals (not recommended, though), passed as parameters to the constructor of the fetcher object, or passed in the $data_id parameter of the <span class="method_name">get_data</span> method.</p>
<p>Keep in mind that if your HTML code references other files (e.g. stylesheets or images), your fetcher should either return null for the URLs of these files, so that other fetchers in the pipeline can handle them, or fetch them itself. Otherwise, these files will not be available.</p>
<pre>class MyFetcherLocalFile extends Fetcher {
  var $_content;

  // Reads the whole file into memory when the fetcher is created
  // (PHP 4 style constructor, named after the class)
  function MyFetcherLocalFile($file) {
    $this->_content = file_get_contents($file);
  }

  // Ignores $data_id and returns the same content for every request
  function get_data($dummy1) {
    return new FetchedDataURL($this->_content, array(), "");
  }

  // No base URL is available for content read this way
  function get_base_url() {
    return "";
  }
}</pre>
<p>Also see the <tt>sample.simplest.from.file.php</tt> and <tt>sample.simplest.from.memory.php</tt> files.</p>
<h1>Class dependencies</h1>
<p>The pipeline object contains the following:</p>
<ul>
<li>one or more objects implementing the <b>Fetcher</b> interface;</li>
<li>zero or more objects implementing the <b>DataFilter</b> interface;</li>
<li>one object implementing the <b>Parser</b> interface;</li>
<li>zero or more objects implementing the <b>PreTreeFilter</b> interface;</li>
<li>one object implementing the <b>LayoutEngine</b> interface;</li>
<li>zero or more objects implementing the <b>PostTreeFilter</b> interface;</li>
<li>one object implementing the <b>OutputDriver</b> interface;</li>
<li>one object implementing the <b>Destination</b> interface.</li>
</ul>
<p>There are no other dependencies between classes and interfaces (except "implements").</p>
<p>Note that the order of filters is important. Imagine you're using some kind of tree filter that adds a header block containing HTML2PS-specific fields. In this case you must add this filter before <b>PreTreeFilterHTML2PSFields</b>, or you'll get raw field codes in the generated output.</p>
</body>
</html>