<html>
<head>
<title>API description</title>
<link rel="stylesheet" type="text/css" media="all" title="Default" href="css/help.css"/>
<style type="text/css">
div.note {
  margin: 0.5em 0;
}

div.class {
  margin: 0.5em 0 0.5em 2em;
}

div.interface {
  margin: 1em 0 0.5em 0;
  padding: 2px 5px;
  background-color: #f0f0f0;
}

span.interface_name {
  font-weight: bold;
}

span.method_name {
  font-weight: bold;
}
</style>
</head>
<body>
<h1>Beware: GLOBALS!</h1>
<p>
At the moment, the layout/conversion engine makes use of several global variables:
<ul>
<li>the $g_config array (in particular, the $g_config['renderforms'], $g_config['renderlinks'], $g_config['renderimages'],
$g_config['debugbox'], $g_config['mode'], $g_config['cssmedia'] and $g_config['draw_page_border']
elements for all output methods, and $g_config['ps2pdf'] and $g_config['transparency_workaround'] for
the 'fastps' output method);</li>
<li>$g_px_scale;</li>
<li>$g_pt_scale.</li>
</ul>
Please take this into account when using the API. We are planning to get rid of these globals eventually. For the time being,
you may initialize these globals with the code from the samples above.
</p>
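<p>
A minimal sketch of such an initialization is given below. Only the key and variable names come from the list above;
every value, including both scale factors, is a placeholder assumed for illustration, and the helper function
<b>init_conversion_globals()</b> is hypothetical. Take the real values from the bundled samples.
</p>
<pre class="code">
// Hypothetical helper: sets the globals the engine reads.
// All values below are assumptions chosen for illustration only.
function init_conversion_globals() {
  global $g_config, $g_px_scale, $g_pt_scale;

  $g_config = array(
    // Read by all output methods
    'renderforms'      => false,
    'renderlinks'      => true,
    'renderimages'     => true,
    'debugbox'         => false,
    'mode'             => 'html',
    'cssmedia'         => 'screen',
    'draw_page_border' => false,
    // Read by the 'fastps' output method only
    'ps2pdf'                  => false,
    'transparency_workaround' => false
  );

  // Placeholder scale factors; real code would derive them from the media size.
  $g_px_scale = 1.0;
  $g_pt_scale = 1.0;
}

init_conversion_globals();
</pre>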
<p>
There are also some global items that the script initializes itself:
<ul>
<li>$g_box_uid</li>
<li>$g_colors</li>
<li>$__g_css_manager</li>
<li>$__g_css_handler_set</li>
<li>$g_encoding_aliases</li>
<li>$g_frame_level</li>
<li>$g_font_resolver</li>
<li>$g_font_resolver_pdf</li>
<li>$g_html_entities</li>
<li>$g_image_cache</li>
<li>$g_last_assigned_font_id</li>
<li>$g_manager_encodings</li>
<li>$g_media</li>
<li>$g_predefined_media</li>
<li>$g_stylesheet_title</li>
<li>$g_tag_attrs</li>
<li>$g_unicode_glyphs</li>
<li>$g_utf8_converters</li>
</ul>
There is no need to initialize or modify these variables; just do not accidentally overwrite them. Some of them
exist for "historical" reasons and will eventually be removed. Others exist due to the lack of static class variables
in older PHP versions.
</p>
<h1>Conversion pipeline</h1>
<div>
<b>PipelineFactory</b> is a simple factory class that simplifies building <b>Pipeline</b> instances;
<b>create_default_pipeline()</b> builds a simple ready-to-run conversion pipeline. Using
<b>PipelineFactory</b> is not required; you may create the <b>Pipeline</b> object and fill in
the appropriate fields manually.

<pre class="code">
class PipelineFactory {
  function create_default_pipeline();
}
</pre>
</div>
<div>
The <b>Pipeline</b> class describes the conversion process as a whole; it contains references to the classes described
below and is responsible for calling them in the correct order and for error handling.
<pre class="code">
class Pipeline {
  var $fetchers;
  var $data_filters;
  var $parser;
  var $pre_tree_filters;
  var $layout_engine;
  var $post_tree_filters;
  var $output_driver;
  var $output_filter;
  var $destination;

  function Pipeline();

  function configure($options);
  function process($data_id, &$media);
  function process_batch($data_id_array, &$media);
  function error_message();

  function &get_dispatcher();
}
</pre>
</div>
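<p>
The order in which the pipeline calls its parts can be sketched as follows. This is not the library's
<b>Pipeline</b> class: <b>SketchPipeline</b> is a hypothetical stand-in whose stages are plain callables
(the real stages are objects), and the media argument is left out, so that the example runs on its own.
Only the field names are taken from the class outline above.
</p>
<pre class="code">
// Hypothetical stand-in for Pipeline; every stage is a callable.
class SketchPipeline {
  public $fetchers = array();
  public $data_filters = array();
  public $parser;
  public $pre_tree_filters = array();
  public $layout_engine;
  public $post_tree_filters = array();
  public $output_driver;
  public $output_filter = null;
  public $destination;
  private $error = '';

  public function process($data_id) {
    // 1. Ask each fetcher in turn; the first non-null answer wins.
    $data = null;
    foreach ($this->fetchers as $fetcher) {
      $data = call_user_func($fetcher, $data_id);
      if ($data !== null) { break; }
    }
    if ($data === null) {
      $this->error = "No fetcher could supply '$data_id'";
      return false;
    }
    // 2. Fix up the raw data, then 3. parse it into a tree.
    foreach ($this->data_filters as $filter) { $data = call_user_func($filter, $data); }
    $tree = call_user_func($this->parser, $data);
    // 4. Tree filters run before and after the layout engine.
    foreach ($this->pre_tree_filters as $filter) { $tree = call_user_func($filter, $tree); }
    $tree = call_user_func($this->layout_engine, $tree);
    foreach ($this->post_tree_filters as $filter) { $tree = call_user_func($filter, $tree); }
    // 5. Render, optionally filter the output, and deliver it.
    $output = call_user_func($this->output_driver, $tree);
    if ($this->output_filter !== null) { $output = call_user_func($this->output_filter, $output); }
    call_user_func($this->destination, $output);
    return true;
  }

  public function error_message() { return $this->error; }
}

// Usage: each stage records its name, which makes the call order visible.
$trace = new ArrayObject();
$stage = function ($name) use ($trace) {
  return function ($input) use ($name, $trace) {
    $trace->append($name);
    return $input;
  };
};

$p = new SketchPipeline();
$p->fetchers = array(
  function ($id) { return null; },              // declines every request
  function ($id) { return "content of $id"; }   // answers every request
);
$p->data_filters      = array($stage('data filter'));
$p->parser            = $stage('parser');
$p->pre_tree_filters  = array($stage('pre-tree filter'));
$p->layout_engine     = $stage('layout engine');
$p->post_tree_filters = array($stage('post-tree filter'));
$p->output_driver     = $stage('output driver');
$p->destination       = $stage('destination');

$ok = $p->process('page.html');
</pre>
<p>
After the call, <b>$trace</b> lists the stages in the order in which they ran. If every fetcher returns null,
<b>process()</b> returns false and <b>error_message()</b> describes the failure.
</p>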
<h1>Description of interfaces and classes</h1>

<div class="note">
Almost all interfaces described below include an
<span class="method_name">error_message</span> method.
It should return a user-readable description of
the error. This description MAY contain HTML tags, but should remain
readable if the tags are removed.
</div>
<div class="interface">
<p>The <span class="interface_name">Fetcher</span> interface provides a method of
fetching the data required
to build a document tree. Normally, classes implementing this interface
fetch an HTML/XHTML string from somewhere (e.g. from a remote HTTP server,
a local file or a database). Nevertheless, a fetcher MAY fetch ANY data, provided that
the parser understands it. The pipeline object may contain
several fetcher objects; in this case they are tried one by one until
one of them returns a non-null value.</p>

<p>If you need to get data from non-standard places (e.g. from a template engine or a database), you
should implement <span class="interface_name">Fetcher</span> in your own class.</p>

<p>
Note that the <b>get_data</b> method returns a <b>FetchedData</b> object (or one of its descendants) instead of
an HTML string!
</p>
</div>
<img src="UML/Fetchers.PNG"/>

<dl>
<dt>get_data($data_id)</dt>
<dd>
Fetches the URL and returns the page content and supplementary information.
<ul>
<li>$data_id – URI identifying the page location</li>
</ul>
</dd>

<dt>get_base_url()</dt>
<dd>Returns the URL to be used as the base URL when resolving relative links.</dd>
</dl>
<div class="class">
<b>FetcherURL</b> reads a remote HTML page via HTTP or HTTPS.
</div>

<div class="class">
<b>FetcherLocalFile</b> reads a local file; in this case $data_id should contain the path to the file to be read.
</div>
<div class="interface">
The <b>DataFilter</b> interface describes filters that modify the raw input data.
The main purpose of these filters is to fix the raw data so that the parser can
process it without errors.
</div>

<img src="UML/Data_filters.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Processes the FetchedData object and returns another FetchedData object with (possibly) modified content.
<ul>
<li>$data – FetchedData object</li>
</ul>
</dd>
</dl>
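<p>
A custom data filter could look roughly like the sketch below. It is written against hypothetical stand-ins so that
it runs on its own: <b>SketchFetchedData</b> only mimics a <b>FetchedData</b> object, and its
<b>get_content()</b> accessor is an assumption, as is the filter class itself. The filter drops a leading UTF-8 byte
order mark and returns a new data object, following the <b>process($data)</b> contract described above.
</p>
<pre class="code">
// Hypothetical stand-in for FetchedData, reduced to a content holder.
class SketchFetchedData {
  private $content;

  public function __construct($content) {
    $this->content = $content;
  }

  public function get_content() {
    return $this->content;
  }
}

// Hypothetical filter: takes fetched data and returns a new object
// with (possibly) modified content.
class SketchDataFilterStripBOM {
  public function process($data) {
    $content = $data->get_content();
    // Drop a leading UTF-8 byte order mark, if present.
    if (substr($content, 0, 3) === "\xEF\xBB\xBF") {
      $content = substr($content, 3);
    }
    return new SketchFetchedData($content);
  }

  public function error_message() {
    return '';
  }
}

// Usage
$filter = new SketchDataFilterStripBOM();
$clean  = $filter->process(new SketchFetchedData("\xEF\xBB\xBFhello"));
$same   = $filter->process(new SketchFetchedData("hello"));
</pre>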
<div class="class">
<b>DataFilterDoctype</b> tries to detect the mode in which the document should be rendered (HTML, XHTML, QUIRKS).
</div>

<div class="class">
<b>DataFilterHTML2XHTML</b>.
A precise description of this filter's actions is beyond the scope of this
document. In general, it turns the input document into a well-formed XML document
(possibly discarding invalid parts along the way). Note that this is achieved
by extensive use of regular expressions; no XML/HTML parsers are involved
in the conversion at this stage.
</div>

<div class="class">
<b>DataFilterXHTML2XHTML</b> does some additional XHTML processing required by the
script; for example, it removes comments and SCRIPT tags and performs some other steps that simplify
document processing.
</div>

<div class="class">
<b>DataFilterUTF8</b> converts content from the source encoding to UTF-8. It is a good idea
to use this filter if your input is not limited to ASCII.
</div>
<div class="interface">
The <b>Parser</b> interface provides a method of building the DOM tree from the
filtered data.
</div>

<img src="UML/Parsers.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Processes the FetchedData object and returns the document tree object (somewhat similar to DOM).
<ul>
<li>$data – FetchedData object</li>
</ul>
</dd>
</dl>

<div class="class">
<b>ParserXHTML</b>
</div>
<div class="interface">
The <b>PreTreeFilter</b> interface describes a document tree transformation executed before
the layout engine starts.
</div>

<img src="UML/Pre_filters.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Modifies the document tree (in place) before the layout engine runs.
<ul>
<li>$data – document tree object</li>
</ul>
</dd>
</dl>

<div class="class" id="filter-pre-html2ps-fields">
<b>PreTreeFilterHTML2PSFields</b> handles the processing
of special fields (such as date, page count, page number, etc.).
</div>

<div class="class">
<b>PreTreeFilterHeaderFooter</b> adds a script-generated header and footer to the document tree.
</div>
<div class="interface">
The <b>LayoutEngine</b> interface is implemented by classes that process
the document tree and calculate the positions of page elements. In theory, different implementations
of this interface will allow us to use "lightweight" layout engines when we do
not need full HTML/CSS support.
</div>

<img src="UML/Layout_engines.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Runs the layout process (the document tree object is modified in place).
<ul>
<li>$data – document tree object</li>
</ul>
</dd>
</dl>

<div class="class">
<b>LayoutEngineDefault</b> is the standard layout engine HTML2PS uses.
</div>
<div class="interface">
The <b>PostTreeFilter</b> interface describes a document tree transformation executed after
the layout engine completes.
</div>

<img src="UML/Post_filters.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Applies changes to the document tree (in place) after the layout engine has run.
<ul>
<li>$data – document tree object</li>
</ul>
</dd>
</dl>
<div class="interface">
The <b>OutputDriver</b> interface contains device-specific functions: drawing, movement, font selection, etc.
In general, a description of this interface is beyond the scope of this document, as users are not expected
to implement this interface themselves. Instead, they use the predefined output drivers described below.
</div>

<img src="UML/Output_drivers.PNG"/>

<div class="class">
<b>OutputDriverPDFLIB</b> outputs PDF using PDFLIB.
</div>

<div class="class">
<b>OutputDriverFPDF</b> outputs PDF using FPDF.
</div>

<div class="class">
<b>OutputDriverFastPS</b> handles PostScript Level 3 output.
</div>

<div class="class">
<b>OutputDriverFastPSLevel2</b> handles PostScript Level 2 output.
</div>
<div class="interface">
The <b>OutputFilter</b> interface describes a filter applied to the generated PS or PDF file.
</div>

<img src="UML/Output_filters.PNG"/>

<div class="class">
<b>OutputFilterPS2PDF</b> runs the PS2PDF utility on the generated file.
</div>

<div class="class">
<b>OutputFilterGZIP</b> compresses the generated file using ZLIB.
</div>
<div class="interface">
The <b>Destination</b> interface describes the "channel" object that determines where the final output file
should be placed.
</div>

<img src="UML/Destinations.PNG"/>

<div class="class">
<b>DestinationBrowser</b> outputs the generated file directly to the browser.
</div>

<div class="class">
<b>DestinationDownload</b> also sends the generated file to the browser.
Unlike <b>DestinationBrowser</b>, this class sends headers that prevent the file from being opened directly
in the browser window.
</div>

<div class="class">
<b>DestinationFile</b> saves the generated file on the server side.
</div>
<h2>Implementing your own fetcher class</h2>
<p>
Sometimes you may need to convert HTML code taken from a database or from other non-standard sources.
In this case you should implement the <b>Fetcher</b> interface yourself, returning the content to be converted
(wrapped in a <b>FetchedData</b> object) from the <span class="method_name">get_data</span> method. Additional parameters (such as database connection settings,
template variables, etc.) may be specified as globals (not recommended), passed as parameters
to the constructor of the fetcher object, or passed as the $data_id parameter of the <span class="method_name">get_data</span> method.
</p>
<p>
Keep in mind that if your HTML code references other files (e.g. stylesheets or images), you should either
return null from your fetcher for the URLs of these files (so that another fetcher in the pipeline can load them) or handle them yourself. Otherwise,
these files will not be available.
</p>
<pre>
class MyFetcherLocalFile extends Fetcher {
  var $_content;

  // Read the whole file once, when the fetcher is created
  function MyFetcherLocalFile($file) {
    $this->_content = file_get_contents($file);
  }

  // The requested ID is ignored: this fetcher always returns the same content
  function get_data($dummy1) {
    return new FetchedDataURL($this->_content, array(), "");
  }

  // This fetcher reports an empty base URL
  function get_base_url() {
    return "";
  }
}
</pre>
Also see the <tt>sample.simplest.from.file.php</tt> and <tt>sample.simplest.from.memory.php</tt> files.
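<p>
The advice about returning null for files the fetcher does not own can be sketched as follows. Both classes are
hypothetical and stand alone: <b>SketchMemoryData</b> merely imitates a <b>FetchedData</b> object, and
<b>SketchFetcherMemory</b> does not extend the library's <b>Fetcher</b>. The fetcher answers for a single document ID and
declines every other request, which leaves stylesheets and images to the other fetchers in the pipeline.
</p>
<pre class="code">
// Hypothetical minimal result object; the real library returns FetchedData.
class SketchMemoryData {
  public $content;

  public function __construct($content) {
    $this->content = $content;
  }
}

// Hypothetical fetcher: serves one in-memory document, declines the rest.
class SketchFetcherMemory {
  private $id;
  private $content;

  public function __construct($id, $content) {
    $this->id = $id;
    $this->content = $content;
  }

  public function get_data($data_id) {
    // Not our document: return null so that the next fetcher is tried.
    if ($data_id !== $this->id) {
      return null;
    }
    return new SketchMemoryData($this->content);
  }

  public function get_base_url() {
    return '';
  }

  public function error_message() {
    return '';
  }
}

// Usage
$fetcher = new SketchFetcherMemory('memory://report', 'Hello');
$page    = $fetcher->get_data('memory://report');
$other   = $fetcher->get_data('http://example.com/style.css');
</pre>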
<h1>Class dependencies</h1>
The pipeline object contains the following:
<ul>
<li>one or more objects implementing the <b>Fetcher</b> interface;</li>
<li>zero or more objects implementing the <b>DataFilter</b> interface;</li>
<li>one object implementing the <b>Parser</b> interface;</li>
<li>zero or more objects implementing the <b>PreTreeFilter</b> interface;</li>
<li>one object implementing the <b>LayoutEngine</b> interface;</li>
<li>zero or more objects implementing the <b>PostTreeFilter</b> interface;</li>
<li>one object implementing the <b>OutputDriver</b> interface;</li>
<li>one object implementing the <b>Destination</b> interface.</li>
</ul>
<p>
There are no other dependencies between classes and interfaces (except "implements").
</p>
<p>
Note that the order of filters is important. Imagine you are using some kind of tree filter that adds a header block
containing HTML2PS-specific fields. In this case you must add this filter before PreTreeFilterHTML2PSFields, or
you will get raw field codes in the generated output.
</p>
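<p>
The effect of filter order can be illustrated with a small self-contained sketch. Everything in it is hypothetical:
the "tree" is a flat array of strings, the filters are closures rather than filter objects, and the field code
<tt>##PAGE##</tt> is only a placeholder chosen for the example.
</p>
<pre class="code">
// Hypothetical header filter: prepends a header containing a field code.
$add_header = function ($nodes) {
  array_unshift($nodes, 'Page ##PAGE##');
  return $nodes;
};

// Hypothetical field filter: replaces the field code with a value.
$expand_fields = function ($nodes) {
  $result = array();
  foreach ($nodes as $node) {
    $result[] = str_replace('##PAGE##', '1', $node);
  }
  return $result;
};

// Applies the filters in the given order.
function run_filters($filters, $nodes) {
  foreach ($filters as $filter) {
    $nodes = call_user_func($filter, $nodes);
  }
  return $nodes;
}

// Header first, fields second: the field in the header is expanded.
$right = run_filters(array($add_header, $expand_fields), array('Body'));

// Fields first, header second: the header keeps its raw field code.
$wrong = run_filters(array($expand_fields, $add_header), array('Body'));
</pre>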
</body>
</html>