<html>
<head>
<title>API description</title>
<link rel="stylesheet" type="text/css" media="all" title="Default" href="css/help.css"/>
<style type="text/css">
div.note {
  margin: 0.5em 0;
}

div.class {
  margin: 0.5em 0 0.5em 2em;
}

div.interface {
  margin: 1em 0 0.5em 0;
  padding: 2px 5px;
  background-color: #f0f0f0;
}

span.interface_name {
  font-weight: bold;
}

span.method_name {
  font-weight: bold;
}
</style>
</head>
<body>

<h1>Beware: GLOBALS!</h1>
<p>
At the moment, the layout/conversion engine makes use of several global variables:
<ul>
<li>$g_config array (in particular, the $g_config['renderforms'], $g_config['renderlinks'], $g_config['renderimages'],
            $g_config['debugbox'], $g_config['mode'], $g_config['cssmedia'] and $g_config['draw_page_border']
            elements for all output methods, and $g_config['ps2pdf'] and $g_config['transparency_workaround'] for
            the 'fastps' output method)</li>
<li>$g_px_scale</li>
<li>$g_pt_scale</li>
</ul>
Please take this into account when using the API. We're planning to get rid of these globals eventually; for now,
you may initialize them with the code from the sample scripts.
</p>
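<p>The following is a minimal sketch of initializing <tt>$g_config</tt> before calling the API. The key names are the ones listed above; the values are assumptions chosen for illustration rather than HTML2PS defaults, and the helper <tt>missing_config_keys()</tt> is hypothetical. <tt>$g_px_scale</tt> and <tt>$g_pt_scale</tt> are left out here; see the sample scripts for how they are set.</p>

```php
<?php
// Assumed values for illustration only; only the key names come from the list above.
$GLOBALS['g_config'] = [
    // read for all output methods
    'renderforms'      => false,
    'renderlinks'      => true,
    'renderimages'     => true,
    'debugbox'         => false,
    'mode'             => 'html',
    'cssmedia'         => 'screen',
    'draw_page_border' => false,
    // read only by the 'fastps' output method
    'ps2pdf'                  => false,
    'transparency_workaround' => false,
];

// Hypothetical helper: lists the keys used by all output methods that are
// missing from a configuration array.
function missing_config_keys(array $config)
{
    $required = ['renderforms', 'renderlinks', 'renderimages', 'debugbox',
                 'mode', 'cssmedia', 'draw_page_border'];
    return array_values(array_diff($required, array_keys($config)));
}
```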
<p>
The script also initializes several global variables itself:
<ul>
<li>$g_box_uid</li>
<li>$g_colors</li>
<li>$__g_css_manager</li>
<li>$__g_css_handler_set</li>
<li>$g_encoding_aliases</li>
<li>$g_frame_level</li>
<li>$g_font_resolver</li>
<li>$g_font_resolver_pdf</li>
<li>$g_html_entities</li>
<li>$g_image_cache</li>
<li>$g_last_assigned_font_id</li>
<li>$g_manager_encodings</li>
<li>$g_media</li>
<li>$g_predefined_media</li>
<li>$g_stylesheet_title</li>
<li>$g_tag_attrs</li>
<li>$g_unicode_glyphs</li>
<li>$g_utf8_converters</li>
</ul>
There's no need to initialize or modify these variables; just don't accidentally overwrite them. Some of them
are here for "historical" reasons and will eventually be removed; others are here due to the lack of static class
variables in older PHP versions.
</p>
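<p>Because these names share the global namespace with your own code, one way to catch clashes early is to check for them before loading the converter. The helper and constant below are a hypothetical sketch, not part of HTML2PS; they simply compare a set of variable names against the list above.</p>

```php
<?php
// Hypothetical list: the script-managed globals named above (without the "$").
const RESERVED_GLOBALS = [
    'g_box_uid', 'g_colors', '__g_css_manager', '__g_css_handler_set',
    'g_encoding_aliases', 'g_frame_level', 'g_font_resolver',
    'g_font_resolver_pdf', 'g_html_entities', 'g_image_cache',
    'g_last_assigned_font_id', 'g_manager_encodings', 'g_media',
    'g_predefined_media', 'g_stylesheet_title', 'g_tag_attrs',
    'g_unicode_glyphs', 'g_utf8_converters',
];

// Returns the reserved names that already appear as keys of $globals.
function conflicting_globals(array $globals)
{
    return array_values(array_intersect(RESERVED_GLOBALS, array_keys($globals)));
}
```

<p>For example, calling <tt>conflicting_globals($GLOBALS)</tt> in your own script before including the HTML2PS files lists any of your variables that should be renamed.</p>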

<h1>Conversion pipeline</h1>
<div>
<b>PipelineFactory</b> is a simple factory class that simplifies building <b>Pipeline</b> instances;
its <b>create_default_pipeline()</b> method builds a simple, ready-to-run conversion pipeline. Using
<b>PipelineFactory</b> is not required; you may create the <b>Pipeline</b> object and fill in
the appropriate fields manually.

<pre class="code">
class PipelineFactory {
  function create_default_pipeline();
}
</pre>
</div>

<div>
The <b>Pipeline</b> class describes the conversion process as a whole; it contains references to the objects described
below and is responsible for calling them in the correct order and for handling errors.
<pre class="code">
class Pipeline {
  var $fetchers;
  var $data_filters;
  var $parser;
  var $pre_tree_filters;
  var $layout_engine;
  var $post_tree_filters;
  var $output_driver;
  var $output_filter;
  var $destination;

  function Pipeline();

  function configure($options);
  function process($data_id, &$media);
  function process_batch($data_id_array, &$media);
  function error_message();

  function &get_dispatcher();
}
</pre>
</div>
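<p>To make the "correct order" concrete, here is a sketch of what a <b>Pipeline</b>-like object does when it processes a document, with its fields filled in manually. <tt>SketchPipeline</tt> and <tt>TraceStage</tt> are hypothetical stand-ins; only the field names mirror the listing above. The method names used for the output driver, output filter and destination (<tt>render()</tt>, <tt>process()</tt>, <tt>deliver()</tt>) are assumptions, since this document does not specify them, and the <tt>&amp;$media</tt> argument of the real <b>process</b> method is omitted.</p>

```php
<?php
// Hypothetical stand-in for any pipeline component: records its name in a
// shared log and passes its first argument through unchanged.
class TraceStage
{
    public static $log = [];
    private $name;

    public function __construct($name) { $this->name = $name; }

    public function __call($method, $args)
    {
        self::$log[] = $this->name;
        return $args ? $args[0] : null;
    }
}

// Sketch of the call order; render() and deliver() are assumed method names.
class SketchPipeline
{
    public $fetchers = [];
    public $data_filters = [];
    public $parser;
    public $pre_tree_filters = [];
    public $layout_engine;
    public $post_tree_filters = [];
    public $output_driver;
    public $output_filter;
    public $destination;
    private $error = '';

    public function process($data_id)
    {
        $data = null;
        foreach ($this->fetchers as $fetcher) {        // first non-null result wins
            $data = $fetcher->get_data($data_id);
            if ($data !== null) { break; }
        }
        if ($data === null) {
            $this->error = 'Could not fetch ' . $data_id;
            return false;
        }
        foreach ($this->data_filters as $f) { $data = $f->process($data); }  // new data each time
        $tree = $this->parser->process($data);
        foreach ($this->pre_tree_filters as $f) { $f->process($tree); }      // in place
        $this->layout_engine->process($tree);                                // in place
        foreach ($this->post_tree_filters as $f) { $f->process($tree); }     // in place
        $output = $this->output_driver->render($tree);
        if ($this->output_filter !== null) { $output = $this->output_filter->process($output); }
        $this->destination->deliver($output);
        return true;
    }

    public function error_message() { return $this->error; }
}

// Filling the fields manually instead of using a factory.
$p = new SketchPipeline();
$p->fetchers          = [new TraceStage('fetcher')];
$p->data_filters      = [new TraceStage('data_filter')];
$p->parser            = new TraceStage('parser');
$p->pre_tree_filters  = [new TraceStage('pre_tree_filter')];
$p->layout_engine     = new TraceStage('layout_engine');
$p->post_tree_filters = [new TraceStage('post_tree_filter')];
$p->output_driver     = new TraceStage('output_driver');
$p->output_filter     = new TraceStage('output_filter');
$p->destination       = new TraceStage('destination');
$p->process('http://example.com/');
// TraceStage::$log now lists the stages in the order they were called.
```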

<h1>Description of interfaces and classes</h1>

<div class="note">
Almost all interfaces described below include an
<span class="method_name">error_message</span> method.
It should return a human-readable description of
the error. This description MAY contain HTML tags, but should remain
readable if the tags are removed.
</div>
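<p>A sketch of this convention, using a hypothetical <tt>SketchFetcher</tt> class: the message carries markup for display in a browser, and PHP's <tt>strip_tags()</tt> and <tt>html_entity_decode()</tt> turn it into plain text for a log file or console, which is why the wording must still make sense without the tags.</p>

```php
<?php
// Hypothetical component following the error_message() convention.
class SketchFetcher
{
    private $error = '';

    public function get_data($data_id)
    {
        // Markup is allowed, but the sentence must read fine without it.
        $this->error = 'Cannot open <b>' . htmlspecialchars($data_id) . '</b>: file not found';
        return null;
    }

    public function error_message()
    {
        return $this->error;
    }
}

$fetcher = new SketchFetcher();
$fetcher->get_data('missing.html');
$html  = $fetcher->error_message();              // for display in a browser
$plain = html_entity_decode(strip_tags($html));   // for a log file or the console
```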

<div class="interface">
<p>The <span class="interface_name">Fetcher</span> interface provides a method of
fetching the data required
to build a document tree. Normally, classes implementing this interface
fetch an HTML/XHTML string from somewhere (e.g. from a remote HTTP server,
a local file or a database). Nevertheless, a fetcher MAY fetch ANY data, provided that
this data will be understood by the parser. The pipeline object may contain
several fetcher objects; in this case they're tried one by one until
one of them returns a non-null value.</p>

<p>If you need to get data from non-standard places (e.g. from a template engine or a database), you
are expected to implement the <span class="interface_name">Fetcher</span> interface in your own class.</p>

<p>
Note that the <b>get_data</b> method returns a <b>FetchedData</b> object (or one of its descendants) rather than
an HTML string!
</p>
</div>

<img src="UML/Fetchers.PNG"/>

<dl>
<dt>get_data($data_id)</dt>
<dd>
Fetches the page identified by $data_id and returns its content along with supplementary information.
<ul>
<li>$data_id &ndash; URI identifying the page location</li>
</ul>
</dd>

<dt>get_base_url()</dt>
<dd>Returns the URL to be used as the base URL when resolving relative links.</dd>
</dl>
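<p>The "try each fetcher until one returns a non-null value" rule can be sketched as follows. <tt>SketchFetchedData</tt>, <tt>SketchMemoryFetcher</tt> and <tt>fetch_first()</tt> are hypothetical stand-ins for <b>FetchedData</b>, a real fetcher and the pipeline's own loop; each fetcher answers only for the identifiers it knows and returns null for everything else.</p>

```php
<?php
// Hypothetical stand-in for FetchedData: the content plus the identifier it came from.
class SketchFetchedData
{
    public $content;
    public $data_id;

    public function __construct($content, $data_id)
    {
        $this->content = $content;
        $this->data_id = $data_id;
    }
}

// Serves documents from an in-memory map and returns null for unknown
// identifiers, so that the next fetcher in the chain gets a chance.
class SketchMemoryFetcher
{
    private $pages;

    public function __construct(array $pages) { $this->pages = $pages; }

    public function get_data($data_id)
    {
        if (!isset($this->pages[$data_id])) {
            return null;
        }
        return new SketchFetchedData($this->pages[$data_id], $data_id);
    }

    public function get_base_url() { return ''; }
}

// Asks each fetcher in turn and stops at the first non-null result.
function fetch_first(array $fetchers, $data_id)
{
    foreach ($fetchers as $fetcher) {
        $data = $fetcher->get_data($data_id);
        if ($data !== null) {
            return $data;
        }
    }
    return null;   // no fetcher could supply the data
}

$chain = [
    new SketchMemoryFetcher(['main.html' => 'main document']),
    new SketchMemoryFetcher(['style.css' => 'p { color: red }']),
];
```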

<div class="class">
<b>FetcherURL</b> reads a remote HTML page via HTTP or HTTPS.
</div>

<div class="class">
<b>FetcherLocalFile</b> reads a local file; in this case $data_id should contain the path to the file to be read.
</div>

<div class="interface">
The <b>DataFilter</b> interface describes the filters that modify the raw input data.
The main purpose of these filters is to fix the raw data so that it can be
processed by the parser without errors.
</div>

<img src="UML/Data_filters.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Processes a FetchedData object and returns another FetchedData object with (possibly) modified content.
<ul>
<li>$data &ndash; FetchedData object</li>
</ul>
</dd>
</dl>
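<p>A sketch of how a chain of data filters might be applied: each filter receives the current data object and returns a new one, leaving its input untouched. <tt>SketchFetchedData</tt>, both filter classes and <tt>apply_data_filters()</tt> are hypothetical and much simpler than the real filters described below.</p>

```php
<?php
// Hypothetical stand-in for FetchedData.
class SketchFetchedData
{
    public $content;
    public function __construct($content) { $this->content = $content; }
}

// Two hypothetical DataFilter-style classes: each returns a new data object
// instead of changing the one it was given.
class SketchTrimFilter
{
    public function process($data) { return new SketchFetchedData(trim($data->content)); }
}

class SketchNormalizeNewlinesFilter
{
    public function process($data)
    {
        // Replace "\r\n" first so that it does not become two newlines.
        return new SketchFetchedData(str_replace(["\r\n", "\r"], "\n", $data->content));
    }
}

// Applies the filters in order, as the pipeline does with its data filters.
function apply_data_filters(array $filters, $data)
{
    foreach ($filters as $filter) {
        $data = $filter->process($data);
    }
    return $data;
}

$raw    = new SketchFetchedData("  line one\r\nline two\r  ");
$result = apply_data_filters([new SketchTrimFilter(), new SketchNormalizeNewlinesFilter()], $raw);
```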

<div class="class">
<b>DataFilterDoctype</b> tries to detect the mode this document should be rendered in (HTML, XHTML, QUIRKS).
</div>

<div class="class">
<b>DataFilterHTML2XHTML</b> turns the input document into a well-formed XML document
(possibly discarding invalid parts along the way). This is achieved
by extensive use of regular expressions; no XML/HTML parsers are involved
in the conversion at this stage. A precise description of this filter's actions
is beyond the scope of this document.
</div>

<div class="class">
<b>DataFilterXHTML2XHTML</b> does some additional XHTML processing required by the
script; for example, it removes comments and SCRIPT tags and performs some other steps that simplify
document processing.
</div>
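<p>For an idea of the regular-expression approach, here is a sketch that removes comments and SCRIPT elements. It only illustrates the technique; it is not the code used by <b>DataFilterXHTML2XHTML</b> or <b>DataFilterHTML2XHTML</b>, and <tt>strip_comments_and_scripts()</tt> is a hypothetical name.</p>

```php
<?php
// Removes HTML comments and SCRIPT elements using PCRE only (no parser).
function strip_comments_and_scripts($html)
{
    // Comments, including multi-line ones: "s" lets "." match newlines,
    // and the lazy ".*?" stops at the first closing "-->".
    $html = preg_replace('/<!--.*?-->/s', '', $html);
    // SCRIPT elements and their content, matched case-insensitively ("i").
    $html = preg_replace('/<script\b[^>]*>.*?<\/script\s*>/is', '', $html);
    return $html;
}
```

<p>Patterns like these do not understand document structure; for instance, a literal <tt>&lt;/script&gt;</tt> inside a JavaScript string would end the match early.</p>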

<div class="class">
<b>DataFilterUTF8</b> converts content from the source encoding to UTF-8. It is a good idea
to use this filter unless your documents are limited to the ASCII character set.
</div>
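<p>A sketch of this kind of conversion using PHP's <tt>iconv()</tt>. How <b>DataFilterUTF8</b> determines the source encoding is not described here, so the sketch assumes it is already known; <tt>to_utf8()</tt> is a hypothetical name.</p>

```php
<?php
// Converts $content from $source_encoding to UTF-8 (requires the iconv extension).
function to_utf8($content, $source_encoding)
{
    if (strcasecmp($source_encoding, 'UTF-8') === 0) {
        return $content;                          // already UTF-8: nothing to do
    }
    $converted = iconv($source_encoding, 'UTF-8', $content);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $source_encoding");
    }
    return $converted;
}

$latin1 = "Caf\xE9";                              // "Café" in ISO-8859-1
$utf8   = to_utf8($latin1, 'ISO-8859-1');         // "Caf\xC3\xA9"
```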

<div class="interface">
The <b>Parser</b> interface provides a method of building the document tree from the
filtered data.
</div>

<img src="UML/Parsers.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Processes a FetchedData object and returns a document tree object (somewhat similar to a DOM tree).
<ul>
<li>$data &ndash; FetchedData object</li>
</ul>
</dd>
</dl>

<div class="class">
<b>ParserXHTML</b>
</div>
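<p>As an illustration of the parser's job, the sketch below builds a tree from filtered XHTML with PHP's own DOM extension. This is a swapped-in technique: <b>ParserXHTML</b> builds HTML2PS's own tree objects rather than a <tt>DOMDocument</tt>, and <tt>parse_xhtml()</tt> is a hypothetical name. It also shows why the data filters must produce well-formed XML first: <tt>loadXML()</tt> rejects anything else.</p>

```php
<?php
// Builds a DOM tree from well-formed XHTML; returns null on malformed input.
function parse_xhtml($content)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // collect errors instead of emitting warnings
    $ok = $doc->loadXML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok ? $doc : null;
}

$tree = parse_xhtml('<html><body><p>Hello</p></body></html>');
```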

<div class="interface">
The <b>PreTreeFilter</b> interface describes a document tree transformation executed before
the layout engine starts.
</div>

<img src="UML/Pre_filters.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Modifies the document tree in place before the layout engine runs.
<ul>
<li>$data &ndash; Document tree object</li>
</ul>
</dd>
</dl>

<div class="class" id="filter-pre-html2ps-fields">
<b>PreTreeFilterHTML2PSFields</b> handles the processing
of special fields (such as date, page count, page number, etc.).
</div>
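<p>A sketch of an in-place pre-tree filter in the spirit of <b>PreTreeFilterHTML2PSFields</b>. The node class, the filter and the <tt>##DATE##</tt>/<tt>##FILENAME##</tt> field codes are hypothetical; the real field syntax and tree classes are not described here. The sketch deliberately leaves out page numbers and page counts, which depend on the page breaks computed later by the layout engine.</p>

```php
<?php
// Hypothetical minimal tree node.
class SketchNode
{
    public $text;
    public $children;

    public function __construct($text = '', array $children = [])
    {
        $this->text = $text;
        $this->children = $children;
    }
}

// Walks the tree recursively and replaces field codes in place; process()
// returns nothing because the tree it receives is modified directly.
class SketchFieldsFilter
{
    private $values;

    public function __construct(array $values) { $this->values = $values; }

    public function process($node)
    {
        $node->text = strtr($node->text, $this->values);
        foreach ($node->children as $child) {
            $this->process($child);
        }
    }
}

$tree = new SketchNode('', [
    new SketchNode('Generated on ##DATE##'),
    new SketchNode('', [new SketchNode('Source: ##FILENAME##')]),
]);
(new SketchFieldsFilter(['##DATE##' => '2006-01-31', '##FILENAME##' => 'report.html']))->process($tree);
```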

<div class="class">
<b>PreTreeFilterHeaderFooter</b> adds a script-generated header and footer to the document tree.
</div>

<div class="interface">
The <b>LayoutEngine</b> interface describes a class that processes
the document tree and calculates the positions of page elements. In theory, different implementations
of this interface will allow us to use &quot;lightweight&quot; layout engines when we do
not need full HTML/CSS support.
</div>

<img src="UML/Layout_engines.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Runs the layout process (the document tree object is modified in place).
<ul>
<li>$data &ndash; Document tree object</li>
</ul>
</dd>
</dl>
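<p>To illustrate what a "lightweight" engine could look like, here is a hypothetical one that ignores CSS entirely and simply stacks block boxes from top to bottom, writing each box's position back into the tree. <tt>SketchBox</tt> and <tt>SketchStackingLayoutEngine</tt> are not HTML2PS classes.</p>

```php
<?php
// Hypothetical block box: a height plus child boxes; x/y are filled in by layout.
class SketchBox
{
    public $height;
    public $children;
    public $x = 0;
    public $y = 0;

    public function __construct($height, array $children = [])
    {
        $this->height = $height;
        $this->children = $children;
    }
}

class SketchStackingLayoutEngine
{
    public function process($root)
    {
        $this->place($root, 0, 0);   // modifies the tree in place
    }

    // Places $box at ($x, $y) and returns the y coordinate just below it.
    private function place($box, $x, $y)
    {
        $box->x = $x;
        $box->y = $y;
        $child_y = $y;
        foreach ($box->children as $child) {
            $child_y = $this->place($child, $x, $child_y);
        }
        // A container is at least as tall as its stacked children.
        $box->height = max($box->height, $child_y - $y);
        return $y + $box->height;
    }
}

$root = new SketchBox(0, [new SketchBox(20), new SketchBox(30), new SketchBox(10)]);
(new SketchStackingLayoutEngine())->process($root);
```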

<div class="class">
<b>LayoutEngineDefault</b> is the standard layout engine used by HTML2PS.
</div>

<div class="interface">
The <b>PostTreeFilter</b> interface describes a document tree transformation executed after
the layout engine completes.
</div>

<img src="UML/Post_filters.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Modifies the document tree in place after the layout engine has run.
<ul>
<li>$data &ndash; document tree object</li>
</ul>
</dd>
</dl>

<div class="interface">
The <b>OutputDriver</b> interface contains device-specific functions: drawing, movement, font selection, etc.
In general, a description of this interface is beyond the scope of this document, as users are not expected
to implement it themselves. Instead, they would use the predefined output drivers described below.
</div>

<img src="UML/Output_drivers.PNG"/>

<div class="class">
<b>OutputDriverPDFLIB</b> outputs PDF using PDFLIB.
</div>

<div class="class">
<b>OutputDriverFPDF</b> outputs PDF using FPDF.
</div>

<div class="class">
<b>OutputDriverFastPS</b> handles PostScript Level 3 output.
</div>

<div class="class">
<b>OutputDriverFastPSLevel2</b> handles PostScript Level 2 output.
</div>

<div class="interface">
The <b>OutputFilter</b> interface describes a filter applied to the generated PS or PDF file.
</div>

<img src="UML/Output_filters.PNG"/>

<div class="class">
<b>OutputFilterPS2PDF</b> runs the PS2PDF utility on the generated file.
</div>

<div class="class">
<b>OutputFilterGZIP</b> compresses the generated file using ZLIB.
</div>
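<p>A sketch of what a GZIP output filter does, written with PHP's zlib function <tt>gzencode()</tt>. The class name, the <tt>process()</tt> method and the convention of receiving and returning a file path are assumptions; this document does not specify the <b>OutputFilter</b> methods.</p>

```php
<?php
// Hypothetical output filter: compresses the file at $path into "$path.gz"
// and returns the name of the compressed file for the next stage.
class SketchGzipOutputFilter
{
    public function process($path)
    {
        $compressed = gzencode(file_get_contents($path), 9);   // 9 = best compression
        file_put_contents($path . '.gz', $compressed);
        return $path . '.gz';
    }
}

$source = tempnam(sys_get_temp_dir(), 'h2ps');
file_put_contents($source, str_repeat("%PDF sample data\n", 100));
$gz = (new SketchGzipOutputFilter())->process($source);
```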

<div class="interface">
The <b>Destination</b> interface describes the &quot;channel&quot; object which determines where the final output file
should be placed.
</div>

<img src="UML/Destinations.PNG"/>

<div class="class">
<b>DestinationBrowser</b> outputs the generated file directly to the browser.
</div>

<div class="class">
<b>DestinationDownload</b> also outputs the generated file directly to the browser.
Unlike <b>DestinationBrowser</b>, this class sends headers that prevent the file from being opened directly
in the browser window.
</div>

<div class="class">
<b>DestinationFile</b> saves the generated file on the server side.
</div>
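<p>In HTTP, the usual way to make a browser save a file instead of displaying it is the <tt>Content-Disposition</tt> header: <tt>inline</tt> lets the browser show the file, while <tt>attachment</tt> asks it to download the file. The sketch below assumes the two browser destinations differ in this way; <tt>destination_headers()</tt> is a hypothetical helper, not the HTML2PS implementation.</p>

```php
<?php
// Returns the header lines to send before the file body.
function destination_headers($filename, $mime_type, $force_download)
{
    $disposition = $force_download ? 'attachment' : 'inline';
    return [
        'Content-Type: ' . $mime_type,
        'Content-Disposition: ' . $disposition . '; filename="' . $filename . '"',
    ];
}

// In a web request, send each line with header() before echoing the file:
// foreach (destination_headers('out.pdf', 'application/pdf', true) as $h) { header($h); }
```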

<h2>Implementing your own fetcher class</h2>
<p>
Sometimes you may need to convert HTML code taken from a database or from other non-standard sources.
In this case you should implement the <b>Fetcher</b> interface yourself, returning the content to be converted
(wrapped in a <b>FetchedData</b> object) from the <span class="method_name">get_data</span> method. Additional parameters
(like database connection settings, template variables, etc.) may be specified as globals (not recommended, though),
passed as parameters to the constructor of the fetcher object, or encoded in the $data_id parameter of the
<span class="method_name">get_data</span> method.
</p>
<p>
Keep in mind that if your HTML code refers to other files (e.g. stylesheets or images), your fetcher should either
return null for the URLs of these files (so that the next fetcher in the chain can try them) or handle them itself.
Otherwise, these files will not be available.
</p>

<pre>
class MyFetcherLocalFile extends Fetcher {
  var $_file;
  var $_content;

  // Reads the whole file into memory once (PHP 5+ constructor syntax).
  function __construct($file) {
    $this->_file    = $file;
    $this->_content = file_get_contents($file);
  }

  // Returns the content only when asked for the file itself (pass the same
  // path to Pipeline::process()); any other URL, such as a stylesheet or
  // image referenced from the page, gets null so another fetcher may handle it.
  function get_data($data_id) {
    if ($data_id !== $this->_file) {
      return null;
    }
    return new FetchedDataURL($this->_content, array(), "");
  }

  function get_base_url() {
    return "";
  }
}
</pre>

Also see the <tt>sample.simplest.from.file.php</tt> and <tt>sample.simplest.from.memory.php</tt> files.

<h1>Class dependencies</h1>
The pipeline object contains the following:
<ul>
<li>one or more objects implementing the <b>Fetcher</b> interface;</li>
<li>zero or more objects implementing the <b>DataFilter</b> interface;</li>
<li>one object implementing the <b>Parser</b> interface;</li>
<li>zero or more objects implementing the <b>PreTreeFilter</b> interface;</li>
<li>one object implementing the <b>LayoutEngine</b> interface;</li>
<li>zero or more objects implementing the <b>PostTreeFilter</b> interface;</li>
<li>one object implementing the <b>OutputDriver</b> interface;</li>
<li>one object implementing the <b>Destination</b> interface.</li>
</ul>

There are no other dependencies between classes and interfaces (except &quot;implements&quot;).

Note that the order of filters is important. Imagine you're using some kind of tree filter which adds a header block
containing HTML2PS-specific fields. In this case you must add this filter before PreTreeFilterHTML2PSFields, or
you'll get raw field codes in the generated output.
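<p>The effect of filter order can be seen in a small sketch. Both filters and <tt>run_filters()</tt> are hypothetical and, for brevity, work on a plain string and return the result instead of modifying a document tree in place; the field code <tt>##PAGE##</tt> is likewise an assumption.</p>

```php
<?php
// Adds a header that contains a field code.
class SketchHeaderFilter
{
    public function process($doc) { return "Page ##PAGE##\n" . $doc; }
}

// Replaces the field code with its value; codes added later are left untouched.
class SketchFieldsFilter
{
    public function process($doc) { return str_replace('##PAGE##', '1', $doc); }
}

function run_filters(array $filters, $doc)
{
    foreach ($filters as $filter) {
        $doc = $filter->process($doc);
    }
    return $doc;
}

$good = run_filters([new SketchHeaderFilter(), new SketchFieldsFilter()], 'Body');
$bad  = run_filters([new SketchFieldsFilter(), new SketchHeaderFilter()], 'Body');
// $good is "Page 1\nBody"; $bad still contains the raw code "##PAGE##".
```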

</body>
</html>