<html>
<head>
<title>How do "fetchers" work?</title>
<link rel="stylesheet" type="text/css" media="all" title="Default" href="css/help.css"/>
<style type="text/css">
div.note {
margin: 0.5em 0;
}
div.class {
margin: 0.5em 0 0.5em 2em;
}
div.interface {
margin: 1em 0 0.5em 0;
padding: 2px 5px;
background-color: #f0f0f0;
}
span.interface_name {
font-weight: bold;
}
span.method_name {
font-weight: bold;
}
</style>
</head>
<body>
<h1>How do "fetchers" work?</h1>
<p>
A "fetcher" is a simple object responsible for delivering external files to the script.
The default fetcher object supplied with html2ps/pdf fetches HTML, images and CSS from remote sites using the HTTP protocol.
If you write your own fetcher, you need to implement a 'get_data' function that returns the contents of the requested file and,
probably, a 'get_base_url' function that returns the URL used as the base when resolving relative URLs in the most recently fetched HTML file.
</p>
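<p>
The two functions named above can be pictured with a minimal sketch. It is written in Python purely for illustration; html2ps itself is a PHP script, and the class name 'HttpFetcher', the 'opener' parameter and the choice to return bytes are assumptions made for this example, not the actual html2ps classes.
</p>

```python
import urllib.request


class HttpFetcher:
    """Sketch of a default-style fetcher: every request goes out over HTTP."""

    def __init__(self, opener=urllib.request.urlopen):
        # 'opener' is a hypothetical hook so the class can be tried without a network.
        self._opener = opener
        self._base_url = ""

    def get_data(self, url):
        """Return the contents of the requested file as bytes."""
        with self._opener(url) as response:
            # Remember the final URL (after redirects) of the latest fetch.
            self._base_url = response.geturl()
            return response.read()

    def get_base_url(self):
        """Return the URL to use as the base for relative links.

        Assumes the caller asks right after fetching the HTML page.
        """
        return self._base_url
```

<p>
In this sketch the base URL is simply the address of the last file fetched, so it is only meaningful when it is read immediately after the HTML page has been requested.
</p>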
<p>
The image below illustrates a simple html2ps session that uses the default fetcher to convert an HTML file from the hypothetical site test.com.
</p>
<img src="uml/Simple_fetcher_session.PNG"/>
<p>
If your pages are stored on your local system, or are generated dynamically and kept in memory, you do not need the HTTP protocol to fetch them.
In this case you should use a custom fetcher, and the session will look similar to the image below. Note that this fetcher processes <em>all</em> requests
and returns valid content for each of them; this distinguishes it from the <em>very simple</em> fetcher supplied with html2ps, which <em>always
returns</em> the same in-memory string whatever the request is. The internals of a fully featured fetcher depend heavily on your system architecture,
so such a fetcher will most likely never be included in the html2ps distribution.
</p>
<img src="uml/Custom_fetcher_session.PNG"/>
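<p>
The difference between the two kinds of fetcher can be sketched as follows. This is an illustrative Python sketch, not html2ps code: the class names 'MemoryFetcher' and 'LocalStoreFetcher', the URL-to-bytes mapping and the choice to return None for unknown URLs are all assumptions made for this example.
</p>

```python
class MemoryFetcher:
    """'Very simple' fetcher: one in-memory string answers every request."""

    def __init__(self, content, base_url=""):
        self._content = content
        self._base_url = base_url

    def get_data(self, url):
        # The URL is ignored, so image and CSS requests get the HTML back.
        return self._content

    def get_base_url(self):
        return self._base_url


class LocalStoreFetcher:
    """Custom fetcher: answers every request from a local mapping of URL to bytes."""

    def __init__(self, files, base_url):
        # 'files' is a hypothetical stand-in for a local disk, cache or generator.
        self._files = dict(files)
        self._base_url = base_url

    def get_data(self, url):
        # Return None for unknown URLs so the caller can skip the resource.
        return self._files.get(url)

    def get_base_url(self):
        return self._base_url
```

<p>
With the first class a request for a stylesheet or an image receives the HTML string again, while the second class looks up each URL separately and can therefore return valid content for every request.
</p>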
<p>
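The image below illustrates why images and external stylesheets are not rendered when you use a <em>too simple</em> fetcher object: such a fetcher returns the same content for every request, so requests for images and stylesheets never receive valid data.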
</p>
<img src="uml/Simple_custom_fetcher_session.PNG"/>
<p>
Sometimes you need to fetch files from different places; for example, the HTML code is generated locally, while images and CSS files must be fetched via
HTTP. In this case you need to use several fetchers at once, as illustrated below. Note that you must then implement a 'get_base_url'
function that returns the correct URL, so that the script can resolve the relative URLs contained in the HTML code.
</p>
|
|
|
66 |
|
|
|
67 |
<img src="uml/Multiple_fetcher_session.PNG"/>
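<p>
One possible way to combine fetchers is sketched below in Python, again only as an illustration and not as html2ps code. The names 'CompositeFetcher', 'fallback' and 'resolve' are hypothetical; the sketch assumes the locally generated page is identified by a single URL and that every other request is delegated to another fetcher, for example an HTTP one.
</p>

```python
from urllib.parse import urljoin


class CompositeFetcher:
    """Serves one locally generated page itself and delegates everything else."""

    def __init__(self, page_url, html, fallback):
        self._page_url = page_url  # URL under which the local HTML is published
        self._html = html          # HTML generated locally and kept in memory
        self._fallback = fallback  # any object with a get_data(url) method

    def get_data(self, url):
        if url == self._page_url:
            return self._html
        # Images, CSS and other resources go to the second fetcher.
        return self._fallback.get_data(url)

    def get_base_url(self):
        # Without a correct base URL, relative links in the HTML cannot be resolved.
        return self._page_url


def resolve(fetcher, relative_url):
    """Resolve a relative URL against the base URL reported by the fetcher."""
    return urljoin(fetcher.get_base_url(), relative_url)
```

<p>
Here a relative reference such as 'img/logo.png' is first turned into an absolute URL with the base URL, and only then passed to 'get_data', which forwards it to the second fetcher.
</p>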
</body>
</html>