Capturing websites to an image
This is a “thinking outside the box” way to automatically snap a website and save it as a JPG. It uses webswoon, a great little app; it’s a little buggy here and there, but it does the job.
You will need:
Webswoon
Apache + PHP ( Note: Apache MUST be running as a standalone, not as a service )
A Windows Machine to run it on ( oh stop crying ) - I use an old PIII with Windows 2000 installed.
Right, I’m gonna assume everything is installed. Are you asking me why Apache must be running standalone? Because webswoon is actually a wrapper for the IE ActiveX object ( or something like that ). For IE to run, it needs a desktop to run on, with color depth and all those funky things. Services don’t have a desktop, so IE can’t run. There may be a way around it, but frankly I haven’t had the time to make it work.
Now configure webswoon:
openbrowserwindow=0
thumbnailwidth=150
thumbnailheight=110
browserwidth=850
browserheight=640
waitdelay=2
updatecaptures=0
updatedays=60
updatehours=0
updatemins=0
exportfolder=C:\Program Files\Apache Group\Apache2\htdocs\captures
capturefilenameformat=%z.%e
ignoreurl=
imageformat=JPG
timeoutdelay=13
deletecaptures=0
removeborder=1
loopmode=0
loopdelay=60
jpegcompression=80
automode=0
showbrowsererror=0
language=en
disablejavascript=1
resizecapture=1
cropmethod=1
keepmargins=1
blankmargin=20
canvawidth=1200
canvaheight=6000
“exportfolder” does not really matter, as we override it later with the -o switch in the batch file. You can play with the other settings, have at them.
Now, we create a batch file, somewhere that Apache can get hold of it:
cd /d "{path to webswoon}"
"{path to webswoon}\webswoon_console" -f %1 -o "{where ur captures are going}"
Fill out the curly brackets with your own paths, and save as capture.bat
Now the PHP file thusly:
<?php
error_reporting(0);

// Use ur frickin imagination here ...
$path = "C:/Program Files/WebSwoon";

// $_GET replaces $HTTP_GET_VARS, which newer PHP versions no longer provide
$url = isset($_GET["URL"]) ? $_GET["URL"] : "";
if (empty($url)) return;

// List file named after the current time in milliseconds
$filename = "/lists/" . sprintf("%.0f", microtime(true) * 1000) . ".txt";
$file = fopen($path . $filename, "w+");
fputs($file, "http://" . $url . "\r\n");
fclose($file);

// Add the absolute path to your batch file
// I just hard-coded it at the root of c: here
$exec = "c:\\capture.bat $filename";
$WshShell = new COM("WScript.Shell");
$WshShell->Run("$exec", 0, true);
unlink($path . $filename);
This allows you to pass the URL ( without the “http://” prefix in this case ) as a parameter ( “URL” ), but you can do this many other ways; this is just the way I do it.
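Since the script glues “http://” onto whatever it is given, it may be worth tidying the parameter first. This is only a minimal sketch of one way to do it; the helper name normalize_capture_url is made up for illustration and is not part of the original script.

```php
<?php
// Hypothetical helper, not part of the original script: tidy the "URL"
// parameter before it gets written to the list file.
function normalize_capture_url($raw)
{
    $url = trim((string) $raw);

    // The script prepends "http://" itself, so drop a scheme the caller
    // included (note that an "https://" prefix is dropped as well).
    $url = preg_replace('#^https?://#i', '', $url);

    // Reject empty values and anything containing whitespace or control
    // characters, which could smuggle extra lines into the list file.
    if ($url === '' || preg_match('/[\s\x00-\x1F\x7F]/', $url)) {
        return null;
    }

    return $url;
}

echo normalize_capture_url('http://example.com/page'), "\n"; // example.com/page
```

A null return means “don’t bother capturing this one”, which slots in where the original script does its empty() check.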
So, PHP comes along, creates a file named with the current time in milliseconds, and sets the contents to the URL we want to capture. This file normally has to be in the webswoon directory ( I put it in the “lists” subdirectory ). We then call our batch file to launch webswoon in command line mode, which will read the list of URLs and capture them to our output directory, with a filename of {MD5 filename}.jpg.
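The list-file step can also be pulled out into a small function, which makes it easier to spot when the write fails. A rough sketch, assuming a made-up function name ( write_url_list ) and a $listDir argument that stands in for the “lists” folder:

```php
<?php
// Sketch of the list-file step as a function. The function name is made up,
// and $listDir stands in for the "lists" folder in the webswoon directory.
function write_url_list($listDir, $url)
{
    // Millisecond timestamp as the file name, as described above.
    $name = sprintf('%.0f', microtime(true) * 1000) . '.txt';
    $full = rtrim($listDir, '/\\') . '/' . $name;

    // A single URL, CRLF-terminated, with "http://" prepended.
    if (file_put_contents($full, 'http://' . $url . "\r\n") === false) {
        return null; // could not write, e.g. folder missing or read-only
    }

    return $full;
}

// Usage: write a list file into the temp folder, then tidy up again.
$listFile = write_url_list(sys_get_temp_dir(), 'example.com');
if ($listFile === null) {
    echo "write failed\n";
} else {
    echo "wrote $listFile\n";
    unlink($listFile);
}
```

Two requests landing in the same millisecond would get the same file name, so on a busy box you may want to tack something random onto the name.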
Another note is that the PHP exec() function doesn’t work in this instance ( at least, it didn’t when I wrote the original code ), so we use a Windows Script Host shell object to run the batch file instead. The 0 hides the window, and true makes PHP wait until the batch file has finished:
$WshShell = new COM("WScript.Shell");
$WshShell->Run("$exec", 0, true);
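If you want the script to fail quietly on a box where COM isn’t available, the call can be wrapped like this. It is only a sketch: run_hidden_and_wait is a made-up name, and the batch file path in the usage lines is just the example location used earlier.

```php
<?php
// Hypothetical wrapper around the WScript.Shell call.
// Returns the exit code of the command, or null if COM is not available.
function run_hidden_and_wait($command)
{
    // The COM class only exists on Windows builds of PHP with COM support
    // enabled (the com_dotnet extension in newer PHP versions).
    if (!class_exists('COM')) {
        return null;
    }

    $shell = new COM('WScript.Shell');

    // Run(command, windowStyle, waitOnReturn):
    //   0    = keep the window hidden
    //   true = block until the command exits; Run then returns its exit code
    return $shell->Run($command, 0, true);
}

// Usage: same command string the script builds in $exec.
$exitCode = run_hidden_and_wait('c:\\capture.bat /lists/1234.txt');
if ($exitCode === null) {
    echo "COM is not available here, nothing was run\n";
}
```

Having the exit code handy also gives you something to log when a capture goes wrong.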
Webswoon will sometimes get stuck under its own feet. It is set to time out after 15 seconds, but some pages will screw that up. So I have a scheduled job that runs a Windows version of the Unix kill command thusly:
kill -f webswoon*
It runs every so often just to tidy things up.
Want to see a working example of all this stuff?
http://convert.springheadmedia.com/webshot/index.php