Capturing websites to an image

September 21, 2007 on 7:08 pm | In PHP, Windows | No Comments

This is a “thinking outside the box” way to automatically snap a website and save it as a JPG. It uses webswoon, a great little app that is a little buggy here and there but does the job.

You will need:
Webswoon
Apache + PHP ( Note: Apache MUST be running standalone, not as a service )
A Windows machine to run it on ( oh stop crying ) - I use an old PIII with Windows 2000 installed.

Right, I’m gonna assume everything is installed. Why must Apache be running standalone? Because webswoon is actually a wrapper for the IE ActiveX object ( or something like that ). For IE to run, it needs a desktop with a color depth and all those funky things. Services don’t have a desktop, so IE can’t run. There may be a way around it; frankly, I haven’t had the time to make it work.

Now configure webswoon:

openbrowserwindow=0
thumbnailwidth=150
thumbnailheight=110
browserwidth=850
browserheight=640
waitdelay=2
updatecaptures=0
updatedays=60
updatehours=0
updatemins=0
exportfolder=C:\Program Files\Apache Group\Apache2\htdocs\captures
capturefilenameformat=%z.%e
ignoreurl=
imageformat=JPG
timeoutdelay=13
deletecaptures=0
removeborder=1
loopmode=0
loopdelay=60
jpegcompression=80
automode=0
showbrowsererror=0
language=en
disablejavascript=1
resizecapture=1
cropmethod=1
keepmargins=1
blankmargin=20
canvawidth=1200
canvaheight=6000

“exportfolder” does not really matter, as we override it later with the -o switch in the batch file. Play with the other settings as you like.

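Now we create a batch file somewhere that Apache can get hold of it:

cd /d "{path to webswoon}"
"{path to webswoon}\webswoon_console" -f %1 -o "{where your captures are going}"

Fill in the curly-bracket placeholders with your own paths and save the file as capture.bat. The /d switch makes cd change drive as well as directory, in case webswoon is not on the current drive.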

Now the PHP file thusly:

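<?php
error_reporting(0);

// Path to your webswoon install; adjust to suit.
$path = "C:/Program Files/WebSwoon";

// $HTTP_GET_VARS was removed in PHP 5.4; $_GET is the replacement.
$url = isset($_GET["URL"]) ? trim($_GET["URL"]) : "";
// Bail out on an empty URL, or one with whitespace that would add extra lines.
if ($url === "" || preg_match('/\s/', $url)) return;

// Name the list file after the current time in milliseconds.
$filename = "/lists/" . sprintf("%.0f", microtime(true) * 1000) . ".txt";
$file = fopen($path . $filename, "w");
fputs($file, "http://" . $url . "\r\n");
fclose($file);

// Add the absolute path to your batch file.
// I just hard-coded it here as c:\
$exec = "c:\\capture.bat $filename";
$WshShell = new COM("WScript.Shell");
$WshShell->Run($exec, 0, true);
unlink($path . $filename);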

This allows you to pass the URL ( without the http:// prefix in this case ) as a parameter ( “URL” ). You can do this many other ways; this is just the way I do it.

So, PHP comes along and creates a file, named with the current time in milliseconds, whose contents are the URL we want to capture. This file normally has to be in the webswoon directory ( I put it in the “lists” subdirectory ). We then call our batch file to launch webswoon in command-line mode, which reads the list of URLs and captures them to our output directory, with a filename of {MD5 filename}.jpg. Once the batch file returns, the list file is deleted.
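
The list-file step is the one place where outside input reaches the disk, so it is worth a little care. Below is a minimal sketch of that step in Python rather than PHP, purely to illustrate the idea. The function name write_url_list and the allow-list pattern are assumptions of mine, not part of webswoon or of the original script.

```python
import os
import re
import tempfile
import time

# Hypothetical allow-list: host, optional port, optional path/query.
# Whitespace, quotes and shell metacharacters are rejected outright.
_SAFE_URL = re.compile(r"[A-Za-z0-9.-]+(:\d+)?(/[A-Za-z0-9._~/?&=%+-]*)?")


def write_url_list(list_dir, url):
    """Write a one-line URL list file into list_dir and return its path.

    `url` is expected without the scheme, as in the post.
    Raises ValueError if the URL falls outside the allow-list.
    """
    url = url.strip()
    if not _SAFE_URL.fullmatch(url):
        raise ValueError("refusing suspicious URL: %r" % url)
    # Timestamp in milliseconds, mirroring the naming scheme described above.
    name = "%d.txt" % int(time.time() * 1000)
    path = os.path.join(list_dir, name)
    # newline="" stops Python from translating the explicit CRLF again.
    with open(path, "w", newline="") as handle:
        handle.write("http://" + url + "\r\n")
    return path


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print(write_url_list(demo_dir, "example.com/index.html"))
```

Two requests landing in the same millisecond would still collide on the filename; adding a random suffix is one way around that if it ever matters.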

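Another note: the PHP exec() function doesn’t work in this instance ( at least, it didn’t when I wrote the original code ), so we use a Windows Script Host shell object ( WScript.Shell ) to run the batch file instead. The second argument, 0, hides the window, and the third, true, makes Run wait until the batch file has finished: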

$WshShell = new COM("WScript.Shell");
$WshShell->Run("$exec", 0, true);

Webswoon will sometimes trip over its own feet. It is set to time out after 13 seconds ( timeoutdelay in the config above ), but some pages will screw that up. So I have a scheduled job that runs a Windows version of the Unix kill command, thusly:

kill -f webswoon*

It runs every so often just to tidy things up.
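
An alternative to sweeping up on a schedule is to put a hard time limit on each run. Here is a rough sketch of that idea in Python, using the standard library's subprocess timeout. The helper name run_with_timeout is my own, and the sleeping child process is only a stand-in for the capture command.

```python
import subprocess
import sys


def run_with_timeout(command, seconds):
    """Run `command` (a list of arguments); return True if it finished in time.

    subprocess.run kills the child it started when the timeout expires, but
    not any grandchildren, so a batch file that launches another program may
    still need the scheduled clean-up described above.
    """
    try:
        subprocess.run(command, timeout=seconds, check=False)
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    # Stand-in for the capture command: a child process that sleeps for 5 s.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, 0.5))
```

Because only the direct child is killed, this complements the scheduled kill rather than replacing it.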

Want to see a working example of all this stuff?

http://convert.springheadmedia.com/webshot/index.php


Finding what svchost is looping

September 5, 2007 on 10:10 am | In Windows | 2 Comments

Ever noticed your Windows machine going slow, gone into Task Manager, sorted by CPU time and noticed svchost going like a bat in chocolate ( hmmmmmm, chocolate bats … )? svchost.exe is a container for many of the services that run in the background of your Windows machine. Some are nice, some are evil. Some are required, some not so much. But how can you tell exactly what that svchost is responsible for? You can try this script. Copy it into Notepad and save it as “something.vbs”, then run it. It will pop up a dialog listing each PID and the services running under it. Match the PID of the busy svchost in Task Manager ( add the PID column via View > Select Columns if it isn’t shown ) against the list. If you don’t think you need it, kill it in Task Manager.

' Collect the process ID of every service that is not stopped.
Set objIdDictionary = CreateObject("Scripting.Dictionary")
strComputer = "."
strReturn = ""
Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
Set colServices = objWMIService.ExecQuery _
    ("SELECT * FROM Win32_Service WHERE State <> 'Stopped'")
For Each objService In colServices
    If Not objIdDictionary.Exists(objService.ProcessID) Then
        objIdDictionary.Add objService.ProcessID, objService.ProcessID
    End If
Next

' For each process ID, list the services hosted inside that process.
colProcessIDs = objIdDictionary.Items
For i = 0 To objIdDictionary.Count - 1
    Set colServices = objWMIService.ExecQuery _
        ("SELECT * FROM Win32_Service WHERE ProcessID = '" & _
        colProcessIDs(i) & "'")
    strReturn = strReturn & "Process ID: " & colProcessIDs(i) & vbCrLf
    For Each objService In colServices
        strReturn = strReturn & vbTab & objService.DisplayName & vbCrLf
    Next
Next

WScript.Echo strReturn
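
The script above runs one WMI query per process ID. The same report can be built in a single pass by grouping as you go. Here is a small sketch of just that grouping logic in Python, with made-up sample data standing in for the Win32_Service results. The function names and the sample PIDs are my own illustrations, not anything WMI provides.

```python
from collections import OrderedDict


def group_services_by_pid(services):
    """Group (pid, display_name) pairs into {pid: [display_name, ...]}."""
    grouped = OrderedDict()
    for pid, name in services:
        grouped.setdefault(pid, []).append(name)
    return grouped


def format_report(grouped):
    """Render the grouping like the dialog: a PID line, then its services."""
    lines = []
    for pid, names in grouped.items():
        lines.append("Process ID: %d" % pid)
        lines.extend("\t" + name for name in names)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data; real values would come from Win32_Service.
    sample = [(820, "DHCP Client"), (820, "DNS Client"), (1044, "Print Spooler")]
    print(format_report(group_services_by_pid(sample)))
```

In VBScript the equivalent would be a Dictionary keyed on ProcessID whose values are the accumulated display names, filled in during the first For Each loop.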