(c) Scott M Baker, firstname.lastname@example.org
This documentation also serves for UltraSucker/Web, and you may find specific differences between QuadSucker and UltraSucker below.
QuadSucker/Web downloads files from websites to your computer. Unattended operation is supported, allowing entire websites to be downloaded automatically without user intervention. Here is a list of some possible uses:
This program is optimized for downloading from image-intensive sites and provides built-in jpeg and gif image viewing during the download process. Multithreading is used to download multiple files in parallel to maximize transfer efficiency. Specific benefits include:
Installation is very simple and mostly automated:
QuadSucker/Web will begin downloading files from the URL you provided. As each file is downloaded, it is parsed and its hyperlinks are extracted to form new URLs, which are downloaded in turn. This process continues until the entire site has been downloaded.
When files are downloaded, they are (by default) placed in a directory structure that mirrors the web server's. For example, suppose you were to download files from http://www.yahoo.com/science/astronomy, and QuadSucker/Web was installed into c:\program files\QuadWeb. Your files would be placed in:
Note that web pages include hyperlinks which may reach outside of that directory. For example, Yahoo places many of its images in http://us.yimg.com. Therefore, some files would be placed in c:\program files\QuadWeb\us.yimg.com
To view the files you have downloaded, use the viewer program of your choice. If you've downloaded a large collection of images, then try SB-Software's SBJV or Sortpics utilities.
Note: Most options above, such as the download directory settings and spidering behaviour, may be modified in QuadSucker/Web's setup menu.
Some websites protect access with authentication which requires you to enter a name and password to access the site. Normally your browser would pop up a window asking for this information when authentication is required, but QuadSucker/Web is intended for automated operation, so it would be best if you entered your authentication information ahead of time.
Press the <add> button to add a new authentication setting. You will be prompted for the following:
- Host Name/Url Substring. This is a text string which uniquely identifies the URLs to which the authentication will apply. For example, if the site you want to access is "www.mysite.com", then you could simply enter the hostname "www.mysite.com" into the box. Some sites may have multiple authentications for different sections. For example, you could have a separate password for "www.mysite.com/section1/" and "www.mysite.com/section2/". In this case, just enter the longer URL strings to differentiate between them. A good tip is to keep the Host Name/Url Substring as short as possible while still uniquely identifying the authenticated pages. If a simple hostname will work (e.g. "www.mysite.com"), then just use that. There's no reason to get fancy.
- Authentication 'user name'. This is the authentication user name, as assigned to you by the site administrator.
- Authentication 'password'. This is the password, as assigned to you by the site administrator.
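The substring matching described above can be sketched in Python. This is only an illustration: the function name and the tuple layout are hypothetical, and the rule that the longest matching substring wins is an assumption, since the documentation does not say how QuadSucker/Web resolves overlapping entries.

```python
def find_credentials(url, entries):
    """Return (user, password) for the entry whose substring occurs in url.

    entries is a list of (substring, user, password) tuples.
    Assumption: when several substrings match, the longest one wins.
    """
    matches = [entry for entry in entries if entry[0] in url]
    if not matches:
        return None
    _, user, password = max(matches, key=lambda entry: len(entry[0]))
    return user, password


# Hypothetical settings for a site with two separately protected sections.
entries = [
    ("www.mysite.com/section1/", "alice", "pw1"),
    ("www.mysite.com/section2/", "bob", "pw2"),
]
print(find_credentials("http://www.mysite.com/section2/index.html", entries))
```

If a plain hostname such as "www.mysite.com" were the only entry, every URL on that host would match it, which is why the short form is usually sufficient.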
Note that some adult sites, such as those using AdultCheck, do not normally use authentication. On these sites you are actually sent to an AdultCheck form which asks you for information and then directs you back to a "special page" on the originating site. All you need to do is follow this link in your browser, and note which "special page" is returned -- then enter the special page's URL into QuadSucker/Web.
Directory Structure Creation:
If this option is enabled, then a directory structure will be created which mirrors the host's directory structure. (See the notes under Quick Start). If this option is disabled then all of the files will be placed in the same directory.
Most users will probably want directory structure creation enabled. The drawback is that files may sometimes have to be hunted down in the directory structure, because web servers sometimes place inline images in funny places. Users who are just downloading bulk files (e.g. from a picture site) may wish to disable directory creation so all of the images are lumped into the same spot.
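As a rough illustration of the two modes, the following Python sketch maps a URL to a local file path. The function name, the `mirror` flag, and the fallback file name are hypothetical. The exact naming rules QuadSucker/Web applies are not documented here, so the layout (download path, then host name, then the server's directories) is an assumption based on the us.yimg.com example in Quick Start.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlsplit


def local_path(url, download_root, mirror=True):
    """Map a URL to a local path, with or without a mirrored directory tree."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if mirror:
        # Assumed layout: <download path>\<host name>\<server directories>\<file>
        return PureWindowsPath(download_root, parts.netloc, *segments)
    # Directory creation disabled: every file lands in the same directory.
    name = segments[-1] if segments else "index.html"  # hypothetical default name
    return PureWindowsPath(download_root, name)


root = r"c:\program files\QuadWeb"
print(local_path("http://us.yimg.com/i/logo.gif", root))
print(local_path("http://us.yimg.com/i/logo.gif", root, mirror=False))
```

The flat mode makes it obvious why two files with the same name on different servers could collide, which is one reason the mirrored layout is the default.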
The download path is where files will be stored on your computer. You should use a hard drive where there is plenty of disk space! The default is the program directory where QuadSucker/Web is installed.
Delete "short" files
When a web server sends a file, it (usually) includes a count of how many bytes are supposed to arrive in the file. Sometimes a file is completed with fewer bytes than there are supposed to be. In this case, the server has probably aborted the transfer and just not told us about it. The file is probably worthless and should be deleted.
Delete files that time out
A file times out if no bytes are received within a certain interval. The file is most probably worthless and should be deleted.
Delete files that are aborted
Files are aborted when the user presses the <Stop> button, right-clicks and aborts a specific file, or closes the program. The files are probably ruined and should be deleted.
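The three delete options can be summarized as one decision. The Python sketch below is an illustration only: the `Transfer` record and the option names are hypothetical, and the byte count is assumed to be the value the server announced for the file (when it announced one at all).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transfer:
    bytes_received: int
    expected_bytes: Optional[int]  # None when the server sent no byte count
    timed_out: bool = False
    aborted: bool = False


def should_delete(t, delete_short=True, delete_timed_out=True, delete_aborted=True):
    """Decide whether a finished transfer left a file that should be deleted."""
    if delete_aborted and t.aborted:
        return True
    if delete_timed_out and t.timed_out:
        return True
    # A "short" file ended with fewer bytes than the server announced.
    is_short = t.expected_bytes is not None and t.bytes_received < t.expected_bytes
    return delete_short and is_short


print(should_delete(Transfer(bytes_received=500, expected_bytes=1000)))
```

A file with no announced byte count can never be classified as short, so only the time-out and abort rules can apply to it.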
Spidering is the process of following links in a website to download all of the files. Options in this section control what kind of links are followed and how many of them are followed:
- Off-site images. If the current page includes an image which is located on another server, that image will be downloaded.
Example: If you are downloading from the website http://www.foo.bar/ and it contains the inline image http://www.anotherserver.com/image1.jpg, then the image will only be downloaded if Off-site images is enabled.
- Off-site pages. If the current page includes a hyperlink to a page which is located on another server, that page will be downloaded.
Example: If you are downloading from the website http://www.foo.bar/ and it has a link to the page http://www.anotherserver.com/page1.html, then the page will only be downloaded if Off-site pages is enabled.
- Process one website at a time. If enabled, then QuadSucker will attempt to download all of the pages from a given site together. If disabled, then QuadSucker will download pages in sequential order regardless of which server the page is located on. This option only makes sense if Off-site images or Off-site pages are enabled.
Example: Assume you are downloading from the website http://www.foo.bar/ and it has a link to the page http://www.server1.com/page1.html. If the process one website at a time option is enabled, then QuadSucker will continue to download all pages in http://www.foo.bar/ before downloading any of the pages from http://www.server1.com/.
Note: Regardless of the setting of this option, pages which match the host name of the original URL that you typed into QuadSucker/Web will always get preferential treatment over pages which do not. This special rule is applied because most users want to ensure they receive all pages in the original site.
Caution: enabling this option may have unintended effects. For example, many websites have links to Netscape or Microsoft for browser download instructions. If you enable this option, then QuadSucker could download the entire Netscape and/or Microsoft sites before downloading what you had intended.
- Forward links only. If checked, then only links in directories deeper than the current directory will be followed.
- Treat linked images as inlines. Many picture-oriented sites include links to JPEG/GIF images; when you click on the link you get a full-screen image. However, this image is still technically a hyperlink, and SBWcc would normally treat it like any other hyperlink. If this option is checked, then JPEG/GIF images included as hyperlinks get a special status and are treated just like inline images. (Good for picture sites)
- Follow Off-Site Redirections. Redirections are created when web pages are moved from one location to another. For example, you request http://host1/foo.html, and the server responds with a redirection to http://new_host/foo.html. This is normally handled transparently by your browser. If you leave the off-site redirections option unchecked, then QuadSucker/Web will not follow such redirections.
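To make the interaction of these options concrete, here is a small Python sketch of a link-following decision. It is not QuadSucker/Web's actual logic: the class and function names are hypothetical, "off-site" is assumed to mean a different host name, and "forward links only" is assumed to apply to page links whose path lies at or below the current page's directory.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlsplit

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif")


@dataclass
class SpiderOptions:
    offsite_images: bool = False
    offsite_pages: bool = False
    forward_links_only: bool = False
    linked_images_as_inlines: bool = False


def should_follow(page_url, link_url, inline_image, opts):
    """Decide whether a link found on page_url should be downloaded."""
    page, link = urlsplit(page_url), urlsplit(link_url)
    # A hyperlink to a JPEG/GIF counts as an image only if the option is on.
    is_image = inline_image or (
        opts.linked_images_as_inlines
        and link.path.lower().endswith(IMAGE_EXTENSIONS)
    )
    if link.netloc.lower() != page.netloc.lower():
        return opts.offsite_images if is_image else opts.offsite_pages
    if opts.forward_links_only and not is_image:
        page_dir = posixpath.dirname(page.path).rstrip("/") + "/"
        return link.path.startswith(page_dir)
    return True


opts = SpiderOptions(offsite_images=True)
print(should_follow("http://www.foo.bar/", "http://www.anotherserver.com/image1.jpg", True, opts))
print(should_follow("http://www.foo.bar/", "http://www.anotherserver.com/page1.html", False, opts))
```

With only Off-site images enabled, the inline image from the other server is accepted while the page on that server is rejected, matching the two examples in the list above.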
Spider Link Depth Limit:
The link depth limit allows you to set how 'deep' the links will be followed. For example, if the maximum link depth was set to '3', and QuadSucker/Web followed the links a->b->c, and c contained another hyperlink, then the new hyperlink would not be added since the depth is already three levels deep.
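The a->b->c example can be reproduced with a short Python sketch of a depth-limited, first in/first out crawl. The function name and the dictionary that stands in for the web are hypothetical, and counting the starting page as depth 1 is an assumption chosen to match the example.

```python
from collections import deque


def crawl_order(links, start, max_depth=None):
    """Return pages in the order they would be visited.

    links maps each page to the pages it links to.
    The starting page counts as depth 1 (an assumption).
    """
    seen = {start}
    queue = deque([(start, 1)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if max_depth is not None and depth >= max_depth:
            continue  # already at the limit: links on this page are not added
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


links = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(crawl_order(links, "a", max_depth=3))
```

With the limit set to 3, page d is never queued because c is already three levels deep; with the limit disabled (`max_depth=None`), all four pages are visited.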
The duplicate checker maintains a database of recently downloaded items to prevent downloading of those items in the future. The database is maintained independently of the files themselves, so you may delete the files from your hard drive and still "remember" them in the duplicate list.
Duplicate Checker Mode:
The duplicate checker may be disabled, in which case no duplicate checking is performed (however, URLs will still be remembered), or it may be set to either 'all' or 'not-modified'. The 'all' setting will always reject a duplicate if it is present in the database. The 'not-modified' setting will ask the web server if the document has been modified. Asking the web server does incur significant overhead, but if you are downloading large files, the savings will be substantial.
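The three modes can be sketched in Python as follows. The function names and the dictionary used as the duplicate database are hypothetical. The documentation does not say how the program asks the server; the `If-Modified-Since` request header shown here is the standard HTTP mechanism for that question and is only assumed to be what is used.

```python
from email.utils import formatdate


def conditional_headers(last_download_timestamp):
    """Build the HTTP header that asks 'has this changed since I last got it?'."""
    return {"If-Modified-Since": formatdate(last_download_timestamp, usegmt=True)}


def should_download(url, mode, database, server_says_modified):
    """database maps remembered URLs to the time they were last downloaded.

    server_says_modified(url, timestamp) is a callable that performs the
    (comparatively expensive) round trip to the web server.
    """
    if mode == "disabled" or url not in database:
        return True
    if mode == "all":
        return False  # always reject a URL that is in the database
    if mode == "not-modified":
        return server_says_modified(url, database[url])
    raise ValueError(f"unknown duplicate checker mode: {mode}")


database = {"http://www.foo.bar/big.zip": 0}
print(should_download("http://www.foo.bar/big.zip", "all", database, lambda u, t: True))
```

The trade-off mentioned above is visible in the structure: 'all' answers from the local database alone, while 'not-modified' costs one request per remembered URL but avoids transferring unchanged files again.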
Duplicate Checker Entries/Count:
This is a tally of how many URLs are stored in the duplicate checker. You may also selectively delete some of the entries if you wish to get rid of them.
The filter options control which types of files are downloaded to your computer. HTML (Hypertext Markup Language) files are always downloaded since they are required for spidering to function. Other file types, such as GIFs, JPEGs, and sound files, may be individually turned on and off.
You may also limit by the byte size of the object, rejecting files that are either too large or too small.
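A minimal Python sketch of such a filter is shown below. The function name and parameters are hypothetical, file types are assumed to be recognized by extension, and the size limits are assumed not to apply to HTML files, since those are always downloaded.

```python
import posixpath
from urllib.parse import urlsplit


def passes_filter(url, size_bytes, enabled_extensions, min_bytes=None, max_bytes=None):
    """Return True if a file of the given size should be downloaded."""
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    if extension in (".html", ".htm"):
        return True  # HTML is always needed for spidering
    if extension not in enabled_extensions:
        return False
    if min_bytes is not None and size_bytes < min_bytes:
        return False  # too small
    if max_bytes is not None and size_bytes > max_bytes:
        return False  # too large
    return True


enabled = {".jpg", ".gif"}
print(passes_filter("http://www.foo.bar/pic.jpg", 40_000, enabled, min_bytes=10_000))
```

A minimum size is a simple way to skip small buttons and icons on picture sites while keeping the full-size images.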
The advanced options are intended for people who feel the need to tamper with things that are best left alone. I suggest you not modify any of the advanced options unless you specifically know what you are doing!
(Please forgive me while I get a bit technical here; if you don't understand the following, then don't worry about it)
Web pages contain several different types of links, such as relative links and absolute links. A relative link specifies a location relative to the page which you are currently viewing. For example, if you were viewing http://myhost.com/mypage.html, and it contained a link called "page2.html", this would be a relative link, and would resolve to http://myhost.com/page2.html.
However, an absolute link specifies an entirely new location. For example, if you came across the link http://anotherhost.com/page2.html, then this would be an absolute link and have no relation to the source document.
Now, assume that QuadSucker/Web has downloaded a web page to your computer, and you open that page, located on your local disk, with Netscape. Any relative links will be resolved properly by Netscape and point to copies of pages which QuadSucker/Web has downloaded. However, absolute links always point to an entirely new location. If you follow an absolute link with your browser, you would find yourself back out on the Internet, downloading the linked page from scratch.
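The difference between the two kinds of links can be checked with Python's standard `urljoin` function, which resolves a link against the page it appears on in the same way a browser does. This is only a demonstration of link resolution, not of anything inside QuadSucker/Web.

```python
from urllib.parse import urljoin

base = "http://myhost.com/mypage.html"

# A relative link is resolved against the page that contains it.
relative = urljoin(base, "page2.html")

# An absolute link ignores the containing page entirely.
absolute = urljoin(base, "http://anotherhost.com/page2.html")

print(relative)
print(absolute)
```

If the same page were opened from a local directory instead, the relative link would resolve to a file in that directory, while the absolute link would still point at anotherhost.com.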
(end of the technical discussion, now let's talk about the impact)
The link relativizer translates absolute links into relative links. This is essential for viewing pages with your web browser offline. What this means:
- If the main goal of your use of QuadSucker/Web is to download websites and browse them offline, then you will need to enable the link relativizer.
- If the main goal of your use of QuadSucker/Web is not to browse websites offline, but rather to download images, archive files, sounds, movies, or other data and view it in applications other than a web browser, then you do not need to enable the link relativizer.
- If your goal is to download websites and preserve the content exactly the same as it is online, for example you wish to edit the site and re-upload it, then do not enable the link relativizer.
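As an illustration of what translating an absolute link into a relative one involves, here is a Python sketch. It is not the relativizer's actual algorithm: the function names are hypothetical, and it assumes the mirrored directory layout in which each host name becomes a top-level directory under the download path.

```python
import posixpath
from urllib.parse import urlsplit


def mirror_path(url):
    """Path of a downloaded URL inside the download directory (assumed layout)."""
    parts = urlsplit(url)
    return "/" + parts.netloc + parts.path


def relativize(page_url, link_url):
    """Rewrite link_url as a path relative to the local copy of page_url."""
    page_dir = posixpath.dirname(mirror_path(page_url))
    return posixpath.relpath(mirror_path(link_url), start=page_dir)


print(relativize("http://myhost.com/mypage.html", "http://anotherhost.com/page2.html"))
print(relativize("http://myhost.com/a/index.html", "http://myhost.com/a/img/x.gif"))
```

The rewritten link only works if the target was actually downloaded, and it no longer matches the link on the live site, which is why the relativizer should stay off when the goal is to preserve pages exactly as they are online.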
All that being said, it should also be noted that the link relativizer is EXPERIMENTAL. I'm still working out the bugs, so some pages might not display properly. I suggest you try to get what you want done without using the link relativizer, and only enable it if you determine there is a problem that absolutely requires the relativizer to work.
Proxy Server Configuration:
A proxy server is a computer which services web requests on behalf of its network clients. Proxy servers are common in large corporations and even in small companies where one computer is set up with an Internet connection and other computers on the LAN use it. If you don't have a proxy server, then you won't need to use this option.
File: Open Bookmark File:
Many people have asked me for a way to automatically download all pages stored in their Netscape bookmark file. This option has been added specifically for that reason. In case you don't use Netscape, or don't know what a Netscape bookmark file is, it is an HTML file stored on your hard drive containing links to your favorite (or otherwise notable) pages. In fact, any HTML file stored on your computer may be used as the bookmark file.
If you want to download each page in your bookmark file plus its images, but not spider the sites:
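To show what "any HTML file may be used as the bookmark file" amounts to, the Python sketch below pulls the links out of a piece of HTML with the standard library's parser. The class name, function name, and sample HTML are hypothetical; how QuadSucker/Web itself parses the file is not documented here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in the order encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def bookmark_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


sample = """
<html><body>
<a href="http://www.quadsucker.com/">QuadSucker</a>
<A HREF="http://www.sb-software.com/">SB-Software</A>
</body></html>
"""
print(bookmark_links(sample))
```

Each extracted link becomes a starting URL, which is why the spider link depth setting decides whether only those pages or the entire sites behind them are downloaded.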
- Go into setup, find the <Spider> tab, and set the spider link depth to 1. This will cause QuadSucker/Web to only download the pages, and not spider the websites.
- Use the File:Open Bookmark File pull-down menu option to set the URL to your bookmark file.
- Press Start.
If you want to spider each site in your bookmark file:
- Go into setup, find the <Spider> tab, and disable the spider link depth, or set it to the number of levels deep that you want to spider each site plus one.
- Use the File:Open Bookmark File pull-down menu option to set the URL to your bookmark file.
- Press Start.
You don't need to use Netscape to create bookmark files -- you can create them with your favorite HTML editing tool. You can even create them by hand. For example, a bookmark file for the QuadSucker, Sortpics, and SB-Software websites might look like the following:
File: URL Sequence Wizard:
The Sequenced URL wizard lets you download a series of URLs (pages) that have a numerical order to them. For example, if you know a website numbers its pages, then you might want to download the following:
The sequence wizard requires you to separate the URL into three parts:
In our example, the beginning part is "http://website.com/page", the middle part is a number from 1 to 500, and the end part is ".html".
The sequence wizard also lets you choose padding. Padding will add zeroes until a number is a certain length. For example, if you set padding to 5, then the following numbers would be used:
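The three-part split and the padding rule can be sketched in Python. The function name and parameters are hypothetical, and the zeroes are assumed to be added in front of the number.

```python
def sequence_urls(beginning, first, last, end, padding=0):
    """Build one URL per number from first to last (inclusive).

    padding is the minimum number of digits; shorter numbers get
    leading zeroes (an assumption about where the zeroes go).
    """
    return [
        f"{beginning}{str(number).zfill(padding)}{end}"
        for number in range(first, last + 1)
    ]


# Without padding, then with padding set to 5.
print(sequence_urls("http://website.com/page", 1, 3, ".html"))
print(sequence_urls("http://website.com/page", 1, 3, ".html", padding=5))
```

Numbers that are already at least as long as the padding length are left unchanged.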
Setup: Priority Keywords:
QuadSucker/Web maintains a list, also known as a queue, of web pages that are waiting to be downloaded. Normally, this queue is serviced in first in/first out order. The first page to be retrieved is the first page that was entered into the waiting queue, and new pages are inserted at the tail of the queue. It works like a line at a grocery store -- the customer at the front of the line is served first and new customers are added at the back of the line (to do otherwise would be chaos!).
However, sometimes you might want some URLs to have a greater priority than others. For example, you are downloading from a fine art website, and Rafael (a good choice) is your favorite painter. The fine art website is very large and you're anxious to see the Rafael pictures as soon as possible. You'd rather they make their way to the front of the waiting queue, ahead of other less favorable painters.
This is where the priority keywords option is used. If a URL is encountered which contains a priority keyword, then it automatically gets placed at the front of the waiting queue. Continuing our example above, if you entered "rafael" into the priority keywords dialog, then any URL which contained the substring "rafael" would get pushed to the front of the list. It's as simple as that.
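The queue behavior can be sketched with Python's `deque`. The function name and the example URLs are hypothetical, and matching keywords without regard to upper or lower case is an assumption.

```python
from collections import deque


def enqueue(queue, url, priority_keywords):
    """Add url to the waiting queue, jumping the line on a keyword match."""
    lowered = url.lower()
    if any(keyword.lower() in lowered for keyword in priority_keywords):
        queue.appendleft(url)  # front of the line
    else:
        queue.append(url)      # normal first in/first out behavior


queue = deque()
for url in [
    "http://art.example/monet/1.jpg",
    "http://art.example/rafael/1.jpg",
    "http://art.example/degas/1.jpg",
]:
    enqueue(queue, url, ["rafael"])

print(list(queue))
```

Note that in this sketch several priority URLs arriving in a row end up in reverse order among themselves, because each new one is placed ahead of the previous one.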
Setup: Kill Strings:
Kill strings may be used to prevent certain files from being downloaded. For example, if you entered "yahoo" as a kill string, any URL that contained the phrase "yahoo" would not be downloaded. These would include:
Pretty simple, eh? Just enter the things you don't like and they won't be downloaded.
A technical note: Requests are killed as they are retrieved from the queues. This means you can use the queue viewer to see what's coming up and add kill strings for it. The items won't be killed immediately -- they will remain in the queue until QuadSucker gets to them and then will be killed. (If this doesn't make sense to you then just ignore it)
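The point of the technical note, that requests are killed when they are taken off the queue rather than when they are added, can be illustrated with a short Python sketch. The function name is hypothetical and the matching is a plain substring test.

```python
from collections import deque


def next_request(queue, kill_strings):
    """Return the next URL to download, discarding killed URLs on the way."""
    while queue:
        url = queue.popleft()
        if not any(kill in url for kill in kill_strings):
            return url
        # Otherwise the request is killed now, at retrieval time.
    return None


queue = deque([
    "http://www.yahoo.com/",
    "http://www.foo.bar/yahoo.html",
    "http://www.foo.bar/index.html",
])
kill_strings = ["yahoo"]  # may be edited at any time before a URL is reached
print(next_request(queue, kill_strings))
```

Because the check happens at retrieval, a kill string added while the download is running still affects everything that has not yet reached the front of the queue.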
View: Log Window
The log window presents a list of errors and undesirable events that have occurred since you pressed the start button.
The status window presents a summary of how many files have been downloaded, how many bytes have been downloaded, and a count for various types of errors which QuadSucker/Web has had to deal with while processing your requests.
View: Image Indexer
The Image Indexer is designed to allow you to browse image files downloaded via QuadSucker/Web with your web browser. The Image Indexer produces index files containing links and thumbnails for each image. When you execute the Image Indexer, it will create two index files in each directory in your download path. These files will be used by your browser when you wish to view the images. Everything is automatic, and should be fairly self-explanatory.
Many sites use forms to log people in and out of the site. For example, you may come across a page where you have to enter your name, account id, and/or password in order to access a site.
There is a distinction between form-based access and authentication. Forms typically request their credentials in the main window of your browser. Many forms are very elaborate with graphics and logos for the website, textual descriptions of the website, and/or a disclaimer. Authentication, however, presents itself as a small pop-up window which will appear over the top of your browser window. This section describes forms; if you're looking for information on authentication, see the authentication section above.
The problem with Forms is that they are difficult to automate -- each form is different. QuadSucker cannot determine how a form needs to be filled out by itself, since forms are website-dependent. However, there is an interactive feature that allows you to log into a website using a browser built into QuadSucker/Web.
Here are some steps to get you started:
QuadWeb supports a number of command line options that can be used to automate program options. You may, for example, cause QuadWeb to launch, automatically download a specific URL, and then automatically exit. The following command line options are available:
| -autoexit | automatically exit the program when downloading is complete (registered users only) |
| -autominimize | automatically minimize the program window (registered users only) |
| -autostart | automatically connect and start downloading |
| -singlepage | set single page mode (download page plus images) |
| -singlefile | set single file mode |
| -entiresite | set entire site mode (all pages and all files on one site) |
| -multisite | set multiple site mode (spider multiple web sites) |
In addition, you can specify a URL on the command line. Some Examples:
UltraSucker is a special stripped-down high performance version of QuadSucker. It is extremely similar in function and configuration with the following differences:
QuadSucker should suffice for most people. The thumbnails are nice, and four threads achieve very good utilization of the network connection. However, if you have a good high-speed connection to the Internet (ISDN, cable modem, DirectPC, T1, DSL, etc), and you are dealing with a slow end-server, then UltraSucker/Web may gain you some performance at the expense of losing a few bells and whistles.
Your internet connection has a fixed capacity. If you're using a 56kbps modem, then you can only download at 56kbps. It doesn't matter if you're pulling 4 web pages at a time, 12 web pages at a time, or a million pages at once. Only 56kbps is going to come down the line. QuadSucker/Web will work just fine for you.
However, let's say that you're using a faster connection, such as a DSL at 512kbps. Assume the web server you're connecting to is very slow, giving you huge latencies and dishing out documents at about 50kbps. That means 462kbps of your DSL connection is under-utilized. If you could pull twelve web pages at a time, you could fill your DSL line to the maximum. That's where UltraSucker fits in. There are other factors to consider, which make the argument more complex, but the basic truth is the same.
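The arithmetic behind this argument can be written out in a few lines of Python. The function names are hypothetical, and the model is deliberately simplified: every connection is assumed to deliver a constant 50kbps and nothing else limits the line.

```python
import math


def utilization(line_kbps, per_connection_kbps, connections):
    """Return (used, unused) capacity in kbps for parallel downloads."""
    used = min(line_kbps, per_connection_kbps * connections)
    return used, line_kbps - used


def connections_to_fill(line_kbps, per_connection_kbps):
    """Smallest number of parallel downloads that saturates the line."""
    return math.ceil(line_kbps / per_connection_kbps)


print(utilization(512, 50, 1))    # one slow download on a 512kbps DSL line
print(utilization(512, 50, 4))    # four downloads at a time
print(utilization(512, 50, 12))   # twelve downloads at a time
print(connections_to_fill(512, 50))
print(utilization(56, 50, 4))     # a 56kbps modem is already full
```

Under these assumptions a single download leaves 462kbps idle, four downloads still leave 312kbps idle, and eleven or more fill the DSL line, whereas on the modem the extra connections gain nothing.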
| You have a 56kbps modem connection to the internet | Use QuadSucker/Web |
| You have a fast connection (128kbps or more) to the internet | Try UltraSucker/Web and see if performance increases. If not, go back to QuadSucker. |
| The web server you're talking to is really really really slow | Use UltraSucker/Web |
| You like thumbnail images to be displayed while you're downloading so you can see what is going on. | Use QuadSucker/Web. |
This program is distributed as "Shareware". This means you are allowed to try it for a limited amount of time to determine if it suits your needs, and then you must pay for it in order to continue using it. I don't set a specific evaluation period -- you may take as long as you feel necessary, but I think 30 days is a good guideline, and if you're using the unregistered version after 30 days, then you ought to feel very very guilty. :(
However, you are allowed to fully evaluate the program during the evaluation period. Make sure it works to your satisfaction before you register -- that's what the evaluation period is for.
QuadSucker registration is included in my SB-Software registration package. By paying the modest $26.95 registration fee, you are registered to not only QuadSucker/Web, but all of my 'SB' series shareware programs as well. Likewise, if you've already registered for one of my other programs (such as SBNews or SBWcc), your registration is automatically good for QuadSucker/Web.
I feel this is the fairest and simplest policy -- other authors will make you register for each and every program separately, even when there is considerable overlap between the products. With SB-Software, you get it all!
Registration is good for life, and includes all past, present, and future versions.
For full details of the registration policy, and a list of programs included in the registration fee, see http://www.sb-software.com/credit/register.html.
You may register online via credit card at http://www.sb-software.com/credit/
You may reach me at the QuadSucker website, http://www.quadsucker.com/, or the SB-Software website, http://www.sb-software.com/.
I can be reached via email at email@example.com or firstname.lastname@example.org.
My US mailing address is available on the website somewhere.