Download Files Larger Than The Download Limit

Linux, curl, proxy August 18th, 2008

Yesterday I had to download a 232 MB file. I connect to the Internet through a proxy, and the authorities have capped downloads at 150 MB per file, so I had to find an alternative. Similar restrictions exist in almost all educational institutions.

While I was searching for a solution, somebody told me about JGet (a multithreaded Java program), but it hogs a lot of system resources. I kept looking, and then somebody on ##linux at irc.freenode.net told me about curl. I had heard of it before and had used it to automate web browsing tasks such as posting form data, but this time, when I went through the curl man page, I realized how powerful curl is.

curl has hundreds of options. I will demonstrate just the one that can help you download files of almost any size.

Let's start off…

In this example I am downloading the 32-bit Fedora Live CD ISO image. The idea is to download one part of the file per connection and, once all the parts are downloaded, concatenate them. This is what almost any download manager does.

To download part of a file, curl has the -r/--range option, which lets us specify the range of bytes to be downloaded.

A range can be specified in the following ways (byte offsets start at 0 and both ends are inclusive):
curl -r 0-499
downloads the first 500 bytes.
curl -r 500-999
downloads bytes 500 through 999, i.e. the second 500 bytes.
curl -r -500
downloads the last 500 bytes.
curl -r 500-
downloads everything from offset 500 to the end of the file.

There are a few more forms; check the man page. (I have copied only the simple ones here.)
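
If you want to see what each of these range forms selects without touching the network, here is a small sketch that mimics them on a local file with head and tail. The function name byte_range is made up for this illustration, and head -c is assumed to be available (it is on GNU and BSD systems). It only imitates the byte arithmetic; it is not how curl itself works.

```shell
# byte_range FILE SPEC: print the bytes of FILE that a range SPEC selects.
# (Hypothetical helper for illustration; it only mirrors the offset rules.)
byte_range() {
    file=$1
    spec=$2
    case $spec in
        -*)   # "-N": the last N bytes
            tail -c "${spec#-}" "$file" ;;
        *-)   # "N-": from offset N to the end (tail counts from 1, hence +1)
            tail -c "+$(( ${spec%-} + 1 ))" "$file" ;;
        *-*)  # "A-B": offsets A through B, both ends inclusive
            start=${spec%-*}
            end=${spec#*-}
            tail -c "+$(( start + 1 ))" "$file" | head -c "$(( end - start + 1 ))" ;;
    esac
}
```

For example, byte_range somefile 0-499 prints the same 500 bytes that curl -r 0-499 would ask the server for.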

Now, the Fedora Live CD ISO I have to download is 691 MB in size, so I will download 100 MB per connection, which stays under the 150 MB limit, like this:
curl -# -r 0-99999999 -o fedora.iso.part1 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &

curl -# -r 100000000-199999999 -o fedora.iso.part2 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &

curl -# -r 200000000-299999999 -o fedora.iso.part3 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &

curl -# -r 300000000-399999999 -o fedora.iso.part4 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &

curl -# -r 400000000-499999999 -o fedora.iso.part5 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &

curl -# -r 500000000-599999999 -o fedora.iso.part6 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &

curl -# -r 600000000- -o fedora.iso.part7 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &

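In the commands above, -# replaces curl's default progress meter with a simple progress bar; you can leave this option out. The trailing & runs each command in the background, so all seven parts download at the same time.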
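
The seven commands differ only in the byte range and the part number, so they can be generated in a loop once the total size in bytes is known. Below is a rough sketch of that idea. The function names print_ranges and fetch_parts are my own invention, and the part files are numbered with two digits (part01, part02, ...) so that they still sort correctly if there are ten or more of them.

```shell
# print_ranges TOTAL CHUNK: print one "start-end" range per line, covering
# bytes 0..TOTAL-1 in pieces of at most CHUNK bytes. (Hypothetical helper.)
print_ranges() {
    total=$1
    chunk=$2
    start=0
    while [ "$start" -lt "$total" ]; do
        end=$(( start + chunk - 1 ))
        # The last piece is usually shorter: clamp it to the final byte.
        if [ "$end" -ge "$total" ]; then
            end=$(( total - 1 ))
        fi
        echo "$start-$end"
        start=$(( end + 1 ))
    done
}

# fetch_parts URL TOTAL CHUNK PREFIX: download every range in the background
# to PREFIX.part01, PREFIX.part02, ... and wait for all of them to finish.
fetch_parts() {
    url=$1
    prefix=$4
    n=0
    for range in $(print_ranges "$2" "$3"); do
        n=$(( n + 1 ))
        curl -# -r "$range" -o "$(printf '%s.part%02d' "$prefix" "$n")" "$url" &
    done
    wait   # returns only after every background curl has exited
}
```

With the size stored in a variable (say $size, which you would have to find out first), the call would look like fetch_parts http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso "$size" 100000000 fedora.iso, and the parts would then be joined with cat fedora.iso.part?? instead of part?.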

After all the files are downloaded, we need to concatenate them. This is quite simple.

$ cat fedora.iso.part? > fedora-9-live.iso
Your Live CD ISO image is now ready as fedora-9-live.iso.
To make sure the file has been downloaded and concatenated correctly, verify its checksum before burning it to a CD/DVD.
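
The checksum step could look something like the sketch below. It assumes that the download site publishes a SHA-1 digest for the image and that sha1sum from GNU coreutils is installed; if the published checksum is of another type, swap in the matching tool. The function name checksum_matches is made up for this example.

```shell
# checksum_matches FILE EXPECTED: succeed only if the SHA-1 digest of FILE
# equals EXPECTED. (Hypothetical helper; assumes GNU sha1sum is installed.)
checksum_matches() {
    # sha1sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha1sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}
```

Usage would be along the lines of checksum_matches fedora-9-live.iso "$expected" && echo "image is fine", where $expected holds the digest copied from the published checksum file.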

This process could be automated with a simple shell script if we could somehow get the file size before starting the download. To accomplish this, I once again went through the man page and found a useful option, -I/--head, which fetches the HTTP headers only. One can easily read the file size from the Content-Length header. But when I tried this through the proxy server I got X-Squid-Error: ERR_TOO_BIG, so I couldn't determine the file size, and hence I have been unable to automate the process. Can anybody help? (Please post it as a comment.)
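
For what it's worth, pulling the size out of the headers is the easy part once they do come through. The sketch below reads response headers on standard input and prints the total size. Besides Content-Length it also understands Content-Range, because a server that honours a one-byte request such as -r 0-0 reports the full size there ("bytes 0-0/SIZE"). Whether this particular Squid proxy would let such a request through is untested, so treat it as something to try rather than a fix. The function name size_from_headers is made up.

```shell
# size_from_headers: read HTTP response headers on stdin and print the total
# file size in bytes. Content-Range ("bytes 0-0/SIZE") wins over
# Content-Length, since on a ranged reply Content-Length is only the size of
# the piece. (Hypothetical helper for illustration.)
size_from_headers() {
    tr -d '\r' | awk '
        tolower($1) == "content-range:"  { n = split($NF, p, "/"); range = p[n] }
        tolower($1) == "content-length:" { len = $2 }
        END {
            if (range ~ /^[0-9]+$/)    print range
            else if (len ~ /^[0-9]+$/) print len
        }'
}

# Possible ways to feed it (not run here):
#   curl -sI "$url" | size_from_headers                            # HEAD request
#   curl -s -r 0-0 -D - -o /dev/null "$url" | size_from_headers    # 1-byte GET
```

In the second form, -D - writes the response headers to standard output and -o /dev/null throws away the single byte of body.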

Happy Downloading !!