Yesterday I had to download a 232 MB file. Since I connect to the Internet through a proxy and the authorities have capped downloads at 150 MB per file, I had to find an alternative. Most educational institutions have similar restrictions.
While I was searching for a solution, somebody told me about JGet (a multithreaded Java program), but it hogs a lot of system resources. I kept looking, and then somebody on ##linux at irc.freenode.net told me about curl. I had heard of it before and had used it to automate web browsing, such as posting form data. But this time, when I went through the curl man page, I realized how powerful curl is.
curl has a great many options. I will demonstrate just one, which can help you download files of almost any size.
Let's start off…
In this example I am downloading the 32-bit Fedora Live CD ISO image. The idea is to download one part of the file per connection and, once all the parts have arrived, concatenate them. This is what almost every download manager does.
To download part of a file, curl provides the -r/--range option, which lets us specify the range of bytes to download.
A range can be specified in the following ways:
curl -r 0-499
This downloads the first 500 bytes.
curl -r 500-999
This downloads bytes 500 through 999.
curl -r -500
This downloads the last 500 bytes.
curl -r 500-
This downloads everything from offset 500 to the end of the file.
There are a few more forms; check the man page. (I have copied only the simple ones here.)
Now, the Fedora Live CD ISO I want is 691 MB, so I will download 100 MB per connection, like this:
curl -# -r 0-99999999 -o fedora.iso.part1 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &
curl -# -r 100000000-199999999 -o fedora.iso.part2 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &
curl -# -r 200000000-299999999 -o fedora.iso.part3 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &
curl -# -r 300000000-399999999 -o fedora.iso.part4 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &
curl -# -r 400000000-499999999 -o fedora.iso.part5 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &
curl -# -r 500000000-599999999 -o fedora.iso.part6 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &
curl -# -r 600000000- -o fedora.iso.part7 http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso &
In the above commands, -# makes curl show a simple progress bar instead of its detailed progress meter. The option is purely cosmetic, so you can leave it out.
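The seven commands differ only in the byte range and the part number, so they could be generated in a loop. The following is a minimal sketch, not something I have tested behind a proxy: print_ranges is a hypothetical helper name, and the total size (691000000) is only the approximate figure mentioned earlier, used just to decide how many parts there are. The last range is left open-ended, so the exact size does not matter.

```shell
#!/bin/sh
# print_ranges TOTAL CHUNK (hypothetical helper)
# Prints one byte range per line for a TOTAL-byte file split into
# CHUNK-byte parts. The final range is open-ended ("start-") so it
# always runs to the end of the file.
print_ranges() {
    total=$1
    chunk=$2
    start=0
    while [ "$start" -lt "$total" ]; do
        end=$((start + chunk - 1))
        if [ "$end" -ge $((total - 1)) ]; then
            echo "${start}-"
        else
            echo "${start}-${end}"
        fi
        start=$((start + chunk))
    done
}

# Example usage (the size is an approximation, see above):
# url=http://download.fedoraproject.org/pub/fedora/linux/releases/9/Live/i686/Fedora-9-i686-Live.iso
# n=1
# for r in $(print_ranges 691000000 100000000); do
#     curl -# -r "$r" -o "fedora.iso.part$n" "$url" &
#     n=$((n + 1))
# done
# wait    # block until every background curl has finished
```

The trailing wait matters when the commands run from a script: without it the script would return while the downloads are still in progress.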
After all the files are downloaded, we need to concatenate them. This is quite simple.
$ cat fedora.iso.part? > fedora-9-live.iso
And your Live CD ISO image is ready as fedora-9-live.iso.
To make sure the file was downloaded and concatenated correctly, verify its checksum before burning it to a CD/DVD.
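One caveat about the concatenation step: the part? glob matches exactly one character after "part", so it works for up to nine parts, but a part10 would be silently skipped. A small sketch of joining by number instead is shown here; join_parts is a hypothetical helper name, and the checksum command in the comments assumes the mirror publishes a checksum file next to the ISO.

```shell
#!/bin/sh
# join_parts PREFIX COUNT OUTPUT (hypothetical helper)
# Concatenates PREFIX1, PREFIX2, ... PREFIXCOUNT into OUTPUT in
# numeric order, stopping with an error if any part is missing.
join_parts() {
    prefix=$1
    count=$2
    out=$3
    : > "$out"                      # create or truncate the output file
    i=1
    while [ "$i" -le "$count" ]; do
        cat "${prefix}${i}" >> "$out" || return 1
        i=$((i + 1))
    done
}

# Example usage:
# join_parts fedora.iso.part 7 fedora-9-live.iso
#
# Then compare the result with the published checksum, using whichever
# tool matches the checksum file the mirror provides, for example:
# sha1sum fedora-9-live.iso
```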
This process could be automated with a simple shell script if we could get the file size before starting the download. To find a way, I went through the man page once again and found a useful option: -I/--head. This option fetches only the HTTP headers, and the file size is given there as Content-Length. But when I tried this through the proxy server I got X-Squid-Error: ERR_TOO_BIG, so I couldn't determine the file size, and hence I haven't been able to automate the process. Can anybody help? (Please post it as a comment.)
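For reference, here is a sketch of how the size could be pulled out of the headers where they do come through. Both function names are hypothetical. The second one relies on an assumption I have not verified against this proxy: that a request for a single byte (-r 0-0) is small enough to pass, in which case a server that supports ranges answers with a header of the form Content-Range: bytes 0-0/TOTAL.

```shell
#!/bin/sh
# content_length (hypothetical helper)
# Reads HTTP response headers on stdin and prints the last
# Content-Length value seen (the last one matters after redirects).
content_length() {
    tr -d '\r' | awk -F': *' '
        tolower($1) == "content-length" { len = $2 }
        END { if (len != "") print len }'
}

# content_range_total (hypothetical helper)
# Reads headers on stdin and prints TOTAL from a header such as
# "Content-Range: bytes 0-0/TOTAL".
content_range_total() {
    tr -d '\r' | awk -F'/' '
        tolower($0) ~ /^content-range:/ { total = $2 }
        END { if (total != "") print total }'
}

# Example usage (-L follows the mirror redirect, -D - prints headers):
# curl -sIL "$url" | content_length
# curl -sL -r 0-0 -D - -o /dev/null "$url" | content_range_total
```

If neither request gets past the proxy, the size can still be worked around by leaving the last range open-ended, as in the manual commands.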
Happy Downloading !!
Nice article there! But I have a somewhat unrelated question: how do you connect to IRC chats from within IITG? I’ve tried many times, with many hacks to the config files, and none of them work. I’m using Ubuntu Linux, and the chat clients I’ve tried are X-Chat, Konversation, and Pidgin, all to no avail. Please help if you can!
@abcdefg
use firefox addon chatzilla
Sexy solution for those using *nix! Keep it up, ye Geek Warrior!
Man, this is awesome! I loved this one and I intend to carry on with it. I have the same problem: a badly behaved proxy configuration. I tried to download a range beyond the end of the file, and curl does not write the file. So if you check whether the file exists, you’ll be able to tell when the download has finished.
I’ve made some tests, and there’s no problem with curl when you try to download a block larger than the specified range.
Hi!
I’ve made an automated script to download it. It is very simple, but it is useful and I hope it’s reliable. I’m testing it by downloading some Fedora 10 ISOs.
Here it is:
http://pastie.org/283677
Oh, and yes, you have to define the name of the downloaded file. I don’t know a thing about string manipulation.
@Anderson
Thanks.
If you wish I can include your script in my blog.
Feel free to do it. But if you can make some kind of optimization or correction on the script, I will be grateful.
Hi!
Here’s a better version, improved, supports resuming:
http://pastie.org/284370
What about BitTorrent? (where available – distro images should be no problem)
need help!!!!!
When I used curl to download, I just got a file of about 1 KB. I want to download the BackTrack OS, which is about 700 MB and is on an FTP site. Will curl work with FTP sites?
Why don’t you just use DownThemAll in Firefox? It would do the same thing for you albeit automatically.
Hi there!
I have the same problem with the proxy, but my download limit is set to 10 MB!
And my problem is the Update Manager; I’m using Ubuntu 9.04.
Thanks for your help.
Hmm, interesting. I have tried to implement this method. It works from the 2nd chunk to the end, but it fails for the first chunk,
specifically for the first 2 bytes (bytes 0-1). I’m still thinking about how to download the first 2 bytes.
arggghhh…….
@tito
Download the first 2 bytes separately.
We cannot download even the first byte on its own, as it is blocked by Squid. How can we download them separately?
Hi Anderson, I found your script and downloaded it, but I am a complete beginner at scripting. Can you please tell me how to use your script in a terminal window, the way we use curl to download a file, i.e.
curl -r 0-149999999 -o filename.avi.part1 http://www.abc.com &