If your job involves downloading many files every day, doing it by hand quickly becomes tedious and will eventually wear you down. A more efficient approach is to automate the work, and Wget is the way to go.
In this article, I will show you different ways to download files, including many files with a single command, using Wget.
Without further ado, let us get into it!
Prerequisites For Using Wget
Since this tutorial is a hands-on guide, what better place to start than the prerequisites?
Here they are:
- VS Code (Visual Studio Code): In this guide, I am using the 64-bit build, version 1.58.2.
- Python: For this guide, I am using Python version 3.9.6.
- A Windows computer: The steps should work on Windows 7 and 8.1, but I am using Windows 10.
How to Download and Install Wget on Windows
First things first: Wget is a command-line tool for downloading files from the internet. It ships with many Unix-like operating systems, and a Windows build is also available. The most recent Wget version for Windows is 1.21.1 at the time of writing.
Here is the process for downloading and installing Wget on Windows.
1. Download the Wget executable for Windows (wget.exe).
2. After the download, locate the downloaded executable and copy it to C:\Windows\System32. This folder is already listed in the PATH environment variable, so Windows will be able to find wget.exe.
The PATH environment variable defines a collection of folders that are searched for commands and executable applications. Placing wget.exe in one of these folders means that you can run the wget command from any working directory in the command prompt.
Finally, confirm from the command prompt which version of Wget you have installed. Simply run the command wget --version.
Wget has been successfully installed on your computer if you see an output similar to the one shown in the image below.
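If you would like to run the same check from Python, here is a minimal sketch. It assumes only the standard library; the helper name wget_version is my own and is not part of Wget.

```python
import shutil
import subprocess


def wget_version(name="wget"):
    """Return the first line of `<name> --version`, or None if it cannot be determined."""
    path = shutil.which(name)  # searches the folders listed in PATH
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


print(wget_version() or "wget was not found on PATH or did not report a version")
```

If the function returns None, double-check that wget.exe sits in a folder that is listed in PATH.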
How to Download a File Directly from a URL
Now that you’ve installed Wget, let’s get started with basic wget commands. You may want to download a file from a particular URL. In that case, all you need is the basic wget command structure and the URL of the file you want to download.
The basic structure of a wget command is shown below: any options come after the wget command, followed by the URL.
wget [options] url
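The same structure can be mirrored in a Python script that calls the wget executable. This is only a sketch: the helper names are mine, and the example.com URL and the downloads folder are placeholders rather than anything from this guide.

```python
import subprocess


def build_wget_command(url, *options):
    """Return the argument list for `wget [options] url`."""
    return ["wget", *options, url]


def run_wget(url, *options):
    """Run wget (it must be on PATH) and return its exit code; 0 means success."""
    return subprocess.run(build_wget_command(url, *options)).returncode


# Placeholder URL and folder, used only to show the shape of the command.
command = build_wget_command(
    "https://example.com/file.zip", "--directory-prefix=downloads"
)
print(" ".join(command))
```

Passing the arguments as a list, rather than as one long string, avoids quoting problems when a path contains spaces.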
How to Download a File From a URL to a Working Directory
With the wget command structure still fresh in your mind, let’s see how to download a file to the working directory without any additional options.
Run the following command to download the wget.exe file to the working directory:
wget https://eternallybored.org/misc/wget/1.21.1/64/wget.exe
The file has been successfully downloaded when you see an output similar to the one in the image below on your command prompt.
How to Download a File From a URL to a Specific File Path
You have now downloaded a file from a URL to the working directory, but what if you want to save it to a specific file path? If that’s the case, use the command below to provide the download location.
What we need is the normal wget command with an additional --directory-prefix option (short form -P) that names the folder where the downloaded file should be saved.
For this example, we will save the file on the desktop (C:\Users\afamo\Desktop):
wget --directory-prefix=C:\Users\afamo\Desktop https://eternallybored.org/misc/wget/1.21.1/64/wget.exe
Once the download is done, verify it by opening File Explorer, navigating to the location (C:\Users\afamo\Desktop), and confirming that the file is there.
How to Download a File From a URL and Rename it
It’s cool enough that you can download a file to your desired folder with a single command. However, it’s possible that you’d like to download a file and save it with a different name.
If that’s the case, the -O option (uppercase, short for --output-document) is the solution! Using the -O option, you can choose the name under which the downloaded file is saved. Note that the lowercase -o option does something different: it writes Wget's log messages to a file.
To download the wget.exe file from a given URL, use the basic wget command syntax as shown below, this time adding the -O option to rename the file. Here you’re saving the file as second_wget.exe instead of wget.exe.
wget -O second_wget.exe https://eternallybored.org/misc/wget/1.21.1/64/wget.exe
The downloaded file is named second_wget.exe, as you can see in File Explorer below.
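To see why a rename is sometimes needed, it helps to know where the default name comes from: it is taken from the last part of the URL path. The sketch below is a simplified approximation of that rule, not Wget's actual code, and the function names are my own.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def default_filename(url):
    """Approximate the name a download gets when no output name is given."""
    path = unquote(urlsplit(url).path)
    # A URL that ends in "/" has no file part; Wget falls back to index.html.
    return posixpath.basename(path) or "index.html"


def output_filename(url, rename_to=None):
    """Use the explicit name when one is supplied, otherwise the default."""
    return rename_to if rename_to else default_filename(url)


url = "https://eternallybored.org/misc/wget/1.21.1/64/wget.exe"
print(output_filename(url))
print(output_filename(url, rename_to="second_wget.exe"))
```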
How to Download a File’s Newer Version
Perhaps you’d like to update a file you’ve already downloaded to a newer release. If that’s the case, use the --timestamp option (an accepted abbreviation of --timestamping, short form -N) in the wget command. Files on websites are often revised, and with this option Wget skips the download when your local copy is already up to date.
In the wget command below, --timestamp checks for a newer version of the wget.exe file and, if there is one, downloads it to C:\Users\afamo\Desktop.
wget --timestamp --directory-prefix=C:\Users\afamo\Desktop https://eternallybored.org/misc/wget/1.21.1/64/wget.exe
The outcome will be identical to the previous examples if the file (wget.exe) on the server has changed since you downloaded it. If not, have a look at the screenshot below. Take note of the Not Modified response, which indicates that the file hasn’t been updated since your last download.
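The idea behind the timestamp check can be sketched in a few lines of Python. This is a simplified model that compares only dates (the real Wget logic also takes the file size into account); the function name and the sample Last-Modified value are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def server_copy_is_newer(last_modified_header, local_mtime):
    """Compare an HTTP Last-Modified header with a local modification time.

    local_mtime is a POSIX timestamp, such as os.path.getmtime() returns.
    """
    remote_time = parsedate_to_datetime(last_modified_header)
    local_time = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote_time > local_time


header = "Wed, 21 Oct 2015 07:28:00 GMT"  # sample header value, not real server output
saved_in_2014 = datetime(2014, 1, 1, tzinfo=timezone.utc).timestamp()
saved_in_2016 = datetime(2016, 1, 1, tzinfo=timezone.utc).timestamp()

print(server_copy_is_newer(header, saved_in_2014))  # True: the server copy is newer
print(server_copy_is_newer(header, saved_in_2016))  # False: the local copy is up to date
```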
How to Download a File When a Website Requires a Username and Password
To view or download certain items and files, many websites require a user to be signed in. Wget provides the --user and --password options for this. With these options, Wget sends your login details (username and password) to authenticate the request when the server asks for HTTP or FTP authentication.
The basic structure of this command is shown below.
wget --user=yourusername --password=yourpassword https://downloads.mongodb.com/compass/mongodb-compass-1.28.1-win32-x64.zip
Replace yourusername and yourpassword with your credentials. If you would rather not type the password on the command line, use --ask-password (without a value) in place of --password and Wget will prompt you for it.
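To get a feel for what is sent to the server, here is a small sketch of the HTTP Basic scheme, one of the authentication schemes Wget supports. The function name is mine, and "user" and "pass" are dummy credentials.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header used by HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Dummy credentials for illustration only.
print(basic_auth_header("user", "pass"))
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password. Prefer https:// URLs whenever you send credentials.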
How to Download a Web Page
Perhaps you’re trying to save a local copy of a web page rather than a file. In that case, you’ll use a command similar to the one for downloading a file, but with a few more options.
Use the wget command shown below to download the domain.com home page, along with the pages it links to on the same site, and save everything in a directory named domain.com.
wget -r http://domain.com/ -o log
The -o log option writes Wget's output to a file named log in the working directory instead of displaying it on the console.
The image below depicts a copy of the log file that has been created.
It is also possible to combine several options in one command. Short options that take no value can be written separately (-d -r -c) or together (-drc). Here -d turns on debug output, -r downloads recursively, and -c continues a partially downloaded file.
wget -d -r -c http://domain.com/ -o log
wget -drc http://domain.com/ -o log
The first command declares the options in the standard way and the second uses the combined form.
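The equivalence between the two forms can be shown with a tiny helper. It is a sketch that only handles short options without values, which is the case discussed here; the function name is my own.

```python
def expand_short_options(argument):
    """Split a combined short-option group such as "-drc" into separate flags."""
    is_short_group = (
        argument.startswith("-")
        and not argument.startswith("--")
        and len(argument) > 2
    )
    if is_short_group:
        return [f"-{flag}" for flag in argument[1:]]
    return [argument]  # long options and single flags are left untouched


print(expand_short_options("-drc"))
print(expand_short_options("--mirror"))
```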
How to Download a Complete Website Using Wget
You might wish to download a complete website instead of just a single web page, for example to understand how it is constructed. To accomplish this, set up the wget command as follows:
- Mirror the website (www.technicalustad.com) using --mirror, and add -p (--page-requisites) so that all of the files each page needs, such as images, scripts, and stylesheets, are included in the download.
- Add the -P option to set the location for the download (./local-dir).
- Finally, include the --convert-links option so that links in the downloaded pages are rewritten to point to your local copies, which lets you browse the mirror offline. Many websites link to resources on external websites; by default, a recursive Wget download stays on the starting host and does not follow those links to other sites.
wget --mirror -p --convert-links -P ./local-dir https://www.technicalustad.com/
At the end of the download, the output on your console should look a bit like the one displayed below.
Wget will download every file that makes up the website. You will find them in the local-dir folder.
You can adjust the command to pause between retrievals using the --wait option, which takes a number of seconds. You can also cap the download speed using the --limit-rate option.
wget --mirror -p --convert-links -P ./local-dir --wait=25 --limit-rate=60K https://www.technicalustad.com/
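Before settling on values, it is worth estimating how long a throttled mirror will take. The sketch below is a rough lower bound under my own assumptions: the K and M suffixes are treated as multiples of 1024 bytes, there is one pause between consecutive files, and the file count and site size in the example are invented.

```python
def parse_rate(rate):
    """Convert a rate such as "60K" to bytes per second (assumes K=1024, M=1024*1024)."""
    units = {"k": 1024, "m": 1024 * 1024}
    suffix = rate[-1].lower()
    if suffix in units:
        return float(rate[:-1]) * units[suffix]
    return float(rate)


def minimum_mirror_seconds(file_count, total_bytes, wait_seconds, rate):
    """Rough lower bound: the pauses between files plus the throttled transfer time."""
    waiting = max(file_count - 1, 0) * wait_seconds
    transferring = total_bytes / parse_rate(rate)
    return waiting + transferring


# Invented example: 100 files totalling 50 MiB with a 25-second wait and a 60K limit.
seconds = minimum_mirror_seconds(100, 50 * 1024 * 1024, 25, "60K")
print(f"At least {seconds / 60:.0f} minutes")
```

With a 25-second wait, the pauses dominate for sites with many small files, so a shorter wait may be enough if you only want to be polite to the server.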
How to Download Files from Different URLs with a Single Command
Manually downloading files one at a time each day, as you did in the previous cases, is obviously a time-consuming job. Wget allows you to get files from many URLs with a single command and a single text file; it works through the list one URL after another.
Does this sound like a decent bargain to you? Let’s get started!
Open any text editor and type in the URLs of the files you want to download, one per line, as shown below.
Next, run the command below to download the files listed in your text file. Remember to replace txtname with the name of the file where you saved the URLs.
wget -i txtname
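If the list of URLs comes from a program rather than from a text editor, you can generate the file with Python. This is a sketch: the function name and the urls.txt file name are my own choices, and the example writes into a temporary folder so that it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def write_url_list(urls, list_path):
    """Write one URL per line and return the matching `wget -i` command."""
    cleaned = [url.strip() for url in urls if url.strip()]  # drop blank entries
    Path(list_path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return ["wget", "-i", str(list_path)]


URLS = [
    "https://eternallybored.org/misc/wget/1.21.1/64/wget.exe",
    "https://downloads.mongodb.com/compass/mongodb-compass-1.28.1-win32-x64.zip",
]

with tempfile.TemporaryDirectory() as folder:
    list_file = Path(folder) / "urls.txt"  # hypothetical file name
    command = write_url_list(URLS, list_file)
    print(list_file.read_text(encoding="utf-8"), end="")
    print(" ".join(command))
```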
How to Resume an Interrupted Download
By now you know how to use the wget command to download files. However, a download may be interrupted midway. What would you do in this situation? Another useful feature of Wget is the ability to resume an interrupted download.
Here’s an example of a download that was halted because the internet connection was lost.
When the connection returns, Wget retries and the download resumes on its own. However, how would you complete the download if the command prompt crashed or your PC rebooted? In such cases, you need the --continue option (short form -c), which picks up a partially downloaded file where it left off instead of starting over.
Run the wget command below from the folder that holds the partial file to continue the interrupted download.
wget --continue https://yourarticletoday.com/snagit/snagit.exe
There is another way to handle interruptions: have Wget retry a set number of times when a download fails.
For this, use the --tries option. In the example below, I have set Wget to try up to 12 times to complete the download. You can test this by interrupting your connection a few times during the download and watching how Wget retries.
wget --tries=12 https://www.google.com/images/moving/googleslogo/1x/googleslogo_color_292x92dp.png
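How resuming and retrying fit together can be illustrated with a small simulation. This is a conceptual model, not Wget's implementation: the "server" is a fake function that drops the connection twice, and every name in the code is my own.

```python
def download_with_retries(open_stream, tries=12):
    """Collect a download, resuming after each failure from the bytes already received.

    open_stream(offset) must return an iterable of byte chunks starting at offset.
    Returns the data and the number of attempts that were needed.
    """
    received = bytearray()
    for attempt in range(1, tries + 1):
        try:
            for chunk in open_stream(len(received)):
                received.extend(chunk)
            return bytes(received), attempt
        except ConnectionError:
            continue  # keep what we have and ask again from the new offset
    raise ConnectionError(f"gave up after {tries} tries with {len(received)} bytes received")


# Simulated server: the connection drops during the first two attempts.
PAYLOAD = b"0123456789"
offsets_requested = []


def flaky_stream(offset):
    offsets_requested.append(offset)
    yield PAYLOAD[offset:offset + 4]
    if len(offsets_requested) < 3:
        raise ConnectionError("connection lost")
    yield PAYLOAD[offset + 4:]


data, attempts = download_with_retries(flaky_stream)
print(f"{len(data)} bytes after {attempts} attempts, offsets requested: {offsets_requested}")
```

Each new attempt asks only for the bytes that are still missing, which is the same idea behind --continue, while the tries argument plays the role of --tries.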
How to Create a Python Script for File Download
So far, I have shown how to download files by running commands, but did you know you can also write a script to download things for you? Let’s get started with some Python code.
1. Make a folder called “downloader.”
2. Open VS Code, then select File > Open Folder and choose the downloader folder you have just created.
3. In the project directory, select the “new file” button to create a new Python script file named app.py.
4. Select New Terminal from the Terminal menu to start a new command-line terminal.
How to Install and Activate Virtual Environment
Now that we have created a project folder and script file, let’s set up a virtual environment. A virtual environment (VE) is an isolated workspace for a Python project in which all of the packages it needs are installed. You will activate this virtual environment each time you work on or run the project.
To install the virtualenv package and create a virtual environment named download, run the following commands in your VS Code terminal.
pip install virtualenv
virtualenv download
On Windows, run this command to activate your VE: download\Scripts\activate
On Unix/Linux, run this command to activate your VE: source download/bin/activate
How to Install the Python wget Module
Now that you’ve set up your virtual environment, let us install the wget module. This package provides a small Python API, most notably wget.download(), so that you can download files from your own Python code. It is a pure-Python download utility and does not require the wget executable.
When creating a Python project, it is good practice to record its dependencies in a requirements.txt file. This file lets you install the same versions of the packages later or on another machine.
To install the Python wget module and record it in the requirements.txt file, use the following commands.
pip install wget # Install the wget module
pip freeze > requirements.txt # Add wget to requirements.txt
The first command installs the wget module, while the second writes the installed packages, including wget, into the newly created requirements.txt file.
Now copy the following code into the app.py file you prepared in VS Code.
The code customizes the file download output so that you can see the progress of each file download using a custom progress bar.
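Here is a minimal sketch of what such an app.py could look like. It assumes the wget package from PyPI, whose download() function accepts a bar callback that receives the current size, the total size, and a width; the names bar_custom and download_all are my own, and this version takes the URLs from the command line.

```python
import sys

try:
    import wget  # third-party package: pip install wget
except ImportError:
    wget = None


def bar_custom(current, total, width=80):
    """Return one line of progress text; the wget module prints what the bar returns."""
    percent = current / total * 100 if total > 0 else 0
    return f"Downloading: {percent:.0f}% [{current} / {total}] bytes"


def download_all(urls):
    """Download each URL into the working directory and return the saved file names."""
    if wget is None:
        raise RuntimeError("The wget module is not installed; run: pip install wget")
    saved = []
    for url in urls:
        saved.append(wget.download(url, bar=bar_custom))
        print()  # start a new line after each file's progress output
    return saved


if __name__ == "__main__" and len(sys.argv) > 1:
    download_all(sys.argv[1:])
```

With this sketch you would pass one or more URLs after the script name when you run it. The width argument is accepted because the module passes it to the callback, even though this bar does not use it.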
Finally, run app.py from the VS Code terminal using the python command.
Each file’s download status is shown in percentages, along with the file’s total and currently downloaded sizes in bytes.
In this tutorial, you’ve learned how to use the wget command to download files. You’ve also used the wget module in Python code to download several files.
How would you use Wget to download files automatically in your future projects? Could you set up a download job that runs on a regular schedule?