Introduction
HTTrack is a website copier that downloads pages from the web so you can browse them as static files. In this guide, I am going to walk you through its advanced configuration on Ubuntu 20.04 LTS and show you how to use its settings to extract a particular page for development purposes. A local copy helps web developers keep a clean ecosystem around their web applications and track down front-end problems.
Installing HTTrack
If HTTrack is not installed yet, open a terminal and run the following command.
$ sudo apt install httrack webhttrack
On Linux, HTTrack comes as a command-line tool (httrack) and a browser-based interface (webhttrack); the command above installs both. Windows users get a standalone GUI (WinHTTrack), but that does not apply to us.
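To confirm that the installation worked, a quick check like the one below can help. It is only a sketch: report_cmd is a helper name I made up for this guide, and it simply asks the shell whether each command is on the PATH.

```shell
# report_cmd is a hypothetical helper: it prints whether a command
# can be found on the PATH.
report_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: installed"
  else
    echo "$1: missing"
  fi
}

# httrack is the command-line tool; webhttrack starts the browser interface.
for cmd in httrack webhttrack; do
  report_cmd "$cmd"
done
```

If either line says "missing", run the apt command again and check its output for errors.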
Running HTTrack
Once it is installed, start HTTrack from the terminal. The steps in this guide use the browser-based interface, which the webhttrack command opens.
When you run it, it will look something like this:
Now is the time to work with the advanced configurations of HTTrack.
Configure HTTrack on Ubuntu
Step 1. Select a language
HTTrack first prompts you to select a language. If English is already selected, leave it as it is. Otherwise, pick the language you want and move ahead.
Step 2. Enter project details
Now I am going to enter the project details. I am using LinuxWays.Net as the example, as shown below.
Step 3. Select Action and Add URLs
Now I am going to select an action from the given list and add URLs as shown above. Which action to pick depends on what I want to achieve. Here is how the actions differ from one another.
Download web site(s): This option copies a full website with the default options so that you can browse it locally.
Download web site(s) + questions: This action does the same as the previous one, but HTTrack asks you what to do whenever it is unsure whether a link should be downloaded.
Get individual files: This downloads only the files you specify (for example a single .css or .html file) and does not follow links to other pages.
Download all sites in pages (multiple mirrors): This downloads every site that is linked from the pages you enter.
Test links in pages (bookmark test): This action tests all the links found on the given pages, which is useful for checking a bookmark file.
The remaining two actions work on an existing project: one continues an interrupted download, and the other updates a mirror that was downloaded earlier.
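If you prefer the httrack command-line tool, each action in the list above has a rough counterpart among its options. The sketch below shows the mapping as I read it from the option summary printed by httrack --help, so treat it as an assumption and check it against your installed version. The function name action_option is my own.

```shell
# action_option is a hypothetical helper: it prints the httrack option
# that roughly corresponds to a WebHTTrack action name.
# Assumption: the mapping follows the summary shown by `httrack --help`.
action_option() {
  case "$1" in
    "Download web site(s)")
      echo "--mirror" ;;
    "Download web site(s) + questions")
      echo "-W" ;;
    "Get individual files")
      echo "--get" ;;
    "Download all sites in pages (multiple mirrors)")
      echo "--mirrorlinks" ;;
    "Test links in pages (bookmark test)")
      echo "--testlinks" ;;
    "Continue interrupted download")
      echo "--continue" ;;
    "Update existing download")
      echo "--update" ;;
    *)
      echo "unknown action: $1" >&2
      return 1 ;;
  esac
}

action_option "Get individual files"   # prints --get
```

The option goes in front of the URL, for example httrack --mirror followed by the address of the site.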
Step 4. My Test Case
In my test case, I am going to select Get individual files. Here is how it looks now.
The URL I will enter here is http://linuxways.net.
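For reference, the same test case can be sketched with the command-line tool. This is a minimal example under a few assumptions: the target directory is a made-up location, fetch_files is a helper name of my own, and I rely on the httrack help text, which says that files fetched with --get are saved in the current directory.

```shell
url="http://linuxways.net"
target_dir="$HOME/httrack-files/linuxways"   # hypothetical location

# fetch_files is a hypothetical helper: $1 is the URL, $2 the directory.
# It runs httrack inside the directory (in a subshell, so your own
# working directory does not change) because --get is documented to
# save files in the current directory.
fetch_files() {
  mkdir -p "$2" || return 1
  ( cd "$2" && httrack --get "$1" )
}

# Nothing is downloaded yet; this only shows how to call the helper.
echo "Run: fetch_files \"$url\" \"$target_dir\""
```

Calling fetch_files with the two variables starts the actual download, so run it only when you are ready to contact the site.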
Step 5. Enter URL
Now enter the URL and, if the site requires a login, the credentials. Add the required details as shown below.
Step 6. Add Settings
Click OK to confirm, then set any options you need, as shown below.
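The settings you pick here depend on your project, so I cannot give one correct set. As an illustration only, the sketch below shows how a few common limits could be passed to the httrack command-line tool. All values are hypothetical, settings_options is my own helper name, and the option letters are taken from the httrack help text as I understand it.

```shell
# Hypothetical values; adjust them to your own project.
depth=2          # -rN: mirror depth
connections=4    # -cN: number of simultaneous connections
max_rate=50000   # -AN: maximum transfer rate in bytes per second
robots=2         # -sN: robots.txt handling (2 = always follow)

# settings_options is a hypothetical helper that joins the options.
settings_options() {
  echo "-r$depth -c$connections -A$max_rate -s$robots"
}

# Print the full command instead of running it.
# -O sets the directory where the mirror is stored (hypothetical path).
echo "httrack http://linuxways.net -O ./linuxways $(settings_options)"
```

The script only prints the command, so you can review it before running it yourself.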
Step 7. Last Step – Get Ready to Mirror
In this step, I am ready to mirror my selected website. However, for the test case, I will save the settings and exit.
Conclusion
In this article, I walked you through the main HTTrack settings. You are now ready to mirror a website using HTTrack on Ubuntu 20.04. If you run into any issue, do not hesitate to reach out to us.