
Understanding HTTrack Advanced Configurations on Ubuntu


Introduction

HTTrack is an offline browser utility that copies websites to a local directory. In this guide, I am going to walk you through its advanced configurations on Ubuntu 20.04 LTS. I will show you how to use various HTTrack settings to extract a particular page for development purposes. A local copy lets web developers inspect the static output of their web applications and troubleshoot front-end problems offline.

Installing HTTrack

If you haven’t installed HTTrack yet, open a terminal and run the following command.

$ sudo apt install httrack webhttrack
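To confirm that both programs ended up on the PATH after installation, a small check like the one below can help. It is only a sketch: check_tool is a helper name I made up for this example, and it relies on nothing beyond the shell's built-in command lookup.

```shell
#!/bin/sh
# Report whether a program is available on the PATH.
# check_tool is a helper name invented for this sketch.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool httrack
check_tool webhttrack
```

If either line reports "missing", re-run the install command above before moving on.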

On Linux, HTTrack comes in two forms: the httrack command-line program and WebHTTrack, a browser-based interface. The native desktop interface, WinHTTrack, is available on Windows only, so it is not an option for us.

Running HTTrack

Once installed, you launch HTTrack from the command line. The webhttrack command opens the browser-based interface that the steps below follow, and the httrack command starts the command-line version.

Now is the time to work with the advanced configurations of HTTrack.

Configure HTTrack on Ubuntu

Step 1. Select a language

HTTrack prompts you to select a language first. If English is already selected and suits you, leave it as it is. Otherwise, select an appropriate language and move ahead.

Step 2. Enter project details

Now I am going to add the project details: a project name, a category, and the base path where the mirror will be stored. I am using LinuxWays.Net as the example site.
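On the command line, the project details boil down to an output directory passed with the -O option. The sketch below assumes the mirror lives in a directory named after the project inside the base path; project_dir, the ~/websites base path, and the linuxways project name are all illustrative choices of mine, not values taken from HTTrack.

```shell
#!/bin/sh
# Join a base path and a project name into a mirror directory.
# project_dir is a hypothetical helper, not part of HTTrack.
project_dir() {
    base=${1%/}          # drop one trailing slash, if any
    printf '%s/%s\n' "$base" "$2"
}

dir=$(project_dir "$HOME/websites" "linuxways")
echo "Mirror directory: $dir"

# Uncomment to start a mirror into that directory (-O sets the output path):
# httrack "http://linuxways.net" -O "$dir"
```

The httrack line is left commented out so that the snippet only prints the directory until you are ready to download.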

Step 3. Select Action and Add URLs

Next, I select an action from the list and add the URLs. Which action to pick depends on what I want to achieve. Here is how the actions differ from one another.

Download web site(s): This option copies a full website so that you can browse it locally.

Download web site(s) + questions: This action does the same as the previous one, but it asks you for confirmation whenever it finds a link that it is not sure it should download.

Get individual files: This downloads only the files at the URLs you enter, such as a single .css or .html file, without following the links inside them.

Download all sites in pages (multiple mirrors): This mirrors every site that is linked from the pages you enter. For example, point it at a bookmarks page to mirror all of the bookmarked sites.

Test links in pages (bookmark test): This checks the links found on the pages you enter and reports the broken ones, without building a mirror.

The remaining two actions work on an existing project: one continues an interrupted download, and the other updates an existing download.
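Each of the actions above has a counterpart in the httrack command-line program. The sketch below maps them to the shortcut options listed by httrack --help on the 3.x releases I have seen; treat the mapping as an assumption and check it against your installed version. The short keys (mirror, questions, files, and so on) and the action_option name are labels invented for this example.

```shell
#!/bin/sh
# Map an action from the list above to an httrack option.
# The keys (mirror, questions, ...) are labels invented for this sketch.
action_option() {
    case "$1" in
        mirror)    echo "--mirror" ;;       # Download web site(s)
        questions) echo "-W" ;;             # Download web site(s) + questions
        files)     echo "--get" ;;          # Get individual files
        links)     echo "--mirrorlinks" ;;  # Download all sites in pages
        test)      echo "--testlinks" ;;    # Test links in pages
        continue)  echo "--continue" ;;     # Continue interrupted download
        update)    echo "--update" ;;       # Update existing download
        *)         echo "unknown action: $1" >&2; return 1 ;;
    esac
}

# Print, rather than run, the command for "Get individual files".
echo httrack "$(action_option files)" "http://linuxways.net"
```

Printing the command first makes it easy to review before you paste it into a terminal and start a real download.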

Step 4. My Test Case

In my test case, I am going to select Get individual files. The URL I will enter is http://linuxways.net.

Step 5. Enter URL

Now enter the URL and, if the site requires a login, the credentials.

Step 6. Add Settings

Click OK to move on to the settings, and set any options you require.
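The article does not say which options to change, so the following is only an illustration of what such settings look like on the command line. It uses three httrack options: -rN for the maximum link depth, -cN for the number of simultaneous connections, and -AN for the maximum transfer rate in bytes per second. The build_mirror_cmd helper, the default values, and the output path are assumptions of mine.

```shell
#!/bin/sh
# Print an httrack command line with a few common limits applied.
# build_mirror_cmd and its default values are illustrative only.
build_mirror_cmd() {
    url=$1
    out=$2
    depth=${3:-2}        # -rN: maximum link depth
    conns=${4:-4}        # -cN: simultaneous connections
    rate=${5:-25000}     # -AN: maximum transfer rate in bytes per second
    # Note: the printed line is not quoted, so avoid paths with spaces.
    echo "httrack $url -O $out -r$depth -c$conns -A$rate"
}

build_mirror_cmd "http://linuxways.net" "$HOME/websites/linuxways"
```

Keeping the depth, connection count, and rate low is a polite starting point when testing against a site you do not own.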

Step 7. Last Step – Get Ready to Mirror

In this step, I am ready to mirror my selected website. However, for the test case, I will save the settings and exit.

Conclusion

In this article, I walked you through the main HTTrack settings. Now you are ready to mirror a website using HTTrack on Ubuntu 20.04. In case of any issue, do not hesitate to reach out to us.
