In this quick tutorial, we will build a Bash command that downloads a site’s favicon using HTTPie, a command-line HTTP client.

1 - Download a page’s HTML content

For this tutorial, I will use http://www.fipradio.fr/player as our test website. As it turns out, it is also my favorite radio station.

If you have never used HTTPie, have a quick look at my introduction to the tool.

With HTTPie, downloading the HTML is the easiest part of this tutorial:

$ http http://www.fipradio.fr/player

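The response headers and the full HTML will be printed to the console. When the output is piped to another command, as in the next steps, HTTPie prints only the response body.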

2 - Extract the favicon’s URL

I am by no means an expert in Bash and regular expressions, but here is a solution that does the job, inspired by this post and this one.

Grep the favicon’s tag

Here is the code:

$ grep -o 'href="http[^"]*ico"'

-o prints only the matching part of each line instead of the whole line
'href="http[^"]*ico"' matches text that starts with href="http, continues with any characters other than a double quote ([^"]*), and ends with ico"
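To see what grep -o keeps, here is a small offline check against a made-up line of HTML (the sample_html variable and the example.com URLs are hypothetical). Both links sit on the same line; because [^"]* cannot cross a double quote, a match that starts at the stylesheet’s href cannot stretch into the favicon’s, so only the favicon attribute is printed.

```shell
# Made-up page fragment: a stylesheet link and a favicon link on one line.
sample_html='<link rel="stylesheet" href="http://example.com/style.css"><link rel="icon" href="http://example.com/favicon.ico">'

# -o prints only the matched text; [^"]* stops each match at a double quote.
printf '%s\n' "$sample_html" | grep -o 'href="http[^"]*ico"'
# prints: href="http://example.com/favicon.ico"
```

With .* instead of [^"]*, the match would start at the stylesheet’s href and run all the way to favicon.ico", swallowing both tags.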


Let’s pipe the HTML to the grep command:

$ http http://www.fipradio.fr/player | grep -o 'href="http[^"]*ico"'
href="http://www.fipradio.fr/sites/all/themes/custom/fip/favicon.ico"

Sed the URL

Use the following sed command to extract the favicon’s URL from the matched attribute:

$ sed 's/href="\(.*\)"/\1/'

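s/pattern/replacement/ is sed’s substitute command: it replaces the text matching pattern with replacement
href="\(.*\)" is the pattern; the escaped parentheses \( and \) capture everything between href=" and the closing quote
\1 in the replacement refers to the characters captured by the escaped parentheses, so only the URL is kept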
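Here is the substitution on its own, applied to one line of grep output (the example.com URL is made up). The whole line matches the pattern, so replacing it with \1 leaves only the captured URL.

```shell
# One line of grep output, as produced by the previous step.
match='href="http://example.com/favicon.ico"'

# \(.*\) captures everything between href=" and the final quote;
# \1 writes that captured text back as the whole replacement.
printf '%s\n' "$match" | sed 's/href="\(.*\)"/\1/'
# prints: http://example.com/favicon.ico
```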


Piping it all together:

$ http http://www.fipradio.fr/player | grep -o 'href="http[^"]*ico"' | sed 's/href="\(.*\)"/\1/'
http://www.fipradio.fr/sites/all/themes/custom/fip/favicon.ico

Notes

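  • There will be quite a few cases where this script won’t work. The pattern does match https:// URLs, since they also start with http, but it misses relative links such as href="/favicon.ico", single-quoted attributes, and icons followed by a query string (for example favicon.ico?v=2, added to avoid caching issues). Also, if the page redirects (for example from http to https), HTTPie will not follow the redirect unless you add --follow.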
  • I wouldn’t have come up with these regular expressions out of the blue; it took quite a few Google searches before I found a solution I liked. This tutorial is not meant to be a detailed explanation of grep, sed or regular expressions.
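As a sketch of how the query-string case could be handled, the variant below switches to grep -E (POSIX extended regular expressions) and accepts an optional query string after .ico. The function name extract_ico_urls is hypothetical, and this version still misses relative and single-quoted hrefs.

```shell
# Hypothetical helper: reads HTML on stdin, prints one favicon URL per line.
# \.ico requires a literal ".ico"; (\?[^"]*)? optionally matches a query
# string such as ?v=2 before the closing quote.
extract_ico_urls() {
  grep -oE 'href="http[^"]*\.ico(\?[^"]*)?"' | sed 's/href="\(.*\)"/\1/'
}

printf '%s\n' '<link rel="icon" href="https://example.com/favicon.ico?v=2">' | extract_ico_urls
# prints: https://example.com/favicon.ico?v=2
```

Requiring .ico to be followed by either ? or the closing quote also avoids false matches such as icons.icon-set.css.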

3 - Download the icons

To download a file with HTTPie, use:

$ http -d url

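We will use a while loop with read to iterate through the list of favicon URLs:

$ { while read -r url ; do http --ignore-stdin -d "$url" ; done ; }

read -r keeps any backslashes in each line as-is, quoting "$url" stops the shell from splitting or globbing the URL, and --ignore-stdin stops HTTPie from reading the rest of the URL list (its standard input inside the loop) as a request body.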
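The loop can be tried offline by swapping the download for echo. The two example.com URLs below are made up; each line piped into the loop is read into url and handled by one pass of the loop body.

```shell
# Two made-up favicon URLs, one per line, as sed would print them.
urls='http://example.com/favicon.ico
http://example.org/favicon.ico'

# read -r reads one line per iteration into $url; echo stands in for
# the http -d call so nothing is downloaded.
printf '%s\n' "$urls" | while read -r url; do
  echo "would download: $url"
done
```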

And here is the final command:

$ http http://www.fipradio.fr/player | grep -o 'href="http[^"]*ico"' | sed 's/href="\(.*\)"/\1/' | { while read -r url ; do http --ignore-stdin -d "$url" ; done ; }

Press Enter and you will see a progress bar while favicon.ico downloads to the current directory.
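To reuse the command with other pages, it could be wrapped in a shell function. This is a sketch: download_favicons is a hypothetical name, the pipeline is the final command with the page URL turned into a parameter, and --ignore-stdin keeps each http call from reading piped input as a request body.

```shell
# Hypothetical wrapper around the final command; $1 is the page URL.
download_favicons() {
  local page_url=$1
  http --ignore-stdin "$page_url" \
    | grep -o 'href="http[^"]*ico"' \
    | sed 's/href="\(.*\)"/\1/' \
    | while read -r url; do
        # --download saves the icon to a file in the current directory
        http --ignore-stdin --download "$url"
      done
}
```

For example, download_favicons http://www.fipradio.fr/player. If a page answers with a redirect, adding --follow to the first http call should let HTTPie fetch the final page.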