In this quick tutorial, we will build a Bash command that downloads a site’s favicon using HTTPie, a command-line HTTP client.
1 - Download a page’s HTML content
For this tutorial, I will use http://www.fipradio.fr/player as our test website; as it happens, it is also my favorite radio station.
If you have never used HTTPie, have a quick look at my introduction to the tool.
With HTTPie, downloading the HTML is the easiest part of this tutorial:
$ http http://www.fipradio.fr/player
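The full HTML will be printed to the console. When the output is piped into another command instead, HTTPie prints only the response body, which is what the next steps rely on.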
2 - Extract the favicon’s URL
I am by no means an expert in Bash and regular expressions, but here is a solution that does the job, inspired by this post and this one.
Grep the favicon’s tag
Here is the code:
$ grep -o 'href="http[^"]*ico"'
- -o keeps only the matching part of the line instead of the whole line.
- 'href="http[^"]*ico"' matches href="http, then any run of characters other than a double quote, then ico", for example href="http://example.com/favicon.ico".
Let’s pipe the HTML to the grep expression:
$ http http://www.fipradio.fr/player | grep -o 'href="http[^"]*ico"'
href="http://www.fipradio.fr/sites/all/themes/custom/fip/favicon.ico"
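To see what -o changes, here is a small offline sketch. The sample markup is made up for illustration (it is not FIP’s page) and puts three link tags on a single line. Plain grep prints that whole line; grep -o prints each match on its own line. Note that the https URL matches as well, because http is a prefix of https.

```bash
#!/usr/bin/env bash
# Made-up sample markup: two .ico links and one stylesheet on a single line.
sample_html='<link rel="icon" href="http://example.com/favicon.ico"><link rel="stylesheet" href="http://example.com/style.css"><link rel="shortcut icon" href="https://example.com/old.ico">'

# Without -o: the whole matching line is printed.
printf '%s\n' "$sample_html" | grep 'href="http[^"]*ico"'

# With -o: only the matched parts are printed, one per line.
printf '%s\n' "$sample_html" | grep -o 'href="http[^"]*ico"'
```

The stylesheet link is skipped because [^"]* cannot run past its closing quote, so the match never reaches an ico".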
Sed the URL
Use the following sed command to extract the favicon’s URL:
$ sed 's/href="\(.*\)"/\1/'
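- s/pattern/replacement/ is sed’s substitute command: it replaces the text matched by the pattern with the replacement.
- href="\(.*\)" is the pattern we match; the escaped parentheses capture everything between the quotes.
- \1 is the replacement: it refers to the characters captured by the escaped parentheses.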
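Here is the substitution on its own, applied to two sample lines of the kind the grep step produces (the URLs are made up). sed works line by line, so each href="..." is reduced to the URL between the quotes.

```bash
#!/usr/bin/env bash
# Two sample grep matches (made-up URLs), one per line.
# \(.*\) captures everything between the quotes; \1 writes that capture back out.
printf '%s\n' 'href="http://example.com/favicon.ico"' 'href="https://example.com/old.ico"' |
  sed 's/href="\(.*\)"/\1/'
```

This prints http://example.com/favicon.ico and https://example.com/old.ico, one per line.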
Piping everything together gives:
$ http http://www.fipradio.fr/player | grep -o 'href="http[^"]*ico"' | sed 's/href="\(.*\)"/\1/'
http://www.fipradio.fr/sites/all/themes/custom/fip/favicon.ico
Notes
- There will be quite a few cases where this script won’t work, for example when the favicon is referenced with a relative URL (such as href="/favicon.ico") or when a query string follows the .ico to avoid caching issues.
- I wouldn’t have come up with the regex out of the blue; it took quite a few Google searches before I found a solution I liked. This tutorial is not meant to be a detailed explanation of grep, sed or regular expressions.
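If you need to cover cases such as a query string after .ico or a relative href, one possible approach is sketched below. It is not part of the original command: extract_favicon_urls is a hypothetical helper, it still assumes double-quoted href attributes, and it resolves root-relative (/favicon.ico) and protocol-relative (//host/favicon.ico) paths against a base URL you pass in (scheme and host, no trailing slash). Other relative paths are simply prefixed with the base, which is only correct when the page sits at the site root.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the original tutorial.
# Usage: extract_favicon_urls BASE_URL < page.html
# BASE_URL is the scheme and host without a trailing slash, e.g. http://example.com
extract_favicon_urls() {
  local base="$1"
  # Match href="....ico", optionally followed by ?query, up to the closing quote.
  grep -oE 'href="[^"]*\.ico([?][^"]*)?"' |
    sed -E 's/^href="([^"]*)"$/\1/' |
    while IFS= read -r url; do
      case $url in
        http://*|https://*) printf '%s\n' "$url" ;;          # already absolute
        //*) printf '%s:%s\n' "${base%%:*}" "$url" ;;         # protocol-relative: reuse the base scheme
        /*) printf '%s%s\n' "$base" "$url" ;;                 # root-relative
        *) printf '%s/%s\n' "$base" "$url" ;;                 # simplification: assumes the page is at the root
      esac
    done
}
```

For example: http http://www.fipradio.fr/player | extract_favicon_urls http://www.fipradio.fr. The query string is kept in the URL, since some sites use it to bust caches and the server may expect it.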
3 - Download the icons
To download a file with HTTPie, use:
$ http -d url
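We will use a while loop with read to iterate through the list of favicon URLs. Because the loop’s stdin is that list, we pass --ignore-stdin so that HTTPie does not try to read it as a request body:
$ while read -r url; do http --ignore-stdin -d "$url"; done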
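Why does stdin matter inside the loop? read and every command in the loop body share the same stdin, the list of URLs. When stdin is piped, HTTPie reads it as a request body, so without --ignore-stdin it could swallow the URLs that read has not seen yet. The offline sketch below shows the effect with cat standing in for such a command; the URLs and the function names unguarded and guarded are made up for illustration.

```bash
#!/usr/bin/env bash
# Three made-up URLs, one per line.
urls=$(printf '%s\n' http://example.com/a.ico http://example.com/b.ico http://example.com/c.ico)

# cat stands in for any command that reads stdin, as HTTPie does when stdin is piped.
# It consumes the lines read has not reached yet, so the body runs only once.
unguarded() {
  while IFS= read -r url; do
    echo "got $url"
    cat > /dev/null
  done
}

# Giving the inner command its own stdin (/dev/null) leaves the list to read.
guarded() {
  while IFS= read -r url; do
    echo "got $url"
    cat < /dev/null > /dev/null
  done
}

printf '%s\n' "$urls" | unguarded   # only the first URL is printed
printf '%s\n' "$urls" | guarded     # all three URLs are printed
```

HTTPie’s --ignore-stdin flag plays the same role as the /dev/null redirection in guarded.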
And here is the final command:
$ http http://www.fipradio.fr/player | grep -o 'href="http[^"]*ico"' | sed 's/href="\(.*\)"/\1/' | while read -r url; do http --ignore-stdin -d "$url"; done
Press Enter and HTTPie will show a progress bar while it saves favicon.ico to the current directory.
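If you run this often, the same pipeline can be wrapped in a shell function. This is a minimal sketch, assuming HTTPie is installed as http; the name download_favicons is hypothetical, and --ignore-stdin is passed to both HTTPie calls so that neither of them reads from the pipe.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the tutorial's pipeline. Requires HTTPie ("http") on PATH.
# Usage: download_favicons PAGE_URL
download_favicons() {
  local page_url="$1"
  # Fetch the page; when piped, HTTPie prints only the response body.
  http --ignore-stdin "$page_url" |
    grep -o 'href="http[^"]*ico"' |
    sed 's/href="\(.*\)"/\1/' |
    while IFS= read -r url; do
      # --ignore-stdin keeps HTTPie from reading the remaining URLs as a request body.
      http --ignore-stdin --download "$url"
    done
}
```

Usage: download_favicons http://www.fipradio.fr/player. As with the one-liner, each favicon is saved to the current directory.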