For this tutorial, I will use http://www.fipradio.fr/player as the test website; as it turns out, it is also my favorite radio station.
If you have never used HTTPie, have a quick look at my introduction to the tool.
Downloading the HTML with HTTPie is the easiest part of this tutorial:
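A minimal sketch of this step, assuming HTTPie is installed and available as `http`. The `fetch_html` wrapper is a hypothetical name used only for illustration, and `--body` is added so that only the HTML is printed, without the response headers.

```shell
# Hypothetical helper: print only the response body (the HTML) of a page.
# --body drops the response headers that HTTPie shows on a terminal by default.
fetch_html() {
  http --body GET "$1"
}

# Usage (needs network access):
# fetch_html http://www.fipradio.fr/player
```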
The full HTML will be printed out to the console.
Here is the code:
-o will retain only the matching part of the line instead of the whole line
'href="http[^"]*ico"' will match strings like href="http:[..]ico"
Let’s pipe the HTML to the grep expression:
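A minimal sketch of that pipe, using a small inline HTML sample instead of the live page so that it runs offline. The `sample_html` content and the example.com URLs are made up for illustration.

```shell
# Sample HTML standing in for the page; example.com is a placeholder domain.
sample_html='<head>
<link rel="stylesheet" href="http://example.com/style.css">
<link rel="shortcut icon" href="http://example.com/favicon.ico">
</head>'

# -o prints only the part of each line that matches the pattern.
matches=$(printf '%s\n' "$sample_html" | grep -o 'href="http[^"]*ico"')
printf '%s\n' "$matches"
# prints: href="http://example.com/favicon.ico"
```

With the real page, the left side of the pipe would be the HTTPie command that prints the HTML.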
Use the following sed command to extract the URL of the favicon:
s/ starts a substitution: the text matched by the pattern is replaced by the replacement
href="\(.*\)" is the pattern we want to match
/\1/ is the replacement; \1 refers to the characters captured by the escaped parentheses.
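The three parts described above can be tried on a single string. This is a sketch with a placeholder URL; the `match` and `url` variable names are only illustrative.

```shell
# Input is what the grep step prints for one favicon link (placeholder URL).
match='href="http://example.com/favicon.ico"'

# \(.*\) captures everything between the quotes; \1 keeps only that capture.
url=$(printf '%s\n' "$match" | sed 's/href="\(.*\)"/\1/')
printf '%s\n' "$url"
# prints: http://example.com/favicon.ico
```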
After piping everything together:
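A sketch of the grep and sed steps chained together, wrapped in a hypothetical `extract_favicon_urls` helper that reads HTML on standard input. The one-line HTML input is a placeholder so the example runs without network access.

```shell
# Hypothetical helper: read HTML on stdin, print one favicon URL per line.
extract_favicon_urls() {
  grep -o 'href="http[^"]*ico"' | sed 's/href="\(.*\)"/\1/'
}

# Placeholder input; with the real page this would be the HTTPie output.
urls=$(printf '%s\n' '<link rel="icon" href="http://example.com/favicon.ico">' \
  | extract_favicon_urls)
printf '%s\n' "$urls"
# prints: http://example.com/favicon.ico
```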
- There will be quite a few cases where this script won’t work, for example websites served over https, or a query string after the .ico to avoid caching issues.
- I wouldn’t have been able to come up with the regex out of the blue. It took quite a few Google searches before I found a solution that I liked. This tutorial is not meant to be a detailed explanation of grep, sed or regex.
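One of the cases mentioned above, a query string after the .ico, can be handled by loosening the pattern. This is only a sketch of one possible variant, not a general solution; the URL and the `?v=2` parameter are placeholders.

```shell
# Placeholder link with a cache-busting query string after .ico.
tagged='<link rel="icon" href="http://example.com/favicon.ico?v=2">'

# Variant pattern: \.ico may be followed by more characters (such as ?v=2)
# before the closing quote. The sed capture stops at the first quote.
loose_url=$(printf '%s\n' "$tagged" \
  | grep -o 'href="http[^"]*\.ico[^"]*"' \
  | sed 's/href="\([^"]*\)"/\1/')
printf '%s\n' "$loose_url"
# prints: http://example.com/favicon.ico?v=2
```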
To download a file with HTTPie, use:
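A minimal sketch using HTTPie's `--download` option, wrapped in a hypothetical `download_file` helper. The URL in the usage comment is a placeholder.

```shell
# Hypothetical helper: save the file at the given URL into the current directory.
# --download writes the response body to a file and shows a progress bar.
download_file() {
  http --download "$1"
}

# Usage (needs network access; placeholder URL):
# download_file http://example.com/favicon.ico
```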
We will use a while loop with read to iterate through the list of favicon URLs:
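A sketch of the loop on its own, with `echo` standing in for the download so that it runs offline. The two URLs are placeholders. In the real pipeline the list arrives through a pipe; a here-document is used here only so that the `count` variable is still set after the loop.

```shell
# Two placeholder URLs, one per line, standing in for the extracted list.
url_list='http://example.com/favicon.ico
http://example.org/favicon.ico'

# read -r takes one line per iteration and leaves backslashes untouched.
count=0
while read -r url; do
  echo "would download: $url"
  count=$((count + 1))
done <<EOF
$url_list
EOF
echo "$count URLs"
# prints two "would download: ..." lines, then: 2 URLs
```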
And here is the final command:
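A sketch of how the pieces above fit together, written as a hypothetical `download_favicons` helper. One assumption beyond the steps described: `--ignore-stdin` is added to the download call, because inside the loop HTTPie would otherwise read the rest of the URL list from standard input as request data.

```shell
# Hypothetical helper: download every favicon linked from the given page.
download_favicons() {
  http --body GET "$1" \
    | grep -o 'href="http[^"]*ico"' \
    | sed 's/href="\(.*\)"/\1/' \
    | while read -r url; do
        # --ignore-stdin keeps HTTPie from consuming the remaining URLs.
        http --ignore-stdin --download "$url"
      done
}

# Usage (needs network access):
# download_favicons http://www.fipradio.fr/player
```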
Press Enter and you will see a progress bar for the favicon.ico download.