My old downloading tool
I read the manga The Journey of Shuna by Miyazaki Hayao online and wanted to keep a copy, so I turned to my old downloading tool for help. I wrote it in Common Lisp while I was learning the language. It has an abstract interface: give it three functions that, respectively, find the image on the current page, find the location of the next page, and tell when the last page of the current manga has been reached, and it will automatically download and save the files in sequence. In the past I used it to download one manga from one site, and that site is no more. How time flies.
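For readers who want to picture that interface, here is a rough sketch in Python rather than the original Common Lisp, which is not reproduced in this post. Every name in it (`download_manga`, `find_image`, `find_next_page`, `is_last_page`, `save`) is made up for illustration, and a "page" is just whatever value the three functions agree on, such as a URL.

```python
from typing import Callable, TypeVar

Page = TypeVar("Page")


def download_manga(
    first_page: Page,
    find_image: Callable[[Page], str],
    find_next_page: Callable[[Page], Page],
    is_last_page: Callable[[Page], bool],
    save: Callable[[int, str], None],
) -> int:
    """Walk the pages in order, handing each image location to `save`.

    Returns the number of pages visited. All site-specific knowledge
    lives in the three functions passed in by the caller.
    """
    page = first_page
    index = 1
    while True:
        save(index, find_image(page))
        if is_last_page(page):
            return index
        page = find_next_page(page)
        index += 1
```

The loop itself knows nothing about any particular site, which is the whole point of the abstraction: supporting a new site should only mean writing three new functions.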
After some time studying the site's HTML/JS code, I realised that the site had anti-scraping measures in place. My previous abstraction does not fit the task. Actually, locating the next page is trivial: increment a number in the URL. The HTML file of each page contains a total page count, so we know when to stop. The only obfuscation comes from hiding the image. If only my tool were a Firefox extension! It could just snatch the image after Firefox finishes parsing the DOM.
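The easy part can be sketched in a few lines of Python. The URL template and the starting page number of 1 are assumptions for illustration only, since the real site's URL scheme is not given here, and `page_urls` is a hypothetical helper name.

```python
from typing import List


def page_urls(template: str, total_pages: int) -> List[str]:
    """Build every page URL by counting up the number in the template.

    `template` must contain a `{page}` placeholder; pages are assumed
    to be numbered from 1 up to `total_pages` inclusive.
    """
    return [template.format(page=n) for n in range(1, total_pages + 1)]


# Hypothetical template; the real site's URL layout is not shown in this post.
urls = page_urls("https://example.com/manga/shuna/{page}.html", 3)
```

With the total page count read from the HTML, neither a "next page" function nor a "last page" test is needed any more; only finding the hidden image remains hard.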
My tooling is just wrong for this task. Despite being a very powerful language that lets one do whatever one wants in whatever way, Common Lisp is not easy to use. One has to either implement the function one needs or find other people's libraries to use. Every time one uses someone else's library, one finds another language to learn. A lot of time is spent reading documentation. I don't use Common Lisp often enough, so I forget what I have learned.
When I finished coding, the program was ready. It ran, downloaded the images, and then went into another long wait until I wanted another manga.
I dusted off this tool again to download another manga, and it failed!
Of course I didn't have any error-handling code in place. When your first shot at downloading a manga simply succeeds and you shelve the tool right away, you never get a chance to write error-handling code. My console burst into horrible red characters.
Guess what went wrong.
This time the friggin' pathname of the image contained non-ASCII characters. That was totally unexpected. Sorry for being an ASCII-centred guy who is essentially ignorant of the vast world of Unicode. I then added a function to percent-encode the pathname before feeding it to the request library.
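The fix looks roughly like the sketch below, written in Python with the standard `urllib.parse` module rather than in Common Lisp, so it is not the function I actually added. The name `encode_url_path` is made up, and the sketch assumes the path should be encoded as UTF-8, which is what `quote` does by default.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode only the path component of a URL.

    Slashes are kept as separators, and '%' is left alone so that a
    path which is already encoded does not get encoded a second time.
    """
    parts = urlsplit(url)
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )
```

Only the path is touched; the scheme, host, and query string pass through unchanged. A pure-ASCII path comes out exactly as it went in, which is why the first manga never tripped over this.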