Download an entire site with wget
Updated at: 01/10/2014
To download an entire site, move into an empty directory and run the following command:
wget --no-clobber --convert-links -r -p -E -e robots=off --domains=site_domain -U Mozilla http://url
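For example, assuming the site you want to mirror is the hypothetical example.com, the command would look like this:
wget --no-clobber --convert-links -r -p -E -e robots=off --domains=example.com -U Mozilla http://example.com/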
Optionally, you can also specify these two options, in case the web server has some kind of protection mechanism against bots and the like:
--limit-rate=200k
--random-wait
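Putting it all together, a gentler invocation against the same hypothetical example.com would be:
wget --no-clobber --convert-links -r -p -E -e robots=off --limit-rate=200k --random-wait --domains=example.com -U Mozilla http://example.com/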
Below is an explanation of the parameters:
--limit-rate=200k: limit the download speed to 200 KB/sec, to avoid hammering the server.
--no-clobber: don't overwrite existing files (useful when a download is interrupted and resumed).
--convert-links: convert links so that they work locally, offline, instead of pointing to the online website.
--random-wait: wait a random interval between requests; sites don't like being mirrored at full speed.
-r: recursive, downloads the full website.
-p: same as --page-requisites; downloads everything a page needs to render, such as images and CSS files.
-E: saves files with the proper extension (e.g. .html); without it, many dynamically generated pages would be saved with no extension at all.
-e robots=off: ignore robots.txt, i.e. don't behave like a polite crawler; many sites only welcome robots from Google or other major search engines.
-U Mozilla: set the User-Agent header so the site sees a Mozilla browser looking at the page rather than a crawler like wget.
--domains=domain: don't follow links outside this domain.
--no-parent: a very handy option that guarantees wget will never ascend to the parent directory, so it won't download anything from the folders above the one you start from. Use this to make sure wget does not fetch more than it needs when you only want the files under a given path (see the example below).
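As a sketch of --no-parent, assuming you only want the pages under a hypothetical http://example.com/docs/ folder and nothing above it:
wget -r -p -E --convert-links --no-parent -e robots=off -U Mozilla http://example.com/docs/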
References:
http://www.kossboss.com/linux---wget-full-website
http://linuxreviews.org/quicktips/wget/
http://www.linuxjournal.com/content/downloading-entire-web-site-wget