Python: Ripcage - multithreaded download of web resources

ripcage is a clean, easy-to-adapt Python script. It demonstrates how to use the multithreaded workerpool module to download multiple resources in parallel, in principle through any protocol or authentication scheme supported by your Python libraries/modules, such as TCP/IP, UDP, OAuth, TLS, SFTP, FTP, etc. The initial version also wrote meta information about each download into a MySQL database; that part has been stripped for conciseness. You can invoke any command line tool or script to post-process each resource once it has been fetched.
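To illustrate the general pattern, here is a minimal sketch of a parallel download with an optional post-processing command. It is not ripcage's actual code: it swaps the workerpool module for the standard library's `concurrent.futures.ThreadPoolExecutor`, and the names `fetch`, `download_all`, `post_cmd` and `fetcher` are hypothetical, chosen only for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def fetch(url, dest_dir, timeout=30):
    """Download one URL into dest_dir and return the path of the saved file."""
    # Derive a file name from the URL path; fall back for URLs ending in "/".
    name = Path(urlsplit(url).path).name or "index.html"
    target = Path(dest_dir) / name
    with urlopen(url, timeout=timeout) as response:
        target.write_bytes(response.read())
    return target


def download_all(urls, dest_dir, workers=4, post_cmd=None, fetcher=fetch):
    """Fetch urls in parallel worker threads.

    post_cmd is an optional command given as a list of arguments; the saved
    file's path is appended and the command is run once per fetched file.
    Returns a dict mapping each URL to its saved path, or to the exception
    that was raised while fetching or post-processing it.
    """
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every download up front; the pool runs `workers` at a time.
        futures = {pool.submit(fetcher, url, dest_dir): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                path = future.result()
                if post_cmd:
                    # Post-processing runs in the main thread as results arrive.
                    subprocess.run([*post_cmd, str(path)], check=True)
                results[url] = path
            except Exception as exc:
                # One failed resource should not abort the remaining downloads.
                results[url] = exc
    return results
```

A call such as `download_all(urls, "downloads", workers=8, post_cmd=["gzip"])` would fetch up to eight resources at a time and compress each file once it is saved, assuming `gzip` is available on the system. To use another protocol such as SFTP, pass a different function as `fetcher`.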

Since Python makes it quick and efficient to use and implement protocols, there is practically no limit to what you can fetch. Fork and adapt it from here.

