For those of you who haven't heard, I've released PageScan, a web content scraper for web-based malware analysis. It assists static analysis by scraping and listing any redirections, iframes, JavaScript, and links found inside a web page. Below are some sample outputs from PageScan:
CLI output
HTML output
Features
- Scrape HTML content, JavaScript code (inline or external JS), iframes, and links
- Follow iframes and redirections (meta refresh and HTTP 301/302 redirects)
- TXT/HTML output
- User-defined Referer and User Agent
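To give a rough idea of what this kind of scraping involves, here is a minimal sketch in Python using only the standard library. It is not PageScan's actual code; the class name `PageElements` and its attribute names are made up for illustration, and it only parses HTML that has already been fetched (no redirect following, no custom Referer or User Agent).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageElements(HTMLParser):
    """Hypothetical helper: collect iframes, scripts, links, and meta refresh targets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.iframes = []
        self.external_scripts = []
        self.inline_scripts = []
        self.links = []
        self.meta_redirects = []
        self._in_inline_script = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "iframe" and attr.get("src"):
            self.iframes.append(urljoin(self.base_url, attr["src"]))
        elif tag == "script":
            if attr.get("src"):
                # External JS: record the resolved URL.
                self.external_scripts.append(urljoin(self.base_url, attr["src"]))
            else:
                # Inline JS: start collecting the script body.
                self._in_inline_script = True
                self.inline_scripts.append("")
        elif tag == "a" and attr.get("href"):
            self.links.append(urljoin(self.base_url, attr["href"]))
        elif tag == "meta" and (attr.get("http-equiv") or "").lower() == "refresh":
            # content typically looks like "0; url=http://example.test/next"
            _, _, target = (attr.get("content") or "").partition(";")
            target = target.strip()
            if target.lower().startswith("url="):
                target = target[4:].strip(" '\"")
                self.meta_redirects.append(urljoin(self.base_url, target))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_inline_script = False

    def handle_data(self, data):
        if self._in_inline_script:
            self.inline_scripts[-1] += data
```

Feeding a page's HTML to `PageElements(base_url).feed(html)` fills the lists with absolute URLs resolved against `base_url`. The scraped content is only parsed, never executed, which is the point of static analysis.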
Future Development
- Scrape iframe/redirection addresses from JavaScript (in document.write() or conditions)
- Properly execute JavaScript code (for obfuscated redirection or content)
- YARA signature module for scraped contents
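As an illustration of the first item, one naive way to pull iframe addresses out of document.write() calls is a pair of regular expressions. This is only a sketch of the idea in Python, not something PageScan does today; the function name `iframes_from_document_write` is hypothetical, and it only handles a single plain string literal as the argument.

```python
import re

# Matches the string literal passed to document.write("...") or document.writeln('...').
DOC_WRITE = re.compile(
    r"""document\.write(?:ln)?\(\s*(["'])(.*?)(?<!\\)\1\s*\)""", re.DOTALL
)
# Matches the src value of an iframe tag inside the written markup.
IFRAME_SRC = re.compile(
    r"""<iframe[^>]*?\bsrc\s*=\s*\\?["']?([^"'\s>\\]+)""", re.IGNORECASE
)


def iframes_from_document_write(js_source):
    """Return iframe src values found in literal document.write() arguments."""
    found = []
    for match in DOC_WRITE.finditer(js_source):
        found.extend(IFRAME_SRC.findall(match.group(2)))
    return found
```

Anything built by string concatenation, unescape(), or eval() slips past this approach, which is why properly executing the JavaScript is listed as a separate item.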
Feel free to dig into the source code. The tool is licensed under the WTFPL, so do whatever you want with the code. You can get the latest version of PageScan at https://github.com/d3t0n4t0r/pagescan.