Have You Google Hacked Your Site Yet?
Sometimes organizations store files on their websites that they believe are not accessible to the general public. As part of ongoing vulnerability scanning and penetration testing, it is also worth adding Google hacking to your toolbox. Google hacking uses the Google search engine to identify vulnerabilities in websites, and Google supports a range of operators and modifiers that let you search your own sites far more precisely than most people realize.
Google hacking queries can identify vulnerabilities in websites and applications, and they are especially effective at detecting files that contain PII, PHI, credentials, and other sensitive data. The technique, sometimes called Google dorking, is just as available to attackers, so it is prudent to be familiar with it and find the sensitive data before the bad guys do.
This post is not meant to be a tutorial on Google hacking (Wikipedia has a good overview with links to other sites if you want to go in depth), but here are some examples of simple queries that can help detect sensitive information. Type them directly into the Google search box, just as you would for any everyday search.
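If you prefer to keep a list of queries and open them from a script or a bookmark file, the search link can be built programmatically. The Python sketch below assumes Google's public search URL takes the query in a q parameter (https://www.google.com/search?q=...); the function name google_search_url is hypothetical. It only builds links to open in a browser and does not fetch or parse any results.

```python
from urllib.parse import urlencode


def google_search_url(query: str) -> str:
    """Build a Google web-search link for a query string (hypothetical helper).

    The query is URL-encoded exactly as it would be typed into the search box,
    so operators such as site: and filetype: are preserved.
    """
    # Assumption: Google reads the search terms from the "q" parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(google_search_url("site:xxx.com ssn"))
```

Running it prints https://www.google.com/search?q=site%3Axxx.com+ssn; the colon is percent-encoded and the space becomes +, which the search engine decodes back into the original query.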
site:xxx.com ssn
This query, or “tag” as it is often called, searches the entire specified site for any page or document that mentions social security numbers. It will often turn up blank tax forms and the like, but every once in a while BSC has discovered a completed tax form or W-4 that has no business being on the site. To be thorough, run the query again with the term spelled out (“social security number”). Of course, ssn can be swapped for dob or any other string that might point to sensitive data.
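The keyword substitution described above can be scripted so the same checks run against every site you manage. This minimal Python sketch uses a hypothetical keyword_queries helper and an illustrative term list; it assumes that wrapping a multi-word term in double quotes makes Google match the exact phrase rather than the individual words.

```python
def keyword_queries(site: str, terms: list[str]) -> list[str]:
    """Return one site-restricted query per search term (hypothetical helper)."""
    queries = []
    for term in terms:
        # Quote phrases like "social security number"; leave single words bare.
        token = f'"{term}"' if " " in term else term
        queries.append(f"site:{site} {token}")
    return queries


if __name__ == "__main__":
    # Illustrative terms only; extend with any string that suggests sensitive data.
    terms = ["ssn", "social security number", "dob", "date of birth"]
    for q in keyword_queries("xxx.com", terms):
        print(q)
```

Each printed line, such as site:xxx.com "social security number", can be pasted into the search box as is.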
site:xxx.com filetype:xls
This query searches the entire specified site for Excel files, which are often a good source of sensitive information. Replace xls with pdf, doc, pps, or any other file extension you want to review to make sure the data is safe.
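The same approach works for file extensions. In this sketch the helper filetype_queries and the extension list are hypothetical; newer Office formats such as xlsx and docx are included as separate entries on the assumption that filetype:xls and filetype:doc do not also match them.

```python
def filetype_queries(site: str, extensions: list[str]) -> list[str]:
    """Return one site-restricted filetype: query per extension (hypothetical helper)."""
    queries = []
    for ext in extensions:
        # Normalize entries such as ".XLS" or " pdf " to "xls" and "pdf".
        ext = ext.strip().lstrip(".").lower()
        queries.append(f"site:{site} filetype:{ext}")
    return queries


if __name__ == "__main__":
    exts = ["xls", "xlsx", "pdf", "doc", "docx", "pps"]
    for q in filetype_queries("xxx.com", exts):
        print(q)
```

Normalizing the extensions means a list copied from a file browser (".XLS", ".PDF") still produces clean queries.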
info:URL
This query will give you the information that Google has on the given URL including cache info and source code.
Those are just three simple examples of how to “Google hack” your own site. The syntax is easy to learn yet quite powerful, and the more of it you learn, the more control you will have over what others can access on your website.