We are performing measurement studies to better educate researchers about the use of modern public cloud infrastructure-as-a-service (IaaS) systems such as Amazon EC2 and Microsoft Azure. We are particularly interested in quantifying how much cloud users take advantage of the elasticity of these services, that is, the ease with which they can start and stop virtual machine instances and scale deployments up or down to match demand. To take these measurements, we perform lightweight daily probing of IP addresses known to be associated with EC2 and Azure.
For each IP address known to be associated with EC2 and Azure (except those that we omit upon request), WhoWas sends a TCP SYN probe first to port 80 (HTTP default) and then to port 443 (HTTPS default). If neither probe receives a response, a probe is sent to port 22 (SSH). For any IP address X that responds as open on port 80 or port 443, WhoWas first generates the URL "http://X/robots.txt" or "https://X/robots.txt" as appropriate, and submits a GET request to it. WhoWas then examines the robots.txt file. If our bot is allowed, WhoWas sends a GET request to "http://X" or "https://X" to fetch the top-level page of the target web service.
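The probing order described above can be illustrated with a minimal Python sketch. It is not the WhoWas implementation: the function names and the user agent string are hypothetical, and a full TCP connection stands in for the SYN probe, since sending raw SYN packets requires elevated privileges. Fetching the pages themselves and handling a missing or unreachable robots.txt are left out.

```python
import socket
import urllib.robotparser

# Hypothetical user agent; the name the real bot announces is not given here.
USER_AGENT = "ExampleProbeBot"


def port_open(ip, port, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds.

    Assumption: a full connect() is used in place of a raw SYN probe.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def responsive_ports(ip, is_open=port_open):
    """Probe 80, then 443; probe 22 only if neither of them responds."""
    found = [port for port in (80, 443) if is_open(ip, port)]
    if not found and is_open(ip, 22):
        found.append(22)
    return found


def urls_for(ip, port):
    """Build the robots.txt URL and the top-level page URL for a web port."""
    scheme = {80: "http", 443: "https"}[port]
    return f"{scheme}://{ip}/robots.txt", f"{scheme}://{ip}"


def may_fetch(robots_txt, page_url, agent=USER_AGENT):
    """Check already-downloaded robots.txt text against the page URL."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)
```

Passing the probe function in as the `is_open` parameter keeps the ordering logic separate from the network, so the port 22 fallback can be checked with a stub instead of live traffic.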
While technically any data we obtain is public, in the sense that we are simply fetching web pages from publicly advertised IP addresses, cloud tenants may have inadvertently made accessible content that should not be. To respect potential privacy concerns, we have no plans to make the data sets gathered so far public. Researchers can contact us if they are interested in obtaining access.
You can email Liang Wang to request related datasets.