WebCom secrets: How we hosted 70,000 domains on one Apache instance

A chief virtue of time is that it provides distance. It has been 12 years since I was let go from Verio, almost as much time as I worked for WebCom/Verio/NTT, and I feel there is enough distance between then and now to share some secrets without fear of reprisal.

WebCom did things differently: we pioneered name-based virtual hosting and we learned how to do more with less. Back when WebCom was starting to do name-based hosting, it was common for providers to put 2,000 IP addresses on an SGI machine running IRIX. I assume the allure of SGI was decent horsepower and a BSD-derived OS that could host a lot of IP addresses per NIC; back then the BSD network stack was considered one of the best.

When I started we had HP PA-RISC machines, a Sun 4/330, and a 486 running Windows NT 3.51 and MS SQL Server (still essentially Sybase code at the time). By the end of the year we’d signed a lease on a Sun SPARCserver 1000E, a piece of “big iron” at the time; I think we had 4 SuperSPARC processors and 512MB of RAM. We looked at offering IP-based hosting on Sun, but their OS only allowed up to 255 IPs per NIC. We briefly considered an inexpensive array of SCO Unix boxes, but Linux was never in the running because Chris considered it an immature OS. I spent my entire career there championing Linux, and winning.

We decided to go the Big Ole Server route with Sun, first with the S1000E, then an Enterprise 4000 in 1997. Early on we ran Netscape Enterprise Server, a commercial web server from Netscape written by some of the same people who wrote NCSA httpd. It was a modular web server with a plugin architecture that could be extended by writing NSAPI modules to perform actions in the chain of operations. Apache wasn’t really on the radar at this point. Chris wrote the first name-based hosting plugin for Netscape; that solution lasted us until around 20,000 domains, when the underlying architecture of Netscape became a bottleneck.

I proposed a stopgap measure to help spread the load: the vast majority of our content was static media and web pages, so we could put a reverse caching layer in front of the Netscape server and reduce its load.

We purchased 2 BIG/IP machines for load balancing; these spread the incoming load among 10 Squid reverse proxy cache boxes running Red Hat 5. These were Intel Apollo desktop computers with 180MHz Pentium Pro processors, 256MB of memory, and an 8GB HDD, giving us around 2GB of in-memory cache storage and about 40GB of HDD cache space. This approach had some teething issues, primarily due to how cache invalidation is done with reverse proxies; the solution never worked fully seamlessly, and it had a fatal flaw: latency.

Latency was a major problem for reverse proxy caches back then. The web hosting industry was increasingly held hostage by 3rd party performance metric websites that sought to prove who was the fastest. Keynote Systems would spider your site and track the latency and performance of your servers. Our proxy caching solution scored consistently low in the Keynote rankings, which put pressure on me to deliver a magic bullet.

Apache to the rescue

The magic bullet came in the form of the Apache web server. Apache had matured significantly since we started using the Netscape server and was by then considered the de facto standard for web hosting on Unix, though there were still some IIS holdouts who thought Microsoft was the one true way.

The solution to our performance problems was a clean-sheet redesign of our web hosting platform. I evaluated every single feature we used in the Netscape server and scoured the Apache docs for matching features. 4 years of growth showed that Apache could do almost all of what we needed out of the box; almost. I had to write a couple of Apache modules to implement custom user matches (http://www.webcom.com/username), but there was no solution for hosting 70,000 domains in a single Apache instance. I tried starting Apache with 20,000+ domain names configured as virtual hosts: the config file was enormous, it took forever for Apache to parse it, the memory footprint was large, and the performance just wasn’t there.
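
For context, conventional Apache 1.3 hosting needs a <VirtualHost> block per domain, roughly like the sketch below (the names, paths, and address are made-up examples). Multiply this by 20,000+ and you get a file that is slow to parse, and Apache builds an in-memory server record for every block at startup:

NameVirtualHost 192.168.1.10

# ...one block like this per customer domain, times 20,000+
<VirtualHost 192.168.1.10>
    ServerName www.example.com
    ServerAlias example.com
    DocumentRoot /export/home/cust1234/htdocs
</VirtualHost>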

There are 2 ways you can look at an obstacle: treat it as immovable, or think creatively around it. I chose the latter.

Sun implemented /tmp using tmpfs, an in-memory filesystem that shares system RAM with the disk cache. Linux has a very similar design today; the models are nearly identical. tmpfs on Sun was VERY fast at handling lots of inodes in a single directory, so much so that it was probably an order of magnitude faster than UFS or VxFS. I exploited this capability to put 140,000+ symlinks in a single directory in /tmp and use it like a very simple database.
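
To make the idea concrete, here is a minimal sketch in C of how such a symlink table can be populated; the /tmp/vhosts directory, the domain, and the path are hypothetical. Two links per domain (the bare name and the www form, as described below) also lines up with 140,000+ symlinks covering 70,000 domains:

#include <stdio.h>
#include <unistd.h>

/* Populate the symlink "database" in tmpfs: the link name is the key
 * (a DNS name), the link target is the value (the customer docroot).
 * The directory and sample domain here are hypothetical. */
static void add_domain(const char *domain, const char *docroot)
{
    char bare[1024], www[1024];

    snprintf(bare, sizeof(bare), "/tmp/vhosts/%s", domain);
    snprintf(www, sizeof(www), "/tmp/vhosts/www.%s", domain);

    symlink(docroot, bare);   /* "insert" is just creating a link */
    symlink(docroot, www);
}

int main(void)
{
    add_domain("example.com", "/export/home/cust1234/htdocs");
    return 0;
}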

Back then we had Sybase, which is fully transactional, and connections were considered expensive. If we made our web server depend on Sybase for every lookup (this is what happened early on with the NSAPI module) and the database had a hiccup or restart, your services would stop working. A short outage to something like a control panel (self service) is acceptable, but it cannot affect web hosting. So in mission critical applications that needed database lookups, we would create DBM files as a key->value store. The key was usually a single text string and the value was a delimited chunk of row data. We would run stored procedures to rebuild the DBM files periodically, then rename them into place.
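
The pattern looks roughly like this with the classic ndbm API (the file names and the sample record are invented for illustration; classic ndbm keeps a .dir/.pag pair on disk, though some implementations use a single file):

#include <fcntl.h>
#include <ndbm.h>
#include <stdio.h>
#include <string.h>

/* Rebuild a DBM lookup table offline, then atomically swap it into
 * place.  File names and the sample record are hypothetical. */
int main(void)
{
    char *key_s = "www.example.com";
    char *val_s = "cust1234|/export/home/cust1234/htdocs|active";
    datum key = { key_s, strlen(key_s) };
    datum val = { val_s, strlen(val_s) };

    DBM *db = dbm_open("customers.new", O_RDWR | O_CREAT, 0644);
    if (db == NULL)
        return 1;

    /* key -> delimited chunk of row data, as dumped from the database */
    dbm_store(db, key, val, DBM_REPLACE);
    dbm_close(db);

    /* Rename both files over the live ones so a reader never sees a
     * half-written database. */
    rename("customers.new.dir", "customers.dir");
    rename("customers.new.pag", "customers.pag");
    return 0;
}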

The limitation of DBM files is that there was no really clean way to signal a process to refresh its file handle on the DBM. We built in signal handlers to do this for some long running processes, but for Apache it was not feasible due to the delay in processing during a reload.
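
For those long running processes, the general shape was something like the sketch below (the file name is hypothetical): the SIGHUP handler only sets a flag, and the main loop reopens the DBM at a safe point.

#include <fcntl.h>
#include <ndbm.h>
#include <signal.h>

static volatile sig_atomic_t reload = 0;

/* SIGHUP handler: just set a flag; reopen at a safe point in the loop */
static void on_hup(int sig)
{
    (void)sig;
    reload = 1;
}

int main(void)
{
    DBM *db = dbm_open("customers", O_RDONLY, 0);

    signal(SIGHUP, on_hup);

    for (;;) {
        if (reload) {
            dbm_close(db);                            /* drop stale handle */
            db = dbm_open("customers", O_RDONLY, 0);  /* pick up new files */
            reload = 0;
        }
        /* ... service requests, looking up keys with dbm_fetch(db, key) ... */
    }
}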

Abusing tmpfs

I used the tmpfs as a key->value store by using the symlink name as the key and the contents of the symlink as the value. We created 140,000+ symlinks, each named for the DNS name of a domain, with the symlink pointing to the directory where that domain’s content was stored. The Apache server would check whether the HTTP request was for the base server (the server without a vhost defined) or for a pre-configured vhost. If no vhost directive matched the request, it would look up a symlink for “www.hostname.com” or “hostname.com”, perform a readlink() to obtain the directory the hostname pointed to, then inject that into the document root of the request.
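
Stripped of Apache internals, the lookup boils down to something like this standalone sketch (again with a hypothetical /tmp/vhosts directory standing in for the real one):

#include <stdio.h>
#include <unistd.h>

/* Resolve a Host header to a document root via the symlink table. */
static int lookup_docroot(const char *host, char *out, size_t outlen)
{
    char path[1024];
    ssize_t n;

    snprintf(path, sizeof(path), "/tmp/vhosts/%s", host);
    n = readlink(path, out, outlen - 1);  /* read the value, don't follow it */
    if (n == -1)
        return -1;                        /* unknown domain */
    out[n] = '\0';                        /* readlink() does not terminate */
    return 0;
}

int main(void)
{
    char docroot[1024];

    if (lookup_docroot("www.example.com", docroot, sizeof(docroot)) == 0)
        printf("DocumentRoot: %s\n", docroot);
    return 0;
}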

We created a single base configuration file that was appropriate for all customer domains, then created a vhost for www.webcom.com, which served our static HTML, CGI services, and the /username redirects.
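
With the patch below applied, the per-domain blocks disappear entirely. A sketch of the resulting config, using the NameVirtualHostSymlinkDirectory directive the patch adds (the paths and address here are hypothetical):

# Requests to addresses with no <VirtualHost> are handled by the base
# server, where the patched core resolves the Host header via the
# symlink table.
NameVirtualHostSymlinkDirectory /tmp/vhosts

# The one explicit vhost: our own site, CGI services, and the
# /username redirects, on its own address.
<VirtualHost 192.168.1.20>
    ServerName www.webcom.com
    DocumentRoot /export/www/webcom
</VirtualHost>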

By using the tmpfs as a fast, lightweight, OS-arbitrated flat database, we could host 70,000 (and more) domains on a single instance of Apache. It was FAST, so fast that we removed the entire proxy layer and just pointed all customer traffic at the main server. The capacity of Apache running on a 12 processor Enterprise 4000 was easily 6-10x that of Netscape Enterprise Server.

Around the time I made these changes to Apache, I started having practical and philosophical concerns about taking Open Source software and modifying it for commercial use without contributing those changes back to the upstream project. The real practical business problem with that model is that every time a new Apache release came out, I had to port my patches to the new release; I had to do this a couple of times due to Apache CVEs. I also surmised that we could have set the standard had we released our code, instead of someone else doing it later. Someone else did do it later.

It’s been over 20 years since I wrote this code; it’s time for it to make its way into the world, even if it has no practical value anymore.

diff -r -u apache_1.3.6/src/include/httpd.h apache_1.3.6_webcom/src/include/httpd.h
--- apache_1.3.6/src/include/httpd.h	2002-07-25 01:38:03.000000000 -0700
+++ apache_1.3.6_webcom/src/include/httpd.h	2002-07-25 01:42:00.000000000 -0700
@@ -306,7 +306,7 @@
  * the overhead.
  */
 #ifndef HARD_SERVER_LIMIT
-#define HARD_SERVER_LIMIT 256
+#define HARD_SERVER_LIMIT 397
 #endif
 
 /*
@@ -894,6 +894,8 @@
     int limit_req_line;      /* limit on size of the HTTP request line    */
     int limit_req_fieldsize; /* limit on size of any request header field */
     int limit_req_fields;    /* limit on number of request header fields  */
+
+    char *namevirtual_symlink_dir;	/* pedward - virtualhost hack */
 };
 
 /* These are more like real hosts than virtual hosts */
diff -r -u apache_1.3.6/src/main/http_config.c apache_1.3.6_webcom/src/main/http_config.c
--- apache_1.3.6/src/main/http_config.c	2002-07-25 01:38:05.000000000 -0700
+++ apache_1.3.6_webcom/src/main/http_config.c	2002-07-25 01:39:47.000000000 -0700
@@ -1325,6 +1325,8 @@
     s->limit_req_fieldsize = main_server->limit_req_fieldsize;
     s->limit_req_fields = main_server->limit_req_fields;
 
+/* pedward - virtualhost hack */
+    s->namevirtual_symlink_dir = NULL;
     *ps = s;
 
     return ap_parse_vhost_addrs(p, hostname, s);
diff -r -u apache_1.3.6/src/main/http_core.c apache_1.3.6_webcom/src/main/http_core.c
--- apache_1.3.6/src/main/http_core.c	2002-07-25 01:38:06.000000000 -0700
+++ apache_1.3.6_webcom/src/main/http_core.c	2002-07-25 01:42:04.000000000 -0700
@@ -494,7 +494,26 @@
 API_EXPORT(const char *) ap_document_root(request_rec *r) /* Don't use this! */
 {
     core_server_config *conf;
+	char	*p;
 
+/* pedward - virtualhost hack */
+    if (r->server->namevirtual_symlink_dir && r->connection->server == r->connection->base_server && r->hostname) {
+	char	link[1024];
+	int	i;
+
+        p = ap_pstrcat(r->pool, r->server->namevirtual_symlink_dir, "/", r->hostname, NULL);
+
+	if ((i=readlink(p, link, sizeof(link))) != -1) {
+		link[i]='\0';
+	} else {
+		return p;
+	}
+
+	p = ap_pstrdup(r->pool, link);
+
+        return p;
+    }
+	
     conf = (core_server_config *)ap_get_module_config(r->server->module_config,
 						      &core_module); 
     return conf->ap_document_root;
@@ -672,6 +691,11 @@
 {
     core_dir_config *d;
 
+/* pedward - virtualhost hack */
+    if (r->server->namevirtual_symlink_dir && r->connection->server == r->connection->base_server) {
+	return r->hostname;
+    }
+
     d = (core_dir_config *)ap_get_module_config(r->per_dir_config,
 						&core_module);
     if (d->use_canonical_name & 1) {
@@ -2650,6 +2675,17 @@
 }
 #endif
 
+/* pedward - virtualhost hack */
+static const char *set_virtual_symlink_directory(cmd_parms *cmd, void *dummy, char *arg) 
+{
+    if (arg[strlen(arg)-1] == '/') {	/* strip a trailing slash */
+		arg[strlen(arg)-1]='\0';
+    }
+
+    cmd->server->namevirtual_symlink_dir = ap_pstrdup(cmd->pool, arg);
+    return NULL;
+}
+
 /* Note --- ErrorDocument will now work from .htaccess files.  
  * The AllowOverride of Fileinfo allows webmasters to turn it off
  */
@@ -2875,6 +2911,9 @@
   (void*)XtOffsetOf(core_dir_config, limit_req_body),
   OR_ALL, TAKE1,
   "Limit (in bytes) on maximum size of request message body" },
+/* pedward - virtualhost hack */
+{ "NameVirtualHostSymlinkDirectory", set_virtual_symlink_directory, NULL, RSRC_CONF, TAKE1,
+  "Set the namevirtual host symlink directory"},
 { NULL }
 };
 
@@ -2902,9 +2941,28 @@
 	&& (r->server->path[r->server->pathlen - 1] == '/'
 	    || r->uri[r->server->pathlen] == '/'
 	    || r->uri[r->server->pathlen] == '\0')) {
+
+/* pedward - virtualhost hack */
+    if (r->server->namevirtual_symlink_dir && r->connection->server == r->connection->base_server && r->hostname) {
+	char	link[1024];
+	int	i;
+	char	*p;
+
+        p = ap_pstrcat(r->pool, r->server->namevirtual_symlink_dir, "/", r->hostname, NULL);
+
+	if ((i=readlink(p, link, sizeof(link))) != -1) {
+		link[i]='\0';
+		p = link;
+	}
+
+        r->filename = ap_pstrcat(r->pool, p,
+				 (r->uri + r->server->pathlen), NULL);
+    } else {
         r->filename = ap_pstrcat(r->pool, conf->ap_document_root,
 				 (r->uri + r->server->pathlen), NULL);
     }
+
+    }
     else {
 	/*
          * Make sure that we do not mess up the translation by adding two
@@ -2917,8 +2975,25 @@
 				     NULL);
 	}
 	else {
-	    r->filename = ap_pstrcat(r->pool, conf->ap_document_root, r->uri,
-				     NULL);
+/* pedward - virtualhost hack */
+	    if (r->server->namevirtual_symlink_dir && r->connection->server == r->connection->base_server && r->hostname) {
+		char	link[1024];
+		int	i;
+		char	*p;
+
+		p = ap_pstrcat(r->pool, r->server->namevirtual_symlink_dir, "/", r->hostname, NULL);
+
+		if ((i=readlink(p, link, sizeof(link))) != -1) {
+			link[i]='\0';
+			p = link;
+		}
+
+                r->filename = ap_pstrcat(r->pool, p, r->uri,
+					     NULL);
+	    } else {
+                r->filename = ap_pstrcat(r->pool, conf->ap_document_root, r->uri,
+					     NULL);
+	    }
 	}
     }
 
diff -r -u apache_1.3.6/src/main/http_vhost.c apache_1.3.6_webcom/src/main/http_vhost.c
--- apache_1.3.6/src/main/http_vhost.c	2002-07-25 01:38:07.000000000 -0700
+++ apache_1.3.6_webcom/src/main/http_vhost.c	2002-07-25 01:39:48.000000000 -0700
@@ -665,6 +665,8 @@
     const char *hostname = r->hostname;
     char *host = ap_getword(r->pool, &hostname, ':');	/* get rid of port */
     size_t l;
+/* pedward */
+	char *p;
 
     /* trim a trailing . */
     l = strlen(host);
@@ -672,6 +674,12 @@
         host[l-1] = '\0';
     }
 
+	p = host;
+	while (*p) {
+		*p = tolower(*p);
+		p++;
+	}
+
     r->hostname = host;
 }
 
