Web Communications [WebCom] was a web hosting provider that started in Santa Cruz, California, in late 1994, opening to the public in 1995. WebCom was the brainchild of Chris Schefler, a Cal State graduate who believed in freedom, communication, and ecology.
Chris started WebCom with co-founder Thomas Leavitt in a small, windowless office at 903 Pacific Ave, Suite 306 A. This building was informally named Geek Hall because it was the nexus of every internet-connected Santa Cruz startup.
On July 27th, 1995, I met Chris and Thomas while canvassing tech businesses in Santa Cruz and handing out my resume. I remember entering Suite 306 B and immediately identifying Thomas as the person in charge; he fit the popular image of the long-bearded sysadmin, which naturally led me to him. It was not until Chris interjected that it became clear he was the Alpha. I was carrying a book entitled Developing Your Own 32-bit Operating System; Chris took note and pulled me aside into the other room to chat for a good while. He asked me to return the following day for a proper interview; I was offered the position of “Customer Service, Accounting, and Tech Support Representative” and began work on July 31st, 1995.
Prelude to an Industry
WebCom had about 750 customers when I started; by the end of the year it was over 900, and we relocated to 125 Water St, Suite A. During the short period we were in the 903 Pacific office, I remember Chris having discussions with the HTTP working group about the HTTP/1.1 protocol. Chris proposed adding a new header to the protocol called Host, the purpose of which would be to tell the web server which host the client was requesting. Prior to the addition of the Host header there was no way for a server to tell which site's content the client was requesting, which is why you would see URLs like http://www.webcom.com/~gumbo/index.html instead of http://www.gumbopages.com [Chuck Taggart was an early customer of WebCom and had the username gumbo, as was Ernst & Young, predictably with the username ey]. Early on we had created a plugin for Netscape Enterprise Server that dispensed with the tilde, so customers could have a URL like http://www.gumbopages.com/gumbo/index.html, but that was a half-measure that didn’t meet the needs of the burgeoning Internet.
Prior to the HTTP Host header it was commonplace to allocate one IP address per domain name. The web server would then look up the local IP address the request arrived on, either against a table or via a reverse DNS lookup, to determine which virtual host the client had requested.
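As a rough sketch of that pre-Host-header approach (this is purely illustrative, not WebCom's code; the addresses and table are made up), a server could call getsockname() on the accepted connection and map the local IP address to a document root:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-IP virtual host table: one IP address per domain. */
struct ip_vhost { const char *ip; const char *docroot; };
static const struct ip_vhost ip_vhosts[] = {
    { "192.0.2.10", "/home/foo/public_html" },   /* www.foo.com */
    { "192.0.2.11", "/home/bar/public_html" },   /* www.bar.com */
};

/* Pick the docroot by the local (server-side) IP address the
 * accepted connection arrived on. */
const char *docroot_for_connection(int client_fd)
{
    struct sockaddr_in local;
    socklen_t len = sizeof(local);
    char ip[INET_ADDRSTRLEN];

    if (getsockname(client_fd, (struct sockaddr *)&local, &len) < 0)
        return NULL;
    if (!inet_ntop(AF_INET, &local.sin_addr, ip, sizeof(ip)))
        return NULL;

    for (size_t i = 0; i < sizeof(ip_vhosts) / sizeof(ip_vhosts[0]); i++)
        if (strcmp(ip, ip_vhosts[i].ip) == 0)
            return ip_vhosts[i].docroot;
    return NULL;   /* no matching virtual host for this address */
}

Every new domain meant binding yet another address to the interface, which is exactly the scaling problem Chris calls out in the email below.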
Below you can see the email Chris submitted to the working group. This request would lay the foundation for EVERY hosting company that exists today. You wouldn’t have your Wix.com, or Squarespace, or GoDaddy, or … Web.com.
From: Chris Schefler <css@webcom.com>
Date: Wed, 20 Sep 95 16:50:06 PDT
Message-Id: <199509202341.AA031740509@hplb.hpl.hp.com>
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Subject: domain-name?

Currently, the client does not seem pass the entire URL to the server. Although a strict reading of the spec seems to indicate that the full URL (http://domain.name/path/to/file.html) is legal, in practice only everything after the domain name is passed in the GET request.

Questions: Is there anything in the 1.1 spec for passing the full URL, or, alternatively, passing the domain name as a header? Or am I correct in my reading of the spec that passing the full URL, including domain name, is legal in 1.0? (e.g., GET http://www.domain.com/dir/welcome.html HTTP/1.0)?

It is important to be able to determine which domain name was used in the URL in case a server answers to many domain names. Since the client resolves the domain name to an IP address, and only requests the part of the URL after the domain name, the server can not know which domain name is being requested. This is necessary to support 'virtual hosting', in which a server appears to be dedicated to many individual domain names, when in fact it is shared among all the domain names, e.g. http://www.foo.com/ and http://www.bar.com/ Both resolve to the same server (same IP address), yet return different home pages (one returns the foo home page, the other returns the bar home page).

Typically, this is accomplished by assigning a different IP address to each domain name, allowing the Web server to consult a table or do a reverse DNS lookup to determine the domain name and map to the appropriate home page. However, this needlessly consumes IP addresses, requires an OS which supports multiple IP addresses on the same network interface, and has severe scalability problems (for instance, Solaris only allows a maximum of 255 IP addresses per network interface).

If there is nothing currently in the works for the 1.1 spec, (or now way to accomplish passing of the domain name within the current revision of the protocol - 1.0), I would like to know so we can submit a formal proposal.

Thank you.

Chris Schefler

--
Web Communications (sm)
Chris Schefler
Voice: (408) 457-9671 x100
css@webcom.com
Web Communications Home Page <URL:http://www.webcom.com/>
Earning your stripes
Over the next 2 years I shed the Accounting, Customer Service, and Tech Support roles and became a System Software Engineer. My first programming project was a graphical hit counter. Those odometer counters on web pages were all the rage in 1996, but the two conventional solutions were CGI programs that consumed quite a bit of resources. At the time, forking a process for each web hit was extremely expensive and performance-limiting, so Chris wanted me to write an NSAPI module for the Netscape Enterprise Server. The final deliverable was a new counter plugin that used a Sybase database for count storage and had about 20 GIF character sets statically included, so it could produce a single GIF image very quickly. The counter duties were handed off to a Linux box running Apache and Sybase 11.5, answering to abacus.webcom.com.
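The general shape of the counter was simple, even if the plumbing wasn't. Here is a rough sketch of the idea (the helpers counter_increment() and gif_encode() are hypothetical stand-ins for the Sybase and GIF-encoding layers, and the real module was wired into NSAPI entry points I'm not reproducing here):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GLYPH_W 16        /* width of one digit glyph, in pixels  */
#define GLYPH_H 24        /* height of one digit glyph            */
#define MAX_DIGITS 12

/* Pre-rendered 8-bit indexed glyphs for '0'..'9', compiled into the
 * plugin (one such table per character set in the real module). */
extern const uint8_t glyphs[10][GLYPH_H][GLYPH_W];

/* Hypothetical stand-ins for the database and GIF-encoding layers. */
long counter_increment(const char *counter_id);              /* assumption */
int  gif_encode(const uint8_t *pixels, int width, int height,
                uint8_t *out, size_t outlen);                 /* assumption */

/* Render the current count for counter_id as a single GIF image. */
int render_counter(const char *counter_id, uint8_t *gif, size_t giflen)
{
    char digits[MAX_DIGITS + 1];
    uint8_t frame[GLYPH_H * MAX_DIGITS * GLYPH_W];

    long count = counter_increment(counter_id);
    int ndigits = snprintf(digits, sizeof(digits), "%ld", count);
    if (ndigits < 1 || ndigits > MAX_DIGITS)
        return -1;
    int stride = ndigits * GLYPH_W;

    /* Copy each digit's glyph side by side into one frame buffer. */
    for (int d = 0; d < ndigits; d++) {
        int g = digits[d] - '0';
        for (int row = 0; row < GLYPH_H; row++)
            memcpy(frame + row * stride + d * GLYPH_W,
                   glyphs[g][row], GLYPH_W);
    }
    return gif_encode(frame, stride, GLYPH_H, gif, giflen);
}

Because the digit glyphs were compiled into the plugin, each hit cost roughly one database round trip and a few memcpy calls instead of forking a CGI process.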
It was during the development of the counter plugin that I noticed the beta versions of Netscape Communicator were sending the Host: header in HTTP requests. I quickly informed Chris of this discovery and he hacked together a virtual hosting plugin for the Netscape Enterprise Server. Thus was born name-based virtual hosting of websites. We used this to great effect, hosting 70,000 domains on a single IP address.
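A minimal sketch of the name-based approach (again illustrative, not the actual plugin; the parsing is simplified and the table is made up): read the Host: header out of the request and use it, rather than the IP address, to choose the document root.

#include <string.h>
#include <strings.h>

/* Hypothetical host-to-docroot table; one IP address serves them all. */
struct name_vhost { const char *host; const char *docroot; };
static const struct name_vhost name_vhosts[] = {
    { "www.gumbopages.com", "/home/gumbo/public_html" },
    { "www.webcom.com",     "/opt/webcom/htdocs" },
};

/* Pull the value of the Host: header out of a raw request buffer.
 * (Simplified: real parsing is case-insensitive and strips any :port.) */
static int parse_host(const char *request, char *out, size_t outlen)
{
    const char *p = strstr(request, "\r\nHost:");
    if (!p)
        return -1;               /* pre-HTTP/1.1 client: no Host header */
    p += 7;                      /* skip "\r\nHost:" */
    while (*p == ' ' || *p == '\t')
        p++;
    size_t n = strcspn(p, "\r\n");
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

const char *docroot_for_request(const char *request)
{
    char host[256];
    if (parse_host(request, host, sizeof(host)) != 0)
        return NULL;
    for (size_t i = 0; i < sizeof(name_vhosts) / sizeof(name_vhosts[0]); i++)
        if (strcasecmp(host, name_vhosts[i].host) == 0)
            return name_vhosts[i].docroot;
    return NULL;
}

With a table like that, adding a domain stops being an IP allocation problem and becomes a simple lookup, which is how 70,000 domains could sit behind a single address.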
220 localhost WebCom SMTP(2.0 #249) PID 1337 ready
In 1996, we began an ambitious project to replace our troublesome email infrastructure with a home-brewed solution. We started out with Sendmail on an HP model 865, then transitioned to Netscape Mail Server on a Sun Ultra. While NMS brought additional features and more convenient account management, it was dreadfully slow. We opted for a multipronged approach built over several phases. The first phase of WebCom SMTP was an incoming listener proxy which could validate addresses and selectively proxy messages between the incoming (NMS) and outgoing (Sendmail) relays. This approach afforded us some breathing room, since NMS only needed to handle email being received by our customers.
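In rough terms, the phase-one proxy's decision looked something like the sketch below (the predicates are hypothetical stand-ins for the checks it made against our customer and mailbox data; the real proxy spoke SMTP on both sides):

#include <stdbool.h>
#include <string.h>

/* Backends the listener proxy could hand a session to. */
enum backend { BACKEND_NMS, BACKEND_SENDMAIL, BACKEND_REJECT };

/* Assumed lookups against our customer and mailbox databases. */
bool domain_is_hosted(const char *domain);   /* assumption */
bool mailbox_exists(const char *rcpt);       /* assumption */

/* Decide where an incoming RCPT TO: address should go. */
enum backend route_rcpt(const char *rcpt)
{
    const char *at = strrchr(rcpt, '@');
    if (!at)
        return BACKEND_REJECT;

    if (domain_is_hosted(at + 1)) {
        /* Mail for one of our customers: validate the address,
         * then let NMS handle final delivery. */
        return mailbox_exists(rcpt) ? BACKEND_NMS : BACKEND_REJECT;
    }

    /* Everything else is outbound: hand it to the Sendmail relays. */
    return BACKEND_SENDMAIL;
}

Rejecting bad addresses at the front door is what kept NMS limited to mail that was actually destined for our customers.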
The development of WSMTP continued off and on over the next 2 years, finally going live in June 1998. WSMTP acted as a distributed orchestrator: it would natively receive email, handle list processing, auto-replies, and address expansion, and then either hand batches off to Sendmail, deliver directly to POP mailboxes, or deliver directly to Unix mail spools (internal email). The system was incredibly resilient and could scale to handle very large email messages, volumes, and surges of connections.
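A sketch of that fan-out, with hypothetical predicates standing in for WSMTP's real account lookups:

/* Possible destinations for a routed message. */
enum delivery {
    DELIVER_SENDMAIL,   /* batch off to the Sendmail relays        */
    DELIVER_POP,        /* drop straight into a customer POP box   */
    DELIVER_SPOOL,      /* write to a local Unix mail spool        */
};

/* Assumed lookups against the account database. */
int rcpt_is_local_unix_user(const char *rcpt);   /* assumption */
int rcpt_has_pop_mailbox(const char *rcpt);      /* assumption */

enum delivery choose_delivery(const char *rcpt)
{
    if (rcpt_is_local_unix_user(rcpt))
        return DELIVER_SPOOL;       /* internal email */
    if (rcpt_has_pop_mailbox(rcpt))
        return DELIVER_POP;
    return DELIVER_SENDMAIL;        /* everything else goes out in batches */
}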
The WebCom mail server consisted of six major components:

wsmtp    - the listener
sybase   - a load-balanced group of Sybase 11.9 database servers that acted as transient stores for message headers and bodies
wroute   - the message routing and processing engine
wdeliver - the message compositor and delivery agent
pop      - the POP email platform
relay    - a group of load-balanced Sendmail relay servers that acted as smart delivery caches
The wsmtp layer had 2 Sun Ultra2 servers with 125 persistent listeners each, for a total of 250 simultaneous client connections. We had 2 additional Ultra2 machines that ran both wroute and wdeliver processes; a scaling algorithm maintained 10-20 processes of each. There were 4 Linux boxes running Sendmail for delivery caching, 2 load-balanced Linux POP servers, and 4 Linux servers running Sybase. The Sybase servers were upgraded over time to faster and better machines, mainly due to a day-one bug:
if (queue_write_body(handle, mail_buffer, msg) == FALSE) {
    write_handle(handle, ERROR_552);
    msg_error = 1;
    goto done;
}
...
done:
    queue_close(handle, msg);
Quite literally, the bug was simply that queue_close(...) was after the done: label and not before it. The error handling jumped to done: to indicate a failure, but queue_close(...) should only have been called upon successful receipt of a message. The consequence of this bug was that hundreds of thousands of aborted messages were handed off to the routing layer with no possible outcome.
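The fix was to close and hand off the queue entry only on the success path; a corrected version of the fragment would look roughly like this (the rest of the function is elided, as in the original):

if (queue_write_body(handle, mail_buffer, msg) == FALSE) {
    write_handle(handle, ERROR_552);
    msg_error = 1;
    goto done;                 /* bail out without enqueuing the message */
}
...
/* Only successfully received messages get closed and handed
 * to the routing layer. */
queue_close(handle, msg);

done:
    ;                          /* shared exit path; an aborted entry never reaches wroute */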
When a message was in the creation phase it was state 0 in the database; once it was successfully received, the state was incremented to 1. The wroute program would fetch all messages in state 1, or the oldest message in state 2 that was over an hour old, setting the state to 2 while it worked and incrementing it to 3 when it was done. wdeliver would pick up messages in state 3, setting them to state 4, or the oldest message in state 4 that was over an hour old. When wdeliver was done, the state was incremented to 5, ready for deletion. If a failure occurred in one of the phases you could reset all message states to the previous state and wroute/wdeliver would reprocess them. During periods of abuse the failed messages would clog up the works and prevent legitimate messages from being processed.
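Reconstructing that lifecycle as a sketch (the state names, struct, and store_* helpers are my paraphrase of how wroute behaved against the Sybase transient store, not original source):

#include <time.h>

/* Message states as stored in the transient store. */
enum msg_state {
    ST_CREATING   = 0,   /* wsmtp is still receiving the message     */
    ST_RECEIVED   = 1,   /* fully received, waiting for wroute       */
    ST_ROUTING    = 2,   /* claimed by a wroute process              */
    ST_ROUTED     = 3,   /* routing done, waiting for wdeliver       */
    ST_DELIVERING = 4,   /* claimed by a wdeliver process            */
    ST_DONE       = 5,   /* delivered, ready for deletion            */
};

#define STALE_SECS 3600  /* a claim older than an hour is presumed dead */

struct msg { long id; enum msg_state state; time_t claimed_at; };

/* Assumed accessors over the store: claim one message that is either in
 * state `from`, or stuck in state `to` since before `stale_before`,
 * atomically setting it to `to`. */
int store_claim(enum msg_state from, enum msg_state to,
                time_t stale_before, struct msg *out);   /* assumption */
int store_set_state(long id, enum msg_state to);         /* assumption */

/* wroute's main loop, in miniature. */
void wroute_step(void)
{
    struct msg m;
    time_t stale_before = time(NULL) - STALE_SECS;

    if (store_claim(ST_RECEIVED, ST_ROUTING, stale_before, &m) != 0)
        return;   /* nothing to do */

    /* ... list expansion, auto-replies, address rewriting ... */

    store_set_state(m.id, ST_ROUTED);   /* hand off to wdeliver */
}

wdeliver did the analogous 3 to 4 to 5 walk, and recovery after a crash was just a matter of resetting states; the trouble was that aborted messages with no possible outcome kept cycling through this loop and crowding out legitimate mail.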
I think this is a good point to wrap up the first installment of WebCom history; I will get back to that allusion to Web.com later…
EDIT: I just realized that this post was written on the eve of the anniversary of the sale of WebCom to Verio; that is another story for another post.
I remember the day you came in the door. As I recall, the last company you’d talked to was Radio Shack. When you started talking about finding a bug in the Borland C compiler’s malloc() call, Chris and I looked at each other and said, “We’ve got to hire this guy.”
You’ll notice a couple of servers lined up on the floor next to the filing cabinet in the picture of our office at 306B. I believe one of those was our original HP PA-RISC E35 (which we bought off a failed Colorado ISP for $20,000), and another was our then-current Microsoft SQL server; at a later point, we transitioned to running Sybase (initially a pirated copy acquired from his former employer) on the E35, after transitioning our core webserver from that machine onto an HP E55 (also acquired, in a straight-up trade for web hosting services, from his former employer). Not sure about the timing of each transition; I do know that you were the first to introduce a Linux box into our environment. Later on, we transitioned the webserver to a Sparc 1000e, and then subsequently to an E4000, at which point the database server was migrated to the Sparc 1000e, which I recall running at essentially 95% of capacity 100% of the time.
Yeah, I built a dial-in server with a Linux box. Initially we got some of the latest USR “soft” modems and they were a complete non-starter under Linux; I had nothing but trouble with those. Later we bought a couple of 8-head Cyclades cards and just hung a couple of Sportsters off those, just like EVERY ISP did until v.90/56k came along.