Web Communications

Web Communications (WebCom) was a web hosting provider that started in Santa Cruz, California, in late 1994 and opened to the public in 1995. WebCom was the brainchild of Chris Schefler, a Cal State graduate who believed in freedom, communication, and ecology.

Chris started WebCom with co-founder Thomas Leavitt in a small, windowless office at 903 Pacific Ave, Suite 306 A. The building was informally named Geek Hall because it was the nexus of every internet-connected startup in Santa Cruz.

On July 27th, 1995, I met Chris and Thomas while I was canvassing tech businesses in Santa Cruz and handing out my resume. I remember entering Suite 306 B and immediately identifying Thomas as the person in charge: he fit the popular image of the long-bearded sysadmin, so I naturally went to him. It was not until Chris interjected that it became clear he was the Alpha. I was carrying a book titled Developing Your Own 32-bit Operating System; Chris took note and pulled me aside into the other room to chat for a good while. He asked me to return the following day for a proper interview, after which I was offered the position of “Customer Service, Accounting, and Tech Support Representative,” and I began work on July 31st, 1995.

Prelude to an Industry

WebCom had about 750 customers when I started; by the end of the year it had over 900, and we relocated to 125 Water St, Suite A. During the short time we were in the 903 Pacific office, I remember Chris having discussions with the HTTP Working Group about the HTTP/1.1 protocol. Chris proposed adding a new header to the protocol called Host, whose purpose would be to tell the web server which host the client was requesting. Before the Host header existed, a server whose IP address was shared by several domain names had no way to tell which of those domains the client wanted. This is why you would see URLs like http://www.webcom.com/~gumbo/index.html instead of http://www.gumbopages.com (Chuck Taggart was an early WebCom customer with the username gumbo; so was Ernst & Young, with the predictable username ey). Early on we had created a plugin for Netscape Enterprise Server that dispensed with the tilde, so customers could have a URL like http://www.gumbopages.com/gumbo/index.html, but that was a half-measure that didn’t meet the needs of the burgeoning Internet.

Prior to the HTTP Host header it was commonplace to allocate 1 IP address per domain name. The web server would then perform a reverse DNS lookup of the IP address that received the GET request to determine what virtual host the client requested.
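To make the IP-per-domain scheme concrete, here is a minimal C sketch of the table variant (the email below mentions both a lookup table and a reverse DNS lookup as ways to do this). The addresses, paths, and the docroot_for_local_ip() name are hypothetical; a real server would get the local address from getsockname() on the accepted socket rather than receive it as a string.

```c
#include <stddef.h>
#include <string.h>

/* One entry per site: the local IP address the request arrived on and the
 * document root served for it. All values here are made up. */
struct ip_vhost {
    const char *local_ip;
    const char *docroot;
};

static const struct ip_vhost ip_table[] = {
    { "192.0.2.10", "/www/webcom" },
    { "192.0.2.11", "/www/gumbo" },
    { "192.0.2.12", "/www/ey" },
};

/* Return the document root bound to the address the connection was accepted
 * on, or NULL if no site uses it. Every site costs one IP address. */
const char *docroot_for_local_ip(const char *local_ip)
{
    size_t n = sizeof ip_table / sizeof ip_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ip_table[i].local_ip, local_ip) == 0)
            return ip_table[i].docroot;
    }
    return NULL;
}
```

The scaling problem Chris describes falls straight out of this design: adding a customer means adding both a table row and another address on the network interface.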

Below is the email Chris submitted to the working group. This request would lay the foundation for EVERY hosting company that exists today. You wouldn’t have your Wix.com, or Squarespace, or GoDaddy, or … Web.com.

From: Chris Schefler <css@webcom.com>
Date: Wed, 20 Sep 95 16:50:06 PDT
Message-Id: <199509202341.AA031740509@hplb.hpl.hp.com>
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Subject: domain-name?

Currently, the client does not seem pass the entire URL to the
server.  Although a strict reading of the spec seems to indicate
that the full URL (http://domain.name/path/to/file.html) is legal,
in practice only everything after the domain name is passed in
the GET request.

Questions:  Is there anything in the 1.1 spec for passing the full
URL, or, alternatively, passing the domain name as a header?
Or am I correct in my reading of the spec that passing the full
URL, including domain name, is legal in 1.0? (e.g., GET
http://www.domain.com/dir/welcome.html HTTP/1.0)?

It is important to be able to determine which domain name was
used in the URL in case a server answers to many domain names.

Since the client resolves the domain name to an IP address, and
only requests the part of the URL after the domain name, the
server can not know which domain name is being requested.  This
is necessary to support 'virtual hosting', in which a server appears
to be dedicated to many individual domain names, when in fact it
is shared among all the domain names, e.g.

http://www.foo.com/
and
http://www.bar.com/

Both resolve to the same server (same IP address), yet return
different home pages (one returns the foo home page, the other
returns the bar home page).

Typically, this is accomplished by assigning a different IP
address to each domain name, allowing the Web server to consult
a table or do a reverse DNS lookup to determine the domain name
and map to the appropriate home page.

However, this needlessly consumes IP addresses, requires an OS
which supports multiple IP addresses on the same network interface,
and has severe scalability problems (for instance, Solaris only
allows a maximum of 255 IP addresses per network interface).

If there is nothing currently in the works for the 1.1 spec,
(or now way to accomplish passing of the domain name within
the current revision of the protocol - 1.0), I would like to
know so we can submit a formal proposal.

Thank you.

Chris Schefler
--
Web Communications (sm)                 Chris Schefler
Voice: (408) 457-9671 x100              css@webcom.com

Web Communications Home Page <URL:http://www.webcom.com/>
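Chris's question hinges on the two shapes a request-URI can take: the absolute form (GET http://www.domain.com/dir/welcome.html HTTP/1.0) carries the domain, while the usual form (GET /dir/welcome.html HTTP/1.0) does not. The following rough C sketch shows how a server might recover the domain when a client does send the absolute form; the split_request_uri() name is hypothetical, and it assumes a lowercase "http://" scheme.

```c
#include <stddef.h>
#include <string.h>

/* Split a request-URI into host and path.
 * Absolute form ("http://www.foo.com/dir/x.html") yields host "www.foo.com"
 * and path "/dir/x.html"; origin form ("/dir/x.html") yields an empty host.
 * Returns the path, or NULL if the host does not fit in host_len bytes. */
const char *split_request_uri(const char *uri, char *host, size_t host_len)
{
    static const char scheme[] = "http://";
    size_t scheme_len = sizeof scheme - 1;

    if (host_len == 0)
        return NULL;
    host[0] = '\0';
    if (strncmp(uri, scheme, scheme_len) != 0)
        return uri;                      /* origin form: no host in the URI */

    const char *start = uri + scheme_len;
    size_t n = strcspn(start, ":/");     /* host ends at a port or the path */
    if (n >= host_len)
        return NULL;
    memcpy(host, start, n);
    host[n] = '\0';

    const char *path = strchr(start + n, '/');
    return path ? path : "/";            /* "http://www.foo.com" means "/" */
}
```

With the origin form the host comes back empty, which is exactly the situation the email describes: the server is left with nothing but the IP address to go on.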

Earning your stripes

Counter test image with X-Files font

Over the next 2 years I shed the Accounting, Customer Service, and Tech Support roles and became a System Software Engineer. My first programming project was a graphical hit counter. Those odometer counters on web pages were all the rage in 1996, but the 2 conventional solutions were CGI programs that consumed a lot of resources. At the time, forking a process for every web hit was extremely expensive and limited performance, so Chris wanted me to write an NSAPI module for the Netscape Enterprise Server instead. The final deliverable was a new counter plugin that used a Sybase database to store the counts and had about 20 GIF character sets statically included, so it could produce a single GIF image very quickly. The counter duties were handed off to a Linux box running Apache and Sybase 11.5, answering to abacus.webcom.com.
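Because the character sets were compiled into the plugin, producing a counter image is mostly a matter of copying glyphs into place, with no file access or process creation per hit. Below is a minimal C sketch of that composition step under stated assumptions: it uses a hypothetical 3x5 pixel font in place of the real GIF character sets, render_counter() is an illustrative name, and the GIF (LZW) encoding step is left out entirely.

```c
#include <stddef.h>
#include <stdio.h>

#define GLYPH_W 3
#define GLYPH_H 5

/* Hypothetical 3x5 digit font compiled into the binary (standing in for the
 * GIF character sets). Each row is a 3-bit mask; the high bit is the
 * leftmost pixel. */
static const unsigned char font[10][GLYPH_H] = {
    {7,5,5,5,7}, {2,6,2,2,7}, {7,1,7,4,7}, {7,1,7,1,7}, {5,5,7,1,1},
    {7,4,7,1,7}, {7,4,7,5,7}, {7,1,1,1,1}, {7,5,7,5,7}, {7,5,7,1,7},
};

/* Draw count as a zero-padded odometer of `digits` glyphs into pixels
 * (one byte per pixel, 1 = ink, row-major, width digits * GLYPH_W).
 * Returns the image width, or 0 if the count or the image does not fit. */
size_t render_counter(unsigned long count, int digits,
                      unsigned char *pixels, size_t pixels_len)
{
    char text[32];
    if (digits <= 0 || digits >= (int)sizeof text)
        return 0;
    int len = snprintf(text, sizeof text, "%0*lu", digits, count);
    if (len != digits)
        return 0;                           /* count has too many digits */
    size_t width = (size_t)digits * GLYPH_W;
    if (width * GLYPH_H > pixels_len)
        return 0;

    for (int d = 0; d < digits; d++) {
        const unsigned char *glyph = font[text[d] - '0'];
        for (int y = 0; y < GLYPH_H; y++)
            for (int x = 0; x < GLYPH_W; x++)
                pixels[(size_t)y * width + (size_t)d * GLYPH_W + (size_t)x] =
                    (unsigned char)((glyph[y] >> (GLYPH_W - 1 - x)) & 1);
    }
    return width;
}
```

A plugin working along these lines would then run the pixel buffer through a GIF encoder with a small palette and write the result straight to the client, all inside the server process.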

It was during the development of the counter plugin that I noticed the beta versions of Netscape Communicator were sending the Host: header in HTTP requests. I quickly informed Chris of this discovery, and he hacked together a virtual hosting plugin for the Netscape Enterprise Server. Thus was born name-based virtual hosting of websites. We used it to great effect, hosting 70,000 domains on a single IP address.
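Name-based virtual hosting boils down to reading the Host header and mapping its value to a document root. Below is a minimal C sketch of that idea, not the actual NSAPI plugin: extract_host(), docroot_for_host(), and the table entries are hypothetical, the request is assumed to be a raw buffer with CRLF line endings, and IPv6 literals such as [::1]:80 are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive prefix test; HTTP header names are not case-sensitive. */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Copy the Host header's value into out, lowercased and without ":port".
 * Returns 1 on success, 0 if the header is missing, empty, or too long. */
int extract_host(const char *request, char *out, size_t out_len)
{
    const char *p = strstr(request, "\r\n");     /* end of the request line */
    while (p != NULL && p[2] != '\0' && !(p[2] == '\r' && p[3] == '\n')) {
        p += 2;                                   /* start of a header line */
        if (starts_with_ci(p, "host:")) {
            p += 5;
            while (*p == ' ' || *p == '\t')
                p++;
            size_t n = strcspn(p, ":\r\n");       /* stop at a port or EOL */
            while (n > 0 && (p[n - 1] == ' ' || p[n - 1] == '\t'))
                n--;
            if (n == 0 || n >= out_len)
                return 0;
            for (size_t i = 0; i < n; i++)
                out[i] = (char)tolower((unsigned char)p[i]);
            out[n] = '\0';
            return 1;
        }
        p = strstr(p, "\r\n");                    /* move to the next line */
    }
    return 0;
}

struct name_vhost { const char *host; const char *docroot; };

/* Hypothetical table: many names, all served from one IP address. */
static const struct name_vhost name_table[] = {
    { "www.gumbopages.com", "/www/gumbo" },
    { "www.foo.com",        "/www/foo" },
    { "www.bar.com",        "/www/bar" },
};

const char *docroot_for_host(const char *host)
{
    for (size_t i = 0; i < sizeof name_table / sizeof name_table[0]; i++)
        if (strcmp(name_table[i].host, host) == 0)
            return name_table[i].docroot;
    return NULL;
}
```

A request without a Host header (as older HTTP/1.0 clients sent) makes extract_host() return 0, so a server built this way still needs a default site to fall back on.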

220 localhost WebCom SMTP(2.0 #249) PID 1337 ready

In 1996, we began an ambitious project to replace our troublesome email infrastructure with a home-brewed solution. We started out with Sendmail on an HP model 865, then transitioned to Netscape Mail Server (NMS) on a Sun Ultra. While NMS brought additional features and more convenient account management, it was dreadfully slow. We opted for a multipronged approach built over several phases. The first phase of WebCom SMTP (WSMTP) was an incoming listener proxy that could validate addresses and selectively proxy messages to either the incoming server (NMS) or the outgoing relays (Sendmail). This afforded us some breathing room, since NMS only needed to handle email being received by our customers.
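The address-validation decision in that first phase can be sketched roughly as follows. This is an illustration under assumptions, not WSMTP's code: the domain and mailbox tables, the classify_rcpt() name, the client_may_relay flag, and the expectation that addresses arrive already lowercased are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum route { ROUTE_REJECT, ROUTE_INCOMING, ROUTE_OUTGOING };

/* Hypothetical tables; a real system would look accounts up in a database. */
static const char *const local_domains[] = { "webcom.com", "gumbopages.com" };
static const char *const local_mailboxes[] = {
    "css@webcom.com", "gumbo@gumbopages.com",
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

static int in_list(const char *s, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(s, list[i]) == 0)
            return 1;
    return 0;
}

/* Decide what to do with one RCPT TO address (assumed lowercased):
 * local domains go to the incoming server only if the mailbox exists;
 * other domains go to the outgoing relays only for clients allowed to relay. */
enum route classify_rcpt(const char *addr, int client_may_relay)
{
    const char *at = strrchr(addr, '@');
    if (at == NULL || at == addr || at[1] == '\0')
        return ROUTE_REJECT;                  /* not a usable address */
    if (in_list(at + 1, local_domains, COUNT(local_domains)))
        return in_list(addr, local_mailboxes, COUNT(local_mailboxes))
                   ? ROUTE_INCOMING : ROUTE_REJECT;
    return client_may_relay ? ROUTE_OUTGOING : ROUTE_REJECT;
}
```

Rejecting unknown local mailboxes at this layer is what would keep junk away from NMS, which matches the goal of giving the slow server only mail it actually had to store.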

Development of WSMTP continued off and on over the next 2 years, finally going live in June 1998. WSMTP acted as a distributed orchestrator: it would natively receive email and handle list processing, auto-replies, and address expansion. It would then do one of three things: hand batches off to Sendmail, deliver directly to POP mailboxes, or deliver directly to Unix mail spools (internal email). The system was incredibly resilient and could scale to handle very large messages, high volumes, and surges of connections.

The WebCom mail server consisted of 6 major components:

- wsmtp: the listener
- sybase: a load-balanced group of Sybase 11.9 database servers that acted as transient stores for the message headers and bodies
- wroute: the message routing and processing engine
- wdeliver: the message compositor and delivery agent
- pop: the POP email platform
- relay: a group of load-balanced Sendmail relay servers that acted as smart delivery caches

The wsmtp layer had 2 Sun Ultra2 servers with 125 persistent listeners each, for a total of 250 simultaneous client connections. We had 2 additional Ultra2 machines that both ran wroute and wdeliver processes; a scaling algorithm kept 10-20 processes of each running. There were 4 Linux boxes running Sendmail for delivery caching, 2 load-balanced Linux POP servers, and 4 Linux servers running Sybase. The Sybase servers were upgraded over time to faster and better machines, mainly because of a day-one bug:

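if(queue_write_body(handle,mail_buffer,msg)==FALSE) {
  write_handle(handle,ERROR_552);   /* reject the message with an SMTP 552 reply */
  msg_error=1;
  goto done;                        /* jump to the common exit path */
}

...

done:

queue_close(handle,msg);            /* BUG: reached on failure as well as success */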

Quite literally, the bug was simply that queue_close(...) came after the done: label instead of before it. The error handling jumped to done: to signal a failure, but queue_close(...) should only have been called upon successful receipt of a message. As a result, hundreds of thousands of aborted messages were handed off to the routing layer with no possible outcome.
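One way to restructure the error path so that only a fully received message reaches the routing layer is sketched below. queue_abort() is a hypothetical counterpart for discarding a partial message, the other queue functions are stubbed so the example compiles on its own, and the int parameter types and ERROR_552 text are guesses since the original declarations aren't shown.

```c
#define FALSE 0
#define TRUE  1
#define ERROR_552 "552 Requested mail action aborted: exceeded storage allocation\r\n"

/* Hypothetical stand-ins for the queue API; only the call pattern matters.
 * The counters let us observe which path was taken. */
static int closes, aborts, fail_writes;

static int queue_write_body(int handle, const char *buf, int msg)
{ (void)handle; (void)buf; (void)msg; return fail_writes ? FALSE : TRUE; }
static void write_handle(int handle, const char *reply)
{ (void)handle; (void)reply; }
static void queue_close(int handle, int msg) { (void)handle; (void)msg; closes++; }
static void queue_abort(int handle, int msg) { (void)handle; (void)msg; aborts++; }

/* Returns 0 if the message was queued for routing, 1 if it was rejected. */
int receive_body(int handle, const char *mail_buffer, int msg)
{
    int msg_error = 0;

    if (queue_write_body(handle, mail_buffer, msg) == FALSE) {
        write_handle(handle, ERROR_552);
        msg_error = 1;
        goto done;
    }

    /* ... remaining receipt steps ... */

    queue_close(handle, msg);      /* success only: hand off to wroute */
done:
    if (msg_error)
        queue_abort(handle, msg);  /* discard the partial message instead */
    return msg_error;
}
```

The essential change is only that the shared done: exit no longer promotes a message; where exactly queue_close(...) belongs in the real code depends on the part elided by the "..." in the original snippet.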

While a message was being created, it was in state 0 in the database; once it was successfully received, the state was incremented to 1. The wroute program would fetch all messages in state 1, or the oldest message in state 2 that was over 1hr old. wroute would set the state to 2, and when it was done it would increment it to 3. wdeliver would pick up messages in state 3, or in state 4 if they were over 1hr old. When wdeliver was done, the state was incremented to 5, ready for deletion. If a failure occurred in one of the phases, you could reset all message states to the previous state and wroute/wdeliver would reprocess them. During periods of abuse, the failed messages would clog up the works and prevent legitimate messages from being processed.
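The pickup rules can be written as two small predicates. This C sketch assumes an age column measured in seconds since the last state change, uses illustrative enum names, and reads the 1hr qualifier as applying to state 4 by analogy with wroute's rule for state 2. It also leaves out the "oldest message only" ordering that wroute applied to stale state-2 rows.

```c
/* Message states as described above; the enum names are illustrative. */
enum msg_state {
    MSG_RECEIVING  = 0,  /* wsmtp is still writing the message */
    MSG_RECEIVED   = 1,  /* complete, waiting for wroute */
    MSG_ROUTING    = 2,  /* claimed by wroute */
    MSG_ROUTED     = 3,  /* waiting for wdeliver */
    MSG_DELIVERING = 4,  /* claimed by wdeliver */
    MSG_DONE       = 5   /* ready for deletion */
};

/* A claim older than an hour is presumed to belong to a process that died. */
#define STALE_SECS 3600L

/* age_secs: seconds since the message last changed state (assumed column). */
int wroute_wants(enum msg_state s, long age_secs)
{
    return s == MSG_RECEIVED || (s == MSG_ROUTING && age_secs > STALE_SECS);
}

int wdeliver_wants(enum msg_state s, long age_secs)
{
    return s == MSG_ROUTED || (s == MSG_DELIVERING && age_secs > STALE_SECS);
}
```

The stale-claim rule is what makes the scheme self-healing: a worker that crashes mid-message leaves its claim behind, and another worker picks it up once the hour has passed.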

I think this is a good point to wrap up the first installment of WebCom history. I will get back to that allusion about Web.com later…

EDIT: I just realized that this post was written on the eve of the anniversary of the sale of WebCom to Verio, another story for another post.

2 thoughts on “Web Communications”

Thomas Leavitt says:
April 17, 2020 at 6:09 pm
I remember the day you came in the door. As I recall, the last company you’d talked to was Radio Shack. When you started talking about finding a bug in Borland’s C compiler’s malloc() call, Chris and I looked at each other and said, “We’ve got to hire this guy.”
You’ll notice a couple of servers lined up on the floor next to the filing cabinet in the picture of our office at 306B. I believe one of those was our original HP PA-RISC E35 (which we bought off a failed Colorado ISP for $20,000), and another was our then Microsoft SQL server; at a later point, we transitioned to running Sybase (initially a pirated copy acquired from his former employer) on the E35, after transitioning our core webserver from that machine onto an HP E55 (also acquired, in a straight up trade for web hosting services, from his former employer). Not sure about the timing of each transition; I do know that you were the first to introduce a Linux box into our environment. Later on, we transitioned the webserver to a Sparc 1000e, and then subsequently to an E4000, at which point the database server was migrated to the Sparc 1000e, which I recall running at essentially 95% of capacity 100% of the time.
pedward says:
  April 17, 2020 at 6:13 pm
  Yeah, I built a dial-in server with a Linux box. Initially we got some of the latest USR “soft” modems, and they were a complete non-starter under Linux. I had nothing but trouble with those. Later we bought a couple of 8-head Cyclades cards and just hung a couple of Sportsters off those, just like EVERY ISP did until V.90/56k came along.
