Rabbit Holes and Time Sinks

This article is as much a piece of documentation as it is commentary. I recently decided to rejigger my home network after being quite comfortable with the current configuration for almost 7 years. The impetus was actually quite simple: one day I got paranoid when I realized how much damage could be done if someone compromised my personal account. I am reasonably careful and competent about how I run things, but the services I've added in the last year increase the attack surface of my home network considerably, and I would be foolish to ignore the added risk.

Rabbit holes can be interesting or frustrating distractions from a relatively direct plan or process. Sometimes those rabbit holes turn from distractions into time sinks. Getting my home network upgrade completed was filled with both. This isn't the first major upgrade I've been involved in: I've moved datacenters multiple times, deployed new services, and migrated services, but I've never had to completely duplicate all running services while also juggling a new firewall and a network renumbering.

Normally, when you plan upgrades for an office or enterprise, you stage them so that no single step is dramatic or carries a large probability of error. You might renumber your networks but keep all the services and hardware the same, or change mail servers but keep your networks and numbering the same. In my case I was renumbering the network, introducing a new firewall, building a new web/email/DNS server, and figuring out all of the SELinux permissions that needed to be adjusted for the new server to play nice and be secure.

I decided on an open source firewall designed to also run as a VM under KVM, my chosen hypervisor. Where the user interface lacks polish or intuitiveness, the wiki more or less makes up for it, and I can use my Linux-fu to manually adjust settings behind the scenes.

My chosen server environment is Fedora Server. I've been using Fedora since version 10, and I like the balance of stability, progress, and package availability. I also like that it has better hardware support than RHEL, where some of the hardware support decisions seem politicized, along with packaging and a lot of other things. I started using Red Hat with version 4.2 on a DEC Alpha Multia; it was the first Linux distro I personally purchased. We then deployed RH 5.0 on a bunch of machines at our company, and I was an RPM aficionado from there on. I can function in an APT world, but I find APT and dpkg to be a bit clunky in ways yum and RPM are not.

Moving all of my services to a new server meant not only copying over and merging configuration files from my existing system, but also relabeling lots of SELinux contexts on directories and files, and creating custom policies where relabeling couldn't solve a problem.

Here is a list of problems I encountered and how I solved them:

  • Halfway through the process I got VERY distracted by a rabbit hole in the form of Connection refused when attempting to connect to port 80 on any system outside the network. The firewall was not logging any denials and I could not get anything useful out of the dmesg/messages logs. Ultimately I ended up enabling iptables TRACE as the first rule in the raw table's PREROUTING chain, which then started logging curious matches. When I attempted to make an outgoing connection, a packet was being logged that was addressed to the firewall's inside address on port 3128. The port number rang a bell because the firewall has a squid proxy capability. This next part is to help anyone searching for Connection refused on port 80 when using the IPFire firewall: the squid proxy had crashed while transparent proxying was enabled, and transparent proxying redirects all outgoing port 80 connections to port 3128 on the firewall. I admit I overlooked the significance of the "Transparent proxy" feature, and there was ZERO context as to what it actually did.
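
    For anyone chasing the same symptom, the tracing setup looks roughly like this (a sketch; the TRACE target only works in the raw table, and the match here limits the noise to outbound HTTP):

        # Route TRACE output to the kernel log (newer kernels need the nf_log backend bound):
        modprobe nf_log_ipv4
        sysctl net.netfilter.nf_log.2=nf_log_ipv4
        # Trace HTTP traffic as it enters the firewall:
        iptables -t raw -A PREROUTING -p tcp --dport 80 -j TRACE
        # Each traced packet logs every rule it traverses, including NAT rewrites:
        dmesg --follow | grep 'TRACE:'
        # Tracing is extremely chatty; remove the rule when finished:
        iptables -t raw -D PREROUTING -p tcp --dport 80 -j TRACE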

The fact that a crashed squid proxy on the firewall will totally hose your outgoing HTTP connections is rather dumb in my opinion.

  • When I created the new server I had to decide between local storage and network storage for the files. Local storage has the advantage that it isn't dependent on the NAS and can operate when the NAS is down for maintenance. The disadvantage is that you are effectively duplicating data from elsewhere and not taking advantage of your NAS. Since I decided on local storage, I installed Fedora Server to the root volume group and added another 100GB for all of my data storage, being careful to use targeted and/or custom SELinux contexts. Per usual, when you have a big filesystem that you put your data on, you use symlinks to point the standard system locations like /var/www and /var/spool/mail at that filesystem. The frustrating and weird thing is that once I sorted out the contexts and SELinux custom policies, I wasn't getting any denials, but procmail couldn't deliver due to permissions issues. Setting SELinux to permissive proved the problem was within SELinux and not a filesystem permission. I stumbled across a debug mode for SELinux which causes it to log all denials, including those normally filtered out by dontaudit rules: semodule -DB

    Once I disabled filtering, the problem was staring me right in the face: procmail could not read the link because it pointed to a filesystem whose mount point was unlabeled. The directories WITHIN the filesystem had targeted labeling, but the mount point itself was not labeled. The typical solution of adding a label won't work here because you would effectively be creating an exception for that single context. The solution was to create a custom policy module (using the command from the setroubleshoot log) which allowed procmail_t to readlink the filesystem mount point. After confirming that all the services worked, I ran semodule -B to reset the logging to filtered mode.
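
    The command sequence, roughly (the module name here is one I made up; setroubleshoot and audit2allow print the exact commands for your particular denial):

        # Rebuild policy with dontaudit rules disabled so suppressed denials get logged:
        semodule -DB
        # Reproduce the failure, then turn the logged AVC denials into a local policy module:
        grep procmail /var/log/audit/audit.log | audit2allow -M procmail-readlink
        semodule -i procmail-readlink.pp
        # Re-enable the normal dontaudit filtering once everything works:
        semodule -B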

  • The last problem shouldn't happen, but does: clients on the network behave in dumb and nonsensical ways when you change IP addresses around. The previous router (my desktop computer) gave up its IP and assumed a new one, and this broke all sorts of things in very weird ways. My Android phone could reach my website in Chrome, but Firefox just produced an error, and Gmail could not fetch my mail. Windows 10 could see the new address and NetBIOS name for my desktop, but produced an error when you tried to browse the shares. In the case of the Android phone, the problem magically fixed itself by morning, and Gmail and Firefox worked. Windows 10 needed the good old three finger salute (a reboot), and everything was fine afterwards.
  • Somewhere in the process DNS stopped working consistently. Even though I had copied the entire config and zones from a working instance, it was very unpredictable and would work sometimes and not others. One curious observation was that BIND was complaining that the DNSSEC key in my root.zone file had expired. The file wasn't very old, but I seemed to be getting inconsistent lookups for .com domains. I replaced my root priming zone with the plain named.ca from the BIND package (shown below) and that seemed to resolve the problem. It works now, so I see no reason to mess with it; my DNS setup is somewhat complicated, with internal and external views, DHCP dynamic DNS updates, and DNSSEC signing, so "if it ain't broke, don't fix it" applies here.
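
    For reference, the stock priming setup is just a hint zone in named.conf (these are Fedora's default paths; adjust for your layout):

        // Prime the resolver with the root servers from the packaged hints file.
        zone "." IN {
            type hint;
            file "named.ca";    // shipped as /var/named/named.ca on Fedora
        };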

As of right now, everything seems to be working. The things that were surprisingly easy and without challenges were moving over the www tree and configurations, and recreating the Docker MySQL database. The www tree has 3 different PHP applications installed in it, so there are plenty of things that could go wrong. To port the MySQL database over, I simply recreated the same Docker network and container, then used the mysqldump backups from the night before to populate the schemas. Using Docker networks made the move simpler because I didn't need to change any application settings or user grants.
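
In practice the database move boiled down to something like this (a sketch; the network name, container name, password, and dump file are stand-ins for my own):

    # Recreate the user-defined network and container under the same names,
    # so applications keep resolving the database by its container name:
    docker network create appnet
    docker run -d --name db --network appnet \
        -e MYSQL_ROOT_PASSWORD=changeme mysql:8
    # Once the server finishes initializing, replay the nightly dump:
    docker exec -i db mysql -uroot -pchangeme < nightly-mysqldump.sql

Because the applications connect by container name on the same Docker network, the restored server looks identical to them, grants and all.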

The total prep time for setting up the new server, VM instances, and firewall was probably 12 hours. From the time I decided to cut over, it was a 6 or 7 hour marathon; I was head down and oblivious to the world for that entire period. When I looked up and reached the "it works well enough to go to bed" point, my head hurt in ways that felt unnatural. I spent another 2 or 3 hours tying up loose ends today, and now I'm documenting these things here so I can remember what I had to do.
