Last weekend marked the 17th edition of Zanzilan's winter LAN party, which I help organize and which drew around 70 participants.

In previous editions I used CentOS and Puppet to manage the gateway infrastructure and the services running in LXC containers; you can find more information about that in my presentation.

The desire to evaluate pfSense came from the increasing burden of maintaining the Puppet code (3.7), the limited availability of knowledgeable people who could step in if something went wrong while I was K.O. for some reason, and the difficulty of finding time to prepare and test everything.

The main focus is delivering the service, not configuration management for the sake of configuration management.


What is pfSense?

pfSense is an open source, BSD-based firewall/router solution with a lot of advanced features, suitable for anything from a small lab environment to a huge enterprise network.

In the context of the LAN party, these are the key features I was looking for:

  • Robust, easy to manage firewall rules
  • Multi-WAN support
  • VLANs
  • DNS Resolver
  • Traffic Shaping
  • Adequate monitoring
  • DHCP (with hostname registration in DNS)
  • Domain based blacklisting (Squid+SquidGuard)

pfSense is well established, although I feel the need to also mention its fork, OPNsense, as an alternative. From a superficial point of view both have pretty much the same capabilities (it is, in fact, a fork); the biggest differences are mostly sugar coating (UI), the community and some philosophical differences which led to the fork.


The Setup

                                           pfSense Gateway
             +-------------------------------------------+
             |                                           |
             |                   Lagg0                   |
             |         +-----------------------+         |
             |         |                       |         |
             |   +---+ | +---+   +---+   +---+ |         |
             +---+   +---+   +---+   +---+   +-----------+
WAN              +-+-+ | +---+   +---+   +---+ |
Down 500Mbps       |   |                       |
Up    40Mbps <-----+   +-----------+-----------+
                                   |
                 LAN_VLAN 20 +-----------+ 10 MGMT_VLAN
                                   |
                WIFI_VLAN 30 +-----+-----+  6 CATERING_VLAN

The current server in use has 4 network interfaces. One is used for the WAN; the others are combined into a LAGG so traffic can be aggregated to a switch (stack).

Inside the LAGG, 4 VLANs are configured to separate traffic, which allows separate firewall rules, priority rules and domain-level blocking (SquidGuard) per VLAN.

DNS and DHCP are defined on each VLAN to close the loop.
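
To make the per-VLAN layout more concrete, here is a minimal sketch of how an addressing plan for the four VLANs could be written down. The VLAN names and IDs come from the diagram above; the 10.0.0.0/16 supernet, the "third octet equals VLAN ID" convention and the DHCP range are assumptions made for illustration, not the addressing we actually used.

```python
import ipaddress

# VLAN names and IDs are taken from the diagram; everything else below
# (supernet, /24 per VLAN, DHCP range) is an assumption for this sketch.
VLANS = {"MGMT_VLAN": 10, "LAN_VLAN": 20, "WIFI_VLAN": 30, "CATERING_VLAN": 6}
SUPERNET = ipaddress.ip_network("10.0.0.0/16")


def plan_vlan(vlan_id: int) -> dict:
    """Return the subnet, gateway and DHCP range for one VLAN."""
    # Put the VLAN ID in the third octet: VLAN 20 -> 10.0.20.0/24.
    base = int(SUPERNET.network_address) + (vlan_id << 8)
    subnet = ipaddress.ip_network((base, 24))
    hosts = list(subnet.hosts())  # .1 through .254
    return {
        "subnet": subnet,
        "gateway": hosts[0],      # gateway address, also serving DNS and DHCP
        "dhcp_start": hosts[99],  # .100
        "dhcp_end": hosts[-1],    # .254
    }


PLAN = {name: plan_vlan(vlan_id) for name, vlan_id in VLANS.items()}

if __name__ == "__main__":
    for name, entry in PLAN.items():
        print(name, entry["subnet"], entry["gateway"],
              entry["dhcp_start"], entry["dhcp_end"])
```

Keeping such a plan in one place makes it easier to spot a VLAN that is missing a DHCP scope or a DNS binding.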

Squid is installed and runs in Transparent Mode, so no client-side configuration is required. Since we are not interested in the actual data (for privacy reasons), we Splice All the connections: Squid does not tamper with the certificates and just tunnels the communication between the client and the server it is trying to connect to.


Findings

Here's a list of things I've learned running this setup over the course of 3 days.

DNS not working in a particular VLAN

The CATERING_VLAN wasn't able to resolve DNS requests. This VLAN was created last, and we forgot to add it to the list of interfaces the DNS server had to bind to.
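
A quick check like the following, run from a client inside each VLAN against that VLAN's gateway, would have caught this before the doors opened. It is a minimal sketch using only the Python standard library; the function names and the default test hostname are my own choices for illustration, and the gateway address in the usage note is hypothetical.

```python
import socket
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def resolver_answers(server_ip: str, hostname: str = "example.com",
                     timeout: float = 2.0) -> bool:
    """Return True if server_ip replies with a DNS response matching our ID."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    # Same ID as the query and the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

For example, `resolver_answers("10.0.6.1")` on a client in the CATERING_VLAN (assuming 10.0.6.1 is that VLAN's gateway) returns False when the resolver is not bound to that interface.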

Specific game unable to get online

One particular game, Rocket League, was not able to get online. I had to install a tool from Microsoft called Network Monitor, which allowed me to see which connections a particular process was making on the client. I managed to pinpoint the game trying to connect to a specific IP address that was apparently blocked by SquidGuard. After some further digging, it turned out that this IP was blocked by the Warez blocklist category we were using from Shalla's Blacklist. We decided to disable the Warez category, and everything worked fine.
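
The "further digging" part can be scripted. The sketch below looks up which blocklist categories contain a given IP address or domain. It assumes the blacklist has been unpacked into one directory per category, each holding a plain-text `domains` file with one entry per line; nested sub-categories are ignored, and both function names are made up for this example.

```python
from pathlib import Path


def load_categories(root: Path) -> dict[str, set[str]]:
    """Read every <category>/domains file directly below root into a set."""
    categories: dict[str, set[str]] = {}
    for domains_file in root.glob("*/domains"):
        with domains_file.open(encoding="utf-8", errors="replace") as handle:
            entries = {line.strip().lower() for line in handle if line.strip()}
        categories[domains_file.parent.name] = entries
    return categories


def matching_categories(target: str, categories: dict[str, set[str]]) -> list[str]:
    """Return the sorted categories listing target or one of its parent domains."""
    target = target.strip().lower().rstrip(".")
    candidates = {target}
    # An IP address only matches exactly; a hostname also matches its
    # parent domains (but never the bare top-level domain).
    if not target.replace(".", "").isdigit():
        labels = target.split(".")
        candidates.update(".".join(labels[i:]) for i in range(1, len(labels) - 1))
    return sorted(name for name, entries in categories.items()
                  if candidates & entries)
```

Calling `matching_categories("203.0.113.7", load_categories(Path("BL")))` with the offending address (the address and the `BL` directory here are placeholders) lists the categories to review before disabling anything.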

Packet loss on the WAN causes unavailability on the internal network

After some time, we realized that we were dealing with regular packet loss of between 5 and 15%. Whenever the loss went beyond 15%, the web interface and the traffic between VLANs were impacted as well. This initially didn't make sense, as you'd expect traffic between VLANs, which doesn't need the internet, to just work. The packet loss was traced back to a faulty coax cable installation done by the ISP's technician. After replacing the connector I did some more digging and found traces in the logs showing that pfSense reloaded its filters and state every time the loss on the WAN went beyond the 15% threshold. In a Multi-WAN scenario, state needs to be refreshed in order for connections to switch over. The workaround used here was to configure pfSense to consider the interface as always up. Alternatively, you can change the configuration to ignore gateway events.
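
To illustrate why a link hovering around the threshold is so disruptive, here is a small sketch of a sliding-window loss monitor. It is not pfSense's monitoring code; the class name, the window size and the way probes are recorded are assumptions, and only the 15% figure comes from what we observed.

```python
from collections import deque


class LossMonitor:
    """Track probe results in a sliding window and report the loss percentage."""

    def __init__(self, window: int = 60, threshold_pct: float = 15.0) -> None:
        # Only the most recent `window` probe results are kept.
        self.results: deque[bool] = deque(maxlen=window)
        self.threshold_pct = threshold_pct

    def record(self, replied: bool) -> None:
        """Store the outcome of one probe (True means a reply was received)."""
        self.results.append(replied)

    def loss_pct(self) -> float:
        """Percentage of probes in the window that got no reply."""
        if not self.results:
            return 0.0
        lost = sum(1 for ok in self.results if not ok)
        return 100.0 * lost / len(self.results)

    def gateway_down(self) -> bool:
        """True once the loss exceeds the configured threshold."""
        return self.loss_pct() > self.threshold_pct
```

With loss oscillating between 5 and 15%, a single extra lost probe is enough to push the window over the threshold and a single extra reply brings it back, so any action tied to that transition fires over and over.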

Limiter didn't do anything

We configured a limiter, but we didn't realize that it had to be applied to actual firewall rules in order to have any effect. In the end we removed the limiter and went with Traffic Shaping using the Wizard, which proved to be easy enough to use.
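
The mistake boils down to a limiter that no rule points to. A sanity check for that could look like the sketch below; the data model (a list of limiter names and rules as dictionaries with `limiter_in` / `limiter_out` keys) is entirely hypothetical and not how pfSense stores its configuration.

```python
def unused_limiters(limiters: list[str], rules: list[dict]) -> list[str]:
    """Return the limiter names that no firewall rule references."""
    referenced = set()
    for rule in rules:
        # "limiter_in" and "limiter_out" are hypothetical keys for this sketch.
        for key in ("limiter_in", "limiter_out"):
            if rule.get(key):
                referenced.add(rule[key])
    return [name for name in limiters if name not in referenced]


if __name__ == "__main__":
    rules = [
        {"interface": "LAN_VLAN", "action": "pass"},
        {"interface": "WIFI_VLAN", "action": "pass", "limiter_in": "wifi_down"},
    ]
    # "lan_down" is defined but never attached to a rule, so it does nothing.
    print(unused_limiters(["lan_down", "wifi_down"], rules))
```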


All in all I'm quite pleased with the results and pfSense certainly exceeded my expectations.

The next Zanzilan LAN party will take place during the summer; we expect around 170 participants, for which we'll be using 2-3 WAN connections.

That's gonna take a while, but I'll surely write another blog post about it with some new findings, and probably complement that with a talk and presentation at one of the conferences in Belgium around Systems Administration and whatnot :)