Overview of the content filtering solution for OpenBSD - Richweb SGW config

Components used:

1. OpenBSD + pf firewall

2. Squid Web Proxy/Cache

3. DansGuardian

4. ClamAV anti-virus (also detects phishing, malware, and scams)

5. OpenDNS account (*)

6. Caching DNS server (bind9 or unbound)

7. Apache web server (bundled with OS)

8. ISC DHCP server (bundled with OS)

9. Richweb Network Management and monitoring NAGIOS plugins

10. SquidGuard extra rulesets

 

Items marked with a (*) require a commercial subscription as per usage terms.

 

System Requirements:

1. 1.6GHz or better CPU

2. RAM: 768MB for a small office (fewer than 10 users), 1GB for 10-50 users, 2GB suggested for 50-150 users, 4GB for 150+ users

3. Network interfaces: 1 NIC for application gateway mode, 2 NICs for transparent proxy mode (redirecting port 80)
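
The RAM tiers above can be written down as a small lookup. This is only a sketch: the function name is made up for illustration, and because the listed tiers overlap at exactly 50 and 150 users, the code assumes those boundary counts fall into the smaller tier.

```python
def suggested_ram_mb(users: int) -> int:
    """Return the suggested RAM in MB for a given number of users."""
    if users < 0:
        raise ValueError("users must be non-negative")
    if users < 10:
        return 768    # small office
    if users <= 50:   # assumption: exactly 50 users stays in the 1GB tier
        return 1024
    if users <= 150:  # assumption: exactly 150 users stays in the 2GB tier
        return 2048
    return 4096


for count in (5, 30, 120, 400):
    print(count, "users ->", suggested_ram_mb(count), "MB")
```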

 

Overview:

 

The proxy can be deployed as a standalone server, or as an in-line firewall through which all outbound internet access passes. The main difference is that in firewall mode the system can be configured to capture outbound port 80 (HTTP) traffic, so that users who attempt to evade the proxy will have a much harder time doing so.

 

The proxy has ClamAV scan all content downloaded via HTTP (web search results, redirects, web pages, etc.) for malware. All content is also matched against admin-controlled whitelists and blacklists, as well as against keywords and phrases (intelligent filtering). Depending on the score threshold below which DansGuardian passes content as safe, virtually all adult and non-business-appropriate sites can be blocked.

OpenDNS fits into the solution because it is needed to control sites visited over HTTPS. Since HTTPS (HTTP over SSL) negotiates a secure end-to-end channel between the web server and the web browser, the proxy cannot scan or block HTTPS-fetched content. However, every website browsed over HTTPS still requires a DNS lookup. By blocking at the DNS level (which OpenDNS is very good at), the https:// versions of inappropriate sites can be blocked as well.
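
To illustrate the idea of blocking at the DNS level, here is a toy resolver that answers with a block-page address for any name under a blocked domain. It is a simulation only and does not talk to OpenDNS; the function names, the upstream table, and the addresses (taken from the documentation-only ranges) are all assumptions made for the example.

```python
from typing import Dict, Optional, Set


def is_blocked(hostname: str, blocked_domains: Set[str]) -> bool:
    """True if the hostname or any parent domain of it is on the block list."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocked_domains
               for i in range(len(labels)))


def resolve(hostname: str, blocked_domains: Set[str],
            upstream: Dict[str, str], block_page_ip: str) -> Optional[str]:
    """Answer with the block page for blocked names, else the normal record."""
    if is_blocked(hostname, blocked_domains):
        return block_page_ip
    return upstream.get(hostname.lower().rstrip("."))


blocked = {"myspace.com"}
records = {"www.example.org": "192.0.2.10"}   # hypothetical upstream answers
print(resolve("www.myspace.com", blocked, records, "203.0.113.1"))
print(resolve("www.example.org", blocked, records, "203.0.113.1"))
```

Because the decision is made on the name alone, it applies equally to http:// and https:// requests.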

 

Sample Session:

1. The browser requests a URL, using the DansGuardian proxy.

2. DansGuardian checks the local blacklists to make sure the domain and URL are safe and not blocked. If you wanted to block myspace.com, for example, it can be added to a local file of banned sites, which is checked first.

3. DansGuardian connects to the Squid proxy and passes along the URL that was requested.

4. Squid resolves the DNS name in the URL to an IP address. If OpenDNS is in use, its blocked-domain redirects kick in here: Squid fetches the bad-domain or blocked-domain page from the OpenDNS web server instead.

Squid will be configured to use the local DNS cache on the machine, so the OpenDNS servers need to be queried only when the answer is not already cached.

If you have a local MS AD domain (say, corp.local) that hosts an intranet, you may find some of the solutions here useful for making the local AD domain resolve alongside the OpenDNS forwarding and local caching:

http://richweb.com/openbsd_remote_vpn_site_dns_active_directory_configs

 

The local DNS cache makes the system more responsive and lessens the load on OpenDNS.

5. Squid fetches the web content requested by the URL. Sub-assets (images, CSS, remote ads, etc.) go through the same path as the browser requests each of them.

6. All content is passed back to DansGuardian, which scans all textual content for bad phrases, etc., and blocks the main page (URL) requested if what is found on the page exceeds the naughtiness limit (score).

7. All content that passes the keyword filtering is handed off to ClamAV for virus scanning. Any content that fails the malware checks is dropped.

8. DansGuardian returns the content to the browser, which renders it. An included ad that was blocked would not block the entire web page; only that slot would be empty.

Ad servers send inline JavaScript to the browser, which in turn fetches the ad content. Ads that carry malware should therefore still be blocked, because the browser downloads inline ads through the same protected mechanism as the main web page.
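
The session above can be sketched as a single decision function. This is a simplified model, not DansGuardian's actual code: the fetch and virus-scan steps are passed in as plain callables standing in for Squid and ClamAV, the scoring is a bare weighted phrase count, and every name and weight is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set
from urllib.parse import urlsplit


@dataclass
class Verdict:
    allowed: bool
    reason: str


def filter_request(url: str,
                   banned_sites: Set[str],
                   fetch: Callable[[str], bytes],
                   phrase_weights: Dict[str, int],
                   score_limit: int,
                   virus_scan: Callable[[bytes], bool]) -> Verdict:
    host = (urlsplit(url).hostname or "").lower()

    # Step 2: the local banned-site list is checked first.
    if any(host == site or host.endswith("." + site) for site in banned_sites):
        return Verdict(False, "banned site")

    # Steps 3-5: hand the URL to the fetcher (Squid in the real setup).
    body = fetch(url)

    # Step 6: score the text against weighted phrases (lowercase keys).
    text = body.decode("utf-8", errors="ignore").lower()
    score = sum(weight * text.count(phrase)
                for phrase, weight in phrase_weights.items())
    if score > score_limit:
        return Verdict(False, f"score {score} exceeds limit {score_limit}")

    # Step 7: drop anything the virus scanner flags (ClamAV in the real setup).
    if virus_scan(body):
        return Verdict(False, "malware detected")

    # Step 8: the content goes back to the browser.
    return Verdict(True, "ok")
```

The order matters: the cheap local checks run before anything is fetched, and the virus scan only sees content that has already passed the phrase filter.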

 

Using speed tests to measure the proxy performance

The popular (and of limited use) web-based speed tests cannot measure network speed properly through the proxy. There are multiple stages of buffering, and content cannot be streamed directly to the browser: each piece of content (HTTP object) has to be fetched in full by the DansGuardian engine and scanned before it is allowed to reach the browser. In addition, if the web-based speed tester is a Flash or Java app, or a web app that has the browser (or code loaded by the browser) connect on a port that is not proxied, the content will not go through the proxy at all.

The best way to monitor the performance of the filtering system is with systat for system usage, and either bwm (bwm-ng package) or iftop for network and port/protocol usage.
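
As a complement to those tools (not a replacement for them), a rough client-side check is to time how long a whole object takes to arrive through the proxy, since that includes the fetch-in-full and scanning delay described above. The sketch below uses only the Python standard library; the function names are made up, and the proxy address and URL you pass in are up to you. Port 8080 is assumed as the proxy port.

```python
import time
import urllib.request


def build_proxy_opener(proxy_host: str, proxy_port: int = 8080):
    """Build a urllib opener that sends HTTP requests through the proxy."""
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def time_fetch(opener, url: str, timeout: float = 30.0):
    """Fetch url in full; return (bytes received, seconds elapsed)."""
    start = time.perf_counter()
    with opener.open(url, timeout=timeout) as response:
        size = len(response.read())
    return size, time.perf_counter() - start
```

Calling time_fetch(build_proxy_opener("<proxy ip>"), "<some http url>") a few times for the same object gives a feel for total delivery time as a user would experience it, rather than raw line speed.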

 

Setting Users to use the Proxy

There are several options here, and you will probably use all of them:

 

1. Manual - this is for testing only, or for networks with 1 or 2 stations. You won't want to manage proxy settings manually for more than 1 or 2 browsers. Set your proxy to the IP address of the firewall (inside IP) or the server IP (server mode), on port 8080.

 

2. Active Directory Group Policy - this can be done in AD on a group-by-group basis, though you will need a wpad file. Of course, this is only an option for sites with a domain controller / local file and print login server.

 

3. DHCP proxy setting - this works by telling the browser / OS via DHCP the URL of the WPAD (Web Proxy Auto-Discovery) file.

 

4. DNS+WPAD - proxy auto-detect, when enabled, works by having the browser request a wpad file at http://wpad.my_ad_domain.local/wpad.dat

For more information on the DHCP and WPAD options:
http://en.wikipedia.org/wiki/Web_Proxy_Autodiscovery_Protocol

 

5. Transparent Proxy - this is a last resort only. Note that transparent redirection can and will break websites that assume there is no proxy between the client and the web server. It is suggested that one of the first 4 options above be implemented, and that transparent proxy be used ONLY to capture users who attempt to evade the policy settings of the filters.
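
Options 2 through 4 in the list above all rely on a wpad.dat file, which is a small JavaScript file defining FindProxyForURL. The following is a minimal sketch of generating one; the helper name, the example proxy address, and the choice to send an intranet domain direct are assumptions, not part of the Richweb configuration.

```python
from typing import Optional


def render_wpad(proxy_host: str, proxy_port: int = 8080,
                local_domain: Optional[str] = None) -> str:
    """Return the text of a wpad.dat file pointing browsers at the proxy."""
    lines = ["function FindProxyForURL(url, host) {"]
    if local_domain:
        # Assumption: intranet hosts are fetched directly, not via the proxy.
        lines.append(f'    if (dnsDomainIs(host, ".{local_domain}")) {{')
        lines.append('        return "DIRECT";')
        lines.append("    }")
    # No DIRECT fallback here, so a browser cannot quietly skip the filter.
    lines.append(f'    return "PROXY {proxy_host}:{proxy_port}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


# 192.168.1.1 is a hypothetical inside IP for the firewall.
print(render_wpad("192.168.1.1", local_domain="corp.local"))
```

The resulting text would be saved as wpad.dat and served by the web server at the URL that DHCP or DNS points browsers to.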

Transparent redirect is set up in the pf.conf file. Again, it will only break (some) websites for end systems that are NOT aware they should be using a proxy. Enabling transparent proxy is therefore a good idea once you have the DHCP, DNS, and WPAD options in place, as it will catch end users who are trying to evade the content filter, as well as guests who plug in laptops that don't have your domain policy, for example.

It is also easy in pf to build a set of IPs that are excluded from transparent proxy. Servers that need to download updates and already have good local antivirus installed and maintained are good candidates for exclusion.
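
The exclusion logic itself is simple and can be modelled outside of pf. The sketch below is not pf configuration; it only shows the decision pf would be making, using Python's ipaddress module. The function name and the example addresses are hypothetical.

```python
import ipaddress
from typing import Iterable


def should_redirect(client_ip: str, excluded: Iterable[str]) -> bool:
    """True if port 80 traffic from client_ip should be sent to the proxy."""
    address = ipaddress.ip_address(client_ip)
    # Entries may be single hosts ("192.168.1.10") or networks ("192.168.2.0/24").
    networks = [ipaddress.ip_network(entry) for entry in excluded]
    return not any(address in network for network in networks)


excluded = ["192.168.1.10", "192.168.2.0/24"]   # hypothetical update servers
print(should_redirect("192.168.1.50", excluded))
print(should_redirect("192.168.1.10", excluded))
```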

 

Logging

DansGuardian keeps all of its logs in /var/log/dansguardian/access.log

Downloading the log file and parsing it with a product like CyFin Reporter is possible, and you can also use grep to search the logs for a particular site or end user.

http://www.wavecrest.net/support/cyfin/reporter/
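
If you prefer a script to grep, the same kind of search can be done in a few lines of Python. This sketch makes no assumption about the log's field layout; it is a plain case-insensitive substring match, and the function name is made up for illustration.

```python
from typing import Iterable, Iterator


def search_log(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Yield every log line containing needle (a site name or client IP)."""
    needle = needle.lower()
    for line in lines:
        if needle in line.lower():
            yield line.rstrip("\n")
```

To use it, open /var/log/dansguardian/access.log (with errors="replace" to tolerate odd bytes) and pass the file object as lines, with the site or user you are looking for as needle.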