Varnish is a popular HTTP accelerator that can speed up web sites. Here is an example of how to set it up on an SLC6 box in order to test the responsiveness of the CERN Open Data portal.
Installation
The official Varnish packages for Scientific Linux 6 (a distribution that is binary compatible with CentOS6 and RHEL6) are outdated. To install the latest release of Varnish 4, one can use Varnish's own package repository:
sudo rpm --nosignature -i https://repo.varnish-cache.org/redhat/varnish-4.0.el6.rpm
sudo yum install -y varnish
Configuration
Let us configure Varnish so that it listens on port 80 and forwards traffic to our web application, which runs on port 8080 on the same machine.
First, let's configure the Varnish listening port:
sudo perl -pi -e 's,VARNISH_LISTEN_PORT=6081,VARNISH_LISTEN_PORT=80,g' /etc/sysconfig/varnish
and make it use more memory while we are at it:
sudo perl -pi -e 's,VARNISH_STORAGE_SIZE=256M,VARNISH_STORAGE_SIZE=512M,g' /etc/sysconfig/varnish
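These substitutions silently do nothing when the expected default value is absent from the file, so it can be worth verifying the result. The following is a minimal sketch, not part of the original setup: `check_setting` is a hypothetical helper, and the scratch file is only a stand-in for /etc/sysconfig/varnish so that the example runs anywhere.

```shell
# check_setting FILE KEY VALUE: succeed only if FILE contains the exact line KEY=VALUE.
check_setting() {
    grep -qxF -- "$2=$3" "$1"
}

# Demonstration on a scratch file (hypothetical stand-in for /etc/sysconfig/varnish)
# in which only the first of the two substitutions has taken effect.
scratch=$(mktemp)
printf '%s\n' 'VARNISH_LISTEN_PORT=80' 'VARNISH_STORAGE_SIZE=256M' > "$scratch"

check_setting "$scratch" VARNISH_LISTEN_PORT 80 && echo "listen port: updated"
check_setting "$scratch" VARNISH_STORAGE_SIZE 512M || echo "storage size: NOT updated"
```

On the real machine one would pass /etc/sysconfig/varnish as the first argument instead of the scratch file.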
The web application we are trying to accelerate, in this case the CERN Open Data test instance, runs on top of Apache. Let's make it listen on port 8080 only and on the incoming IP address 127.0.0.1 only:
sudo perl -pi -e 's,Listen 80,Listen 8080,g' /etc/httpd/conf/httpd.conf
sudo -u apache perl -pi -e 's,128.142.151.32:80,127.0.0.1:8080,g' /opt/open-data/.virtualenvs/opendata/var/invenio.base-instance/apache/invenio-apache-vhost.conf
After restarting Apache and Varnish:
sudo /etc/init.d/httpd restart
sudo /etc/init.d/varnish restart
we can check that the processes are listening where they should (netstat shows port 80 as http and port 8080 as webcache):
$ sudo netstat -lp | grep varnish
tcp 0 0 *:http *:* LISTEN 50690/varnishd
tcp 0 0 localhost:6082 *:* LISTEN 50685/varnishd
tcp 0 0 *:http *:* LISTEN 50690/varnishd
$ sudo netstat -lp | grep httpd
tcp 0 0 *:webcache *:* LISTEN 50592/httpd
unix 2 [ ACC ] STREAM LISTENING 5289212 50592/httpd /opt/open-data/.virtualenvs/opendata/var/run.50592.0.1.sock
and that a web client connecting from a laptop sees what it should:
$ curl -I http://opendata.cern.ch/
HTTP/1.1 200 OK
Date: Thu, 18 Sep 2014 19:35:08 GMT
Server: Apache
Content-Type: text/html; charset=utf-8
X-Varnish: 98611 3
Age: 99
Via: 1.1 varnish-v4
Content-Length: 8934
Connection: keep-alive
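In Varnish 4, an X-Varnish header that carries two IDs indicates that the response was served from the cache (the second ID belongs to the request that originally fetched the object), which is also why Age is non-zero here. A small sketch for checking this automatically follows; `is_cache_hit` is a hypothetical helper name, and the sample headers are taken from the response above.

```shell
# is_cache_hit: read HTTP response headers on stdin and succeed only if
# the X-Varnish header carries two IDs, i.e. the object came from the cache.
is_cache_hit() {
    tr -d '\r' | awk 'tolower($1) == "x-varnish:" { hit = (NF == 3) } END { exit !hit }'
}

# Demonstration with headers copied from the response above.
printf 'HTTP/1.1 200 OK\r\nX-Varnish: 98611 3\r\nAge: 99\r\n' | is_cache_hit && echo "cache hit"

# Real use (needs network access):
#   curl -sI http://opendata.cern.ch/ | is_cache_hit && echo "cache hit"
```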
However, because Varnish acts as a proxy, the Apache log records all incoming requests as coming from 127.0.0.1:
$ tail /opt/open-data/.virtualenvs/opendata/var/log/apache.log
127.0.0.1 - - [18/Sep/2014:21:24:15 +0200] "GET /gen/almond.js?5127e506 HTTP/1.1" 304 - "http://opendata.cern.ch/" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0" 660
127.0.0.1 - - [18/Sep/2014:21:24:15 +0200] "GET /gen/invenio.js?8e21d7fc HTTP/1.1" 304 - "http://opendata.cern.ch/" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0" 606
127.0.0.1 - - [18/Sep/2014:21:24:15 +0200] "GET /gen/jquery.js?a6392293 HTTP/1.1" 304 - "http://opendata.cern.ch/" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0" 1019
Let's fix it.
mod_rpaf
The mod_rpaf ("reverse proxy add forward") module for Apache can help us here. However, it is not available for RHEL6 out of the box.
One could install a package built for CentOS 6, a binary-compatible distribution, and append the configuration options that are still missing:
sudo rpm -ivh ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/Apache:/Modules/CentOS_CentOS-6/x86_64/mod_rpaf-0.6-1.2.x86_64.rpm
for configoption in "LoadModule rpaf_module modules/mod_rpaf-2.0.so" \
                    "RPAFenable On" \
                    "RPAFsethostname On" \
                    "RPAFproxy_ips 127.0.0.1 ::1" \
                    "RPAFheader X-Forwarded-For"; do
    if ! grep -q "${configoption}" /etc/httpd/conf.d/mod_rpaf.conf; then
        echo "${configoption}" | sudo tee -a /etc/httpd/conf.d/mod_rpaf.conf
    fi
done
We can also easily compile it ourselves:
cd /tmp
wget http://www.stderr.net/apache/rpaf/download/mod_rpaf-0.6.tar.gz
tar xvfz mod_rpaf-0.6.tar.gz
cd mod_rpaf-0.6
sudo yum install -y httpd-devel
sudo apxs -i -c -n mod_rpaf-2.0.so mod_rpaf-2.0.c
# gives /usr/lib64/httpd/modules/mod_rpaf-2.0.so
Once available, let's configure mod_rpaf as follows:
$ sudo vim /etc/httpd/conf.d/mod_rpaf.conf
$ cat /etc/httpd/conf.d/mod_rpaf.conf
LoadModule rpaf_module modules/mod_rpaf-2.0.so
RPAFenable On
RPAFsethostname On
RPAFproxy_ips 127.0.0.1 ::1
RPAFheader X-Forwarded-For
After restarting Apache, we see the real client IP addresses in the Apache log:
86.209.237.81 - - [18/Sep/2014:21:59:27 +0200] "POST /results/83ce2e1d87cb0b8a190d34e69cba4786 HTTP/1.1" 200 43145 "http://opendata.cern.ch/search" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0" 490552
86.209.237.81 - - [18/Sep/2014:21:59:27 +0200] "GET /facet/collection/83ce2e1d87cb0b8a190d34e69cba4786?parent=CMS-Derived-Datasets HTTP/1.1" 200 13 "http://opendata.cern.ch/search" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0" 12762
86.209.237.81 - - [18/Sep/2014:21:59:27 +0200] "POST /results/83ce2e1d87cb0b8a190d34e69cba4786 HTTP/1.1" 200 43145 "http://opendata.cern.ch/search" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0" 968808
86.209.237.81 - - [18/Sep/2014:21:59:27 +0200] "POST /results/83ce2e1d87cb0b8a190d34e69cba4786 HTTP/1.1" 200 24569 "http://opendata.cern.ch/search" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0" 957070
86.209.237.81 - - [18/Sep/2014:21:59:27 +0200] "POST /results/83ce2e1d87cb0b8a190d34e69cba4786 HTTP/1.1" 200 24569 "http://opendata.cern.ch/search" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0" 4294376
All is well; the basic configuration is finished.
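To confirm this over a longer stretch of the log rather than by eye, one can tally requests per client address. The sketch below is only an illustration: `count_clients` is a hypothetical helper, and the sample lines are abbreviated versions of the log excerpts shown earlier. If 127.0.0.1 still dominates the tally after the change, mod_rpaf is probably not active.

```shell
# count_clients: read access-log lines on stdin and print "<requests> <client address>",
# busiest client first. The client address is the first field of each log line.
count_clients() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' | sort -rn
}

# Demonstration on abbreviated lines from the log excerpts above.
count_clients <<'EOF'
127.0.0.1 - - [18/Sep/2014:21:24:15 +0200] "GET /gen/almond.js?5127e506 HTTP/1.1" 304 -
86.209.237.81 - - [18/Sep/2014:21:59:27 +0200] "POST /results/83ce2e1d87cb0b8a190d34e69cba4786 HTTP/1.1" 200 43145
86.209.237.81 - - [18/Sep/2014:21:59:27 +0200] "GET /facet/collection/83ce2e1d87cb0b8a190d34e69cba4786?parent=CMS-Derived-Datasets HTTP/1.1" 200 13
EOF

# Real use:
#   count_clients < /opt/open-data/.virtualenvs/opendata/var/log/apache.log
```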
Massaging cookies
To profit from the web acceleration, we can configure Varnish to cache everything coming from the backend application except for the search pages, regardless of what the backend application says. This is safe because our application does not offer any user-specific functionality that would differentiate one guest user from another: every user is treated equally, there is no login, no restricted data, etc.
The Invenio back-end application currently does not handle Set-Cookie very nicely; see issue #2291. Let us therefore assume that we need to remove this header from all application responses, except for search pages.
(Actually, we could cache the search pages as well, but let's assume that we would like to amend cookies coming from the backend application only for certain URLs. This "advanced" configuration may be needed later in production.)
Let's start by cloning the default configuration:
sudo cp /etc/varnish/default.vcl /etc/varnish/opendata.vcl
sudo vim /etc/varnish/opendata.vcl
Let's introduce the following differences, where we basically unset req.http.Cookie and beresp.http.set-cookie for all pages except those under our wanted /search URL, and cache the cookie-free responses for one hour:
sudo diff -u /etc/varnish/default.vcl /etc/varnish/opendata.vcl
--- /etc/varnish/default.vcl 2014-06-24 11:40:31.000000000 +0200
+++ /etc/varnish/opendata.vcl 2014-09-18 21:30:20.940161421 +0200
@@ -23,6 +23,10 @@
     #
     # Typically you clean up the request here, removing cookies you don't need,
     # rewriting the request, etc.
+
+    if (!(req.url ~ "^/search")) {
+        unset req.http.Cookie;
+    }
 }
 
 sub vcl_backend_response {
@@ -30,6 +34,11 @@
     #
     # Here you clean the response headers, removing silly Set-Cookie headers
     # and other mistakes your backend does.
+
+    if (!(bereq.url ~ "^/search")) {
+        unset beresp.http.set-cookie;
+        set beresp.ttl = 1h;
+    }
 }
 
 sub vcl_deliver {
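To see which URLs the condition affects, the same regular expression can be tried outside Varnish. This is only an approximation of the VCL logic, written as a shell sketch: `strips_cookies` is a hypothetical helper, and the example paths are illustrative rather than taken from the portal. Note that `^/search` is a prefix match, so a path such as /searchengine would keep its cookies too.

```shell
# strips_cookies URL: succeed if the VCL condition !(url ~ "^/search") holds,
# i.e. if cookies would be removed (and the response cached for 1h) for this path.
strips_cookies() {
    ! printf '%s\n' "$1" | grep -qE '^/search'
}

# Illustrative paths only; they are assumptions, not URLs confirmed by the portal.
for url in / /record/1 /search '/search?p=higgs' /searchengine; do
    if strips_cookies "$url"; then
        echo "$url: cookies stripped"
    else
        echo "$url: cookies kept"
    fi
done
```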
We can activate the new configuration like this:
sudo perl -pi -e 's,VARNISH_VCL_CONF=/etc/varnish/default.vcl,VARNISH_VCL_CONF=/etc/varnish/opendata.vcl,g' /etc/sysconfig/varnish
Performance measurements
Let's measure the speed-up with the Apache ab tool. The old configuration gives:
laptop> ab -n 100 -c 5 http://opendata.cern.ch/
Requests per second: 32.38 [#/sec] (mean)
Restarting Varnish with the new configuration gives:
$ sudo service varnish restart
$ ab -n 100 -c 5 http://opendata.cern.ch/
Requests per second: 52.36 [#/sec] (mean)
We can serve 52 reqs/sec vs 32 reqs/sec. This does not seem like much of an increase, but the measurement was done over a slow ADSL line, which limits the throughput somewhat.
Here is the throughput comparison on the server itself, first through Varnish (port 80) and then directly against Apache (port 8080):
$ ab -n 100 -c 5 http://127.0.0.1:80/
Total transferred: 946000 bytes
HTML transferred: 923100 bytes
Requests per second: 2156.52 [#/sec] (mean)
Time per request: 2.319 [ms] (mean)
Time per request: 0.464 [ms] (mean, across all concurrent requests)
Transfer rate: 19922.54 [Kbytes/sec] received
$ ab -n 100 -c 5 http://127.0.0.1:8080/
Total transferred: 931641 bytes
HTML transferred: 909141 bytes
Requests per second: 72.28 [#/sec] (mean)
Time per request: 69.175 [ms] (mean)
Time per request: 13.835 [ms] (mean, across all concurrent requests)
Transfer rate: 657.61 [Kbytes/sec] received
We are much, much faster: about 2.2k reqs/sec vs 72 reqs/sec, i.e. roughly 30 times the throughput.
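The ratio between the two runs can be computed directly from the ab reports. The following sketch assumes the report format shown above; `rps` and `speedup` are hypothetical helper names introduced only for this illustration.

```shell
# rps: read ab output on stdin and print the mean requests-per-second figure.
rps() {
    awk '/^Requests per second:/ { print $4 }'
}

# speedup FAST SLOW: print FAST/SLOW rounded to one decimal place.
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

# Figures copied from the two local runs above (via Varnish and directly against Apache).
cached=$(echo 'Requests per second: 2156.52 [#/sec] (mean)' | rps)
direct=$(echo 'Requests per second: 72.28 [#/sec] (mean)' | rps)
speedup "$cached" "$direct"
```

With the figures above this prints 29.8. In practice one would pipe the output of each ab run into `rps` instead of echoing a copied line.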
Slashdot effect
Let's try to increase the number of client connections and observe response times when simulating 5 and 100 concurrent users:
laptop> ab -n 100 -c 5 http://opendata.cern.ch/
Requests per second: 52.36 [#/sec] (mean)
laptop> ab -n 1000 -c 100 http://opendata.cern.ch/
Requests per second: 57.78 [#/sec] (mean)
The cache can easily serve such increased traffic, because the pages are served from memory via an efficient event-driven model.
Note that a proper user scalability test would require distributed testing with some backend heat processes, e.g. via siege. However, we are interested here in a rule of thumb only.
Reboot-persistent configuration
To make Varnish start after a reboot, enable its service with chkconfig:
$ sudo chkconfig | grep http
httpd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
$ sudo chkconfig | grep varnish
varnish 0:off 1:off 2:off 3:off 4:off 5:off 6:off
varnishlog 0:off 1:off 2:off 3:off 4:off 5:off 6:off
varnishncsa 0:off 1:off 2:off 3:off 4:off 5:off 6:off
$ sudo chkconfig varnish on
$ sudo chkconfig | grep varnish
varnish 0:off 1:off 2:on 3:on 4:on 5:on 6:off
varnishlog 0:off 1:off 2:off 3:off 4:off 5:off 6:off
varnishncsa 0:off 1:off 2:off 3:off 4:off 5:off 6:off
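The same check can be scripted, for instance in a provisioning script. This is a minimal sketch that assumes the chkconfig output format shown above; `enabled_at` is a hypothetical helper name.

```shell
# enabled_at LEVEL: read one line of chkconfig output on stdin and succeed
# only if the service is switched on at runlevel LEVEL.
enabled_at() {
    grep -qE "(^|[[:space:]])$1:on([[:space:]]|\$)"
}

# Demonstration on a line copied from the output above.
echo 'varnish 0:off 1:off 2:on 3:on 4:on 5:on 6:off' | enabled_at 3 \
    && echo "varnish starts in runlevel 3"

# Real use:
#   chkconfig --list varnish | enabled_at 3 || sudo chkconfig varnish on
```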
Nicer error page
By default, the Varnish error page is not very user-friendly. For example, stop Apache and observe "Error 503 Backend fetch failed".
To make the error page simpler and to hide the Varnish server signature, we can define our own vcl_backend_error:
$ sudo vim /etc/varnish/opendata.vcl
$ sudo diff -u /etc/varnish/default.vcl /etc/varnish/opendata.vcl
[...]
+
+sub vcl_backend_error {
+    set beresp.http.Content-Type = "text/html; charset=utf-8";
+    set beresp.http.Retry-After = "5";
+    synthetic( {"<!DOCTYPE html>
+<html>
+  <head>
+    <title>"} + beresp.status + " " + beresp.reason + {"</title>
+  </head>
+  <body>
+    <h1>Error "} + beresp.status + " " + beresp.reason + {"</h1>
+  </body>
+</html>
+"} );
+    return(deliver);
+}
Logging
To enable logging of incoming queries on the Varnish level, do:
$ sudo /etc/init.d/varnishncsa start
$ cat /var/log/varnish/varnishncsa.log
128.141.95.173 - - [05/Nov/2014:13:09:15 +0100] "GET http://opendata.cern.ch/ HTTP/1.1" 200 11612 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0"
128.141.95.173 - - [05/Nov/2014:13:09:15 +0100] "GET http://opendata.cern.ch/gen/opendata.css?eb2f0489 HTTP/1.1" 200 0 "http://opendata.cern.ch/" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0"
128.141.95.173 - - [05/Nov/2014:13:09:15 +0100] "GET http://opendata.cern.ch/gen/invenio.css?56a680c2 HTTP/1.1" 200 0 "http://opendata.cern.ch/" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.1.0"
128.141.164.203 - - [05/Nov/2014:13:09:39 +0100] "GET http://opendata.cern.ch/ HTTP/1.1" 200 11597 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0"
Conclusions
Varnish is a widely used HTTP accelerator for web applications. Its use for the CERN Open Data portal seems perfectly plausible. One can relatively easily configure it to amend Set-Cookie for certain pages in the case of a (buggy) web application. The setup on the SLC6 platform seems stable under heavy load.