Archive for June, 2012

Nginx, Passenger and WSGI

by Oliver on Monday, June 25th, 2012.

It’s a little-known fact that Phusion Passenger, the awesome Rack webserver module, can competently talk WSGI as well as fit into the Rack ecosystem. This means that you can run not only your Rack and Rails applications through it using Nginx or Apache (or in fact independently, using a cut-down version of Nginx) but also your WSGI-compliant Python applications.

This has been touched on in various levels of detail on Hongli’s own blog, the official Phusion blog, and the Dreamhost wiki, among other sites. You can piece together a working configuration from them, but if you rely mainly on the Dreamhost instructions you don’t really get a good idea of a generic configuration that would work outside of their tuned environment.

I’ve got an over-utilised HTPC at home that ends up serving not just as an XBMC frontend but as a bunch of other things – IPv6 tunnel endpoint, VPN endpoint, IP traffic accounting system, monitoring station, wiki server, Mingle server and more. On 2GB of RAM this adds up to a significant load, and most of the webapps are either run as CGIs or proxied to via Apache. Since the wiki I’m running is MoinMoin, and I recently also found a way to run my CGIs through Passenger, it seemed natural to do away with the separate webservers and run as much through Passenger as possible. And since all the cool kids are running Nginx these days on the basis of its much smaller memory footprint (and obviously other factors), I thought I would move to that from Apache at the same time and claim back some memory.

What I’ll describe below is a very quick run-through of setting up MoinMoin for operation through Passenger and Nginx. It should hopefully be somewhat reusable (the main point of writing this post, since I couldn’t find a decent generic guide through Google so far) but there are some details of the operation which I haven’t investigated entirely yet so don’t hold it against me!

Install Nginx (via Passenger)

If you are not familiar with Nginx, or if you are coming from the Apache world you may be dismayed, overjoyed, surprised or annoyed that Nginx doesn’t yet support dynamically-loaded modules like Apache does, no doubt for performance reasons. This means any additional modules like Passenger require a full recompile of the Nginx binary. For that reason I didn’t see a big point in installing the Ubuntu package for Nginx at all and just used the Passenger installer which has an option to download and install Nginx with Passenger already enabled.

Firstly, just install the Passenger gem:

root@oneiric:~# gem install passenger
Fetching: fastthread-1.0.7.gem (100%)
Building native extensions.  This could take a while...
Fetching: daemon_controller-1.0.0.gem (100%)
Fetching: rack-1.4.1.gem (100%)
Fetching: passenger-3.0.13.gem (100%)
Successfully installed fastthread-1.0.7
Successfully installed daemon_controller-1.0.0
Successfully installed rack-1.4.1
Successfully installed passenger-3.0.13
4 gems installed
Installing ri documentation for fastthread-1.0.7...
Installing ri documentation for daemon_controller-1.0.0...
Installing ri documentation for rack-1.4.1...
Installing ri documentation for passenger-3.0.13...
Installing RDoc documentation for fastthread-1.0.7...
Installing RDoc documentation for daemon_controller-1.0.0...
Installing RDoc documentation for rack-1.4.1...
Installing RDoc documentation for passenger-3.0.13...

Now use Passenger itself to download and install Nginx with its own recommended settings (option 1, when you have to choose), rather than rebuilding any available version for your distribution. Handily, the installer will give you pretty accurate commands to use if you don’t have some pre-requisites installed like a C++ compiler, so it is pretty idiot-proof.

root@oneiric:~# passenger-install-nginx-module 
Welcome to the Phusion Passenger Nginx module installer, v3.0.13.

...LOTS of output...

I chose to install it in the suggested location, /opt/nginx. Assuming you got through the pre-requisites and Nginx compiled and installed successfully, it will also be somewhat configured for you. However, we are interested in running a WSGI application, so the standard Rack/Rails suggestions the installer provides are not entirely helpful. What is left is to set up the app, provide the bridge to WSGI and set up a location stanza in Nginx for it.

Install MoinMoin

At a former employer, we used MoinMoin wiki for internal (and later external) documentation and I’ve continued that by using it for personal documentation. It’s a very handy tool, especially being available from all our computers at home and remotely via VPN. It’s written in Python and works very well.

Installing is dead simple with setuptools:

root@oneiric:~# apt-get install -y python-setuptools
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  python-pkg-resources
Suggested packages:
  python-distribute python-distribute-doc
The following NEW packages will be installed:
  python-pkg-resources python-setuptools
0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
Need to get 274 kB of archives.
After this operation, 1,274 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu/ oneiric-updates/main python-pkg-resources all 0.6.16-1ubuntu0.1 [62.7 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu/ oneiric-updates/main python-setuptools all 0.6.16-1ubuntu0.1 [212 kB]
Fetched 274 kB in 1s (180 kB/s)            
Selecting previously deselected package python-pkg-resources.
(Reading database ... 54159 files and directories currently installed.)
Unpacking python-pkg-resources (from .../python-pkg-resources_0.6.16-1ubuntu0.1_all.deb) ...
Selecting previously deselected package python-setuptools.
Unpacking python-setuptools (from .../python-setuptools_0.6.16-1ubuntu0.1_all.deb) ...
Setting up python-pkg-resources (0.6.16-1ubuntu0.1) ...
Setting up python-setuptools (0.6.16-1ubuntu0.1) ...


root@oneiric:~# easy_install moin
Searching for moin
Reading http://pypi.python.org/simple/moin/
Reading http://moinmo.in/
Best match: moin 1.9.4
Downloading http://static.moinmo.in/files/moin-1.9.4.tar.gz
Processing moin-1.9.4.tar.gz
Running moin-1.9.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Wp_XA9/moin-1.9.4/egg-dist-tmp-fEQzCQ
warning: no files found matching '*' under directory 'tests'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '*.pyo' found anywhere in distribution
warning: no previously-included files matching '*/CVS/*' found anywhere in distribution
warning: no previously-included files matching '*/.cvsignore' found anywhere in distribution
warning: no previously-included files matching 'underlay.tar' found anywhere in distribution
warning: no previously-included files matching 'README.underlay' found anywhere in distribution
zip_safe flag not set; analyzing archive contents...
MoinMoin.wikiutil: module references __file__
MoinMoin.packages: module references __file__
MoinMoin.events.__init__: module references __file__
MoinMoin.support.werkzeug.utils: module references __file__
MoinMoin.support.werkzeug.utils: module references __path__
MoinMoin.support.werkzeug.serving: module references __file__
MoinMoin.support.werkzeug.__init__: module references __file__
MoinMoin.support.werkzeug.__init__: module references __path__
MoinMoin.support.werkzeug.contrib.jsrouting: module MAY be using inspect.trace
MoinMoin.support.werkzeug.debug.tbtools: module MAY be using inspect.getsourcefile
MoinMoin.support.werkzeug.debug.__init__: module references __file__
MoinMoin.support.pygments.unistring: module references __file__
MoinMoin.support.pygments.formatters._mapping: module references __file__
MoinMoin.support.pygments.lexers._luabuiltins: module references __file__
MoinMoin.support.pygments.lexers._clbuiltins: module MAY be using inspect.trace
MoinMoin.support.pygments.lexers._phpbuiltins: module references __file__
MoinMoin.support.pygments.lexers._mapping: module references __file__
MoinMoin.converter.__init__: module references __file__
MoinMoin.script.account.__init__: module references __file__
MoinMoin.script.maint.__init__: module references __file__
MoinMoin.script.cli.__init__: module references __file__
MoinMoin.script.server.__init__: module references __file__
MoinMoin.script.import.__init__: module references __file__
MoinMoin.script.export.__init__: module references __file__
MoinMoin.script.index.__init__: module references __file__
MoinMoin.script.xmlrpc.__init__: module references __file__
MoinMoin.script.migration.__init__: module references __file__
MoinMoin.web.__init__: module references __file__
MoinMoin.web.static.__init__: module references __file__
MoinMoin.filter.__init__: module references __file__
MoinMoin.parser.__init__: module references __file__
MoinMoin.config.multiconfig: module references __file__
MoinMoin.macro.__init__: module references __file__
MoinMoin.userprefs.__init__: module references __file__
MoinMoin.formatter.__init__: module references __file__
MoinMoin.action.SpellCheck: module references __file__
MoinMoin.action.__init__: module references __file__
MoinMoin.theme.__init__: module references __file__
MoinMoin.xmlrpc.__init__: module references __file__
Adding moin 1.9.4 to easy-install.pth file
Installing moin script to /usr/local/bin

Installed /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg
Processing dependencies for moin
Finished processing dependencies for moin

Now we have both MoinMoin and Nginx/Passenger installed as root, but don’t worry – they definitely won’t run as root.

Configure MoinMoin instance

Setting up MoinMoin to run as a WSGI app is actually very straightforward, but we’ll set it up explicitly as a standard server app here. By this I mean not running the standalone Python-based webserver included with it, and using a separate user account for it. As a result the data is kept entirely separate from the installation base of the wiki, which makes it easier to upgrade later on.

root@oneiric:~# useradd -m moin -s /bin/bash
root@oneiric:~# su - moin
moin@oneiric:~$ mkdir wiki; cd wiki
moin@oneiric:~/wiki$ mkdir config public
moin@oneiric:~/wiki$ cp -a /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg/share/moin/data/ .
moin@oneiric:~/wiki$ cp -a /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg/share/moin/underlay/ .
moin@oneiric:~/wiki$ ln -s /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg/MoinMoin/web/static/htdocs/

Here we are setting up the basic structure of the app:

  • Config will contain our customised wiki config for this instance.
  • Public is a requirement of Passenger. It can stay empty.
  • Data will contain our wiki pages, from a starting point of what the installation gives you by default.
  • Underlay contains built-in wiki page resources of MoinMoin. It shouldn’t need to change, but when I attempted to run it from a symlink to the installed files it failed on missing write permissions. I’m not sure why this happened but I’m more comfortable making a copy and referencing it than altering the original installation file permissions. Something to investigate later.
  • Htdocs are static files used by MoinMoin (stylesheets, images etc). We just symlink to them.

Let’s create the configuration:

moin@oneiric:~/wiki$ cd config/
moin@oneiric:~/wiki/config$ cp /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg/share/moin/config/wikiconfig.py .

We just customise the standard included configuration file. These are the entries I changed:

data_dir = os.path.join(instance_dir, '..', 'data', '') # path with trailing /
data_underlay_dir = os.path.join(instance_dir, '..', 'underlay', '') # path with trailing /
url_prefix_static = '/wiki/htdocs'
sitename = u'My Wiki'
page_front_page = u"FrontPage"
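
As a quick check of what the two os.path.join calls above evaluate to, here is a minimal sketch. It assumes that instance_dir resolves to the directory holding wikiconfig.py (/home/moin/wiki/config in this setup); that value is hard-coded below purely for illustration.

```python
import os.path

# Assumption: instance_dir resolves to the directory containing wikiconfig.py.
instance_dir = '/home/moin/wiki/config'

# The empty final component makes os.path.join append a trailing separator.
data_dir = os.path.join(instance_dir, '..', 'data', '')
data_underlay_dir = os.path.join(instance_dir, '..', 'underlay', '')

# normpath collapses the '..' so you can see where the paths really point.
print(data_dir)
print(os.path.normpath(data_dir))
print(os.path.normpath(data_underlay_dir))
```

The '..' component is what lets the config directory sit beside data and underlay rather than above them.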

Passenger and WSGI

To run a WSGI app, Passenger needs a script in place which initialises the app and the WSGI interface. It is called passenger_wsgi.py and usually goes into the root of the app site. In my case, I wanted to serve the wiki from the /wiki/ URI path, so things were a little different but not much. Here’s what I dropped into the file:

moin@oneiric:~$ pwd
/home/moin
moin@oneiric:~$ cat passenger_wsgi.py 
import sys, os

#The two following lines ensure that you are using python2.7
#You can change it to another python version if you want
INTERP = "/usr/bin/python2.7"
if sys.executable != INTERP: os.execl(INTERP, INTERP, *sys.argv)

# Add to the search path so that we can do "from MoinMoin import ..."
sys.path.insert(0, '/usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg')

#Path to your $WIKI_CONFIG
sys.path.insert(0, '/home/moin/wiki/config')

from MoinMoin.web.serving import make_application

#Important: set shared to False, to serve the media files directly
application = make_application(shared=False)

This config file is largely based on what Dreamhost have provided in their wiki. The file does not need to be executable.
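
To make the role of the application object clearer, here is a minimal sketch of a WSGI callable with the same interface as the object make_application() returns. It is not MoinMoin code: the body text and the start_response stand-in are made up for illustration, and the callable is exercised directly with the standard library's wsgiref helpers rather than through Passenger.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable: takes the request environ, returns body chunks."""
    body = b"Hello from WSGI\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Exercise the callable without any web server in front of it.
environ = {}
setup_testing_defaults(environ)  # fills in a plausible GET request
captured = {}


def fake_start_response(status, headers, exc_info=None):
    """Stand-in for the server side: just record what the app sent."""
    captured["status"] = status
    captured["headers"] = headers


response = b"".join(application(environ, fake_start_response))
print(captured["status"])
```

Dropping a file like this in place of the real passenger_wsgi.py can be a handy way to separate Passenger problems from MoinMoin problems.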

Now let’s set up Nginx and Passenger to start the app using this file. As mentioned, I want to serve the site from the path /wiki/ so I use a location stanza in /opt/nginx/conf/nginx.conf. I also set up a separate location stanza to serve the static site assets from htdocs directly.

        # Wiki
        location /wiki/ {
                alias /home/moin/;
                passenger_enabled on;
                passenger_user moin;
                passenger_base_uri /wiki;
        }
        location /wiki/htdocs/ {
                alias /home/moin/wiki/htdocs/;
        }

Here, the passenger_base_uri is the key to having the site served under a different path than from the root. This is one area that Nginx seems to make more complicated than Apache. Both have the ability to proxy to alternate webservers but only Apache can reverse proxypass and fix up links in the returned responses (although I believe the same can be done with Nginx scripting which I haven’t tried yet). Admittedly it’s a different tool than passing requests and responses through a loaded/compiled module but I haven’t yet found a good way to do it in Nginx (so undoubtedly a mechanism does exist).
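
For a WSGI app, serving under a sub-path comes down to how the request path is divided between SCRIPT_NAME (the mount point) and PATH_INFO (the rest), as defined by the WSGI specification. The following is a minimal sketch of that split, assuming Passenger follows the usual convention when passenger_base_uri is set. The split_base_uri helper is hypothetical and is not part of Passenger or MoinMoin.

```python
def split_base_uri(request_path, base_uri):
    """Hypothetical helper: split a request path into (SCRIPT_NAME, PATH_INFO)."""
    base = base_uri.rstrip('/')
    # Only match whole path segments, so '/wikipedia' is not under '/wiki'.
    if request_path == base or request_path.startswith(base + '/'):
        return base, request_path[len(base):]
    # Not under the base URI: treat the app as mounted at the root.
    return '', request_path


print(split_base_uri('/wiki/FrontPage', '/wiki'))
print(split_base_uri('/wikipedia', '/wiki'))
```

An application that builds its links from SCRIPT_NAME will then generate URLs under /wiki without any rewriting of the responses.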

Finally, we start up Nginx:

root@oneiric:~# /opt/nginx/sbin/nginx

Navigating to http://localhost/wiki/ now displays the front page of the so-far empty wiki. In my case, I just moved my data out of the previous path and into the location now served by Passenger (although with a symlink it is trivial to migrate from the standalone Python webserver to the Passenger-based WSGI system). Please do note that I haven’t dealt at all with the security system for MoinMoin which allows individual users to be identified and authentication/authorisation to be required.

Hopefully this has been useful for you. I’ve run through the setup from scratch in a virtual machine to check that all the steps are correct, but please let me know if I’ve made any mistakes, and of course any tips for a better Nginx configuration are always welcome!

Monday, June 25th, 2012 Tech 1 Comment

More on Bundler and RPMs

by Oliver on Friday, June 22nd, 2012.

So I was going to post a comment on the original blog post which I linked to from here but Facebook connect was broken and I don’t feel like setting up yet another account </1stworldproblems>… but there was a slight development.

I attempted to apply the same methodology we had already followed with the first app to another app, so that this one was also packaged using Bundler and RPM. Needing to confirm that all was well before I committed the changes, I did some testing in a CentOS virtual machine in Vagrant. As expected, with a deployment bundle of a decent few gems the package size came out at around 15MB. I committed the changes, and the RPM produced by the Jenkins build job was 50MB. Why?

Initially, I suspected subtle differences in Bundler gem versions, library path differences etc but these ended up being dead-ends. What was happening, however, was that the gems were being installed into apprepo/vendor/ruby/1.8, including the excluded groups. I am assuming this is a necessity for the tests and other build-time checks to run, but I certainly didn’t want them to be packaged with the RPM which can rely on just the gem cache.

As it turns out, Bundler has some “smart” code around user permissions – specifically around what commands you can run through sudo. A standard Vagrant box will have unrestricted sudo access for the vagrant user, so it can install gems anywhere. Bundler uses this fact to its advantage and will install them into the standard /usr/lib64/ruby/gems/1.8/gems/ path. Hence, when it comes time to package up the gems as an RPM, these files are not in the app build path and the RPM stays a slim 15MB.

In our build pipeline which uses a standard user account on a fairly normal CentOS install, the jenkins user has no such permissions and thus has no option but to install them into the vendor directory along with the other Bundler artifacts. The solution was simply to exclude this directory from being packaged, although I’m still not entirely sure why we didn’t hit this problem the first time around. Nevertheless, bearing in mind these few gotchas, we now have a system in place that makes it a snap to add more gems and maintain a well-packaged and stable application from development to production.
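
When a package balloons like this, a quick way to find the culprit is to total up the size of each top-level directory in the build path before packaging. The following is a minimal sketch of that check. The function names are made up for illustration and nothing here is specific to Bundler or RPM.

```python
import os


def tree_size(root):
    """Total size in bytes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def largest_subdirs(root, top=5):
    """Return up to `top` (size, name) pairs for root's direct subdirectories."""
    sizes = []
    for entry in os.listdir(root):
        path = os.path.join(root, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            sizes.append((tree_size(path), entry))
    return sorted(sizes, reverse=True)[:top]
```

Running largest_subdirs() against the app checkout on the build host and in the Vagrant box would show at a glance whether a directory such as vendor accounts for the difference.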

Friday, June 22nd, 2012 Tech 2 Comments

More on Puppet module unit-testing

by Oliver on Tuesday, June 19th, 2012.

I’ve previously made presentations and blog posts about Puppet and module testing – my position is that you should treat Puppet code as just that: code. Just like mainstream programming languages, it is possible (and good practice) to test your Puppet manifests so that you have higher confidence in them working when it comes time to actually run them.

There are some other factors which play a part:

  • Make your modules generic. Any environment or host specifics you have baked into classes or definitions makes them that much harder to test in a dissimilar (read: clean) environment like your CI pipeline.
  • As a corollary to the first point, classes should be parameterised so that they can be used in a variety of different ways – in your production environment, in staging, in QA, development (etc etc) and of course your test pipeline.
  • Loosely couple your modules. Tight dependencies enforced with strict ordering constraints mean that you can’t test each class by itself without pulling in all the dependencies as well. Speaking directly from experience, this can mean errors are that much harder to track down when you have to look in a bunch of places for one failing resource.

It is issues like this last point that seem to cause the most grief when testing Puppet modules in our environment. We have a collection of common modules called dist, which provide both re-usable functionality when required by application modules (e.g. the ability to easily set up a MySQL server, a standard way to provision Apache/Nginx etc) and configuration we expect to be standardized across all machines – in other words, the platform. In fact the wrapper class that pulls in the standardized configuration is called just that – “platform”.

Platform pulls in a lot of helper functionality which application modules can take for granted. An example is the yum class. Here we set up a standard /etc/yum.conf with some tunable values, the /etc/yum.repos.d fragments directory and a bunch of standard repository fragments such as OS, Updates and so on. The module also includes a defined type, yum::repo which acts much like the built-in version but works with the yum class to have a fully managed fragments directory – if a yum fragment is on disk but not managed by Puppet, it is removed.
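
The "fully managed fragments directory" behaviour boils down to a set difference: anything on disk that is not in the set of managed fragments gets removed. The following is a minimal sketch of that rule in Python rather than Puppet. The function and the file names are hypothetical and only illustrate the idea, not how the module is implemented.

```python
def unmanaged_fragments(on_disk, managed):
    """Return fragment file names present on disk but absent from the managed set."""
    return sorted(set(on_disk) - set(managed))


# Hypothetical directory listing versus what the catalog declares.
on_disk = ['os.repo', 'updates.repo', 'stale.repo']
managed = ['os.repo', 'updates.repo', 'myapp.repo']

for name in unmanaged_fragments(on_disk, managed):
    print('would remove', name)
```

Note that a managed fragment missing from disk (myapp.repo here) is not in the result: creating it is the job of the resource that declares it.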

Naturally, in developing an application module you will want to set up a repository fragment to point to wherever you have your app packages stored, so all application modules utilise yum::repo at least once. Now, in testing your application class foo as a unit test, you might have the following code:

class foo {
  yum::repo { 'myapp':
    descr => 'Repository for My App',
    ...
  }
...
}


To test it, you’d have the following in the tests directory:

class { 'foo': }


This is, of course, a trivial example with no parameters. Here, we already have a problem when attempting to unit test the class – it will immediately fail due to the yum class not having been instantiated in the catalog, thus not satisfying the dependency the yum::repo defined type has on it. Typically we have worked around this by just adding it to the test:

class { 'yum': }
class { 'foo': }

This is fine if you know which class you need to pull in, but if there are dependencies between resources in different classes it may not be so clear. The dependent resource will know what it needs, but not where it should retrieve it from. This pattern actually breaks encapsulation, so while it is acceptable in Puppet standard practice it is not very good practice from a developer standpoint.

Another idea we toyed with was automatically including a “stubbed out” version of the platform in every test, thus satisfying all dependencies that application classes may have without needing the user to specify them. I don’t like this idea for a couple of main reasons:

  1. It will blow out compile (and thus test) time a lot. We can usually get away with each test running for a few seconds, and a complete app module (for the entire job) in maybe 30-60 seconds. Pull in all of platform for every test and one application module will take a few minutes. Multiply that by hundreds of application modules and you are looking at a big increase in test time.
  2. This is no longer really unit testing. We’re doing full-blown integration testing at this point. Don’t get me wrong – this is also valuable, but there is a time and place for it, and I don’t want to destroy the unit testing that we already have, with the implicit limited scope (and thus easier fault-finding) that it provides.

In traditional unit-testing with external dependencies that we don’t want to test, we would mock those dependencies. Good mocking libraries will also allow us to be explicit about call ordering, inputs and outputs expected in order to verify behaviour of our own code as well as the relationship it establishes with the external dependencies. Is this possible with Puppet? What would it look like?

define yum::repo (
  $descr,
  $baseurl,
  $enabled,
  ...
) { }
class { 'foo': }

Now we have a somewhat mocked version of yum::repo that our class can use without having to worry about other chained dependencies outside of its view of the world. This starts introducing some other problems though:

  • It’s quite clear that the Puppet language just doesn’t have the capabilities for advanced mocking (which is no surprise – that’s not its primary goal). It would be interesting if a third-party library provided Puppet mocks though…
  • We now have inconsistency in our testing methods. The only time you need to mock out a class/define is when it has unresolved dependencies you don’t want to have to worry about. In all other cases, we can still use the real version (which will be on the modulepath already since we install the dist modules into the testing VM with the app module). Now there are two slightly confusing mechanisms for testing.
  • Will the mocked version be found before the real version that is elsewhere on the modulepath? I haven’t looked into the code to know whether it will be found immediately by virtue of it having just been parsed, but it’s an unknown factor that I don’t like.
  • One of the tenets of reusable, encapsulated functionality that we provide in our dist modules is that you don’t need to know the details. In fact, thanks to parameterised classes, defined types, custom types and providers it is often not possible to tell what is built-in to Puppet and what is one of the previously mentioned ways of extending it – and this is just how it should be. Would you really want to mock out any one of these resource types when it provides so much more than just a container with parameters? Input validation, consistency between input parameters, platform checking, built-in dependency handling between resources of the same type are all valid reasons to stick with what these types give you for free. It feels wrong to remove them (which effectively is the reason you do the compile testing in the first place).
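
For comparison, this is roughly what the "advanced mocking" mentioned above looks like in a general-purpose language. It is a minimal sketch using Python 3's unittest.mock. The provision_app function and the repo_manager object with its add_repo and refresh methods are hypothetical stand-ins, loosely modelled on the yum::repo example, and do not correspond to any real API.

```python
from unittest import mock


def provision_app(repo_manager):
    """Hypothetical code under test: register a repo, then refresh metadata."""
    repo_manager.add_repo('myapp', descr='Repository for My App')
    return repo_manager.refresh()


# The mock replaces the external dependency and supplies a canned output.
repo_manager = mock.Mock()
repo_manager.refresh.return_value = 'ok'

result = provision_app(repo_manager)

# Verify the inputs and the call ordering, not just the return value.
repo_manager.assert_has_calls([
    mock.call.add_repo('myapp', descr='Repository for My App'),
    mock.call.refresh(),
])
print(result)
```

The empty yum::repo define shown earlier covers only the first part of this (standing in for the dependency). It has no equivalent of the ordering and argument assertions.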

I haven’t spent as much time on this as on other Puppet problems previously, because at the moment (fortunately) it is mostly no more than an annoyance. We can add in the missing test dependencies by hand, and most of our users are becoming savvy enough to do it themselves. I’m interested in what the community thinks about this topic though, and whether you have solved this problem yourself. Please leave comments; I would love to know what you think!

Tuesday, June 19th, 2012 Tech No Comments

Bundler, gems and RPMs

by Oliver on Wednesday, June 6th, 2012.

Recently I was working with one of our valued Thoughtworkers on an application we were trying to not only develop in a sane way, but package and deploy to production with just as much sanity. The status quo seems to favour bundler on the development side, but RPMs on the production side (if you judge these decisions based on what developers and ops folk prefer, generally).

After a reasonable amount of WTFing, we actually managed to get it working reasonably well. If you have Ruby apps with gem dependencies and want to develop and push to production with equal ease I suggest you read the blog post on the subject by Philip Potter here: http://rhebus.posterous.com/rpm-ruby-and-bundler

Wednesday, June 6th, 2012 Tech 1 Comment