python

Nginx, Passenger and WSGI

by Oliver on Monday, June 25th, 2012.

It’s a little-known fact that Phusion Passenger, the awesome Rack webserver module, can also competently talk WSGI. This means that you can run not only your Rack and Rails applications through it using Nginx or Apache (or in fact independently, using a cut-down version of Nginx) but also your WSGI-compliant Python applications.

This has been touched on in various levels of detail on Hongli’s own blog, the official Phusion blog, and the Dreamhost wiki, among other sites. You can piece together a working configuration from them, but if you rely on the Dreamhost instructions in particular you don’t really get a good idea of a generic configuration that would work outside of their tuned environment.

I’ve got an over-utilised HTPC at home that ends up not just serving as an XBMC frontend but a bunch of other things – IPv6 tunnel endpoint, VPN endpoint, IP traffic accounting system, monitoring station, wiki server, Mingle server and more. On 2GB of RAM this adds up to a significant load, and most of the webapps are either being run as CGIs or proxied to via Apache. Since the wiki I’m running is MoinMoin, and I recently also found a way to run my CGIs through Passenger, it seemed natural to do away with the separate webservers and run as much through Passenger as possible. Since all the cool kids are running Nginx these days on the basis of it having a much smaller memory footprint (and obviously other factors), I thought I would move to that from Apache at the same time and claim back some memory.

What I’ll describe below is a very quick run-through of setting up MoinMoin for operation through Passenger and Nginx. It should hopefully be somewhat reusable (the main point of writing this post, since I couldn’t find a decent generic guide through Google so far) but there are some details of the operation which I haven’t investigated entirely yet so don’t hold it against me!

Install Nginx (via Passenger)

If you are not familiar with Nginx, or if you are coming from the Apache world you may be dismayed, overjoyed, surprised or annoyed that Nginx doesn’t yet support dynamically-loaded modules like Apache does, no doubt for performance reasons. This means any additional modules like Passenger require a full recompile of the Nginx binary. For that reason I didn’t see a big point in installing the Ubuntu package for Nginx at all and just used the Passenger installer which has an option to download and install Nginx with Passenger already enabled.

Firstly, just install the Passenger gem:

root@oneiric:~# gem install passenger
Fetching: fastthread-1.0.7.gem (100%)
Building native extensions.  This could take a while...
Fetching: daemon_controller-1.0.0.gem (100%)
Fetching: rack-1.4.1.gem (100%)
Fetching: passenger-3.0.13.gem (100%)
Successfully installed fastthread-1.0.7
Successfully installed daemon_controller-1.0.0
Successfully installed rack-1.4.1
Successfully installed passenger-3.0.13
4 gems installed
Installing ri documentation for fastthread-1.0.7...
Installing ri documentation for daemon_controller-1.0.0...
Installing ri documentation for rack-1.4.1...
Installing ri documentation for passenger-3.0.13...
Installing RDoc documentation for fastthread-1.0.7...
Installing RDoc documentation for daemon_controller-1.0.0...
Installing RDoc documentation for rack-1.4.1...
Installing RDoc documentation for passenger-3.0.13...

Now use Passenger itself to download and install Nginx with its own recommended settings (option 1, when you have to choose), rather than rebuilding any available version for your distribution. Handily, the installer will give you pretty accurate commands to use if you don’t have some pre-requisites installed like a C++ compiler, so it is pretty idiot-proof.

root@oneiric:~# passenger-install-nginx-module 
Welcome to the Phusion Passenger Nginx module installer, v3.0.13.

...LOTS of output...

I chose to install it in the suggested location, /opt/nginx. Assuming you got through the pre-requisites and Nginx compiled and installed successfully, it will also be somewhat configured for you. However, we are interested in running a WSGI application, so the standard Rack/Rails suggestions the installer provides are not entirely helpful. What is left is to set up the app, provide the bridge to WSGI and set up a location stanza in Nginx for it.

Install MoinMoin

At a former employer, we used MoinMoin wiki for internal (and later external) documentation and I’ve continued that by using it for personal documentation. It’s a very handy tool, especially being available from all our computers at home and remotely via VPN. It’s written in Python and works very well.

Installing is dead simple with setuptools:

root@oneiric:~# apt-get install -y python-setuptools
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  python-pkg-resources
Suggested packages:
  python-distribute python-distribute-doc
The following NEW packages will be installed:
  python-pkg-resources python-setuptools
0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
Need to get 274 kB of archives.
After this operation, 1,274 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu/ oneiric-updates/main python-pkg-resources all 0.6.16-1ubuntu0.1 [62.7 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu/ oneiric-updates/main python-setuptools all 0.6.16-1ubuntu0.1 [212 kB]
Fetched 274 kB in 1s (180 kB/s)            
Selecting previously deselected package python-pkg-resources.
(Reading database ... 54159 files and directories currently installed.)
Unpacking python-pkg-resources (from .../python-pkg-resources_0.6.16-1ubuntu0.1_all.deb) ...
Selecting previously deselected package python-setuptools.
Unpacking python-setuptools (from .../python-setuptools_0.6.16-1ubuntu0.1_all.deb) ...
Setting up python-pkg-resources (0.6.16-1ubuntu0.1) ...
Setting up python-setuptools (0.6.16-1ubuntu0.1) ...


root@oneiric:~# easy_install moin
Searching for moin
Reading http://pypi.python.org/simple/moin/
Reading http://moinmo.in/
Best match: moin 1.9.4
Downloading http://static.moinmo.in/files/moin-1.9.4.tar.gz
Processing moin-1.9.4.tar.gz
Running moin-1.9.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Wp_XA9/moin-1.9.4/egg-dist-tmp-fEQzCQ
warning: no files found matching '*' under directory 'tests'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '*.pyo' found anywhere in distribution
warning: no previously-included files matching '*/CVS/*' found anywhere in distribution
warning: no previously-included files matching '*/.cvsignore' found anywhere in distribution
warning: no previously-included files matching 'underlay.tar' found anywhere in distribution
warning: no previously-included files matching 'README.underlay' found anywhere in distribution
zip_safe flag not set; analyzing archive contents...
MoinMoin.wikiutil: module references __file__
MoinMoin.packages: module references __file__
MoinMoin.events.__init__: module references __file__
MoinMoin.support.werkzeug.utils: module references __file__
MoinMoin.support.werkzeug.utils: module references __path__
MoinMoin.support.werkzeug.serving: module references __file__
MoinMoin.support.werkzeug.__init__: module references __file__
MoinMoin.support.werkzeug.__init__: module references __path__
MoinMoin.support.werkzeug.contrib.jsrouting: module MAY be using inspect.trace
MoinMoin.support.werkzeug.debug.tbtools: module MAY be using inspect.getsourcefile
MoinMoin.support.werkzeug.debug.__init__: module references __file__
MoinMoin.support.pygments.unistring: module references __file__
MoinMoin.support.pygments.formatters._mapping: module references __file__
MoinMoin.support.pygments.lexers._luabuiltins: module references __file__
MoinMoin.support.pygments.lexers._clbuiltins: module MAY be using inspect.trace
MoinMoin.support.pygments.lexers._phpbuiltins: module references __file__
MoinMoin.support.pygments.lexers._mapping: module references __file__
MoinMoin.converter.__init__: module references __file__
MoinMoin.script.account.__init__: module references __file__
MoinMoin.script.maint.__init__: module references __file__
MoinMoin.script.cli.__init__: module references __file__
MoinMoin.script.server.__init__: module references __file__
MoinMoin.script.import.__init__: module references __file__
MoinMoin.script.export.__init__: module references __file__
MoinMoin.script.index.__init__: module references __file__
MoinMoin.script.xmlrpc.__init__: module references __file__
MoinMoin.script.migration.__init__: module references __file__
MoinMoin.web.__init__: module references __file__
MoinMoin.web.static.__init__: module references __file__
MoinMoin.filter.__init__: module references __file__
MoinMoin.parser.__init__: module references __file__
MoinMoin.config.multiconfig: module references __file__
MoinMoin.macro.__init__: module references __file__
MoinMoin.userprefs.__init__: module references __file__
MoinMoin.formatter.__init__: module references __file__
MoinMoin.action.SpellCheck: module references __file__
MoinMoin.action.__init__: module references __file__
MoinMoin.theme.__init__: module references __file__
MoinMoin.xmlrpc.__init__: module references __file__
Adding moin 1.9.4 to easy-install.pth file
Installing moin script to /usr/local/bin

Installed /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg
Processing dependencies for moin
Finished processing dependencies for moin

Now we have both MoinMoin and Nginx/Passenger installed as root, but don’t worry – they definitely won’t run as root.

Configure MoinMoin instance

Setting up MoinMoin to run as a WSGI app is actually very straightforward, but we’ll set it up as a standard server app explicitly here. By this I mean not running the standalone Python-based webserver included with it, and using a separate user account for it, which keeps the data entirely separate from the installation base of the wiki and makes it easier to upgrade later on.

root@oneiric:~# useradd -m moin -s /bin/bash
root@oneiric:~# su - moin
moin@oneiric:~$ mkdir wiki; cd wiki
moin@oneiric:~/wiki$ mkdir config public
moin@oneiric:~/wiki$ cp -a /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg/share/moin/data/ .
moin@oneiric:~/wiki$ cp -a /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg/share/moin/underlay/ .
moin@oneiric:~/wiki$ ln -s /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg/MoinMoin/web/static/htdocs/

Here we are setting up the basic structure of the app:

  • Config will contain our customised wiki config for this instance.
  • Public is a requirement of Passenger. It can stay empty.
  • Data will contain our wiki pages, from a starting point of what the installation gives you by default.
  • Underlay contains built-in wiki page resources of MoinMoin. It shouldn’t need to change, but when I attempted to run it from a symlink to the installed files it failed on missing write permissions. I’m not sure why this happened but I’m more comfortable making a copy and referencing it than altering the original installation file permissions. Something to investigate later.
  • Htdocs are static files used by MoinMoin (stylesheets, images etc). We just symlink to them.

Let’s create the configuration:

moin@oneiric:~/wiki$ cd config/
moin@oneiric:~/wiki/config$ cp /usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg/share/moin/config/wikiconfig.py .

We just customise the standard included configuration file. These are the entries I changed:

data_dir = os.path.join(instance_dir, '..', 'data', '') # path with trailing /
data_underlay_dir = os.path.join(instance_dir, '..', 'underlay', '') # path with trailing /
url_prefix_static = '/wiki/htdocs'
sitename = u'My Wiki'
page_front_page = u"FrontPage"
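
To see what those two path settings resolve to, here is a small sketch. It assumes instance_dir is the directory holding wikiconfig.py (/home/moin/wiki/config in this setup), which is how the stock configuration file defines it as far as I can tell. It uses posixpath so the result is the same on any platform.

```python
import posixpath

# Assumption: instance_dir is the directory that contains wikiconfig.py
instance_dir = "/home/moin/wiki/config"

# The trailing '' is what produces the trailing slash the comments ask for
data_dir = posixpath.join(instance_dir, "..", "data", "")
data_underlay_dir = posixpath.join(instance_dir, "..", "underlay", "")

print(data_dir)                      # /home/moin/wiki/config/../data/
print(posixpath.normpath(data_dir))  # /home/moin/wiki/data
```

In other words, both directories are found relative to the config directory, one level up, next to it.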

Passenger and WSGI

To run a WSGI app, Passenger needs a script in place which initialises the app and the WSGI interface. It is called passenger_wsgi.py and usually goes into the root of the app site. In my case, I wanted to serve the wiki from the /wiki/ URI path, so things were a little different but not much. Here’s what I dropped into the file:

moin@oneiric:~$ pwd
/home/moin
moin@oneiric:~$ cat passenger_wsgi.py 
import sys, os

# The following two lines ensure that you are using Python 2.7
# You can change this to another Python version if you want
INTERP = "/usr/bin/python2.7"
if sys.executable != INTERP: os.execl(INTERP, INTERP, *sys.argv)

# Add to the search path so that we can do "from MoinMoin import ..."
sys.path.insert(0, '/usr/local/lib/python2.7/dist-packages/moin-1.9.4-py2.7.egg')

#Path to your $WIKI_CONFIG
sys.path.insert(0, '/home/moin/wiki/config')

from MoinMoin.web.serving import make_application

# Important: set shared to False so the static files are served directly by the web server
application = make_application(shared=False)

This config file is largely based on what Dreamhost have provided in their wiki. The file does not need to be executable.
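
For a generic starting point that has nothing to do with MoinMoin, a passenger_wsgi.py only needs to expose a WSGI callable named application. The following is a minimal sketch rather than anything taken from the Passenger documentation; the response text is made up and the interpreter re-exec trick from the file above is left out for brevity.

```python
def application(environ, start_response):
    """Smallest useful WSGI app: always answer with a plain-text greeting."""
    body = b"Hello from Passenger and WSGI\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings
    return [body]
```

If this responds when dropped in place of the MoinMoin version, the Nginx and Passenger side of the setup is working and any remaining problems are in the application itself.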

Now let’s set up Nginx and Passenger to start the app using this file. As mentioned, I want to serve the site from the path /wiki/ so I use a location stanza in /opt/nginx/conf/nginx.conf. I also set up a separate location stanza to serve the static site assets from htdocs directly.

        # Wiki
        location /wiki/ {
                alias /home/moin/;
                passenger_enabled on;
                passenger_user moin;
                passenger_base_uri /wiki;
        }
        location /wiki/htdocs/ {
                alias /home/moin/wiki/htdocs/;
        }

Here, the passenger_base_uri is the key to having the site served under a different path than from the root. This is one area that Nginx seems to make more complicated than Apache. Both have the ability to proxy to alternate webservers but only Apache can reverse proxypass and fix up links in the returned responses (although I believe the same can be done with Nginx scripting which I haven’t tried yet). Admittedly it’s a different tool than passing requests and responses through a loaded/compiled module but I haven’t yet found a good way to do it in Nginx (so undoubtedly a mechanism does exist).
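
On the application side, the WSGI specification represents a mount point like /wiki through the SCRIPT_NAME key of the environ, with the rest of the path in PATH_INFO. The sketch below shows how an app can build links that respect that prefix. I have not verified that Passenger 3.0 fills in SCRIPT_NAME from passenger_base_uri exactly like this, so treat that part as an assumption; wiki_url is a hypothetical helper, not part of MoinMoin.

```python
def wiki_url(environ, page):
    """Build a link to a page that honours the mount point of the app."""
    # SCRIPT_NAME is empty when the app is served from the root
    base = environ.get("SCRIPT_NAME", "").rstrip("/")
    return base + "/" + page.lstrip("/")


print(wiki_url({"SCRIPT_NAME": "/wiki"}, "FrontPage"))  # /wiki/FrontPage
print(wiki_url({}, "FrontPage"))                        # /FrontPage
```

An app that builds its links this way does not need the webserver to rewrite anything in the responses.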

Finally, we start up Nginx:

root@oneiric:~# /opt/nginx/sbin/nginx

Navigating to http://localhost/wiki/ now displays the front page of the so-far empty wiki. In my case, I just moved my data out of the previous path and into the location now served by Passenger (although with a symlink it is trivial to migrate from the standalone Python webserver to the Passenger-based WSGI system). Please do note that I haven’t dealt at all with the security system for MoinMoin which allows individual users to be identified and authentication/authorisation to be required.

Hopefully this has been useful for you, and I’ve run through the setup with a virtual machine from scratch to check that all the steps are correct. Please let me know though if I’ve made any mistakes, and of course any tips for a better Nginx configuration are always welcome!

Monday, June 25th, 2012 Tech 1 Comment

Mocking fun with Python, mox and socket

by Oliver on Friday, September 9th, 2011.

I’ve been doing most of my work with Ruby this last year, as a lot of it was Puppet-related. I went through the pain and pleasure of learning Ruby and the intricacies of unit testing with Test::Unit and RSpec, and mocking objects with the excellent Mocha library. Now I have slightly returned to my roots by doing a bit of Python programming on another part of our codebase.

We use Python for some of the code we use to test our infrastructure – initially because it most suited the engineers who were implementing it and Python was already on our list of supported languages. We don’t want our language set to grow unwieldy but at this point in time we have a fairly well-defined set of domains in which we restrict our language choice. Finally in this sprint my chance came up to do some work on this code base so it was time to freshen my memory on Python unit testing.

The particular task at hand was verifying connectivity to the management interface of a piece of hardware. Without giving too much away, you might for example do an extremely simple check of a device that has an SSH interface as follows:


    # Create a TCP connection to the SSH port and grab the output
    try:
      try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(5)
        sock.connect((ip, socket.getservbyname('ssh')))
        sock_output = sock.recv(100)
      except socket.timeout:
        return False, "SSH Connection Timeout"
      except socket.error:
        return False, "SSH Connection Error"
    finally:
      sock.close()

    result = re.search('SSH-2.0-OpenSSH_5.1', sock_output)
    if not result:
      return False, "Valid SSH interface not found"
    else:
      return True, "Valid SSH interface was found"

No, this is not great code (and you can blame Python 2.4 for that crazy nested exception handling block), but it works and reasonably catches the obvious errors. How do we go about testing this? I have another test class which verifies connectivity to an HTTP interface of a piece of hardware, but there the decision was more or less made for me. Since the interface actually requires SSL, it would be quite difficult, tiring and time consuming to make a real HTTPS connection to a fake server and pass some dummy text across the connection. Therefore in that case I simply stubbed the calls to urllib2.urlopen and passed the dummy text without any network being involved.
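
For comparison, on a current Python 3 the SSH check above can be flattened quite a bit. This is only a sketch under a few assumptions: check_ssh_banner is a made-up name, the port defaults to 22 instead of being looked up with getservbyname, and it accepts any banner starting with SSH-2.0- rather than one specific OpenSSH version.

```python
import socket


def check_ssh_banner(ip, port=22, timeout=5.0, expected=b"SSH-2.0-"):
    """Return (ok, message) after reading the greeting banner from ip:port."""
    try:
        # The timeout applies to the connect and to the later recv call;
        # the with-block closes the socket even if recv raises.
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            banner = sock.recv(100)
    except socket.timeout:
        return False, "SSH Connection Timeout"
    except OSError:
        return False, "SSH Connection Error"

    if banner.startswith(expected):
        return True, "Valid SSH interface was found"
    return False, "Valid SSH interface not found"
```

The single try block works because the context manager takes over the job of the old finally clause.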

In the case of a much simpler interface like a socket, we have a quandary on our hands:

  • Create a listening socket, have it call a real object which passes data over the connection which goes back to the connecting object.
  • Create a listening socket which mocks out the send method in order to send some pre-cooked output back to the connecting object.
  • Mock the sending socket and have pre-cooked output going back to the calling object.

Ideally we’d keep things as real as possible, spinning up a connected pair of sockets on the loopback adaptor and performing reasonably real communication, but after being away from Python for a while it was far too taxing on the brain. To cut a long story short I decided to take the last option and mock the socket object itself, although as it turned out, it was a bit tricky:


  def setUp(self):
    """ Create a mock socket for later use """
    super(TestSshTest, self).setUp()
    self.mock_socket = self.mox.CreateMockAnything(socket.socket)
    self.mox.StubOutWithMock(socket, 'getservbyname')
    self.mox.StubOutWithMock(socket, 'socket')
    socket.socket(socket.AF_INET, socket.SOCK_STREAM).AndReturn(self.mock_socket)

  def test_ssh_success(self):
    """ Standard successful test """
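    # The check shown earlier sets a timeout before connecting, so expect that call too
    self.mock_socket.settimeout(5)
    socket.getservbyname('ssh').AndReturn(33333)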
    self.mock_socket.connect(('192.0.2.1', 33333))
    self.mock_socket.recv(100).AndReturn('SSH-2.0-OpenSSH_5.1')
    self.mock_socket.close()
    self.mox.ReplayAll()

    # real method runs here

Don’t ask me exactly why all the above was necessary, as I’m still coming to grips with it. Part of it I believe is due to the intricacies of exactly how different entities work in Python, and part of it is how sockets are set up. Pymox appears to be more complicated to use than Mocha but I’m reasonably certain that is just added flexibility and power that Mocha hides from you – I’m still waiting for the lightbulb moment.

If anybody out there has successfully mocked out sockets in Python using Pymox and in a way that is simpler than this, I’d love to hear how it was done!
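
One possibly simpler route, for what it is worth, is unittest.mock from the Python 3 standard library instead of Pymox. The sketch below is self-contained: ssh_banner_ok is a hypothetical, cut-down stand-in for the real check, not the code under test from our codebase.

```python
import socket
import unittest
from unittest import mock


def ssh_banner_ok(ip):
    """Hypothetical stand-in for the check: True if an SSH banner comes back."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(5)
        sock.connect((ip, socket.getservbyname('ssh')))
        banner = sock.recv(100)
    finally:
        sock.close()
    return banner.startswith(b"SSH-2.0-")


class SshBannerTest(unittest.TestCase):
    @mock.patch("socket.getservbyname", return_value=33333)
    @mock.patch("socket.socket")
    def test_ssh_success(self, mock_socket_cls, mock_getservbyname):
        # socket.socket(...) now returns this mock instead of a real socket
        mock_sock = mock_socket_cls.return_value
        mock_sock.recv.return_value = b"SSH-2.0-OpenSSH_5.1"

        self.assertTrue(ssh_banner_ok("192.0.2.1"))

        mock_sock.connect.assert_called_once_with(("192.0.2.1", 33333))
        mock_sock.close.assert_called_once_with()
```

Unlike the record/replay style above, calls that were not set up in advance (settimeout here) are simply accepted, and you assert afterwards only on the calls you care about. Saved to a file, it runs with python -m unittest.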

Friday, September 9th, 2011 Tech No Comments

OpenCV on the N900

by Oliver on Wednesday, July 27th, 2011.

I’ve been playing around with OpenCV over the last few weeks while reading/coding along with the Learning OpenCV book. Overall I’m pretty impressed with how easy it is to do quite complex things, thanks to the rich library of visual computing functions. The library is natively in C, but there are also Python bindings using SWIG. It is actually an interesting exercise taking the examples in the Learning OpenCV book (which were written for C) and rewriting them in Python. Incidentally, my worked examples are here if you want to reuse them or take a look at my terrible code.

Actually there are not too many changes necessary and you don’t have to worry about pointers or memory management which is handy. The great thing about having a Python library to use is that it is generally cross-platform. Sure enough, someone has packaged OpenCV for Maemo 5. Unfortunately the default Python version in the Scratchbox build environment for the platform is 2.3, which apparently doesn’t play nicely when trying to build python-opencv. So last night I rebuilt against Python 2.5 which is also available in the regular “Fremantle” repositories for the phone and today have a working python-opencv environment.

I ran one of the example programs which simply opens the default system camera and displays it on the screen, both on my laptop and on my phone and this is the result:

I’m not sure if OpenCV has any camera focusing calls, but in any case none are used in this example, so the quality is not great. There is also some colour-correction that perhaps the standard imaging utilities in the phone use but OpenCV lacks, so the colours are a bit off. Rest assured though, that both the computer and the phone are indeed running the same (very simple) script (chapter2/example9.py).

A bit more code and you could have the computer and phone tracking each other, and after a few more iterations that could lead to some kind of T-800/T-1000 cyber-stand-off 😉 I’m talking to the python-opencv package maintainer to get the fixes incorporated into the public repositories (extras-devel right now) but if you want them earlier please let me know.

Wednesday, July 27th, 2011 Tech 2 Comments

I’m done with shell scripting

by Oliver on Saturday, February 12th, 2011.

I think I will call this week the last I use shell script as my primary go-to language. Yes, by trade I am a systems administrator but I do have a Bachelor of Computer Science degree and there is a dormant programmer inside me desperately trying to get out. I feel like shell script has become the familiar crutch that I go back to whenever faced with a problem, and that is becoming frustrating to me.

Don’t get me wrong – there is a wealth of things that shell script (and I’m primarily referring to BASH (or Bourne Again SHell) here rather than C SHell, Korn SHell or the legendary Z SHell) can do, even more so with the old-school UNIX tools like grep, sed, awk, cut, tr and pals. In fact, if you have the displeasure of being interviewed by me, a good deal of familiarity with these tools will be expected of you. They have their place, that is what I am trying to say, but the reflex of reaching for these tools needs to be quietened somewhat.

The straw that broke my camel’s back in this instance was this sorry piece of scripting by yours truly. It’s not an exemplary piece of “code” and I think that demonstrates how little I cared about it at this point. I was briefly entertained by the idea of implementing a simple uploader for Flickr in shell script, and I did actually manage to write it up in a fairly short amount of time, and it did then successfully upload around 4GB of images. The problem was that while the initial idea was simple enough, the script took on a life of its own (especially once the intricacies of Flickr’s authentication API were fully realised) and became much more complex than initially envisaged.

Despite this, I had started out with the goal of making a reasonably “pure” shell uploader, and stuck to my guns. What I should have done was call it quits when I started parsing the REST interface’s XML output with grep – that was a major warning sign. Now I have a reasonably inflexible program that barely handles errors at all and only just gets the job done. I had a feature request from some poor soul who decided to use it and I was actually depressed at the prospect of having to implement it – that’s not how a programmer should react to being able to extend the use of his/her work!

From a technical standpoint, shell is a terrible excuse for a “language”. It has a poor typing system, excruciating handling of anything that should be an “object” (all you generally have to work with are string manipulation tools), and a “library” that is basically limited to whatever commands you have available on the system. I know that I have probably barely plumbed the depths of what BASH is capable of, but when the basics are just so hard to use for what are frequently used programming patterns, I don’t really see the point.

So from next week, I’ve decided to reach for Python or Ruby when I have to code something up that is more than a few lines’ worth, or of reasonable complexity. Not that I don’t already use Python and Ruby when the occasion calls for it, but I think that those occasions are too few and far between. Shell scripting is an old-school sysadmin crutch and it is time to fully embrace the DevOps mentality and get into serious programming mode.

Saturday, February 12th, 2011 Tech, Thoughts 2 Comments

Reasons to love PostgreSQL

by Oliver on Sunday, November 21st, 2010.

I’ve been hacking on a small Ruby on Rails project to summarise and then prettily display pmacct traffic data. It’s not an original idea by any stretch and in fact is directly inspired by a system we had at my last job. For obvious reasons I have to at least reimplement the idea myself, and while the original was written in Python using PostgreSQL, mine will be using Ruby and MySQL.

One small pain point I’ve already encountered is the lack of proper IP address types and subnet functions in MySQL (beyond the basic INET_ATON() and INET_NTOA() conversions). PostgreSQL has supported IP address functions at least as far back as 7.4, and when you need to compare addresses and summarise based on local subnets they are sorely missed. It really makes me sad, and makes me wonder why even the latest MySQL does not have real support for these functions.

Granted, having built-in support for certain functions does not always help you. It is well known that you can often gain performance for your application by sorting records in the code rather than in the database. However, without even having the option there to allow you to make a comparison, it is impossible to say which way would be better. I know that in terms of convenience and LOC savings, I’d prefer the functions to be in the database engine.
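
To give an idea of what doing the subnet work in application code looks like, here is a sketch in Python using the standard ipaddress module (the project itself is in Ruby, so this is purely illustrative). The subnets, the record layout and the summarise helper are all made up for the example.

```python
import ipaddress
from collections import Counter

# Assumed local subnets; a real deployment would load these from configuration
LOCAL_SUBNETS = [
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def summarise(records):
    """Sum byte counts per local subnet from (address, bytes) pairs."""
    totals = Counter()
    for addr, nbytes in records:
        ip = ipaddress.ip_address(addr)
        # First matching local subnet, or None for non-local addresses
        subnet = next((net for net in LOCAL_SUBNETS if ip in net), None)
        totals[str(subnet) if subnet else "other"] += nbytes
    return totals


print(summarise([("192.168.1.10", 100), ("192.168.1.20", 50), ("8.8.8.8", 7)]))
```

With the inet and cidr types in PostgreSQL the same grouping can be expressed in the query itself, which is exactly the option that is missing here.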

Sunday, November 21st, 2010 Tech No Comments

My love/hate relationship with Ruby

by Oliver on Wednesday, November 3rd, 2010.

I consider myself a failed programmer, having never really excelled at it during University and only really having come to terms with some of the concepts several years later. I’ve always liked programming but at some point years ago I decided I didn’t want to be a programmer/developer so that was that. Since cementing myself in the realm of Systems Administration I’ve come to miss the programming that I was once so terrible at (and probably still am), but I never have quite enough time to catch up on what I’ve missed. The programming landscape has changed so much in the years since I joined the workforce that there seems to be an ever-increasing amount of new things to learn.

While working at Anchor I came to grips with Python which was at the time the “standard” language for the company (although I see now that their website is probably running on Ruby on Rails). I like Python, and find it logical and convenient (if not the best supported language out there at the moment). Ruby is actually not so much the new kid on the block any more but still has all of the Fanboyism that it gained a few years ago (if not more). Like the die-hard Mac users, Ruby programmers will defy all logic to defend their beloved language.

Critics of Ruby have made their opinions known far and wide around the Internets so I won’t repeat them here. I actually quite like Ruby because it is easy to use, has a huge collection of Gems to add functionality (and all-important code-reuse) and it is the language of Puppet which is my favourite configuration management tool, so I have to use Ruby to interface with it. I can get by with Ruby, but I also hate so many things about it.

One of the favourite lines of Ruby fans is how efficient Ruby is with simple string handling, thanks to the feature known as symbols. These are basically just a string of characters (with certain limitations) prefixed by a colon character, like :symbol. The efficiencies come from only storing the one copy of a symbol in memory at any time, even if it is used in many different places. I was intrigued by this claim when I first read it and set out to test the theory.
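
Python has a loosely comparable mechanism in string interning, which makes the “only one copy in memory” idea easy to see. This is an analogy only, not a description of how Ruby implements symbols.

```python
import sys

# Build two equal strings at runtime so that they start out as separate objects
first = "".join(["abcdefghijklm", "nopqrstuvwxyz"])
second = "".join(["abcdefghijklm", "nopqrstuvwxyz"])

# sys.intern() returns the single canonical copy for a given string value
a = sys.intern(first)
b = sys.intern(second)

print(first == second)  # equal by value
print(a is b)           # the interned versions are the very same object
```

Comparing two interned strings can then be done by identity rather than character by character, which is where any speed-up would come from.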

#!/usr/bin/ruby
  100000000.times do
    foo = :abcdefghijklmnopqrstuvwxyz
  end

That’s my basic testing framework. It is probably very naive, but I was looking for simplicity. To get an idea of how minuscule the “efficiencies” we gain are, we have to run this loop 100 million times just to see numbers that differ significantly. The first time I ran this test over a year ago, I got slower results using symbols than using strings (“abcdefghijklmnopqrstuvwxyz” or ‘abcdefghijklmnopqrstuvwxyz’ rather than the symbol above) and laughed long and hard. I’ve now just retested and got the following results:

Symbol: 44.661 seconds
Single-quoted string: 53.224 seconds
Double-quoted string: 53.276 seconds

Wow, there actually is a benefit in using symbols. But bear in mind, we only saved about 9 seconds over 100 million invocations. You would have to be doing some pretty serious symbol use to gain performance from this. Ruby fans will take exception to this saying that the point of symbols is not for performance but for memory consumption, to which I would respond that Ruby has far more serious memory issues than in handling a few duplicate strings. Seriously.

The reason I tested single- and double-quoted strings is due to Ruby needing to check for interpolated variables within the double-quoted string. I had expected there to be more of a difference in performance but clearly there is not.

Out of interest I tried the same loop test in Python:

#!/usr/bin/python

i = 1
while i <= 100000000:
    foo = 'abcdefghijklmnopqrstuvwxyz'
    i += 1

How long did it take? 20.634 seconds.
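
If you want to repeat the Python measurement without the loop counter getting in the way, the standard timeit module is a slightly more careful tool. This is a sketch with a much smaller default iteration count so that it finishes quickly; time_assignment is just an illustrative name and the numbers will of course differ per machine.

```python
import timeit


def time_assignment(number=1000000):
    """Return the seconds taken to run the string assignment `number` times."""
    return timeit.timeit("foo = 'abcdefghijklmnopqrstuvwxyz'", number=number)


print("%.3f seconds" % time_assignment())
```

Passing number=100000000 gives a run comparable to the loop above, minus the cost of incrementing and comparing i on every pass.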

Wednesday, November 3rd, 2010 Tech No Comments