adrift on a cosmic ocean

Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We have lived in Berlin since 2009, have two kids, and have far too little time to really justify having a blog.

Reliably finding processes with ps by name

Posted by Oliver on the 17th of July, 2011 in category Tech
Tagged with: jenkins, linux, ps

I imagine that the people who might read this post fall into two categories:

  1. Those who, when they need to find a process by name, run ps -ef or similar and pipe the output into grep processname.
  2. Those who, when they need to find a process by name, run ps -C processname.

If you fall into the first category, you fail my interview tests. Perhaps you smugly fall into the second category, but surely you have seen this occur:

ohookins  4410  0.0  0.2 212804  9384 ?        S    20:10   0:00 /usr/lib/bamf/bamfdaemon
$ ps u -C /usr/lib/bamf/bamfdaemon
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
$ ps u -C bamfdaemon
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ohookins  4410  0.0  0.2 212804  9384 ?        S    20:10   0:00 /usr/lib/bamf/bamfdaemon

I don't particularly care about bamfdaemon, but given that the process listing shows the full path to the binary, why can't we search for it by this process name? Why does the unqualified filename work? OK, perhaps it is just basing the match on the unqualified filename...

ohookins  3710  0.0  0.9 468988 38216 ?        Sl   19:49   0:05 /usr/bin/python /usr/bin/terminator
$ ps u -C terminator
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
$ ps u -C python
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
$ ps u -C /usr/bin/python
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
$ ps u -C /usr/bin/terminator
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ohookins  3710  0.0  0.9 468988 38216 ?        Sl   19:49   0:05 /usr/bin/python /usr/bin/terminator

OK, what the heck is going on here exactly? I'm not terribly familiar with the POSIX specification, so let's take a look at the source code of procps:

break; case SEL_COMM: i=sn->n; while(i--)
if(!strncmp( buf->cmd, (*(sn->u+i)).cmd, 15 )) return 1;

In select.c of ps, we see these two lines in the case statement which selects between the different process identification mechanisms. -C actually allows you to select multiple processes by different names, since it iterates through the list of selectors (which I didn't know before looking at the code - very cool).

A length-limited string comparison is done between the argument given to -C and the process being examined. You can see that this limit is 15 characters, and in the union inside the selection node only 16 characters are stored anyway. Let's have a look at the proc_t struct behind buf so we can figure out what the argument is being compared against. It sits in proc/readproc.h:

char cmd[16];        // stat,status     basename of executable file in call to exec(2)
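To make the effect of that 15-character strncmp concrete, here is a minimal shell sketch that mimics the comparison. The function name comm_matches is made up for illustration and is not part of procps; it simply truncates both strings to 15 characters before comparing them, which gives the same answer as the strncmp call quoted above.

```shell
# Hypothetical helper (not part of procps): mimic strncmp(a, b, 15) == 0.
# $1: the name given to -C, $2: the command name stored for the process.
comm_matches() {
    # printf '%.15s' keeps at most the first 15 characters of its argument
    [ "$(printf '%.15s' "$1")" = "$(printf '%.15s' "$2")" ]
}

# A 15-character prefix is enough to match a longer name.
comm_matches upstart-socket- upstart-socket-bridge && echo "match"

# A full path never equals the stored basename.
comm_matches /usr/lib/bamf/bamfdaemon bamfdaemon || echo "no match"
```

This is only a model of the comparison, but it lines up with the bamfdaemon result earlier in the post: the stored name is a basename, so a full path given to -C cannot match it.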

Now we are getting somewhere. We can easily verify that the comparison really is limited to 15 characters:

$ ps u -C upstart-socket-
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1123  0.0  0.0  15004   400 ?        S    18:46   0:00 upstart-socket-bridge --daemon
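On Linux, the name that ends up in cmd can also be read straight from the status file that the comment in readproc.h mentions. The sketch below assumes /proc is mounted and uses a hypothetical helper, proc_name, to print the Name field for a given PID.

```shell
# Hypothetical helper: print the command name stored for a PID.
# Assumes a Linux /proc filesystem; the Name: line in /proc/PID/status
# holds the executable basename, which is at most 15 characters long.
proc_name() {
    sed -n 's/^Name:[[:space:]]*//p' "/proc/$1/status"
}

# Show the stored name of the current shell.
proc_name "$$"
```

Whatever this prints for a process is the string to hand to ps -C, regardless of what the COMMAND column shows.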

So if you are lazy, you only ever have to type the first 15 characters of your process name. Now let's look at the more complicated case, where a process is really hard to find by any name we can see in the process listing. My candidate for this is Jenkins, which is notoriously hard to track down, especially if you are running several Java-based services on the one machine (for example Jenkins itself, Nexus and perhaps Sonar, which all logically fit together as part of a typical Java build server):

$ ps uwww -U jenkins
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
jenkins  19517  9.3 30.8 1823496 1245768 ?     Ssl  Jul08 1255:29 /usr/bin/java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -Xmx1024m -Xms768m -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --daemon --httpPort=8080 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20

Nothing amazing here, let's find this process by the command name:

$ ps uwww -C java
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nexus     3853  0.0  3.5 1442536 145520 ?      Sl   Jun06   3:17 java -Dsun.net.inetaddr.ttl=3600 -Dbasedir=. -Djava.io.tmpdir=./runtime/tmp -Djava.library.path=bin/jsw/linux-x86-64/lib -classpath bin/jsw/lib/wrapper-3.2.3.jar:./runtime/apps/nexus/lib/plexus-classworlds-1.4.jar:./conf/ -Dwrapper.key=ybUhRQr9hU88aJwC -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.pid=3837 -Dwrapper.version=3.2.3 -Dwrapper.native_library=wrapper -Dwrapper.service=TRUE -Dwrapper.cpu.timeout=10 -Dwrapper.jvmid=1 org.codehaus.plexus.classworlds.launcher.Launcher

Wait, where is Jenkins? Didn't we confirm that the process running was in fact /usr/bin/java, and we know only java is used as the executable basename inside of ps? How is it possible that ps is now not showing us the Jenkins process? Let's have a slightly different look at it:

$ ps -U jenkins
  PID TTY          TIME CMD
19517 ?        20:55:47 exe
$ ps uwww -C exe
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
jenkins  19517  9.3 30.7 1823496 1245124 ?     Ssl  Jul08 1255:50 /usr/bin/java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -Xmx1024m -Xms768m -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --daemon --httpPort=8080 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20

Aha! We have found the errant Jenkins process. But why is the basename exe? As it turns out, it is a peculiarity of Jenkins itself, documented in https://issues.jenkins-ci.org/browse/JENKINS-9206 which also causes problems with the init script (when it tries to find the process with the incorrect method, as we found above, due to certain assumptions).
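As a rough illustration of why -C exe finds Jenkins while -C java does not, the following sketch walks /proc by hand and matches on the stored command name rather than on the command line. It assumes a Linux /proc filesystem; pids_by_comm is a hypothetical helper, not something ps or Jenkins provides, and it only approximates what ps -C does.

```shell
# Hypothetical helper: list the PIDs whose stored command name matches $1.
# Only the first 15 characters of the requested name are compared.
pids_by_comm() {
    want=$(printf '%.15s' "$1")
    for f in /proc/[0-9]*/status; do
        # A process may exit while we are scanning, so skip unreadable entries.
        [ -r "$f" ] || continue
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$f" 2>/dev/null) || continue
        if [ "$name" = "$want" ]; then
            pid=${f#/proc/}          # strip the leading /proc/
            echo "${pid%/status}"    # strip the trailing /status
        fi
    done
}

# On the build server described above, this would be the name to search for.
pids_by_comm exe
```

Because the match is made against the stored name only, the long java command line shown in the COMMAND column plays no part in it.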

In any case, now we've seen how ps operates and even how to find a process using the correct method, even when that process is playing hard to get.

© 2010-2018 Oliver Hookins and Angela Collins