Writings on various topics (mostly technical) from Oliver Hookins and Angela Collins. We have lived in Berlin since 2009, have two kids, and have far too little time to really justify having a blog.
I imagine that there are two groups of people who might read this post:
1. those who run ps -ef (or similar) and pipe it into grep processname;
2. those who run ps -C processname.
If you fall into the first category, you fail my interview tests. Perhaps you smugly fall into the second category, but surely you have seen this occur:
ohookins 4410 0.0 0.2 212804 9384 ? S 20:10 0:00 /usr/lib/bamf/bamfdaemon
$ ps u -C /usr/lib/bamf/bamfdaemon
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
$ ps u -C bamfdaemon
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ohookins 4410 0.0 0.2 212804 9384 ? S 20:10 0:00 /usr/lib/bamf/bamfdaemon
I don't particularly care about bamfdaemon, but given that the process listing shows the full path to the binary, why can't we search for it by this process name? Why does the unqualified filename work? OK, perhaps it is just basing the match on the unqualified filename...
ohookins 3710 0.0 0.9 468988 38216 ? Sl 19:49 0:05 /usr/bin/python /usr/bin/terminator
$ ps u -C terminator
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
$ ps u -C python
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
$ ps u -C /usr/bin/python
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
$ ps u -C /usr/bin/terminator
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ohookins 3710 0.0 0.9 468988 38216 ? Sl 19:49 0:05 /usr/bin/python /usr/bin/terminator
OK, what the heck is going on here exactly? I'm not terribly familiar with the POSIX specification, so let's take a look at the source code of procps:
break; case SEL_COMM: i=sn->n; while(i--)
if(!strncmp( buf->cmd, (*(sn->u+i)).cmd, 15 )) return 1;
In select.c of ps, these two lines sit in the case statement that selects between the different process identification mechanisms. -C actually lets you select processes by several different names, since it iterates through the list of selectors (which I didn't know before looking at the code - very cool).
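Here is a small Python rendering of that SEL_COMM test, as a sketch of the logic only (it is not procps code, and comm_selected and COMM_CMP_LEN are names made up for illustration): a process is picked if its short name matches any of the -C arguments on the first 15 characters.

```python
from typing import Iterable

# Mirrors the 15 passed to strncmp() in the SEL_COMM case of select.c.
COMM_CMP_LEN = 15


def comm_selected(comm: str, selectors: Iterable[str]) -> bool:
    """Return True if comm matches any -C argument on its first 15 characters."""
    # For ordinary NUL-free strings, comparing the first 15 characters of
    # each side behaves like strncmp(comm, name, 15) == 0.
    return any(comm[:COMM_CMP_LEN] == name[:COMM_CMP_LEN] for name in selectors)


if __name__ == "__main__":
    print(comm_selected("bamfdaemon", ["/usr/lib/bamf/bamfdaemon"]))
    print(comm_selected("bamfdaemon", ["java", "bamfdaemon"]))
```

Against the bamfdaemon example, the short name is bamfdaemon, so the full path /usr/lib/bamf/bamfdaemon never matches while the bare name does. On the command line, several names go to -C as one comma-separated list, e.g. ps -C bash,sshd.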
A length-limited string comparison is done between the argument given to -C and the name of the process being examined. You can see that the limit is 15 characters, and the union inside the selection node only stores 16 characters anyway. Let's look at the proc_t struct that buf points to, so we can figure out what the comparison is actually being done on. It lives in proc/readproc.h:
char cmd[16]; // stat,status basename of executable file in call to exec(2)
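The comment says this name comes from the stat and status files under /proc. As a rough Linux-only illustration (read_comm is a hypothetical helper, and this is not how readproc.c is written), the same short name can be pulled out of /proc/<pid>/stat, where it is the second field, wrapped in parentheses:

```python
import os


def read_comm(pid: int) -> str:
    """Return the short command name the kernel reports for pid.

    The name may itself contain spaces or ')', so take everything between
    the first '(' and the LAST ')' of /proc/<pid>/stat.
    """
    with open(f"/proc/{pid}/stat", "rb") as f:
        data = f.read()
    start = data.index(b"(") + 1
    end = data.rindex(b")")
    return data[start:end].decode(errors="replace")


if __name__ == "__main__":
    pid = os.getpid()
    print(pid, read_comm(pid))
```

Linux also exposes this value on its own as /proc/<pid>/comm, which is handy for quick checks from the shell.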
Now we are getting somewhere. We can easily verify that only the first 15 characters are compared:
$ ps u -C upstart-socket-
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1123 0.0 0.0 15004 400 ? S 18:46 0:00 upstart-socket-bridge --daemon
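The 15-character limit is not only a ps convention: the Linux kernel keeps each task's short name in a 16-byte buffer (TASK_COMM_LEN), so a longer name is cut to 15 characters plus the terminating NUL. A quick way to watch this happen, assuming Linux, where a process may rename itself by writing to /proc/self/comm, is the sketch below (read_own_comm and rename_self are hypothetical helpers; the script puts its original name back afterwards):

```python
COMM_PATH = "/proc/self/comm"


def read_own_comm() -> str:
    """Return this process's current short name."""
    with open(COMM_PATH) as f:
        return f.read().rstrip("\n")


def rename_self(name: str) -> str:
    """Write name as this process's short name; return what the kernel kept."""
    with open(COMM_PATH, "wb") as f:
        f.write(name.encode())
    return read_own_comm()


if __name__ == "__main__":
    original = read_own_comm()
    try:
        kept = rename_self("upstart-socket-bridge")
        print(f"asked for 'upstart-socket-bridge', kernel kept {kept!r}")
    finally:
        rename_self(original)  # restore the name we started with
```

This should report upstart-socket-, the same 15 characters that ps -C upstart-socket- matched above. For the same reason ps -C upstart-socket-bridge should match as well, since only the first 15 characters of the argument are compared.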
So if you are lazy, you only ever need to type the first 15 characters of a process name. Now let's look at the more complicated case of processes that are really hard to find by any name we can see in the process listing. My candidate for this is Jenkins, which is notoriously hard to track down, especially if you are running several Java-based services on the one machine (for example Jenkins itself, Nexus and perhaps Sonar, which all logically fit together as part of a typical Java build server):
$ ps uwww -U jenkins
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
jenkins 19517 9.3 30.8 1823496 1245768 ? Ssl Jul08 1255:29 /usr/bin/java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -Xmx1024m -Xms768m -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --daemon --httpPort=8080 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20
Nothing amazing here. Let's find this process by its command name:
$ ps uwww -C java
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
nexus 3853 0.0 3.5 1442536 145520 ? Sl Jun06 3:17 java -Dsun.net.inetaddr.ttl=3600 -Dbasedir=. -Djava.io.tmpdir=./runtime/tmp -Djava.library.path=bin/jsw/linux-x86-64/lib -classpath bin/jsw/lib/wrapper-3.2.3.jar:./runtime/apps/nexus/lib/plexus-classworlds-1.4.jar:./conf/ -Dwrapper.key=ybUhRQr9hU88aJwC -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.pid=3837 -Dwrapper.version=3.2.3 -Dwrapper.native_library=wrapper -Dwrapper.service=TRUE -Dwrapper.cpu.timeout=10 -Dwrapper.jvmid=1 org.codehaus.plexus.classworlds.launcher.Launcher
Wait, where is Jenkins? Didn't we confirm that the running process was in fact /usr/bin/java, and don't we know that only the basename java is used as the executable name inside of ps? How is it possible that ps is now not showing us the Jenkins process? Let's take a slightly different look at it:
$ ps -U jenkins
PID TTY TIME CMD
19517 ? 20:55:47 exe
$ ps uwww -C exe
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
jenkins 19517 9.3 30.7 1823496 1245124 ? Ssl Jul08 1255:50 /usr/bin/java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -Xmx1024m -Xms768m -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --daemon --httpPort=8080 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20
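The two listings disagree because they print different things: the CMD column of the default format shows the short name the kernel keeps for the process, while the COMMAND column of the u format shows its full argument list. A minimal Linux-only sketch that reads both for one PID from /proc (comm_and_args is a hypothetical helper):

```python
import os
from typing import List, Tuple


def comm_and_args(pid: int) -> Tuple[str, List[str]]:
    """Return (short name, argument list) for pid, read from /proc."""
    with open(f"/proc/{pid}/comm") as f:
        comm = f.read().rstrip("\n")  # the short name, as in the default CMD column
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()  # NUL-separated argv, as in the u format's COMMAND column
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the NUL that terminates the last argument
    args = [a.decode(errors="replace") for a in raw.split(b"\0")] if raw else []
    return comm, args


if __name__ == "__main__":
    comm, args = comm_and_args(os.getpid())
    print("CMD:    ", comm)
    print("COMMAND:", " ".join(args))
```

For the Jenkins process above, the first value would be exe and the second would start with /usr/bin/java, which is why -C java, working only on the short name, skipped it.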
Aha! We have found the errant Jenkins process. But why is the basename exe? As it turns out, it is a peculiarity of Jenkins itself, documented in https://issues.jenkins-ci.org/browse/JENKINS-9206, which also causes problems for the init script, since it looks for the process using the same incorrect method we tried above and relies on the same assumptions about the process name.
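On Linux, the kernel takes the short name from the basename of the path passed to exec, not from the binary that path finally resolves to. We have not checked how Jenkins launches itself, but if its daemon wrapper restarts the JVM through a path whose last component is exe (such as /proc/self/exe), ps would see exactly this name. The sketch below reproduces the effect with an ordinary symlink, assuming Linux and a sleep binary on PATH; comm_via_symlink is a hypothetical helper:

```python
import os
import shutil
import subprocess
import tempfile
import time
from typing import Tuple


def comm_via_symlink(link_name: str = "exe") -> Tuple[str, str]:
    """Start `sleep` through a symlink named link_name (keep it <= 15 chars).

    Returns the child's short name and the target of its /proc/<pid>/exe
    link, or "" if that link cannot be read.
    """
    target = shutil.which("sleep")
    if target is None:
        raise RuntimeError("no sleep binary found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, link_name)
        os.symlink(target, link)
        # The path contains a '/', so it is exec'd as given, not looked up on PATH.
        child = subprocess.Popen([link, "30"])
        try:
            comm = ""
            # Poll briefly in case the new name is not visible yet.
            for _ in range(200):
                with open(f"/proc/{child.pid}/comm") as f:
                    comm = f.read().rstrip("\n")
                if comm == link_name:
                    break
                time.sleep(0.01)
            try:
                exe = os.readlink(f"/proc/{child.pid}/exe")
            except OSError:
                exe = ""
            return comm, exe
        finally:
            child.kill()
            child.wait()


if __name__ == "__main__":
    print(comm_via_symlink("exe"))
```

The child's short name comes back as exe, while /proc/<pid>/exe still resolves to the real sleep binary: the same split between the short name and the actual executable that hid Jenkins from ps -C java.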
In any case, we've now seen how ps matches process names, and how to find a process using the correct method even when that process is playing hard to get.