Culture Codes From Around The Tech World

by Oliver on Wednesday, February 8th, 2017.

Last week I found myself reading through all (ok, not entirely all – I skipped the NASA one) of the slide decks on culturecodes.co. These are slide decks from tech companies which attempt to describe what the company culture looks like, what its values are, or the company processes that serve as an employee handbook. They are called Culture Codes rather than Company Culture because culture can only be talked about or described, but cannot itself be seen. Hence, the content of the slides is a code for the culture.

At any company we express our culture naturally – it has been described as “what happens when nobody is looking” (and there are many more quotes about it you can Google in your own time). It would be interesting to think about what might be inside a slide deck about my own company’s culture, were we to put one together (we haven’t yet, hence my thoughts on the topic). To that end, I decided to read as many of the slide decks on the above site as I could, and summarise my thoughts.

General Observations

These are stream-of-consciousness notes in no particular order about what I observed in the slide decks.

  • Usually has a value/mission statement for the whole company.
  • Things they aspire to do, some descriptions of what the culture looks like.
  • What makes the company (and culture) different from others.
  • Longer-form description and detail of company values (i.e. describe each value).
  • How the company works (as in workflow) and what makes it work (as in success).
  • How to make decisions.
  • Definitions of what success looks like.
  • High-level description of behaviours/traits of people who work well in the company.
  • Lots of justifications for “why” these values are here.
  • Culture deck is often shared publicly and used as recruiting/marketing tool (both for employees and customers).
  • Some have a small “history” section describing the founding of the company or bootstrapping of company values. Good if they are interlinked. Describes major milestones of the company so far.
  • Quotes from customers, employees, source material (e.g. entrepreneurs) for values or guiding statements, supporting the values.
  • If the deck is shared publicly, it sets expectations for employees before they walk through the door. Some are explicitly aimed at being marketing material.
  • Company involvement with community – greater impact than just through core mission.
  • It is tempting to copy values or content, but there are some slide decks which are very incompatible with our values, feel jarring (or just feel plain wrong). Some are very prescriptive, strict rules.
  • The language you use matters! Imagery too.
  • Many pictures of smiling employees / teams. Photos/descriptions of company events (e.g. parties/picnics etc).
  • Crediting culture with driving the success of the company.
  • Video where employees can give concrete examples of living the values.
    • BUT, culture is not just quirky work environments, posters on the wall, etc.
  • Many culture decks double as employee handbooks (e.g. Zappos, Valve). Some are extremely long (Zappos is around 300 pages: many personal employee stories and photos tell the values story through concrete examples rather than analogy/metaphor and brief descriptions).
    • Length varies a lot. Lower limit is around 12 slides, which is not enough space for meaningful descriptions. 50-100 is the sweet spot, more than that just drags on.
  • Some (Genius, Trello) dogfood their own product to create their culture deck (or employee handbook), but it comes off as a bit messy and hard to understand, in some cases full of comments that obscure what is current or actual reality. NextBigSound, Sprintly and Clef store markdown on Github (which is optimised for authoring but not necessarily for consumption).
  • Inputs sometimes also from partners and customers.
  • All culture decks walk a tightrope between cliché and meaningless catchphrases on either side. Extremely difficult (impossible?) to be original, quite difficult to be meaningful and capture reality that people can buy into.
  • Often both clarifying negative and positive statements: “X means that we do Y”, or “X doesn’t mean that we do Z”.
  • Google’s values are predictably different (and perhaps not a great fit for many other smaller companies with fewer resources). Same goes for NASA which has a completely different set of needs to most tech companies (safety!).
  • Although I don’t like the Big Spaceship culture deck too much (painful on the eyes) I like that they call bullshit on buzzwords and manager speak.
  • Lots of quotes from Daniel Pink, Steve Jobs, Malcolm Gladwell, US and UK leaders.

Values

As you might expect, the values that companies strive to adhere to and live every day are very similar. I summarised them into a smaller number of similar themes (which may not all be entirely cohesive, but not much time was spent on this):

  • Autonomy / freedom / desire to reduce process / resourceful.
  • “Do the right thing” (also related to the above) / trust / responsibility / ownership.
  • Driven by a vision / vision provided by leaders / leaders living and demonstrating values.
  • Transparency / sharing knowledge / communication / feedback / questioning / openness.
  • Selflessness / not personally identifying with the work / using data rather than opinions.
  • Striving for results / outcome-driven / calculated risk taking / action over talk / results vs busy / building for the customer / doing more with less / growth / Get [Sh]it Done.
  • Learning from mistakes / great is the enemy of good / ship now / fail fast.
  • Helping others / treating colleagues with respect / coaching / mentoring / servant leadership.
  • Constantly learning / inventive / optimistic.
  • Mindfulness / empathy / care / reflection / time for being strategic.
  • Inclusivity / diversity goals.
  • Work/life balance, living healthily. Having fun / celebrating success / “family” feeling / friends at work.
  • Meaningful work / Company invests in community / serving a higher purpose.
  • Intertwining of individual/team/company values and missions.
  • At least one “quirky”, unique company-specific hashtag or catchphrase (several companies list “quirky” as a value).
  • Working with other talented people / hiring & nurturing.
  • Almost all claim to be unique! (really?)

“Anti-” Values

Some companies have very different values to others, which may be undesirable or completely destructive outside of that particular company. But perhaps within their own environment they do work. Presented here without further comment:

  • We start work at 8:30 every day.
  • Many sporting team analogies and metaphors (bro culture?).
  • Success is “hard” or a “grind”.
  • Work hard play hard / fast-paced / urgency / hustle.
  • Timebox everything. Strict scheduling.
  • Focus on competition mentality vs satisfying customers.
  • Only hire “A” players.
  • Do the thing you least want to do right now.
  • Blindly attempting to define culture around The Art of War.
  • Never compromise (never never? really?).
  • Do things that don’t scale.
  • Open Office! (I thought that wasn’t cool anymore…)
  • Many other things that work when the company is very small, but eventually fail and it turns into the same org structure and ways of working as most other tech companies (e.g. Github).

What are some of your own culture codes or values that you like, and see lived out every day? Leave some comments!


iframe-based dashboards don’t work in 2017

by Oliver on Thursday, January 5th, 2017.

At $current_employer (unlike $previous_employer where all these problems were sorted out), we have great huge TVs in every room but not consistently useful usage of them. I love seeing big, beautiful dashboards and KPIs visualised everywhere but right now, we just don’t have that in place. No matter, this is part of my mission to improve engineering practices here and I’m happy to tackle it.

The last time I felt I had to do this was back in about 2013. My team was fairly small at 2-3 people including myself, and there was no company-wide dashboarding solution in place. The list of commercial and open source solutions was much smaller than it is today. We ended up using a Mac Mini (initially with Safari, later Chrome) and some tab rotation extension to do the job of rotating between various hard-coded HTML pages I had crafted by hand, which aggregated numerous Graphite graphs into a fixed table structure. Hm.

While there are many solutions to displaying dashboards, collecting and storing the data, actually hosting the infrastructure and what drives the TV still seems a bit fiddly. You could try using the TV’s built-in web browser if it is a smart TV (low-powered, usually no saved settings if you turn the TV off, not enough memory, questionable HTML5 support), Chromecast (not independent from another computer), Raspberry Pi (low-powered, not enough memory), or some other small form-factor PC. The ultimate solution will probably require some common infrastructure to be deployed along the lines of Concerto, which I’ve used before but don’t want to wait for that to be set up yet.

The simplest possible solution is to host a small static HTML file on the machine, load it in the browser and have that page rotate through a hard-coded set of URLs by loading them in an iframe. I came up with this code in a few minutes and hoped it would work:

<html>
<head>
  <title>
    Dashboards
  </title>
</head>
<body style="margin:0px">
  <iframe id="frame"></iframe>
  <script type="text/javascript">
    function rotateDashboard(urls, refreshPeriod) {
      var frame = document.getElementById("frame");
      frame.src = urls[0];

      // Put the current URL on the back of the queue and set the next refresh
      urls.push(urls.shift());
      setTimeout(rotateDashboard, refreshPeriod * 1000, urls, refreshPeriod);
    };

    // Set up iframe
    var frame = document.getElementById("frame");
    frame.height = screen.height;
    frame.width = screen.width;
    frame.style.border = "none";
    frame.seamless = true;

    // Set up metadata
    var xhr = new XMLHttpRequest();
    xhr.onload = function(e) {
      var json = JSON.parse(xhr.responseText);
      var refresh = json.refresh;
      var urls = json.urls;

      rotateDashboard(urls, refresh);
    };

    xhr.open("GET", "https://gist.githubusercontent.com/ohookins/somegisthash/dashboards.json");
    xhr.send();
  </script>
</body>
</html>


For the first couple of locally hosted dashboards and another static website for testing, it worked, but for the first Librato-based dashboard it immediately failed due to the X-Frame-Options header in the response being set to DENY. Not being a frontend-savvy person, I’d only vaguely known of this option but here it actually blocked the entire concept from working.
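Incidentally, the header logic that bit me is simple to express. Here’s a rough sketch of the decision a browser makes (the helper name and the simplified rules are my own invention, and it ignores the newer CSP frame-ancestors directive, which can also block framing):

```javascript
// Decide whether a page may be embedded in an iframe, based on its
// X-Frame-Options response header. Browsers implement roughly:
//   DENY       - never allow framing
//   SAMEORIGIN - allow only if the embedding page has the same origin
//   (absent)   - allow framing
function allowsFraming(xFrameOptions, sameOrigin) {
  if (!xFrameOptions) return true;
  switch (xFrameOptions.trim().toUpperCase()) {
    case "DENY":
      return false;
    case "SAMEORIGIN":
      return !!sameOrigin;
    default:
      // Unknown values (including the obsolete ALLOW-FROM) are handled
      // inconsistently by browsers; assume framing is allowed.
      return true;
  }
}

console.log(allowsFraming("DENY", true));       // false - what Librato sends
console.log(allowsFraming("SAMEORIGIN", true)); // true
console.log(allowsFraming(null, false));        // true
```

Librato’s DENY means there is no embedding origin for which framing is allowed, hence the dead end.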

So, TL;DR – you can’t host rotating dashboards in an iframe, given the security settings most browsers and respectable websites obey in 2017. This is probably well-known to anyone who has done any reasonable amount of web coding in their lives, but to a primarily backend/infrastructure person it was a surprise. So the Earliest Testable Product in this case needs to be a tab rotation extension in the browser. You might argue this is simpler, but I was looking forward to maintaining configuration in a flexible manner as you can see in the script above. In any case, by the end of today I’ll have such a system running and the team will start to enjoy the benefits of immediate visibility of critical operational data and KPIs!


“We’re doing our own flavour of Agile/Scrum”

by Oliver on Tuesday, December 13th, 2016.

I won’t descend into hyperbole and say you should run, shrieking and naked into the dark night, when you hear these words. But, it’s worth pondering what exactly it means. I think I’ve (over)used this phrase myself plenty over the years and right now find myself examining why so many people find themselves needing to invent their own version of well accepted software workflow methodologies.

You might say “we just pick the parts that work for us” or “we continually iterate on our workflow and so it is constantly evolving rather than sticking to static definitions of Agile”, or “we haven’t found estimations useful”. Many teams that have a significant infrastructure component to their work find themselves split between Scrum and Kanban. I always imagine that traditional or strict Scrum works best when you are working on a single application and codebase, with pretty well restricted scope and limited technologies in play. I actually crave working in such an environment, since working in teams with broad missions and wide varieties of technologies can make organising the work extremely difficult. At the same time you don’t want to split a reasonably-sized team of 6-8 people into teams of 1-2 people just to make their mission/vision clear.

Some reasons I think “custom Agile/Scrum” happens:

  • Most or all of the team has never actually worked in a real Waterfall development model, and can’t appreciate the reasons for all the Agile/Scrum rituals, processes and ideals. This will continue to happen more and more frequently, and is almost guaranteed if you are dealing with those dreaded Millennials.
  • Estimations are hard, and we’d rather not do them.
  • Backlog grooming is hard, and we don’t want to waste time on it. Meeting fatigue in general kills a lot of the rituals.
  • Unclear accountability on the team. Who does it fall on when we don’t meet our goals? What is the outcome?
  • Too many disparate streams of work to have one clear deliverable at the end of the sprint.
  • Various factors as mentioned in the previous paragraph leading to a hybrid Scrum-Kanban methodology being adopted.
  • The need to use electronic tools e.g. Jira/Trello/Mingle/TargetProcess (have you heard of that one?) etc., rather than old-fashioned cards or sticky notes on the wall. Conforming to the constraints of your tool of choice (or lack of choice) inevitably makes a lot of the rituals much harder. Aligning processes with other teams (sometimes on other continents) also adds to the friction.

So anyway, why is any of this a problem? Well, let’s consider for a moment what the purpose of these workflow tools and processes is. Or at least, in my opinion (and if you disagree, please let me know and let’s discuss it!):

  • Feedback
  • Learning
  • Improvement

I think these three elements are so important to a team, whether you implement Scrum or Kanban or something else. If you pick and choose from different agile methodologies, you’d better be sure you have all of these elements present. Let me give some examples of where the process fails.

You have a diverse team with a broad mission, and various roles like backend, frontend, QA, design etc. Not everybody is able to work on the same thing, so your sprint goals look like five or ten different topics. At the end of the sprint, maybe 70-80% of them are completed, but that’s ok, right? You got the majority done – time to celebrate, demo what you did finish and move what’s left over to the next sprint.

Unfortunately what this does is create a habit of acceptable failure. You become accustomed to never completing all the sprint goals, moving tickets over to the following sprint and not reacting to it. Quarterly goals slip but that’s also acceptable. You take on additional “emergency” work into the sprint without much question as slipping from 70% to 65% isn’t a big difference. You’ve just lost one of your most important feedback mechanisms.

If you had a single concrete goal for the sprint, and held yourself to delivering that thing each sprint, you would instead build up the habit of success being normal. The first sprint where that single goal is not delivered gives you a huge red flag that something went wrong, and there is a learning opportunity. What did you fail to consider in this sprint that caused the goal to be missed? Did you take on some emergency work that took longer than expected? It’s also a great opportunity for engineers to improve how they estimate their work and how they prioritise. It also facilitates better discussions around priorities – if you come to me and ask me to complete some “small” task, I will ask you to take on responsibility for the entire sprint goal being missed, and for explaining that to the stakeholders. 100% to 0% is a much harder pill to swallow than 85% to 80% – and in the latter case I believe these important conversations just aren’t happening.

But let’s say Scrum really doesn’t work for you. I think that’s totally fine, as long as you own up to this and replace the feedback mechanisms of Scrum with those of something else – but don’t stay in some undefined grey area in the middle. Two-week (or some alternative time period) sprints may not work well, or you might deliver to production every week, or every three weeks – something that doesn’t align with the sprint. Now you are in a situation where you are working in sprints/iterations that are just arbitrary time containers for work, but aren’t set up to deliver you any valuable feedback. Don’t stay in the grey zone – own up to it and at least move to something else like Kanban.

But if you are using Kanban, do think about what feedback mechanisms you now need. Simply limiting work in progress and considering your workflow a pipeline doesn’t provide much intelligence about how well it is functioning. Measuring the cycle time of tasks is the feedback loop here that tells you when things are going off the rails. If you get to the point where your cycle time is pretty consistent but your backlog is growing more and more out of control, you have scope creep or too much additional work is making its way into your team. Either way, there is a conversation to be had about priorities and what work is critical to the team’s success. Alternatively, if cycle time is all over the place, then the team can learn from these poor estimates and improve their thought process around the work. Having neither cycle time nor sprint goal success adequately measured leaves you unable to judge healthy workflow, or to react when it could be improved.
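As an aside, cycle time is cheap to compute if your ticketing tool can export start and finish timestamps. A minimal sketch (the task shape { started, finished } is made up; a real export would need mapping onto it):

```javascript
// Compute the mean and spread of cycle times (in days) for finished tasks.
// Assumed task shape: { started: Date, finished: Date }.
function cycleTimeStats(tasks) {
  var days = tasks.map(function (t) {
    return (t.finished - t.started) / (1000 * 60 * 60 * 24);
  });
  var mean = days.reduce(function (a, b) { return a + b; }, 0) / days.length;
  var variance = days.reduce(function (a, d) {
    return a + Math.pow(d - mean, 2);
  }, 0) / days.length;
  return { mean: mean, stddev: Math.sqrt(variance) };
}

var stats = cycleTimeStats([
  { started: new Date("2016-12-01"), finished: new Date("2016-12-03") }, // 2 days
  { started: new Date("2016-12-02"), finished: new Date("2016-12-06") }, // 4 days
]);
console.log(stats.mean);   // 3
console.log(stats.stddev); // 1
```

A standard deviation that is large relative to the mean is the “all over the place” signal; a growing backlog despite a stable mean points at intake rather than execution.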

I guess you could also disagree with all of this. I’d still argue that if you are in a business or venture that cares about being successful, you want to know that how you go about your work actually matters. If the work isn’t being done very efficiently, you want to know with reasonable certainty which part of your methodology is letting you down, and respond to it. If you can’t put your finger on the problem and concretely say “this is wrong, let’s improve it” then you are not only forgoing potential success, but also missing out on amazing opportunities for learning and the challenge of solving interesting problems!


What’s Missing From Online Javascript Courses

by Oliver on Tuesday, December 6th, 2016.

Perhaps the title is somewhat excessive, but it expresses how I feel about this particular topic. I’m not a “front-end person” (whatever that is) and feel much more comfortable behind an API where you don’t have to worry about design, markup, logic, styling as well as how to put them all together. That being said, I feel it’s an area where I should face my fears head-on, and so I’m doing some small side-projects on the web.

One thing that I realised I had no idea about is how you actually get a web app running. I don’t mean starting a server, or retrieving a web page, or even which line of Javascript is executed first. I’m talking about how you put all the scripts and bits of code in the right places so that the browser knows when to run them, and you don’t make an awful mess in the process.

This can only be shown by example, so here’s what I inevitably start with:

<html>
  <head>
    <script src="//code.jquery.com/jquery-3.1.1.min.js"></script>
  </head>
  <body>
    <div id="data"></div>
    <script type="text/javascript">
      $.ajax("/get_data", {
        success: function(data) {
          $("#data").text(data);
        }
      });
    </script>
  </body>
</html>

Yes, I’m using jQuery and no that code example is probably not entirely correct. I still find there is a reasonable period of experimentation involved before even the simple things like an AJAX call to get some data from an API are working. In any case, here we are with some in-line Javascript and things are generally working as expected. But of course we know that in-lining Javascript is not the way to a working, maintainable application, so as soon as we have something working, we should pull it into its own external script.

<html>
  <head>
    <script src="//code.jquery.com/jquery-3.1.1.min.js"></script>
    <script src="/javascripts/main.js"></script>
  </head>
  <body>
    <div id="data"></div>
  </body>
</html>

Uh-oh, it stopped working. The code in main.js is the exact same as what we had in the document before but it is no longer functioning. Already, we are beyond what I’ve seen in most beginner Javascript online courses, yet this seems like a pretty fundamental issue. Of course, the reason is that the script has been loaded and executed in the same order as the script tags and before the HTML elements (including the div we are adding the data to) were present in the DOM.

So naturally we exercise jQuery and fix the problem, by only executing the code once the document is ready and the relevant event handler is fired:

$(function() {
  $.ajax("/get_data", {
    success: function(data) {
      $("#data").text(data);
    }
  });
});

But now we have another problem. We’ve heard from more experienced developers that using jQuery is frowned upon, and although figuring out when the document is loaded seems simple enough to do without using a library, we’re not sure that there is a single cross-browser way of doing it reliably. So jQuery it is.
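For what it’s worth, if you only need to support browsers from roughly IE9 onwards, the hand-rolled version is tiny, since they all fire the DOMContentLoaded event. A sketch (the doc parameter is only there to make the function easy to exercise outside a browser):

```javascript
// Run fn as soon as the DOM has been parsed: immediately if that has
// already happened, otherwise when the DOMContentLoaded event fires.
function ready(fn, doc) {
  doc = doc || document;
  if (doc.readyState !== "loading") {
    fn();
  } else {
    doc.addEventListener("DOMContentLoaded", fn);
  }
}
```

In a real page you would call ready(function() { /* AJAX code */ }) instead of wrapping the code in $().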

Actually there is another way, well explained here, which seems to be well supported without relying on Javascript functions. You simply drop the “defer” attribute into the script tag you want to execute after parsing of the page, and it will now only run at the right time for our purposes:

<html>
  <head>
    <script src="/javascripts/main.js" defer></script>
  </head>
...

I had never seen that before, but it was so simple. Many thanks to my coworkers Maria and Thomas for shedding a bit of light on this corner of the browser world for me. Of course, they also mentioned correctly that using jQuery is not an unforgivable sin, nor are some cases of in-line Javascript snippets (look at some of your favourite websites, even those from respected tech giants, and you will undoubtedly see some). But for a novice to web development it is sometimes hard to look beyond Hackernews and figure out what you are meant to be doing.

On to the next web challenge – mastering D3!


Now using a Let’s Encrypt certificate

by Oliver on Saturday, October 29th, 2016.

Last week I got a notification from StartSSL that my site certificate was going to expire in a couple of weeks. Since recently there has been some news (I guess you can check the Wikipedia entry for exact details) that suggests StartSSL is in some danger of no longer being trusted by major browsers, I decided to finally get around to moving to Let’s Encrypt for my certificates.

When the project was first in beta I had some intentions to do the same thing then, but the tooling was far less mature than it is now, and the trust situation was not as good. Right now, probably most people will be able to access the site without any problems. Programmatic access may not be as fortunate – so the main point of this blog post is to mention the change and ask you to let me know if you have problems accessing the site (if indeed you see this at all, possibly with a security warning). Just drop me a comment.

Otherwise, the process was relatively simple, but I am left wondering what kind of identity verification is involved. I didn’t have to confirm anything during the process that I actually owned the domain name, so what would stop someone else getting a certificate for my domain name? I should look into that in more detail.

Update 01.11.16:
Looks like Google has made the move to not trust StartCom any longer, and this echoes similar movements by Apple and Mozilla. So it seems like the right thing to do. Auf Wiedersehen, StartCom.


Adding Meaning to Code

by Oliver on Wednesday, August 24th, 2016.

This is the product of only about 5 minutes worth of thought, so take it with a grain of salt. When it comes to how to write maintainable, understandable code, there are as many opinions out there as there are developers. Personally I favour simple, understandable, even “boring” method bodies that don’t try to be flashy or use fancy language features. Method and class names should clearly signal intent and what the thing is or does. And, code should (IMHO) include good comments.

This last part is probably the area where I’ve seen the most dissent. For some reason people hate writing comments, and think that the code should be “self-documenting”. I’ve rarely, perhaps never, seen this in practice: even where the intent was for the code to be self-documenting, it never turned out that way.

Recently (and this is related, I promise), I watched a lot of talks (one, in person) and read a lot about the Zalando engineering principles. They base their engineering organisation around three pillars of How, What and Why. I think the same thing can be said for how you should write code and document it:

class Widget
  def initialize
    @expires_at = Time.now + 86400
  end

  # Customer X was asking for the ability to expire     #  <--- Why
  # widgets, but some may not have an expiry date or
  # do not expire at all. This method handles these
  # edge cases safely.
  def is_expired?                                       #  <--- What
    !!@expires_at && Time.now > @expires_at             #  <--- How
  end
end

This very simple example shows what I mean (in Ruby, since it's flexible and lends itself well to artificial examples like this). The method body itself should convey the How of the equation. The method name itself should convey the intent of the method - What does this do? Ultimately, the How and What can probably never fully explain the history and reasoning for their own existence. Therefore I find it helpful to accompany these with the Why in a method comment to this effect (and a comment above the method could also be within the method, or distributed across the method - it's not really important).

You could argue that the history and reasoning for having the method can be determined from version control history. But this turns coding from what should be a straightforward exercise into some bizarre trip through the Wheel of Time novels, cross-referencing back to earlier volumes in order to find some obscure fact that may or may not actually exist, so that you can figure out the reference you are currently reading. Why make the future maintainer of your code go through that? Once again, it relies entirely on the original committer having left a comprehensive and thoughtful message that is also easy to find.

The other counter argument is that no comments are better than out of date or incorrect comments. Again, personally I haven't run into this (or at least, not nearly as frequently as comments missing completely). Usually it will be pretty obvious where the comment does not match up with the code, and in this (hopefully outlier) case you can then go version control diving to find out when they diverged. Assessing contents of the code itself is usually far easier than searching for an original comment on the first commit of that method, so it seems like this should be an easier exercise.

Writing understandable code (and let's face it, most of the code written in the world is probably doing menial things like checking if statements, manipulating strings or adding/removing items from arrays) and comments is less fun than hacking out stuff that just works when you are feeling inspired, so no wonder we've invented an assortment of excuses to avoid doing it. So if you are one of the few actually doing this, thank you.


Thoughts on creating an engineering Tech Radar

by Oliver on Friday, August 12th, 2016.

Perhaps you are familiar with the ThoughtWorks Tech Radar – I really like it as a useful summary of global technology trends and what I should be looking at familiarising myself with. Even the stuff on the “hold” list (such as Scaled Agile Framework – sometimes anti-patterns are equally useful to understand and appreciate). There’s a degree of satisfaction in seeing your favourite technology rise through the ranks to become something recommended to everyone, but also in my current (new) role it has a different purpose.

Since I started a new job just over a month ago, I’ve come into an organisation with a far simpler tech stack and in some regards, less well-defined technology strategy. I like to put in place measures to help engineers be as autonomous in their decision-making process as possible, so a Tech Radar can help frame which technologies they can or should consider when going about their jobs. This ranges from techniques they should strongly consider adopting (which can be much more of a tactical decision) to databases they could select from when building a new service that doesn’t fit the existing databases already in use. The Tech Radar forms something like a “garden fence” – you don’t necessarily need to implement everything within it, but it shows you where the limits are in case you need something new.

So basically, I wanted to use the Tech Radar as a way to avoid needing to continually make top-down decisions when stepping into unknown territory, and help the organisation and decision-making scale as we add more engineers. The process I followed to generate it was very open and democratic – each development team was gathered together for an hour, and I drew the radar format on the whiteboard. Then engineers contributed post-it notes with names of technologies and placed them on the board. After about 10 minutes of this, I read through all of the notes and got everyone to describe for the room the “what” and the “why” of their note. Duplicates were removed and misplaced notes moved to their correct place.

Afterwards, I transcribed everything into a Google Doc and asked everyone to again add the “what” and “why” of each contributed note to the document. What resulted was an 11-page gargantuan collection of technologies and techniques that seemed to cover everything that everyone could think of in the moment, and didn’t quite match up with my expectations. I’ll describe my observations about the process and outcomes.

Strategy vs Tactics, and Quadrants

The purpose of the overall radar is to be somewhat strategic. ThoughtWorks prepares their radar twice a year, so it is expected to cover at least the next 6 months. Smaller companies might only prepare it once a year. However, amongst the different quadrants there is a reasonable amount of room for tactics as well. In particular I would say that the Techniques and Tools quadrants are much more tactical, whereas the Platforms and Languages & Frameworks quadrants are much more strategic.

For example, let’s say you have Pair Programming in the Techniques quadrant. Of course, you might strategically adopt this across the whole company, but a single team (in fact, just two developers) can try instituting it this very day, at no impact to anyone in other teams and probably not even others in the same team. It comes with virtually no cost to just try out, and start gaining benefit from immediately, even if nobody else is using it. Similarly, on the Tools side, you might decide to add a code test coverage reporting tool to your build pipeline. It’s purely informational, you benefit from it immediately and it doesn’t require anyone else’s help or participation, nor does it impact anyone else. For that reason it’s arguable whether these things are so urgent to place on the radar – developers can largely make the decisions themselves to adopt such techniques or tools.

On the other hand, the adoption of a new Language or Framework, or building on top of a new Platform (let’s say you want to start deploying your containers to Kubernetes) will come with a large time investment both immediately and ongoing, as well as needing wide-scale adoption across teams to benefit from that investment. Of course there is room for disagreement here – e.g. is a service like New Relic a tool or a platform? Adoption of a new monitoring tool definitely comes with a large cost (you don’t want every team using a different SaaS monitoring suite). But the Tech Radar is just a tool itself and shouldn’t be considered the final definition of anything – just a guide for making better decisions.

Strategic Impact

As touched on above, adopting a Platform or new Language/Framework has significant costs. When putting together a radar like this with input from everyone – engineers with differing levels of experience – you may find that not all of the strategic impacts were considered when an item was added to the list. An incomplete list of things I believe need to be examined when selecting a Language or Framework could be:

  • What are the hiring opportunities around this technology? Is it easier or harder to hire people with this skillset?
  • Is it a growing community, and are we likely to find engineers at all maturity levels (junior/intermediate/senior) with experience in the technology?
  • For people already in the company, is it easy and desirable to learn? How long does it take to become proficient?
  • Similarly, how many people at the company already know the technology well enough to be considered proficient for daily work?
  • Does the technology actually solve a problem we have? Are there any things our current technologies do very well that would suffer from the new technology’s introduction?
  • What other parts of our tech stack would need to change as a result of adopting it? Testing? Build tooling? Deployments? Libraries and Dependencies?
  • Do we understand not only the language but also the runtime?
  • Would it help us deliver more value to the customer, or deliver value faster?
  • By taking on the adoption costs, would we be sacrificing time spent on maximising some other current opportunity?
  • Is there a strong ecosystem of libraries and code around the technology? Is there a reliable, well-supported, stable equivalent to all of the libraries we use with our current technologies? If not, is it easy and fast to write our own replacements?
  • How well does adoption of the technology align with our current product and technology roadmaps?

By no means is this list exhaustive, but I think all points need some thought, rather than just “is it nicer to program in than my current language”.
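One way to force this kind of thinking – entirely my own illustration, not part of any standard radar process – is to turn the checklist into a rough weighted scorecard per candidate technology. The criteria, weights and scores below are invented for demonstration:

```python
# Hypothetical adoption scorecard for a candidate language/framework.
# Criteria mirror the checklist above; weights and scores are invented
# purely for illustration.
CRITERIA = {
    "hiring_pool":        2.0,   # can we hire for it, at all maturity levels?
    "learning_curve":     1.5,   # easy/desirable for current staff to learn?
    "existing_expertise": 1.5,   # how many people are proficient today?
    "problem_fit":        3.0,   # does it solve a problem we actually have?
    "ecosystem":          2.0,   # stable equivalents of the libraries we use?
    "roadmap_alignment":  2.0,   # fits product and technology roadmaps?
}

def adoption_score(scores):
    """Weighted average of 0-5 scores for each criterion."""
    total_weight = sum(CRITERIA.values())
    return sum(CRITERIA[name] * scores.get(name, 0)
               for name in CRITERIA) / total_weight

# A made-up candidate assessment:
candidate = {"hiring_pool": 4, "learning_curve": 4, "existing_expertise": 2,
             "problem_fit": 3, "ecosystem": 4, "roadmap_alignment": 3}
print(round(adoption_score(candidate), 2))
```

The absolute number matters far less than the conversation each score forces – which is the point of the checklist in the first place.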

Filtering the List and Assembling the Radar

As mentioned, I ended up with a fairly huge list of items which now needs to be filtered. This is a task for a CTO or VP of Engineering depending on your organisation size. Ultimately people accountable for the technology strategy need to set the bounds of the radar. For my list, I will attempt to pre-filter the items that have little strategic importance – like tools or techniques (unless we determine it’s something that could/should have widespread adoption and benefit).

Ultimately we’ll have to see what the output looks like and whether engineers feel it answers questions for them – that will determine whether we try to build a follow-up radar in the next quarter or year. If I end up running the process again, I suspect I’ll use a smaller group of people – each having already collected and moderated inputs from their respective teams – to provide the inputs. The other benefit of the moderation/filtering process is that the resulting document is a way of expressing to (perhaps less experienced) engineers the inherent strategic importance of the original suggestions. There are no wrong suggestions, but we should aim to help people learn and think more about the application of strategy and business importance in their day-to-day work.
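As a sketch of what that pre-filtering step might look like in code (the entry format, field names and examples here are hypothetical, not an actual radar data format):

```python
# Hypothetical pre-filter for radar entries: keep the strategic quadrants,
# and keep tactical entries (Tools, Techniques) only when explicitly flagged
# as candidates for widespread adoption.
STRATEGIC_QUADRANTS = {"Platforms", "Languages & Frameworks"}

def prefilter(entries):
    """Keep strategic entries, plus tactical ones explicitly marked
    as worth company-wide adoption."""
    return [
        e for e in entries
        if e["quadrant"] in STRATEGIC_QUADRANTS or e.get("widespread", False)
    ]

entries = [
    {"name": "Kubernetes", "quadrant": "Platforms", "ring": "Assess"},
    {"name": "Pair Programming", "quadrant": "Techniques", "ring": "Adopt"},
    {"name": "Coverage reporting", "quadrant": "Tools", "ring": "Adopt",
     "widespread": True},
]

radar = prefilter(entries)
# Kubernetes stays (strategic quadrant); the coverage tool stays because it
# was flagged for widespread adoption; Pair Programming is left to the teams.
```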


Friday, August 12th, 2016 – Tech

Easing back into fitness

by Oliver on Wednesday, June 1st, 2016.

It has been about 7 months since the birth of my daughter, and I think that’s about as long as I can justify sitting idle due to child-rearing. Certainly as the father, I don’t have many excuses as to why I can’t become physically active again, and I actually miss running and taking part in the various crazy obstacle races. So, I’ve resolved to get myself back into shape (without being too obsessive about it, at least).

In previous years I at times pushed myself too hard and ended up with some minor injuries – I guess that’s what you get in your mid-thirties. I increased my running distances too quickly, strained some leg muscles and needed physiotherapy for a couple of months. So far, I’ve only run about 5–6km once a week for the last few weeks, and that is about all I can manage. I can feel the fitness level slowly returning (although it is also hard to tell due to the heat I’ve been running in), but am resisting running any longer distances for now.

Since my biggest focus is getting into a state of fitness where I can again tackle an obstacle race that is not “insanely difficult” (to be defined further down), I know that one of my biggest weaknesses (literally) is upper body and core strength. To that end, and to assist with my running recovery, I’ve started a regime of stretches and small upper/core exercises which I repeat twice daily – once in the morning after the kids have woken me up, and again at night before bed. Here’s the general routine:

  • Lie on my back and stretch the entire body out
  • Pull each knee up to my chest individually and stretch the leg
  • Stretch the “glutes”
  • Hamstring stretches
  • Abductor / groin stretch while sitting
  • Front plank for as long as I can hold it
  • Side plank for as long as I can hold it, on each side
  • Push ups while kneeling (so I can do more repetitions with smaller load)
  • Prisoner squats until my legs start burning
  • Lie on my front and stretch the front thigh area of each leg

Sorry for the lack of accurate terminology! Some of these stretches I learned while I had a personal trainer leading up to my Tough Mudder race in 2014, some I got from my yoga teacher wife and some I just make up myself. Generally each stretch I hold for 30 seconds. I can say that my legs feel better after stretching them twice a day, and the other light exercises are having a very small but noticeable effect. It’s enough to keep those muscles a little bit active but not so much that I dread it and skip exercising them at all.

The intention now is to keep this up, continue raising the limits slowly until I feel like I can take on some of the smaller and less difficult obstacle races (and perhaps shorter regular running races). I figured out last year that a marathon is just not my cup of tea, after attempting to run 30km in one training session and finding it incredibly boring. I can manage a half marathon but I think that’s about the limit.

What do I define as “insanely difficult”? Tough Mudder definitely had at least two aspects which for me are pretty undesirable. I don’t particularly like being electrocuted, and the 12ft walls were almost impossible for me without a lot of assistance – this again comes back to the lack of upper and core strength which I hope to work on. Getting Tough – The Race was probably the hardest event I’ve undertaken so far due to the distance (24km) and extreme cold (being completely submerged for a long period of time in icy water) and sheer number of obstacles. I don’t relish the thought of that icy water again any time soon. No Guts No Glory, despite also being in very icy conditions (well, actual snow for most of it) was very enjoyable although I unfortunately did some injury to my finger which still hasn’t recovered. Bremen Lake Run would again have been more fun if it weren’t for the big walls, and it also had some cold water thrown in for fun.

So I guess my main complaint would be with the walls, which I know I need to work on a lot. I don’t know if the electric-shock obstacles will always be part of Tough Mudder, but if the walls were less of a challenge I could work on toughening myself psychologically to get through being electrocuted. Meanwhile, there are actually a lot of very enjoyable (like, actually enjoyable for normal people) obstacle races coming up in Germany over the next few months which don’t have this level of insane difficulty, and which I’d like to attempt. Perhaps this year or next I’ll even try one or two in the UK, as they tend to have more variety.


Wednesday, June 1st, 2016 – Health

Catching Up

by Oliver on Saturday, May 21st, 2016.

I haven’t posted anything for quite some time (which I feel a little bad about), so this is something of a randomly-themed catch-up post. According to my LinkedIn profile I’ve been doing this engineering management thing for about two years, which at least provides some explanation for a relative lack of technical-oriented blog posts. Of course in that time I have certainly not revoked my Github access, deleted all compilers/runtimes/IDEs/etc and entirely halted technical work, but the nature of the work of course has changed. In short, I don’t find myself doing so much “interesting” technical work that leads to blog-worthy posts and discoveries.

So what does the work look like at the moment? I’ll spare you the deep philosophical analysis – there are many, many (MANY) posts and indeed books on making the transition from a technical contributor to a team manager or lead of some sort. Right back at the beginning I struggled with the temptation to continue coding while also delivering on my management tasks – it is difficult to do both adequately at the same time. More recently (perhaps in my moments of less self-control) I do allow myself some technical contributions. These usually look like the following:

  • Cleaning up some long-standing technical debt that is getting in the way of the rest of the team being productive, but is not necessarily vital to their learning/growth or knowledge of our technology landscape.
  • Data analysis – usually ElasticSearch, Pig/Hive/Redshift/MapReduce jobs to find the answer to a non-critical but still important question.
  • Occasionally something far down the backlog that is a personal irritation for me, but is not in the critical path.
  • Something that enables the rest of the team in some way, or removes a piece of our technology stack that was previously only known by myself (i.e. removes the need for knowledge transfer).
  • Troubleshooting infrastructure (usually also coupled with data analysis).

I’d like to say I’ve been faithful to that list, but I haven’t always. The most recent case was probably around a year ago, when I decided I’d implement a minimum-speed data-transfer monitor in our HLS server. This ended up taking several weeks and was a far gnarlier problem than I realised. The resulting code was also not of the highest standard – when you are not coding day-in and day-out, I find that my overall code quality and ability to perceive abstractions and the right model for a solution is impaired.

Otherwise, the tasks that I perhaps should be filling my day with (and this is not an exhaustive list, nor ordered, just whatever comes to mind right now) looks more like this:

  • Assessing the capacity and skills make up of the team on a regular basis, against our backlog and potential features we’d like to deliver. Do we have the right skills and are we managing the “bus factor”? If the answer is “no” (and it almost always is), I should be hiring.
  • Is the team able to deliver? Are there any blockers?
  • Is the team happy? Why or why not? What can I do to improve things for them?
  • How are the team-members going on their career paths? How can I facilitate their personal growth and help them become better engineers?
  • What is the overall health of our services and client applications? Did we have any downtime last night? What do I need to jump on immediately to get these problems resolved? I would usually consider this the first item to check in my daily routine – if something has been down we need to get it back up and fix the problems as a matter of urgency.
  • What is the current state of our technical debt; are there any tactical or strategic processes we need to start in order to address it?
  • How are we matching up in terms of fitting in with technology standards in the rest of the organisation? Are we falling behind or leading the way in some areas? Are there any new approaches that have worked well for us that could be socialised amongst the rest of the organisation?
  • Are there any organisational pain-points that I can identify and attempt to gather consensus from my peer engineering managers? What could we change on a wider scale that would help the overall organisation deliver user value faster, or with higher quality?
  • Could we improve our testing processes?
  • How are we measuring up against our KPIs? Have we delivered something new recently that needs to be assessed for impact, and if so has it been a success or not matched up to expectations? Do we need to rethink our approach or iterate on that feature?
  • Somewhat related: have there been any OS or platform updates on any of our client platforms that might have introduced bugs we need to address? Ideally we would be ahead of the curve and anticipate problems before they happen, but if you have a product that targets web browsers or Android phones, there are simply too many variants to test adequately ahead of general releases; some problems will inevitably be discovered in the wild first.
  • Is there any free-range experimentation the team could be doing? Let’s have a one-day offsite to explore something new! (I usually schedule at least one offsite a month for this kind of thing, with a very loose agenda.)
  • How am I progressing on my career path? What aspects of engineering management am I perhaps not focussing enough on? What is the next thing I need to be learning?

I could probably go on and on about this for a whole day. After almost two years (and at several points before that) it is natural to question whether the engineering management track is the one I should be on. Much earlier (perhaps 6 months in) I was still quite unsure – if you are still contributing a lot of code as part of your day to day work, the answer to the question is that much harder to arrive at since you have blurred the lines of what your job description should look like. It is much easier to escape the reality of settling permanently on one side or the other.

Recently I had some conversations with people which involved talking in depth about either software development or engineering management. On the one hand, exploring the software development topics with someone, I definitely got the feeling that there was a lot I am getting progressively more and more rusty on. To get up to speed again I feel would take some reasonable effort on my part. In fact, one of the small technical debt “itches” I scratched at the end of last year was implementing a small application to consume from AWS Kinesis, do some minor massaging of the events and then inject them into ElasticSearch. I initially thought I’d write it in Scala, but the cognitive burden of learning the language at that point was too daunting. I ended up writing it in Java 8 (which I have to say is actually quite nice to use, compared to much older versions of Java) but this is not a struggle a competent engineer coding on a daily basis would typically have.

On the other hand, the conversations around engineering management felt like they could stretch on for ever. I could literally spend an entire day talking about some particular aspect of growing an organisation, or a team, or on technical decision-making (and frequently do). Some of this has been learned through trial and error, some by blind luck and I would say a decent amount through reading good books and the wonderful leadership/management training course at SoundCloud (otherwise known as LUMAS). I and many other first-time managers took this course (in several phases) starting not long after I started managing the team, and I think I gained a lot from it. Unfortunately it’s not something anyone can simply take, but at least I’d like to recommend some of the books we were given during the course – I felt I got a lot out of them as well.

  • Conscious Business by Fred Kofman. It might start out a bit hand-wavy, and feel like it is the zen master approach to leadership but if you persist you’ll find a very honest, ethical approach to business and leadership. I found it very compelling.
  • Five Dysfunctions of a Team by Patrick Lencioni. A great book, and very easy read with many compelling stories as examples – for building healthy teams. Applying the lessons is a whole different story, and I would not say it is easy by any measure. But avoiding it is also a path to failure.
  • Leadership Presence by Kathy Lubar and Belle Linda Halpern. Being honest and genuine, knowing yourself, establishing a genuine connection and empathy to others around you and many other gems within this book are essential to being a competent leader. I think this is a book I’ll keep coming back to for quite some time.

In addition I read a couple of books on Toyota and their lean approach to business (they are continually referenced in software development best practices). I have to admit that drawing a solid connection between the Toyota Production System as a whole and day-to-day software work can be a challenge, and I hope to learn more about it in future and figure out which parts are actually relevant and which are not. There were a few other books around negotiation and other aspects of leadership which coloured my thinking but were not significant enough to list. That said, I still have something like 63 books on my wish list to be ordered and read!

In order to remain “relevant” and in touch with technical topics I don’t want to stop programming, of course, but this will have to remain in the personal domain. To that end I’m currently taking a game programming course in Unity (so, C#) and another around 3D modelling using Blender. Eventually I’ll get back to the machine learning courses I was taking a long time ago but still need to re-take some beginner linear algebra in order to understand the ML concepts properly. Then there are a tonne of other personal projects in various languages and to various ends. I’ll just keep fooling myself that I’ll have free time for all of these things 🙂


Saturday, May 21st, 2016 – Tech, Thoughts

Pre-warming Memcache for fun and profit

by Oliver on Wednesday, August 12th, 2015.

One of the services my team runs in AWS makes good use of Memcached (via the ElastiCache product). I say “good” use as we manage to achieve a hit rate of around 98% most of the time, although I now realise that this comes at a price – when the cache is removed, it takes a significant toll on the application. Unlike other applications that traditionally cache the results of MySQL queries, this particular application stores GOB-encoded binary metadata, but what the application does is outside the scope of this post. When the cached entries aren’t there, the application has to do a reasonable amount of work to regenerate them and store them back.

Recently I observed that when one of our ElastiCache nodes is restarted (which can happen for maintenance, or due to system failure) we see an undesirable hit to the application. We could minimise this impact by having more instances in the cluster with less capacity each – for the same overall cluster capacity. Thus, going from say 3 nodes, where we lose 33% of our cache capacity when one fails, to 8 nodes, where we would lose only 12.5%, is a far better situation. I also realised we could upgrade to the latest generation of cache nodes, which sweetens the deal.

The problem that arises is: how can I cycle out the ElastiCache cluster with minimal impact to the application and user experience? To cut a long story short: there’s no way to change individual nodes in a cluster to a different type, and if you maintain your configuration in CloudFormation and change the instance type there, you’ll destroy the entire cluster and recreate it – losing your cache in the process (in fact, you’ll be without any cache for a short period of time). I decided to create a new CloudFormation stack altogether, pre-warm the cache and bring it into operation gently.

How can you pre-warm the cache? Ideally, you could dump the entire contents and simply insert it into the new cluster (much like MySQL dumps or backups), but with Memcached this is impossible. There is the stats cachedump command to Memcached, which is capable of dumping out the first 2MB of keys of a given slab. If you’re not aware of how Memcached stores its data, it breaks the memory allocation into various “slabs” of increasing sizes and stores values in the closest-sized slab that will fit it (although always rounding up). Thus, internally the data is segmented. You can list stats for all of the current slabs with stats slabs, then perform a dump of the keys with stats cachedump {slab} {limit}.
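To make the mechanics concrete, here is a rough Python sketch of that approach – speaking the Memcached text protocol directly over a socket, listing the slab classes and then issuing stats cachedump for each. The helper names and host/port are my own illustration, and the sketch is subject to exactly the caveats that follow:

```python
import re
import socket

def slab_ids(stats_slabs_text):
    """Extract the distinct slab class ids from a raw `stats slabs` response.
    Per-slab lines look like `STAT 3:chunk_size 152`; summary lines such as
    `STAT active_slabs 2` carry no slab id and are skipped."""
    ids = set()
    for line in stats_slabs_text.splitlines():
        m = re.match(r"STAT (\d+):", line)
        if m:
            ids.add(int(m.group(1)))
    return sorted(ids)

def send_command(host, port, command):
    """Send one text-protocol command and read until the END sentinel
    (stats responses are terminated by a line containing just END)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(command.encode() + b"\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return buf.decode()

def dump_keys(host="localhost", port=11211, limit=100):
    """Best-effort key dump: list the slabs, then `stats cachedump` each one.
    Subject to the ~2MB-per-slab limit discussed in this post."""
    keys = []
    for slab in slab_ids(send_command(host, port, "stats slabs")):
        resp = send_command(host, port, f"stats cachedump {slab} {limit}")
        # cachedump lines look like: ITEM somekey [12 b; 1439379282 s]
        keys += [line.split()[1] for line in resp.splitlines()
                 if line.startswith("ITEM ")]
    return keys
```

With the keys in hand you would still need to `get` each one from the old cluster and `set` it on the new one – and that is precisely where the limitations below bite.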

There are a couple of problems with this. One is the aforementioned 2MB limit on the returned data, which in my case did in fact limit how useful this approach was. Some slabs had several hundred thousand objects and I was not able to retrieve nearly the whole keyspace. Secondly, the developer community around Memcached is opposed to the continued lifetime of this command, and it may be removed in future (perhaps it already is, I’m not sure, but at least it still exists in 1.4.14 which I’m using) – I’m sure they have good reasons for it. I was also concerned that using the command would lock internal data structures and cause operational issues for the application accessing the server.

You can see the not-so-reassuring function comment here describing the locking characteristics of this operation. Sure enough, the critical section is properly locked with pthread_mutex_lock on the LRU lock for the slab, which I assumed meant that only cache evictions would be affected by taking this lock. Based on some tests (and common sense) I suspect that it is an LRU lock in name only, and more generally locks the data structure in the case of writes (although it does record cache access stats somewhere as well, perhaps in another structure). In any case as mentioned before, I was able to retrieve only a small amount of the total keyspace from my cluster, so as well as being a dangerous exercise, using the stats cachedump command was not useful for my original purpose.

Later in the day I decided to instead retrieve the Elastic LoadBalancer logs from the last few days, run awk over them to extract the request path (for some requests that would trigger a cache fill) and simply make the same requests to the new cluster. This is more effort up-front since the ELB logs can be quite large, and unfortunately are not compressed, but fortunately awk is very fast. The second part to this approach (or any for that matter) is using Vegeta to “attack” your new cluster of machines, replaying the previous requests that you’ve pulled from the ELB logs.
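Here is a minimal sketch of that replay idea in Python rather than awk – the sample log line and new-cluster hostname are illustrative assumptions, and the output is the plain `GET <url>` one-target-per-line format that Vegeta’s default targeter accepts:

```python
from urllib.parse import urlparse

def request_path(elb_log_line):
    """Pull the request path out of one classic ELB access-log line.
    The request is the first double-quoted field, e.g.
    "GET http://example.com:80/v1/meta?id=42 HTTP/1.1"."""
    request = elb_log_line.split('"')[1]       # method URL protocol
    url = request.split()[1]
    parsed = urlparse(url)
    return parsed.path + ("?" + parsed.query if parsed.query else "")

def vegeta_targets(log_lines, new_cluster="http://new-cluster.example.com"):
    """Rewrite each logged GET against the new (hypothetical) cluster,
    in the format Vegeta's default targeter reads."""
    return [f"GET {new_cluster}{request_path(l)}" for l in log_lines
            if ' "GET ' in l]

line = ('2015-08-12T08:00:52.944324Z my-elb 203.0.113.7:54321 10.0.0.5:8080 '
        '0.000043 0.001337 0.000023 200 200 0 2310 '
        '"GET http://api.example.com:80/v1/meta?id=42 HTTP/1.1" '
        '"curl/7.43.0" - -')
print(vegeta_targets([line])[0])
# GET http://new-cluster.example.com/v1/meta?id=42
```

Write the output to a targets file and Vegeta can replay it at whatever rate your new cluster should be warmed with.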

A more adventurous approach might be to use Elastic MapReduce to parse the logs, pull out the request paths and use the streaming API to call an external script that makes the HTTP request to the ELB. That way you could quite nicely farm out the work of making a large number of parallel requests from a much larger time period, in order to more thoroughly pre-warm the cache with historical requests. Or poll your log store frequently and replay ELB requests to the new cluster with just a short delay after they happen on your primary cluster. If you attempt either of these and enjoy some success, let me know!


Wednesday, August 12th, 2015 – Tech