This post is all about what I’ve learned in my first two weeks as Director of LiveOps at Demonware. The role of a manager should always be to enable the organization to increase its level of production while maintaining sanity and without having to horizontally scale the team. (‘buzzword bingo’, anyone?) In a year, this blog will be filled with examples of how we as a management team accomplished that: all of the challenges, wins and missteps along our way to fulfilling our destiny as the premier Operations team in the gaming industry.
When I joined DW on June 1, I had no idea what to expect. Yes, I’ve been in Ops for longer than I care to admit. But gaming is a fairly foreign world to me – I can watch someone play a game all day, and I fare fairly well with games targeted at 4-year-olds. That’s where my experience in the game industry stops. That being said, here are some of my initial impressions after spending three days in Vancouver with the team and a few more working from home in Seattle (silly work permit process…).
- Operations is Operations. Yes, the technologies might differ drastically between companies, but the same challenges, issues and solutions exist when trying to enable a high-performing team to ‘level-up’: process, standardization, automation and tooling.
- I’m extremely humbled that Demonware selected me to guide their highly-capable LiveOps team. Seriously.
- I wonder at the amount of work the company has been able to churn out with such a small but able staff.
- I’m incredibly excited by the positive attitude and collaborative inter- and intra-team spirit. Even the surliest of engineers kick ass and take names.
- I instantly fell in love with my highly-technical, over-taxed, mostly junior management team. I expect that I will learn just as much from them as I will teach them.
Most importantly, I realize that while the amount of work produced by our engineers reflects a very high-performing organization, we’re at a breaking point. The deliverables in the [currently-being-drafted] short- and long-term Operations road maps far outstrip the processes and resources available – more so than on any team I’ve managed previously, and I’ve had to deal with some pretty gnarly resource constraints.
State of the DW LiveOps Union
We build and maintain backend services for Activision/Blizzard games such as Call of Duty – services such as leaderboards and matchmaking. (pretty sweet, right?) Our workload is mostly dictated by the road maps of third-party game studios, and while the work is cyclical, not every game requires the same features or infrastructure. Currently, LiveOps is the tail being wagged, with late-binding requests generating a make-or-break race to hit the hard holiday shopping deadlines.
Engineering and Operations were both re-structured just a few months ago to better reflect the workload. This seems to have gone well for the SDE world, where a structure based on services makes a lot of sense. We’re still working through the transition in Operations; these exercises typically take much longer to shake out in our more interrupt-driven, diverse realm.
We’re very, very lucky to have fantastic support from DW senior management. (and I’m not just saying that because my boss will most likely be reading this post at some point) It’s only been two weeks, but I feel a ‘mind meld’ coming on, and that’s only happened one other time in my career. Our management understands the value that a world-class Operations team provides to the company. It’s a rare occurrence, in my experience, and I plan to take full advantage of it. 🙂
LiveOps is a technically high-performing team, and… entertaining. It’s filled with some of the most driven, intelligent and open engineers I’ve worked with. The company has done a fantastic job of hiring for culture as well as technical skill, and that really does make all the difference. Prima donnas can suck the life out of an Ops team.
We’re just beginning to think about Scale-with-a-capital-S. It’s a rare and exciting time in the life of an adolescent company. I thank my lucky stars that I’ve been fortunate enough to have experienced scaling challenges and seen some amazing solutions to them at Amazon and Facebook. I feel like my time at both of those companies was the best prep I could have ever had for the challenges we’re now facing.
My Dirty Little Assessment
First off, I can’t give enough credit to The First 90 Days for providing me with a solid framework for approaching the assessment of my new organization. I’m learning to take my time to focus on observing and building relationships, rather than jumping in and making rash, lightly-considered decisions just to make my mark. The book’s common sense is forcing me to focus on defining a few quick-strike wins to build momentum and credibility. If you’re ever faced with transitioning into a new role, read this. It’s bible-worthy IMO, even though none of the concepts are particularly foreign. Now on to what I think I might be blogging about over the next year…
Have I Mentioned We’re Hiring?
Hiring is one of our top priorities. We have a great recruiting team, and the people Demonware has hired are fabulous. Just like at Amazon and FB, we’ve placed just as much emphasis on culture fit as on technical acumen. Like it or not, though, the work doesn’t stop coming in just because we’re being selective in our hiring process. To help fill our roles more quickly, we’ll be re-factoring job descriptions and working with recruiting to update our processes: basic technical pre-screen questions (to save our phone screeners time and headaches), more timely and descriptive feedback, and using our engineers’ penchant for social networking to get the word out.
“Traditional” Ops Processes
Demonware is just coming out of their startup phase, and it seems that a common denominator in companies at this stage in their progression is a lack of mature processes (makes sense). We actually have a great start; it’s all about streamlining and improving upon what we already have. Process should be an enabler, not a hindrance. People who balk at this idea or think that ‘process’ is a four-letter word obviously haven’t seen it implemented the right way. Just sayin’. Here are a couple of deliverables that we’ve talked about as a management team that are on my personal road map:
- Event Management: We already have a decent (not perfect) Event Management process documented, and we follow it most of the time. We also have a fantastic start on an incredible tool set that covers the basics of notification and engagement. The information we need exists, but we still need to tie it all together. We also need to remove more of the human element in the process (notice I said we follow it most of the time, just like most other shops). In the middle of an event, engineers just want to fix the issue, rather than concentrating on following the process. And, of course, we could always tighten our post-event actions to ensure that we’re lengthening MTBF.
These are important things to address, but the most important deliverable here is the ability to measure the effectiveness of the process (MTTD, MTTR, MTBF). We honestly won’t know how to take this a step further until we know how we’re currently doing.
- Change Management: We’re in the same boat with CM as we are with Event Management. Good process that’s well documented, but no way to measure its effectiveness, the time spent per change, or the number of planned vs. emergent changes, and no solid way to track customer impact/fallout programmatically. This isn’t to say that we don’t pay attention to this – we definitely do. We just need to make it much easier to get at the data we need quickly, and we need to build on that data to reduce our susceptibility to fallout.
- Monitoring/Alerting: We monitor A LOT of stuff, and we have the basics covered pretty well. The next step is to refine our monitoring configurations to pare down the noise. We must be able to definitively say that yes, we’re monitoring the right stuff at the proper thresholds, that the correct personnel are notified for the right alarms, and that we’re able to measure our effectiveness at reducing the number of alarms through everything from code re-factoring to architecture standardization.
- Operational Acceptance (OAC): Ops teams routinely complain about stuff being ‘thrown over the fence’ for them to support. OAC is a great way to ensure that before the team signs off on a new support request, it’s actually supportable. Providing a well-designed OAC checklist to customers will not only address that, but it will oftentimes spawn different design decisions that will make a service/stack more extensible and reliable. Theo Schlossnagle says it’s about “putting more ops into dev”, rather than the inverse. Can’t argue with Theo, right? 🙂
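To make the measurement point above concrete, here’s a minimal sketch of how MTTD, MTTR and MTBF fall out of a handful of timestamps per event – the incident records and values below are invented for illustration, not pulled from our actual tooling:

```python
from datetime import datetime, timedelta

# Hypothetical incident records: when the fault started, when we detected
# it (first alarm fired), and when service was restored.
incidents = [
    {"start": datetime(2011, 6, 1, 2, 0),  "detected": datetime(2011, 6, 1, 2, 4),   "resolved": datetime(2011, 6, 1, 2, 49)},
    {"start": datetime(2011, 6, 8, 14, 0), "detected": datetime(2011, 6, 8, 14, 1),  "resolved": datetime(2011, 6, 8, 14, 31)},
    {"start": datetime(2011, 6, 20, 9, 0), "detected": datetime(2011, 6, 20, 9, 10), "resolved": datetime(2011, 6, 20, 10, 0)},
]

def mean(deltas):
    """Average a list of timedeltas."""
    return sum(deltas, timedelta()) / len(deltas)

# Mean Time To Detect: fault start -> alarm
mttd = mean([i["detected"] - i["start"] for i in incidents])
# Mean Time To Resolve: fault start -> service restored
mttr = mean([i["resolved"] - i["start"] for i in incidents])
# Mean Time Between Failures: gap between successive fault starts
mtbf = mean([b["start"] - a["start"] for a, b in zip(incidents, incidents[1:])])

print(mttd, mttr, mtbf)
```

Once numbers like these come out of the Event and Change Management tooling automatically, “are we getting better?” becomes a trend line instead of a gut feeling.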
We have to make our own lives simpler. That’s just a given for any Ops team, regardless of how long the team or company has existed or how successful they are. Now that we’re starting to hunker down, we need to begin approaching Operations as a business unit, just like every other organization. It sounds like an awful concept to engineers, but once the framework is in place, those same engineers are grateful that they can depend on how work flows into and out of the team, that there are clear escalation paths, and so on.
- Planning and Prioritization: It’s the same with most Ops teams, but the resounding feedback from our team is that “we never have time to get to the stuff we really need to do”. We need to answer the questions, “what is it that is taking up your time currently?” and “what exactly should we be doing instead, and why?”. Prioritizing work in the Ops world is typically tougher than in the engineering world due to the interrupt-driven, break/fix nature of the role. There’s no reason you can’t just make an “Operational Interrupts” line item in your road map, assign it the proper resource level, and devote the remainder of the team’s time toward the projects which pop the stack in terms of business value.
- Communication/Partnering: The more of a partnership you can cultivate with engineering and senior management, the easier it gets. We already work well with both sets of customers, but this will always be a focus for us. Reviewing road maps and priorities to make sure we’re all on the same page, participating in design reviews (so that Ops has a seat at the table before a service launches), and consistently setting and resetting expectations will all make our lives easier as Ops personnel.