I have to admit that when I started creating my first Ops Acceptance Criteria (OAC), I had very little knowledge of what it entailed; I just knew it needed to happen. So I scoured the e-interweb for examples and, perhaps not-so-surprisingly, found a plethora of OAC docs which probably should have been confidential information. 🙂 I adopted general concepts from some of those docs to cobble together one of my own. As with everything else I’ll ever write about here, there isn’t a one-size-fits-all solution; every OAC must match the current environment and be reviewed and refined regularly to ensure it doesn’t become a fossil two days after publishing. But here’s a generic outline of what we created for my first IT Ops team, with the confidential/specialized parts omitted, of course. I think it’s a decent outline of what ought to be covered for a typical first-line support team, although the categorization could probably stand a refresh.
Before a task/service/etc. can be considered ‘handed off’, the operations team will shadow the engineering oncall (or designate) for X period of time, and roles will then be reversed, with the engineering team shadowing the operational oncall for X period of time. During the warranty period, the following checklist will be used and all relevant points signed off on. When all relevant issues are addressed, the operational handoff will be considered complete. One engineer and one member of the operational support staff will be paired to maintain consistency throughout the audit. These two people are responsible for ensuring a smooth handoff.
- Must reside on currently supported O/S versions and hardware platforms, as defined by whoever is charged with defining them.
- Must be stable prior to handoff to the operational support team: for at least the last week of the warranty period, the service/architecture must not spawn operational interrupts due to poor monitoring configuration, improper implementation or poor design.
- Configuration/data/tuneables separate from code (where applicable)
- Performance characteristics understood and documented
Adherence to standards
- Default and unique configurations identified and documented
- Deviations from standard redundancy model identified and documented
- Any IT Security requirements met and documented
- Operational support documentation has been furnished utilizing the standard Ops Doc template(s).
- Relevant tagging/categorization has been applied to documentation (where applicable, for ease of oncall duties)
- Bottlenecks/known choke points documented
- List of clients (where applicable) documented
- List of dependencies documented
- RMA (return merchandise authorization) process, including vendor SLAs, documented
- Hardware ‘spares’ are identified and available as required
- Vendor contact information documented
- Routine Maintenance procedures documented
- Satisfactory high-level code review occurs (major components and software package dependencies, where applicable)
- Upgrade processes defined and reviewed with operational team (includes test suite as well as expected behaviour)
- Disaster Recovery procedures tested and documented
- Log retention policy/location/rotation documented
- Integration with current appropriate tool set. Must be compatible and tested/validated on the platform
- Permissions are managed via the accepted mechanism for servers, services and/or network devices.
- Naming convention documented and adheres to standards
- Exceptions to the accepted CM policies and procedures are approved by the engineering team and operations management and are listed in the proper section of the relevant CM policy document(s).
- Operations team must be an approver/reviewer for major revisions of software or major upgrades/changes in architecture
- Operational documentation must be provided to and reviewed by the operations team prior to requesting resources for CM completion
- Alarming with standard notification processes
- Thresholds have been tested
- Performance and health monitoring is in place
- Ticket impact levels are defined for identified failure modes
- Operational ticket assignments match the task/request in support of customer experience and metrics.
- Auto-submitted tickets from the system include link to operational documentation
- Auto-submitted tickets have de-duping mechanism, where appropriate
- Engineering escalation alias paired with the service is identified
- Escalation path to engineering queue and criteria defined
- SLA for escalation defined/documented
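Two of the ticketing points above (auto-submitted tickets carry a link to the operational documentation, and repeat alarms are de-duped) can be illustrated with a small sketch. This is only a toy in-memory model, not how any particular ticketing system works: the `Ticket` and `TicketQueue` names, the numeric impact levels, and the choice to treat "same service plus same failure mode" as a duplicate are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Ticket:
    service: str
    failure_mode: str
    impact: int           # hypothetical impact level, 1 = most severe
    doc_url: str          # link to the operational documentation
    occurrences: int = 1  # how many alarms were folded into this ticket


class TicketQueue:
    """Toy queue that collapses repeat alarms into one open ticket."""

    def __init__(self):
        self.open_tickets = {}

    @staticmethod
    def dedupe_key(service, failure_mode):
        # Assumption: alarms for the same service and failure mode are duplicates.
        raw = f"{service}|{failure_mode}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def submit(self, service, failure_mode, impact, doc_url):
        key = self.dedupe_key(service, failure_mode)
        existing = self.open_tickets.get(key)
        if existing is not None:
            # Duplicate: bump the counter and keep the most severe impact seen.
            existing.occurrences += 1
            existing.impact = min(existing.impact, impact)
            return existing
        ticket = Ticket(service, failure_mode, impact, doc_url)
        self.open_tickets[key] = ticket
        return ticket

    def resolve(self, service, failure_mode):
        # Closing the ticket lets the next alarm open a fresh one.
        return self.open_tickets.pop(self.dedupe_key(service, failure_mode), None)
```

In this sketch, submitting the same service and failure mode twice returns the same ticket with `occurrences` incremented, so the oncall sees one ticket (with its documentation link) rather than a flood.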
Known Opportunities for Improvement
List any automation or standardization opportunities for this task/product, including links to the relevant action/tracking items.
Whiteboard sessions/“chalk talks” have been given to all operational support personnel prior to handoff. These can be held by the operational or engineering team, whichever makes the most sense. Support staff in all locations should have first-hand (and in-person if possible) knowledge of these sessions.
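For teams that would rather track the checklist in code than in a document, the sign-off rule described at the top (the handoff is complete once every relevant point is signed off) might be sketched as follows. The `ChecklistItem` structure and the requirement that both the paired engineer and the paired ops person sign each item are assumptions for illustration; adjust them to whatever your own process actually requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChecklistItem:
    description: str
    relevant: bool = True                   # "where applicable" items can be marked not relevant
    engineer_signoff: Optional[str] = None  # name of the paired engineer, once signed
    ops_signoff: Optional[str] = None       # name of the paired ops person, once signed

    def signed_off(self):
        # Assumption: an item counts as signed off only when both people have signed.
        return bool(self.engineer_signoff) and bool(self.ops_signoff)


def outstanding(items):
    """Return descriptions of relevant items that still need a sign-off."""
    return [i.description for i in items if i.relevant and not i.signed_off()]


def handoff_complete(items):
    """The handoff is complete when no relevant item is outstanding."""
    return not outstanding(items)
```

Calling `outstanding(items)` during the warranty period gives the pair a running list of what is left, and items marked `relevant=False` are skipped rather than blocking the handoff.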