Platform Engineering

Platform engineering teams are concerned with deployments, service accounts, and infrastructure. The responsibilities of a platform engineering team should not be confused with those of a DevOps team. They’re similar in some respects, though they vary in others.

Platform engineers build systems and services that other teams can consume or build on, and they act as an engineering efficiency function.

These team(s) operate an internal platform which enables delivery teams to self-service deploy and operate systems with reduced lead time and stack complexity. The emphasis here is on API-driven self-service and supporting tools, with delivery teams still responsible for supporting what they deploy onto the platform. Organizations that consider establishing such a platform team should be cautious not to accidentally create a separate DevOps team, nor repurpose an existing hosting and operations structure as a platform.

Platform Definition

The Platform is the abstraction layer that hides the underlying complexity of operating the software and infrastructure layers. It takes care of the details of infrastructure operations, service orchestration, CI/CD, and the monitoring of all these components.

The Platform can be seen as an internal product whose stakeholders are the technology teams that build on top of it. Their software and applications power the Business and help it thrive in an ever-changing technology landscape.

Platform Components

The Platform can be a multi-layered entity where each layer has its own responsibilities and clearly defined boundaries. One possible breakdown is as follows:

  • The Infrastructure layer
  • The Software Architecture layer
  • The Continuous Improvement layer (DevOps)
  • Continuous Integration and Delivery (or Deployment)
  • Continuous Testing
  • Continuous Security
  • Continuous Feedback (Monitoring)
  • Knowledge Transfer and Documentation Layer

The Infrastructure Layer

Infrastructure is the deepest layer of The Platform, and one of the most important ones. It should be easy to reproduce and audit, using IaC (Infrastructure as Code) to simplify the management of resources and to support Security and Compliance requirements such as Business Continuity and Disaster Recovery Plans.
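As a rough illustration of why describing infrastructure as code makes it reproducible and auditable, the sketch below compares a desired state with the state that actually exists and lists the changes needed to reconcile them. The Resource class, the plan function, and the resource names are all hypothetical; real IaC tools do far more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A hypothetical infrastructure resource described as data."""
    name: str
    kind: str
    size: str


def plan(desired, actual):
    """Return the changes needed to move `actual` to `desired`.

    Both arguments map resource names to Resource objects.
    """
    changes = []
    for name in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(name), actual.get(name)
        if have is None:
            changes.append(("create", name))
        elif want is None:
            changes.append(("delete", name))
        elif want != have:
            changes.append(("update", name))
    return changes


# Illustrative data only: the desired state would live in version control.
desired = {
    "web": Resource("web", "vm", "medium"),
    "db": Resource("db", "database", "large"),
}
actual = {
    "web": Resource("web", "vm", "small"),
    "cache": Resource("cache", "vm", "small"),
}

for action, name in plan(desired, actual):
    print(action, name)
```

Because the desired state is plain data, it can be reviewed, versioned, and replayed against an empty environment, which is what supports audit and disaster recovery requirements.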

The Platform team should carefully decide which technologies and frameworks to use to achieve operational efficiency of the Infrastructure. Automation is critical for the infrastructure layer because it reduces operating complexity and maintenance effort in the long term.

Services orchestration is another vital component of the Platform. Picking the right technology is crucial for the infrastructure layer, not only for things like maintenance, scalability, and costs but also for how easily these technologies interact with the other layers.

The Software Architecture layer

The Platform team could work as an internal consultant and authority for the Technology teams, helping them make the right decisions when choosing the technologies they use to develop their software. This ranges from programming languages, libraries, and frameworks to reference implementations and boilerplates that let developers focus on the business logic rather than implementing non-functional requirements over and over again. The Platform team could also help build core libraries that promote code reuse and abstract common functionality, so that new products can be delivered more efficiently.

The Platform team is responsible for helping the Technology teams evaluate the systems they use to implement their solutions, and for helping them consider all the aspects needed to make those solutions scalable, maintainable, and observable.

The Continuous Improvement layer

DevOps is a culture, and the Platform team should transmit it to the technology teams, aiming to achieve operational efficiency across the whole Software Development Lifecycle (SDLC) and to reduce time to market. The Platform team needs to care about how code gets written, how it's deployed, and the immediate feedback developers can get on how their applications run in production environments. The Platform team will be responsible for maintaining the Continuous Improvement layer, which provides the Technology teams with the tools needed to implement every step of a mature SDLC and uses APIs to abstract the complexity of interacting with this layer.

Continuous Integration and Delivery

Continuous Integration and Delivery is one of the pillars of the DevOps culture. It simplifies the release of new products and features, reduces the time to market, and allows a business to be competitive. The Platform team should select the right tools to simplify the deployment of applications to the infrastructure, automating and integrating these tools to provide an abstraction layer that developers can easily use to deploy their applications with minimal intervention (or no intervention at all) from the Platform team.
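A minimal sketch of what such an abstraction layer might look like from a developer's point of view: a small deploy request that the platform validates and expands into pipeline stages. The DeployRequest fields, the environment names, the stage names, and the rule that production deploys are preceded by a staging smoke test are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed set of environments; a real platform would define its own.
ALLOWED_ENVIRONMENTS = ("dev", "staging", "production")


@dataclass
class DeployRequest:
    """Hypothetical self-service request submitted by a delivery team."""
    service: str
    version: str
    environment: str


def build_pipeline(request):
    """Validate a request and return the ordered stage names to run."""
    if not request.service or not request.version:
        raise ValueError("service and version are required")
    if request.environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {request.environment}")
    stages = [
        "build",
        "unit-tests",
        "publish-artifact",
        f"deploy-{request.environment}",
    ]
    if request.environment == "production":
        # Assumption: production deploys are gated by a staging smoke test.
        stages.insert(3, "smoke-tests-staging")
    return stages


print(build_pipeline(DeployRequest("billing", "1.4.2", "production")))
```

The delivery team only states what it wants deployed and where; the platform decides which stages that implies, so the team does not need to know how each stage is implemented.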

Continuous Deployment can be supported by the platform, but because every change that passes the pipeline goes straight to production, it requires a mature and refined DevOps culture and strong support from automated testing.

Continuous Testing

The Platform team should be able to provide the mechanisms to test the software the technology teams are developing during the whole SDLC.

  • For example, Unit and Integration testing during the Continuous Integration step, Smoke and Regression testing during the Continuous Delivery step, and periodic tests that the AQA (Automated Quality Assurance) teams create to verify that applications are behaving as expected in production environments.
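The staging described above can be sketched as an ordered list of stages, each with its own test suites, where the run stops at the first stage that has a failure. The stage names, suite names, and the run_pipeline function are hypothetical, and the lambdas stand in for real test runs.

```python
def run_pipeline(stages):
    """Run stages in order and stop at the first one with a failing suite.

    `stages` is a list of (stage_name, suites) pairs, where `suites` is a
    list of (suite_name, check) pairs and `check` returns True on success.
    Returns a dict mapping each stage that ran to its failed suite names.
    """
    results = {}
    for stage, suites in stages:
        failed = [name for name, check in suites if not check()]
        results[stage] = failed
        if failed:
            break  # later stages are skipped after a failure
    return results


# Placeholder checks: a real pipeline would invoke actual test runners.
stages = [
    ("continuous-integration", [("unit", lambda: True),
                                ("integration", lambda: True)]),
    ("continuous-delivery", [("smoke", lambda: True),
                             ("regression", lambda: False)]),
    ("production", [("scheduled-aqa", lambda: True)]),
]

for stage, failed in run_pipeline(stages).items():
    print(stage, "failed:" if failed else "passed", ", ".join(failed))
```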

Continuous Security

The incredible pace at which technology advances these days also brings challenges in how we secure and protect our critical systems.

The term DevSecOps has become popular. It focuses on bringing the DevOps culture to the security landscape in order to implement security best practices efficiently throughout the SDLC. Integrating practices like static code analysis, vulnerability scanning, and penetration testing into the CI/CD pipelines has become necessary to protect our most valued assets. DevSecOps practitioners should collaborate with the Platform team to make security a critical part of the Platform.
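One way to picture security as part of the pipeline is a gate that blocks a release when a scan reports findings at or above a chosen severity. The sketch below assumes a hypothetical list of findings with "id" and "severity" fields and a four-level severity scale; it does not reflect the output format of any particular scanner.

```python
# Assumed severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def security_gate(findings, fail_at="high"):
    """Return (passed, blocking) for a list of scan findings.

    A finding blocks the pipeline when its severity is at or above
    `fail_at` on the assumed scale.
    """
    threshold = SEVERITY_ORDER.index(fail_at)
    blocking = [
        f for f in findings
        if SEVERITY_ORDER.index(f["severity"]) >= threshold
    ]
    return (not blocking, blocking)


# Illustrative findings, not real scanner output.
findings = [
    {"id": "FINDING-1", "severity": "low"},
    {"id": "FINDING-2", "severity": "critical"},
    {"id": "FINDING-3", "severity": "medium"},
]

passed, blocking = security_gate(findings)
print("gate passed" if passed else "gate failed", [f["id"] for f in blocking])
```

Keeping the threshold as a parameter lets security and platform teams agree on a stricter or looser policy per environment without changing the gate itself.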

Continuous Feedback

Observability in every layer of the Platform is vital to understanding how it is performing. The Platform team should implement the mechanisms that give the technology teams feedback on how their applications and systems are behaving in Production. The Platform team will pick the right tools to monitor infrastructure, manage logs, monitor application performance (APM), and track uptime. These tools help the technology teams understand in depth how their applications are behaving, detect problems quickly, reduce the time it takes to remediate them, and use the appropriate channels to keep everyone in the loop.
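To make the feedback loop concrete, the sketch below turns a batch of request records into two common signals, error rate and 95th percentile latency, and raises alerts when assumed thresholds are exceeded. The record fields, the threshold values, and the function names are hypothetical; a real setup would rely on a monitoring tool rather than hand-rolled code.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(requests, max_error_rate=0.05, max_p95_ms=500):
    """Compute error rate and p95 latency, and list any breached limits.

    Assumes each record has "status" (HTTP code) and "latency_ms" fields,
    and that server errors are statuses of 500 and above.
    """
    errors = sum(1 for r in requests if r["status"] >= 500)
    error_rate = errors / len(requests)
    p95 = percentile([r["latency_ms"] for r in requests], 95)
    alerts = []
    if error_rate > max_error_rate:
        alerts.append("error-rate")
    if p95 > max_p95_ms:
        alerts.append("latency")
    return {"error_rate": error_rate, "p95_ms": p95, "alerts": alerts}


# Illustrative sample: nine fast successes and one slow server error.
sample = [{"status": 200, "latency_ms": 100} for _ in range(9)]
sample.append({"status": 500, "latency_ms": 900})

print(summarize(sample))
```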

Knowledge Transfer and Documentation Layer

The Platform engineering function should be transparent and clear about how the whole Platform operates. Knowledge transfer is therefore another responsibility of the Platform team: it documents all layers of the Platform and holds knowledge transfer sessions with the technology teams to explain and discuss how the different components of the Platform work. The Platform team could also perform code reviews and software architecture reviews to make sure technology teams meet the software quality requirements needed to make efficient use of the Platform.

Who is the Platform Team?

The platform team is, at the end of the day, a product team whose clients, users, and stakeholders are the technology teams that use the Platform to release new products and features for the end-users. The team should be composed of individuals whose seniority and expertise let them lead the design, implementation, and maintenance of a platform that works as a foundation for all the products developed by the technology teams.

Platform Team Operation

As with any system, the Platform's value relies on its capability to evolve and adapt as the business grows and changes. Continuous improvement should be the core value of the Platform and should be practiced by the Platform Team.

How to improve your Platform Engineering velocity (Agile)

Maintain focus

When your Platform team is faced with new tasks and distractions once the sprint has already begun, your velocity will suffer. It is very important to clearly define and prioritize the tasks before you begin your sprint. Be consistent when it comes to accepting or rejecting changes from the rest of the stakeholders. Everyone should be aware of the deadline for submitting ideas and features to the Platform engineering backlog.

Technical debt

Developers are constantly under pressure to deliver features faster. These pressures result in shortcuts and assumptions, which may well be necessary at the time. However, it is easy to forget these concessions, and if they’re not dealt with, they turn into technical debt that could weigh your Platform team down.

To optimize for velocity, you need to ensure that your configuration management [Chef, Puppet, Ansible], Infrastructure as Code (IaC) [Vagrant, Terraform, AWS CloudFormation], and CI/CD pipeline code bases are well maintained.

Sprint retrospectives

Retrospectives (or retros) are a key part of the agile methodology. They allow a team to identify what went right or wrong in the sprint and how to improve next time.

Ensure that these retros happen in a trusted environment, without blame, so you get maximum transparency. The meetings require time and participation, but they are a tried and tested practice that leads to valuable insights and tweaks that will ultimately increase your velocity.

Some teams prefer to have a retro every two or three sprints. In our opinion, this is not fruitful. When sprints span multiple weeks, we've found that in some circumstances details have been forgotten, and thoughts on improvements may have diverged or been left behind.

Team morale

A happy team is a productive team. Ensure the team is rallied around a common goal and that they have clear ways of expressing feedback if they’re not happy.

The work that is done should also feel satisfying. Try to bridge the gap between the business side of the company and the engineering team by, for example, showing how many users started to use the feature that was built in the last sprint.

Don't forget to cheer for the maintenance and tech debt heroes: the people who refactored this nasty function, fixed that bug, reviewed these pull requests, and ultimately saved the team hours of work by keeping the codebase healthy.

In Summary, evaluate velocity over time

It’s important to remember that your velocity cannot be increased indefinitely (you don’t want to end up with skewed estimations). Instead, try to evaluate it periodically and make the changes that you feel will have the greatest impact on the team.
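One simple way to evaluate velocity over time, rather than sprint by sprint, is a rolling average of completed story points. The sketch below assumes velocity is measured in story points per sprint and uses a three-sprint window; both the window size and the sample numbers are illustrative.

```python
def rolling_velocity(points_per_sprint, window=3):
    """Average completed points over each run of `window` sprints."""
    averages = []
    for end in range(window, len(points_per_sprint) + 1):
        chunk = points_per_sprint[end - window:end]
        averages.append(sum(chunk) / window)
    return averages


# Illustrative history of completed story points, oldest sprint first.
history = [21, 25, 23, 30, 18, 26]

for average in rolling_velocity(history):
    print(round(average, 1))
```

Looking at the smoothed trend makes a single unusually good or bad sprint less likely to drive decisions or distort future estimates.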