AI SRE vs AI Platform Engineer

"Offense sells tickets, but defense wins championships," Bear Bryant

"The future is already here — it's just not very evenly distributed," William Gibson

We get asked if we’re building an AI SRE, which seems to be a tool that:

Ingests a firehose of observability data
Identifies the root cause of any current or developing incidents
Recommends solutions and accelerates time to recovery

This seems like a useful tool but we’re not building that. Here’s why:

Platform Engineering 101

In the 2010s, running code in the cloud was really hard. A lot of tooling and concepts descended from on high at FAANG, and most engineering teams weren’t ready for them:

Cloud → redefined scale
Kubernetes → managing scale was incredibly complex
Microservices + Squads → knowledge, ownership and responsibility were distributed outside of a central DevOps team

Smart teams took a look at this complexity and realized that the traditional DevOps model couldn’t scale. Instead they needed to build automations, golden paths and self-serve tooling so that everyone could work with the cloud.

Spinnaker was Dylan going electric. In 2022, Charity Majors christened this movement Platform Engineering, and now in 2025 Google says Platform Engineering is mandatory.

No more Incidents

Proposed:

Platform Engineering has significantly reduced the rate of incidents, and changed the role of SREs.

Hypotheses:

Most incidents are triggered by rollouts
Rollouts are much safer and routine due to modern Platform Engineering
- High quality, representative development environments
- Linting, testing, smoke tests and policy checks for every PR
- Templates for Cloud Resources
- Blue/green (B/G) deployments and automated rollbacks
- Smarter, standardized observability
- IDPs and SLOs have solved most of the ownership and knowledge problems
Incidents are less frequent. When they happen, they are more easily classified and understood because of standard open source projects and comparable cloud products.
SREs are becoming coaches or architects. They make sure an organization knows how to manage SLOs, observability and incidents.
Staffing ratios are changing to reflect this. We’re seeing 1 SRE for every 10 Platform Engineers and every 100 Feature Engineers. Budgets follow similar ratios.
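
To make the "B/G deployments and automated rollbacks" item above concrete, here is a minimal sketch of the kind of gate a platform might run while traffic shifts from the old (blue) version to the new (green) one. Everything in it is an assumption for illustration: the `Stats` shape, the `should_rollback` name and the default thresholds are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Request counts observed for one deployment colour (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Treat "no traffic yet" as a zero error rate instead of dividing by zero.
        return self.errors / self.requests if self.requests else 0.0


def should_rollback(blue: Stats, green: Stats,
                    max_ratio: float = 2.0,
                    min_requests: int = 500,
                    floor: float = 0.001) -> bool:
    """Return True when green looks clearly worse than blue.

    All thresholds are illustrative defaults, not recommendations.
    """
    if green.requests < min_requests:
        return False  # not enough green traffic to judge yet
    # Use a small floor so a near-perfect blue does not make any green error fatal.
    baseline = max(blue.error_rate, floor)
    return green.error_rate > baseline * max_ratio


if __name__ == "__main__":
    blue = Stats(requests=10_000, errors=10)
    print(should_rollback(blue, Stats(requests=1_000, errors=50)))
    print(should_rollback(blue, Stats(requests=1_000, errors=1)))
```

A real gate would usually also look at latency and saturation, and would read these counts from whatever observability stack the platform has standardized on.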

TL;DR: we’ve learned how to build platforms that make the cloud (or your data center) safe and accessible. SREs are still important, but they are now one of many critical roles supporting development and customer experience.

After Incidents, comes Toil

So why do engineers spend only 16% of their time coding applications?

Well, your platform team has a lot of work to do:

Maintaining those Developer Environments
Writing and updating linting, testing, smoke-test and policy systems
Managing Cloud Resources and Building Abstractions
Running B/G deployments and automated rollbacks
Updating instrumentation and log management
Updating the metadata model and SLOs underneath your IDP

Add in migrations, refactors, and GPUs and your platform team is incredibly busy. But they also need to give their stakeholder engineers the self-serve tools to do things like:

Access cloud resources and use your abstractions
Debug, mutate and fork abstractions as needed
Release new code, and keep it running
Maintain documentation, labels, hygiene, etc.
FinOps
Compliance

In short, platform engineering has made the cloud productive, safe and accessible, but these capabilities don't come for free. Copilot, Cursor, etc. can’t help you with these tasks because they don’t have visibility into observability and orchestration. That’s why you keep hiring Platform Engineers.

Don’t build an AI SRE, build an AI Platform Engineer

So that’s what we’re building … an AI agent that helps you build, maintain and protect the things you do in the cloud.

Flightcrew has many similarities with an AI SRE … we ingest observability data, traverse graphs, classify issues and recommend fixes.

The difference is that we’re not focused on playing whack-a-mole with incidents … we’re playing tower defense by generating code/IaC for reliable, efficient and compliant infrastructure.

Today Flightcrew is performing tasks like:

Refactoring hundreds of lines of IaC for a legal tech company
Optimizing Kubernetes resources for India’s largest delivery startup
Tuning autoscaling, networking and database config at a major digital education company
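
For readers unfamiliar with what "optimizing Kubernetes resources" can involve, the sketch below shows one common right-sizing heuristic: set a container's CPU request to a high percentile of observed usage plus some headroom. This is a generic illustration and not a description of how Flightcrew works; the function names, the 95th percentile and the 20% headroom are all assumptions chosen for the example.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[int], pct: int) -> int:
    """Nearest-rank percentile of a non-empty sequence (pct is an integer 1-100)."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_cpu_request(usage_millicores: Sequence[int],
                        pct: int = 95,
                        headroom_pct: int = 20) -> int:
    """Suggest a CPU request in millicores: the pct-th percentile plus headroom.

    Hypothetical helper; the defaults are illustrative, not recommendations.
    """
    base = percentile(usage_millicores, pct)
    return math.ceil(base * (100 + headroom_pct) / 100)


if __name__ == "__main__":
    # Twenty usage samples: mostly steady, with one spike.
    usage = [100] * 19 + [400]
    print(suggest_cpu_request(usage))
```

The output of a helper like this would typically be written back as an IaC change (for example, an updated `resources.requests.cpu` value in a manifest) and reviewed in a PR instead of being applied directly.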

When we earn it, we’ll call Flightcrew an AI Platform Engineer. Until then, we'll simply say we're building Flightcrew.

If this resonates, we’d love to chat.

And if you are building an AI SRE or codegen tool, we’d love to compare notes and integrate. We share a common enemy and the future is bright.
