David Collom

Staff Solutions Engineer

[email protected] // Co Durham, UK


// About Me

Platform Engineer with over a decade of experience in infrastructure automation and platform design, including 8+ years architecting and operating Kubernetes across cloud and on-premises environments. Focused on infrastructure as code at scale, using Terraform to build robust environments that accelerate delivery and reduce drift. Conference speaker (e.g., KubeCon, ArgoCon) on Kubernetes cost optimisation and platform reliability, and active contributor to open-source community tools. Eager to apply these technical and leadership skills to future platforms, ensuring high-quality services and rapid delivery in a fast-paced environment.

// Skills

  • Languages / DBs: Go, Python, Ruby, Redis, MySQL, PostgreSQL
  • Frameworks: Controller Runtime, OpenAPI / Swagger, Rails, Grape, KOPF, Sinatra
  • Applications / Toolkits: Kubernetes, Terraform, Docker, Chef, Git, Terragrunt, Nginx, Apache, HAProxy, Argo Project, Argo CD, Flux, Ansible, HashiCorp Vault, Consul
  • Observability: Prometheus, OpenTelemetry, Grafana, InfluxDB, Thanos, VictoriaMetrics
  • Cloud Services / Providers: Google Cloud Platform (GCP), DigitalOcean, Amazon Web Services (AWS)
  • Other: Team Leadership, Mentoring and Coaching, Open-source Contributions, Issue Triage and Planning, System Design and Architecture, Pragmatic Problem Solving

// Experiences

Staff Solutions Engineer

 
Jetstack / Venafi / CyberArk  (Remote)
  November 2020 - Present
  • Engaged in high-stakes client interactions, cultivating strong relationships with engineers, project management teams, and C-suite executives.
  • Led diverse teams across various time zones, promoting collaboration and success.
  • Spearheaded the development of extended Kubernetes deployments and platforms, thereby enhancing developer speed and reducing time to market.
  • Developed several custom controllers/operators to enhance and integrate with various tools for customers' specific needs.
  • Presented at ArgoCon 2023 to an audience of over 200 individuals.
  • Presented at KubeCon + CloudNativeCon North America 2023, showcasing expertise in Kubernetes and cloud-native technologies.
  • Orchestrated demos and discussions at KubeCon + CloudNativeCon 2022.
  • Contributed pull requests to significant open-source projects in several programming languages, spanning community tooling, documentation and code.
  • Conceptualised and architected internal and external training platforms to optimise course development and delivery, significantly reducing toil and improving attendees' overall experience.

Senior Infrastructure Engineer

 
Credit Karma  (London, UK)
  July 2019 - November 2020
  • Oversaw the management, maintenance, and enhancement of Kubernetes clusters and core services.
  • Automated infrastructure deployment with Terraform and Terragrunt.
  • Formulated capacity planning strategies and predictive alerting mechanisms.
  • Directed service mesh infrastructure management, leading migration efforts.
  • Contributed to the launch of a new independent platform on Google Cloud Platform.

Infrastructure Platform Engineer

 
Sky Betting and Gaming  (Leeds, UK)
  May 2018 - July 2019
  • Administered several on-premises and cloud Kubernetes clusters.
  • Instituted automated patching and reporting mechanisms, significantly reducing engineering time and service disruption.
  • Established availability zones for on-premises clusters to enhance resilience.
  • Developed a custom cloud controller for Kubernetes to provision on-premises nodes.
  • Utilised Terraform for streamlined automated deployment management.

Senior DevOps Engineer

 
Sky Betting and Gaming  (Leeds, UK)
  June 2017 - May 2018
  • Collaborated with Site Reliability Engineers and DevOps teams to elevate service quality.
  • Enhanced automated patching processes through a variety of tools, reducing toil.
  • Reduced operational overhead by automating previously manual tasks.

Senior Software Developer

 
William Hill  (Leeds, UK)
  February 2016 - June 2017
  • Managed, maintained and planned workloads across development and engineering teams.
  • Oversaw performance management and took part in candidate interviewing.
  • Provided mentorship and support for software and systems engineers across various levels.
  • Championed migrating a monolithic application to a Service-Oriented Architecture, reducing delivery time from six months to weeks.
  • Provided software solutions to better visualise and support the automation of firewalls and the approval process between the Engineering, Networks and InfoSec teams.
  • Oversaw several Software Development Life Cycle (SDLC) processes, ensuring adherence to best practices.

Software Developer

 
William Hill  (Leeds, UK)
  May 2014 - February 2016
  • Delivered a customised Platform as a Service (PaaS) solution, enhancing development efficiency and utilising VMware vSphere and existing on-premises solutions.
  • Delivered an automated containerised infrastructure product using Mesos/Marathon.
  • Conducted highly rated training courses that received exceptional feedback from participants.

Software Developer

 
City Electrical Factors  (Durham, UK)
  June 2012 - April 2014
  • Developed and maintained a bespoke E-commerce platform, enhancing the online shopping experience for customers.
  • Implemented a custom Content Management System (CMS) to streamline content updates and management.
  • Created a custom CRM system to improve customer relationship management and sales processes.
  • Developed a custom ERP system to integrate various business processes, improving operational efficiency.
  • Designed and implemented a custom reporting system to provide insights into business performance and customer behaviour.
  • Designed, supported and implemented various automation tooling to assist in hosting new and existing applications.

Developer & System Administrator

 
Visualsoft eCommerce  (Stockton-on-Tees, UK)
  August 2008 - June 2012
  • Oversaw servers hosting over 400 high-traffic eCommerce websites.
  • Maintained and secured servers, patching security vulnerabilities.
  • Implemented PCI compliance standards to bolster security.

// Achievements & Certifications

Speaker: Yorkshire DevOps - GKE Overview and History

  2023  
Slides

I presented at Yorkshire DevOps to an audience of over 60 attendees, focusing on the evolution and features of Google Kubernetes Engine (GKE). The session covered GKE's history and highlighted its capabilities, including Autopilot, Release Channels, and essential add-ons such as Backups, Istio, Config Connector, Ingress Controller, Secrets, and IAM Integration.

The presentation aimed to provide a comprehensive overview of GKE's strengths and its role in modern cloud-native architectures. Engaging with attendees at Yorkshire DevOps reinforced my commitment to sharing knowledge and promoting best practices in Kubernetes and cloud infrastructure management.


Speaker: KubeCon + CloudNativeCon North America 2023

  2023  
View
 
Watch

I presented alongside a colleague at KubeCon NA on “Kubernetes Confessions: Tales of Overspending and Redemption”, addressing critical challenges in cloud cost management. Our session attracted over 360 RSVPs, including attendees from notable organisations such as Microsoft, Google, Amazon, and HashiCorp. This experience highlighted our ability to provide actionable insights and engage a diverse audience within the Kubernetes community.

Our talk provided a vendor-agnostic approach to managing Kubernetes cloud spend efficiently, emphasising best practices and open-source tools. Key topics included gaining visibility into cloud and Kubernetes expenditures, right-sizing resources, leveraging discounts, and preemptively avoiding common pitfalls.

The positive reception from attendees reinforced our reputation as knowledgeable speakers in the Kubernetes ecosystem. We are known for delivering practical solutions and fostering continuous improvement within cloud infrastructure management.


Speaker: CNCF-Hosted Co-Located Events Europe 2023: ArgoCon

  2023  
View
 
Watch

In 2023, I delivered my first large-scale conference talk, “How to Avoid a Kubernetes Doom Loop,” at ArgoCon Europe. This lightning talk recounted a critical incident involving a Kubernetes cluster managing 16K Argo Workflows across 165 nodes, highlighting the risks of automation misconfiguration and the importance of adhering to Cloud Reliability Engineering (CRE) best practices.

The talk attracted over 280 RSVPs, indicating substantial interest in the topic within the Kubernetes community and marking a successful debut on the conference speaking circuit. This experience underscored the importance of proactive maintenance and adherence to best practices in ensuring the stability and reliability of cloud-native infrastructures.


CS-169.1x: Software as a Service

  2013  
View