David Collom

Staff Solutions Engineer


// About Me

As a dedicated technologist, I strive for excellence, fueled by a passion for technology and collaboration.
My expertise lies in automation and DevOps, areas that come naturally to me.

I combine these skills to improve system efficiency and productivity, and I relish opportunities to learn from people with diverse perspectives.
This combination of continuous learning, technical acumen, and collaboration makes me an asset to any tech-focused team.

// Contact

Co Durham, UK
May, 1985

// Experience

Staff Solutions Engineer

Jetstack / Venafi

November 2020 - Present

During my tenure at Jetstack, I have been closely involved in numerous high-stakes client engagements, building strong working relationships with engineers, project management teams, and key decision-makers ranging from Directors to C-suite executives. Beyond contributing as a team member, I have led teams spanning multiple time zones, fostering an environment of collaboration and progress.

A significant part of my work at Jetstack has been leading the development of large-scale Kubernetes deployments and platforms. My primary objective has been to increase developer velocity, reduce time to market, and deliver solutions that scale quickly and flexibly with changing workload and traffic demands.

I am particularly proud of presenting at ArgoCon, co-located with KubeCon + CloudNativeCon Europe 2023, to more than 200 attendees. A recording of the talk is listed under Achievements & certifications.

My engagement with the industry broadened further at KubeCon + CloudNativeCon North America 2022 in Detroit, USA, where I led demos at our sponsor booth and drove discussions about our consulting and security offerings, raising the company's profile in the field. My involvement with the community also extends well beyond conference events.

I have given back to the community by contributing numerous pull requests to key community projects: fixing bugs, resolving long-standing issues, and improving documentation, all to steadily refine the tools and resources the community relies on.

Alongside client work, I designed and built our internal and external training platform. The platform provides engineers with sandbox environments and projects in which to work through the exercises in our training packages, many of which I have delivered personally. This approach has significantly streamlined course development and lowered the barrier to entry for participants, who often need nothing more than an iPad and a keyboard.


Senior Infrastructure Engineer

Credit Karma

July 2019 - November 2020

At Credit Karma, I played a key role in managing, maintaining, and upgrading our Kubernetes clusters and core services. My responsibilities included:

  • Performing upgrades and implementing enhancements for the clusters, including robust monitoring of the cluster and critical services, introducing pod disruption budgets, maintaining network policies, enabling cluster autoscaling for different workloads, and establishing affinity and anti-affinity rules for critical systems.
  • Automating infrastructure deployment using Terraform and Terragrunt, ensuring streamlined and efficient processes.
  • Advising service owners and product owners on resource management best practices.
  • Implementing capacity planning strategies and predictive alerting to proactively manage resource usage.

In addition to the Kubernetes clusters, I managed and maintained our service mesh infrastructure, starting with Linkerd 1.x and later leading the migration to an in-house mesh built on Envoy proxy sidecars. This involved managing both internal and external deployments, inside and outside Kubernetes.

Beyond Kubernetes, I was responsible for various supporting infrastructure components, including HashiCorp Vault, Consul, Artifactory mirroring, the UK region's static assets and Docker repository, and a large-scale shared MySQL deployment.

During my tenure, I was temporarily assigned to the data engineering team to support multiple ETL and data warehousing services in the UK region, as well as our primary recommendation engine. This cross-functional work broadened my skill set and let me contribute to critical areas of the business.

This work contributed significantly to the successful launch of our independent platform in the UK. By deploying more than 200 standalone services, we moved away from an expensive third-party hosted solution to our own Google Cloud Platform infrastructure.


Infrastructure Platform Engineer

Sky Betting and Gaming

May 2018 - July 2019

I was responsible for the day-to-day management, maintenance, and enhancement of both our On-Premises and Cloud (AWS) Kubernetes clusters. Working closely with our internal customers, I ensured adherence to Kubernetes best practices wherever feasible.

I played a key role in several projects aimed at improving our On-Premises offerings, including:

  • Implementing automated patching and reporting mechanisms to streamline the maintenance process.
  • Establishing On-Premises availability zones to enhance the resilience and fault tolerance of our infrastructure.
  • Upgrading and modernizing the supporting infrastructure to leverage the latest technologies and features.
  • Introducing automated service health checks to minimize Time to Incident (TTI) during incident response.
  • Utilizing Terraform for automated deployment and configuration management across both On-Premises and Cloud (AWS) environments.

Alongside my core responsibilities, I invested time in learning and development, gaining extensive experience in Go, which I used to investigate and build internal tooling for our Kubernetes customers.

I also built and managed my own 12-node Kubernetes cluster, incorporating various architectural patterns. The cluster runs components such as Prometheus, Grafana, Alertmanager, Home Assistant, Traefik, and MetalLB, giving me hands-on experience and in-depth knowledge of these technologies.


Senior DevOps Engineer

Sky Betting and Gaming

June 2017 - May 2018

In this role, I collaborated closely with Site Reliability Engineering (SRE), Performance Testing, and other DevOps teams to maintain a high level of service quality.

One of my primary responsibilities was improving our automated patching process and auditing capability, which we achieved using Jenkins, Chef, and custom in-house scripts (Bash/Python/Ruby). These improvements significantly reduced the team's overhead by automating previously manual tasks. Notable features included:

  • Automatic rebooting of patched hosts to ensure the application of updates.
  • Automatic silencing of alerts across multiple platforms to prevent unnecessary noise during the patching process.
  • Automatic draining or disabling of patched hosts in multiple tiers to minimize disruptions to services.
  • Ensuring that new services were designed and configured to seamlessly integrate with the patching process.

By streamlining and automating these processes, we were able to enhance operational efficiency, reduce manual effort, and minimize the risk of human error.


Senior Software Developer

William Hill

February 2016 - June 2017

Building upon my previous experience as a Software Developer, my responsibilities expanded to encompass managing and maintaining workloads across the development and engineering teams.

In this capacity, I held regular one-on-one meetings and performance management discussions with direct reports. I also oversaw timesheet management for contractors and took part in interviewing candidates. Additionally, I provided extended mentoring and training to developers and engineers, equipping them to work effectively within a DevOps-oriented team.

To drive continuous improvement, I investigated new technology solutions and led the introduction of a peer review process. I also drove the adoption of unit testing and benchmarking practices, significantly increasing their use across the team.

A notable accomplishment in this role was leading the migration of our existing monolithic application/API to a Service-Oriented Architecture (microservices). This decoupled a substantial portion of our platform and enabled rapid Proofs of Concept (PoCs) with third-party providers such as AWS, vCloud Air, and Azure.


Software Developer

William Hill

May 2014 - February 2016

At William Hill, I was a core member of the in-house Automation Platform team, known as WH Cloud. The team's most significant accomplishment was reducing scaling time for key business applications from 3-6 weeks to approximately 1-2 hours, greatly improving the efficiency and agility of our operations.

One of my notable personal achievements in this role was leading the delivery of an automated containerized infrastructure product built on Mesos/Marathon. I led the development effort and ensured seamless integration with our existing infrastructure. The solution improved scalability, flexibility, and manageability for our applications.

Alongside my development responsibilities, I delivered various training courses across the organization, which received consistently excellent feedback.


Software Developer

City Electrical Factors

June 2012 - April 2014

I joined as a Software Developer with a PHP background and, within the first six months, learned Ruby and the Ruby on Rails framework, demonstrating my adaptability and commitment to continuously learning new technologies.

In addition to my development responsibilities, I took on a significant project involving pricing, planning, and implementing a large-scale virtualized server architecture. This transition allowed the company to move away from relying on a single physical appliance. The new architecture was designed to leverage multiple service providers, enabling the distribution of services across different platforms. This strategic decision positioned the company for scalability, cost-effectiveness, and increased flexibility for future development endeavours.

The development of this custom-made virtualized solution involved various technical aspects, including load balancing, replication and clustering services, automated configuration and provisioning, and the implementation of a centralized firewall shared across all servers. These efforts contributed to the successful implementation of a robust and efficient infrastructure that met the company's needs.


Developer & System Administrator

Visualsoft eCommerce

August 2008 - June 2012

As a System Administrator, I managed servers and infrastructure hosting more than 400 high-traffic e-commerce websites, with a focus on maintaining at least 99% uptime across all essential technology solutions.

Because many of our clients were high-profile brands, security was a constant concern. To address this, I raised our security standards to meet PCI compliance requirements, safeguarding sensitive data and protecting the integrity of our systems.

In addition to my System Administration responsibilities, I actively participated in implementing and supporting the rollout of various features. Some notable projects included:

  • eBay Synchronization: I played a key role in integrating our e-commerce websites with the eBay platform, enabling seamless synchronization of products, inventory, and orders.
  • Amazon Synchronization: I contributed to the implementation of a system that synchronized our e-commerce websites with the Amazon marketplace, ensuring consistent product data and order management.
  • Amazon Price Matching System: I helped develop and implement a dynamic price matching system that kept our clients competitive on the Amazon platform by adjusting prices in real time based on market conditions.
  • ePOS / Till System Integrations: I integrated our e-commerce websites with electronic point of sale (ePOS) and till systems, streamlining inventory management, order processing, and sales data synchronization.

These implementations enhanced the functionality and competitiveness of our e-commerce websites, giving our clients powerful tools to expand their online presence and optimize their operations.


// Achievements & certifications

Speaker: Yorkshire DevOps - GKE Overview and History

2023

I presented at Yorkshire DevOps to an audience of more than 60 attendees on the evolution and features of Google Kubernetes Engine (GKE). The session covered the history of GKE and highlighted its key capabilities, including Autopilot, Release Channels, and essential add-ons such as Backups, Istio, Config Connector, Ingress Controller, Secrets, and IAM Integration.

The presentation aimed to provide a comprehensive overview of GKE’s strengths and its role in modern cloud-native architectures. Engaging with attendees at Yorkshire DevOps reinforced my commitment to sharing knowledge and promoting best practices in Kubernetes and cloud infrastructure management.

Slides are available here: https://docs.google.com/presentation/d/10yctw8AqZJgkbaKLakXiMEBC2uYTgrU-O5h2BKScwG0/edit#slide=id.g2e5c96ec5a5_0_2672


Speaker: KubeCon + CloudNativeCon North America 2023

2023

I presented alongside a colleague at KubeCon NA on “Kubernetes Confessions: Tales of Overspending and Redemption”, addressing critical challenges in cloud cost management. Our session attracted over 360 RSVPs, including attendees from notable organisations such as Microsoft, Google, Amazon, and HashiCorp. This experience highlighted our ability to provide actionable insights and engage a diverse audience within the Kubernetes community.

Our talk provided a vendor-agnostic approach to managing Kubernetes cloud spend efficiently, emphasizing best practices and open-source tools. Key topics included gaining visibility into cloud and Kubernetes expenditures, right-sizing resources, leveraging discounts, and preemptively avoiding common pitfalls.

The positive reception from attendees reinforced our reputation as knowledgeable speakers in the Kubernetes ecosystem, known for delivering practical solutions and fostering continuous improvement within cloud infrastructure management.

This talk is available on YouTube: https://youtu.be/6gtj7pwuWZs?si=jv9ry-GfaIKS9_op


Speaker: CNCF-Hosted Co-Located Events Europe 2023: ArgoCon

2023

In 2023, I delivered my first large-scale conference talk, “How to Avoid a Kubernetes Doom Loop”, at ArgoCon Europe 2023. This lightning talk recounted a critical incident involving a Kubernetes cluster running 16K Argo Workflows across 165 nodes, highlighting the risks of automation misconfiguration and the importance of following Cloud Reliability Engineering (CRE) best practices.

The talk attracted more than 280 RSVPs, reflecting substantial interest in the topic within the Kubernetes community and marking a successful debut on the conference speaking circuit. The experience underscored the importance of proactive maintenance and adherence to best practices in keeping cloud-native infrastructure stable and reliable.

This talk is available on YouTube: https://youtu.be/skOTc_evJnw?si=u5g-xyetfMt_Bb_d


CS-169.1x: Software as a Service

2013


// Education

University of Teesside

HNC, Web Development

2005 - 2007

Modules:
  • Principles of Visual Programming
  • Computer Law
  • Web Authoring
  • Introduction to Markup Languages
  • Internet Marketing
  • Multimedia Applications 1
  • Database and SQL
  • Communications Case Study
  • Group Project


Bishop Auckland College

Various, Information Technology

2001 - 2005

Modules:
  • C&G Level 3 - ICT Practitioner (System Support)
  • C&G Level 3 - Webpage design - Credit
  • C&G Level 2 - ICT Practitioner (System Support)
  • TROCN - Intro to Webpage Design
  • NCFE - Intro to the Internet
  • NCFE - Build a PC
  • Key Skills Level 2 - ICT
  • GNVQ - Intermediate ICT


Barnard Castle School

GCSE

1999 - 2001

Societies:
  • Combined Cadet Force (RAF)


// Skills

  • Languages / DBs: Ruby, Python, Go, MySQL, Redis, PostgreSQL, MongoDB
  • Frameworks: Rails, Grape, OpenAPI, Kopf, Sinatra
  • Applications / Tools: Kubernetes, Terraform, Docker, Chef, Git, Terragrunt, Nginx, Apache, HAProxy, Argo Project, Argo CD, Flux
  • Cloud Services / Providers: Google Cloud Platform, DigitalOcean, Amazon Web Services (AWS)
  • Operating Systems: Ubuntu, Red Hat, Debian, macOS

// Hobbies & Interests

  • Formula 1 & Motorsports
  • Cycling
  • Traveling
  • Music
  • Digital Art
  • Photography
  • Gadgets
  • Automation