Site Reliability Engineer & Systems Engineer
Over 10 years of experience scaling Linux infrastructure, implementing automation, and building reliable systems across hybrid and cloud environments
I'm a Site Reliability Engineer with over a decade of experience supporting and scaling Linux and Windows-based infrastructure across on-premises, hybrid, and virtualized environments. I specialize in implementing infrastructure automation, observability solutions, and incident response procedures.
My expertise includes performance tuning, system monitoring, and reducing operational toil through scripting and configuration management. I'm passionate about SRE principles, automation, and building reliable, scalable systems that serve both internal teams and customers.
Improving and maintaining 99.5%+ service availability for critical customer-facing systems and infrastructure. Leading the implementation of infrastructure projects, including migrations and performance optimizations.
Architected infrastructure upgrades and system migrations to reduce downtime and improve resilience. Developed monitoring dashboards for performance visibility and alerting.
Delivered advanced troubleshooting for enterprise and SMB environments across virtualization, NAS, RAID, and system performance. Created and maintained internal observability tools for diagnostics.
Resolved technical issues related to OS, networking, disk arrays, and server access.
A comprehensive homelab setup with DNS through Technitium, services managed through Ansible playbooks, and everything hosted on Gitea.
A complete monitoring and alerting solution using Prometheus, Grafana, and custom dashboards for infrastructure metrics.
Infrastructure provisioned on Proxmox through Terraform, with configuration handled by Ansible.