Site Reliability Engineer Resume Samples

Site Reliability Engineers are software development experts responsible for improving the application lifecycle, evolving software systems to increase their reliability, monitoring application performance, and ensuring overall system health. Candidates for a Site Reliability Engineer job should showcase the following skills in their resumes: computer proficiency, analytical thinking, problem-solving, communication, troubleshooting, coding, and teamwork. Most sample resumes in the field list a Bachelor’s Degree in information technology.

1

Sr. Site Reliability Engineer

  • Implemented, tested and monitored microservices in datacenter cloud environments for the Cisco-Jasper IoT platform. Performed continuous integration and delivery of new microservices and on-demand troubleshooting of large-scale deployment issues on Linux systems. Started and maintained a How-To series of knowledge items, sharing acquired information about installation, integration and deployment of Middleware services on privately hosted and public clouds, including AWS, Google and IBM clouds.
  • Provided configuration, maintenance and testing of the Jibe pipeline framework for Apple Corp., allowing migration of data between heterogeneous systems and services. Created a Maven-based build environment, tested import and export components of the Jibe framework, integrated with Kafka services, and monitored data synchronization between Oracle and Mongo databases. Enabled continuous build and deployment automation for a hybrid cloud environment, expanding integration coverage for software-defined enterprise infrastructure.
  • Developed and maintained a toolchain framework for configuring, integrating and testing a set of Middleware tools used to build corporate VMware products and services. The project resulted in continuous management of the private-cloud-based distributed repositories, allowing automated test-driven synchronization of the toolchain content. Created and managed a virtual test lab environment for testing enterprise services inside multitenant cloud infrastructure.
  • Designed and implemented an adaptive remote testing framework for installation and customization of multitenant cloud environments and their integration with distributed data sources.
Candidate Info
17 years in workforce
3 years at this job
MS, Computer Science
2

Site Reliability Engineer

Deploy and monitor Amazon Web Services resources (EC2, VPC, ELB, S3, RDS) using Boto, Terraform and Chef

  • Deploy code updates into test and production environments and work to roll environments forward
  • Maintain Git repositories for developers and promote topic branch workflow
  • Help with Support tickets by reproducing bugs
  • Troubleshoot and escalate bugs for our Live server product
Candidate Info
18 years in workforce
3 years at this job
BA, Mathematics
3

Site Reliability Engineer

Provide systems support by participating in rotational on-call support as well as performing recovery, maintenance and upgrades during weekend and evening hours.

  • Serve as an escalation point for other Systems Administrators, Engineers, and other technology teams in the resolution of server and system problems.
  • Contribute to the development and maintenance of automation tools used in the management of our infrastructure.
  • Plan, schedule, test and perform software installation and upgrades.
  • Create and maintain documentation of systems and processes for existing and new systems.
  • Build, administer, and troubleshoot all mission critical environments (Production, Stage, Dev, Test, QA)
  • Coordinate changes with application owners to ensure minimal user impact.
  • Maintain PCI and SOX compliance with required applications and environments.
Candidate Info
10 years in workforce
3 years at this job
BA (Bachelor of Arts)
4

Service Engineer II / Site Reliability Engineer II

  • Deploy and maintain international server environment for 24/7 critical uptime business product offering in a mixed Windows/Linux environment.
  • Leverage automation tools, especially PowerShell and Puppet, to decrease end-to-end deployment times, reduce downtime, and increase reliability.
  • Implement and maintain monitoring solutions at the server and application level to increase visibility into day-to-day operations and issues, utilizing SCOM, Nagios, SolarWinds and AppDynamics.
  • Lead initiatives to transition critical software services into the Cloud, and provide training for other employees on the Cloud transition process for other portions of the product/organization.
  • Act as top-tier on-call support for critical uptime business applications to maintain availability and minimize downtime during outage scenarios.
  • Provide training for System Administrators and other Engineers, including brown-bag style trainings, documentation, and one-on-one mentorship.
Candidate Info
13 years in workforce
2 years at this job
BS, Computer Science
5

Site Reliability Engineer

  • Write automation/self-healing scripts in Ruby / Bash / Go to maintain the Bluemix cloud environment
  • Manage the stability, operation, and automation of more than 50 Bluemix environments (Cloud Foundry-based cloud platforms)
  • Perform primary/secondary on-call duties to manage alerts on PagerDuty and solve issues
  • Perform Cloud Foundry deployments to Bluemix using BOSH and UrbanCode Deploy
  • Create/maintain a Slack integration bot that supports the Bluemix SRE team (Ruby/Sinatra)
  • Contribute to the development pipeline for UrbanCode Deploy using Go
Candidate Info
5 years in workforce
2 years at this job
BA, Mathematical Economics
MA, Finance
(java), Econometrics, Game Theory, Managerial Accounting, Advanced Microeconomic Analysis
6

Site Reliability Engineer

  • Serve as a front-line technical service reliability operator accountable for handling critical customer issues coming in via the support phone line and HUB.
  • Responsible for first-touch incident resolution (via TSG or SOP) or escalation to the appropriate resource within SLA.
  • Responsible for monitoring the live service via HUB alerts, heads-up displays, manual service checks or customer escalations.
  • Accountable for High Priority Bridge Moderation (Spin up bridge, start whiteboard, document sequence of events).
  • Document and refine Phone Script, TSGs and SOPs.
  • Service Request Management (User Provisioning, Client Invites, Environment requests, Deployments, etc.)
  • Responsible for refining Service Center tools and process
Candidate Info
28 years in workforce
9 months at this job
BS (Bachelor of Science)
MA, Computer Applications
7

Site Reliability Engineer Intern

  • Preparing the Business Process Flow using Bizagi Modeler
  • Responsible for setting up the ELK (Elasticsearch, Logstash, Kibana) platform and parsing unstructured logs into structured JSON format using regular expressions
  • Passing the structured data to Elasticsearch and performing operations on this data
  • Analyzing the data in Kibana, Grafana and Graphite to assess the performance of the products, and stabilizing the servers
  • Setting up alerts, handling overloads on servers, performing release engineering
  • Analyzing, investigating and resolving problems to help smooth product performance. Programming in Visual Studio Code, tracking progress through JIRA and Git repositories
Candidate Info
3 years in workforce
4 months at this job
BE, Computer Science
MS, Information Systems
8

DevOps/Site Reliability Engineer

  • Collect and maintain a complete inventory of all systems. Identify and retire unused systems to recycle resources and reduce maintenance costs.
  • Configure and maintain thousands of systems via a set of Chef cookbooks within an Atlassian continuous build and deploy environment (Jira, Confluence, Stash, Bamboo, Git).
  • Identify and correct the root cause of various system alarms. Recommend changes to avoid their recurrence.
  • Configure and maintain Amazon Web Services (AWS) Cloud Computing environments.
Candidate Info
19 years in workforce
1 year at this job
BA, Computer Science
MS, Computer Science
