Position Title: Site Reliability Engineer
Company: ST Electronics (Training & Simulation Systems) Pte Ltd
Business Area: Electronics

SREs at TMS take charge of maintaining the platform TMS uses to run production systems. You'll take on a mix of activities, from optimising OpenShift & maintaining system uptime to conducting operating system upgrades & deploying new platform services. This role focuses on the maintenance & reliability of the platform & its processes, as well as the automation of tedious, complex &/or error-prone tasks. There is also plenty of room to learn about the application deployment side of things.

 

This role is brand-new at TMS & you can look forward to working with bleeding-edge technologies.

 

Technologies We Use:

·       AWS, OpenShift

·       Bitbucket, Jenkins

·       Infinispan, Kafka, MongoDB, PostgreSQL, Redis

·       Golang, Java, Node.js, Python

·       Elasticsearch

·       Prometheus, Alertmanager

 

Essential Qualities:

·       You are able to balance security concerns with business practicalities.

·       You are flexible & pragmatic in making things work now while systematically improving system health & security in the long run.

·       You are a coder who knows systems.

·       You are an analytical thinker who gathers data to identify problems.

·       You are passionate about delivering reliable systems that drive business applications.

·       You are comfortable working cross-functionally with application developers & product managers to ensure success of the system & business.

·       You love finding ways to collect & organise data as well as slicing & dicing these data points to identify areas for improvement.

·       You will be called upon to support the platform in the event of a failure.

·       You are comfortable with on-call responsibility & are able to manage a crisis in collaboration with other functional groups.

 

Requirements:

·       3-5 years of experience in an SRE/sysadmin-related role, preferably with a background in application delivery

·       Understanding of &/or experience in cloud computing &/or platforms, preferably with a broad swath of Amazon Web Services

·       Understanding of &/or experience in Linux system administration

·       Understanding of &/or experience in scripting, preferably with bash &/or Golang

·       Understanding of &/or experience in building tools to automate system, platform &/or database maintenance tasks

·       Understanding of &/or experience in applying log processing & alerting stacks, preferably with the Elastic Stack

·       Understanding of &/or experience in applying monitoring tools to detect &/or understand anomalies, preferably with time-series-based tools

·       Understanding of &/or experience in server automation tooling such as Ansible, Helm &/or Terraform

·       Understanding of &/or experience in OpenShift, Kubernetes &/or Docker, preferably along with service meshes such as Istio

·       Understanding of &/or experience in cloud-centric networking such as Elastic Load Balancer, HAProxy, Open vSwitch &/or Virtual Private Cloud

 

Nice to Have:

·       Familiarity with database administration &/or management is a strong plus

·       Familiarity with cloud security norms &/or implementation details is a strong plus

·       Familiarity with content delivery networks is a plus

·       Familiarity with application development is a plus
