Senior Data Reliability Engineer
Elliptic
Location: London, United Kingdom
Employment Type: Full time
Location Type: Hybrid
Department: Engineering
The impact you will have:
As Senior DRE, you will drive engagement with Site Reliability across the full breadth of engineering. You will hold every engineer and every team accountable for building highly resilient, robust, reliable software. You will be part of a cross-functional, cross-discipline team of SMEs and on-callers whose mission is to keep our platform highly performant 24/7/365.
Responsible for a diverse suite of products, you will oversee site reliability for enterprise-grade applications that sit on the critical path and run at 1000s of QPS. Elliptic is known for its extensive and reliable datasets, and you will play a critical role in defining and building out a market-leading foundation for data quality and control. This means building the processes, culture, and frameworks that will power observability, quality, data lineage, and remediation to form an essential pillar of our data & intelligence platform.
What you will do:
This is a cross-team role, and you will have the full support of leadership and engineering in carrying out your responsibilities. It’s not all down to you, but you will show the rest of us what good looks like.
Evangelise SRE & DRE across engineering
Lead the charge on building out a framework for data quality that will provide our customers with strong guarantees about the fidelity of our data, as well as support our marketing and revenue functions
Define and own the on-call process for SRE as a function:
Quickly establishing a strong working knowledge of our systems
Commanding incidents
Running mop-ups
Ensuring follow-up actions are completed to your schedule
Evaluating and improving our existing E2E on-call process
Take part in the on-call rotation, one week every 4–5 weeks (24x7x365 coverage)
Evaluate, manage and maintain our existing solutions for monitoring, alerting, paging, response, and documentation
Report on uptime, availability, performance, etc. across our product suite
Write post-mortems for both internal and external consumption
Represent our SRE & DRE function on sales calls with tier-one enterprise financial institutions
Work with product, sales and customer service to define SLAs for different products and use cases
Work with internal product teams to define SLOs for internal consumption and measurement
Work with our engineering teams directly to embed DRE practices
You will be a great fit here if you:
Thrive in high-pressure situations and are able to make tough decisions quickly
Fail fast, own the failure, and encourage a blame-free engineering culture
Are an inspiring thought leader and are able to take others with you on a journey
Aren’t afraid to get your hands dirty and dig into code across myriad technologies
Understand the importance of reliability in enterprise finance systems
Have strong opinions based on your experience that you evolve over time as you learn from others
Our ideal candidate has:
Proven experience in levelling up the quality and reliability of large datasets, not just services and APIs
Experience leading site reliability for a high-volume SaaS product
Supported distributed systems in AWS
The presence and empathy required to hold teams to account
Defined SLAs / SLOs, both internal and client-facing
Delivered post-mortems to enterprise clients (verbal and written)
Bonus Points for:
Having a genuine interest in the crypto ecosystem and being behind the mission of the company
Working knowledge of Kubernetes and the challenges it presents
Job Benefits
> How we work:
Hybrid working and the option to work from almost anywhere for up to 90 days per year
£500 remote working budget to set up your home office space
> Learning & Development:
$1,000 Learning & Development budget to use on anything (agreed with your manager) that contributes to your growth and development
> Vacation / Leave:
Holidays: 25 days of annual leave + bank holidays
An extra day for your birthday
Enhanced parental leave: we provide eligible employees, regardless of gender or whether they become a parent by birth or adoption, with 16 weeks of fully paid leave.
> Benefits:
Private Health Insurance - we use Vitality!
Full access to Spill Mental Health Support
Life Assurance: we hope you will never need this, but our cover pays 4 times your salary to your beneficiaries
£100 Crypto for you!
Cycle to Work Scheme