Formed in 2014 by a team of proven FinTech entrepreneurs, we are an FCA-regulated business providing global claim funds management and payment solutions. Operating one of the largest banking and payment settlement networks in the world, we give our customers direct access to 200 countries and currencies. Through a single integration, insurers can use this network to pay claims in as little as 45 seconds and deliver a superior claimant experience. Our market-leading treasury proposition provides insurers with transparency and control over their claim funds, even when delegated to third parties, allowing them to have their money in the right place, at the right time, to make that all-important payment when customers need it most.

With over 175 employees across our London headquarters, Europe, and the US, $93m Series C funding secured, and exceeding £10bn in processed transactions, we are only just getting started.

We are collaborative and customer-centric, and we work with integrity, partnering with some of the biggest insurance leaders, including Lloyd’s of London and Many Pets. We take huge pride in our company culture, ensuring that everyone has a part to play, an opportunity to be heard and involved, and the ability to make a real difference. As we continue to scale up, we want like-minded humans to join us on this exciting journey. Are you ready?

As a Site Reliability Engineer (SRE), you will play an important role in designing, building, and maintaining the infrastructure and tools necessary to support our software applications and services. You will collaborate closely with the product engineering squads, technical operations, and security teams to ensure the reliability, scalability, and security of our platform. Your responsibilities will include automating infrastructure provisioning, configuration management, and deployment pipelines, utilizing best practices and modern technologies to streamline processes and improve efficiency. You will also be responsible for monitoring system performance, identifying bottlenecks, and implementing solutions to enhance system reliability and performance.

Key responsibilities

• Cloud Platform Management: Using Azure/AWS to manage and optimize infrastructure components, ensuring scalability, reliability, and cost management.

• Infrastructure Design and Implementation: Designing, building, and maintaining the cloud-based infrastructure that supports our software applications and services.

• System Reliability: Ensuring the reliability, availability, and performance of systems and services by designing, implementing, and maintaining robust infrastructure.

• Infrastructure as Code (IaC): Implementing and maintaining tools for automation, monitoring, and deployment to improve efficiency and reduce manual intervention.

• Collaboration and Support: Working closely with product engineering to ensure efficient workflows and support continuous integration and delivery pipelines (CI/CD).

• Capacity Planning and Scalability: Assessing system capacity requirements and planning for future growth to ensure the system can scale and is cost efficient.

• Incident Response and Management: Monitoring system health, promptly responding to incidents, and assisting with the resolution process.

• Risk Management: Identifying potential risks and vulnerabilities in systems and implementing measures to mitigate these risks effectively.

• Monitoring and Observability: Implementing and overseeing monitoring tools to proactively detect and mitigate issues, ensuring high application and system availability.

• Documentation and Knowledge Sharing: Maintaining documentation and sharing knowledge with the team to ensure transparency and facilitate cross-functional collaboration.

Requirements

• 3+ years of experience in a Site Reliability Engineer, DevOps, Platform Engineer, or similar role.

• Strong knowledge of and experience with cloud platforms; substantial experience in Microsoft Azure is essential.

• Proven track record in designing, implementing, and maintaining highly available and scalable systems.

• Expertise in containerization tools like Docker and orchestration tools such as Kubernetes.

• Experience with infrastructure as code (IaC) tools such as Terraform, Ansible, or Chef for automation and configuration management.

• Strong understanding of monitoring and observability tools such as Prometheus, Grafana, and Azure App Insights for proactive system monitoring and troubleshooting.

• Knowledge of networking, security principles, and best practices in a cloud environment.

• Demonstrated experience of CI/CD tools like GitHub Actions, GitLab CI/CD, or Azure DevOps for continuous integration and delivery.

• Problem-solving mindset and meticulous attention to detail.

• Strong collaboration and communication skills to work effectively with cross-functional teams.

• Comfortable working in a fast-paced environment, handling incidents, and participating in on-call rotations.

• Adaptability to evolving technologies and eagerness to learn new tools and methodologies.

Benefits

25 days Holiday per year (increasing by 1 day per year of service, up to 30 days) + Bank Holidays
Hybrid working arrangements – This role demands the ability to thrive in a fast-paced setting, frequently multitasking across various tasks and support requests. The role can offer either a hybrid work schedule or fully remote options, but will require the occasional office visit, for team get-togethers or larger product workshops.
Contributory pension scheme
Enhanced Parental leave
Cycle to Work Scheme
Private Medical Insurance with AXA
Unlimited access to therapy sessions through our partner, Oliva
Discounted Gym membership through Gympass
Financial Coaching with Octopus Wealth
2 days of volunteering leave per year
Sabbatical after 5 years’ service
Life Assurance - MetLife (UK employees only)
Ongoing Learning and Development to support you in reaching your career goals

Vitesse at our best – our values

The Vitesse values are a true reflection of what it takes to thrive in our business, so it’s important to us that any employee who joins us is aligned with these 3 attributes:

Confident Humility

We don’t do ego and we know that unless we all win, none of us win. We admit when we’re wrong, ask for help and always think about the wider business before ourselves.

Driven to Succeed

We see the opportunity ahead of us and we won’t stop until we fulfil the potential we know we have. We hold ourselves to high standards and deliver high quality outcomes for Vitesse and our customers.

Tenacious Responsibility

We take ownership for our actions and decisions, and face into the challenges that come our way. We are committed to seeing things through to completion, even in the face of adversity.

We are an Equal Opportunity Employer

We are committed to creating an inclusive environment that enables everyone to perform at their best, where we recognise the rights of all individuals to mutual respect and where there is an unbiased acceptance of others. Our policies and practices aim to promote an environment that is free from all forms of unfair discrimination and values the diversity of all people. At the heart of our policy, we seek to treat people fairly and with dignity and respect. If you are selected for an interview, please let us know what interview adjustments you would need. You can contact Clara Moretti-Greene at clara.moretti-greene@vitesse.io or, in her absence, our People Team at PeopleTeam@vitessepsp.com.

