Senior DevOps Engineer
Lightricks
Lightricks, an AI-first company, is revolutionizing how visual content is created. With a mission to bridge the gap between imagination and creation, Lightricks is dedicated to bringing cutting-edge technology to the creative and business spaces.
Our AI photo and video generation models, which power our apps and platforms including Facetune, Photoleap, Videoleap, and LTX Studio, allow creators and brands to leverage the latest research breakthroughs, offering endless control over their creative potential. Our influencer marketing platform, Popular Pays, provides creators the ability to monetize their work and offers brands opportunities to scale their content through tailored creator partnerships.
Our ML Platform team is responsible for building Lightricks’ ML serving platform, which hosts all of the company’s ML models. Our goal is to create streamlined processes that bring functional research code and AI models to production-grade systems, where millions of our apps’ users enjoy this magic.
What you will be doing
- Take a key part in designing, building and maintaining our ML serving platform and making sure it is scalable and reliable. Our goal is to provide researchers and other developers in the company an easy way to deploy, monitor and maintain their services.
- Design, build and maintain cloud infrastructure to provide a reliable and scalable platform for other teams to build on.
- Ensure high availability and performance of our production environment.
- Implement and manage CI/CD pipelines to ensure seamless testing, deployment, and monitoring of services running on our platform.
- Develop and maintain monitoring and alerting systems to ensure the early detection of issues, enabling proactive problem resolution.
- Practice sustainable incident response and blameless postmortems.
- Learn and apply industry best practices and share this knowledge with other teams through guidance, lectures and workshops.
- Develop tools and self-serve solutions to simplify the development and deployment processes for other teams at the company.
- Continuously evolve and learn new technologies that can improve our team’s workflow, accelerate the development process and make it more reliable.
Your skills and experience
- 5+ years of proven experience building and maintaining scalable and highly available SaaS systems on GCP/AWS/Azure
- 3+ years of practical experience with Kubernetes (K8s), Helm/Kustomize, and Infrastructure as Code tools such as Terraform, Pulumi or Crossplane
- Experience in building and managing microservice systems in a containerised environment.
- Experience with GitOps methodology and Argo CD.
- Experience with CI/CD solutions (GitHub Actions, CircleCI, Jenkins etc.)
- Hands-on experience with Prometheus, Grafana, or other comparable monitoring tools
Nice to haves
- Experience with Developer Experience (DevX) Tools and Practices: Proven track record in enhancing developer productivity through the implementation and management of DevX tools and practices, such as CI/CD pipelines, development environments, and automation frameworks, is highly desirable.
- Experience working with GPUs: Hands-on experience in configuring, managing, and optimising GPU resources for computational tasks, including familiarity with CUDA, TensorFlow, or similar frameworks, is a strong advantage.
- Deep understanding of networking, protocols and network-security concepts.
- Good familiarity with UNIX-like operating systems and experience writing shell scripts.
- B.Sc. in computer science or a similar quantitative field.