Role Overview:
We are seeking an experienced Engineering Team Lead responsible for automating and optimizing our cloud infrastructure, CI/CD pipelines, and monitoring processes. You’ll architect, scale, and secure the infrastructure behind our AI-powered SaaS platform. You’ll be the go-to expert on everything that keeps the system performant, resilient, and deployable at speed.
Key Responsibilities:
- Architect and maintain secure, scalable, and cost-optimized cloud infrastructure across AWS or GCP environments.
- Design, implement, and manage robust CI/CD pipelines to streamline testing, deployment, and production releases.
- Collaborate cross-functionally with Backend, ML, and Product teams to integrate Infrastructure as Code (IaC) principles into the development lifecycle.
- Optimize system performance for reliability, scalability, and cost-efficiency across distributed cloud systems.
- Deploy and manage containerization and orchestration tools (e.g., Docker, Kubernetes) to support microservice architecture.
- Automate infrastructure provisioning, configuration management, and real-time system monitoring using industry-standard tools.
- Monitor and troubleshoot system reliability, performance issues, and security incidents; perform deep root cause analysis when needed.
- Lead disaster recovery planning, backup automation, incident response processes, and infrastructure-level cost optimization strategies.
- Ensure infrastructure compliance with internal security protocols and external data privacy regulations (e.g., GDPR, SOC 2).
- Drive DevOps excellence, including observability, logging, and continuous monitoring, to support a high-availability product.
- Mentor junior engineers and actively shape a high-performing, collaborative, and growth-oriented engineering culture.
Required Skills and Experience:
Education: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Experience: 5–6 years in engineering roles with a strong DevOps and cloud background.
Problem-solving: Ability to troubleshoot complex systems and networks.
Collaboration: Strong communication skills and a collaborative mindset to work with cross-functional teams.
Technical Skills:
- Expertise in cloud platforms such as AWS, Google Cloud, or Azure.
- Strong proficiency with CI/CD tools (Jenkins, GitLab CI, CircleCI, etc.).
- Hands-on experience with containerization (Docker) and orchestration (Kubernetes).
- In-depth knowledge of infrastructure as code (IaC) tools like Terraform, Ansible, or CloudFormation.
- Proficiency in scripting languages (Python, Bash, etc.) for automation tasks.
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK Stack, etc.).
- Hands-on experience with data processing and message streaming tools like Apache Spark and Kafka.
- Experience deploying ML models with Kubernetes or cloud-native tools like SageMaker or Azure ML is a plus.
- Experience with Ray or KubeRay is a plus.
- Understanding of security best practices and regulatory requirements.
Preferred Qualifications:
- Certifications in cloud platforms (AWS Certified DevOps Engineer, Google Cloud DevOps Engineer, etc.).
- Experience with large-scale, highly available, and distributed systems.
- Exposure to AI/ML infrastructure or MLOps.
- Hands-on experience with observability tools (Datadog, Prometheus, Grafana).
- Familiarity with zero-downtime deployments and security best practices.
What We Offer:
- Competitive salary.
- Two yearly bonuses plus one performance bonus.
- A dynamic and collaborative work environment.
- Opportunities for professional growth and development.
- The chance to work on cutting-edge technology and make a real impact.