Job Description:
We are seeking a highly skilled and motivated ML Ops Engineer to join our team. In this role, you will design, build, and maintain scalable and efficient machine learning (ML) and data pipelines. Your expertise in large language models (LLMs), distributed computing, and data engineering will ensure seamless deployment, monitoring, and scaling of our ML models and data infrastructure.
Key Responsibilities:
- Design and Implementation of ML Pipelines: Develop and manage end-to-end ML pipelines to train, validate, and deploy large language models (LLMs) and other ML models.
- Distributed Computing: Utilize Ray or similar frameworks for distributed computing to scale ML workloads efficiently.
- Data Pipeline Development: Architect, build, and maintain data pipelines using Apache Kafka, dbt, and other tools to ensure real-time and batch data processing capabilities.
- Infrastructure Management: Deploy and manage scalable infrastructure on Azure, including Kubernetes clusters, to support ML and data workflows.
- Workflow Orchestration: Implement and manage workflow orchestration tools such as Apache Airflow or Dagster to automate and monitor complex data and ML pipelines.
- Collaboration and Integration: Work closely with Data Scientists, Data Engineers, and DevOps teams to integrate ML models into production systems.
- Monitoring and Optimization: Set up monitoring, logging, and alerting systems to ensure the reliability, performance, and cost-effectiveness of ML and data pipelines.
- CI/CD for ML: Develop and maintain continuous integration and deployment (CI/CD) pipelines for ML models and data pipelines, ensuring rapid and reliable deployment.
- Security and Compliance: Ensure all ML and data workflows adhere to security best practices and relevant data protection regulations.
Required Skills and Experience:
Education: Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field.
Technical Expertise:
- Strong experience with large language models (LLMs) and related ML frameworks.
- Proficiency in Ray or similar distributed computing frameworks.
- Hands-on experience with Azure cloud services, including Azure Kubernetes Service (AKS) and related services.
- In-depth knowledge of Kubernetes for container orchestration.
- Experience with Apache Kafka for real-time data streaming.
- Proficiency in workflow orchestration tools like Apache Airflow or Dagster.
- Strong understanding of dbt for data transformation and analytics engineering.
- Familiarity with CI/CD practices in ML environments.
Programming Skills:
- Proficiency in Python and familiarity with ML frameworks such as TensorFlow or PyTorch.
- Experience with containerization technologies (Docker).
- Familiarity with infrastructure-as-code (IaC) tools like Terraform.
Problem Solving and Collaboration:
- Strong analytical and problem-solving skills.
- Ability to work collaboratively in a cross-functional team environment.
- Excellent communication skills with the ability to convey complex technical concepts to non-technical stakeholders.
Preferred Qualifications:
- Experience with AI/ML Operations: Previous experience deploying and managing ML models in production environments.
- Certifications: Azure, Kubernetes, or other relevant certifications.
- Knowledge of Data Engineering: Experience with data lake architectures and ETL processes.
- Experience with Monitoring Tools: Familiarity with Prometheus, Grafana, or similar monitoring tools.
What We Offer:
- Competitive salary.
- Two yearly bonuses and one performance bonus.
- A dynamic and collaborative work environment.
- Opportunities for professional growth and development.
- The chance to work on cutting-edge technology and make a real impact.
How to Apply:
Interested candidates are invited to submit their resume and a cover letter detailing their experience and qualifications to hr@markopolo.ai. Please include "ML Ops Engineer Application" in the subject line.
Join us and be a part of the AI revolution in digital marketing!