AI DevOps / SRE Engineer
EPAM Systems
29.08.2024
Türkiye
Full-time
Hybrid
Experience: 2-4 years
Job Description
As an AI DevOps/SRE Engineer, you will be pivotal in deploying, maintaining, and scaling our AI solutions, including large language models (LLMs) and retrieval-augmented generation (RAG) systems. You will work closely with data scientists and software developers to ensure seamless integration and operational efficiency of our AI deployments. Your role will involve both classic DevOps tasks and innovative approaches to MLOps, ensuring high availability and optimal performance of our systems.
Responsibilities
- Implement and maintain CI/CD pipelines for AI and machine learning projects, ensuring robust deployment strategies and continuous integration
- Monitor and ensure the reliability, availability, and performance of AI applications, particularly those involving LLMs and RAG
- Collaborate with AI research teams to operationalize machine learning models and systems efficiently
- Develop and enforce best practices for version control, configuration management, and testing of AI-driven software solutions
- Utilize MLOps tools such as Kubeflow, MLflow, or TensorFlow Extended (TFX) to streamline the machine learning lifecycle from experimentation to production
- Implement monitoring solutions that track both system metrics and model performance to facilitate proactive issue resolution
- Participate in on-call rotations to support the operational health of critical systems, employing SRE principles to meet service-level objectives (SLOs) and reduce downtime
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field
- Proven experience as a DevOps Engineer or SRE, with a strong background in software development and automation
- Experience deploying and managing LLMs and related techniques such as RAG
- Proficient in CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI) and infrastructure as code (e.g., Terraform, Ansible)
- Knowledge of containerization and container orchestration technologies (e.g., Docker, Kubernetes)
- Familiarity with MLOps tools and practices to support machine learning lifecycle management
- Strong problem-solving skills and ability to work in a dynamic, fast-paced environment
Nice to have
- Experience with cloud services (AWS, GCP, Azure), particularly in AI/ML deployments
- Background in monitoring tools like Prometheus, Grafana, and the ELK stack
- Knowledge of Python, particularly in data science and machine learning contexts
- Certification in Kubernetes, AWS/GCP/Azure, or similar technologies
We offer
CONTINUOUS UPSKILLING, LEARNING & DEVELOPMENT:
- Diversity of tasks and projects
- Assessment center for objective review of competency level
- Personal development plan
- Mentoring programs and leadership development
- Certification and professional development support
- Access to learning platforms including more than 2,500 internal courses and the LinkedIn Learning library with 20,000+ courses
- English courses taught by certified teachers
CORPORATE BENEFITS:
- Extra leave days
- Referral bonuses
COMPENSATION PACKAGE:
- Competitive compensation paid in USD
- Regular salary and performance reviews
MEDICAL & HEALTHCARE:
- Private health insurance
- Well-being events
WORKING ENVIRONMENT:
- Recreation areas and kitchens
- Tea, coffee, and snacks
- Sports equipment and game consoles
- IT Equipment
- Microsoft's Software Assurance Home Use Program (HUP)
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
About EPAM Systems
We can help you reimagine your business through a digital lens. Our software engineering heritage, combined with our strategic business and innovation consulting, design thinking, and physical-digital capabilities, provides real business value to our customers through human-centric innovation.