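META Smart Factory in Turkey is hiring a DevOps & Security Engineer | LinkedIn
Meta Smart Factory is an industrial digitalization company that provides end-to-end Industry 4.0 solutions spanning MES, IIoT, AI, and real-time production analytics. We operate globally and work with leading manufacturing companies across multiple industries.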
This is a principal-level individual contributor role at the heart of our cloud platform’s reliability, scalability, and operational maturity. You will work hands-on across AWS and Azure environments, solving complex production problems while systematically eliminating the manual toil that creates them. The role offers significant autonomy, deep technical impact, and the opportunity to shape how reliability engineering is practiced across the organization.
Our company operates a growing SaaS platform supporting enterprise customers with mission-critical workloads. We run complex, multi-cloud environments and value engineers who take ownership, think in systems, and build solutions that scale. Our culture emphasizes operational excellence, blameless learning, and collaboration across Engineering, Support, Professional Services, and Product teams.
KEY PERFORMANCE OBJECTIVES (First 12 Months)
OBJECTIVE 1: Platform Familiarity Through Escalations & Early Automation (First 90 Days)
Outcome: Within 90 days, resolve escalated infrastructure cases across major AWS and Azure services and deliver 2–3 targeted automations that measurably reduce manual resolution time for recurring issues.
Impact: Accelerates ramp-up, demonstrates immediate value, and establishes the expectation that operational issues are systematically automated rather than repeatedly handled manually.
How: Work directly on escalated cases from Support and Professional Services, document manual resolution steps, identify repeatable patterns, and implement focused Python or PowerShell automations tied to high-frequency workflows.
OBJECTIVE 2: Eliminate Top Sources of Operational Toil (3–6 Months)
Outcome: Within 3–6 months, eliminate or significantly reduce manual intervention for the top 5–7 highest-frequency operational issues through automation, self-service tooling, or infrastructure improvements.
Impact: Reduces support load, improves service stability, and frees Cloud Engineering capacity for higher-value reliability and platform initiatives.
How: Analyze case and incident data, prioritize automation candidates by frequency and impact, build production-grade automations and runbooks, and partner with Support and Professional Services teams to validate adoption and effectiveness.
OBJECTIVE 3: Mature Incident Response & Post-Incident Learning (6–9 Months)
Outcome: By month 9, establish a consistent, high-quality incident response and post-incident review process resulting in faster containment, clearer ownership, and tracked corrective actions for all critical production incidents.
Impact: Reduces repeat incidents, improves on-call effectiveness, and increases organizational confidence during high-severity events.
How: Lead critical incidents, standardize incident runbooks, facilitate blameless postmortems, track follow-up actions to completion, and coach teams on effective incident communication and decision-making.
OBJECTIVE 4: Deliver a Mature, SLO-Aligned Observability Platform (9–12 Months)
Outcome: By month 12, deliver a mature observability layer across AWS and Azure with service-level dashboards, tuned alerts, and clear SLI/SLO reporting actively used by on-call and engineering teams.
Impact: Improves detection, diagnosis, and prevention of production issues while reducing alert fatigue and enabling data-driven reliability decisions.
How: Design Grafana dashboards aligned to service health and user journeys, integrate metrics, logs, and traces from core platforms, tune alert thresholds, and embed observability into CI/CD and incident workflows.
WHAT YOU BRING
Deep hands-on experience operating production systems in AWS and Azure environments
Strong automation skills using Python and PowerShell in operational contexts
Proven ability to identify repetitive operational work and eliminate it through automation
Experience leading incident response and blameless post-incident reviews
Strong observability expertise, particularly with Grafana and SLI/SLO-driven monitoring
Ability to influence engineering practices without formal authority
Clear written and verbal communication skills across technical and non-technical audiences