Senior Systems Engineer
Spreedly
Software Engineering
Montreal, QC, Canada
About Us:
Spreedly is the world's leading Open Payments Platform. Founded in 2007 and headquartered in Durham, NC, the company gives mid-market and enterprise businesses the infrastructure to connect to any payment gateway, processor, or fraud tool through a single API. The platform is vendor-agnostic by design, meaning customers are never locked into a single provider and never need to rebuild their payments stack to access new capabilities.
Product Offering:
Spreedly provides an open payments platform whose connectivity drives payments performance. Key products and services include:
Connect — A unified API that integrates with hundreds of payment gateways, processors, and alternative payment methods worldwide, including digital wallets. Merchants access the global payments ecosystem through one connection.
Vault — A PCI-compliant secure repository for payment methods. Merchants store card data once and reuse it across any payment service, reducing PCI scope and protecting cardholder data at scale.
Optimize — Workflow-driven routing and retry logic that directs each transaction to the best-performing gateway in real time. On average, 7.9% of failed transactions succeed immediately when retried on a secondary gateway. This is where merchants recover lost revenue and increase authorization success rates.
Protect — A flexible fraud and authentication layer, incorporating advanced fraud tools and 3DS. Following Spreedly's acquisition of Dodgeball in September 2025, fraud orchestration and payment optimization now operate within the same platform.
Resolve — Centralized management and reporting that reduces operational silos, strengthens security, and improves billing control across a merchant's entire payment operation.
What It's Like to Work Here:
We describe our team as "Spreedlings": diverse, forward-thinking, and driven by a shared belief that a more open payments ecosystem benefits everyone. The company operates with a culture built on transparency, courageous collaboration, and self-driven leadership. The team values simplicity in both product and process, and approaches problem-solving with genuine curiosity.
Responsibilities:
- Infrastructure Operations & Reliability:
  - Operate, scale, and modernize AWS-based infrastructure supporting highly available, uptime-driven production systems.
  - Design for fault tolerance, graceful degradation, and automated recovery across EC2- and ECS-based workloads.
  - Support the organization’s roadmap toward multi-region, globally distributed infrastructure.
- Infrastructure as Code & Automation:
  - Build, maintain, and improve infrastructure using Terraform, Ansible, and related tooling to ensure repeatability, auditability, and resilience.
  - Support and evolve CI/CD pipelines (GitHub Actions, AWS tooling) with a focus on reliability, speed, and developer autonomy.
  - Reduce operational brittleness by creating reusable, well-documented infrastructure patterns.
- Observability & Incident Response:
  - Implement and maintain observability using Datadog, CloudWatch, OpenTelemetry, and related tools.
  - Define and monitor SLOs, improve alert quality, and reduce MTTD/MTTR through actionable dashboards and runbooks.
  - Participate in and help mature a 24/7 on-call rotation; confidently troubleshoot and resolve incidents under pressure.
- Security & Compliance:
  - Serve as an infrastructure security subject-matter expert, helping bridge the Infrastructure Engineering and Security teams.
  - Implement and operate security controls such as IAM policies, WAFs, DDoS protections, secrets management, and deployment safeguards.
  - Support regulated environments and compliance efforts (PCI, SOC 2, or similar).
- Collaboration, Mentorship & Delivery:
  - Proactively communicate status, risks, and tradeoffs in a distributed, async-first environment.
  - Mentor engineers and contribute to shared learning across experience levels.
  - Own small-to-medium scoped projects end-to-end: breaking down work, driving execution, and following through to completion.
Requirements:
- 5+ years of experience working with cloud infrastructure or systems engineering in a production environment.
- Deep hands-on experience operating and scaling production systems in AWS (ECS, EC2, ALB/ELB, ASG, IAM, VPC, Secrets Manager).
- Strong infrastructure-as-code experience with Terraform and configuration management tools such as Ansible.
- Experience supporting highly available, uptime-sensitive systems with on-call responsibility.
- Observability expertise using tools such as Datadog, CloudWatch, and OpenTelemetry.
- Linux systems experience (Debian- or RHEL-based distributions).
- Exposure to or experience with multi-region cloud environments to support global availability.
- Experience with DevOps and/or GitOps and an understanding of CI/CD methods.
- Experience with containers and container orchestration (Nomad, Docker, etc.)
- Infrastructure security experience (e.g., WAFs, DDoS mitigation, access controls).
- Experience in regulated environments (PCI, SOC 2, HIPAA, or similar).
- Proven ability to run projects end-to-end and deliver repeatable, maintainable solutions.
Additional Skills We Value:
- Familiarity with Edge CDN-type services
We Offer Our Canada-Based Employees:
- Competitive salary + Equity
- Group Life Insurance and Disability Coverage
- Medical, Vision, and Dental coverage
- Pension contribution
- Open Paid Time Off policy
- Monthly home working/digital lifestyle stipend, new MacBook, and one-time accessory reimbursement
- $1,000 professional development stipend
- Access to company-paid professional coaching service
- Visits to HQ in Durham, North Carolina for remote employees