Data Engineer - Remote
IT ⋅ Full-time ⋅ Permanent
About Us
Halo Labs is a future-focused, end-to-end data solutions firm, transforming tomorrow, today. Our intelligent, secure technology systems and data-driven solutions drive meaningful outcomes and unlock lasting value.
Why work at Halo Labs?
Leading Innovation: We don’t just solve problems; we deliver a continuous stream of innovation.
Exceptional Perks: Enjoy outstanding perks like a dedicated learning budget, performance bonuses, and comprehensive wellbeing support.
Remote-First Organisation: Enjoy the benefits of remote work, with the flexibility to travel to client locations throughout Australia.
Inclusive and Engaging: Celebrate diversity in a welcoming space that thrives on new ideas and open conversations, all in a respectful environment.
Inspiring Origins: With a compelling founder story, we are a customer-focused, culture-first organisation.
About the role
• Design, develop, and maintain reusable in-house PySpark frameworks to enforce standardised data engineering patterns across the SaaS platform
• Architect and implement scalable, production-grade ETL/ELT pipelines across Azure and AWS environments
• Build distributed data processing solutions using Python and PySpark on Databricks
• Develop batch and near real-time ingestion pipelines integrating third-party clinical systems, healthcare APIs, and external enterprise platforms
• Design secure data integration patterns (REST APIs, SFTP, event-driven ingestion, webhooks) ensuring compliance and data integrity
• Work closely with Software Engineers to embed data services directly into the SaaS product architecture
• Contribute to backend system design discussions to ensure data layer scalability, observability, and performance
• Implement CI/CD pipelines using Azure DevOps, Git, and Azure Pipelines for automated deployment and testing of data workloads
• Apply infrastructure-as-code and environment management best practices across Azure and AWS
• Optimise Spark jobs, cluster configurations, and storage strategies for performance and cost efficiency
• Design and maintain robust data models, including dimensional models and SaaS-oriented data schemas
• Implement data validation, monitoring, and alerting to ensure pipeline reliability and production stability
• Provide technical mentorship and enforce engineering standards across the analytics and data engineering team
About you
• Strong hands-on experience with Azure services including Databricks, Azure Data Factory, Azure SQL, Azure Storage, and Azure DevOps
• Practical experience with AWS services relevant to modern data platforms (S3, Lambda, RDS, Glue, IAM, etc.)
• Advanced proficiency in Python, SQL, and PySpark for large-scale distributed data processing
• Deep experience configuring and managing Databricks clusters for scalable big data workloads
• Experience building production-ready data pipelines in a SaaS or product-led engineering environment
• Strong understanding of cloud-native data architecture, including data lakes, lakehouse architecture, and modular pipeline design
• Experience integrating with third-party systems via APIs and secure data exchange mechanisms
• Exposure to healthcare or regulated data environments, including handling sensitive data securely
• Strong knowledge of data modelling, metadata management, and data governance principles
• Experience implementing automated testing frameworks for data pipelines
• Solid understanding of DevOps practices including Git workflows, branching strategies, and CI/CD automation
• Degree in Computer Science, Engineering, Data Science, or related technical field