We are looking for a Senior Data Engineer to work on Scanbuy’s growing data business. The successful candidate will leverage their experience and skills to onboard new data sources and service new destinations that enhance our products and services and increase our sales revenue.
This position requires a creative, logical, and self-driven person with high standards for quality and attention to detail, capable of taking responsibility and accountability for actions and time management.
- Builds large-scale batch data pipelines
- Leverages best practices in continuous integration and delivery
- Drives optimization, testing and tooling to improve data quality
- Translates business requirements into technical requirements and executes on them
- Enhances the ETL codebase for added efficiency and capacity
- Develops, recommends and implements process and procedure changes to systematically improve data integrity
- Analyzes data sets, builds visualizations and develops dashboards to inform internal and external clients
- Designs, creates and manages processes to prepare both static and streaming data for use in machine learning algorithms
We work as a distributed engineering team using agile methodologies such as Kanban/Scrum.
This position is ideally based in New York, NY, though remote locations are also an option.
Skills & Requirements
Must Have Skills:
- Has 3+ years of professional experience in ETL and/or other “Big Data” processes including modelling and data architectures
- Understands the Big Data stack including:
- Is familiar with relational and hierarchical database development
- Has strong written and verbal presentation skills and is capable of communicating how technology addresses business problems
- Possesses knowledge of data warehousing, data preparation, analytics, reporting and dashboarding for data at terabyte scale using platforms such as Kylin
- Has extensive experience in the AWS services ecosystem:
- Is proficient with Linux OS, preferably RHEL/CentOS/AWS Linux or Ubuntu
- Demonstrates ability to write advanced SQL queries for MySQL and/or PostgreSQL
- Is capable of scripting in various languages, such as Bash, Python, R, Perl or Ruby
- Is familiar with source control platforms (GitHub)
Nice to Have Skills:
- Demonstrates strong understanding and appreciation for the features, benefits and limitations of machine learning algorithms:
- K-Nearest Neighbor
- Hierarchical Multi-Label Classifiers
- Dimensionality Reduction
- High Correlation Filter
- Principal Component Analysis
- Possesses domain expertise in mobile, consumer packaged goods or demographic data sets
- Has NoSQL experience: HBase, MongoDB
- Has experience with Apache Storm, Kafka, Cassandra