Senior Data Engineer at Scanbuy (New York, NY)

We are looking for a Senior Data Engineer to work on Scanbuy’s growing data business. The successful candidate will leverage their experience and skills to onboard new data sources and service new destinations that enhance our products and services and increase our sales revenue.

This position requires a creative, logical, and self-driven person with high standards for quality and attention to detail, capable of taking responsibility and accountability for actions and time management.

  • Builds large-scale batch data pipelines
  • Leverages best practices in continuous integration and delivery
  • Drives optimization, testing and tooling to improve data quality
  • Translates business requirements into technical requirements and executes on them
  • Enhances the ETL codebase for added efficiency and capacity
  • Develops, recommends and implements process and procedure changes to systematically improve data integrity
  • Analyzes data sets, builds visualizations and develops dashboards to inform internal and external clients
  • Designs, creates and manages processes to prepare both static and streaming data for use in machine learning algorithms

We work as a distributed engineering team using agile methodologies such as Kanban/Scrum.

This position is ideally based in New York, NY, but can also be remote.

Skills & Requirements

Must Have Skills:

  • Has 3+ years of professional experience in ETL and/or other “Big Data” processes including modelling and data architectures
  • Understands the Big Data stack, including:

    • MapReduce
    • Hadoop
    • Spark
    • Hive

  • Is familiar with relational and hierarchical database development
  • Has strong written and verbal presentation skills and is capable of explaining how technology can solve business problems
  • Possesses knowledge of data warehousing, data preparation, analytics, reporting and dashboarding for data at terabyte scale using platforms such as Kylin
  • Has extensive experience with the AWS services ecosystem:

    • S3
    • EC2
    • RDS
    • EMR
    • Redshift
    • QuickSight

  • Is proficient with Linux, preferably RHEL/CentOS/Amazon Linux or Ubuntu.
  • Demonstrates ability to write advanced SQL queries for MySQL and/or PostgreSQL.
  • Is capable of scripting in languages such as Bash, Python, R, Perl or Ruby.
  • Is familiar with source control platforms (e.g. GitHub).

Nice to Have Skills:

  • Demonstrates a strong understanding of, and appreciation for, the features, benefits and limitations of machine learning algorithms such as:

    • K-Nearest Neighbor
    • Hierarchical Multi-Label Classifiers
    • Dimensionality Reduction

      • High Correlation Filter
      • Principal Component Analysis

  • Possesses domain expertise in mobile, consumer packaged goods or demographic data sets
  • Has NoSQL experience: HBase, MongoDB
  • Has experience with Apache Storm, Kafka and Cassandra

Company: Scanbuy