Internship Description:
Sayari is looking for an intern to join its Data Engineering team! Sayari's flagship product, Sayari Graph, provides instant access to structured business information drawn from billions of corporate, legal, and trade records. As a member of Sayari's data team, you will work with our Product and Software Engineering teams to collect data from around the globe, maintain existing ETL pipelines, and develop new pipelines that power Sayari Graph.
Our application tier is built primarily in TypeScript, running in Kubernetes, and backed by Postgres, Cassandra, Elasticsearch, and Memgraph. Our data ingest tier runs on Spark, processing terabytes of data collected from hundreds of data sources. The platform allows users to explore a large knowledge graph sourced from hundreds of millions of structured and unstructured records from over 200 countries and 30 languages. As part of this team, you'll have the chance to contribute to our growing library of open-source work, including our WebGL-powered network visualization library Trellis.
This is a paid remote internship with an expected workload of 20-30 hours per week.
Job Responsibilities:
- Write and deploy crawling scripts to collect source data from the web
- Write and run data transformers in Scala Spark to standardize bulk data sets
- Write and run modules in Python to parse entity references and relationships from source data
- Diagnose and fix bugs reported by internal and external users
- Analyze and report on internal datasets to answer questions and inform feature work
- Work collaboratively on and across a team of engineers using basic agile principles
- Give and receive feedback through code reviews
Required Skills & Experience:
- Experience with Python and/or a JVM language (e.g., Scala)
- Experience working collaboratively with git
Desired Skills & Experience:
- Experience with Apache Spark and Apache Airflow
- Experience working on a cloud platform like GCP, AWS, or Azure
- Understanding of or interest in knowledge graphs