Job Description
Internship Description:
Sayari is looking for a Data Engineer Intern specializing in web crawling to join its Data Engineering team! Sayari has developed a robust web crawling project that collects hundreds of millions of documents every year from a diverse set of sources around the world. These documents serve as source records for Sayari's flagship graph product, a global network of corporate and trade entities and relationships. As a member of Sayari's data team, your primary objective will be to maintain and improve Sayari's web crawling framework, with an emphasis on scalability and reliability. You will work with our Product and Software Engineering teams to ensure our crawling deployment meets product requirements and integrates efficiently with our ETL pipelines.
This is a remote, paid internship with an expected workload of 20-30 hours per week.
Job Responsibilities:
- Investigate and implement web crawlers for new sources
- Maintain and improve existing crawling infrastructure
- Improve metrics and reporting for web crawling
- Help improve and maintain ETL processes
- Contribute to the development and design of Sayari's data product
Required Skills & Experience:
- Experience with Python
- Experience managing web crawling at scale in any framework (Scrapy is a plus)
- Experience working with Kubernetes
- Experience working collaboratively with git
- Experience working with selectors such as XPath, CSS, and JMESPath
- Experience with browser developer tools (Chrome/Firefox)
Desired Skills & Experience:
- Experience with Apache projects such as Spark, Avro, NiFi, and Airflow
- Experience with datastores such as Postgres and/or RocksDB
- Experience working on a cloud platform such as GCP, AWS, or Azure
- Working knowledge of API frameworks, primarily REST
- Understanding of, or interest in, knowledge graphs
- Experience with *nix environments
- Experience with reverse engineering
- Proficiency in bypassing anti-crawling techniques
- Experience with JavaScript
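For candidates unfamiliar with the day-to-day work, the sketch below is a rough, hypothetical illustration of the kind of crawler this role involves: a minimal Scrapy spider that extracts records with CSS and XPath selectors and follows pagination. The spider name, URL, and selectors are placeholders for illustration only, not an actual Sayari source.

    import scrapy

    class ExampleRegistrySpider(scrapy.Spider):
        # Hypothetical spider; the name, URL, and selectors are placeholders.
        name = "example_registry"
        start_urls = ["https://example.com/companies?page=1"]

        def parse(self, response):
            # Emit one record per result row, extracted with CSS and XPath selectors.
            for row in response.css("table.results tr"):
                yield {
                    "name": row.xpath("./td[1]/text()").get(),
                    "registration_no": row.xpath("./td[2]/text()").get(),
                }
            # Follow the "next" link until pagination runs out.
            next_page = response.css("a.next::attr(href)").get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)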