Data Engineering AI Startup Jobs
Explore startups tagged with Data Engineering and compare hiring activity, company profiles, and direct job links. This page is indexable only when a tag reaches at least 5 companies to avoid thin content.

Databricks
238 jobs
Unified analytics platform for data and AI, helping companies process and analyze big data in the cloud.

Scale AI
80 jobs
Data infrastructure company providing high-quality training data for AI applications, recently partnered with Meta.

Together AI
37 jobs
Cloud platform for running and fine-tuning open-source AI models at scale.

Hightouch
34 jobs
Data activation platform (Reverse ETL, Customer Studio) to sync warehouse data to business tools.

Cribl
29 jobs
Data pipeline platform that gives you control over your observability data.

Rain
21 jobs
Neuromorphic AI chip company developing brain-inspired processors for edge AI applications.

Astronomer
14 jobs
Company behind Astro, a managed Apache Airflow DataOps platform for data & AI pipelines.

Snorkel AI
14 jobs
Data-centric AI platform for programmatically labeling and managing training data.

Eon.io
13 jobs
Next-gen cloud backup and data protection with rapid restore and ransomware resilience.

PhaseV
5 jobs
ML-driven adaptive trials and clinical development optimization.

Datology AI
4 jobs
AI training data curation platform helping enterprises optimize ML training data at petabyte scale.

Polars
4 jobs
Polars is a blazingly fast DataFrames library written in Rust, offering Python, R, Node.js, and SQL bindings for efficient, multi-threaded data manipulation at scale.

OneSchema
3 jobs
AI-driven CSV and PDF data import automation platform for seamless customer onboarding.

PostHog
2 jobs
Open-source product analytics platform with session replay, feature flags, and A/B testing.

Apheris
Apheris provides governed, privacy-preserving data access and collaboration for AI and analytics across sensitive datasets.

Biostate AI
A scalable biological data collection service providing multi-omics data for research.

Collate
Collate is a data intelligence platform built on the open-source OpenMetadata project, automating data discovery and governance for teams.

Credal.ai
Credal provides a secure AI agent platform for enterprises, enabling teams to build AI agents and MCP-connected workflows across internal data sources with governance controls.

Deepnote
Collaborative cloud data notebook platform for data science and analytics teams.

Distyl AI
Data intelligence platform that unifies messy operational data, applies AI agents, and routes insights back into business workflows.

DualBird
DualBird provides a cloud-native hardware-software data and AI infrastructure engine that delivers 10-100x faster performance and 50-90% lower costs through FPGA-based acceleration.

Encharge AI
Developing analog in-memory compute chips and software for energy-efficient AI at the edge.

Firecrawl
Firecrawl is a web data infrastructure platform that converts websites into clean, structured data optimized for AI applications through a simple API, turning entire websites into LLM-ready markdown or structured data.

Fundamental
Fundamental builds large tabular models and enterprise AI infrastructure for prediction and analysis on complex business data, focused on tabular reasoning and decision support.

Gruve
Gruve delivers AI-native infrastructure, inference systems, and enterprise AI agents for inference-heavy workloads with an emphasis on speed, security, and measurable outcomes.

Hex
Modern collaborative analytics workspace combining notebooks, SQL, and apps for data teams.

Junction
Junction (formerly Vital) modernizes healthcare infrastructure with seamless lab testing and device data integration, connecting over 500 wearables and medical devices with 10+ lab networks including Labcorp and Quest across all 50 states.

PhysicsX
PhysicsX is deploying AI to transform how physical systems are engineered, embedding intelligence across the entire product lifecycle.

Profluent Bio
Uses generative AI to design novel proteins and gene editors for therapeutics.

Protege
Protege operates a governed marketplace platform for ethical sourcing of multimodal, real-world AI training data with compliant data exchange capabilities.

Pulse
API-first Document AI that converts PDFs, images, slides, and spreadsheets into structured JSON for RAG, analytics, and automation.

Pytho AI
Provides a unified interface to design AI workflows by connecting data, models, and automations.

Relace
Relace provides auxiliary coding models for faster, more reliable AI code generation. Its models are co-optimized with infrastructure to make production-ready coding agents easy to deploy and to achieve state-of-the-art performance across million-line repositories.

Roboflow
Provides an end-to-end computer vision platform for managing data, labeling, training, and deploying models.

Shovels
Shovels builds construction intelligence software that turns fragmented building permit data into actionable market and go-to-market signals through APIs and analytics tools.

Spiral
Spiral is a data infrastructure company that provides a multimodal data platform for AI, unifying governance and exposing a single API for every data modality including video, audio, geospatial, and text, engineered for machine-scale throughput to keep GPUs fully saturated.

Structify
AI-powered data platform that transforms unstructured web data and documents (websites, PDFs, pitch decks, reports) into structured, enterprise-ready datasets. Its proprietary DoRa model navigates and extracts data like a human, enabling real-time web extraction for business intelligence and data workflows.

TinyFish
TinyFish provides enterprise web agents that automate complex web-based workflows and extract structured data from websites at scale. The platform enables Fortune 500 companies like Google and DoorDash to automate web interactions, streamline data collection, and integrate web automation into their business processes.

Tonic AI
Generates realistic synthetic data to power software testing and analytics without exposing sensitive production data.

FAQ
What is the Data Engineering tag page on Fast AI Startup Jobs?
It is a curated landing page that groups AI startup companies tagged with Data Engineering, plus links to their company profiles and available jobs.
How many Data Engineering companies are included?
This page currently lists 39 companies tagged with Data Engineering.
How many jobs are associated with Data Engineering companies?
The companies on this page currently account for 498 listed jobs in our public dataset (subject to regular updates).
What roles are most common at Data Engineering companies?
Based on currently listed jobs for Data Engineering companies, the most common role groups are Engineering (1254), Other (359), Sales (358).
What funding stages are most common among Data Engineering companies?
Common funding stages on this Data Engineering page include Series A (17), Series B (7), Seed (5), Series D (3).
Where do the job links go?
Job links point to official company career pages or public job listings, not re-hosted application forms.
How often is this tag page refreshed?
Data is refreshed on a near-daily cadence as public company and job listings change.