Tag Landing Page

Data Engineering AI Startup Jobs

Explore startups tagged with Data Engineering and compare hiring activity, company profiles, and direct job links. A tag page is indexable only once its tag has at least 5 companies, which avoids thin content.

Companies: 39
Jobs (listed): 498
Last updated: Mar 3, 2026, 12:00 AM

Databricks

238 jobs

Unified analytics platform for data and AI, helping companies process and analyze big data in the cloud.

AI Infrastructure · Series G+ · AI · Analytics · Cloud Computing

Scale AI

80 jobs

Data infrastructure company providing high-quality training data for AI applications, recently partnered with Meta.

AI Infrastructure · Growth · AI · Computer Vision · Infrastructure

Together AI

37 jobs

Cloud platform for running and fine-tuning open-source AI models at scale.

AI Infrastructure · Series B · AI · Cloud Computing · Generative AI

Hightouch

34 jobs

Data activation platform (Reverse ETL, Customer Studio) to sync warehouse data to business tools.

Data / Marketing Tech · Series C · Analytics · Marketing Tech · B2B SaaS

Cribl

29 jobs

Data pipeline platform that gives you control over your observability data.

AI Infrastructure · Series E · AI · Infrastructure · Cloud Computing

Rain

21 jobs

Neuromorphic AI chip company developing brain-inspired processors for edge AI applications.

AI Infrastructure · Series A · AI · Cloud Computing · Infrastructure

Astronomer

14 jobs

Company behind Astro, a managed Apache Airflow DataOps platform for data & AI pipelines.

Data Engineering / Orchestration · Series D · AI · Developer Tools · Cloud Computing

Snorkel AI

14 jobs

Data-centric AI platform for programmatically labeling and managing training data.

AI Infrastructure · Series D · AI · Data Platform · Infrastructure

Eon.io

13 jobs

Next-gen cloud backup and data protection with rapid restore and ransomware resilience.

Cloud Infrastructure / Data Protection · Series D · Cloud Computing · Cybersecurity · Infrastructure

PhaseV

5 jobs

ML-driven adaptive trials and clinical development optimization.

Health AI · Series A · AI · Analytics · Healthcare

Datology AI

4 jobs

AI training data curation platform helping enterprises optimize ML training data at petabyte scale.

AI Infrastructure · Series A · AI · Machine Learning · Automation

Polars

4 jobs

Polars is a blazingly fast DataFrames library written in Rust, offering Python, R, Node.js, and SQL bindings for efficient, multi-threaded data manipulation at scale.

Data Platform · Series A · AI · Analytics · Data Platform

OneSchema

3 jobs

AI-driven CSV and PDF data import automation platform for seamless customer onboarding.

Developer Tools · Seed · AI · Automation · Developer Tools

PostHog

2 jobs

Open-source product analytics platform with session replay, feature flags, and A/B testing.

Developer Tools · Series E · AI · Analytics · Open Source

Apheris

Apheris provides governed, privacy-preserving data access and collaboration for AI and analytics across sensitive datasets.

Cybersecurity · Series A · AI · Data Platform · Compliance

Biostate AI

A scalable biological data collection service providing multi-omics data for research.

Biotech · Series A · AI · Biotech · Medical

Collate

Collate is a data intelligence platform built on the open-source OpenMetadata project, automating data discovery and governance for teams.

Developer Tools · Series A · AI · Open Source · SaaS

Credal.ai

Credal provides a secure AI agent platform for enterprises, enabling teams to build AI agents and MCP-connected workflows across internal data sources with governance controls.

Enterprise AI Infrastructure · Seed · AI · Agents · Enterprise Software

Deepnote

Collaborative cloud data notebook platform for data science and analytics teams.

Developer Tools · Series A · AI · Analytics · Data Platform

Distyl AI

Data intelligence platform that unifies messy operational data, applies AI agents, and routes insights back into business workflows.

Developer Tools · Series B · AI · Agents · Enterprise Software

DualBird

DualBird provides a cloud-native hardware-software data and AI infrastructure engine that delivers 10-100x faster performance and 50-90% lower costs through FPGA-based acceleration.

AI Infrastructure · Series A · AI · Cloud Computing · Data Platform

Encharge AI

Developing analog in-memory compute chips and software for energy-efficient AI at the edge.

AI Infrastructure · Series B · AI · Infrastructure · IoT

Firecrawl

Firecrawl is a web data infrastructure platform that converts entire websites into clean, LLM-ready markdown or structured data for AI applications through a simple API.

Developer Tools · Series A · AI · API · Developer Tools

Fundamental

Fundamental builds large tabular models and enterprise AI infrastructure for prediction and analysis on complex business data, focused on tabular reasoning and decision support.

Enterprise AI · Series A · AI · Machine Learning · Analytics

Gruve

Gruve delivers AI-native infrastructure, inference systems, and enterprise AI agents for inference-heavy workloads with an emphasis on speed, security, and measurable outcomes.

Infrastructure · Series A · AI · Infrastructure · Cloud Computing

Hex

Modern collaborative analytics workspace combining notebooks, SQL, and apps for data teams.

Analytics · Series C · AI · Analytics · Developer Tools

Junction

Junction (formerly Vital) modernizes healthcare infrastructure with seamless lab testing and device data integration, connecting over 500 wearables and medical devices with 10+ lab networks including Labcorp and Quest across all 50 states.

HealthTech · Series A · API · Healthcare · IoT

PhysicsX

PhysicsX is deploying AI to transform how physical systems are engineered, embedding intelligence across the entire product lifecycle.

Developer Tools · Series B · AI · Manufacturing · Automation

Profluent Bio

Uses generative AI to design novel proteins and gene editors for therapeutics.

Healthcare · Series B · AI · Biotech · Healthcare

Protege

Protege operates a governed marketplace platform for ethical sourcing of multimodal, real-world AI training data with compliant data exchange capabilities.

AI · Series A · AI · Machine Learning · Data Platform

Pulse

API-first Document AI that converts PDFs, images, slides, and spreadsheets into structured JSON for RAG, analytics, and automation.

AI Infrastructure · Seed · API · AI · Generative AI

Pytho AI

Provides a unified interface to design AI workflows by connecting data, models, and automations.

Developer Tools · Unknown · AI · Agents · Developer Tools

Relace

Relace provides auxiliary coding models for faster, more reliable AI code generation. Its models are co-optimized with infrastructure to achieve state-of-the-art performance across million-line repositories, making it easier to deploy production-ready coding agents.

Developer Tools · Series A · AI · Developer Tools · Infrastructure

Roboflow

Provides an end-to-end computer vision platform for managing data, labeling, training, and deploying models.

Developer Tools · Series B · AI · Computer Vision · Machine Learning

Shovels

Shovels builds construction intelligence software that turns fragmented building permit data into actionable market and go-to-market signals through APIs and analytics tools.

PropTech · Seed · AI · Analytics · Data Platform

Spiral

Spiral is a data infrastructure company that provides a multimodal data platform for AI, unifying governance and exposing a single API for every data modality including video, audio, geospatial, and text, engineered for machine-scale throughput to keep GPUs fully saturated.

Data Platform · Series A · Data Platform · AI · Machine Learning

Structify

AI-powered data platform that transforms unstructured web data and documents (websites, PDFs, pitch decks, reports) into structured, enterprise-ready datasets. Its proprietary DoRa model navigates and extracts data like a human, enabling real-time web extraction for business intelligence and data workflows.

Data Platform · Seed · AI · Data Platform · B2B SaaS

TinyFish

TinyFish provides enterprise web agents that automate complex web-based workflows and extract structured data from websites at scale. The platform enables Fortune 500 companies like Google and DoorDash to automate web interactions, streamline data collection, and integrate web automation into their business processes.

Enterprise Software · Series A · AI · Automation · Developer Tools

Tonic AI

Generates realistic synthetic data to power software testing and analytics without exposing sensitive production data.

Developer Tools · Series B · AI · Infrastructure · Machine Learning

FAQ

What is the Data Engineering tag page on Fast AI Startup Jobs?

It is a curated landing page that groups AI startup companies tagged with Data Engineering, plus links to their company profiles and available jobs.

How many Data Engineering companies are included?

This page currently lists 39 companies tagged with Data Engineering.

How many jobs are associated with Data Engineering companies?

The companies on this page currently account for 498 listed jobs in our public dataset (subject to regular updates).

What roles are most common at Data Engineering companies?

Based on currently listed jobs for Data Engineering companies, the most common role groups are Engineering (1254), Other (359), and Sales (358).

What funding stages are most common among Data Engineering companies?

Common funding stages on this Data Engineering page include Series A (17), Series B (7), Seed (5), and Series D (3).

Where do the job links go?

Job links point to official company career pages or public job listings, not re-hosted application forms.

How often is this tag page refreshed?

Data is refreshed on a near-daily cadence as public company and job listings change.