Tag Landing Page

Data Engineering AI Startup Jobs

Explore startups tagged with Data Engineering and compare hiring activity, company profiles, and direct job links. This page is indexable only when a tag reaches at least 5 companies to avoid thin content.
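The indexing rule above can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation: the function name `robots_meta`, the constant name, and the use of a robots meta value are all assumptions. Only the threshold of 5 companies comes from this page.

```python
# Threshold stated on this page: a tag page is indexable at 5+ companies.
MIN_COMPANIES_FOR_INDEXING = 5


def robots_meta(company_count: int) -> str:
    """Return a robots meta value for a tag page (hypothetical helper).

    Assumption: thin pages stay crawlable ("follow") but are kept out of
    the index ("noindex"); the real site may handle this differently.
    """
    if company_count >= MIN_COMPANIES_FOR_INDEXING:
        return "index, follow"
    return "noindex, follow"


if __name__ == "__main__":
    print(robots_meta(72))  # this page lists 72 companies
    print(robots_meta(4))
```

Under this sketch, a tag with exactly 5 companies is treated as indexable, matching the "at least 5" wording.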

Companies: 72
Jobs (listed): 462
Last updated: Apr 23, 2026, 12:00 AM

Databricks

238 jobs

Unified analytics platform for data and AI, helping companies process and analyze big data in the cloud.

AI Infrastructure, Series G+, AI, Analytics, Cloud Computing

Scale AI

80 jobs

Data infrastructure company providing high-quality training data for AI applications, recently partnered with Meta.

AI Infrastructure, Series G+, AI, Computer Vision, Infrastructure

Together AI

37 jobs

Cloud platform for running and fine-tuning open-source AI models at scale.

AI Infrastructure, Series B, AI, Cloud Computing, Generative AI

Hightouch

34 jobs

Data activation platform (Reverse ETL, Customer Studio) to sync warehouse data to business tools.

Data / Marketing Tech, Series C, Analytics, Marketing Tech, B2B SaaS

Cribl

29 jobs

Data pipeline platform that gives you control over your observability data.

AI Infrastructure, Series E, AI, Infrastructure, Cloud Computing

Astronomer

14 jobs

Company behind Astro, a managed Apache Airflow DataOps platform for data & AI pipelines.

Data Engineering / Orchestration, Series D, AI, Developer Tools, Cloud Computing

Snorkel AI

14 jobs

Data-centric AI platform for programmatically labeling and managing training data.

AI Infrastructure, Series D, AI, Data Platform, Infrastructure

PhaseV

5 jobs

ML-driven adaptive trials and clinical development optimization.

Health AI, Series A, AI, Analytics, Healthcare

Datology AI

4 jobs

AI training data curation platform helping enterprises optimize ML training data at petabyte scale.

AI Infrastructure, Series A, AI, Machine Learning, Automation

Polars

4 jobs

Polars is a blazingly fast DataFrames library written in Rust, offering Python, R, Node.js, and SQL bindings for efficient, multi-threaded data manipulation at scale.

Data Platform, Series A, AI, Analytics, Data Platform

OneSchema

3 jobs

AI-driven CSV and PDF data import automation platform for seamless customer onboarding.

Developer Tools, Seed, AI, Automation, Developer Tools

Alloy

Alloy is a data platform for robotics that helps companies process, organize, and search through the massive volumes of sensor, camera, and telemetry data their robots generate. The Sydney-based startup enables natural language search across robot data and automated issue detection, reducing data processing time by up to 90%.

Data & Analytics, Pre-Seed, AI, Analytics, Automation

Anaconda

Anaconda provides the world's most popular open-source Python and R distribution for data science and AI development. Serving over 45 million users, its platform enables enterprises to manage packages, environments, and AI workflows at scale with security and governance controls.

Developer Tools, Series C, AI, Data Platform, Developer Tools

Anomalo

Anomalo is an AI-powered enterprise data quality monitoring platform that automatically detects data issues across warehouses and lakes without manual rule configuration. The platform uses machine learning to monitor structured and unstructured datasets for enterprises like Block and Discover Financial.

Data & Analytics, Series B, Data Platform, Machine Learning, AI

Apheris

Apheris provides governed, privacy-preserving data access and collaboration for AI and analytics across sensitive datasets.

Cybersecurity, Series A, AI, Data Platform, Compliance

Artie

Fully managed change data capture (CDC) streaming platform that replicates production databases into data warehouses and lakes in real time. Trusted by Substack, ClickUp, and Alloy, processing over 700 billion rows annually.

Data & Analytics, Series A, Data Platform, Infrastructure, Open Source

Astral

Astral builds high-performance Python developer tooling, including Ruff, uv, and ty, with a focus on fast local workflows and production-grade packaging.

Developer Tools, Cloud Computing, Developer Tools, DevOps

Ayar Labs

Ayar Labs builds optical I/O and in-package photonics technology to reduce data-movement bottlenecks in large-scale AI and high-performance computing systems.

Infrastructure, Series E, AI, Infrastructure, Manufacturing

Bindwell

Bindwell is an AI-powered pesticide discovery company that uses machine learning models 4x faster than DeepMind's AlphaFold to screen billions of molecules and design safer, more effective crop protection products. Unlike traditional agtech software companies, Bindwell develops and licenses complete proprietary pesticide molecules to major agrochemical companies. Founded by teen entrepreneurs Tyler Rose and Navvye Anand through Y Combinator's W25 batch, the company is backed by General Catalyst and Paul Graham.

Biotech, Seed, AI, Biotech, Machine Learning

Biostate AI

A scalable biological data collection service providing multi-omics data for research.

Biotech, Series A, AI, Biotech, Medical

Bronto

Modern logging and observability platform for AI applications and engineering teams, offering fast log ingestion, search, and alerting with a columnar storage architecture.

Data & Analytics, Seed, Developer Tools, Analytics, Database

Colossal Biosciences

Colossal Biosciences is a genetic engineering and de-extinction company using CRISPR technology to restore extinct species like the woolly mammoth and protect critically endangered ecosystems.

Biotech, Series C, Biotech, Machine Learning, Healthcare

Credal.ai

Credal provides a secure AI agent platform for enterprises, enabling teams to build AI agents and MCP-connected workflows across internal data sources with governance controls.

Enterprise AI Infrastructure, Seed, AI, Agents, Enterprise Software

Dagster

Dagster builds open-source and commercial orchestration tooling that helps data teams ship, observe, and scale pipelines with a modern developer experience.

Developer Tools, Series B, Analytics, Cloud Computing, Data Platform

David AI

David AI is the world's first dedicated audio data research lab, building the data layer for next-generation audio AI. Founded by former Scale AI engineers, serving most FAANG companies and major AI labs.

Data & Analytics, Series B, AI, Data Platform, Developer Tools

Deepnote

Collaborative cloud data notebook platform for data science and analytics teams.

Developer Tools, Series A, AI, Analytics, Data Platform

Definite

Definite combines a cloud data warehouse, metrics layer, notebooks, dashboards, and AI assistant workflows into an all-in-one analytics platform for faster self-serve analysis.

Data Analytics, Seed, AI, Analytics, B2B SaaS

Distyl AI

Data intelligence platform that unifies messy operational data, applies AI agents, and routes insights back into business workflows.

Developer Tools, Series B, AI, Agents, Enterprise Software

DualBird

DualBird provides a cloud-native hardware-software data and AI infrastructure engine that delivers 10-100x faster performance and 50-90% lower costs through FPGA-based acceleration.

AI Infrastructure, Series A, AI, Cloud Computing, Data Platform

Encharge AI

Developing analog in-memory compute chips and software for energy-efficient AI at the edge.

AI Infrastructure, Series B, AI, Infrastructure, IoT

Eon

Eon is the first cloud backup posture management (CBPM) platform, automating and unifying complex cloud backups into a queryable data lake for fast recovery, compliance, and AI analytics. Founded by the team behind AWS Disaster Recovery, Eon converts idle backup data into an accessible secondary storage layer for enterprise AI workloads.

Cybersecurity, Series D, Cloud Computing, Cybersecurity, Infrastructure

Espresso AI

Espresso AI uses generative AI and machine learning to automatically optimize SQL queries and reduce cloud compute costs by up to 70-80% for Snowflake data warehouse users. The platform integrates with existing data warehouse setups to analyze and optimize queries in real time using NLP, program synthesis, and reinforcement learning.

Data & Analytics, Seed, AI, Analytics, Automation

Firecrawl

Firecrawl is a web data infrastructure platform that converts websites into clean, structured data optimized for AI applications through a simple API, turning entire websites into LLM-ready markdown or structured data.

Developer Tools, Series A, AI, API, Developer Tools

Flatfile

AI-assisted data exchange platform that helps teams collect, map, validate, and transform messy customer data before it enters core systems.

Data Platform, Series B, Data Platform, API, Database

Flow Computing

Flow Computing develops Parallel Processing Unit technology to accelerate next-generation CPUs for AI, edge, cloud, and parallel computing workloads.

AI Infrastructure, AI, Cloud Computing, Developer Tools

Fundamental

Fundamental builds large tabular models and enterprise AI infrastructure for prediction and analysis on complex business data, focused on tabular reasoning and decision support.

Enterprise AI, Series A, AI, Machine Learning, Analytics

Grafana Labs

Company behind the open-source Grafana observability stack providing monitoring, logging, and tracing solutions, reaching $400M ARR as a fully remote company across 40+ countries.

Data & Analytics, Series E, Open Source, DevOps, Cloud Computing

Gruve

Gruve delivers AI-native infrastructure, inference systems, and enterprise AI agents for inference-heavy workloads with an emphasis on speed, security, and measurable outcomes.

Infrastructure, Series A, AI, Infrastructure, Cloud Computing

Hex

Hex is a collaborative analytics workspace that combines notebooks, SQL, data apps, and AI-assisted workflows for data teams.

Analytics, Series C, AI, Analytics, Developer Tools

Junction

Junction (formerly Vital) modernizes healthcare infrastructure with seamless lab testing and device data integration, connecting over 500 wearables and medical devices with 10+ lab networks including Labcorp and Quest across all 50 states.

HealthTech, Series A, API, Healthcare, IoT

LlamaIndex

LlamaIndex is a data framework for LLM applications that enables developers to connect, index, and query custom data sources with large language models through their open-source library and LlamaCloud platform.

AI Infrastructure, Series A, AI, API, Data Platform

Mage

Mage is an open-source, AI-native data pipeline platform that enables teams to build, run, and manage data pipelines for integrating and transforming data using Python, SQL, and R. Available as both open-source and enterprise versions, it provides real-time and batch pipeline orchestration.

Data & Analytics, Seed, AI, Data Platform, Open Source

MotherDuck

MotherDuck is a serverless cloud data warehouse built on the open-source DuckDB engine, enabling fast SQL analytics with no infrastructure to manage. The platform supports hybrid local-cloud execution, allowing analysts to query data seamlessly across laptop and cloud.

Data & Analytics, Series B, Data Platform, Database, Cloud Computing

Nexthop AI

Nexthop AI builds networking systems for AI-scale data centers, focusing on high-performance switching infrastructure for hyperscale and cloud environments.

Infrastructure, Series B, AI, Cloud Computing, Infrastructure

Omni

Omni is a modern business intelligence and analytics platform that combines a unified semantic data model with SQL flexibility, enabling AI-powered trustworthy answers in seconds. The platform supports embedded analytics, custom dashboards, and governed data exploration.

Data & Analytics, Series B, Analytics, Data Platform, AI

Perle

Perle is an AI training data platform that combines human expertise with adaptive workflows to help companies collect, annotate, and evaluate specialized training data for generative AI, LLMs, and RLHF. Their vetted global network of domain experts provides modular solutions for data annotation, enrichment, and adversarial robustness assessment.

Data & Analytics, Seed, AI, Data Platform, Machine Learning

Prefect

Prefect builds workflow orchestration and AI infrastructure software that helps teams automate, observe, and manage data and application workflows.

Workflow Orchestration, AI, Automation, B2B SaaS

Prior Labs

Prior Labs builds tabular foundation models that understand spreadsheets and databases, enabling instant pattern inference across any dataset without task-specific training. Their flagship model TabPFN, trained on 130 million synthetic datasets, ranks #1 on the TabArena benchmark and scales to 10 million rows, serving Fortune 500 companies like Hitachi.

AI Infrastructure, Pre-Seed, AI, Analytics, B2B SaaS

Profluent Bio

Uses generative AI to design novel proteins and gene editors for therapeutics.

Healthcare, Series B, AI, Biotech, Healthcare

Protege

Protege operates a governed marketplace platform for ethical sourcing of multimodal, real-world AI training data with compliant data exchange capabilities.

AI, Series A, AI, Machine Learning, Data Platform

Pulse

API-first Document AI that converts PDFs, images, slides, and spreadsheets into structured JSON for RAG, analytics, and automation.

AI Infrastructure, Seed, API, AI, Generative AI

Pytho AI

Provides a unified interface to design AI workflows by connecting data, models, and automations.

Developer Tools, Unknown, AI, Agents, Developer Tools

Reducto

Reducto provides a high-quality AI document ingestion and parsing API for large language models. The Y Combinator-backed company processes nearly a billion pages monthly for leading AI teams like Harvey and Scale AI.

AI Infrastructure, Series B, AI, API, Developer Tools

Relace

Relace provides auxiliary coding models for faster, more reliable AI code generation. Its models are co-optimized with infrastructure to make production-ready coding agents easier to deploy, targeting state-of-the-art performance across million-line repositories.

Developer Tools, Series A, AI, Developer Tools, Infrastructure

Rune

Developer of the world's first DC data centers built exclusively for solar and wind power. Using proprietary chip design and smart controllers, Rune converts stranded and curtailed renewable energy into compute power at generation sites.

Climate Tech, Seed, Climate Tech, Cloud Computing, Infrastructure

San Francisco Compute

SF Compute provides rentable, large, low-cost GPU clusters for AI pre-training workloads. The platform operates as a marketplace connecting AI teams with on-demand high-performance computing capacity, offering flexible access to supercomputing-scale infrastructure with InfiniBand interconnects.

AI Infrastructure, Series A, AI, Cloud Computing, Infrastructure

Sapien

Sapien builds AI-native analysts for finance and operations teams, connecting ERP, warehouse, spreadsheet, and operational data. Its agents help CFO and analytics teams find profit drivers, explain variance, and act on messy transaction-level data faster.

Data & Analytics, Seed, AI, Agents, Analytics

Shovels

Shovels builds construction intelligence software that turns fragmented building permit data into actionable market and go-to-market signals through APIs and analytics tools.

PropTech, Seed, AI, Analytics, Data Platform

Spiral

Spiral is a data infrastructure company that provides a multimodal data platform for AI. It unifies governance and exposes a single API for every data modality, including video, audio, geospatial, and text, and is engineered for machine-scale throughput to keep GPUs fully saturated.

Data Platform, Series A, Data Platform, AI, Machine Learning

Structify

AI-powered data platform that transforms unstructured web data and documents (websites, PDFs, pitch decks, reports) into structured, enterprise-ready datasets. Its proprietary DoRa model navigates and extracts data like a human, enabling real-time web extraction for business intelligence and data workflows.

Data Platform, Seed, AI, Data Platform, B2B SaaS

Supper

AI-native agentic data platform that integrates with SaaS tools and data warehouses, cleanses and normalizes data, and enables self-serve insights through natural language.

Data & Analytics, Seed, AI, Agents, Analytics

Syenta

Syenta develops Localized Electrochemical Manufacturing (LEM) technology for advanced semiconductor chip packaging, enabling scalable, high-density interconnects without traditional lithography. Spun out from the Australian National University, their approach addresses memory bandwidth bottlenecks in AI computing.

Hardware & Robotics, Seed, AI, Manufacturing, Infrastructure

Tensormesh

Semantic KV caching layer built for LLM inference, enabling AI applications to reduce inference costs and latency by reusing cached computation across similar prompts.

AI Infrastructure, Seed, AI, Infrastructure, LLM

Tinybird

Tinybird is a real-time data platform that enables data and engineering teams to build real-time data products and APIs at scale. The platform ingests, transforms, and serves large volumes of data with sub-second latency for analytics and operational intelligence.

Data & Analytics, Series B, API, Analytics, Data Platform

TinyFish

TinyFish provides enterprise web agents that automate complex web-based workflows and extract structured data from websites at scale. The platform enables Fortune 500 companies like Google and DoorDash to automate web interactions, streamline data collection, and integrate web automation into their business processes.

Enterprise Software, Series A, AI, Automation, Developer Tools

Tonic AI

Generates realistic synthetic data to power software testing and analytics without exposing sensitive production data.

Developer Tools, Series B, AI, Infrastructure, Machine Learning

Tracer

Tracer is the first pipeline monitoring system purpose-built for high-performance computing in life sciences, providing real-time performance metrics, cost breakdowns, and optimization insights for complex computational pipelines.

Developer Tools, Seed, AI, Analytics, Biotech

Transcend

Transcend is an enterprise-grade data privacy infrastructure platform that serves as the compliance layer for customer data. It enables organizations to automate data subject requests, map data across systems, manage consent, and activate data for AI responsibly at scale.

Cybersecurity, Series B, Compliance, Cybersecurity, Enterprise Software

Unlimited Industries

Unlimited Industries is an AI-native construction company that vertically integrates design and build for large-scale infrastructure projects including data centers, energy facilities, and advanced manufacturing. The company's proprietary AI platform can explore tens of thousands of design configurations to optimize costs and timelines, reducing pre-construction engineering from months to weeks. Founded by serial entrepreneurs and backed by Andreessen Horowitz, Unlimited is rethinking how America's critical infrastructure gets built.

PropTech & Construction, Seed, AI, Automation, Manufacturing

Unstructured

Open-source data preprocessing platform that extracts, cleans, and transforms unstructured documents (PDFs, images, HTML, emails) into structured formats optimized for AI and LLM pipelines.

AI Infrastructure, Series C, AI, Open Source, LLM

Weka

WEKA builds a cloud and AI data platform that accelerates model training and inference workloads with high-performance, software-defined storage.

Infrastructure, Series E, Infrastructure, Data Platform, Cloud Computing

ZeroEntropy

ZeroEntropy provides a high-accuracy search API over unstructured data for AI agents and RAG applications. The YC-backed company builds smarter retrieval models enabling AI agents across healthcare, law, and sales.

AI Infrastructure, Seed, AI, API, Agents

FAQ

What is the Data Engineering tag page on Fast AI Startup Jobs?

It is a curated landing page that groups AI startup companies tagged with Data Engineering, plus links to their company profiles and available jobs.

How many Data Engineering companies are included?

This page currently lists 72 companies tagged with Data Engineering.

How many jobs are associated with Data Engineering companies?

The companies on this page currently account for 462 listed jobs in our public dataset (subject to regular updates).

What roles are most common at Data Engineering companies?

Based on currently listed jobs for Data Engineering companies, the most common role groups are Engineering (1777), Sales (494), and Other (408).

What funding stages are most common among Data Engineering companies?

Common funding stages on this Data Engineering page include Seed (19), Series A (18), Series B (15), and Series C (5).
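A stage breakdown like the one above is a simple tally over the funding stage on each company card. The sketch below shows the idea on a small sample of six companies taken from this page. It is not the site's actual pipeline: the `cards` list and the output format are assumptions, and the sample counts will not match the full-page totals.

```python
from collections import Counter

# Sample (company, funding stage) pairs copied from cards on this page.
# This is only a subset, so the tallies differ from the FAQ's totals.
cards = [
    ("Databricks", "Series G+"),
    ("Together AI", "Series B"),
    ("Hightouch", "Series C"),
    ("PhaseV", "Series A"),
    ("Datology AI", "Series A"),
    ("OneSchema", "Seed"),
]

# Count how many sampled companies fall into each funding stage.
stage_counts = Counter(stage for _, stage in cards)

# Render as "Stage (count)" entries, most common stage first.
summary = ", ".join(
    f"{stage} ({count})" for stage, count in stage_counts.most_common()
)

if __name__ == "__main__":
    print(summary)
```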

Where do the job links go?

Job links point to official company career pages or public job listings, not re-hosted application forms.

How often is this tag page refreshed?

Data is refreshed on a near-daily cadence as public company and job listings change.