Data Engineer
Consensus
About Consensus:
Consensus is building the OS for research. Today, our academic search engine helps 8 million students, researchers, and doctors analyze scientific papers and complete literature reviews 10x faster.
Our Series A was led by USV, with major participation from top AI investors, including Nat Friedman and Daniel Gross. Consensus has been featured in The Wall Street Journal, The Atlantic, The New York Times, Nature, and a16z as one of today’s most exciting and important AI startups.
Our mission is to empower the world to understand, create, and apply good science. Help us build the future of research.
Role: We are hiring a Data Engineer to design, build, and maintain the data pipelines and infrastructure that form the foundation of our product.
Responsibilities
- Build and maintain production pipelines for third-party data (APIs, files, partner feeds).
- Clean, normalize, and unify raw data into clear, consistent tables/datasets.
- Handle de-duplication / entity resolution so records are canonical and trustworthy.
- Own data quality: validation, monitoring/alerting, and documentation.
- Partner closely with ML/product/backend to design schemas and ship data features that unblock use cases.
- Optimize pipelines for performance and cost, and improve CI/CD + testing for data jobs.
Qualifications
- 3+ years building production data pipelines.
- Strong SQL + Python.
- Experience with distributed data processing (Databricks/Spark or similar).
- Experience with an orchestrator (Dagster/Airflow/Prefect); Dagster is a plus.
- Proven ability to ingest from multiple sources and produce clean, well-modeled datasets.
- Familiar with GitHub + CI/CD and data observability practices (validation, logging, monitoring, error handling).
- High ownership, fast execution, and good product instincts around how data impacts UX.
- Interest in the product and mission: science, research, and education.
Compensation:
- $160k-$230k cash
- Competitive Series A equity
Final offers are determined by multiple factors and may vary from the amounts listed above.