Sr. AI Data Engineer

AI-first data acquisition, serving & reporting
About the role
This is an AI-first role owning the data acquisition, serving, and reporting infrastructure that powers our product. We aggregate public data at scale, and these pipelines are core to what we deliver. We don’t want someone to hand-operate scrapers and hand-debug every breakage; we want an engineer who builds systems that largely run, diagnose, and repair themselves, using LLMs and agentic workflows to keep everything reliable while continuously raising the level of automation. You own critical systems end to end with minimal hand-holding and treat AI as a core part of the toolkit. This is a remote role reporting to our Principal Engineer and team lead.
What you’ll do
- Own our fleet of web scrapers and make it self-healing: build LLM-assisted resilience so structural, markup, and anti-bot changes are detected, diagnosed, and repaired with minimal manual effort. When something does break, agents do the first pass on root cause and propose fixes; you review and approve.
- Keep daily scraping runs stable—monitoring, alerting, retries, and graceful handling of upstream failures so data lands reliably each morning
- Use LLMs for resilient parsing and entity extraction from messy or changing HTML, reducing reliance on brittle selectors
- Own and optimize the serving layer and the ETL/ELT pipelines feeding our BigQuery warehouse—ensuring data is fresh, performant, and reliable for live use
- Build our reporting infrastructure—data models, transformations, and dashboards—plus AI-native layers like natural-language query and LLM-generated narrative insight
- Drive data quality through both rule-based checks and ML/LLM-based anomaly detection
- Manage anti-bot challenges (proxies, rate limiting, request patterns) within legal and ethical guidelines
- Build and maintain production-grade MCP servers and agentic workflows that expose our data and tooling to internal and AI consumers
- Partner with the Principal Engineer, analysts, product, and leadership; document systems and best practices for maintainability and human-in-the-loop AI operations
What we’re looking for
- 6+ years in data engineering, including ownership of mission-critical production systems
- Strong Python with deep experience building, maintaining, and debugging scrapers (e.g., Scrapy, Playwright, Selenium, BeautifulSoup)
- AI-first: hands-on experience building LLM-powered and agentic workflows in production, including production-grade MCP servers. That means not just calling an API, but designing systems where agents do meaningful work under human supervision
- Prompt engineering and LLM evaluation/observability—reasoning about output quality, cost, latency, and failure modes the way you’d reason about uptime—plus fluency with AI-assisted dev tools (e.g., Claude Code, Cursor)
- Proven experience designing reporting/analytics layers—data modeling, transformations (e.g., dbt), and BI tools
- Hands-on with the GCP data stack—BigQuery, Cloud Composer (managed Airflow), Cloud Storage, Cloud Run or GKE—plus advanced SQL and Docker
- A reliability mindset—proven track record owning systems, triaging failures, and being accountable for uptime; sound judgment on when to use deterministic code versus an LLM
- Understanding of the legal and ethical considerations around web scraping
Nice to have
- Experience training, deploying, and maintaining ML models
- Experience with MotherDuck / DuckDB, ideally serving data to production applications
- Experience scaling or refactoring distributed scraping systems
- Knowledge of Pub/Sub, Dataflow, or other large-scale data processing tools
- Infrastructure-as-code (Terraform)
- Experience setting data strategy or mentoring other engineers
Logistics
- Location: Remote (US based)
- On-call: This role supports daily scraping and nightly processing runs and a production serving layer; some availability for off-hours incident response may be expected
- Compensation (based on experience): $190K-$210K base salary plus bonus
Grace Hill offers a robust suite of benefits, including health, dental and vision insurance, 401K, PTO, life insurance, disability insurance, and more.
Unfortunately, we are not able to offer visa sponsorship or assistance. Applicants must be based in the US and authorized to work in the US at the time of hire.
About us
Grace Hill provides industry-leading SaaS technology solutions designed to make a positive impact in real estate and improve the lives of people where they work and live. Harnessing years of real estate experience and the understanding that people are better together, Grace Hill helps owners and operators increase property performance, reduce operating risk, and grow top talent. More than 500,000 professionals from over 1,700 companies rely on Grace Hill’s talent performance solutions covering policy, training, assessment, survey, and data-driven insights. Visit us at gracehill.com or on LinkedIn.
Our HelloData product solves complex data problems for the multifamily industry, utilizing automated pipelines and AI to provide real-time market insights for the nation's top managers, developers, and investors. Our platform is trusted by the industry’s largest operators to help optimize rents, underwrite operating expenses, and grow NOI with its highly accurate data and user-friendly interface. Since being acquired by Grace Hill in April 2025, HelloData has continued to accelerate at an unbelievable rate, growing ARR by over 300% in 2025 alone and on track for a record-breaking 2026. We combine the agility and innovation of a high-growth startup with the stability and resources of an established enterprise, making us the gold standard in multifamily data analytics.