Data & Model Infrastructure Engineer (EO-AI)
You'll build the data and model infrastructure that everything else depends on — turning public Earth Observation archives, alternative data signals, and real-world models into a clean, queryable, production-grade foundation for now-casting and decision support. A hands-on building role for someone who codes fast, thinks in data, and uses modern AI coding assistants as a force multiplier.
What you'll do
- Build and operate ingestion pipelines for public EO data (Sentinel-1/2, Landsat, MODIS, ISRO ResourceSat/Cartosat, Bhuvan) and alternative data signals (weather, IoT, market, mobility, and other non-traditional sources), harmonising both into analysis-ready data.
- Implement and extend the canonical data schema stack — STAC 1.1 catalogues, a GeoParquet + pgvector feature store, and validated inference records.
- Develop, adapt, and fine-tune models for real-world use, from foundation-model embeddings and multi-sensor fusion to now-casting components, and wire them into reliable, testable pipelines.
- Stand up the supporting infrastructure: data lakes, compute/MLOps workflows, evaluation harnesses, and monitoring.
- Use AI coding assistants (e.g. Claude Code, Copilot, Cursor) fluently to prototype, refactor, and ship — and help set good practices for AI-assisted development.
What we're looking for
- Master's or PhD student in Computer Science, or a recent graduate of an engineering discipline, with a strong coding and data-modeling background.
- Strong programming skills (Python essential; SQL a plus) and sound software-engineering fundamentals: version control, testing, clean pipelines.
- Demonstrated fluency with AI coding assistants and a track record of building real things quickly.
- Comfort with data modeling and ML workflows: structuring messy data, fine-tuning models, reasoning about evaluation and uncertainty.
Nice to have
- Experience with geospatial or remote-sensing data, cloud platforms, vector databases, or streaming and MLOps tooling (Kafka/Flink, containers, orchestration).
- Familiarity with foundation models, data assimilation, or time-series / now-casting methods.
- A portfolio, GitHub, or shipped projects we can look at.
Who thrives here: builders who want ownership of real infrastructure, energised by going from raw satellite bytes to a working signal — work that feeds decisions, not slide decks.