mlytics-careers

Repo for mlytics careers and talent hunting



AI Data Engineer

Location: Singapore (PR+) / Taipei, Taiwan
Work style: Singapore / Malaysia (Remote) / Taiwan (Hybrid)


Mlytics in 30 seconds

Mlytics is an AI Answer Engine. We help media publishers turn reader intent into commercial outcomes — replacing fading CPM revenue with high-quality CPL revenue. Our Intent Refinery is live with 15+ of Taiwan’s top media properties, serving 4M+ weekly active users.

We started as a multi-CDN company. That infrastructure (<50ms routing, multi-vendor failover) is now the substrate underneath the Answer Engine.



The role

You’ll join the Data & Innovation team, reporting to our Data & Innovation Lead, Tim. Tim has built the foundation: Databricks with Unity Catalog, medallion architecture for CDN usage data, MLflow for experiment tracking, vector search and embedding pipelines for the AIGC product. The architectural choices are made. What’s missing is the person who turns the intent data specs into production systems.

This is not a research role. We’ve already designed the behavioral tracking pipeline (Bronze → Silver → Gold), defined 6 composite intent signal tiers with specific CPC pricing, built the intent taxonomy framework, and spec’d a second-price auction matching engine. The specs exist. Your job is to make them real — and then make them better based on what the data actually tells you.

You’ll work closely with the full-stack product engineer (building the widget and API layer), with Tim (on ML experimentation and model selection), and directly with the Head of Product and CEO on what the data means commercially. When the BD team walks into a meeting with a financial services brand and says “we can show you which users are actively considering retirement planning vs. casually browsing” — the confidence behind that claim comes from your pipeline.


What you’ll do in your first 90 days

Month 1 — Ship the behavioral tracking pipeline.

Our clickstream SDK captures 8 event types (page_enter, scroll_depth, active_time, page_exit, widget_visible, qa_click, qa_read, cross_page) from 58 publisher sites. This data lands in S3 as JSON. Your first deliverable is the production pipeline on Databricks:

By the end of month 1, the pipeline is live, monitored, and processing events from all 58 publishers with <5 minute end-to-end latency.
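The Bronze → Silver step can be illustrated in plain Python. On Databricks this logic would run inside a Spark Structured Streaming job over Delta tables, so treat the code below as a minimal sketch of the rules, not the production implementation. Assumptions: the JSON field names `event_id`, `session_id`, `publisher`, and `ts`, the quarantine behavior, and dedup by `event_id` are all hypothetical. The sketch keeps only the 8 event types listed above, drops SDK retries, and flags events that miss the 5-minute end-to-end latency target.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta

# The 8 SDK event types; anything else is quarantined for inspection.
EVENT_TYPES = frozenset({
    "page_enter", "scroll_depth", "active_time", "page_exit",
    "widget_visible", "qa_click", "qa_read", "cross_page",
})

# Hypothetical raw field names expected on every Bronze record.
REQUIRED = ("event_id", "session_id", "publisher", "event_type", "ts")

# Month-1 target: under 5 minutes from event time to Silver.
LATENCY_SLO = timedelta(minutes=5)


@dataclass(frozen=True)
class SilverEvent:
    event_id: str
    session_id: str
    publisher: str
    event_type: str
    event_time: datetime


def to_silver(raw_lines, seen_ids):
    """Parse raw Bronze JSON lines into typed Silver rows.

    Returns (silver_rows, quarantined_lines). Events whose event_id is
    already in the caller-owned seen_ids set (e.g. SDK retries) are dropped.
    """
    silver, quarantine = [], []
    for line in raw_lines:
        try:
            rec = json.loads(line)
            values = [rec[k] for k in REQUIRED]
            if not all(isinstance(v, str) for v in values):
                raise TypeError("expected string fields")
            event_id, session_id, publisher, event_type, ts = values
            event = SilverEvent(event_id, session_id, publisher, event_type,
                                datetime.fromisoformat(ts))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing or non-string fields, bad timestamp.
            quarantine.append(line)
            continue
        # Unknown event types and timezone-naive timestamps are quarantined.
        if event.event_type not in EVENT_TYPES or event.event_time.tzinfo is None:
            quarantine.append(line)
            continue
        if event.event_id in seen_ids:
            continue
        seen_ids.add(event.event_id)
        silver.append(event)
    return silver, quarantine


def latency_breaches(events, processed_at):
    """Return events whose end-to-end latency exceeds LATENCY_SLO.

    processed_at must be timezone-aware, like the event timestamps.
    """
    return [e for e in events if processed_at - e.event_time > LATENCY_SLO]
```

In a streaming job the unbounded `seen_ids` set would be replaced by bounded state, for example Spark's `dropDuplicates` combined with `withWatermark`, so dedup memory does not grow forever.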

Month 2 — Implement intent scoring and make it queryable.

The session_summary Gold table feeds a composite intent scoring model that classifies users into tiers with dramatically different commercial value:

Build a Genie Room on top of this so the commercial team can self-serve: “Show me high-intent users on cnyes.com in the investment vertical this week” should return an answer in seconds, not require a data team ticket.
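The tier definitions are not reproduced in this posting, so the following is only an illustrative sketch of how a composite score over a `session_summary` row might map to a tier, and of the filter that the Genie question above implies (Genie would generate the SQL from the plain-language question; the function just spells out that filter). Assumptions: the field names, the 120-second and 75%-scroll thresholds, the tier names other than "Deep Reader + Decision Click" and "Casual Browser", and which tiers count as "high intent" are all hypothetical. "This week" is read as a trailing 7-day window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; real values would be tuned against outcomes.
DEEP_READ_SECONDS = 120
DEEP_READ_SCROLL = 0.75

# Hypothetical choice of which tiers count as "high intent".
HIGH_INTENT = frozenset({"Deep Reader + Decision Click", "Decision Click"})


@dataclass(frozen=True)
class SessionSummary:
    user_id: str
    publisher: str
    vertical: str
    session_start: datetime
    active_time_s: float
    max_scroll_depth: float  # fraction of the page, 0.0 to 1.0
    qa_clicks: int


def intent_tier(s: SessionSummary) -> str:
    """Map one session_summary row to an illustrative intent tier."""
    deep_reader = (s.active_time_s >= DEEP_READ_SECONDS
                   and s.max_scroll_depth >= DEEP_READ_SCROLL)
    decision_click = s.qa_clicks > 0
    if deep_reader and decision_click:
        return "Deep Reader + Decision Click"
    if decision_click:
        return "Decision Click"   # hypothetical tier name
    if deep_reader:
        return "Deep Reader"      # hypothetical tier name
    return "Casual Browser"


def high_intent_users(sessions, publisher, vertical, now):
    """Users with a high-intent session on `publisher` in `vertical`
    during the 7 days up to `now`."""
    since = now - timedelta(days=7)
    return sorted({
        s.user_id
        for s in sessions
        if s.publisher == publisher
        and s.vertical == vertical
        and since <= s.session_start <= now
        and intent_tier(s) in HIGH_INTENT
    })
```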

Month 3 — Build the matching engine foundation.

Connect intent signals to advertiser campaigns. Implement the core matching logic: intent classification → campaign filter (vertical, publisher, intent L1/L2/L3) → score (bid × confidence × profile richness × historical CTR) → second-price auction. This is where your pipeline meets the full-stack engineer’s Sponsored Questions product — the first revenue-generating integration of the Intent Refinery.
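The matching steps above (campaign filter → score → second-price auction) can be sketched as follows. Assumptions: the dataclass fields, the $0.50 reserve default (chosen to match the low end of the Sponsored Questions CPC range), and the pricing rule are hypothetical. For pricing the sketch uses one common generalized second-price variant: the winner pays the smallest bid that would still have matched the runner-up's score, capped at its own bid and floored at the reserve.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    vertical: str
    publisher: str
    l1: str
    l2: Optional[str] = None
    l3: Optional[str] = None
    confidence: float = 1.0  # classifier confidence, 0..1


@dataclass(frozen=True)
class Campaign:
    campaign_id: str
    bid: float                           # max CPC in dollars
    vertical: str
    l1: str
    l2: Optional[str] = None             # None matches any L2
    l3: Optional[str] = None             # None matches any L3
    publishers: frozenset = frozenset()  # empty matches every publisher
    historical_ctr: float = 0.01


def eligible(c: Campaign, i: Intent) -> bool:
    """Campaign filter: vertical, publisher, and intent L1/L2/L3."""
    return (c.vertical == i.vertical
            and (not c.publishers or i.publisher in c.publishers)
            and c.l1 == i.l1
            and (c.l2 is None or c.l2 == i.l2)
            and (c.l3 is None or c.l3 == i.l3))


def run_auction(intent, campaigns, profile_richness, reserve=0.50):
    """Return (campaign_id, price) for the winning campaign, or None.

    score = bid * confidence * profile_richness * historical_ctr
    """
    def quality(c):
        return intent.confidence * profile_richness * c.historical_ctr

    ranked = sorted(
        (c for c in campaigns
         if eligible(c, intent) and c.bid >= reserve and quality(c) > 0),
        key=lambda c: c.bid * quality(c),
        reverse=True,
    )
    if not ranked:
        return None
    winner = ranked[0]
    if len(ranked) == 1:
        price = reserve
    else:
        runner_up = ranked[1]
        # Bid at which the winner's score would exactly equal the runner-up's.
        price = runner_up.bid * quality(runner_up) / quality(winner)
    return winner.campaign_id, round(min(winner.bid, max(price, reserve)), 2)
```

Because `confidence` and `profile_richness` describe the user rather than the campaign, they scale every score in one auction equally, so they change neither the winner nor the price within it. They matter when a score is compared against something outside the auction, such as a minimum score required to show a sponsored question at all.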


What we’re looking for

The non-negotiables:

What would make you exceptional:


Tech stack

| Layer | What we use |
|---|---|
| Platform | Databricks / Unity Catalog / Delta Lake |
| Compute | Spark Structured Streaming / Databricks Workflows |
| ML | MLflow / Python / embeddings / vector search |
| Storage | S3 (clickstream) / PostgreSQL / Redis |
| Languages | Python / SQL / some Go for service integration |
| Infra | GCP / AWS / Cloudflare Workers (collection endpoint) |
| Observability | Databricks SQL dashboards / Grafana / alerting |

How this role connects to the bigger picture

The Intent Refinery has four monetization layers. Your pipeline powers all of them:

| Product | Pricing | How your data makes it work |
|---|---|---|
| Sponsored Questions | CPC $0.50–$2.00 | Intent classification triggers real-time ad auction |
| Intent Display Network | CPM $15–$40 | User intent profiles enable premium targeting |
| Intent Micro-sites | CPL $5–$20 | Cross-site intent graphs identify high-value leads |
| Full Conversation | Performance | Multi-turn dialog scoring determines conversion readiness |

The commercial target is $70–100K+ MRR by month 6. The features the full-stack engineer builds are the surface. Your pipeline is the engine underneath.


Why this matters right now

We’re raising our next round. The investment thesis is that real-time intent data captured at the point of content consumption is structurally more valuable than historical behavioral data or contextual targeting. That thesis is only as strong as the data system that proves it.

Right now, a brand prospect asks “how is this different from contextual advertising?” and we show them a spec. After you ship, we show them a live dashboard where a Deep Reader + Decision Click user is worth $1.50 CPC and a Casual Browser is worth $5 CPM — a 30x spread. That’s the difference between a hypothesis and a business.
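CPC prices a click while CPM prices a thousand impressions, so comparing the two tiers needs a common unit. A minimal sketch that normalizes both to expected revenue per impression; the click-through rate is a hypothetical input, not a figure from this posting, and the resulting multiple scales linearly with it.

```python
def per_impression_from_cpm(cpm: float) -> float:
    """CPM is the price of 1,000 impressions."""
    return cpm / 1000


def per_impression_from_cpc(cpc: float, ctr: float) -> float:
    """Expected revenue of one impression on a CPC placement."""
    return cpc * ctr


def value_multiple(cpc: float, ctr: float, cpm: float) -> float:
    """How many times more a CPC impression earns than a CPM impression."""
    return per_impression_from_cpc(cpc, ctr) / per_impression_from_cpm(cpm)


# $1.50 CPC vs. $5 CPM at a few hypothetical click-through rates.
for ctr in (0.01, 0.05, 0.10):
    print(f"CTR {ctr:.0%}: {value_multiple(1.50, ctr, 5.0):.0f}x")
```

At these prices the loop prints roughly 3x, 15x, and 30x for the three rates.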


Want to see how we actually build this? How we ship: Becoming Product Builders with Business Thinking


How to apply

Send us something that shows how you think about data systems — a pipeline you’ve built, an architecture decision you made and why, a scoring model you operationalized. We care about what you’ve shipped more than what tools you’ve used.

📧 [email protected]