Repository for Mlytics careers and talent hunting
Location: Singapore (PR+) / Taipei, Taiwan
Work style: Singapore / Malaysia (Remote) / Taiwan (Hybrid)
Mlytics is an AI Answer Engine. We help media publishers turn reader intent into commercial outcomes — replacing fading CPM revenue with high-quality CPL revenue. Our Intent Refinery is live with 15+ of Taiwan’s top media properties, serving 4M+ weekly active users.
We started as a multi-CDN company. That infrastructure (<50ms routing, multi-vendor failover) is now the substrate underneath the Answer Engine.
You’ll join the Data & Innovation team, reporting to our Data & Innovation Lead, Tim. Tim has built the foundation: Databricks with Unity Catalog, medallion architecture for CDN usage data, MLflow for experiment tracking, vector search and embedding pipelines for the AIGC product. The architectural choices are made. What’s missing is the person who turns the intent data specs into production systems.
This is not a research role. We’ve already designed the behavioral tracking pipeline (Bronze → Silver → Gold), defined 6 composite intent signal tiers with specific CPC pricing, built the intent taxonomy framework, and spec’d a second-price auction matching engine. The specs exist. Your job is to make them real — and then make them better based on what the data actually tells you.
You’ll work closely with the full-stack product engineer (building the widget and API layer), with Tim (on ML experimentation and model selection), and directly with the Head of Product and CEO on what the data means commercially. When the BD team walks into a meeting with a financial services brand and says “we can show you which users are actively considering retirement planning vs. casually browsing” — the confidence behind that claim comes from your pipeline.
Month 1 — Ship the behavioral tracking pipeline.
Our clickstream SDK captures 8 event types (page_enter, scroll_depth, active_time, page_exit, widget_visible, qa_click, qa_read, cross_page) from 58 publisher sites. This data lands in S3 as JSON. Your first deliverable is the production pipeline on Databricks:
By the end of month 1, the pipeline is live, monitored, and processing events from all 58 publisher sites with under 5 minutes of end-to-end latency.
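To make the Bronze → Silver → Gold flow concrete, here is a minimal pure-Python sketch of the Silver and Gold steps for the 8 SDK event types. It is not the production Databricks implementation: the payload field names (`event_id`, `session_id`, `publisher`, `type`, `value`) and the `SessionSummary` columns are hypothetical, and it assumes `active_time` events carry incremental seconds rather than a running total.

```python
import json
from dataclasses import dataclass

# The 8 event types the clickstream SDK emits.
EVENT_TYPES = {
    "page_enter", "scroll_depth", "active_time", "page_exit",
    "widget_visible", "qa_click", "qa_read", "cross_page",
}
# Hypothetical required fields on every Bronze event.
REQUIRED_FIELDS = ("event_id", "session_id", "publisher", "type")


def to_silver(raw_lines):
    """Bronze -> Silver: parse JSON, drop malformed or unknown events, dedupe by event_id."""
    seen, silver = set(), []
    for line in raw_lines:
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue  # production code would quarantine these rather than drop them
        if not isinstance(ev, dict) or not all(k in ev for k in REQUIRED_FIELDS):
            continue
        if ev["type"] not in EVENT_TYPES or ev["event_id"] in seen:
            continue  # unknown type, or a duplicate (e.g. an SDK retry)
        seen.add(ev["event_id"])
        silver.append(ev)
    return silver


@dataclass
class SessionSummary:
    """Hypothetical subset of session_summary columns."""
    publisher: str
    pages: int = 0
    max_scroll_pct: float = 0.0
    active_seconds: float = 0.0
    qa_clicks: int = 0
    qa_reads: int = 0
    cross_page: int = 0


def to_gold(silver):
    """Silver -> Gold: one SessionSummary per session_id."""
    sessions = {}
    for ev in silver:
        s = sessions.setdefault(ev["session_id"], SessionSummary(ev["publisher"]))
        kind, value = ev["type"], float(ev.get("value") or 0)
        if kind == "page_enter":
            s.pages += 1
        elif kind == "scroll_depth":
            s.max_scroll_pct = max(s.max_scroll_pct, value)
        elif kind == "active_time":
            s.active_seconds += value  # assumes incremental seconds, not a running total
        elif kind == "qa_click":
            s.qa_clicks += 1
        elif kind == "qa_read":
            s.qa_reads += 1
        elif kind == "cross_page":
            s.cross_page += 1
        # page_exit and widget_visible are validated but not aggregated in this sketch
    return sessions
```

On Databricks the same logic would more likely run as Spark Structured Streaming, with deduplication bounded by a watermark instead of an unbounded in-memory set.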
Month 2 — Implement intent scoring and make it queryable.
The session_summary Gold table feeds a composite intent scoring model that classifies users into tiers with dramatically different commercial value:
Build a Genie Room on top of this so the commercial team can self-serve: “Show me high-intent users on cnyes.com in the investment vertical this week” should return an answer in seconds, not require a data team ticket.
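The tiering step can be sketched as a weighted composite score mapped onto six tiers, plus a filter that approximates the example Genie question. Everything numeric here is hypothetical: the weights, saturation points, tier floors, the `vertical` field, and the choices that tier 6 is the highest and that tiers 5 and 6 count as "high intent" are placeholders, not the spec's actual tier definitions or CPC pricing.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Session:
    """One row of a session_summary-like Gold table (hypothetical columns)."""
    user_id: str
    publisher: str
    vertical: str  # hypothetical: taxonomy vertical assigned upstream
    day: date
    max_scroll_pct: float
    active_seconds: float
    qa_clicks: int
    qa_reads: int
    cross_page: int


# Hypothetical weights (they sum to 1.0, so scores fall in [0, 1]).
WEIGHTS = {"scroll": 0.2, "active": 0.2, "qa_read": 0.2, "qa_click": 0.3, "cross_page": 0.1}
# Hypothetical score floors, highest tier first; anything lower is tier 1.
TIER_FLOORS = [(6, 0.85), (5, 0.70), (4, 0.50), (3, 0.30), (2, 0.15)]


def composite_score(s: Session) -> float:
    """Weighted sum of signals, each normalised and capped at 1.0."""
    signals = {
        "scroll": min(s.max_scroll_pct / 100, 1.0),
        "active": min(s.active_seconds / 180, 1.0),  # hypothetical: saturates at 3 minutes
        "qa_read": min(s.qa_reads / 2, 1.0),
        "qa_click": 1.0 if s.qa_clicks > 0 else 0.0,
        "cross_page": min(s.cross_page / 3, 1.0),
    }
    return sum(WEIGHTS[k] * v for k, v in signals.items())


def tier(score: float) -> int:
    """Map a composite score to a hypothetical tier from 1 (lowest) to 6."""
    for t, floor in TIER_FLOORS:
        if score >= floor:
            return t
    return 1


def high_intent_users(sessions, publisher, vertical, today, min_tier=5):
    """Rough analogue of 'high-intent users on <publisher> in <vertical> this week'."""
    week_start = today - timedelta(days=today.weekday())  # Monday of the current week
    return sorted({
        s.user_id
        for s in sessions
        if s.publisher == publisher
        and s.vertical == vertical
        and week_start <= s.day <= today
        and tier(composite_score(s)) >= min_tier
    })
```

A Genie Room would answer the same question by generating SQL over the Gold tables; the function above only mirrors the intended query semantics.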
Month 3 — Build the matching engine foundation.
Connect intent signals to advertiser campaigns. Implement the core matching logic: intent classification → campaign filter (vertical, publisher, intent L1/L2/L3) → score (bid × confidence × profile richness × historical CTR) → second-price auction. This is where your pipeline meets the full-stack engineer’s Sponsored Questions product — the first revenue-generating integration of the Intent Refinery.
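The matching flow (campaign filter, score, second-price auction) can be sketched as follows. This assumes a quality-weighted second-price rule, where the winner pays the smallest CPC that would still match the runner-up's score; the spec's actual pricing rule may differ. `Campaign`, `IntentSignal`, `run_auction`, the `profile_richness` scale, and the $0.50 reserve price (borrowed from the low end of the Sponsored Questions CPC range) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    bid_cpc: float
    verticals: set
    publishers: set
    intent_levels: set  # subset of {"L1", "L2", "L3"}
    historical_ctr: float


@dataclass
class IntentSignal:
    vertical: str
    publisher: str
    intent_level: str
    confidence: float        # classifier confidence in [0, 1]
    profile_richness: float  # hypothetical [0, 1] measure of profile depth


def eligible(c: Campaign, sig: IntentSignal) -> bool:
    """Campaign filter: vertical, publisher, and intent level must all match."""
    return (sig.vertical in c.verticals
            and sig.publisher in c.publishers
            and sig.intent_level in c.intent_levels)


def run_auction(campaigns, sig, reserve_cpc=0.50):
    """Return (winner_name, price_cpc), or None if no campaign qualifies."""
    scored = []
    for c in campaigns:
        if not eligible(c, sig) or c.bid_cpc < reserve_cpc:
            continue
        quality = sig.confidence * sig.profile_richness * c.historical_ctr
        scored.append((c.bid_cpc * quality, quality, c))  # score = bid x quality
    if not scored:
        return None
    scored.sort(key=lambda x: x[0], reverse=True)
    _, win_quality, winner = scored[0]
    if len(scored) == 1 or win_quality == 0:
        price = reserve_cpc  # no competition: pay the (hypothetical) reserve
    else:
        # Smallest CPC whose score still ties the runner-up, floored at the reserve
        price = max(reserve_cpc, scored[1][0] / win_quality)
    return winner.name, round(min(price, winner.bid_cpc), 2)
```

One consequence worth checking against the spec: if `confidence` and `profile_richness` describe the user rather than the user-campaign match, they scale every campaign's score by the same factor and cancel out of both the ranking and the price within a single auction. They only change outcomes when scores are compared across auctions or against a score floor, or when confidence is computed per campaign.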
The non-negotiables:
What would make you exceptional:
| Layer | What we use |
|---|---|
| Platform | Databricks / Unity Catalog / Delta Lake |
| Compute | Spark Structured Streaming / Databricks Workflows |
| ML | MLflow / Python / embeddings / vector search |
| Storage | S3 (clickstream) / PostgreSQL / Redis |
| Languages | Python / SQL / some Go for service integration |
| Infra | GCP / AWS / Cloudflare Workers (collection endpoint) |
| Observability | Databricks SQL dashboards / Grafana / alerting |
The Intent Refinery has four monetization layers. Your pipeline powers all of them:
| Product | Pricing | How your data makes it work |
|---|---|---|
| Sponsored Questions | CPC $0.50–$2.00 | Intent classification triggers real-time ad auction |
| Intent Display Network | CPM $15–$40 | User intent profiles enable premium targeting |
| Intent Micro-sites | CPL $5–$20 | Cross-site intent graphs identify high-value leads |
| Full Conversation | Performance | Multi-turn dialog scoring determines conversion readiness |
The commercial target is $70–100K+ MRR by month 6. The features the full-stack engineer builds are the surface. Your pipeline is the engine underneath.
We’re raising our next round. The investment thesis is that real-time intent data captured at the point of content consumption is structurally more valuable than historical behavioral data or contextual targeting. That thesis is only as strong as the data system that proves it.
Right now, a brand prospect asks “how is this different from contextual advertising?” and we show them a spec. After you ship, we show them a live dashboard where a Deep Reader + Decision Click user is worth $1.50 CPC and a Casual Browser is worth $5 CPM — a 30x spread. That’s the difference between a hypothesis and a business.
Want to see how we actually build this? → How we ship: Becoming Product Builders with Business Thinking
Send us something that shows how you think about data systems — a pipeline you’ve built, an architecture decision you made and why, a scoring model you operationalized. We care about what you’ve shipped more than what tools you’ve used.