Humera Minhas & Parinitha Hirehal

Shoot for the moon - machine learning for automated online ad detection

Why did a simple tree-based model outperform a complex graph neural network for detecting online ads? The answer is a lesson in practical machine learning.

#1 (about 4 minutes)

The challenge of manual ad filtering and the moonshot project

Manual ad filter lists are slow and resource-intensive, prompting the "Project Moonshot" initiative to automate ad detection using AI and machine learning.

#2 (about 2 minutes)

Choosing the right data source for ad detection

The team pivoted from inefficient computer vision models for perceptual ad detection to analyzing HTML structure, which provided richer data for machine learning.

#3 (about 3 minutes)

Generating labeled training data at scale

A custom crawler combined with a modified Adblock Plus was used to automatically label HTML nodes on 250,000 web pages, creating a large-scale ground truth dataset.

#4 (about 4 minutes)

Pre-processing HTML data and overcoming key challenges

The data pipeline converted raw HTML into adjacency and feature matrices while solving challenges like severely unbalanced data and slow processing speeds.
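
To make the idea of adjacency and feature matrices concrete, here is a minimal sketch using only the Python standard library. It is not the team's pipeline: the class `DomGraphBuilder`, the function `html_to_matrices`, and the two features (attribute count and tree depth) are illustrative assumptions, and a real pipeline would likely use a far richer feature set.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomGraphBuilder(HTMLParser):
    """Hypothetical helper: collects element nodes and parent-child edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # one (tag, attribute_count, depth) tuple per element
        self.edges = []   # (parent_index, child_index) pairs
        self._stack = []  # indices of the currently open elements

    def handle_starttag(self, tag, attrs):
        index = len(self.nodes)
        self.nodes.append((tag, len(attrs), len(self._stack)))
        if self._stack:
            self.edges.append((self._stack[-1], index))
        if tag not in VOID_TAGS:
            self._stack.append(index)

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this tag, if any.
        for pos in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[pos]][0] == tag:
                del self._stack[pos:]
                break


def html_to_matrices(html):
    """Return (tags, adjacency matrix, feature matrix) for an HTML string."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    n = len(builder.nodes)
    adjacency = [[0] * n for _ in range(n)]
    for parent, child in builder.edges:
        adjacency[parent][child] = 1
        adjacency[child][parent] = 1  # treat the DOM tree as an undirected graph
    # Assumed toy features: [attribute count, depth in the tree].
    features = [[attr_count, depth] for _, attr_count, depth in builder.nodes]
    tags = [tag for tag, _, _ in builder.nodes]
    return tags, adjacency, features


tags, adjacency, features = html_to_matrices(
    '<div id="a"><p>text</p><img src="x.png" class="banner"></div>'
)
```

Each row of `features` describes one HTML node and the matching row of `adjacency` records which nodes it is connected to, which is the kind of input a graph model or a per-node classifier can consume.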

#5 (about 6 minutes)

Experimenting with different machine learning model approaches

Several models were tested for ad classification, including graph neural networks, traditional classifiers with node embeddings, and tree-based models like XGBoost.

#6 (about 3 minutes)

Comparing model performance and planning future improvements

Tree-based models significantly outperformed graph neural networks in F1 score, and future work will explore self-supervised learning and more diverse data.
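
F1 score is a natural metric here because ad nodes are rare compared with ordinary HTML nodes. The sketch below, with made-up labels that are not from the talk, shows how accuracy can look excellent for a model that never finds an ad, while F1 exposes the failure. The function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1 = ad)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up, heavily unbalanced labels: 5 ad nodes among 100 nodes.
y_true = [1] * 5 + [0] * 95

# Model A never predicts "ad".
pred_a = [0] * 100
# Model B finds 4 of the 5 ads and raises 2 false alarms.
pred_b = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

acc_a = accuracy(y_true, pred_a)                 # 0.95
_, _, f1_a = precision_recall_f1(y_true, pred_a)  # 0.0
acc_b = accuracy(y_true, pred_b)                 # 0.97
_, _, f1_b = precision_recall_f1(y_true, pred_b)  # 8/11, about 0.727
```

The two models differ by only two points of accuracy, but their F1 scores are far apart, which is why F1 is the more informative comparison on unbalanced data.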

#7 (about 3 minutes)

Deploying machine learning models in a JavaScript environment

The team tackled deployment challenges by converting Python models to JavaScript, optimizing for latency by moving the model to a background script, and using TensorFlow.js.
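
One generic way to carry a tree model across the Python/JavaScript boundary is to serialize its structure as JSON and walk it with a small interpreter on the other side. The summary does not say the team did this, so treat the following as an assumption-laden sketch: the tree, its feature names, and its thresholds are invented, and `predict` stands in for the few lines of JavaScript that would perform the same walk.

```python
import json

# Hypothetical hand-written tree; a real one would come from a trained model.
TREE = {
    "feature": "attr_count", "threshold": 3,
    "left": {"leaf": 0},              # attr_count <= 3: not an ad
    "right": {
        "feature": "depth", "threshold": 5,
        "left": {"leaf": 1},          # many attributes, shallow node: ad
        "right": {"leaf": 0},
    },
}


def predict(tree, sample):
    """Walk the tree until a leaf is reached and return its label."""
    node = tree
    while "leaf" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


# Export for the JavaScript side, then reload to check the round trip.
exported = json.dumps(TREE)
restored = json.loads(exported)

label = predict(restored, {"attr_count": 5, "depth": 2})
```

Because the exported model is plain data, the JavaScript side needs no machine learning runtime to evaluate it, which can help with latency in a browser extension.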

#8 (about 5 minutes)

Answering questions on model circumvention and design choices

The speakers address audience questions regarding how ad companies might circumvent the model and the rationale behind their model experimentation process.
