Semantic Modeling for Content-Targeted Online Advertising

Author: Ankit Patil
Date: 2017-05-06
Report no: IIIT/TH/2017/27
Advisor: Vasudeva Varma

Abstract

In recent times, online advertising has become the prominent source of revenue for a major part of the Internet economy. Various modes of user interaction, such as search engines, social networking and QA systems, have evolved into different advertising channels like Sponsored Search, Contextual Advertising and Micro-blog Targeting. Ad-publishers and ad-networks need to be watchful of users’ interests when targeting them for brand or product promotion through these channels. Exploiting a user’s interest is of paramount importance when placing advertisements: an ad relevant to that interest has a higher probability of being clicked, which benefits all the entities involved. The content a user is browsing is one indication of what interests the user, and ad-networks use it for targeting. The field of online advertising that places ads by targeting content is known as ‘Content-targeted advertising’. The short text of ads and the unavailability of word distributions on search engines and social media add to the vocabulary difference between the user and the publisher/advertiser. This vocabulary gap is one of the major problems in retrieving relevant ads for the content being targeted: because of the vocabulary mismatch, we cannot retrieve all the ads relevant to the content. In this research work, our main focus was to bridge the ‘vocabulary gap’ between the ad text and the browsing content. In this thesis we have proposed different semantic solutions to the problem of vocabulary mismatch between the texts. The two texts (the ad and the content to be targeted) in their original forms do not share the same vocabulary, so we can consider them as two different syntactic spaces.
Measuring the relatedness between two texts with different vocabularies is difficult and imprecise. The proposed solutions transform the two texts into a common semantic space and then measure the similarity between them there, with higher precision than their original representations allow. We have proposed solutions for two categories of content-targeted advertising:
• Micro-blog Targeting: targeting ads relevant to micro-blogs on social media platforms.
• Contextual Advertising: retrieving relevant ads for a web page based on its content.
Both categories come with their own challenges. Micro-blogs are sparse in nature and contain slang, ungrammatical sentences and spelling mistakes, so retrieving ads for micro-blogging content is a hard problem. As the micro-blogging content is short and noisy and the ads are short too, there is a high degree of lexical mismatch between the micro-post and the ads. To bridge this lexical mismatch, we have proposed a ‘conceptual approach’ that transforms the content into a conceptual space representing its latent concepts. The conceptual distribution of the content is then used to measure the relatedness between the micro-blog and the ad. A web page, in contrast, has a lot of text that must be matched against the very short text of an ad; for this we have proposed a solution that works with the inherent latent topics of the text. This ‘topic models’ approach transforms the text into a distribution over predefined topics, and we then compare the topical distributions of the content and the ad text to measure the similarity between them. To build the topic model, we have used ODP (Open Directory Project), which has web pages for numerous categories like Finance, Automobile and Shopping. Our work on microblog targeting is among the initial works in the field. In this thesis, we have investigated some of the factors that are helpful when targeting microblogs.
We have also compared our proposed approaches with various state-of-the-art techniques, such as the vector-space model and language modeling (syntactic match), a taxonomy classification approach for semantic ad matching, and a machine learning approach. We empirically show that the proposed semantic models, and the ensembles of these models with syntactic models, perform better than all the baselines, with substantial and significant improvement in precision at all precision levels over the baseline models.
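
The idea of matching in a common semantic space can be illustrated with a minimal sketch. It is not the thesis's implementation: the word-to-topic lexicon, the three topic names taken from the ODP examples above, and the function names are all hypothetical, and a hand-written lookup table stands in for a topic model learned from ODP pages. The sketch shows only that two texts with no words in common can still have identical topical distributions.

```python
import math
from collections import Counter

# Hypothetical word-to-topic lexicon (an assumption for illustration only).
# A real system would learn topic distributions from a corpus such as ODP.
TOPIC_LEXICON = {
    "loan": "Finance", "mortgage": "Finance", "interest": "Finance", "bank": "Finance",
    "sedan": "Automobile", "car": "Automobile", "mileage": "Automobile",
    "discount": "Shopping", "sale": "Shopping", "coupon": "Shopping",
}
TOPICS = ["Finance", "Automobile", "Shopping"]


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,!?").lower() for w in text.split()]


def lexical_overlap(a, b):
    """Number of distinct words shared by the two texts (syntactic match)."""
    return len(set(tokenize(a)) & set(tokenize(b)))


def topic_distribution(text):
    """Map a text to a normalized distribution over the predefined topics."""
    counts = Counter(TOPIC_LEXICON[w] for w in tokenize(text) if w in TOPIC_LEXICON)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TOPICS)
    return [counts[t] / total for t in TOPICS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_ads(content, ads):
    """Order ads by topical similarity to the content, most similar first."""
    content_topics = topic_distribution(content)
    return sorted(ads, key=lambda ad: cosine(content_topics, topic_distribution(ad)),
                  reverse=True)


if __name__ == "__main__":
    content = "Thinking about a new sedan with better mileage"
    ads = ["Low interest mortgage from your bank", "Test drive our latest car today"]
    for ad in rank_ads(content, ads):
        score = cosine(topic_distribution(content), topic_distribution(ad))
        print(f"{score:.2f}  overlap={lexical_overlap(content, ad)}  {ad}")
```

In this toy example the car ad shares no word with the content, so a purely syntactic match would miss it, yet it ranks first once both texts are projected onto the topic space.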

Full thesis: pdf

Centre for Search and Information Extraction Lab

IIIT Hyderabad Publications
