
Combating Good Word Attacks on Statistical Spam Filters with Multiple Instance Learning

by: Yan Zhou, Z Jorgensen, M Inge
Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on, Vol. 2 (2007), pp. 298-305.



Abstract

Statistical spam filters are known to be vulnerable to adversarial attacks. One such adversarial attack, known as the good word attack, thwarts spam filters by appending to spam messages sets of "good" words, which are common in legitimate e-mail but rare in spam. We present a counter-attack strategy that first attempts to differentiate spam from legitimate e-mail in the input space, by transforming each e-mail into a bag of multiple segments, and subsequently applies multiple instance logistic regression on the bags. We treat each segment in the bag as an instance. An e-mail is classified as spam if at least one instance in the corresponding bag is spam, and as legitimate if all the instances in it are legitimate. We show that a spam filter using our multiple instance counter-attack strategy stands up better to good word attacks than its single instance counterpart and the commonly practiced Bayesian filters.
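
The bag-level decision rule described in the abstract (spam if at least one segment is spam, legitimate otherwise) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the equal-length splitting in split_into_segments, the names classify_bag and toy_instance_classifier, the SPAM_WORDS list, and the 0.5 threshold. In particular, the toy keyword rule only stands in for the multiple instance logistic regression the authors actually use.

```python
from typing import Callable, List

# Hypothetical word list, for illustration only.
SPAM_WORDS = {"viagra", "lottery", "winner"}


def split_into_segments(tokens: List[str], n_segments: int = 4) -> List[List[str]]:
    """Split tokens into contiguous, roughly equal segments (assumed scheme)."""
    if not tokens:
        return []
    n = min(n_segments, len(tokens))
    size, extra = divmod(len(tokens), n)
    segments, start = [], 0
    for i in range(n):
        # The first `extra` segments take one additional token.
        end = start + size + (1 if i < extra else 0)
        segments.append(tokens[start:end])
        start = end
    return segments


def toy_instance_classifier(segment: List[str]) -> bool:
    """Stand-in instance classifier: spam if at least half the words are spammy."""
    hits = sum(1 for word in segment if word.lower() in SPAM_WORDS)
    return hits / len(segment) >= 0.5


def classify_bag(segments: List[List[str]],
                 is_spam_instance: Callable[[List[str]], bool]) -> bool:
    """Bag rule from the abstract: spam if any instance in the bag is spam."""
    return any(is_spam_instance(segment) for segment in segments)


# A spam message padded with "good" words, as in a good word attack.
spam_part = "viagra lottery winner viagra".split()
good_words = ("meeting agenda project report schedule budget "
              "team review lunch notes thanks regards").split()
attacked = spam_part + good_words

# Treating the whole message as one instance dilutes the spam signal...
single_instance_verdict = toy_instance_classifier(attacked)
# ...while the bag of segments still contains one clearly spammy instance.
bag_verdict = classify_bag(split_into_segments(attacked), toy_instance_classifier)
legit_verdict = classify_bag(split_into_segments(good_words), toy_instance_classifier)
```

In this toy setting the appended good words pull the whole-message spam ratio below the threshold, but they cannot dilute the segment that holds the original spam content, which is the intuition behind classifying at the bag level.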

