Filtrage Web / Web Filter
> Theory

This section's articles

Scoring/Statistical approach (Theory)

Wednesday 26 April 2006 by ClarK
The best way to understand this part is to read the article: A Statistical Approach to the Spam Problem. Otherwise this would amount to a translation of a translation... > continue

Learning (Theory)

Wednesday 26 April 2006 by ClarK
Learning will use the two steps seen before (tokenization and storage). The aim is to have a representative base of pages from both the allowed and the forbidden categories. We need a relatively large number of these pages for the learning to be useful, and so that the data stored can cover a correct (...) > continue
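As a rough illustration of the learning step described in this excerpt, the sketch below walks over a set of labelled pages and accumulates token occurrence counts per category. It is only a sketch under stated assumptions: the `simple_tokens` helper, the `learn` function, and the category labels "allowed" and "forbidden" as dictionary keys are hypothetical names chosen for this example, and the regex tokenizer is a stand-in for the real tokenization step.

```python
import re
from collections import Counter


def simple_tokens(text):
    """Hypothetical stand-in tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def learn(labelled_pages):
    """Accumulate token occurrences for each category.

    labelled_pages is an iterable of (category, page_text) pairs, where
    category is "allowed" or "forbidden" (labels assumed for this sketch).
    Returns (counts, pages_seen): per-category token counters and the
    number of pages learned in each category.
    """
    counts = {"allowed": Counter(), "forbidden": Counter()}
    pages_seen = Counter()
    for category, text in labelled_pages:
        counts[category].update(simple_tokens(text))
        pages_seen[category] += 1
    return counts, pages_seen
```

Keeping the number of pages per category next to the token counts makes it easy to check whether both categories are represented by a comparable amount of material.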

Storage (Theory)

Wednesday 26 April 2006 by ClarK
When retrieving tokens from each web page, we need to store them: temporarily, during the learning step as well as during the scoring step, in order to process them; and permanently, when storing tokens and their number of occurrences (when building the learning database). The best way is to store them (...) > continue
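The two kinds of storage mentioned in this excerpt could look roughly like the sketch below: an in-memory counter for the temporary step, and a small SQLite table for the permanent learning database. This is one possible choice made for illustration, not necessarily the approach the full article recommends; the table layout and the function names (`count_tokens`, `open_db`, `save_counts`, `load_count`) are assumptions.

```python
import sqlite3
from collections import Counter


def count_tokens(tokens):
    """Temporary storage: token -> number of occurrences, kept in memory."""
    return Counter(tokens)


def open_db(path=":memory:"):
    """Permanent storage: open (or create) the learning database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        " token TEXT NOT NULL,"
        " category TEXT NOT NULL,"
        " occurrences INTEGER NOT NULL,"
        " PRIMARY KEY (token, category))"
    )
    return conn


def save_counts(conn, category, counts):
    """Add the in-memory counts to the stored totals for one category."""
    with conn:  # commits on success, rolls back on error
        for token, n in counts.items():
            conn.execute(
                "INSERT OR IGNORE INTO tokens VALUES (?, ?, 0)",
                (token, category),
            )
            conn.execute(
                "UPDATE tokens SET occurrences = occurrences + ?"
                " WHERE token = ? AND category = ?",
                (n, token, category),
            )


def load_count(conn, token, category):
    """Return the stored occurrence count, or 0 for an unknown token."""
    row = conn.execute(
        "SELECT occurrences FROM tokens WHERE token = ? AND category = ?",
        (token, category),
    ).fetchone()
    return row[0] if row else 0
```

Passing a file path instead of `":memory:"` to `open_db` keeps the learning database between runs.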

Tokenization (Theory)

Wednesday 26 April 2006 by ClarK
The principle of tokenization is to cut the HTML code of a web page into significant parts (tokens). In order to determine what is significant in the content of a web page, we need to analyse it. Example: http://www.cplair.com/ > continue
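To make the idea of cutting HTML into tokens more concrete, here is a minimal sketch built on Python's standard `html.parser` module. What counts as "significant" is an assumption made for this example: visible text plus the values of the `href`, `src`, `alt` and `title` attributes, with the contents of `script` and `style` elements ignored. The `PageTokenizer` class and `tokenize` function are hypothetical names, not taken from the article.

```python
import re
from html.parser import HTMLParser

# A token is assumed here to be a run of lowercase letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


class PageTokenizer(HTMLParser):
    """Collect word tokens from page text and from selected attributes."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        for name, value in attrs:
            # Attribute choice is an assumption of this sketch.
            if name in ("href", "src", "alt", "title") and value:
                self.tokens.extend(TOKEN_RE.findall(value.lower()))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.tokens.extend(TOKEN_RE.findall(data.lower()))


def tokenize(html):
    """Return the list of tokens found in an HTML string, in page order."""
    parser = PageTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens
```

Splitting URLs into their parts lets host names and path words contribute as tokens alongside the visible text, which may or may not match the choices made in the full article.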

