Filtrage Web / Web Filter

Learning

Wednesday 26 April 2006 by ClarK

Learning uses the two steps described earlier (tokenization and storage). The aim is to build a representative base of pages from both the allowed and the forbidden categories. This base needs to be fairly large, both for the learning to be useful and for the stored data to cover a reasonable range of the different tokens that can be found on the Web. Once it is built, we can give a score to an unknown web page.

These representative pages are then tokenized, and the tokens are stored in a database according to their type (tags, domain names, words), together with their number of occurrences. A token that appears several times in one page is counted only once for that page, but it is counted again for every other page in which it appears, so its count is the number of pages that contain it.
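A minimal sketch of this counting step, in Python. It assumes the tokenizer returns each page as a list of `(type, value)` tuples and uses an in-memory dictionary in place of the database. Keeping separate counts for the allowed and forbidden pages is also an assumption, as are the function name `learn` and the `"allowed"`/`"forbidden"` labels, which are hypothetical.

```python
from collections import Counter, defaultdict


def learn(pages):
    """Count, per category and per token type, how many pages contain each token.

    Hypothetical helper. `pages` is an iterable of (category, tokens) pairs,
    where `tokens` is a list of (type, value) tuples such as
    ("tag", "img"), ("domain", "example.org") or ("word", "news").
    An in-memory dict stands in for the database (assumption).
    """
    # counts[category][token_type][value] -> number of pages containing the token
    counts = defaultdict(lambda: defaultdict(Counter))
    for category, tokens in pages:
        # set() drops repeats, so a token is counted at most once per page.
        for token_type, value in set(tokens):
            counts[category][token_type][value] += 1
    return counts


if __name__ == "__main__":
    training = [
        ("allowed", [("word", "news"), ("word", "news"), ("tag", "table")]),
        ("allowed", [("word", "news"), ("domain", "example.org")]),
        ("forbidden", [("word", "casino"), ("word", "casino")]),
    ]
    counts = learn(training)
    # "news" appears twice in the first page but is counted once there,
    # then once more for the second page.
    print(counts["allowed"]["word"]["news"])      # 2
    print(counts["forbidden"]["word"]["casino"])  # 1
```

Counting pages rather than raw occurrences keeps one page that repeats a word many times from dominating the stored figures, which matches the rule described above.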


