L-Files - Netting the good and the bad
The L-Files this edition looks at two contrasting uses of the internet – one building community by sharing knowledge, the other using ‘the dark side’ of the force to undermine functionality.
Spam Spam Spam Spam
Each day I now receive more than 600 unsolicited automated emails, or spam. A survey by MessageLabs published last month confirmed that spam is on the increase, now accounting for 75% of all email activity. No matter how secret you try to keep your email address, spammers will eventually find you. Spam filtering techniques are all that prevent email from losing any practical utility.
Email filtering is used to separate legitimate messages from junk email. False positives, in which a legitimate message is rejected, can be disastrous. Traditional filtering techniques depend on comparing the sender of a message with a known blacklist of spammers. Another method is to reject messages that contain certain keywords (such as ‘Nigeria’ or ‘enlarge’). Such techniques, which are often used by ISPs, trap only about 20% of spam, and increase the risk of false positives.
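To see why these traditional techniques are prone to false positives, here is a minimal sketch of a blacklist-plus-keyword filter. It is an illustration only, not how any particular ISP does it: the blacklisted address and the function name `is_rejected` are invented for the example, and the keywords are the two mentioned above.

```python
import re

# Hypothetical blacklist entry, invented for this illustration.
BLACKLISTED_SENDERS = {"offers@bulk-mailer.example"}

# The two example keywords mentioned in the text.
BLOCKED_KEYWORDS = {"nigeria", "enlarge"}


def is_rejected(sender, body):
    """Return True if the message would be rejected as junk."""
    # Rule 1: reject anything from a known spammer.
    if sender.lower() in BLACKLISTED_SENDERS:
        return True
    # Rule 2: reject anything that contains a blocked keyword.
    words = set(re.findall(r"[a-z]+", body.lower()))
    return not BLOCKED_KEYWORDS.isdisjoint(words)


# A legitimate message is rejected simply because it mentions Nigeria:
# this is the false positive the text warns about.
print(is_rejected("friend@example.org", "Our field trip to Nigeria is confirmed"))
```

The filter has no way of knowing that the second message is genuine; it sees only the keyword.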
The increasing sophistication of spamming techniques has led to the development of tools that use artificial intelligence to learn to differentiate accurately between ‘good’ email and spam. Such ‘statistical filters’ are trained by each individual user, who initially identifies by hand which messages are junk. To simplify a rather complicated process, all the words that are used in legitimate emails are stored in a list of good words, and all the words used in spam emails are stored in a list of bad words. Each new message is analysed to determine the ratio of good words to bad words, and any message whose ratio falls below a chosen threshold is classified as junk. More sophisticated techniques analyse word pairs and the context in which words are used. As long as the user continues to correct mistakes made by the filter, it is capable of keeping up with new tricks that spammers may adopt (such as including a passage of legitimate text or random words among the hard-sell). Such a trained system reduces my load of 600 junk emails a day to only 10 or so that I have to deal with, with very few false positives.
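The simplified description above can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the algorithm used by any real mail program: the class name `WordRatioFilter`, the default threshold of 0.5 and the training messages are all invented for the illustration, and real statistical filters weigh word probabilities far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


class WordRatioFilter:
    """Toy filter: keeps counts of 'good' and 'bad' words from user training."""

    def __init__(self, threshold=0.5):
        self.good = Counter()       # words seen in legitimate mail
        self.bad = Counter()        # words seen in spam
        self.threshold = threshold  # assumed cut-off, chosen arbitrarily

    def train(self, text, is_spam):
        """Record the words of a message the user has labelled."""
        (self.bad if is_spam else self.good).update(tokenize(text))

    def good_fraction(self, text):
        """Share of recognised words that were seen more often in good mail."""
        good_hits = bad_hits = 0
        for word in tokenize(text):
            g, b = self.good[word], self.bad[word]
            if g > b:
                good_hits += 1
            elif b > g:
                bad_hits += 1
        known = good_hits + bad_hits
        # With no evidence either way, stay neutral rather than guess.
        return 0.5 if known == 0 else good_hits / known

    def is_spam(self, text):
        """Classify as junk when the good-word share falls below the threshold."""
        return self.good_fraction(text) < self.threshold


spam_filter = WordRatioFilter()
spam_filter.train("Meeting agenda for the landcare committee on Tuesday", is_spam=False)
spam_filter.train("Enlarge your savings, urgent transfer from Nigeria", is_spam=True)

print(spam_filter.is_spam("urgent transfer from Nigeria"))      # junk
print(spam_filter.is_spam("agenda for the committee meeting"))  # legitimate
```

Correcting a mistake is just another call to `train` with the right label, which is how such a filter keeps up as spammers change their wording.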
Paul Graham has an analysis of anti-spam techniques and a list of available statistical filters at www.paulgraham.com/antispam.html. Joe Kissel has an introduction to the Apple mail filter at www.tidbits.com/tb-issues/TidBITS-731.html#lnk2
Open knowledge: Wikipedia - the free encyclopedia
Wikipedia is a fantastic resource for learning about almost anything and has become the first port of call for much of our web research. Wikipedia is an encyclopedia (and textbook) with a difference. It embraces open-source concepts and encourages the sharing of knowledge. It currently contains more than 285,000 articles, all of which are contributed by the readers. If you have expertise on a certain subject, you can publish it instantly in Wikipedia. If you are reading an article by someone else that contains an error, or could be expressed better, you can edit it on the spot. The changes you make are incorporated into the article immediately.
Letting anyone edit any article sounds like anarchy. In practice, Wikipedia is increasingly a well-balanced, accurate reference. Readers can identify all changes made to articles and by whom, and the instant peer review process quickly weeds out errors and biases. There is also an excellent system for linking related articles and concepts. In many fields, the clarity of explanation and depth of information are unsurpassed. In many other fields, Wikipedia is strangely silent. (There was no article on Caroline Chisholm, so my daughter Alex added one.) The whole endeavour is an exciting work in progress.
Wikipedia is non-commercial and advertising-free. The content of Wikipedia is ‘copyleft’. That is to say, Wikipedia content can be copied, modified, and redistributed so long as the new version grants the same freedoms to others and acknowledges the authors of the Wikipedia article used (usually by a link).
The community that builds Wikipedia is very aware of criticisms directed at the project and deals with these comprehensively and honestly on the site. The ‘In The News’ section of the home page provides a background to breaking international stories and personalities. Wikipedia reflects many of the values that make the internet an exciting community and it is well worth a visit.