User:AlekseyBot

From Wikipedia, the free encyclopedia
Deactivated
This bot is no longer active on Wikipedia.

This bot reads Wikipedia to build collaborative filtering models to aid link disambiguation. Right now, it should not be making any edits.

Algorithm description

Collaborative filtering is a technique designed to predict a user's unknown preferences based on that user's previous preferences and the preferences of other users. Similar systems are used to predict whether a person will like a particular movie or product.

This concept can be applied to ambiguous links on Wikipedia. In this context, consider the links from a disambiguation page to be the possible targets for an ambiguous link. To build a model, we look at all the articles that currently (and unambiguously) link to a target. We call these articles pages; they fill the same role that users do in the examples above. Next, we look at all the links present in each page. We call each article linked to from a page an item; items fill the role of movies or products. A page linking to an item is considered a vote, or preference, for that item. We expect that pages linking to a specific target will have similar "preferences", meaning that they also link to a similar set of items. When presented with a new page that has an ambiguous link to one of our targets, we likewise expect that if the new page links to a substantially similar set of items as the pages that link to a particular target, the new page probably prefers that target as well.
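The idea can be illustrated with a minimal Python sketch. It is not the bot's actual code: the description above does not say how similarity between pages is measured, so the sketch assumes Jaccard similarity over sets of linked items and scores each target by the mean similarity to the pages that already link to it. The function names and the sample data (page names, item names, and the two targets) are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of linked items (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_targets(new_page_items, pages_by_target):
    """Score each candidate target by the mean similarity between the new
    page's items and the items of every page already linking to that target."""
    scores = {}
    for target, pages in pages_by_target.items():
        if not pages:
            scores[target] = 0.0
            continue
        sims = [jaccard(new_page_items, items) for items in pages.values()]
        scores[target] = sum(sims) / len(sims)
    return scores


def best_target(new_page_items, pages_by_target):
    """Return the target whose linking pages look most like the new page."""
    scores = score_targets(new_page_items, pages_by_target)
    return max(scores, key=scores.get)


# Hypothetical data: target -> {page name -> set of items the page links to}.
EXAMPLE_PAGES_BY_TARGET = {
    "Mandarin Chinese": {
        "Page A": {"China", "Language", "Beijing"},
        "Page B": {"Language", "Taiwan", "China"},
    },
    "Mandarin orange": {
        "Page C": {"Citrus", "Fruit", "Orange (fruit)"},
        "Page D": {"Fruit", "Tree", "Citrus"},
    },
}

if __name__ == "__main__":
    # Items linked from a new page that contains an ambiguous link.
    new_page = {"China", "Language", "Dialect"}
    print(score_targets(new_page, EXAMPLE_PAGES_BY_TARGET))
    print(best_target(new_page, EXAMPLE_PAGES_BY_TARGET))
```

In this sample the new page shares two of its three items with each page that links to the first target and none with the pages that link to the second, so the first target scores higher. A real model would also need a confidence threshold below which the link is left for a human editor.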

Bot description

This bot implements a system like the one described above. So far, initial testing has been conducted on the Mandarin disambiguation page and has given the results summarized here. Official bot status would reduce the time needed to build a model (which is transfer-intensive) and would allow formal trials to determine whether the system could eventually disambiguate some links automatically with an acceptably small error rate.