The main job of an information retrieval (IR) system is to fetch documents from a data source that meet the user's requirement. But how do you ensure that the document the system fetches in response to a query is the most relevant one from the user's perspective? An IR system needs a yardstick to judge documents against a user request and to measure its own success or failure.
Term frequency
Traditionally, the content of each candidate document is compared with the keywords supplied by the user in the query. It is assumed that the frequency of a word (i.e. a keyword) in a document furnishes a useful measure of that word's significance. For instance, suppose a user provides the keyword "Himalaya". The most relevant document to return is then the one in which the term "Himalaya" occurs the most times. This measure is called term frequency.
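As a rough illustration of the idea, here is a minimal Python sketch that counts how often a keyword occurs in each document and ranks the documents by that count. The function name term_frequency, the sample documents and the simple whitespace tokenizer are all assumptions made for this example, not part of any particular IR system.

```python
from collections import Counter

def term_frequency(term, document):
    """Count how many times `term` occurs in `document` (case-insensitive)."""
    # Assumption: splitting on whitespace is a good enough tokenizer here.
    tokens = document.lower().split()
    return Counter(tokens)[term.lower()]

# Hypothetical documents, made up for illustration.
docs = {
    "doc_a": "The Himalaya range spans five countries and the Himalaya is young",
    "doc_b": "Trekking routes near the Himalaya foothills",
    "doc_c": "A guide to river deltas",
}

# Rank documents so the one with the highest term frequency comes first.
ranked = sorted(
    docs,
    key=lambda name: term_frequency("Himalaya", docs[name]),
    reverse=True,
)

for name in ranked:
    print(name, term_frequency("Himalaya", docs[name]))
```

With these sample documents, doc_a ranks first because "Himalaya" appears in it twice. A real system would also need to handle punctuation and document length, which this sketch ignores.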
Inverse document frequency
The next question is how to handle keywords that occur in most of the documents. Since every word in a language can be used as a keyword by one user or another, it makes sense to weight keywords according to their importance. A measure called inverse document frequency is used to achieve this. It asserts that the value of a keyword falls as the number of documents in which it occurs rises. A common formulation takes the log of the total number of documents divided by the number of documents containing the keyword.
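The following Python sketch shows one common way to compute this, assuming the formulation log(N / n), where N is the total number of documents and n is the number that contain the keyword. The function name inverse_document_frequency, the sample corpus and the choice to return 0.0 for a keyword found in no document are assumptions made for illustration.

```python
import math

def inverse_document_frequency(term, documents):
    """Return log(N / n) for `term`, where N = len(documents) and
    n = number of documents containing the term."""
    term = term.lower()
    # Assumption: whitespace tokenization, case-insensitive matching.
    containing = sum(1 for text in documents if term in text.lower().split())
    if containing == 0:
        # Assumption: a keyword seen in no document gets zero weight.
        return 0.0
    return math.log(len(documents) / containing)

# Hypothetical corpus, made up for illustration.
corpus = [
    "the himalaya range is the highest on earth",
    "the river flows to the sea",
    "the city is crowded",
    "the himalaya foothills are green",
]

for word in ("the", "himalaya", "river"):
    print(word, inverse_document_frequency(word, corpus))
```

In this corpus "the" appears in every document and so gets a weight of zero, while "river", which appears in only one document, gets the highest weight of the three.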
Monday, September 10, 2007