Information Retrieval Today: September 2007

Tuesday, September 11, 2007

Pre-processing: The First Step in IR

Pre-processing

A user provides one or more words as input to an IR system. Given the huge number of words in any language, not all words are equally useful in answering a user query. To reduce the work done during retrieval, the user-provided words are screened and refined before searching. This process is called pre-processing, and it consists of the following activities.

Tokenisation

Tokenisation identifies the different elements in the user query and separates them into tokens. Tokens are the simplest identifiable language elements, such as a word or a delimiter. This is done because the tokens thus generated are used for matching against index terms in the data source.
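
As a rough illustration of the idea, here is a minimal Python sketch of a tokeniser. The function name tokenise and the rule used (runs of letters or digits become word tokens, each punctuation mark becomes its own token, everything is lowercased) are assumptions made for this example; real IR systems may tokenise quite differently.

```python
import re

def tokenise(query):
    """Split a query into lowercase word tokens and single-character delimiter tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", query.lower())

tokens = tokenise("Who is the president of India?")
print(tokens)  # ['who', 'is', 'the', 'president', 'of', 'india', '?']
```

Lowercasing at this stage is a design choice that makes later matching against index terms case-insensitive.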

Stop word removal

Stop words are words that appear in a query but are expected to add little value to the retrieval process. For instance, in a query such as "who is the president of India?", the most important words are "president" and "India". The other words (who, is, the, of) do not significantly help in retrieval. Hence they are removed from the list of words sent on for retrieval. These words are called stop words, and the process of eliminating them from the search is called "stop word removal".
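
The example query above can be filtered with a few lines of Python. This is only a sketch: the stop list STOP_WORDS below is a tiny hypothetical one made up for this illustration, whereas practical systems use much longer, carefully chosen lists.

```python
# A tiny, illustrative stop list (an assumption for this example only).
STOP_WORDS = {"who", "is", "the", "of", "a", "an", "in", "and", "to"}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

query_tokens = ["who", "is", "the", "president", "of", "india"]
print(remove_stop_words(query_tokens))  # ['president', 'india']
```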

Stemming

Another category of words takes different forms that essentially mean the same thing. Using all the forms of such terms in retrieval does not significantly improve search results, but it does add to the workload. To reduce the work, all such words are reduced to their root form. For instance, swimmer, swimming and swim are reduced to one word: swim. This process is called stemming.
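
To make the swim example concrete, here is a deliberately naive suffix-stripping sketch in Python. The suffix list and the "undouble the final consonant" rule are assumptions chosen so that this one example works; this is not a standard stemming algorithm, and it will mis-stem many ordinary words (for example, it would turn "water" into "wat").

```python
# Toy suffix list, tried in this order (an assumption for illustration).
SUFFIXES = ("ing", "er", "s")

def stem(word):
    """Strip one known suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        root_len = len(word) - len(suffix)
        if word.endswith(suffix) and root_len >= 3:
            root = word[:root_len]
            # "swimm" -> "swim": drop one of two identical trailing consonants.
            if len(root) > 3 and root[-1] == root[-2] and root[-1] not in "aeiou":
                root = root[:-1]
            return root
    return word  # no suffix matched, so the word is left unchanged

print([stem(w) for w in ["swimmer", "swimming", "swim"]])  # ['swim', 'swim', 'swim']
```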

Monday, September 10, 2007

Measures of Information Retrieval Success

The main job of an information retrieval system is to fetch a document from the data source that meets the requirement of the user. But how do you ensure that the document the system fetches in response to a query is the most relevant one from the user's perspective? An IR system therefore needs a yardstick by which to serve a user request and measure its success or failure.

Term frequency
Traditionally, the content of a retrieved document is compared with the keywords supplied by the user in the query. It is assumed that the frequency of a word (i.e. a keyword) in a document furnishes a useful measure of the word's significance. For instance, suppose a user provides the keyword "Himalaya"; the most relevant document to return is then taken to be the one in which the term "Himalaya" occurs the most times. This measure is called term frequency.
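
A small Python sketch of ranking by term frequency follows. The three documents and the helper name term_frequency are invented for this illustration, and raw counts are used without any normalisation for document length, which is a simplifying assumption.

```python
from collections import Counter
import re

def term_frequency(term, document):
    """Count how many times `term` occurs in `document` (case-insensitive)."""
    words = re.findall(r"\w+", document.lower())
    return Counter(words)[term.lower()]

# Hypothetical mini-collection, made up for this example.
documents = {
    "doc1": "The Himalaya range is home to Mount Everest.",
    "doc2": "Himalaya glaciers feed rivers; the Himalaya shapes the monsoon.",
    "doc3": "The Ganges flows across the northern plains.",
}

# Rank documents so that the one mentioning the keyword most often comes first.
ranked = sorted(documents,
                key=lambda d: term_frequency("Himalaya", documents[d]),
                reverse=True)
print(ranked)  # ['doc2', 'doc1', 'doc3']
```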

Inverse document frequency
The next question is how to treat keywords that occur in most of the documents. Since every word in a language can be used as a keyword by one user or another, it makes sense to classify keywords into categories indicating their importance. A measure called inverse document frequency is used to achieve this. It asserts that the value of a keyword varies inversely with the log of the number of documents in which it occurs. A keyword found in almost every document therefore counts for little, while a rare keyword counts for more.
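
One common way to write this measure down, which is an assumption here since the post does not give a formula, is idf(t) = log(N / n_t), where N is the total number of documents and n_t is the number of documents containing the term t. A minimal Python sketch under that assumption, using an invented four-document collection:

```python
import math

def inverse_document_frequency(term, documents):
    """idf = log(N / n_t); returns 0.0 if the term occurs in no document."""
    term = term.lower()
    containing = sum(1 for text in documents if term in text.lower().split())
    if containing == 0:
        return 0.0  # convention chosen for this sketch to avoid division by zero
    return math.log(len(documents) / containing)

# Hypothetical collection, made up for this example.
docs = [
    "the himalaya range",
    "the ganges river",
    "the thar desert",
    "the himalaya glaciers",
]

for word in ("the", "himalaya", "ganges"):
    print(word, round(inverse_document_frequency(word, docs), 3))
```

In this collection "the" occurs in every document and scores 0, while "ganges", which occurs in only one document, scores highest.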

Saturday, September 8, 2007

Elements of Information Retrieval

Information retrieval is the process of accessing relevant information from a given data source. However, the item or unit of information a user is interested in may vary from user to user and from time to time.

Document
Historically, information retrieval generally meant retrieving a relevant document. For example, a document might be a research paper, a patent or a government gazette. However, the unprecedented growth in the volume and variety of data available in digital format is changing the nature of document retrieval, and even the definition of a document is undergoing a drastic change.
Query
The information retrieval process is initiated when the user launches a 'query' to an IR system. The query could consist of a word or a phrase. Capturing the query and converting it into a format suitable for searching the data source is the first job of an IR system, and this activity is itself a major chunk of the work in an IR system.

Query processing
The IR system checks the entire source for any documents that match the query and returns all such matched documents to the user. Such a list of documents is called the "search result".
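
A minimal Python sketch of this matching step is shown below. It assumes that a document "matches" when it contains every query term, and it uses a simple word-to-documents lookup table; both are choices made for this illustration rather than anything the post prescribes, and the documents are invented.

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query_terms):
    """Return the sorted ids of documents containing every query term."""
    postings = [index.get(term.lower(), set()) for term in query_terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical data source, made up for this example.
documents = {
    1: "president of india",
    2: "prime minister of india",
    3: "president of france",
}
index = build_index(documents)
print(search(index, ["president", "india"]))  # [1]
```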
Relevance
The user expects the search results to be useful. The extent to which the search results match the expectation of the user is called "relevance". The success of an IR system hinges on its ability to return more and more relevant documents for every user.

Friday, September 7, 2007

What is Information Retrieval?

In broad terms, information retrieval refers to the retrieval of unstructured data. In practical terms, it means the retrieval of text documents from a repository such as a library collection. The term IR was coined at a time when text documents were the only or primary mode of information storage. These included documents, books and papers, to name a few. For this reason, most of the literature available today is concerned with the retrieval of documents, although things are changing.

The nature and scope of the IR domain have changed in recent times because of the wide variety of media in which data can now be stored. Sophisticated computing devices allow users to create and store different media formats, and growing numbers of users have access to such devices. In addition, with the increasing digitization of every kind of media, such as photos and music, it is imperative for the IR domain to develop newer approaches for searching non-textual items such as pictures.

Consequently, new specialised domains such as video retrieval, music retrieval and image retrieval are emerging, and systems for them are being developed and deployed.

Most importantly, the size, scope and nature of the ever-expanding web have posed a new challenge for the IR community. The uncontrolled and open nature of the web, and the innumerable formats of data available on it, are challenging traditional IR approaches. This has led to a new domain called web information retrieval (WIR). The famous search engines are examples of WIR.

In summary, the IR community has new and great challenges ahead. Mobile messages and emails are also becoming part of a larger definition of data, and such a corpus of multimedia data is the next frontier waiting for the IR community.

Wednesday, September 5, 2007

The Role of Information Retrieval

Information is the most potent tool of the digital age. It has become the dynamic, life-giving element in every business. Without access to good information, the 'other sources of production' remain resources and never get deployed in a useful, gainful and productive manner. It is said that the way an enterprise uses its information resources differentiates it from its competitors. Information has become the structural element of the new-age enterprise.

Information retrieval is the process that unlocks the hidden power of an information system. Thankfully, we are blessed with a range of information retrieval systems that meet the needs of everyone in an enterprise.

To begin with, an individual information worker can use a desktop search system to organize the contents of his or her computer. It is an everyday experience that we spend a lot of productive time during a working day 'identifying' and 'locating' the item of information we need to complete the task at hand. We have seen that item of information somewhere, but cannot find it again. Desktop information retrieval systems, commonly called "desktop search engines", address this problem well.

Sharing the information known to individual employees among all employees is bound to enhance the overall productivity of an enterprise. Such sharing mechanisms avoid much repeated work. Enterprise search systems come in handy for companies in organizing enterprise information, usually stored on their intranets, in a manner that makes it easy to access and share company information across the entire workforce.

However, the most powerful information retrieval tools at our disposal are search engines. They unlock the wealth of information available across the world.

In summary, information retrieval systems and tools help us explore the unlimited information resources that surround us everywhere.

Why Information Retrieval?

Today we have the information sources, knowledge, experience, tools and technologies needed for the successful practice of information retrieval. Information retrieval skills are essential for individual and institutional survival and are in great demand. This blog covers the entire spectrum of information retrieval: underlying principles, successful practices, advanced aspects and contemporary tools that support and enhance information retrieval practice.

Information retrieval is the process of locating and extracting a useful item of information from within an information source. The source could be a personal desktop, a corporate intranet, or a digital library to which you or your institution subscribes.

However, the most important, rapidly growing and widely accessible information source for a large population is the World Wide Web, or simply the web.

Successful practice of information retrieval on the web is an essential ingredient in the overall success of individuals as well as institutions. For instance, for a student the web offers an enormous source of learning resources; for an entrepreneur it is a place to spot new business trends and emerging business ideas, a mine for finding new partners, and even a playing field for testing new ideas or ventures. For an enterprise, the information retrieval skills of its staff help in monitoring new business possibilities, current customers' opinions and emerging customer trends.

Even a common man must understand the power of the information retrieval process and develop the skills to explore and exploit the abundant information resources that surround him.

Every new age demands a new set of skills for survival and success, and even to stand apart from the rest of the crowd. For the contemporary age, that skill set is information retrieval.
