PIM - personal information manager

What is a Personal Information Manager (PIM)?

Soon after starting to use the Web, every user encounters the same problem -- finding a way of archiving the locations of useful resources for future reference. This was recognized early in the design of Web software, so all browsers contain some mechanism for storing lists of interesting resources. Such lists are known as hotlists, or bookmarks. Originally such lists were simple flat files. More recently, browsers have supported hierarchical lists, in which items on similar topics can be grouped into a folder, a folder can contain subfolders, and so on. This allows for increased flexibility in the storage of bookmarked entries.
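A hierarchical bookmark list is essentially a tree of folders, each holding bookmarks and further subfolders. The minimal Python sketch below models that structure; the class names `Bookmark` and `Folder`, the `walk` helper, and the example URLs are hypothetical and do not reflect any particular browser's bookmark file format.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Bookmark:
    title: str
    url: str


@dataclass
class Folder:
    name: str
    bookmarks: list[Bookmark] = field(default_factory=list)
    subfolders: list["Folder"] = field(default_factory=list)


def walk(folder: Folder, path: tuple[str, ...] = ()) -> Iterator[tuple[str, Bookmark]]:
    """Yield (folder path, bookmark) for every bookmark in the tree, depth first."""
    here = path + (folder.name,)
    for bm in folder.bookmarks:
        yield "/".join(here), bm
    for sub in folder.subfolders:
        yield from walk(sub, here)


# Hypothetical example tree: a folder, a subfolder under it, and so on.
root = Folder("Bookmarks", subfolders=[
    Folder("Science", subfolders=[
        Folder("Physics", bookmarks=[Bookmark("Physics preprints", "https://example.org/physics")]),
    ]),
    Folder("Tools", bookmarks=[Bookmark("HTML validator", "https://example.org/validator")]),
])

for folder_path, bm in walk(root):
    print(folder_path, "->", bm.title)
```

Each bookmark lives at exactly one path in this model, which is the limitation discussed further on.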

The names or titles chosen for the entries are, by default, taken from the content of the TITLE element in the HTML document being archived. Thus, to a large extent, the cataloging of the entries is determined by the author of the document, not by the person creating the bookmark. Users can modify the titles associated with their bookmarks, but my observations show that this is rarely done.
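The default naming rule can be illustrated with Python's standard `html.parser` module. This sketch takes the text of the first TITLE element and falls back to the URL when there is none, the case that arises for images, FTP, and mail URLs. `TitleParser` and `default_bookmark_name` are hypothetical names for illustration, not part of any browser.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TITLE> and <title> both match.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> str:
        # Collapse runs of whitespace, as the title is shown on one line.
        return " ".join("".join(self._parts).split())


def default_bookmark_name(url: str, html: str) -> str:
    """Use the TITLE text if present, otherwise fall back to the URL."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title or url


print(default_bookmark_name("https://example.org/a",
                            "<html><head><TITLE> My  Page </TITLE></head></html>"))  # My Page
print(default_bookmark_name("ftp://example.org/pub/logo.gif", ""))  # ftp://example.org/pub/logo.gif
```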

Folder titles must be selected by the user: in general, users choose folder titles that associate well with the folder content.

These methods of archiving work well, provided the lists do not grow too large or too stale in the user's mind. When the lists get very large (more than 50 or so items), traditional retrieval problems start to occur -- the user knows that a URL was recorded, but cannot find it. In addition, the user will often add a bookmark for a resource that already exists in the bookmark collection, having forgotten where the original entry lay. Finally, the user may enter two bookmarks for the same collection that reference slightly different locations (e.g., one referencing the Table of Contents, the other the Introduction). To summarize, the possible problems fall into the following categories:

  1. The user cannot remember the TITLE of the desired resource.
  2. Some archived objects, such as images, FTP, or mail URLs, do not have a TITLE. The default is to use the URL, which is not terribly informative.
  3. The user cannot remember under which folder the item was stored.
  4. The user thinks the item was stored under one folder, but in fact it is in another.
  5. The user has entered duplicate bookmark entries for the same resource, having forgotten about (or being unable to find) the earlier entry.
  6. The user has entered similar bookmark entries for the same resource -- for example, entries pointing to the Table of Contents, or Introduction, of the same collection.
  7. The link is no longer functional, because the original document has been deleted or moved.
  8. The link is no longer relevant, as the target resource has changed, and is no longer related to the original archived resource.
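Problems 5 and 6 lend themselves to simple automatic checks. The Python sketch below normalizes URLs so that trivially different spellings of the same address compare equal (problem 5), and groups bookmarks that share a host and directory as candidate entries for the same collection (problem 6). The normalization rules (lower-case scheme and host, drop default ports and #fragments) and the same-directory heuristic are assumptions chosen for illustration and are aimed at http and https URLs; they will miss some duplicates and flag some unrelated pages.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(url: str) -> str:
    """Canonical form used to spot duplicates (assumed rules)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # hostname is already lower-cased; user names and passwords are dropped.
    netloc = parts.hostname or ""
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{netloc}:{parts.port}"
    path = parts.path or "/"
    # Drop the #fragment: it selects a place within a page, not another page.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def collection_key(url: str) -> str:
    """Scheme, host and directory of the URL: a rough 'same collection' key."""
    parts = urlsplit(normalize(url))
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"


def find_duplicates(urls: list[str]) -> list[list[str]]:
    """Groups of bookmarks that normalize to the same address."""
    groups: dict[str, list[str]] = defaultdict(list)
    for u in urls:
        groups[normalize(u)].append(u)
    return [g for g in groups.values() if len(g) > 1]


def find_similar(urls: list[str]) -> list[list[str]]:
    """Groups of distinct addresses that live in the same directory."""
    groups: dict[str, set[str]] = defaultdict(set)
    for u in urls:
        groups[collection_key(u)].add(normalize(u))
    return [sorted(g) for g in groups.values() if len(g) > 1]


bookmarks = [
    "http://Example.org/guide/toc.html",
    "http://example.org:80/guide/toc.html#top",
    "http://example.org/guide/intro.html",
    "http://example.org/other/page.html",
]
print(find_duplicates(bookmarks))  # the first two entries
print(find_similar(bookmarks))     # toc.html and intro.html in /guide/
```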

Finally, there is a semantic problem with the very idea of a hierarchical bookmark list. Many entries do not belong in a single place in the hierarchy, but in several. It would therefore be useful to find another way of storing bookmarks that provides a better organizational model, along with a better interface for browsing or searching the bookmark collection.
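One alternative to a strict hierarchy is to let each bookmark carry several categories, so that the same entry can be reached from every place it belongs. The Python sketch below keeps a many-to-many mapping from categories to URLs; `BookmarkIndex` and its methods are hypothetical names used only for illustration.

```python
from collections import defaultdict


class BookmarkIndex:
    """Bookmarks filed under any number of categories (many-to-many)."""

    def __init__(self) -> None:
        self.titles: dict[str, str] = {}  # url -> title
        self._by_category: dict[str, set[str]] = defaultdict(set)

    def add(self, url: str, title: str, *categories: str) -> None:
        self.titles[url] = title
        for cat in categories:
            # Categories are compared case-insensitively.
            self._by_category[cat.lower()].add(url)

    def in_all(self, *categories: str) -> list[str]:
        """URLs filed under every one of the given categories."""
        sets = [self._by_category.get(c.lower(), set()) for c in categories]
        if not sets:
            return []
        return sorted(set.intersection(*sets))

    def categories_of(self, url: str) -> list[str]:
        """Every category (place) in which this bookmark can be found."""
        return sorted(c for c, urls in self._by_category.items() if url in urls)


index = BookmarkIndex()
index.add("http://example.org/xml-tutorial", "XML Tutorial", "XML", "Tutorials")
index.add("http://example.org/xslt-ref", "XSLT Reference", "XML", "Reference")
print(index.in_all("XML", "Tutorials"))
print(index.categories_of("http://example.org/xml-tutorial"))
```

Browsing a category then plays the role of opening a folder, while `in_all` narrows the collection to entries that belong to several categories at once.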


The above information can be added to a bookmark database in a straightforward way -- the hard part is the semantic structuring of the information. This must be done in a way that reflects the meaning the user associates with the bookmarked entry, while the interface by which entries are encoded into this index must be simple, as otherwise it will not be used.

What information is there to work with? We really have two things:

  1. The document text content
  2. User selection of some parameters

There are several ways these can be processed.

  1. The Document Content -- Intelligent Software
    • Determines the structural type of the document (resource list, text-based material, mostly graphics, FORM interface to a tool). This could be based on the text content as well as on information in the document head (LINK and META elements).
    • Determines and extracts document keywords -- the software could scan the document, locate important keywords, and use these to index the content.
    • Correlates the text with pre-defined categories or keywords -- the index may have pre-defined categories and/or keywords for indexing purposes, and the software could test the document against these and choose appropriate categories.
  2. User Selection of Parameters
    • User selects arbitrary keywords and categories -- a poor approach, as users are unlikely to bother, and the resulting keywords are not well organized.
    • User selects keywords and categories from predetermined list -- easier to do, but the user must also be able to add categories when necessary.
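The keyword extraction and category correlation steps can be sketched in a few lines of Python. Here keyword extraction is plain word frequency after removing a small stop-word list, and correlation with predefined categories is simply the number of shared keywords; both are deliberately crude stand-ins for the intelligent software described above, not a recommended algorithm. The category names, their keyword sets, and the `min_overlap` threshold are invented for the example, and `add_category` stands in for the user adding a category when the predetermined list falls short.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; a real indexer would need a larger one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}

# Hypothetical predefined categories, each described by a few keywords.
CATEGORIES: dict[str, set[str]] = {
    "Web authoring": {"html", "css", "markup", "element", "stylesheet"},
    "Networking": {"tcp", "ip", "protocol", "packet", "router"},
    "Databases": {"sql", "query", "table", "index", "transaction"},
}


def extract_keywords(text: str, top_n: int = 10) -> list[str]:
    """Most frequent words (2+ characters, starting with a letter), minus stop words."""
    words = re.findall(r"[a-z][a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def suggest_categories(keywords: list[str], min_overlap: int = 2) -> list[str]:
    """Categories sharing at least min_overlap keywords, best match first."""
    found = set(keywords)
    scored = [(len(found & terms), name) for name, terms in CATEGORIES.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored if score >= min_overlap]


def add_category(name: str, terms: list[str]) -> None:
    """Let the user extend the predetermined list with a new category."""
    CATEGORIES[name] = {t.lower() for t in terms}


sample = ("The HTML element defines markup. Each HTML element may carry "
          "a CSS stylesheet reference.")
keywords = extract_keywords(sample)
print(keywords[:3])                  # ['html', 'element', 'defines']
print(suggest_categories(keywords))  # ['Web authoring']
```

Presenting the suggested categories for the user to confirm or override combines both kinds of processing.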


TextIndexer helps you store, categorize, and sort web pages. It also stores your search words and the results of your searches.

  • Links to Internet addresses within articles, including e-mail addresses and local files.
  • Hot links to any web site, or even to local files.
  • Browses the WWW using the default or a user-defined browser.
  • Stores sequences of Internet search words.
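One simple way to recognize Internet addresses, e-mail addresses, and local file references inside article text is with regular expressions, as in the Python sketch below. The patterns are simplified assumptions (they ignore, for instance, URLs that legitimately end in a closing parenthesis) and are not TextIndexer's actual rules, which are not described here; `find_links` is a hypothetical helper name.

```python
import re

# Simplified patterns (assumptions for illustration, not TextIndexer's rules).
URL_RE = re.compile(r'\b(?:https?|ftp|file)://[^\s<>"]+|\bwww\.[^\s<>"]+', re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Punctuation that usually ends a sentence rather than belonging to the address.
TRAILING = ".,;:!?)'"


def find_links(text: str) -> list[tuple[str, str]]:
    """Return (kind, target) pairs for addresses found in text, in order of appearance."""
    found = []
    for m in URL_RE.finditer(text):
        url = m.group().rstrip(TRAILING)
        if url.lower().startswith("www."):
            url = "http://" + url  # bare www. names are assumed to be web pages
        found.append((m.start(), "url", url))
    for m in EMAIL_RE.finditer(text):
        found.append((m.start(), "email", "mailto:" + m.group()))
    return [(kind, target) for _, kind, target in sorted(found)]


article = ("Mail jane@example.org or see http://example.org/docs. "
           "Also www.example.com, and file:///home/jane/notes.txt")
for kind, target in find_links(article):
    print(kind, target)
```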