Web Mining: A Bird's Eye View
Content summary:

... Personalized Web Agents

Database Approaches:
  - Multilevel Databases
  - Web Query Systems

Intelligent Search Agents
  - Locating documents and services on the Web:
      - WebCrawler, Alta Vista: scan millions of Web documents and create an index of words (too many irrelevant, outdated responses)
      - MetaCrawler: mines robot-created indices
  - Retrieving product information from a variety of vendor sites using only general information about the product domain:
      - ShopBot

Intelligent Search Agents (Cont'd)
  - Rely either on pre-specified domain information about particular types of documents, or on hard-coded models of the information sources, to retrieve and interpret documents:
      - Harvest
      - FAQ-Finder
      - Information Manifold
      - OCCAM
      - Parasite
  - Learns models of various information sources and translates these into its own concept hierarchy:
      - ILA (Internet Learning Agent)

Information Filtering/Categorization
  - Use various information retrieval techniques and characteristics of open hypertext Web documents to automatically retrieve, filter, and categorize them.
      - HyPursuit: uses semantic information embedded in link structures and document content to create cluster hierarchies of hypertext documents, and structure an information space
      - BO (Bookmark Organizer): combines hierarchical clustering techniques and user interaction to organize a collection of Web documents based on conceptual information

Personalized Web Agents
  - This category of Web agents learns user preferences and discovers Web information sources based on these preferences, and those of other individuals with similar interests (using collaborative filtering)
      - WebWatcher
      - PAINT
      - Syskill & Webert
      - GroupLens
      - Firefly
      - others

Multiple Layered Web Architecture
  [Figure: stacked layers Layer-0, Layer-1, ..., Layer-n, labeled "Generalized Descriptions" and "More Generalized Descriptions"]

Multilevel Databases
  - At the higher levels, metadata or generalizations are
      - extracted from lower levels
      - organized in structured collections, e.g. a relational or object-oriented database.
  - At the lowest level, semi-structured information is
      - stored in various Web repositories, such as hypertext documents

Multilevel Databases (Cont'd)
  - (Han, et al.):
      - use a multi-layered database where each layer is obtained via generalization and transformation operations performed on the lower layers
  - (Khosla, et al.):
      - propose the creation and maintenance of meta-databases at each information-providing domain and the use of a global schema for the meta-database

Multilevel Databases (Cont'd)
  - (King, et al.):
      - propose the incremental integration of a portion of the schema from each information source, rather than relying on a global heterogeneous database schema
  - The ARANEUS system:
      - extracts relevant information from hypertext documents and integrates these into higher-level derived Web hypertexts, which are generalizations of the notion of database views

Multi-Layered Database (MLDB)
  - A multiple layered database model
      - based on the semi-structured data hypothesis
      - queried by NetQL using a syntax similar to the relational language SQL
  - Layer-0:
      - An unstructured, massive, primitive, diverse global information base.
  - Layer-1:
      - A relatively structured, descriptor-like, massive, distributed database obtained by data analysis, transformation and generalization techniques.
      - Tools to be developed for descriptor extraction.
  - Higher layers:
      - Further generalization to form progressively smaller, better structured, and less remote databases for efficient browsing, retrieval, and information discovery.
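
The layer-by-layer generalization described above can be illustrated with a small sketch. This is not the MLDB implementation or NetQL; the sample documents, the field names (`site`, `keywords`, `size`), the stop-word list, and the functions `to_layer1` and `to_layer2` are all hypothetical, chosen only to show raw Layer-0 documents being turned into Layer-1 descriptors and then into a smaller, better structured higher layer.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Layer-0: hypothetical raw documents (the primitive information base).
LAYER0 = [
    {"url": "http://cs.example.edu/db/intro.html",
     "text": "database systems and query languages"},
    {"url": "http://cs.example.edu/db/mining.html",
     "text": "data mining over database systems"},
    {"url": "http://news.example.com/web.html",
     "text": "web search engines index web pages"},
]

# Assumed stop-word list, for illustration only.
STOPWORDS = {"and", "over", "the", "of"}


def to_layer1(doc, top_k=3):
    """Extract a descriptor record (site, keywords, size) from one raw document."""
    words = [w for w in doc["text"].lower().split() if w not in STOPWORDS]
    keywords = [w for w, _ in Counter(words).most_common(top_k)]
    return {
        "url": doc["url"],
        "site": urlparse(doc["url"]).netloc,
        "keywords": keywords,
        "size": len(doc["text"]),
    }


def to_layer2(descriptors):
    """Generalize Layer-1 descriptors into one summary record per site."""
    by_site = defaultdict(list)
    for d in descriptors:
        by_site[d["site"]].append(d)

    summary = []
    for site, rows in sorted(by_site.items()):
        counts = Counter(k for r in rows for k in r["keywords"])
        summary.append({
            "site": site,
            "doc_count": len(rows),
            "total_size": sum(r["size"] for r in rows),
            "top_keywords": [k for k, _ in counts.most_common(2)],
        })
    return summary


layer1 = [to_layer1(d) for d in LAYER0]
layer2 = to_layer2(layer1)

for row in layer2:
    print(row)
```

Each step discards detail and shrinks the data: three documents become three descriptors, which become two per-site summaries. A real system would rely on proper descriptor-extraction tools rather than the word counting assumed here.
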
Three major components in MLDB
  - S (a database schema):
      - outlines the overall database structure of the global MLDB
      - presents a route map for data and metadata (i.e., schema) browsing
      - describes how the generalization is performed
  - H (a set of concept hierarchies):
      - provides a set of concept hierarchies which assist the system to generalize lower-layer information to higher layers and map queries to appropriate concept layers for processing
  - D (a set of database relations):
      - the whole global information base at the primitive information level (i.e., layer-0)
      - the generalized database relations at the non-primitive layers

November 4, 2020    Web Mining    48

The General Architecture of WebLogMiner (a Global MLDB)
  [Figure: Site 1, Site 2, and Site 3 feed generalized data and concept hierarchies into higher layers; Resource Discovery (MLDB) and Knowledge Discovery (WLM) produce characteristic rules, discriminant rules, and association rules]

Techniques for Web usage mining
  - Construct a multidimensional view on the Weblog database
      - Perform multidimensional OLAP analysis to find the top N users, top N accessed Web pages, most frequently accessed time periods, etc.
  - Perform data mining on Weblog records
      - Find association patterns, sequential patterns, and trends of Web accessing
      - May need additional information, e.g., user browsing sequences of the Web pages in the Web server buffer
  - Conduct studies to
      - Analyze system performance, improve system design by Web caching, Web page prefetching, and Web page swapping

Web Usage Mining Phases
  - Three distinctive phases: preprocessing, pattern discovery, and pattern analysis
  - Preprocessing: the process of converting the raw data into the data abstraction necessary for applying the data mining algorithm
      - Resources: server-side, client-side, proxy servers, or database.
      - Raw data: Web usage logs, Web page descriptions, Web site topology, user registries, and questionnaires.
      - Conversion: content converting, structure converting, usage converting
  - User: the principal using a client to interactively retrieve and render resources or resource manifestations.
  - Page view: visual rendering of a Web page in a ...
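
The multidimensional analysis listed under "Techniques for Web usage mining" (top N users, top N accessed pages, most frequently accessed time periods) can be sketched on a toy log. This is only an illustration, not the WebLogMiner implementation: it assumes the server writes Common Log Format lines, the sample entries are invented, and `parse_line`, `build_cube`, `roll_up`, and `top_n` are hypothetical helper names.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical sample entries, assumed to be in Common Log Format.
LOG_LINES = [
    '10.0.0.1 - - [04/Nov/2020:10:15:32 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [04/Nov/2020:10:16:05 +0000] "GET /products.html HTTP/1.0" 200 2310',
    '10.0.0.2 - - [04/Nov/2020:10:40:11 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [04/Nov/2020:11:02:47 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.3 - - [04/Nov/2020:10:59:59 +0000] "GET /about.html HTTP/1.0" 404 512',
]

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Dimensions of the cube; each cell counts requests for one combination.
DIMENSIONS = ("host", "path", "hour")


def parse_line(line):
    """Preprocessing: turn one raw log line into a record, or None if malformed."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": m.group("host"),
        "path": m.group("path"),
        "hour": ts.hour,
        "status": int(m.group("status")),
    }


def build_cube(records):
    """Count requests per (host, path, hour) cell."""
    return Counter(tuple(r[d] for d in DIMENSIONS) for r in records)


def roll_up(cube, dimension):
    """Aggregate the cube down to a single dimension."""
    i = DIMENSIONS.index(dimension)
    totals = Counter()
    for cell, count in cube.items():
        totals[cell[i]] += count
    return totals


def top_n(cube, dimension, n):
    """Return the n most frequent values of one dimension with their counts."""
    return roll_up(cube, dimension).most_common(n)


records = [r for r in map(parse_line, LOG_LINES) if r is not None]
cube = build_cube(records)

print("top users:", top_n(cube, "host", 2))
print("top pages:", top_n(cube, "path", 2))
print("busiest hours:", top_n(cube, "hour", 1))
```

Here the client host stands in for the user, which is a simplification: identifying actual users and page views is part of the preprocessing phase and generally needs more than the IP address.
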