Web Mining: A Bird's Eye View
Content summary:
...ed Web Agents
Database Approaches: Multilevel Databases, Web Query Systems

Intelligent Search Agents
- Locating documents and services on the Web:
  - WebCrawler, Alta Vista: scan millions of Web documents and create an index of words (too many irrelevant, outdated responses)
  - MetaCrawler: mines robot-created indices
- Retrieving product information from a variety of vendor sites using only general information about the product domain:
  - ShopBot

Intelligent Search Agents (Cont'd)
- Rely either on pre-specified domain information about particular types of documents, or on hard-coded models of the information sources, to retrieve and interpret documents:
  - Harvest
  - FAQ-Finder
  - Information Manifold
  - OCCAM
  - Parasite
- Learn models of various information sources and translate these into their own concept hierarchy:
  - ILA (Internet Learning Agent)

Information Filtering/Categorization
- Uses various information retrieval techniques and characteristics of open hypertext Web documents to automatically retrieve, filter, and categorize them.
- HyPursuit: uses semantic information embedded in link structures and document content to create cluster hierarchies of hypertext documents and to structure an information space
- BO (Bookmark Organizer): combines hierarchical clustering techniques and user interaction to organize a collection of Web documents based on conceptual information

Personalized Web Agents
- This category of Web agents learns user preferences and discovers Web information sources based on these preferences and on those of other individuals with similar interests (using collaborative filtering)
  - WebWatcher
  - PAINT
  - Syskill & Webert
  - GroupLens
  - Firefly
  - others

Multiple Layered Web Architecture
[Figure: stacked layers Layer 0, Layer 1, ..., Layer n, labeled "Generalized Descriptions" and "More Generalized Descriptions"]

Multilevel Databases
- At the higher levels, metadata or generalizations are extracted from lower levels and organized in structured collections, i.e. relational or object-oriented databases.
- At the lowest level, semi-structured information is stored in various Web repositories, such as hypertext documents.

Multilevel Databases (Cont'd)
- Han et al.: use a multilayered database where each layer is obtained via generalization and transformation operations performed on the lower layers
- Khosla et al.: propose the creation and maintenance of meta-databases at each information-providing domain and the use of a global schema for the meta-database

Multilevel Databases (Cont'd)
- King et al.: propose the incremental integration of a portion of the schema from each information source, rather than relying on a global heterogeneous database schema
- The ARANEUS system: extracts relevant information from hypertext documents and integrates it into higher-level derived Web hypertexts, which are generalizations of the notion of database views

Multi-Layered Database (MLDB)
- A multiple layered database model
  - based on the semi-structured data hypothesis
  - queried by NetQL, using a syntax similar to the relational language SQL
- Layer 0: an unstructured, massive, primitive, diverse global information base.
- Layer 1: a relatively structured, descriptor-like, massive, distributed database obtained by data analysis, transformation, and generalization techniques.
  - Tools are to be developed for descriptor extraction.
- Higher layers: further generalization to form progressively smaller, better structured, and less remote databases for efficient browsing, retrieval, and information discovery.
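The layering described above can be illustrated with a minimal sketch. It is not the method of any of the cited systems: the sample documents, the field names, the `CONCEPT_PARENT` hierarchy, and both helper functions are assumptions made up for illustration. Layer 1 is built by extracting a small descriptor from each layer-0 document, and each higher layer is built by replacing a topic with its parent concept and merging records that become identical.

```python
import re

# Hypothetical layer 0: raw, unstructured Web documents keyed by URL.
LAYER0 = {
    "http://site1.example/a.html": "<title>relational databases</title><p>Tables and SQL.</p>",
    "http://site1.example/b.html": "<title>object-oriented databases</title><p>Objects.</p>",
    "http://site2.example/c.html": "<title>web crawlers</title><p>Robots that index pages.</p>",
}

# Hypothetical concept hierarchy: each concept maps to its more general parent.
CONCEPT_PARENT = {
    "relational databases": "databases",
    "object-oriented databases": "databases",
    "web crawlers": "information retrieval",
    "databases": "computer science",
    "information retrieval": "computer science",
}


def extract_descriptor(html):
    """Transformation step: turn one raw document into a layer-1 descriptor."""
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    topic = match.group(1).strip().lower() if match else "unknown"
    return {"topic": topic, "docs": 1, "bytes": len(html)}


def generalize_layer(records):
    """Generalization step: climb one level in the hierarchy and merge records."""
    merged = {}
    for rec in records:
        # Concepts without a parent stay as they are.
        parent = CONCEPT_PARENT.get(rec["topic"], rec["topic"])
        entry = merged.setdefault(parent, {"topic": parent, "docs": 0, "bytes": 0})
        entry["docs"] += rec["docs"]
        entry["bytes"] += rec["bytes"]
    return sorted(merged.values(), key=lambda r: r["topic"])


layer1 = [extract_descriptor(html) for html in LAYER0.values()]
layer2 = generalize_layer(layer1)
layer3 = generalize_layer(layer2)

for name, layer in (("layer 1", layer1), ("layer 2", layer2), ("layer 3", layer3)):
    print(name, layer)
```

Each layer in this sketch holds fewer records than the one below it, which mirrors the "progressively smaller, better structured" property of the higher layers.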
Three major components in MLDB
- S (a database schema):
  - outlines the overall database structure of the global MLDB
  - presents a route map for data and metadata (i.e., schema) browsing
  - describes how the generalization is performed
- H (a set of concept hierarchies):
  - provides a set of concept hierarchies which help the system generalize lower-layer information to higher layers and map queries to the appropriate concept layers for processing
- D (a set of database relations):
  - the whole global information base at the primitive information level (i.e., layer 0)
  - the generalized database relations at the non-primitive layers

(Slide footer: November 4, 2020, Web Mining)

The general architecture of WebLogMiner (a global MLDB)
[Figure: Site 1, Site 2, Site 3; Generalized Data; Concept Hierarchies; Higher layers; Resource Discovery (MLDB); Knowledge Discovery (WLM); Characteristic Rules; Discriminant Rules; Association Rules]

Techniques for Web usage mining
- Construct a multidimensional view on the Web log database
  - Perform multidimensional OLAP analysis to find the top N users, the top N accessed Web pages, the most frequently accessed time periods, etc.
- Perform data mining on Web log records
  - Find association patterns, sequential patterns, and trends of Web accessing
  - May need additional information, e.g., user browsing sequences of the Web pages in the Web server buffer
- Conduct studies to
  - analyze system performance and improve system design by Web caching, Web page prefetching, and Web page swapping

Web Usage Mining Phases
- Three distinctive phases: preprocessing, pattern discovery, and pattern analysis
- Preprocessing: the process of converting the raw data into the data abstractions needed to apply the data mining algorithms
  - Resources: server-side, client-side, proxy servers, or database.
  - Raw data: Web usage logs, Web page descriptions, Web site topology, user registries, and questionnaires.
  - Conversion: content converting, structure converting, usage converting
- User: the principal using a client to interactively retrieve and render resources or resource manifestations.
- Page view: visual rendering of a Web page in a ...
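The preprocessing and top-N analysis steps listed under Web usage mining can be sketched as follows. This is only an illustration under stated assumptions: the log lines are invented samples in the Common Log Format, the user names come from the log's authenticated-user field, and `parse_line`, `preprocess`, and `top_n` are hypothetical helpers rather than part of WebLogMiner.

```python
import re
from collections import Counter

# Hypothetical raw usage log (Common Log Format), including noise to clean up.
LOG_LINES = [
    '10.0.0.1 - alice [01/Jan/2000:10:01:12 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - alice [01/Jan/2000:10:05:40 +0000] "GET /products.html HTTP/1.0" 200 2310',
    '10.0.0.2 - bob [01/Jan/2000:10:07:03 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - alice [01/Jan/2000:11:15:59 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.3 - carol [01/Jan/2000:10:30:00 +0000] "GET /missing.html HTTP/1.0" 404 0',
    'not a log line',
]

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Convert one raw log line into a record, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # The timestamp looks like 01/Jan/2000:10:01:12 +0000; the hour is field 1.
    hour = int(match.group("time").split(":")[1])
    return {
        "user": match.group("user"),
        "page": match.group("page"),
        "hour": hour,
        "status": int(match.group("status")),
    }


def preprocess(lines):
    """Preprocessing: drop malformed lines and unsuccessful requests."""
    parsed = (parse_line(line) for line in lines)
    return [rec for rec in parsed if rec is not None and 200 <= rec["status"] < 300]


def top_n(records, dimension, n):
    """Count records along one dimension and return the n most frequent values."""
    return Counter(rec[dimension] for rec in records).most_common(n)


records = preprocess(LOG_LINES)
print("top users:", top_n(records, "user", 1))
print("top pages:", top_n(records, "page", 2))
print("busiest hours:", top_n(records, "hour", 1))
```

Counting along a single dimension is the simplest case of the multidimensional view; a fuller analysis would group by several dimensions at once, for example user and hour together.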