Libraries @ Montana State University

Results :: Individual ETD

  • Title: Apriori approach to graph-based clustering of text documents

  • Creator: Hossain, Mahmud Shahriar

  • Description: This thesis report introduces a new technique for document clustering based on frequent senses. The developed system, named GDClust (Graph-Based Document Clustering) [1], works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust represents text documents as hierarchical document-graphs and uses an Apriori paradigm to find the frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate accurate sense-based document clusters. We propose a novel multilevel Gaussian minimum support strategy for candidate subgraph generation. Additionally, we introduce another novel mechanism, called Subgraph-Extension mining, that reduces the number of candidates and the overhead imposed by the traditional Apriori-based candidate generation mechanism. GDClust uses an English-language thesaurus (WordNet [2]) to construct document-graphs and exploits graph-based data mining techniques for sense discovery and clustering. It is an automated system and requires minimal human interaction to perform the clustering.

  • Location: http://etd.lib.montana.edu/etd/2008/hossain/HossainM0508.pdf

  • Document Type: Masters

  • Contributor: Angryk, Rafal A. (committee chairperson)

  • Committee Members: John Paxton, Hunter Lloyd

  • Department: Computer Science

  • Program: Computer Science

  • Publisher: Montana State University

  • Date Created: 2008-05-15

  • Access Rights: Accessible under copyright for educational purposes.
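
The Apriori paradigm mentioned in the description can be illustrated with a minimal sketch. This is not the GDClust implementation: it assumes each document-graph is reduced to a plain set of labelled edges, treats a "subgraph" as any set of edges (ignoring connectivity), and uses a single fixed minimum support instead of the multilevel Gaussian strategy. The function name `frequent_edge_sets` and the sample edges are hypothetical.

```python
from itertools import combinations


def frequent_edge_sets(doc_graphs, min_support):
    """Apriori-style search for edge sets found in >= min_support documents.

    doc_graphs: list of sets of edges, each edge a (parent, child) tuple.
    Simplification (assumption): a "subgraph" is just a set of edges.
    """
    def support(edge_set):
        # Number of documents whose edge set contains every edge in edge_set.
        return sum(1 for edges in doc_graphs if edge_set <= edges)

    # Level 1: every single edge that is frequent on its own.
    all_edges = set().union(*doc_graphs) if doc_graphs else set()
    current = {frozenset([e]) for e in all_edges
               if support(frozenset([e])) >= min_support}

    result = {}
    k = 1
    while current:
        for edge_set in current:
            result[edge_set] = support(edge_set)

        # Join step: merge two frequent k-sets into a (k+1)-candidate.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune step: every k-subset of the candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k)):
                candidates.add(union)

        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return result


# Hypothetical document-graphs built from a WordNet-like concept hierarchy.
docs = [
    {("entity", "animal"), ("animal", "dog")},
    {("entity", "animal"), ("animal", "dog"), ("animal", "cat")},
    {("entity", "animal"), ("animal", "cat")},
]
result = frequent_edge_sets(docs, min_support=2)
for edge_set, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(edge_set), count)
```

The prune step is what keeps the candidate count manageable: a larger edge set is only counted if all of its smaller subsets were already frequent. The description states that Subgraph-Extension mining reduces the candidate count further; that mechanism is not modelled here.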
