On the Effect of Location Uncertainty in Spatial Querying – java
RiMOM: A Dynamic Multistrategy Ontology Alignment Framework – java/dotnet
Similarity-Profiled Temporal Association Mining – java/dotnet
Ranking and Suggesting Popular Items – java
Olex: Effective Rule Learning for Text Categorization – java/dotnet
Multirelational k-Anonymity – dotnet/java
E-card
Electronic Billing
Online E-banking
Digital Image Forensics via Intrinsic Fingerprints – java
A Fast Search Algorithm for a Large Fuzzy Database – java/dotnet
Unseen Visible Watermarking: A Novel Methodology for Auxiliary Information Delivery via Visual Contents – java/dotnet
A Game Theoretical Framework on Intrusion Detection in Heterogeneous Networks – java
Spread-Spectrum Watermarking Security – java/dotnet
A Hypothesis Testing Approach to Semifragile Watermark-Based Authentication – java
Robust Blind Watermarking of Point-Sampled Geometry
Spatial PrObabilistic Temporal (SPOT) Databases
Role Engineering via Prioritized – java
Discovery of Structural and Functional Features in RNA Pseudoknots – java/dotnet
Predicting Missing Items in Shopping Carts – j2ee/dotnet
CS AND IT PROJECTS LIST
Effective Collaboration with Information Sharing in Virtual Universities
Abstract
A global education system, as a key area of future IT, has encouraged developers to provide a variety of low-cost learning systems. While the advantages of e-learning have long been recognized and many advances in e-learning systems have been implemented, the need for effective and secure information sharing has to date been largely ignored, especially in virtual university collaborative environments. Information sharing among virtual universities usually occurs in broad, highly dynamic network-based environments, and formally accessing these resources in a secure manner poses a difficult and vital challenge. This paper builds a new rule-based framework that identifies and addresses issues of sharing in virtual university environments through role-based access control (RBAC) management. The framework includes a role-based group delegation granting model, a group delegation revocation model, authorization granting, and authorization revocation. We analyze various revocations and their impact on role hierarchies. An implementation with XML-based tools demonstrates the feasibility of the framework and its authorization methods. Finally, the proposal is compared with related work.
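The framework itself is defined through formal RBAC models and implemented with XML-based tools, none of which is reproduced here. Purely as a hedged illustration of the group delegation operations the abstract names (granting, revocation, and authorization checks), the following minimal Java sketch shows the bookkeeping involved; all class and method names are invented for this example and are not taken from the paper.

import java.util.*;

// Minimal sketch of role-based group delegation (hypothetical model, not the paper's XML framework).
public class GroupDelegation {
    // role -> set of groups the role has been delegated to
    private final Map<String, Set<String>> delegations = new HashMap<>();

    public void grant(String role, String group) {
        delegations.computeIfAbsent(role, r -> new HashSet<>()).add(group);
    }

    public void revoke(String role, String group) {
        Set<String> groups = delegations.get(role);
        if (groups != null) {
            groups.remove(group);
        }
    }

    public boolean isAuthorized(String role, String group) {
        return delegations.getOrDefault(role, Collections.emptySet()).contains(group);
    }

    public static void main(String[] args) {
        GroupDelegation d = new GroupDelegation();
        d.grant("CourseEditor", "VisitingLecturers");
        System.out.println(d.isAuthorized("CourseEditor", "VisitingLecturers")); // true
        d.revoke("CourseEditor", "VisitingLecturers");
        System.out.println(d.isAuthorized("CourseEditor", "VisitingLecturers")); // false
    }
}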
NNexus: An Automatic Linker for Collaborative Web-Based Corporation
Abstract
In this paper, we introduce the Noosphere Networked Entry eXtension and Unification System (NNexus), a generalization of the automatic linking engine of Noosphere (at PlanetMath.org) and the first system that automates the process of linking disparate "encyclopedia" entries into a fully connected conceptual network. The main challenges of this problem space include: 1) linking quality (correctly identifying which terms to link and which entry to link to, with minimal effort on the part of users), 2) efficiency and scalability, and 3) generalization to multiple knowledge bases and web-based information environments. We present the NNexus approach, which utilizes subject classification and other metadata to address these challenges. We also present evaluation results demonstrating the effectiveness and efficiency of the approach and discuss ongoing and future directions of research.
Open Smart Classroom: Extensible and Scalable Learning System in Smart Space Using Web Service Technology
Abstract
Real-time interactive virtual classrooms with a tele-education experience are an important approach in distance learning. However, most current systems fail to meet new challenges in extensibility and scalability, which mainly lie in three issues. First, an open system architecture is required to better support the integration of the growing number of human-computer interfaces and personal mobile devices in the classroom. Second, the learning system should open its interfaces, which eases deployment under different circumstances and allows learning systems to talk to each other. Third, problems emerge when binding existing classroom systems together across different places or even different countries, such as handling intersystem communication and intercultural distance learning in different languages. To address these issues, we build a prototype application called Open Smart Classroom on our software infrastructure, which is based on a multiagent architecture using Web Service technology in Smart Space. Besides evaluating the extensibility and scalability of the system, we also conduct an experiment connecting two Open Smart Classrooms deployed in different countries, which demonstrates the influence of these new features on the educational effect. The interesting and optimistic results obtained show a significant research prospect for developing future distance learning systems.
Toward a Fuzzy Domain Ontology Extraction Method for Adaptive e-Learning
Abstract
With the widespread application of electronic learning (e-Learning) technologies to education at all levels, an increasing number of online educational resources and messages are generated in the corresponding e-Learning environments. Nevertheless, it is quite difficult, if not totally impossible, for instructors to read through and analyze the online messages to predict the progress of their students on the fly. The main contribution of this paper is a novel concept map generation mechanism underpinned by a fuzzy domain ontology extraction algorithm. The proposed mechanism can automatically construct concept maps based on the messages posted to online discussion forums. By browsing the concept maps, instructors can quickly identify the progress of their students and adjust the pedagogical sequence on the fly. Our initial experimental results reveal that the accuracy and the quality of the automatically generated concept maps are promising. Our research work opens the door to the development and application of intelligent software tools to enhance e-Learning.
Communities and Emerging Semantics in Semantic Link Network: Discovery and Learning
Abstract
The World Wide Web provides plentiful content for Web-based learning, but its hyperlink-based architecture connects Web resources for free browsing rather than for effective learning. To support effective learning, an e-learning system should be able to discover and make use of the semantic communities and the emerging semantic relations in a dynamic, complex network of learning resources. Previous graph-based community discovery approaches are limited in their ability to discover semantic communities. This paper first presents the semantic link network (SLN), a loosely coupled semantic data model that can semantically link resources and derive implicit semantic links according to a set of relational reasoning rules. By studying the intrinsic relationship between semantic communities and the semantic space of the SLN, approaches to discovering reasoning-constrained, rule-constrained, and classification-constrained semantic communities are proposed. Further, the approaches, principles, and strategies for discovering emerging semantics in dynamic SLNs are studied. The basic laws of semantic link network motion are revealed for the first time. An e-learning environment incorporating the proposed approaches, principles, and strategies to support effective discovery and learning is suggested.
Monitoring Online Tests through Data Visualization –j2ee/dotnet
Abstract
We present an approach and a system that let tutors monitor several important aspects of online tests, such as learner behavior and test quality. The approach includes logging important data about learner interaction with the system during online tests and exploits data visualization to highlight information that helps tutors review and improve the whole assessment process. We have focused on discovering behavioral patterns of learners and conceptual relationships among test items. Furthermore, we have conducted several experiments in our faculty in order to assess the whole approach. In particular, by analyzing the data visualization charts, we detected several previously unknown test-taking strategies used by the learners. Last, we detected several correlations among questions, which gave us useful feedback on test quality.
Clustering and Sequential Pattern Mining of Online Collaborative Learning Data –j2ee /dotnet
Abstract
Group work is widespread in education. The growing use of online tools supporting group work generates huge amounts of data. We aim to exploit this data to support mirroring: presenting useful high-level views of information about the group, together with desired patterns characterizing the behavior of strong groups. The goal is to enable groups and their facilitators to see relevant aspects of the group's operation, to learn whether these are more likely to be associated with positive or negative outcomes, and to indicate where the problems are. We explore how useful mirror information can be extracted via a theory-driven approach together with a range of clustering and sequential pattern mining techniques. The context is a senior software development project in which students use the collaboration tool TRAC. We extract patterns distinguishing the better from the weaker groups and gain insights into the success factors. The results point to the importance of leadership and group interaction, and give promising indications of whether they are occurring. Patterns indicating good individual practices were also identified. We found that some key measures can be mined from early data. The results are promising for advising groups at the start and for early identification of effective and poor practices, in time for remediation.
ANGEL: Enhancing the Utility of Generalization for Privacy Preserving Publication –java/dotnet
Abstract
Generalization is a well-known method for privacy preserving data publication. Despite its vast popularity, it has several drawbacks such as heavy information loss, difficulty of supporting marginal publication, and so on. To overcome these drawbacks, we develop ANGEL, a new anonymization technique that is as effective as generalization in privacy protection, but is able to retain significantly more information in the microdata. ANGEL is applicable to any monotonic principles (e.g., l-diversity, t-closeness, etc.), with its superiority (in correlation preservation) especially obvious when tight privacy control must be enforced. We show that ANGEL lends itself elegantly to the hard problem of marginal publication. In particular, unlike generalization that can release only restricted marginals, our technique can be easily used to publish any marginals with strong privacy guarantees.
Rough Cluster Quality Index Based on Decision Theory-java/dotnet
Abstract
Quality of clustering is an important issue in application of clustering techniques. Most traditional cluster validity indices are geometry-based cluster quality measures. This paper proposes a cluster validity index based on the decision-theoretic rough set model by considering various loss functions. Experiments with synthetic, standard, and real-world retail data show the usefulness of the proposed validity index for the evaluation of rough and crisp clustering. The measure is shown to help determine optimal number of clusters, as well as an important parameter called threshold in rough clustering. The experiments with a promotional campaign for the retail data illustrate the ability of the proposed measure to incorporate financial considerations in evaluating quality of a clustering scheme. This ability to deal with monetary values distinguishes the proposed decision-theoretic measure from other distance-based measures. The proposed validity index can also be extended for evaluating other clustering algorithms such as fuzzy clustering.
Predictive Ensemble Pruning by Expectation Propagation –java
Abstract
An ensemble is a group of learners that work together as a committee to solve a problem. Existing ensemble learning algorithms often generate unnecessarily large ensembles, which consume extra computational resources and may degrade the generalization performance. Ensemble pruning algorithms aim to find a good subset of ensemble members to constitute a small ensemble, which saves computational resources and performs as well as, or better than, the unpruned ensemble. This paper introduces a probabilistic ensemble pruning algorithm that chooses a set of "sparse" combination weights, most of which are zeros, to prune the ensemble. In order to obtain the set of sparse combination weights and satisfy the nonnegativity constraint on the combination weights, a left-truncated, nonnegative, Gaussian prior is adopted over every combination weight. The expectation propagation (EP) algorithm is employed to approximate the posterior estimate of the weight vector. The leave-one-out (LOO) error can be obtained as a by-product of EP training without extra computation and is a good indicator of the generalization error. Therefore, the LOO error is used together with the Bayesian evidence for model selection in this algorithm. An empirical study on several regression and classification benchmark data sets shows that our algorithm utilizes far fewer component learners but performs as well as, or better than, the unpruned ensemble. Our results are very competitive compared with other ensemble pruning algorithms.
Discovery of Structural and Functional Features in RNA Pseudoknots-java/dotnet
Abstract
An RNA pseudoknot consists of nonnested double-stranded stems connected by single-stranded loops. There is increasing recognition that RNA pseudoknots are one of the most prevalent RNA structures and fulfill a diverse set of biological roles within cells, and there is an expanding rate of studies into RNA pseudoknotted structures as well as increasing allocation of function. These not only produce valuable structural data but also facilitate an understanding of structural and functional characteristics in RNA molecules. PseudoBase is a database providing structural, functional, and sequence data related to RNA pseudoknots. To capture the features of RNA pseudoknots, we present a novel framework using quantitative association rule mining to analyze the pseudoknot data. The derived rules are classified into specified association groups regarding structure, function, and category of RNA pseudoknots. The discovered association rules assist biologists in filtering out significant knowledge of structure-function and structure-category relationships. A brief biological interpretation to the relationships is presented, and their potential correlations with each other are highlighted.
Predicting Missing Items in Shopping Carts –j2ee/dotnet
Abstract
Existing research in association mining has focused mainly on how to expedite the search for frequently co-occurring groups of items in "shopping cart" type of transactions; less attention has been paid to methods that exploit these "frequent itemsets" for prediction purposes. This paper contributes to the latter task by proposing a technique that uses partial information about the contents of a shopping cart for the prediction of what else the customer is likely to buy. Using the recently proposed data structure of itemset trees (IT-trees), we obtain, in a computationally efficient manner, all rules whose antecedents contain at least one item from the incomplete shopping cart. Then, we combine these rules by uncertainty processing techniques, including the classical Bayesian decision theory and a new algorithm based on the Dempster-Shafer (DS) theory of evidence combination.
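The paper retrieves matching rules from an IT-tree and combines them with Bayesian and Dempster-Shafer reasoning; the sketch below only illustrates the simpler first half of that idea. Assuming association rules are already available, it scores candidate missing items by summing the confidences of rules whose antecedents overlap the partial cart. The class names and rule data are hypothetical.

import java.util.*;

// Sketch: score candidate missing items from rules whose antecedent overlaps the partial cart.
// Stand-in for the paper's IT-tree retrieval plus DS combination (here we simply sum confidences).
public class CartPrediction {
    record Rule(Set<String> antecedent, String consequent, double confidence) {}

    public static Map<String, Double> score(Set<String> cart, List<Rule> rules) {
        Map<String, Double> scores = new HashMap<>();
        for (Rule r : rules) {
            boolean overlaps = r.antecedent().stream().anyMatch(cart::contains);
            if (overlaps && !cart.contains(r.consequent())) {
                scores.merge(r.consequent(), r.confidence(), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule(Set.of("bread", "butter"), "milk", 0.8),
            new Rule(Set.of("bread"), "jam", 0.5),
            new Rule(Set.of("beer"), "chips", 0.9));
        System.out.println(score(Set.of("bread"), rules)); // {milk=0.8, jam=0.5}
    }
}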
A Communication Perspective on Automatic Text Categorization –java/dotnet
Abstract
The basic concern of a communication system is to transfer information from its source to a destination some distance away. Textual documents also deal with the transmission of information. In particular, from a text categorization system's point of view, the information encoded by a document is the topic or category it belongs to. Following this initial intuition, a theoretical framework is developed in which Automatic Text Categorization (ATC) is studied from a communication system perspective. Under this approach, the problem of indexing feature space dimensionality reduction is tackled by a two-level supervised scheme, implemented by noisy-term filtering and a subsequent redundant-term compression. Gaussian probabilistic categorizers are revisited and adapted to the sparsity that accompanies ATC. Experimental results on the 20 Newsgroups and Reuters-21578 collections validate the theoretical approaches. The noise filter and redundancy compressor allow an aggressive term vocabulary reduction (reduction factor greater than 0.99) with a minimal loss (lower than 3 percent) and, in some cases, a gain (greater than 4 percent) of final classification accuracy. The adapted Gaussian Naive Bayes classifier reaches classification results similar to those obtained by state-of-the-art Multinomial Naive Bayes (MNB) and Support Vector Machines (SVMs).
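The Multinomial Naive Bayes classifier mentioned as a baseline is easy to sketch; the paper's Gaussian categorizers and the noise/redundancy term filters are not shown here. The following self-contained Java example implements MNB with Laplace smoothing over raw term arrays; the toy training data is invented.

import java.util.*;

// Minimal multinomial Naive Bayes text classifier with Laplace smoothing
// (the MNB baseline mentioned in the abstract, not the paper's Gaussian categorizer or term filter).
public class TinyMNB {
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>(); // class -> term -> count
    private final Map<String, Integer> docCounts = new HashMap<>();               // class -> #docs
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    public void train(String label, String[] terms) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : terms) {
            counts.merge(t, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    public String classify(String[] terms) {
        String best = null;
        double bestLog = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            Map<String, Integer> counts = termCounts.get(label);
            int classTotal = counts.values().stream().mapToInt(Integer::intValue).sum();
            double logProb = Math.log(docCounts.get(label) / (double) totalDocs);   // log prior
            for (String t : terms) {
                int c = counts.getOrDefault(t, 0);
                logProb += Math.log((c + 1.0) / (classTotal + vocabulary.size()));  // Laplace smoothing
            }
            if (logProb > bestLog) { bestLog = logProb; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyMNB nb = new TinyMNB();
        nb.train("sports", new String[]{"goal", "match", "team"});
        nb.train("tech", new String[]{"cpu", "code", "compiler"});
        System.out.println(nb.classify(new String[]{"match", "goal"})); // sports
    }
}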
Efficient Skyline Computation in Structured Peer-to-Peer Systems –java
Abstract
An increasing number of large-scale applications exploit peer-to-peer network architectures to provide highly scalable and flexible services. Among these applications, data management in peer-to-peer systems is one of the interesting domains. In this paper, we investigate the multidimensional skyline computation problem on a structured peer-to-peer network. In order to achieve low communication cost and quick response time, we utilize the iMinMax(theta) method to transform high-dimensional data to one-dimensional values and distribute the data in a structured peer-to-peer network called BATON. Thereafter, we propose a progressive algorithm with an adaptive filter technique for efficient skyline computation in this environment. We further discuss some optimization techniques for the algorithm, and summarize the key principles of our algorithm into a query routing protocol with detailed analysis. Finally, we conduct an extensive experimental evaluation to demonstrate the efficiency of our approach.
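Stripped of the peer-to-peer machinery (BATON routing, iMinMax mapping, adaptive filters), a skyline query reduces to repeated dominance tests. The sketch below is the standard centralized block-nested-loop skyline, assuming smaller values are better in every dimension; it is only a baseline for what the paper distributes across peers.

import java.util.*;

// Centralized block-nested-loop skyline (smaller is better in every dimension).
// Only illustrates the dominance test; the paper distributes this over BATON using iMinMax mappings.
public class Skyline {
    static boolean dominates(double[] p, double[] q) {
        boolean strictlyBetter = false;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > q[i]) return false;
            if (p[i] < q[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    static List<double[]> skyline(List<double[]> points) {
        List<double[]> window = new ArrayList<>();
        for (double[] p : points) {
            window.removeIf(s -> dominates(p, s));          // drop window points the new one dominates
            boolean dominated = false;
            for (double[] s : window) {
                if (dominates(s, p)) { dominated = true; break; }
            }
            if (!dominated) window.add(p);
        }
        return window;
    }

    public static void main(String[] args) {
        List<double[]> pts = List.of(new double[]{1, 5}, new double[]{2, 2}, new double[]{4, 4}, new double[]{5, 1});
        skyline(pts).forEach(p -> System.out.println(Arrays.toString(p))); // [1,5], [2,2], [5,1]
    }
}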
Determining Attributes to Maximize Visibility of Objects –java
Abstract
In recent years, there has been significant interest in the development of ranking functions and efficient top-k retrieval algorithms to help users in ad hoc search and retrieval in databases (e.g., buyers searching for products in a catalog). We introduce a complementary problem: how to guide a seller in selecting the best attributes of a new tuple (e.g., a new product) to highlight so that it stands out in the crowd of existing competitive products and is widely visible to the pool of potential buyers. We develop several formulations of this problem. Although the problems are NP-complete, we give several exact and approximation algorithms that work well in practice. One class of exact algorithms is based on integer programming (IP) formulations of the problems. Another class of exact methods is based on maximal frequent itemset mining algorithms. The approximation algorithms are based on greedy heuristics. A detailed performance study illustrates the benefits of our methods on real and synthetic data.
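As a hedged illustration of the greedy heuristic flavor only (the exact IP and frequent-itemset methods are not shown), the sketch below picks k attributes of a new product to highlight so that as many buyer queries as possible have all their requested attributes covered. The data model (queries as attribute sets) is a simplification assumed for the example, not the paper's formulation.

import java.util.*;

// Greedy sketch: choose k attributes to highlight, maximizing the number of buyer queries whose
// requested attributes are all among the highlighted ones (a max-coverage-style heuristic).
public class AttributeVisibility {
    static Set<String> greedySelect(List<Set<String>> queries, Set<String> productAttrs, int k) {
        Set<String> chosen = new HashSet<>();
        for (int i = 0; i < k; i++) {
            String best = null;
            long bestCovered = coveredCount(queries, chosen);
            for (String a : productAttrs) {
                if (chosen.contains(a)) continue;
                Set<String> trial = new HashSet<>(chosen);
                trial.add(a);
                long covered = coveredCount(queries, trial);
                if (best == null || covered > bestCovered) { best = a; bestCovered = covered; }
            }
            if (best == null) break;
            chosen.add(best);
        }
        return chosen;
    }

    static long coveredCount(List<Set<String>> queries, Set<String> highlighted) {
        return queries.stream().filter(highlighted::containsAll).count();
    }

    public static void main(String[] args) {
        List<Set<String>> queries = List.of(
            Set.of("wifi", "gps"), Set.of("wifi"), Set.of("gps", "camera"), Set.of("camera"));
        Set<String> productAttrs = Set.of("wifi", "gps", "camera", "radio");
        System.out.println(greedySelect(queries, productAttrs, 2)); // e.g. [wifi, gps] or [gps, camera]
    }
}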
A Divide-and-Conquer Approach for Minimum Spanning Tree-Based Clustering –java
Abstract
Due to their ability to detect clusters with irregular boundaries, minimum spanning tree-based clustering algorithms have been widely used in practice. However, in such clustering algorithms, the search for the nearest neighbor in the construction of minimum spanning trees is the main source of computation, and the standard solutions take O(N²) time. In this paper, we present a fast minimum spanning tree-inspired clustering algorithm which, by using an efficient implementation of the cut and the cycle property of minimum spanning trees, can have much better performance than O(N²).
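For context, the standard MST-based clustering baseline that the paper accelerates can be written in a few lines: build the minimum spanning tree (here with Prim's algorithm over 2-D points, which is the O(N²) step), then cut the k-1 longest edges so the remaining components form k clusters. This sketch is that baseline, not the paper's divide-and-conquer algorithm.

import java.util.*;

// MST-based clustering baseline: Prim's MST over 2-D points, then keep only the n-k shortest MST edges.
public class MstClustering {
    static int[] cluster(double[][] pts, int k) {
        int n = pts.length;
        boolean[] inTree = new boolean[n];
        double[] bestDist = new double[n];
        int[] bestFrom = new int[n];
        Arrays.fill(bestDist, Double.MAX_VALUE);
        bestDist[0] = 0;
        List<int[]> edges = new ArrayList<>();      // MST edges as {u, v}
        List<Double> weights = new ArrayList<>();
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int i = 0; i < n; i++)
                if (!inTree[i] && (u == -1 || bestDist[i] < bestDist[u])) u = i;
            inTree[u] = true;
            if (step > 0) { edges.add(new int[]{bestFrom[u], u}); weights.add(bestDist[u]); }
            for (int v = 0; v < n; v++) {
                double d = Math.hypot(pts[u][0] - pts[v][0], pts[u][1] - pts[v][1]);
                if (!inTree[v] && d < bestDist[v]) { bestDist[v] = d; bestFrom[v] = u; }
            }
        }
        // keep the n-k shortest MST edges and union their endpoints; components are the clusters
        Integer[] order = new Integer[edges.size()];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(weights::get));
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n - k; i++) union(parent, edges.get(order[i])[0], edges.get(order[i])[1]);
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) labels[i] = find(parent, i);
        return labels;
    }

    static int find(int[] p, int x) { return p[x] == x ? x : (p[x] = find(p, p[x])); }
    static void union(int[] p, int a, int b) { p[find(p, a)] = find(p, b); }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}};
        System.out.println(Arrays.toString(cluster(pts, 2))); // two groups: {0,1,2} and {3,4}
    }
}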
Evaluating the Effectiveness of Personalized Web Search –java/dotnet
Abstract
Although personalized search has been under way for many years and many personalization algorithms have been investigated, it is still unclear whether personalization is consistently effective on different queries for different users and under different search contexts. In this paper, we study this problem and provide some findings. We present a large-scale evaluation framework for personalized search based on query logs and then evaluate five personalized search algorithms (including two click-based ones and three topical-interest-based ones) using 12 days of query logs from Windows Live Search. By analyzing the results, we reveal that personalized Web search does not work equally well under all situations. It represents a significant improvement over generic Web search for some queries, while it has little effect and can even harm query performance in other situations. We propose click entropy as a simple measure of whether a query should be personalized. We further propose several features to automatically predict when a query will benefit from a specific personalization algorithm. Experimental results show that using a personalization algorithm for queries selected by our prediction model is better than using it for all queries.
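Click entropy, proposed in the abstract as a simple signal for whether a query should be personalized, is just the Shannon entropy of the click distribution over results for that query. A small Java sketch (the query-log numbers are invented for the example):

import java.util.*;

// Click entropy of a query: low entropy (most users click the same result) suggests little benefit
// from personalization; high entropy suggests the query is a candidate for personalization.
public class ClickEntropy {
    // clickCounts: URL -> number of clicks this URL received for the query
    static double entropy(Map<String, Integer> clickCounts) {
        double total = clickCounts.values().stream().mapToInt(Integer::intValue).sum();
        double h = 0.0;
        for (int c : clickCounts.values()) {
            if (c == 0) continue;
            double p = c / total;
            h -= p * (Math.log(p) / Math.log(2));   // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(entropy(Map.of("en.wikipedia.org", 98, "imdb.com", 2)));               // ~0.14: navigational
        System.out.println(entropy(Map.of("a.com", 25, "b.com", 25, "c.com", 25, "d.com", 25)));  // 2.0: ambiguous
    }
}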
Optimal-Location-Selection Query Processing in Spatial Databases –java
Abstract
This paper introduces and solves a novel type of spatial queries, namely, Optimal-Location-Selection (OLS) search, which has many applications in real life. Given a data object set D_A, a target object set D_B, a spatial region R, and a critical distance d_c in a multidimensional space, an OLS query retrieves those target objects in D_B that are outside R but have maximal optimality. Here, the optimality of a target object b in D_B located outside R is defined as the number of the data objects from D_A that are inside R and meanwhile have their distances to b not exceeding d_c. When there is a tie, the accumulated distance from the data objects to b serves as the tie breaker, and the one with smaller distance has the better optimality. In this paper, we present the optimality metric, formalize the OLS query, and propose several algorithms for processing OLS queries efficiently. A comprehensive experimental evaluation has been conducted using both real and synthetic data sets to demonstrate the efficiency and effectiveness of the proposed algorithms.
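The optimality metric is fully specified in the abstract, so a brute-force evaluator is easy to write and useful as a reference; the paper's contribution is computing the same answer efficiently, which this scan does not attempt. The sketch below assumes 2-D points and an axis-aligned rectangular region R.

import java.util.*;

// Brute-force OLS optimality: for each target b outside R, count data objects inside R within
// distance d_c of b; ties are broken by the smaller accumulated distance.
public class OlsQuery {
    record Rect(double x1, double y1, double x2, double y2) {
        boolean contains(double[] p) { return p[0] >= x1 && p[0] <= x2 && p[1] >= y1 && p[1] <= y2; }
    }

    static double[] bestTarget(List<double[]> dataA, List<double[]> targetsB, Rect r, double dc) {
        double[] best = null;
        int bestCount = -1;
        double bestAccum = Double.MAX_VALUE;
        for (double[] b : targetsB) {
            if (r.contains(b)) continue;             // targets must lie outside R
            int count = 0;
            double accum = 0;
            for (double[] a : dataA) {
                if (!r.contains(a)) continue;        // data objects must lie inside R
                double d = Math.hypot(a[0] - b[0], a[1] - b[1]);
                if (d <= dc) { count++; accum += d; }
            }
            if (count > bestCount || (count == bestCount && accum < bestAccum)) {
                best = b; bestCount = count; bestAccum = accum;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Rect r = new Rect(0, 0, 10, 10);
        List<double[]> dataA = List.of(new double[]{1, 1}, new double[]{2, 2}, new double[]{9, 9});
        List<double[]> targetsB = List.of(new double[]{-1, 1}, new double[]{12, 12});
        System.out.println(Arrays.toString(bestTarget(dataA, targetsB, r, 5.0))); // [-1.0, 1.0]
    }
}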
Lanczos Vectors versus Singular Vectors for Effective Dimension Reduction –java/dotnet
Abstract
This paper takes an in-depth look at a technique for computing filtered matrix-vector (mat-vec) products which are required in many data analysis applications. In these applications, the data matrix is multiplied by a vector and we wish to perform this product accurately in the space spanned by a few of the major singular vectors of the matrix. We examine the use of the Lanczos algorithm for this purpose. The goal of the method is identical with that of the truncated singular value decomposition (SVD), namely to preserve the quality of the resulting mat-vec product in the major singular directions of the matrix. The Lanczos-based approach achieves this goal by using a small number of Lanczos vectors, but it does not explicitly compute singular values/vectors of the matrix. The main advantage of the Lanczos-based technique is its low cost when compared with that of the truncated SVD. This advantage comes without sacrificing accuracy. The effectiveness of this approach is demonstrated on a few sample applications requiring dimension reduction, including information retrieval and face recognition. The proposed technique can be applied as a replacement to the truncated SVD technique whenever the problem can be formulated as a filtered mat-vec multiplication.
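Whichever basis is used, the filtered matrix-vector product amounts to computing y = A x and projecting y onto the span of k orthonormal vectors: with the top-k left singular vectors this is the truncated-SVD filter, and the paper substitutes Lanczos vectors because they are cheaper to obtain. The sketch below takes the basis as given and omits how it is computed.

// Filtered matrix-vector product: form y = A x, then project y onto the span of the given
// orthonormal basis vectors (e.g., leading singular vectors or Lanczos vectors).
public class FilteredMatVec {
    // basis[i] is the i-th orthonormal basis vector (length m, matching the rows of A)
    static double[] filteredProduct(double[][] a, double[][] basis, double[] x) {
        int m = a.length, n = a[0].length;
        double[] y = new double[m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) y[i] += a[i][j] * x[j];        // y = A x
        double[] result = new double[m];
        for (double[] q : basis) {
            double coeff = 0;
            for (int i = 0; i < m; i++) coeff += q[i] * y[i];          // <q, y>
            for (int i = 0; i < m; i++) result[i] += coeff * q[i];     // accumulate the projection Q Q^T y
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{2, 0}, {0, 1}};
        double[][] basis = {{1, 0}};                                   // keep only the first coordinate direction
        double[] x = {3, 4};
        System.out.println(java.util.Arrays.toString(filteredProduct(a, basis, x))); // [6.0, 0.0]
    }
}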
Low-Complexity Iris Coding and Recognition Based on Directionlets –java
Abstract
A novel iris recognition method is presented. In the method, the iris features are extracted using oriented separable wavelet transforms (directionlets) and compared in terms of a weighted Hamming distance. The feature extraction and comparison are shift-, size-, and rotation-invariant with respect to the location of the iris in the acquired image. The generated iris code is binary, its length is fixed (and therefore commensurable) independent of the iris image, and it is comparatively short. The novel method shows good performance when applied to a large database of irises and provides reliable identification and verification. At the same time, it preserves conceptual and computational simplicity and allows for quick analysis and comparison of iris samples.
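The comparison step is a weighted Hamming distance over binary codes, which is straightforward to sketch; the directionlet-based feature extraction that produces the codes and the weights is not shown, and the toy weights below are invented.

// Weighted Hamming distance between two binary iris codes: disagreeing bits contribute their weight,
// and the sum is normalized by the total weight. (Feature extraction with directionlets is not shown.)
public class WeightedHamming {
    static double distance(boolean[] codeA, boolean[] codeB, double[] weights) {
        double mismatch = 0, total = 0;
        for (int i = 0; i < codeA.length; i++) {
            total += weights[i];
            if (codeA[i] != codeB[i]) mismatch += weights[i];
        }
        return mismatch / total;           // 0 = identical codes, 1 = all weighted bits disagree
    }

    public static void main(String[] args) {
        boolean[] a = {true, false, true, true};
        boolean[] b = {true, true, true, false};
        double[] w  = {1.0, 0.5, 1.0, 2.0};
        System.out.println(distance(a, b, w)); // (0.5 + 2.0) / 4.5 ≈ 0.556
    }
}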
Watermarking Robustness Evaluation Based on Perceptual Quality via Genetic Algorithms
Abstract
This paper presents a novel and flexible benchmarking tool based on genetic algorithms (GA) and designed to assess the robustness of any digital image watermarking system. The main idea is to evaluate robustness in terms of perceptual quality, measured by the weighted peak signal-to-noise ratio. Through a stochastic approach, we optimize this quality metric by finding the minimal degradation that needs to be introduced into a marked image in order to remove the embedded watermark. Given a set of attacks, chosen according to the considered application scenario, the GA optimizes the parameters to be assigned to each processing operation, in order to obtain an unmarked image with perceptual quality as high as possible. Extensive experimental results demonstrate the effectiveness of the proposed evaluation tool.
Digital Image Forensics via Intrinsic Fingerprints-java
Abstract
Digital imaging has experienced tremendous growth in recent decades, and digital camera images have been used in a growing number of applications. With such increasing popularity and the availability of low-cost image editing software, the integrity of digital image content can no longer be taken for granted. This paper introduces a new methodology for the forensic analysis of digital camera images. The proposed method is based on the observation that many processing operations, both inside and outside acquisition devices, leave distinct intrinsic traces on digital images, and these intrinsic fingerprints can be identified and employed to verify the integrity of digital data. The intrinsic fingerprints of the various in-camera processing operations can be estimated through a detailed imaging model and its component analysis. Further processing applied to the camera-captured image is modelled as a manipulation filter, for which a blind deconvolution technique is applied to obtain a linear time-invariant approximation and to estimate the intrinsic fingerprints associated with these post-camera operations. The absence of camera-imposed fingerprints from a test image indicates that the test image is not a camera output and was possibly generated by another image production process. Any changes or inconsistencies among the estimated camera-imposed fingerprints, or the presence of new types of fingerprints, suggest that the image has undergone some kind of processing after the initial capture, such as tampering or steganographic embedding. Through analysis and extensive experimental studies, this paper demonstrates the effectiveness of the proposed framework for nonintrusive digital image forensics.
A Fast Search Algorithm for a Large Fuzzy Database –java/dotnet
Abstract
In this paper, we propose a fast search algorithm for a large fuzzy database that stores iris codes or data with a similar binary structure. The fuzzy nature of iris codes and their high dimensionality render many modern search algorithms, mainly relying on sorting and hashing, inadequate. The algorithm that is used in all current public deployments of iris recognition is based on a brute force exhaustive search through a database of iris codes, looking for a match that is close enough. Our new technique, Beacon Guided Search (BGS), tackles this problem by dispersing a multitude of "beacons" in the search space. Despite random bit errors, iris codes from the same eye are more likely to collide with the same beacons than those from different eyes. By counting the number of collisions, BGS shrinks the search range dramatically with a negligible loss of precision. We evaluate this technique using 632,500 iris codes enrolled in the United Arab Emirates (UAE) border control system, showing a substantial improvement in search speed with a negligible loss of accuracy. In addition, we demonstrate that the empirical results match theoretical predictions.
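A much-simplified flavor of Beacon Guided Search: each code is split into fixed-width chunks that act as beacons, enrolled codes are indexed by (chunk position, chunk value), and a probe is compared only against the records it collides with most often. The sketch below omits the error-tolerant beacon encodings and other refinements of the real scheme, and all names are chosen for the example.

import java.util.*;

// Simplified beacon index: index enrolled binary codes by fixed-width chunks, then rank candidates
// for a probe by how many chunks ("beacons") they share with it.
public class BeaconIndex {
    private static final int CHUNK = 8;                 // bits per beacon
    private final Map<String, List<Integer>> index = new HashMap<>();
    private final List<boolean[]> enrolled = new ArrayList<>();

    public void enroll(boolean[] code) {
        int id = enrolled.size();
        enrolled.add(code);
        for (int pos = 0; pos + CHUNK <= code.length; pos += CHUNK)
            index.computeIfAbsent(beaconKey(code, pos), k -> new ArrayList<>()).add(id);
    }

    // returns the enrolled id with the most beacon collisions, or -1 if there are none
    public int bestCandidate(boolean[] probe) {
        Map<Integer, Integer> collisions = new HashMap<>();
        for (int pos = 0; pos + CHUNK <= probe.length; pos += CHUNK)
            for (int id : index.getOrDefault(beaconKey(probe, pos), List.of()))
                collisions.merge(id, 1, Integer::sum);
        return collisions.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey).orElse(-1);
    }

    private static String beaconKey(boolean[] code, int pos) {
        StringBuilder sb = new StringBuilder().append(pos).append(':');
        for (int i = pos; i < pos + CHUNK; i++) sb.append(code[i] ? '1' : '0');
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        boolean[] eye = new boolean[64];
        for (int i = 0; i < eye.length; i++) eye[i] = rnd.nextBoolean();
        BeaconIndex idx = new BeaconIndex();
        idx.enroll(eye);
        boolean[] noisy = eye.clone();
        noisy[3] = !noisy[3];                           // a few bit errors still leave most beacons colliding
        System.out.println(idx.bestCandidate(noisy));   // 0
    }
}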
Unseen Visible Watermarking: A Novel Methodology for Auxiliary Information Delivery via Visual Contents –java /dotnet
Abstract
A novel data hiding scheme, denoted as unseen visible watermarking (UVW), is proposed. In UVW schemes, hidden information can be embedded covertly and then directly extracted using the human visual system as long as appropriate operations (e.g., gamma correction provided by almost all display devices or changes in viewing angles relative to LCD monitors) are performed. UVW eliminates the requirement of invisible watermarking that specific watermark extractors must be deployed to the receiving end in advance, and it can be integrated with 2-D barcodes to transmit machine-readable information that conventional visible watermarking schemes fail to deliver. We also adopt visual cryptographic techniques to guard the security of hidden information and, at the same time, increase the practical value of visual cryptography. Since UVW can be alternatively viewed as a mechanism for visualizing patterns hidden with least-significant-bit embedding, its security against statistical steganalysis is proved by empirical tests. Limitations and other potential extensions of UVW are also addressed
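The display-side mechanism can be illustrated with a small numeric example: a luminance offset hidden in a dark region is barely visible on a plain display, but gamma correction stretches dark tones and enlarges the offset. The sketch below only demonstrates this amplification effect with assumed pixel values and gamma; it is not the paper's UVW embedding or extraction procedure.

// Illustrative only: shows how gamma correction amplifies a small luminance
// offset hidden in a dark region (not the actual UVW scheme).
public class GammaAmplificationDemo {
    // Display-style gamma correction: out = 255 * (in/255)^(1/gamma), which brightens dark tones.
    static double gammaCorrect(double value, double gamma) {
        return 255.0 * Math.pow(value / 255.0, 1.0 / gamma);
    }

    public static void main(String[] args) {
        double background = 10.0;   // dark background pixel (assumed)
        double marked = 14.0;       // pixel carrying a +4 hidden offset (assumed)
        double gamma = 2.2;         // typical display gamma

        double before = marked - background;                                        // about 4 gray levels
        double after = gammaCorrect(marked, gamma) - gammaCorrect(background, gamma); // roughly 10 gray levels

        System.out.printf("contrast before gamma correction: %.1f%n", before);
        System.out.printf("contrast after gamma correction: %.1f%n", after);
    }
}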
A Game Theoretical Framework on Intrusion Detection in Heterogeneous Networks –java
Abstract
Due to the dynamic, distributed, and heterogeneous nature of today's networks, intrusion detection systems (IDSs) have become a necessary addition to the security infrastructure and are widely deployed as a complementary line of defense to classical security approaches. In this paper, we address the intrusion detection problem in heterogeneous networks consisting of nodes with different, noncorrelated security assets. In our study, two crucial questions are: What are the expected behaviors of rational attackers? What is the optimal strategy of the defenders (IDSs)? We answer these questions by formulating network intrusion detection as a noncooperative game and performing an in-depth analysis of the Nash equilibrium and the engineering implications behind it. Based on our game-theoretical analysis, we derive the expected behaviors of rational attackers, the minimum monitoring resource requirement, and the optimal strategy of the defenders. We then provide guidelines for IDS design and deployment. We also show how our game-theoretical framework can be applied to configure the intrusion detection strategies in realistic scenarios via a case study. Finally, we evaluate the proposed game-theoretical framework via simulations. The simulation results show both the correctness of the analytical results and the effectiveness of the proposed guidelines.
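As a toy illustration of the kind of equilibrium analysis involved (not the paper's model or payoffs), the sketch below sets up a single-node monitoring game between an attacker and a defender and computes the mixed-strategy equilibrium from the standard 2x2 indifference conditions; all payoff values are assumptions.

// A toy 2x2 monitoring game with illustrative payoffs. The mixed equilibrium is
// obtained from the indifference conditions of a 2x2 noncooperative game.
public class MonitoringGameSketch {
    public static void main(String[] args) {
        // Rows: attacker plays {Attack, NotAttack}; columns: defender plays {Monitor, NotMonitor}.
        double[][] attacker = { { -1.0, 2.0 },    // attack: caught vs. successful intrusion
                                {  0.0, 0.0 } };  // no attack: nothing gained
        double[][] defender = { {  1.0, -2.0 },   // attack happens: detection reward vs. undetected loss
                                { -0.5,  0.0 } }; // no attack: wasted monitoring cost vs. idle

        // Defender's monitoring probability q makes the attacker indifferent between its actions.
        double q = (attacker[1][1] - attacker[0][1])
                 / (attacker[0][0] - attacker[0][1] - attacker[1][0] + attacker[1][1]);
        // Attacker's attack probability p makes the defender indifferent between its actions.
        double p = (defender[1][1] - defender[1][0])
                 / (defender[0][0] - defender[0][1] - defender[1][0] + defender[1][1]);

        System.out.printf("mixed equilibrium: attack with p=%.3f, monitor with q=%.3f%n", p, q);
    }
}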
Spread-Spectrum Watermarking Security –java/dotnet
Abstract
This paper presents both theoretical and practical analyses of the security offered by watermarking and data hiding methods based on spread spectrum. In this context, security is understood as the difficulty of estimating the secret parameters of the embedding function based on the observation of watermarked signals. On the theoretical side, the security is quantified from an information-theoretic point of view by means of the equivocation about the secret parameters. The main results reveal fundamental limits and bounds on security and provide insight into other properties, such as the impact of the embedding parameters, and the tradeoff between robustness and security. On the practical side, workable estimators of the secret parameters are proposed and theoretically analyzed for a variety of scenarios, providing a comparison with previous approaches, and showing that the security of many schemes used in practice can be fairly low
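One simple way to see why such estimators can work is the known-message setting: if each watermarked signal is the host plus a bit-weighted secret carrier, averaging the signals weighted by the known bits cancels the zero-mean host and converges to the carrier. The sketch below illustrates only this textbook estimator under an assumed signal model; it is not the paper's estimator or parameter setting.

import java.util.Random;

// Known-message estimator sketch: y_i = x_i + b_i * u with zero-mean host x_i and
// known bits b_i in {-1,+1}; then (1/N) * sum(b_i * y_i) converges to u.
public class CarrierEstimationSketch {
    public static void main(String[] args) {
        int dim = 64, n = 20_000;
        Random rnd = new Random(7);

        double[] carrier = new double[dim];            // secret spreading vector (assumed)
        for (int j = 0; j < dim; j++) carrier[j] = rnd.nextGaussian() * 0.1;

        double[] estimate = new double[dim];
        for (int i = 0; i < n; i++) {
            int bit = rnd.nextBoolean() ? 1 : -1;      // known embedded bit
            for (int j = 0; j < dim; j++) {
                double host = rnd.nextGaussian();      // zero-mean host sample
                double watermarked = host + bit * carrier[j];
                estimate[j] += bit * watermarked / n;  // bit-weighted average cancels the host
            }
        }

        double err = 0, norm = 0;
        for (int j = 0; j < dim; j++) {
            err += (estimate[j] - carrier[j]) * (estimate[j] - carrier[j]);
            norm += carrier[j] * carrier[j];
        }
        System.out.printf("relative estimation error: %.3f%n", Math.sqrt(err / norm));
    }
}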
A Hypothesis Testing Approach to Semifragile Watermark-Based Authentication –java
Abstract
This paper studies the problem of achieving watermark semi-fragility in multimedia authentication through a composite hypothesis testing approach. The embedding of a semi-fragile watermark serves to distinguish legitimate distortions caused by signal processing manipulations from illegitimate ones caused by malicious tampering. This leads us to consider authentication verification as a composite hypothesis testing problem with the watermark as a priori information. Based on the hypothesis testing model, we investigate the best embedding strategy which assists the watermark verifier to make correct decisions. Our results show that the quantization-based watermarking method is more appropriate than the spread spectrum method to achieve the best tradeoff between the two error probabilities. This observation is confirmed by a case study of an additive white Gaussian noise channel with a Gaussian source using two figures of merit: the relative entropy of the two hypothesis distributions and the receiver operating characteristic. Finally, we focus on common signal processing distortions such as JPEG compression and image filtering, and investigate the best test statistic and optimal decision regions to distinguish legitimate from illegitimate distortions. The results of the paper show that our approach provides insights for authentication watermarking and allows better control of semi-fragility in specific applications.
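For reference, a minimal sketch of the quantization-based embedding family the abstract favors: a scalar QIM-style embedder quantizes a sample onto one of two lattices (offset by half the quantization step) depending on the bit, and the decoder picks the closer lattice. The step size and sample values are illustrative assumptions, not the paper's construction.

// Scalar QIM sketch: bit 0 quantizes onto a lattice of step delta, bit 1 onto the
// same lattice shifted by delta/2; the decoder picks the closer lattice. Decoding
// survives any distortion smaller than delta/4.
public class QimSketch {
    static double embed(double sample, int bit, double delta) {
        double offset = bit * delta / 2.0;                       // shift the lattice for bit 1
        return Math.round((sample - offset) / delta) * delta + offset;
    }

    static int decode(double received, double delta) {
        double d0 = Math.abs(received - embed(received, 0, delta));
        double d1 = Math.abs(received - embed(received, 1, delta));
        return d0 <= d1 ? 0 : 1;                                 // closer lattice wins
    }

    public static void main(String[] args) {
        double delta = 4.0;                                      // quantization step (assumed)
        double marked = embed(37.3, 1, delta);                   // embed bit 1
        double distorted = marked + 0.9;                         // mild "legitimate" distortion
        System.out.println("decoded bit after mild distortion: " + decode(distorted, delta));
    }
}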
Robust Blind Watermarking of Point-Sampled Geometry
Abstract
Watermarking schemes for copyright protection of point cloud representations of 3D models operate only on the geometric data and are therefore also applicable to mesh-based representations of 3D models, which are defined using both geometry and topological information. To build such generic copyright schemes for 3D models, this paper presents a robust spatial blind watermarking mechanism for 3D point-sampled geometry. To find the order in which points are to be encoded/decoded, a clustering approach is proposed. The points are divided into clusters, and ordering is achieved using intra-cluster and inter-cluster ordering: intra-cluster ordering orders points locally, whereas inter-cluster ordering does so globally. Once ordered, a sequence of clusters is chosen based on a nearest-neighbor heuristic. An extension of the quantization index modulation bit encoding scheme is proposed and used to encode and decode inside the clusters. The encoding mechanism makes the technique robust against uniform affine transformations (rotation, scaling, and translation), the reordering attack, and topology-altering (e.g., retriangulation) attacks when applied to 3D meshes as well. Replication of the watermark provides robustness against localized noise addition, cropping, simplification, and global noise addition attacks. Security of the scheme is analyzed, and the time complexity is estimated as O(n log n), where n is the number of 3D points. Theoretical bounds on hiding capacity are estimated, and experiments show that the hiding capacity is high, with an embedding rate greater than 3 bits/point. The bit encoding method reduces distortion and keeps the watermark imperceptible, as indicated by a signal-to-noise ratio greater than 100 dB.
Spatial PrObabilistic Temporal (SPOT) databases
Abstract
Spatial PrObabilistic Temporal (SPOT) databases are a paradigm for reasoning with probabilistic statements about where a vehicle may be now or in the future. They express statements of the form “Object O is in spatial region R at some time t with some probability in the interval [L, U].” Past work on SPOT databases has developed selection operators based on selecting SPOT atoms that are entailed by the SPOT database—we call this “cautious” selection. In this paper, we study several problems. First, we note that the runtime of consistency checking and cautious selection algorithms in past work is influenced greatly by the granularity of the underlying Cartesian space. In this paper, we first introduce the notion of “optimistic” selection, where we are interested in returning all SPOT atoms in a database that are consistent with respect to a query, rather than having an entailment relationship. We then develop an approach to scaling SPOT databases that has three main contributions: 1) We develop methods to eliminate variables from the linear programs used in past work, thus greatly reducing the size of the linear programs used—the resulting advances apply to consistency checking, optimistic selection, and cautious selection. 2) We develop a host of theorems to show how we can prune the search space when we are interested in optimistic selection. 3) We use the above contributions to build an efficient index to execute optimistic selection queries over SPOT databases. Our approach is superior to past work in two major respects: First, it makes fewer assumptions than all past works. Second, our experiments, which are based on real-world data about ship movements, show that our algorithms are much more efficient.
Design and Evaluation of the iMed Intelligent Medical Search Engine –dotnet
Abstract — Searching for medical information on the Web is popular and important. However, medical search has its own unique requirements that are poorly handled by existing medical Web search engines. This paper presents iMed, the first intelligent medical Web search engine that extensively uses medical knowledge and a questionnaire interface to help ordinary Internet users search for medical information. iMed introduces and extends expert system technology into the search engine domain. It uses several key techniques to improve its usability and search result quality. First, since ordinary users often cannot clearly describe their situations due to lack of medical background, iMed uses a questionnaire-based query interface to guide searchers to provide the most important information about their situations. Second, iMed uses medical knowledge to automatically form multiple queries from a searcher's answers to the questions. Using these queries to perform the search can significantly improve the quality of search results. Third, iMed structures all the search results into a multilevel hierarchy with explicitly marked medical meanings to facilitate searchers' viewing. Lastly, iMed suggests diversified, related medical phrases at each level of the search result hierarchy. These medical phrases are extracted from the MeSH ontology and can help searchers quickly digest search results and refine their inputs. We evaluated iMed under a wide range of medical scenarios. The results show that iMed is effective and efficient for medical search.
On the Effect of Location Uncertainty in Spatial Querying –java
Abstract—An emerging topic in the field of spatial data management is the handling of location uncertainty of spatial objects, mainly due to inaccurate measurements. The literature on location uncertainty so far has focused on modifying traditional spatial search algorithms in order to handle the impact of objects’ location uncertainty on the query results. In this paper, we present the first, to the best of our knowledge, theoretical analysis that estimates the average number of false hits introduced in the results of rectangular range queries in the case of data points uniformly distributed in 2D space. Then, we relax the original distribution assumptions showing how to deal with arbitrarily distributed data points and more realistic location uncertainty distributions. The accuracy of the results of our analytical approach is demonstrated through an extensive experimental study using various synthetic and real data sets. Our proposal can be directly employed in spatial database systems in order to provide users with the accuracy of spatial query results based only on known data set and query parameters.
Index Terms—Spatial databases, GIS
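The quantity analyzed above can also be approximated by simulation. The sketch below is a hedged Monte Carlo check, assuming points uniform in the unit square, Gaussian location error, and an illustrative query window; it counts points returned erroneously (reported location inside the window while the true location is outside) and points missed for the opposite reason.

import java.util.Random;

// Monte Carlo estimate of false hits and false misses for a rectangular range
// query when reported locations deviate from true locations by Gaussian noise.
public class FalseHitSimulation {
    static boolean inWindow(double x, double y, double x1, double y1, double x2, double y2) {
        return x >= x1 && x <= x2 && y >= y1 && y <= y2;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 200_000;
        double sigma = 0.01;                                   // location uncertainty (assumed)
        double qx1 = 0.3, qy1 = 0.3, qx2 = 0.7, qy2 = 0.7;     // query window (assumed)

        int falseHits = 0, falseMisses = 0;
        for (int i = 0; i < n; i++) {
            double tx = rnd.nextDouble(), ty = rnd.nextDouble();   // true location
            double rx = tx + sigma * rnd.nextGaussian();           // reported (measured) location
            double ry = ty + sigma * rnd.nextGaussian();
            boolean trueIn = inWindow(tx, ty, qx1, qy1, qx2, qy2);
            boolean reportedIn = inWindow(rx, ry, qx1, qy1, qx2, qy2);
            if (reportedIn && !trueIn) falseHits++;
            if (!reportedIn && trueIn) falseMisses++;
        }
        System.out.printf("false hits: %d, false misses: %d (out of %d points, one query)%n",
                falseHits, falseMisses, n);
    }
}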
Role Engineering via Prioritized Subset Enumeration –java project
Abstract—Today, role-based access control (RBAC) has become a well-accepted paradigm for implementing access control because of its convenience and ease of administration. However, in order to realize the full benefits of the RBAC paradigm, one must first define the roles accurately. This task of defining roles and associating permissions with them, also known as role engineering, is typically accomplished either in a top-down or in a bottom-up manner. Under the top-down approach, a careful analysis of the business processes is done to first define job functions and then to specify appropriate roles from them. While this approach can help in defining roles more accurately, it is tedious and time-consuming since it requires that the semantics of the business processes be well understood. Moreover, it ignores existing permissions within an organization and does not utilize them. On the other hand, under the bottom-up approach, existing permissions are used to derive roles from them. As a result, it may help automate the process of role definition. In this paper, we present an unsupervised approach, called RoleMiner, for mining roles from existing user-permission assignments. Since a role, when semantics are unavailable, is nothing but a set of permissions, the task of role mining is essentially that of clustering users having the same (or similar) permissions. However, unlike the traditional applications of data mining that ideally require identification of nonoverlapping clusters, roles will have overlapping permissions and thus permission sets that define roles should be allowed to overlap. It is this distinction from traditional clustering that makes the problem of role mining nontrivial. Our experiments with real and simulated data sets indicate that our role mining process is quite accurate and efficient. Since our role mining approach is based on subset enumeration, it is fairly robust to reasonable levels of noise.
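A hedged sketch of the bottom-up idea described above, not the exact RoleMiner algorithm: users with identical permission sets seed initial candidate roles, pairwise intersections add further (overlapping) candidates, and candidates are prioritized by how many users' permissions contain them. The users, permissions, and prioritization rule are illustrative assumptions.

import java.util.*;

// Bottom-up candidate-role generation from user-permission assignments (sketch).
public class RoleMiningSketch {
    public static void main(String[] args) {
        Map<String, Set<String>> userPerms = new LinkedHashMap<>();
        userPerms.put("alice", Set.of("read", "write", "approve"));
        userPerms.put("bob",   Set.of("read", "write"));
        userPerms.put("carol", Set.of("read", "approve"));
        userPerms.put("dave",  Set.of("read", "write"));

        // Step 1: each distinct user permission set is an initial candidate role.
        Set<Set<String>> candidates = new LinkedHashSet<>(userPerms.values());

        // Step 2: pairwise intersections of initial candidates become additional (overlapping) candidates.
        List<Set<String>> initial = new ArrayList<>(candidates);
        for (int i = 0; i < initial.size(); i++)
            for (int j = i + 1; j < initial.size(); j++) {
                Set<String> common = new HashSet<>(initial.get(i));
                common.retainAll(initial.get(j));
                if (!common.isEmpty()) candidates.add(common);
            }

        // Step 3: prioritize candidates by how many users' permission sets contain them.
        candidates.stream()
                .sorted((a, b) -> Long.compare(coverage(userPerms, b), coverage(userPerms, a)))
                .forEach(role -> System.out.println(coverage(userPerms, role) + " users <- candidate role " + role));
    }

    static long coverage(Map<String, Set<String>> userPerms, Set<String> role) {
        return userPerms.values().stream().filter(u -> u.containsAll(role)).count();
    }
}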
RiMOM: A Dynamic Multistrategy Ontology Alignment Framework –java/dotnet
Abstract
Ontology alignment identifies semantically matching entities in different ontologies. Various ontology alignment strategies have been proposed; however, few systems have explored how to automatically combine multiple strategies to improve the matching effectiveness. This paper presents a dynamic multistrategy ontology alignment framework, named RiMOM. The key insight in this framework is that similarity characteristics between ontologies may vary widely. We propose a systematic approach to quantitatively estimate the similarity characteristics for each alignment task and propose a strategy selection method to automatically combine the matching strategies based on two estimated factors. In the approach, we consider both textual and structural characteristics of ontologies. With RiMOM, we participated in the 2006 and 2007 campaigns of the Ontology Alignment Evaluation Initiative (OAEI). Our system is among the top three performers in benchmark data sets.
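A hedged sketch of the general multistrategy idea, not RiMOM's actual strategies or selection method: compute a label-based similarity and a structure-based similarity separately, then combine them with weights driven by an estimated characteristic of the alignment task (here a single assumed "label richness" factor).

import java.util.Set;

// Combining a label-based and a structure-based similarity with weights derived
// from an estimated characteristic of the two ontologies (illustrative sketch).
public class MultiStrategyAlignmentSketch {
    // Normalized edit-distance similarity between entity labels.
    static double labelSimilarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return 1.0 - (double) d[a.length()][b.length()] / Math.max(a.length(), b.length());
    }

    // Jaccard overlap of the entities' neighbour labels as a simple structural similarity.
    static double structureSimilarity(Set<String> neighboursA, Set<String> neighboursB) {
        Set<String> inter = new java.util.HashSet<>(neighboursA);
        inter.retainAll(neighboursB);
        Set<String> union = new java.util.HashSet<>(neighboursA);
        union.addAll(neighboursB);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        double labelRichness = 0.8;                      // estimated from the two ontologies (assumed)
        double wLabel = labelRichness, wStruct = 1.0 - labelRichness;

        double sim = wLabel * labelSimilarity("PhDStudent", "DoctoralStudent")
                   + wStruct * structureSimilarity(Set.of("Person", "Thesis"), Set.of("Person", "Dissertation"));
        System.out.printf("combined similarity: %.2f (aligned if above a chosen threshold)%n", sim);
    }
}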
Similarity-Profiled Temporal Association Mining –java/dotnet project
Abstract
Given a time stamped transaction database and a user-defined reference sequence of interest over time, similarity-profiled temporal association mining discovers all associated item sets whose prevalence variations over time are similar to the reference sequence. The similar temporal association patterns can reveal interesting relationships of data items which co-occur with a particular event over time. Most works in temporal association mining have focused on capturing special temporal regulation patterns such as cyclic patterns and calendar scheme-based patterns. However, our model is flexible in representing interesting temporal patterns using a user-defined reference sequence. The dissimilarity degree of the sequence of support values of an item set to the reference sequence is used to capture how well its temporal prevalence variation matches the reference pattern. By exploiting interesting properties such as an envelope of support time sequence and a lower bounding distance for early pruning candidate item sets, we develop an algorithm for effectively mining similarity-profiled temporal association patterns. We prove the algorithm is correct and complete in the mining results and provide the computational analysis. Experimental results on real data as well as synthetic data show that the proposed algorithm is more efficient than a sequential method using a traditional support-pruning scheme.
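The envelope-based pruning mentioned above can be sketched as follows, under assumptions: an itemset's support at each time slot cannot exceed the minimum support of its member items at that slot, so this upper envelope gives a cheap lower bound on the squared Euclidean distance to the reference sequence, and a candidate whose bound already exceeds the dissimilarity threshold can be pruned without scanning the data. The distance measure, threshold, and data values are illustrative.

// Lower-bound pruning with the support-time-sequence upper envelope (sketch).
public class SupportEnvelopePruning {
    // Upper envelope: the itemset's support at slot t is at most the min of its items' supports at t.
    static double[] upperEnvelope(double[][] memberSupports) {
        int slots = memberSupports[0].length;
        double[] env = new double[slots];
        for (int t = 0; t < slots; t++) {
            env[t] = Double.MAX_VALUE;
            for (double[] member : memberSupports) env[t] = Math.min(env[t], member[t]);
        }
        return env;
    }

    // Wherever the reference lies above the envelope, the true support must be at
    // least that far below it, so the squared gaps lower-bound the true distance.
    static double lowerBoundSqDist(double[] reference, double[] envelope) {
        double lb = 0;
        for (int t = 0; t < reference.length; t++) {
            double gap = reference[t] - envelope[t];
            if (gap > 0) lb += gap * gap;
        }
        return lb;
    }

    public static void main(String[] args) {
        double[] reference = { 0.6, 0.7, 0.8, 0.7 };               // user-defined reference sequence
        double[][] memberSupports = { { 0.5, 0.4, 0.3, 0.4 },      // support sequence of item A
                                      { 0.7, 0.6, 0.5, 0.6 } };    // support sequence of item B
        double thresholdSq = 0.05;                                  // dissimilarity threshold (assumed)

        double lb = lowerBoundSqDist(reference, upperEnvelope(memberSupports));
        System.out.println(lb > thresholdSq
                ? "prune {A,B}: lower bound " + lb + " already exceeds the threshold"
                : "keep {A,B}: its actual support sequence must be computed");
    }
}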
Ranking and Suggesting Popular Items –java project
Abstract
We consider the problem of ranking the popularity of items and suggesting popular items based on user feedback. User feedback is obtained by iteratively presenting a set of suggested items, and users selecting items based on their own preferences either from this suggestion set or from the set of all possible items. The goal is to quickly learn the true popularity ranking of items (unbiased by the made suggestions), and suggest true popular items. The difficulty is that making suggestions to users can reinforce popularity of some items and distort the resulting item ranking. The described problem of ranking and suggesting items arises in diverse applications including search query suggestions and tag suggestions for social tagging systems. We propose and study several algorithms for ranking and suggesting popular items, provide analytical results on their performance, and present numerical results obtained using the inferred popularity of tags from a month-long crawl of a popular social bookmarking service. Our results suggest that lightweight, randomized update rules that require no special configuration parameters provide good performance.
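As a hedged illustration of a lightweight randomized rule in this spirit (not one of the paper's specific algorithms): keep per-item selection counts, suggest the current top-k, but occasionally replace a suggested slot with a uniformly random item so that items outside the suggestion set still get exposure and the ranking is not locked in by early suggestions.

import java.util.*;

// Randomized suggestion rule sketch: mostly suggest the current top-k items,
// but occasionally explore a random item to limit suggestion-induced bias.
public class PopularItemSuggester {
    private final Map<String, Integer> counts = new HashMap<>();
    private final List<String> allItems;
    private final Random rnd = new Random();
    private final double exploreProbability;

    PopularItemSuggester(List<String> allItems, double exploreProbability) {
        this.allItems = allItems;
        this.exploreProbability = exploreProbability;
        for (String item : allItems) counts.put(item, 0);
    }

    List<String> suggest(int k) {
        List<String> ranked = new ArrayList<>(allItems);
        ranked.sort((a, b) -> counts.get(b) - counts.get(a));       // most selected first
        List<String> suggestion = new ArrayList<>(ranked.subList(0, k));
        for (int i = 0; i < k; i++)
            if (rnd.nextDouble() < exploreProbability)              // explore: swap in a random item
                suggestion.set(i, allItems.get(rnd.nextInt(allItems.size())));
        return suggestion;
    }

    void recordSelection(String item) {
        counts.merge(item, 1, Integer::sum);                        // user feedback updates the counts
    }

    public static void main(String[] args) {
        PopularItemSuggester s = new PopularItemSuggester(List.of("java", "dotnet", "php", "ruby"), 0.1);
        s.recordSelection("java");
        s.recordSelection("java");
        s.recordSelection("dotnet");
        System.out.println("suggested: " + s.suggest(2));
    }
}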
Olex: Effective Rule Learning for Text Categorization –java/dotnet
Abstract
This paper describes Olex, a novel method for the automatic induction of rule-based text classifiers. Olex supports a hypothesis language of the form "if T_1 or ... or T_n occurs in document d, and none of T_{n+1}, ..., T_{n+m} occurs in d, then classify d under category c," where each T_i is a conjunction of terms. The proposed method is simple and elegant. Despite this, the results of a systematic experimentation performed on the Reuters-21578, the Ohsumed, and the ODP data collections show that Olex provides classifiers that are accurate, compact, and comprehensible. A comparative analysis conducted against some of the most well-known learning algorithms (namely, Naive Bayes, Ripper, C4.5, SVM, and Linear Logistic Regression) demonstrates that it is more than competitive in terms of both predictive accuracy and efficiency.
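The hypothesis language above translates directly into a small rule-evaluation predicate. The sketch below is an illustration with made-up terms, not a learned Olex classifier: a document is assigned to the category if at least one positive conjunction of terms occurs in it and none of the negative terms does.

import java.util.*;

// Evaluating an Olex-style rule: classify d under c if at least one positive
// conjunction of terms occurs in d and none of the negative terms occurs in d.
public class OlexRuleSketch {
    static boolean matches(Set<String> docTerms,
                           List<Set<String>> positiveConjunctions,
                           Set<String> negativeTerms) {
        boolean positive = positiveConjunctions.stream().anyMatch(docTerms::containsAll);
        boolean negative = negativeTerms.stream().anyMatch(docTerms::contains);
        return positive && !negative;
    }

    public static void main(String[] args) {
        // Illustrative rule for a hypothetical category "wheat".
        List<Set<String>> positives = List.of(Set.of("wheat"), Set.of("grain", "export"));
        Set<String> negatives = Set.of("oilseed");

        Set<String> doc = new HashSet<>(Arrays.asList("grain export prices rose".split(" ")));
        System.out.println("classified under 'wheat': " + matches(doc, positives, negatives));
    }
}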
Multirelational k-Anonymity –dotnet/java project
Abstract
k-Anonymity protects privacy by ensuring that data cannot be linked to a single individual. In a k-anonymous data set, any identifying information occurs in at least k tuples. Much research has been done to modify a single-table data set to satisfy anonymity constraints. This paper extends the definitions of k-anonymity to multiple relations and shows that previously proposed methodologies either fail to protect privacy or overly reduce the utility of the data in a multiple-relation setting. We also propose two new clustering algorithms to achieve multirelational anonymity. Experiments show the effectiveness of the approach in terms of utility and efficiency.
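For the single-table baseline that the paper generalizes, checking k-anonymity amounts to grouping tuples on the quasi-identifier attributes and verifying that every group has at least k members. The sketch below assumes illustrative attributes and generalized values; it is not the paper's multirelational algorithms.

import java.util.*;

// Single-table k-anonymity check: every combination of quasi-identifier values
// must appear in at least k tuples.
public class KAnonymityCheck {
    static boolean isKAnonymous(List<Map<String, String>> table, List<String> quasiIdentifiers, int k) {
        Map<List<String>, Integer> groupSizes = new HashMap<>();
        for (Map<String, String> tuple : table) {
            List<String> key = new ArrayList<>();
            for (String attr : quasiIdentifiers) key.add(tuple.get(attr));
            groupSizes.merge(key, 1, Integer::sum);                 // count tuples per quasi-identifier group
        }
        return groupSizes.values().stream().allMatch(size -> size >= k);
    }

    public static void main(String[] args) {
        List<String> qi = List.of("zip", "age");
        List<Map<String, String>> table = List.of(
                Map.of("zip", "475**", "age", "2*", "disease", "flu"),
                Map.of("zip", "475**", "age", "2*", "disease", "cold"),
                Map.of("zip", "479**", "age", "3*", "disease", "flu"));
        System.out.println("2-anonymous: " + isKAnonymous(table, qi, 2));   // false: last group has only 1 tuple
    }
}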