Tuesday, September 8, 2009

Summary of the Talk by Prof. FeiFei Li

Alan Lupsha

Professor FeiFei Li researches database management and database technologies. His research focuses on efficient indexing, querying, and management of large-scale databases; spatio-temporal databases and applications; and sensor and stream databases.

The work on efficient indexing, querying, and management of large-scale databases addresses problems such as retrieving structured data from the web and automating the identification of the structure of web sites (e.g., to create customized reports for users). Interpreting web pages and identifying the tree structure of their data makes it possible to first create a schema for the data, and then to integrate information from different sources in a meaningful way. A related topic, indexing higher-dimensional data with tree structures and multidimensional structures, relies on space partitioning to index data with anywhere from 2 to 6 dimensions.
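To make the idea of space partitioning concrete, here is a minimal sketch of a k-d tree, one common structure of this kind. The talk summary does not say which index structures were discussed, so the choice of a k-d tree and the names build_kdtree and range_query are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                   # the point stored at this node
    axis: int                      # dimension used to split space here
    left: Optional["Node"] = None  # points with coordinate <= point[axis]
    right: Optional["Node"] = None # points with coordinate >= point[axis]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points on one dimension per level (hypothetical helper)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # the median becomes the splitting point
    return Node(
        point=pts[mid],
        axis=axis,
        left=build_kdtree(pts[:mid], depth + 1),
        right=build_kdtree(pts[mid + 1:], depth + 1),
    )


def range_query(node: Optional[Node], lo: Point, hi: Point,
                out: Optional[List[Point]] = None) -> List[Point]:
    """Collect every point p with lo[i] <= p[i] <= hi[i] in all dimensions."""
    if out is None:
        out = []
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
        out.append(p)
    # Descend only into the half-spaces that can overlap the query box.
    if lo[a] <= p[a]:
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:
        range_query(node.right, lo, hi, out)
    return out


points = [(2, 3, 1), (5, 4, 7), (9, 6, 2), (4, 7, 9), (8, 1, 5), (7, 2, 6)]
tree = build_kdtree(points)
hits = range_query(tree, lo=(4, 1, 2), hi=(8, 7, 7))
```

The same code works for any number of dimensions, since the splitting axis simply cycles through them; the pruning in range_query is what lets the index skip whole regions of space.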

The topic of spatio-temporal databases and applications deals with executing queries that amount to solving NP-hard problems such as the traveling salesman problem. One solution uses a greedy algorithm, which begins at a start node and finds the nearest neighbor in each predefined category of nodes. By minimizing the sum of distances (using the minimum sum distance algorithm), a path from a start node to an end node is found in such a way that each category is visited, and the cost of the solution is at most 3 times the cost of the optimal solution.
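The greedy idea can be sketched in a few lines. This is only an illustration under stated assumptions: points live in the plane with Euclidean distance, every category has at least one point, and at each step the route moves to the closest point of any category not yet visited. The function name greedy_trip and the sample categories are hypothetical, and the sketch makes no claim about the approximation bound mentioned in the talk.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def greedy_trip(start: Point, end: Point,
                categories: Dict[str, List[Point]]) -> Tuple[List[Point], float]:
    """Visit one point from every category, always moving to the nearest
    point of a not-yet-visited category, then finish at `end`."""
    current = start
    remaining = set(categories)
    route = [start]
    total = 0.0
    while remaining:
        # Candidate stops: every point of every category still to be visited.
        candidates = [(c, p) for c in sorted(remaining) for p in categories[c]]
        cat, nxt = min(candidates, key=lambda cp: math.dist(current, cp[1]))
        total += math.dist(current, nxt)
        route.append(nxt)
        current = nxt
        remaining.remove(cat)
    total += math.dist(current, end)
    route.append(end)
    return route, total


categories = {
    "gas":  [(1.0, 0.0), (8.0, 5.0)],
    "bank": [(3.0, 0.0), (0.0, 9.0)],
    "food": [(6.0, 0.0)],
}
route, total = greedy_trip((0.0, 0.0), (10.0, 0.0), categories)
```

A greedy choice like this is fast because it never backtracks, which is also why its result can be longer than the optimal route.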

Sensor and stream databases deal with integrating sensors into network models. A large set of sensors is distributed across a sensor field, and a balance must be struck among concerns such as data flow between sensors, the hierarchy of sensors, and efficient data transmission that conserves battery life. Professor Li analyzes the best data flow models between sensors and different ways to group sensors so that hub nodes relay data onward to other hub nodes (an example of such an application is the monitoring of temperatures on an active volcano). Broadcast cannot be used, since it would drain the sensors’ batteries. Thus, routing methods and failover mechanisms are examined to ensure that all sensor data is read properly.

Professor Li also researches problems with the method of adding independent and identically distributed (IID) random noise, which introduces errors into data sets in order to hide secret data while preserving correct averages and other aggregate statistics (for example, hiding real stock data or employees’ salaries, but preserving averages). The problem with IID noise is that attackers can filter out outliers in the data and still extract the data that is meant to remain secret. A solution to this problem is to add the same amount of noise to the original data set, but parallel to its principal component rather than independently in every dimension. This yields more securely obfuscated data.
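The difference between the two kinds of noise can be illustrated with a small two-dimensional sketch. It is not the method from the talk, only a toy version under stated assumptions: the data has two attributes, the noise is Gaussian with zero mean, "the same amount of noise" is read as the same total variance, and the principal direction is computed with the closed form for a 2x2 covariance matrix. All function names here are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def principal_direction(data: List[Point]) -> Tuple[float, float]:
    """Unit vector along the first principal component of 2-D data."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / n
    syy = sum((y - my) ** 2 for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    # Angle of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def add_iid_noise(data: List[Point], sigma: float, rng: random.Random) -> List[Point]:
    """Independent zero-mean noise on each attribute of each record."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in data]


def add_parallel_noise(data: List[Point], sigma: float, rng: random.Random) -> List[Point]:
    """Zero-mean noise of the same total variance (2 * sigma**2, an assumption),
    but directed along the principal component of the data."""
    ux, uy = principal_direction(data)
    noisy = []
    for x, y in data:
        e = rng.gauss(0.0, sigma * math.sqrt(2.0))
        noisy.append((x + e * ux, y + e * uy))
    return noisy


# Toy data lying exactly on the line y = 2x.
data = [(float(i), 2.0 * i) for i in range(-5, 6)]
rng = random.Random(0)
iid = add_iid_noise(data, 1.0, rng)
parallel = add_parallel_noise(data, 1.0, rng)
```

In this toy data set, the parallel noise moves each record along the line y = 2x, so the noisy records follow the same pattern as the originals, while the IID noise pushes records off that line, where it is easier to tell noise from signal. Because both kinds of noise have zero mean, the averages are preserved only approximately for a finite sample.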
