A Skim of VLDB 2010
Industry Sessions
Industry papers of interest
I09 p.1469: Distance-Based Outlier Detection: Consolidation and Renewed Bearing
Experiments on outlier detection
I12 p.1505: Confucius and Its Intelligent Disciples: Integrating Social with Search
Q&A Sites@Google
Research Sessions
Session 1: Database Security
R1 p.13: Building Disclosure Risk Aware Query Optimizers for Relational Databases
R2 p.25: Secure Personal Data Servers: a Vision Paper
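Large amounts of private data are collected across the web. Proposal: set up a personal data server so that users decide how their own data is used (by whom, for how long, according to which rule, for which purpose). A very interesting concept.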
R3 p.36: PolicyReplay: Misconfiguration-Response Queries for Data Breach Reporting
Session 2: Parallel and Distributed Databases
R4 p.48: Schism: a Workload-Driven Approach to Database Replication and Partitioning
R5 p.58: Ten Thousand SQLs: Parallel Keyword Queries Computing
R6 p.70: The Case for Determinism in Database Systems
Session 3: Data Exchange
A classic problem in data integration/exchange: schema mapping
R7 p.81: MapMerge: Correlating Independent Schema Mappings
R8 p.93: Chase Termination: A Constraints Rewriting Approach
R9 p.105: Scalable Data Exchange with Functional Dependencies
Session 4: Database Services and Applications
R10 p.117: Interactive Route Search in the Presence of Order Constraints
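Conventional geographic search returns a set of relevant entities; this paper defines a new kind of query.
Input: several search queries. Output: a route that goes via entities.
The process is interactive and proceeds step by step; at each step the user indicates whether the proposed entities are correct.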
R11 p.129: Energy Management for MapReduce Clusters
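Energy management strategies for data centers: 1) power down machines with low utilization; 2) use the whole cluster to finish the job, then power down the entire cluster. Result: 2) is better than 1).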
R12 p.140: Toward Scalable Keyword Search over Relational Data
To improve performance, returns partial results, sacrificing coverage and precision.
Session 5: Data Models and Languages
R13 p.150: From Regular Expressions to Nested Words: Unifying Languages and Query Execution for Relational and XML Sequences
A language for pattern matching over event streams
R14 p.162: Avalanche-Safe LINQ Compilation
LINQ
R15 p.173: Towards Certain Fixes with Editing Rules and Master Data
Integrity constraints can only detect errors; master data is used to correct them.
Session 6: Semantics
R16 p.185: Explaining Missing Answers to SPJUA Queries
Lets users ask questions of the form: why are certain tuples not in the query results?
R17 p.197: Sampling the Repairs of Functional Dependency Violations under Hard Constraints
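Repairs data that violates FDs (functional dependencies).
Traditional approaches: 1) produce repairs that are optimal under some metric (an optimization problem); 2) do not produce a repair directly, but produce answers that satisfy the requirements.
This paper's approach: carefully generate a candidate set of repairs and sample from it.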
R18 p.208: Evaluating Entity Resolution Results
Uses the GMD measure to evaluate ER results; it resembles edit distance, with split and merge as the basic operations.
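As a rough illustration of the split/merge idea in the note above, the sketch below counts unit-cost operations needed to turn an ER result into the ground truth. It is a simplified special case, not the paper's GMD algorithm: the function name `unit_merge_distance`, the unit costs, and the list-of-sets input format are all assumptions made for this example.

```python
from collections import defaultdict


def unit_merge_distance(result, truth):
    """Count unit-cost splits and merges that turn `result` into `truth`.

    Both arguments are partitions of the same records, given as lists of
    non-empty sets (an assumed input format for this sketch).
    """
    # Map each record to the index of its cluster in the ground truth.
    truth_of = {rec: i for i, cluster in enumerate(truth) for rec in cluster}

    splits = 0
    pieces_per_truth = defaultdict(int)
    for cluster in result:
        # Ground-truth clusters that this result cluster overlaps.
        touched = {truth_of[rec] for rec in cluster}
        # Cutting one cluster into p pieces takes p - 1 splits.
        splits += len(touched) - 1
        for t in touched:
            pieces_per_truth[t] += 1

    # Gluing p pieces into one ground-truth cluster takes p - 1 merges.
    merges = sum(p - 1 for p in pieces_per_truth.values())
    return splits, merges


if __name__ == "__main__":
    result = [{"a", "b"}, {"c"}, {"d", "e"}]
    truth = [{"a"}, {"b", "c"}, {"d", "e"}]
    print(unit_merge_distance(result, truth))
```

With weighted costs per split and merge, the two counts would be replaced by sums of cost-function values; that generalization is not attempted here.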
Session 7: Stream Databases
R19 p.220: High-Performance Dynamic Pattern Matching over Disordered Streams
Stream pattern matching
R20 p.232: SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems
A model for analyzing stream processing engines
R21 p.244: Recognizing Patterns in Streams with Imprecise Timestamps
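Uncertain stream data: earlier work assumed that the occurrence time of each event in the stream is known and precise; here it is assumed to be unknown or imprecise.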
Session 8: RDF and Graphs
R22 p.256: x-RDF-3X: Fast Querying, High Update Rates, and Consistency for RDF Databases
R23 p.264: Graph Pattern Matching: From Intractable to Polynomial Time
Defines a new kind of graph pattern: an edge represents connectivity in the data graph
R24 p.276: GRAIL: Scalable Reachability Index for Large Graphs
A reachability index for large graphs, with a focus on scalability
Session 9: Middleware Platforms for Data Management
R25 p.285: HaLoop: Efficient Iterative Data Processing on Large Clusters
R26 p.297: The Impact of Virtual Views on Containment
R27 p.309: Updatable and Evolvable Transforms for Virtual Databases
Session 10: Novel/Advanced Applications
R28 p.320: Navigating in Complex Mashed-Up Applications
Mashup and navigation
R29 p.330: Dremel: Interactive Analysis of Web-Scale Datasets
Interactive ad-hoc query system @ Google, using columnar storage for nested data
R30 p.340: On Graph Query Optimization in Large Networks
The SPath query method
Session 11: Ranking Queries
R31 p.352: Proximity Rank Join
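Given: relations whose tuples have the form (score, real-valued feature vector), plus a target vector.
Returns the top k combinations of tuples that are "closest" to the target.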
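The problem setup in the note above can be sketched by brute force. This is only an illustration of the inputs and outputs, not the paper's algorithm: the aggregate `combo_score` (sum of scores minus weighted distance to the target) and the parameter `w_dist` are hypothetical choices for this example.

```python
import heapq
import math
from itertools import product


def proximity_rank_join(relations, target, k, w_dist=1.0):
    """Return the k best combinations, taking one tuple from each relation.

    Each tuple is (score, feature_vector). The aggregate below is an
    assumed scoring function: higher scores and vectors closer to
    `target` are both rewarded.
    """
    def combo_score(combo):
        total = sum(score for score, _ in combo)
        penalty = sum(math.dist(vec, target) for _, vec in combo)
        return total - w_dist * penalty

    # Enumerate the full cross product; fine for a toy example only.
    return heapq.nlargest(k, product(*relations), key=combo_score)


if __name__ == "__main__":
    r1 = [(0.9, (0.0, 0.0)), (0.5, (5.0, 5.0))]
    r2 = [(0.8, (1.0, 0.0)), (0.7, (4.0, 4.0))]
    print(proximity_rank_join([r1, r2], target=(0.0, 0.0), k=1))
```

Enumerating the cross product is exponential in the number of relations, which is the cost a real rank-join operator is designed to avoid.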
R32 p.364: Identifying the Most Influential Data Objects with Reverse Top-k Queries
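Reverse top-k: identifies the set of users who most prefer (i.e., are influenced by) a given object. Two algorithms (SB and BB); the names are rather unfortunate.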
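A brute-force sketch of the reverse top-k idea may help: given one object, find the users whose own top-k list contains it. This is not the SB or BB algorithm. The linear preference model, the convention that a higher score is better, and the names `reverse_top_k`, `products`, and `users` are assumptions for this example.

```python
def reverse_top_k(products, users, query, k):
    """Return the users whose top-k list contains products[query].

    `products` is a list of attribute vectors; `users` maps a user name
    to a weight vector. Scores are weighted sums, higher is better
    (an assumed convention for this sketch).
    """
    def score(weights, product):
        return sum(w * x for w, x in zip(weights, product))

    result = []
    for name, weights in users.items():
        q_score = score(weights, products[query])
        # The query object is in the top-k if fewer than k objects
        # score strictly higher for this user.
        better = sum(1 for p in products if score(weights, p) > q_score)
        if better < k:
            result.append(name)
    return result


if __name__ == "__main__":
    products = [(5, 1), (1, 5), (3, 3)]
    users = {"alice": (1, 0), "bob": (0, 1), "carol": (0.5, 0.5)}
    print(reverse_top_k(products, users, query=2, k=1))
```

The size of the returned set is what the note calls influence: the more users rank the object in their top-k, the more influential it is.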
R33 p.373: Retrieving Top-k Prestige-Based Relevant Spatial Web Objects
For a query, a relevant result whose nearby objects are also relevant has higher prestige. Location-aware keyword search is built on this idea.
Session 12: Spatial and Temporal Databases
R34 p.385: Parsimonious Linear Fingerprinting for Time Series
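Uses the joint dynamics of numerical sequences to derive fingerprints for mining and summarization. Advantages: a) easy to interpret; b) widely applicable (compression, clustering, etc.); c) linear time.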
R35 p.397: The HV-tree: a Memory Hierarchy Aware Version Index
Version index: gradually migrates data to larger (and slower) storage. The TSB-tree considers two levels (memory → disk); this paper considers cache and memory.
R36 p.409: Transforming Range Queries To Equivalent Box Queries To Optimize Page Access
Transforms range queries into box queries to optimize page access
Session 13: Record Linkage
R37 p.417: Record Linkage with Uniqueness Constraints and Erroneous Values
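The data conflict problem: exploits the fact that some entity attributes satisfy uniqueness constraints (e.g., a national ID number) and transforms the problem into a k-partite graph problem.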
R38 p.429: On-the-Fly Entity-Aware Query Processing in the Presence of Linkage
Uses uncertainty to handle entity resolution, and does so online
R39 p.439: Behavior Based Record Linkage
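Computes behavior from transaction logs, then decides whether to merge two entities that look different on the surface but behave alike, based on the cost of merging them.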
Session 14: Experimental Analysis and Performance
R40 p.449: iGraph: A Framework for Comparisons of Disk-Based Graph Indexing Techniques
Current graph query methods filter candidates with an index, then verify them by subgraph isomorphism testing.
R41 p.460: Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance
EC2 performance measurements
R42 p.472: The Performance of MapReduce: An In-depth Study
Hadoop performance measurements
R43 p.484: Evaluation of entity resolution approaches on real-world match problems
Entity resolution experiments using real-world match tasks
Session 15: Cloud Computing
R44 p.494: MRShare: Sharing Across Multiple Queries in MapReduce
Shares the results of similar queries
R45 p.506: Towards Elastic Transactional Cloud Storage with Range Query Support
A cloud storage system
R46 p.518: Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)
Making the yellow elephant (Hadoop) run like a cheetah
Session 16: Query Processing I
R47 p.530: Slicing Long-Running Queries
Splits long-running queries into slices
R48 p.542: Sharing-Aware Horizontal Partitioning for Exploiting Correlations During Query Processing
R49 p.554: Advanced Processing for Ontological Queries
Session 17: Data Extraction
R50 p.566: Towards The Web of Concepts: Extracting Concepts from Large Datasets
A concept is a word sequence (a sentence?) that represents an entity
R51 p.578: Exploiting Content Redundancy for Web Information Extraction
Extraction from template-based web pages
R52 p.588: Automatic Rule Refinement for Information Extraction
Rule-based automatic information extraction
Session 18: Privacy
R53 p.598: Embellishing Text Search Queries To Protect User Privacy
Privacy leakage caused by keywords in text search
R54 p.608: Small Domain Randomization: Same Privacy, More Utility
The traditional privacy problem
R55 p.619: Nearest Neighbor Search with Strong Location Privacy
Location privacy for NN search
Session 19: Probabilistic and Uncertain Databases
R56 p.630: UPI: A Primary Index for Uncertain Databases
A primary index for uncertain databases
R57 p.638: Ranking Continuous Probabilistic Datasets
Ranking queries over continuous probability distributions
R58 p.650: Set Similarity Join on Probabilistic Data
Set similarity join on probabilistic data
Session 20: Databases on Modern Hardware
R59 p.660: Complex Event Detection at Wire Speed with FPGAs
R60 p.670: Database Compression on Graphics Processors
R61 p.681: Aether: A Scalable Approach to Logging
Session 21: Data Mining
R62 p.693: Scalable Discovery of Best Clusters on Large Graphs
Top-Graph Clusters: returns only the few most likely clusters
R63 p.703: An Architecture for Parallel Topic Models
R64 p.711: Keyword++: A Framework to Improve Keyword Search Over Entity Databases
Improves search quality by using an entity database
Session 22: Moving Object Databases
R65 p.723: Swarm: Mining Relaxed Temporal Moving Object Clusters
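Objects may scatter temporarily yet gather at certain points in time. Earlier definitions required the objects to stay together over a consecutive time interval; this paper defines a new structure, called a swarm.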
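The relaxed definition in the note above can be made concrete with a small check. This is a sketch under assumptions: per-timestamp clusters are taken as given, and the thresholds `min_objects` and `min_times`, like the function names, are hypothetical parameters for this example. The mining algorithm itself is not shown.

```python
def shared_timestamps(objects, clusters_by_time):
    """Timestamps at which all `objects` fall inside one common cluster."""
    objects = set(objects)
    return sorted(
        t for t, clusters in clusters_by_time.items()
        if any(objects <= cluster for cluster in clusters)
    )


def is_swarm(objects, clusters_by_time, min_objects, min_times):
    """Check the relaxed condition: enough objects together at enough
    timestamps, where the timestamps need not be consecutive."""
    times = shared_timestamps(objects, clusters_by_time)
    return len(set(objects)) >= min_objects and len(times) >= min_times


if __name__ == "__main__":
    clusters_by_time = {
        1: [{"a", "b", "c"}],
        2: [{"a"}, {"b", "c"}],   # "a" drifts away for one timestamp
        3: [{"a", "b"}, {"c"}],
        4: [{"a", "b", "c"}],
    }
    print(shared_timestamps({"a", "b"}, clusters_by_time))
    print(is_swarm({"a", "b"}, clusters_by_time, min_objects=2, min_times=3))
```

In the example, "a" and "b" are together at three timestamps but only two of them are consecutive, so a definition that requires a consecutive interval of length three would miss the pair.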
R66 p.735: An Adaptive Updating Protocol for Reducing Moving Object Databases Workload
The update problem for (location, speed): allows delayed updates and adjusts according to the results.
R67 p.747: Shortest Path Computation on Air Indexes
Shortest path queries in mobile road networks
Session 23: Probabilistic Data
R68 p.758: Efficient and Effective Similarity Search over Probabilistic Data based on Earth Mover’s Distance
Similarity search based on Earth Mover’s Distance, using the duality principle and B-tree pruning.
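For readers unfamiliar with the distance itself, here is a minimal sketch of Earth Mover's Distance in the one-dimensional special case, where it reduces to a running sum. It assumes two histograms over the same ordered bins, equal total mass, and unit ground distance between adjacent bins. The general case is a transportation problem, and the duality-based pruning from the note is not shown. The name `emd_1d` is made up for this example.

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms.

    Assumes `p` and `q` have the same bins and the same total mass,
    and that moving one unit of mass by one bin costs 1.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")

    carried = 0.0  # surplus mass that must be moved past the current bin
    total = 0.0
    for a, b in zip(p, q):
        carried += a - b
        total += abs(carried)
    return total


if __name__ == "__main__":
    print(emd_1d([1, 0, 0], [0, 0, 1]))
    print(emd_1d([0.5, 0.5, 0], [0, 0.5, 0.5]))
```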
R69 p.770: Probabilistic XML via Markov Chains
Session 24: Fuzzy, Probabilistic and Approximate Databases
R70 p.782: MCDB-R: Risk Analysis in the Database
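Uncertainty is modeled as probability distributions over the uncertain values, so a query result is itself a probability distribution. Enterprises usually care about the lower or upper tail of that distribution (risk assessment). An improved MCDB makes it possible to draw samples from the tail.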
R71 p.794: Scalable Probabilistic Databases with Factor Graphs and MCMC
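A new approach to modeling probabilistic databases: a factor graph encodes the distribution and MCMC is used to recover it; the underlying layer still uses a single-world model.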
Session 25: Discovery and Exploration
R72 p.805: On Multi-Column Foreign Key Discovery
Automatically discovers constraints for databases that do not declare foreign key constraints
R73 p.815: Explore or Exploit? Effective Strategies for Disambiguating Large Databases
Uses context to clean data
Session 26: Information Filtering and Dissemination
R74 p.826: Building Ranked Mashups of Unstructured Sources with Uncertain Information
R75 p.838: Computing Closed Skycubes
Session 27: Query Processing II
R76 p.848: Generating Databases for Query Workloads
R77 p.860: Processing Top-k Join Queries
R78 p.871: Two-way Replacement Selection
This one looks impressive. It improves the replacement strategy used when generating runs for merge sort.
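As background for the note above, this is a sketch of classic one-way replacement selection, the textbook baseline for run generation, not the paper's two-way variant. The function name and the `memory_size` parameter (number of records that fit in memory) are assumptions for this example.

```python
import heapq
from itertools import islice

_END = object()  # marks the end of the input stream


def replacement_selection(records, memory_size):
    """Split `records` into sorted runs using a min-heap of `memory_size`.

    Heap entries are (run_number, value), so records deferred to the
    next run sort after everything still belonging to the current run.
    """
    stream = iter(records)
    heap = [(0, value) for value in islice(stream, memory_size)]
    heapq.heapify(heap)

    runs, current, current_run = [], [], 0
    while heap:
        run, value = heapq.heappop(heap)
        if run != current_run:
            # The current run is exhausted; start the next one.
            runs.append(current)
            current, current_run = [], run
        current.append(value)

        incoming = next(stream, _END)
        if incoming is not _END:
            # A record smaller than the last output cannot extend this run.
            target_run = run if incoming >= value else run + 1
            heapq.heappush(heap, (target_run, incoming))

    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    print(replacement_selection([5, 1, 9, 2, 8, 3, 7], memory_size=3))
```

With three records of memory, the first run in the example has five elements. Producing runs longer than memory is the point of the technique, since longer runs mean fewer merge passes.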
Session 28: XML Data
R79 p.882: XPath Whole Query Optimization
R80 p.894: Fast Optimal Twig Joins
R81 p.906: Destabilizers and Independence of XML Updates
Session 29: Workflows, Transactions and Business Processes
R82 p.918: Searching Workflows with Hierarchical Views
R83 p.928: Data-Oriented Transaction Execution
R84 p.940: Optimal Top-K Query Evaluation for Weighted Business Processes
Session 30: Scientific databases
R85 p.952: Behavioral Simulations in MapReduce
R86 p.964: A*-tree: A Structure for Storage and Modeling of Uncertain Multidimensional Arrays
R87 p.975: On Dense Pattern Mining in Graph Streams
Mining frequent dense patterns
Session 31: Mobility and Spatial Queries
R88 p.985: Efficient Proximity Detection among Mobile Users via Self-Tuning Policies
Given a threshold between users, find the users whose proximity to a given user falls below that threshold
R89 p.997: k-Nearest Neighbors in Uncertain Graphs
kNN on uncertain graphs
R90 p.1009: Mining Significant Semantic Locations From GPS Data
Derives semantic information from GPS location data: builds a graph over locations and users, and uses random walks (RW) to assign significance
Session 32: Data Anonymization Techniques
R91 p.1021: Boosting the Accuracy of Differentially Private Histograms Through Consistency
R92 p.1033: rho-uncertainty: Inference-Proof Transaction Anonymization
R93 p.1045: Minimizing Minimality and Maximizing Utility: Analyzing Method-based attacks on Anonymized Data
Session 33: Querying and Integrating Probabilistic Databases
R94 p.1057: Querying Probabilistic Information Extraction
Combines the IE and query steps
R95 p.1068: Read-Once Functions and Query Evaluation in Probabilistic Databases
Query evaluation for PDBs
R96 p.1080: Foundations of Uncertain-Data Integration
Session 34: Database Design
R97 p.1091: Identifying, Attributing and Describing Spatial Bursts
Uses user-generated data to identify geographically concentrated events
R98 p.1103: CORADD: Correlation Aware Database Designer for Materialized Views and Indexes
R99 p.1114: Regret-Minimizing Representative Databases
Defines a new kind of query: the k-regret query
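To make the notion of regret concrete, the sketch below computes the regret ratio of a chosen subset relative to the full database. It is a simplified illustration under assumptions: utilities are linear, the maximum is taken over a finite sample of utility vectors rather than over all of them, the best utility in the database is positive, and the function names are made up for this example. Choosing the k tuples that minimize this quantity is the hard part and is not attempted.

```python
def regret_ratio(database, subset, utility):
    """Relative loss in best utility when only `subset` is shown.

    `utility` is a weight vector; a tuple's utility is the weighted sum
    of its attributes (assumed linear model, best value must be > 0).
    """
    def best(points):
        return max(sum(u * x for u, x in zip(utility, p)) for p in points)

    best_all = best(database)
    return (best_all - best(subset)) / best_all


def max_regret_ratio(database, subset, utilities):
    """Worst regret ratio over a finite sample of utility vectors."""
    return max(regret_ratio(database, subset, u) for u in utilities)


if __name__ == "__main__":
    database = [(10, 0), (0, 10), (6, 6)]
    utilities = [(1, 0), (0, 1), (1, 1)]
    print(max_regret_ratio(database, [(6, 6)], utilities))
```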
Session 35: Query Optimization
R100 p.1125: An Access Cost-Aware Approach for Object Retrieval over Multiple Sources
Selection among multiple data sources
R101 p.1137: On the Stability of Plan Costs and the Costs of Plan Stability
Parameterized plan generation and selection
R102 p.1149: Xplus: A SQL-Tuning-Aware Query Optimizer
Session 36: Graph and Pattern Matching
R103 p.1161: Graph Homomorphism Revisited for Graph Matching
p-homomorphism
R104 p.1173: SigMatch: Fast and Scalable Multi-Pattern Matching
Multi-pattern string matching
R105 p.1185: SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs
Finds how many times the query graph occurs (with possible missing edges)
Session 37: Indexing Techniques
R106 p.1195: Tree Indexing on Solid State Drives
R107 p.1207: Efficient B-tree Based Indexing for Cloud Data Processing
R108 p.1219: Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints
Session 38: Query Processing III
R109 p.1231: VoR-Tree: R-trees with Voronoi Diagrams for Efficient Processing of Spatial Nearest Neighbor Queries
R-tree with Voronoi diagrams
R110 p.1243: Efficient RkNN Retrieval with Arbitrary Non-Metric Similarity Measures
RkNN
R111 p.1255: Efficient Skyline Evaluation over Partially Ordered Domains
Session 39: Streaming and Sensor Data
R112 p.1267: Achieving High Output Quality under Limited Resources through Structure-based Spilling in XML Streams
R113 p.1279: Dynamic Join Optimization in Multi-Hop Wireless Sensor Networks
R114 p.1291: Database-support for Continuous Prediction Queries over Streaming Data
R115 p.1302: Conditioning and Aggregating Uncertain Data Streams: Going Beyond Expectations
Session 40: Information Integration and Retrieval
R116 p.1314: TRAMP: Understanding the Behavior of Schema Mappings through Provenance
R117 p.1326: Entity Resolution with Evolving Rules
R118 p.1338: Annotating and Searching Web Tables Using Entities, Types and Relationships
Session 41: Data Mining, Copy Detection and Data Publication
R119 p.1348: Interesting-Phrase Mining for Ad-Hoc Text Analytics
R120 p.1358: Global Detection of Complex Copying Relationships Between Sources
R121 p.1370: Fragments and Loose Associations: Respecting Privacy in Data Publishing