{Seminar}2009.12.12. – Light-weight, Runtime Verification of Query Sources.[1]
Light-weight, Runtime Verification of Query Sources.
Tingjian Ge (Brown U), Stan Zdonik (Brown U).
ICDE 2009
Lei YU
Background
Data owner outsources their data to the Network Storage System such as AWS.
The storage provider should ensure the underlying data not been modified by attackers
This paper address arbitrary modification of data incluing changes that may pass the DB.
In a world, it deals with modification in file system.
Q: What about Secure File system? What is the difference?
A: They are general purpose, not full using DB semantics? (No time to review, see paper)
“The basic idea is that we build one or more Merkle hash tree(s) for each sensitive DB object (e.g. specifica attributes and their associate index).
Then at query execution time, as we read data from the disk,
we verify that a minimum subset of “referenced” bytes from the sensitive DB objects is authentic,
thus providing that the data source and query result and both correct and complete
relateive to the initial “good state” from which the Merkle trees were built.”
The Merkle Hash Tree is a tree structure originally proposed for efficient authentication of equality queries in a DB sorted on the query attribute.
Every DB record corresponds to a distinct leaf node.
Each leaf node stores a hash value for the binary representation of its record.
The tree is constructed bottom-up, with each internal node storing the hash value of the concatenation of the hash value of its children.
With the root of the hash tree *and* the encryption key, we compute the signature of sensitive data.
*Query*
As a query runs, we need to verify it with Hash Tree by computing the signature again.
The server verifies the authenticity of the data it used from disk by computing hash values in a bottom-up manner, all the way up to the root.
Then we compared the signature with original signature to verified the authenticity.
*Update*
If we insert a data, we need to update the corresponding hash value in the tree.
The problem is if only part of the data modified, is it failure for all records?
*Organization*
Step 1. We split each page of a sensitive DB objects into several sub-blocks and construct a small in-page hash tree.
Step 2. All these in-pages hash trees make up a big hash tree.
SHA-1 is faster when the input is bigger message blocks
Q: Why we have to split a page?
A: Choose a fanout such that each hash operation is applied to a number of bytes that is close to the sub-block size
Q: How to split a page (i.e. what’s sub-block size of a page)?
A: In the experiment, try 3 size (1024, 512 and 256 bytes)
Tradeoff between time and space.
Small Size: quicker but bigger
Big Size: slower but smaller
Q: OK, then how to organize the in-page hash page ? (i.e. #fanout=?)
A: each hash operation applied to a number of bytes that is close to the sub-block size.
*Concurrency Control*
Two basic problems in CC, sync and reader/writer, merge some update operation to improver performance.
4 Phases in Writer
Synchronization Primitives and Reader/Writer
Write Transaction
Verification
Mark
Update
Helping
Q: Does the tree need to be balanced?
娱乐:
八卦一下 Merkle, 似乎很多数据结构都是以这个作为开头的
钻石时代

