Decentralized Recovery for Survivable Storage Systems
Theodore Ming-Tao Wong
May 2004
CMU-CS-04-119
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
Thesis Committee:
Jeannette M. Wing, Chair (Dept. of Computer Science, Carnegie Mellon University)
Gregory R. Ganger (Dept. of Electrical and Computer Engineering, Carnegie Mellon University)
Chenxi Wang (Dept. of Electrical and Computer Engineering, Carnegie Mellon University)
Michael K. Reiter (Dept. of Electrical and Computer Engineering, Carnegie Mellon University)
Copyright °c 2004 Theodore Ming-Tao Wong
This research is sponsored in part by the Army Research Office (contract DAAD19-01-1-0485), the Defense Advanced
Research Projects Agency (AF contracts F30602-99-2-0539-AFRL, F33615-93-1-1330, F30602-97-2-0031), and
the National Science Foundation (contracts CCR-0121547, CCR-0208853, CCR-9523972).
The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding
any copyright notation thereon. The views and conclusions contained herein are those of the author and should not be
interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ARO,
DARPA, the NSF, or the U.S. Government.
Keywords: Survivable storage systems, verifiable secret redistribution, threshold sharing
To my mother and father, who taught me the importance of scholarship.
Abstract
Modern society has produced a wealth of data to preserve for the long term. Some
data we keep for cultural benefit, in order to make it available to future generations,
while other data we keep because of legal imperatives. One way to preserve such
data is to store it using survivable storage systems. Survivable storage is distinct from
reliable storage in that it tolerates confidentiality failures in which unauthorized users
compromise component storage servers, as well as crash failures of servers. Thus, a
survivable storage system can guarantee both the availability and the confidentiality of
stored data.
Research into survivable storage systems investigates the use of m-of-n threshold
sharing schemes to distribute data to servers, in which each server receives a share of
the data. Any m shares can be used to reconstruct the data, but any m−1 shares reveal
no information about the data. The central thesis of this dissertation is that to truly
preserve data for the long term, a system that uses threshold schemes must incorporate
recovery protocols able to overcome server failures, adapt to changing availability or
confidentiality requirements, and operate in a decentralized manner.
To support the thesis, I present the design and experimental performance analysis
of a verifiable secret redistribution protocol for threshold sharing schemes. The protocol
redistributes shares of data from old to new, possibly disjoint, sets of servers,
such that new shares generated by redistribution cannot be combined with old shares to
reconstruct the original data. The protocol is decentralized, and does not require intermediate
reconstruction of the data; thus, one does not create a central point of failure
or risk the exposure of the data during protocol execution. The protocol incorporates a
verification capability that enables new servers to confirm that their shares can be used
to reconstruct the original data.
vi
Acknowledgements
I think no student could find a better advisor than in mine, JeannetteWing. She has always been
there to provide feedback, support, and guidance throughout the course of my dissertation research,
and I thank her for all of her efforts. I also thank the members of my committee—Greg Ganger,
Mike Reiter, and Chenxi Wang—for their feedback and support.
I would also like to acknowledge the guidance that I have received from my colleagues in the research
community. In particular, I would like to thank Garth Gibson (my previous advisor), Richard
Golding, and John Wilkes for their research and personal advice.
I am grateful to both Ken Birman (myM.Eng. advisor) and Fred Schneider at Cornell University
for providing lab facilities and research feedback while I was in Ithaca for a month in 2002. I also
thank the members of the PDL Consortium (including EMC, Hewlett-Packard, Hitachi, IBM, Intel,
LSI, Microsoft, Network Appliance, Oracle, Panasas, Seagate, Sun, and Veritas) for their feedback
and support.
I would like to thank Paul Mazaitis and Joan Digney for taking the time to proofread my dissertation
for any bugs in the writing, as well as for all of their help during PDL Retreats and Open
Houses.
I have been fortunate to have been a part of the CMU SCS community. Not only is it a top-notch
research environment, but it has also been a friendly and welcoming place to me. I have made many
friends while at CMU, and I thank them all for their support, and for being a part of my life and
letting me be a part of theirs.
Last, but not least, I would to thank my wife, Addie, for her love and encouragement. She never
stopped believing that I could (and would) finish my dissertation, even when I had my doubts.
And, for anyone who has been on zephyr in the last two years, “<slip>”.
Contents
1 Introduction 1
1.1 Thesis statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 System model 5
2.1 Abstract storage system model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 The mobile adversary and its effect on servers . . . . . . . . . . . . . . . . . . . . 7
2.3 The dynamic membership model . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Verifiable Secret Redistribution 13
3.1 Cryptographic building blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.1 Shamir’s threshold sharing scheme . . . . . . . . . . . . . . . . . . . . . . 14
3.1.2 Desmedt and Jajodia’s secret redistribution protocol . . . . . . . . . . . . 14
3.1.3 Feldman’s VSS scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 The VSR protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.1 Assumptions about faulty shareholders . . . . . . . . . . . . . . . . . . . 21
3.2.2 Detection of faulty old shareholders . . . . . . . . . . . . . . . . . . . . . 22
3.2.3 Computation cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.4 Protocol correctness on termination . . . . . . . . . . . . . . . . . . . . . 24
3.2.5 Protocol security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 The mobile adversary and the VSR protocol . . . . . . . . . . . . . . . . . . . . . 30
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4 Hathor: An Experimental Storage System 35
4.1 Data distribution schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
viii CONTENTS
4.2 Client and server implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 I/O operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.1 STORE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.2 REDISTRIBUTE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3.3 RETRIEVE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5 Performance Evaluation 49
5.1 Shamir’s threshold sharing scheme performance . . . . . . . . . . . . . . . . . . . 50
5.2 Verifiable secret redistribution performance . . . . . . . . . . . . . . . . . . . . . 51
5.2.1 VSR with an exponentiation witness function . . . . . . . . . . . . . . . . 51
5.2.2 VSR with an elliptic curve witness function . . . . . . . . . . . . . . . . . 57
5.3 Hathor storage system performance . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6 RelatedWork 65
6.1 Redistribution for threshold sharing schemes . . . . . . . . . . . . . . . . . . . . . 65
6.2 Survivable storage systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.3 Design studies for survivable storage systems . . . . . . . . . . . . . . . . . . . . 69
7 Conclusions and future work 71
7.1 Research contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
7.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
A REDISTRIBUTE for REPLICA and HYBRID 75
List of Figures
1.1 A storage system without a recovery mechanism . . . . . . . . . . . . . . . . . . . 2
2.1 Abstract model of a storage system with n servers . . . . . . . . . . . . . . . . . . 6
2.2 A storage system in the presence of an mobile adversary . . . . . . . . . . . . . . 8
2.3 A storage system with a recovery protocol in the presence of a mobile adversary . . 9
3.1 Desmedt and Jajodia’s secret redistribution protocol . . . . . . . . . . . . . . . . . 15
3.2 Feldman’s verifiable secret sharing scheme . . . . . . . . . . . . . . . . . . . . . 16
3.3 Verifiable secret redistribution protocol . . . . . . . . . . . . . . . . . . . . . . . . 19
3.4 A storage system with the VSR protocol in the presence of a mobile adversary . . . 32
4.1 Data distribution schemes implemented in Hathor . . . . . . . . . . . . . . . . . . 36
4.2 Functional modules of the client and server implementations in Hathor . . . . . . . 38
4.3 STORE operation state machine for clients and servers in Hathor . . . . . . . . . . 40
4.4 REDISTRIBUTE operation state machine for old servers in Hathor . . . . . . . . . . 42
4.5 REDISTRIBUTE operation state machine for new servers in Hathor . . . . . . . . . 44
4.6 RETRIEVE operation state machine for clients and servers in Hathor . . . . . . . . 47
5.1 Performance of Shamir’s threshold sharing scheme . . . . . . . . . . . . . . . . . 50
5.2 Performance of the VSR protocol with an exponentiation witness function . . . . . 53
5.3 SUBSHARE performance of the VSR protocol . . . . . . . . . . . . . . . . . . . . 54
5.4 Performance of the VSR protocol vs. block size . . . . . . . . . . . . . . . . . . . 56
5.5 Performance of the VSR protocol with an elliptic curve witness function . . . . . . 58
5.6 SHARES-VALID performance of the VSR protocol vs. finite field bit sizes . . . . . 60
5.7 Performance of Hathor vs. file size . . . . . . . . . . . . . . . . . . . . . . . . . . 63
x LIST OF FIGURES
A.1 REDISTRIBUTE operation state machine for old servers in Hathor, in a system that
uses the REPLICA scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
A.2 REDISTRIBUTE operation state machine for old servers in Hathor, in a system that
uses the HYBRID scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
List of Tables
5.1 Time per exponentiation modulo q for exponents modulo p . . . . . . . . . . . . . 52
5.2 Time per exponentiation modulo q for exponents modulo p, using Brickell exponentiation,
for an 8 KB block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3 Time per point multiplication in the elliptic curve computed over the finite field Zq′ ,
using Brickell exponentiation, for an 8 KB block . . . . . . . . . . . . . . . . . . 59
5.4 Time taken to store, redistribute, and retrieve a 0-byte file in Hathor . . . . . . . . 61
5.5 Average I/O overhead in Hathor of the HYBRID scheme over the REPLICA scheme . 62
6.1 A comparison of robust (m,n) threshold sharing schemes . . . . . . . . . . . . . . 66
6.2 A comparison of survivable storage systems . . . . . . . . . . . . . . . . . . . . . 68
xii LIST OF TABLES
Chapter 1
Introduction
Modern society has produced a wealth of data to preserve for the long term. Some data we keep for
cultural benefit, in order to make it available to future generations. For example, the Internet Archive
(http://www.archive.org) aims to preserve indefinitely both the contents of all Internet
websites and of all digitized physical media. Other data we keep because of legal imperatives. For
example, several laws (e.g., the Gramm-Leach-Bliley Act of 1999 and the Sarbanes-Oxley Act of
2002) mandate retention and privacy standards for financial records.
One way to preserve long-term data is to store it using survivable storage systems. Survivable
storage systems are generally aggregations of unreliable components, such as heterogeneous
“bricks” able to provide both storage and metadata management functions [22, 34], or peer-to-peer
workstations [1, 46]. They are distinct from reliable storage systems [40] in that they tolerate confidentiality
failures (in which unauthorized users compromise components) as well as crash failures
of components. Thus, survivable storage systems are able to guarantee both availability (will one
be able to recover one’s data?) and confidentiality (can one be sure that an unauthorized person has
not obtained one’s data?).
A number of researchers propose using m-of-n threshold sharing schemes [49] in survivable
storage systems to tolerate component failures. A system that uses a threshold scheme distributes
n shares of the original data to components such that any m can be used to reconstruct the data.
Moreover, any m−1 shares reveal no information about the data. Thus, the system can tolerate
the loss n−m shares without losing data availability, and the compromise of m−1 shares (i.e.,
revelation to an unauthorized user) without losing data confidentiality. Some examples of such
systems include e-Vault [33] and PASIS [53].
2 CHAPTER 1. INTRODUCTION
3
Availability
Time
"You−lose" line
2−of−3 threshold scheme
1 2 4
Figure 1.1: A storage system that uses a 2-of-3 threshold sharing scheme without recovery. Availability
is plotted informally on the y axis, and time is plotted on the x axis. Suppose that the system suffers a
server failure at time 1. Data remains available through time 2 because enough servers (i.e., 2) remain to
serve shares. However, suppose that the system suffers another failure at time 3. From that time forward, an
insufficient number of servers remain.
A survivable storage system requires mechanisms to recover from component failures, as a
threshold scheme by itself is insufficient for the long term. Consider the graph in Figure 1.1 of a
three-component system that uses a 2-of-3 scheme to store data. If one assumes that all components
will eventually fail, then the system will reach a point when only one non-failed component remains.
By then, the end-user’s data will be lost. To recover from failures in threshold sharing scheme-based
systems, researchers have proposed proactive secret sharing (PSS) schemes [19, 20, 21, 31, 32, 44,
56, 55]. PSS schemes enable systems to replace lost shares and render compromised shares useless.
Recovery mechanisms for survivable storage systems must be decentralized. A na¨ıve approach
to recovery would be for a system to use a recovery server to reconstruct all of the stored data and
redistribute new shares. An obvious shortcoming with this centralized approach is that it introduces
a single point of failure in the system. If the recovery server itself crashes, recovery becomes
impossible. Worse, unauthorized users who compromise the server immediately gain the ability to
obtain all stored data.
This dissertation research contributes the first recovery mechanism that enables a system to
adjust m and n, even if some of its components are controlled by unauthorized users. The limitation
with mechanisms that only replace lost shares (such as PSS schemes) is that the assumptions behind
the selection of m and n may prove to be invalid over time: components may be more crash-prone,
or unauthorized users may be more aggressive in their attacks. The ability to change m and n
enables a system to survive more crash-prone components, by increasing m, or to defend against
more aggressive unauthorized users, by increasing n.
1.1. THESIS STATEMENT 3
1.1 Thesis statement
The thesis statement of this dissertation is:
We can create a survivable storage system that:
• Recovers from component failures,
• Adapts to changing requirements, and
• Accomplishes these goals in a decentralized manner.
To support the thesis, I present the design and experimental performance analysis of a verifiable
secret redistribution (VSR) protocol for Shamir’s threshold sharing scheme [49]. The VSR protocol
redistributes shares of data from old to new, possibly disjoint, sets of servers, such that new shares
generated by redistribution cannot be combined with old shares to reconstruct the original data. The
protocol is decentralized, and does not require intermediate reconstruction of the data; thus, one
does not create a central point of failure or risk the exposure of the data during protocol execution.
The protocol incorporates a verification capability that enables new servers to confirm that their
shares can be used to reconstruct the original data.
The rest of this dissertation is organized as follows. In Chapter 2, I present an abstract model of
a survivable storage system, and a model of a mobile adversary who subverts servers in the system
and causes them to fail. I also postulate the design requirements for a recovery protocol in the
context of a mobile adversary. In Chapter 3, I present the design of the VSR protocol, which can
be used by a storage system to counteract the adversary discussed in Chapter 2. I prove that the
shares held by servers after protocol execution can be used to reconstruct the original secret, and
demonstrate that it satisfies all of the design requirements for a recovery protocol. In Chapter 4,
I discuss the design and implementation of an experimental storage system called Hathor. The
primary purpose of Hathor is to provide a platform on which to evaluate the end-to-end cost of
storing, redistributing, and retrieving data using a variety of data distribution schemes (including
Shamir’s scheme). In Chapter 5, I present and analyze the results of experiments done to measure
the raw computational performance of the VSR protocol, as well as the results of experiments done
to measure the end-to-end cost of storing, redistributing, and retrieving data with Hathor. The
results demonstrate that although the VSR protocol is computationally expensive, the cost can be
offset through careful selection of the data distribution scheme. In Chapter 6, I survey the related
work on survivable storage systems and recovery protocols for threshold sharing schemes. Finally,
in Chapter 7, I end with a discussion of conclusions, research results, and directions of future work.
4 CHAPTER 1. INTRODUCTION
Chapter 2
System model
Lay not up for yourselves treasures upon earth, where moth and rust doth corrupt,
and where thieves break through and steal.
— Matthew 6:19 (KJV)
In this chapter, I present an abstract model of a survivable storage system consisting of clients,
servers, and a communication network. I also present a model of how an adversary might subvert
the servers in such a system, and discuss how the subverted servers could attempt to disrupt system
operation. I then discuss how the system can use a recovery protocol to counteract the actions of
the adversary, and identify the design requirements for the protocol.
2.1 Abstract storage system model
The abstract storage system model consists of clients, servers, and a communications network.
Figure 2.1 shows clients, servers, and the network channels that connect clients and servers to
each other. In this section, I present a high-level description of the system components, and the
assumptions I make about their behavior.
Clients are hosts that store and retrieve data with m-of-n distribution schemes. Clients are always
correct: given data d and scheme s, a client applies s to d according to the specification of s. Servers
are hosts that store pieces of data on stable storage. Servers are usually correct: given a piece p of
data, a server saves p on stable storage when it receives p froma client, and returns p when requested
by the client. However, a server may sometimes be faulty, exhibiting Byzantine behavior: given p,
6 CHAPTER 2. SYSTEM MODEL
Broadcast channel
Clients
Servers
n
1
m
Figure 2.1: Abstract model of a storage system with n servers. The solid lines show point-to-point connections
between components: clients are connected to servers, and a server is connected to all other servers.
The dashed lines show connections of servers to a broadcast channel. Dots indicate elided clients or servers.
a faulty server may refuse to save p on stable storage, or save a corrupted piece p′ instead of p, or
refuse to return p. I assume that correct servers make forward progress; in particular, a server is
deemed faulty if it does not send messages when expected to in a timely manner.
I assume that underlying authentication and permission mechanisms exist for a server to confirm
that a client has the right to store a piece at the server. I also assume for now that the membership
of the set of servers is constant, though I will relax this assumption later.
The network in the system provides point-to-point channels from a client to all servers, and from
a server to all other servers. The channels deliver messages reliably in first-in, first-out (FIFO) order
for each pair of hosts: if a host A sends a message M to host B, B is guaranteed to receive M, and
guaranteed to receive M before any other message M′ that A sends after sending M. The channels
are authenticated: if A sends a message to B, no host E can masquerade as either A or B. Lastly, the
channels are private: if A sends a message to B, its contents are known only to A and B.
The network also provides a broadcast channel that connects all servers. The channel delivers
messages from a server reliably in FIFO order: if a server broadcasts a message M followed by
M′, all other servers receive M followed by M′. However, the channel does not enforce an ordering
between messages sent by different servers: if server A broadcasts MA, and server B broadcasts
MB, server C may receive MA followed by MB or MB followed by MA, and server D may receive
2.2. THE MOBILE ADVERSARY AND ITS EFFECT ON SERVERS 7
MA and MB in an order different from C. The channel is authenticated: no server E, regardless of
whether it is correct or faulty, can masquerade as some other server (i.e., E cannot forge signatures
on messages).
2.2 The mobile adversary and its effect on servers
An adversary is an external entity that attempts to subvert servers. Once subverted, a server is
controlled by the adversary, and may behave in ways that deviate from its specification. In this
section, I present a temporal model of how an adversary may subvert target servers.
Conceptually, the adversary is an external host that is connected by point-to-point channels to all
servers, while also being connected to the broadcast channel. The adversary is not connected to the
clients, and in any case cannot subvert the clients (consistent with the assumption of correct clients).
The adversary cannot eavesdrop on the point-to-point channels, or inject messages into those channels
that purport to be from other hosts (consistent with the assumptions about the point-to-point
channels). The adversary can see all messages sent over the broadcast channel, and can also send
messages over the channel or inject messages via servers under its control. The adversary causes
subverted servers to become faulty, and exhibit the Byzantine behavior discussed in Section 2.1. In
turn, a faulty server may reveal its memory contents (e.g., stored pieces of data) to the adversary.
To reason about the temporal behavior of the adversary, I adopt the mobile adversary model
proposed by Ostrovsky and Yung [39] and refined by Herzberg et al. [32]. In the model, time is
divided into epochs in which the adversary may subvert a limited number of servers. The limit is
specified implicitly by the data distribution scheme used in the system, e.g., for an m-of-n scheme,
the adversary may control at most m−1 servers per epoch. An update phase separate consecutive
epochs. At the start of the update phase, the system tries to remove all subverted servers from
under the control of the adversary, though it might succeed in removing only some or none. A
formerly subverted server may have corrupted memory contents. After the end of the update phase,
the adversary may again subvert servers, up to the limit specified by the distribution scheme. The
adversary is constrained from re-subverting servers during the update phase, which is reasonable
provided that an update phase is short relative an epoch. Ostrovsky and Yung assume that the
system has some external mechanism for the detection and repair of subverted servers.
One can visualize the mobile adversary as having a fixed number of “pebbles” that it may place
on servers during each epoch. A server is covered by at most one pebble. At the start of the update
phase, the system tries to remove all pebbles and gives them back to the adversary, though it might
8 CHAPTER 2. SYSTEM MODEL
Servers
1 m−1
Epoch t
1 m
n
m+1
m−1
m−2
1 m−1 m
Epoch t+1
m
m−1 n
m−2
m+1
1 m−1
m
m−1 n
m−2
m+1
Update phase t
1 1
Adversary
Servers
Adversary
Servers
Adversary
Figure 2.2: A storage system that uses an m-of-n data distribution scheme, in the presence of a mobile
adversary. The memory contents (i.e., pieces of data) of the adversary and servers are shown. The adversary
may only control at most m−1 servers per epoch. Crosshatched servers are controlled by the adversary. In
epoch t, the adversary subverts servers 1 through m−1 and obtains their pieces. During the update phase,
the system removes all servers from under the control of the adversary. In epoch t+1, the adversary subverts
server m and obtains its piece. The adversary can then reconstruct the original data.
succeed in removing only some or none. At the start of the next epoch, the adversary may again
place pebbles on servers. A server that is covered by a pebble is controlled by the adversary. A
covered server that is later uncovered is no longer controlled by the adversary.
Even though a mobile adversary can only subvert a limited number of servers in each epoch,
it can eventually subvert every server over multiple epochs. For example, consider the system in
Figure 2.2 that uses an m-of-n scheme. The adversary may control at most m−1 servers per epoch.
Initially, in epoch t, the adversary subverts servers 1 through m−1, and obtains the pieces held by
those servers. During the update phase, the system removes all of the servers from under the control
of the adversary. In epoch t +1, the adversary subverts server m, and obtains the piece held by m;
thus, the adversary may now reconstruct the original data.
To counteract the adversary, the system requires a recovery protocol to execute after it has tried
to remove subverted servers from under the control of the adversary. I postulate the following
protocol design requirements in the context of the mobile adversary model:
• It must generate new pieces for the next epoch such that they can be used to reconstruct
the secret, and such that they cannot be combined with pieces from the current epoch to
reconstruct the secret.
2.2. THE MOBILE ADVERSARY AND ITS EFFECT ON SERVERS 9
Servers
1 m−1 1 m−1 1 m−1
Epoch t
1 m
n
m+1
m−1
m−2
Epoch t+1
Update phase t
1
m−2
m−1 n
m+1
m
m
1
m−2
m−1
m
m+1
n
Adversary Adversary Adversary
Servers Servers
Figure 2.3: A storage system with a recovery protocol in the presence of a mobile adversary. The adversary
may only control at most m−1 servers per epoch. Crosshatched servers are controlled by the adversary.
The system executes a recovery protocol during the update phase to generate new (shaded) pieces for correct
servers. New pieces cannot be combined with current pieces to reconstruct the original data. In epoch t, the
adversary subverts servers 1 through m−1 and obtains their pieces. During the update phase, the system
removes all servers from under the control of the adversary. In epoch t +1, the adversary subverts server m
and obtains its piece. However, the adversary does not obtain enough pieces (current or new) to reconstruct
the original data.
• It must include mechanisms to prevent the adversary from corrupting protocol execution,
because the adversary may still control some servers during the update phase.
• It must erase the pieces for the current epoch from server memories, to prevent the adversary
from ever obtaining any other current pieces it needs.
With a recovery protocol that satisfies these design requirements, the system can prevent a mobile
adversary from ever obtaining enough pieces to reconstruct the original data; I will prove this
point in Section 3.3. For example, consider the system in Figure 2.3 that uses an m-of-n scheme,
and contrast it with the system in Figure 2.2. As before, the adversary may control at most m−1
servers per epoch. Initially, in epoch t, the adversary subverts servers 1 through m−1, and obtains
the pieces held by those servers. During the update phase, the system removes all of the servers
from under the control of the adversary, and executes the recovery protocol. In epoch t +1, the adversary
subverts server m, and obtains the piece held by m; however, this piece cannot be combined
with the pieces from t to reconstruct the data.
10 CHAPTER 2. SYSTEM MODEL
2.3 The dynamic membership model
Ostrovsky and Yung assume a static membership model of the system in their mobile adversary
model [39]. In their model, the number of servers, the membership of the set of servers, and the
threshold parameter of the underlying data distribution scheme are all fixed throughout the duration
of system operation. In practice, however, one might wish for the system to exclude subverted
servers from the system, replacing them with new servers. Also, one might wish to increase the
number of servers in the system (and the threshold parameter of the underlying distribution scheme),
in order to increase the number of servers the adversary must subvert in order to reconstruct data.
I adopt a dynamic membership model of a system by extending the mobile adversary model to
allow servers to join and leave the system. As in the original model, I divide time into epochs during
which an adversary may subvert a limited number of servers. Consecutive epochs are separated
by an update phase. At the start of the update phase, the system tries to remove all subverted
servers from under the control of the adversary (though it might succeed in removing only some or
none), and admits new servers. It then executes the recovery protocol before allowing old servers
to leave the system. After the end of the update phase, the adversary may again subvert servers.
The adversary is constrained from subverting servers during the update phase, and from corrupting
the state of underlying membership protocols that manage the set of servers. An old server, once it
leaves the system, is treated like a new server if it tries to rejoin the system.
The dynamic membership model impacts the design of mechanisms put in place to prevent an
adversary from corrupting recovery protocol execution (the second requirement in Section 2.2).
The set of servers in the next epoch may be completely disjoint from the set in the current epoch,
thus none of the new servers will have any information about the original data, (e.g., none of them
will have ever stored pieces of the data). This lack of information rules out simple adaptations of
recovery protocols designed around static membership models, in which participant servers perform
verification computations (to prevent an adversary from corrupting execution) in the current update
phase that require information that they have received in the previous update phase.
I impose a new requirement on the recovery protocol to accommodate dynamic membership:
• It must enable the system to change the threshold parameter of the underlying data distribution
scheme.
I motivate the new requirement with the following example. Consider a system of n servers
that uses an n-of-n scheme. Suppose that the set of servers changes such that there are n′ servers,
2.4. SUMMARY 11
where n′ < n. The system must change the parameters of the underlying distribution scheme so that
a lower number of pieces can be used to reconstruct the data.
Because the threshold parameter may change, I need to add a new pair of assumptions to the
adversary model. Suppose the system uses an m-of-n scheme in the current epoch, and will use an
m′-of-n′ scheme in the next epoch. With these parameters, the adversary can control at most m−1
servers in the current epoch, and at most m′ −1 servers in the next epoch. Also, at the start of the
update phase, I assume that the system removes subverted servers from under the control of the
adversary such that, during the update phase, at most m−1 servers from the current epoch and at
most m′−1 servers from the next epoch are subverted. Previously, in a system that used an m-of-n
scheme, the adversary could control at most m−1 servers per epoch; I assumed that the system
would try to remove all subverted servers from under the control of the adversary at the start of the
update phase, though it might succeed in removing only some or none.
2.4 Summary
I have presented the abstract storage system model that I will use throughout the remainder of this
dissertation. Distinct from previous work, the set of servers in the system has a dynamic membership:
the number of servers, the membership of the set, and the threshold parameter of the underlying
data distribution scheme may change during the lifetime of the system. I have also discussed
the design requirements for a recovery protocol that the system can use to counteract an adversary
in the context of a dynamic membership model.
12 CHAPTER 2. SYSTEM MODEL
Chapter 3
Verifiable Secret Redistribution
Trust, but verify.
— Russian proverb
In this chapter, I present the verifiable secret redistribution (VSR) protocol for threshold sharing
schemes. The VSR protocol is designed to counteract a mobile adversary in a system of servers
with dynamic membership, and ensure that the servers have valid shares of data after redistribution.
The protocol executes in the update phase between epochs, after the system has removed some
(perhaps none) of the servers from under the control of the adversary. Any shares of data obtained by
the adversary prior to protocol execution are rendered useless after successful execution, provided
that the adversary had only obtained a sub-threshold number of shares. Moreover, the adversary
cannot combine shares obtained prior to protocol execution with shares obtained after execution to
reconstruct the data.
In the presentation of the VSR protocol, I employ the terminology that is generally used in the
discussion of threshold sharing schemes. Thus, in this chapter I refer to data as secrets, clients as
dealers, and servers as shareholders.
The rest of this chapter is organized as follows. In Section 3.1, I outline the cryptographic protocols
that are the building blocks for the VSR protocol. In Section 3.2, I present the VSR protocol,
and prove that shares held by shareholders after protocol execution can be used to reconstruct the
original secret. In Section 3.3, I show that the VSR protocol fulfills all of the design requirements
discussed in Chapter 2, and demonstrate how it counteracts a mobile adversary.
14 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
3.1 Cryptographic building blocks
In this section, I outline the building blocks of the VSR protocol: Shamir’s threshold sharing scheme
[49], Desmedt and Jajodia’s secret redistribution protocol [16], and Feldman’s VSS scheme [17].
3.1.1 Shamir’s threshold sharing scheme
Shamir presents a scheme for distributing n shares of a secret to n shareholders, such that the shares
of any subset of m unique shareholders can be used to reconstruct the secret [49]. Secrets k are in the
finite field of integers Zp (where p is prime and p > n). Shareholders i are in the set of participants
P (|P| = n). Shares si of i are also in the set Zp. Each subset of m unique shareholders forms an
authorized subset B; all authorized subsets are in the access structure G(m,n)
P .
To distribute k to i ∈ P, a dealer selects a random m−1 degree polynomial a(x) with constant
term equal to k and random coefficients a1 ... am−1 ∈ Zp, and uses a(x) to generate si for each i:
si = k+a1i+. . .+am−1im−1 mod p (3.1)
To reconstruct k, the dealer retrieves m shares si of i ∈ B, and uses Lagrange interpolation to
recover the constant term of a(x), i.e., k:
k = å
i∈B
bisi mod p where bi = Õ
j∈B, j6=i
j
( j−i)
mod p (3.2)
For the rest of the chapter, I omit the modulus operator to simply the notation.
3.1.2 Desmedt and Jajodia’s secret redistribution protocol
Desmedt and Jajodia present a protocol for the redistribution of shares of secrets for threshold
sharing schemes [16] that does not require the intermediate reconstruction of the secret. I summarize
a specialized version of their protocol for use with Shamir’s threshold sharing scheme [49], as shown
in Figure 3.1. To redistribute a secret k from the access structure G(m,n)
P to the access structure
G(m′,n′)
P′ , one selects an authorized subset B ∈ G(m,n)
P . Each shareholder i ∈ B uses Shamir’s scheme
to distribute subshares ˆ si j of its share si to each shareholder j ∈ P′:
ˆ si j = si+a′
i1′ j+. . .+a′
i(m′−1) jm′−1 (3.3)
3.1. CRYPTOGRAPHIC BUILDING BLOCKS 15
Desmedt and Jajodia’s Secret Redistribution protocol for Shamir’s scheme
To redistribute a secret k ∈ Zp from the access structure G(m,n)
P to the access structure G(m′,n′)
P′ using the
authorized subset B ∈ G(m,n)
P :
1. For each i ∈ B, use the random polynomial a′
i( j) = si+a′
i1 j+. . .+a′
i(m′−1) jm′−1 to compute the
subshares ˆ si j of si, and send ˆ si j to the corresponding j ∈ P′.
2. For each j ∈ P′, generate a new share s′
j by Lagrange interpolation:
s′
j = å
i∈B
bi ˆ si j where bi = Õ
x∈B,x6=i
x
(x−i)
bi are constant for each i ∈ B, are independent of the choice of a′
i(x), and may be precomputed.
Figure 3.1: Protocol for the redistribution of shares of a secret from the access structure G(m,n)
P to the access
structure G(m′,n′)
P′ [16] with Shamir’s threshold sharing scheme [49].
Each j then generates a new share s′
j by Lagrange interpolation:
s′
j = å
i∈B
bi ˆ si j where bi = Õ
x∈B,x6=i
x
(x−i)
(3.4)
One can redistribute shares of k an arbitrary number of times prior to the reconstruction of k.
To reconstruct k after redistribution, one retrieves m′ shares s′
j of j ∈ B′, and uses Lagrange
interpolation:
k = å
j∈B′
b′
js j where b′
j = Õ
x∈B′,x6=j
x
(x− j)
(3.5)
3.1.3 Feldman’s VSS scheme
Feldman presents a scheme for verifiable secret sharing (VSS) [17] that shareholders of a secret can
use to verify that their shares are valid; i.e., the shares of any authorized subset of shareholders can
be used to reconstruct the original secret.
To use Feldman’s scheme, assume that there exists a homomorphic witness functionW(x) of the
form
16 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
Feldman’s Verifiable Secret Sharing scheme for Shamir’s scheme
To distribute a secret k ∈ Zp to the access structure G(m,n)
P :
1. Use the random polynomial a(i) = k+a1i+. . .+am−1im−1 to compute the shares si of k, and send si to the
corresponding i ∈ P over private channels.
2. Use a witness function W(x) to compute W(k),W(a1) . . .W(am−1), and send them to all i ∈ P over the
broadcast channel.
3. For each i ∈ P, verify that:
W(si) ≡W(k)⊕(W(a1)⊗i)⊕. . .⊕(W(a1)⊗im−1)
If the condition holds, i broadcasts a “commit” message. Otherwise, i broadcasts an “abort” message.
Figure 3.2: Feldman’s VSS scheme [17] for Shamir’s threshold sharing scheme [49].
W(a+b) = W(a)⊕W(b) (3.6)
W(ab) = W(a)⊗b (3.7)
for which inversion is intractable: that is, given W(x) it is intractable to compute x. The witness
functions allows one to prove knowledge of some value without revealing the value. The ⊕ and
⊗ operations are the homomorphic equivalents of addition and multiplication in the finite field of
integers.
The VSS scheme works as follows. The dealer uses Shamir’s scheme with a random polynomial
a(x) to distribute the secret k ∈ Zp to the access structure G(m,n)
P . In addition to sending the shares si
to shareholders i ∈ P, the dealer broadcasts witnesses of k and the coefficients a1 ... am−1 of a(x),
i.e.,W(k) andW(a1) ... W(am−1). Each i may then verify that si is a valid share of k using
W(si) ≡W(k)⊕(W(a1)⊗i)⊕. . .⊕(W(a1)⊗im−1) (3.8)
which is the result of applying W(x) to a(x) (Equation (3.1)). Because the inversion of W(x) is
intractable, no one can learn k or a1 ... am−1 from the broadcast of the witnesses.
In this dissertation, I consider two candidate witness functions: exponentiation in a finite field
and point multiplication on an elliptic curve.
3.1. CRYPTOGRAPHIC BUILDING BLOCKS 17
Exponentiation
The finite field of integers Zp (p prime) has a corresponding multiplicative ring Z∗q (q prime; q =
pr+1; r is a positive integer). Given Zp and Z∗q, one can define a witness function
W(x) = gx where x ∈ Zp (3.9)
where integer g ∈ Z∗q is a publicly-known generator of a sub-ring of Z∗q of prime order. The intractability
of the inversion of W(x) is based on the discrete logarithm problem (DLP) for finite
fields: given g and gk, it is hard to compute k. The ⊕ operation is multiplication in Z∗q, and the ⊗
operation is exponentiation in Z∗q (cf. Equation (3.6)):
W(a+b) = gagb (3.10)
W(ab) = (ga)b (3.11)
Point multiplication on an elliptic curve
An elliptic curve E over the finite field Fpm (p prime; m is a positive integer) is denoted by E(Fpm),
and is defined by the equation
y2 = x3+ax+b (p ≥ 3) (3.12)
y2+xy = x3+ax+b (p = 2) (3.13)
where the coordinate points x, y ∈ Fpm. The points on the curve form a group comprising |E(Fpm)|
points, with a rule that defines the addition of points P and Q and another rule that defines the
multiplication of a point P by a scalar integer k. The group contains an additive identity called the
point at infinity O (that is, a point P added to the point at infinity yields P). Blake, Seroussi, and
Smart present a more comprehensive discussion of elliptic curves in cryptography [6].
Given an elliptic curve, one can define a witness function
W(x) = [x]G where 1 < x < r, [r]G = O (3.14)
18 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
where G ∈ E(Fpm) is a publicly-known point (cf., g for exponentiation) that generates a prime order
subgroup of E(Fpm) of order r < |E(Fpm)|. The notation [x]G distinguishes the scalar variable x
from the multi-dimensional coordinate G. The intractability of the inversion of W(x) is based on
the analog to the DLP for elliptic curves: given P and [k]P, it is hard to compute k. The ⊕ operation
is point addition, and the ⊗ operation is scalar multiplication (cf. Equation (3.6)):
W(a+b) = [a]G+[b]G (3.15)
W(ab) = [b]([a]G) (3.16)
3.2 The VSR protocol
Here, I present the verifiable secret redistribution protocol for secrets distributed with Shamir’s
threshold sharing scheme [49]. The protocol takes as input shares of a secret distributed to an
authorized subset B in the access structure G(m,n)
P , and outputs shares of the secret distributed to the
access structure G(m′,n′)
P′ . I assume that there exists a witness functionW(x) with the properties shown
in Equation (3.6), and assume that inversion of W(x) is intractable. The system model is the one
presented in Section 2.1, in which the dealer is always correct, but in which some of the shareholders
may be subverted by an adversary. I assume that there are at least m correct old shareholders, and
that there are at most m−1 faulty old shareholders. I also assume that there are at least m′ correct
new shareholders, and that there are at most m′ −1 faulty new shareholders. I assume that correct
shareholders make forward progress; a shareholder is deemed faulty if it does not send protocol
messages in a timely manner. The dealer and shareholders are fully connected to each other by
private channels, and shareholders are also connected to each other by a broadcast channel.
The initial distribution of a secret (INITIAL in Figure 3.3) is with Shamir’s threshold scheme
[49]. The dealer of secret k distributes shares si to each shareholder i ∈ P, using the random polynomial
a(i) (step 1 of INITIAL). The dealer also sends the witness W(k) to each i (step 2). Each i
receives the same value forW(k), consistent with the assumption that the dealer is correct.
Redistribution of the secret (REDIST in Figure 3.3) is similar to Desmedt and Jajodia’s protocol
[16]. Each i in an authorized subset B ∈ G(m,n)
P uses Shamir’s scheme (with the random polynomial
a′
i( j)) to distribute subshares ˆ si j ∈ Zp of its share si to shareholders j ∈ P′ (step 1 of REDIST). Each
j receives ˆ si j from each i, and generates a new share s′
j (Equation (3.4), which is step 4). One can
redistribute shares of k an arbitrary number of times prior to reconstruction of k.
3.2. THE VSR PROTOCOL 19
Verifiable Secret Redistribution protocol for Shamir’s sharing scheme
INITIAL
To distribute a secret k ∈ Zp to the access structure G(m,n)
P :
1. Use the random polynomial a(i) = k+a1i+. . .+am−1im−1 to compute the shares si of k, and send si to the
corresponding i ∈ P over private channels.
2. Use witness functionW(x) to computeW(k), and send it to all i ∈ P over private channels.
REDIST
To redistribute k ∈ Zp from the authorized subset B in the access structure G(m,n)
P to the access structure G(m′,n′)
P′ :
1. For each i ∈ B, use the random polynomial a′
i( j) = si+a′
i1 j+. . .+a′
i(m′−1) jm′−1 to compute the subshares
ˆ si j of si, and send ˆ si j to the corresponding j ∈ P′ over private channels. An i that has a null share sends
nothing. A j that does not receive a timely ˆ si j from i broadcasts an “abort” message.
2. For each i ∈ B, use W(x) to compute W(si),W(a′
i1) . . .W(a′
i(m′−1)), and send them and W(k) to all j ∈ P′
over the broadcast channel. An i that has a null share sends nothing. A j that does not receive a timely
broadcast from i broadcasts an “abort” message.
3. For each j ∈ P′, verify that:
∀i ∈ B :W( ˆ si j) ≡W(si)⊕(W(a′
i1)⊗ j)⊕. . .⊕(W(a′
i(m′−1))⊗ jm′−1)
and:
W(k) ≡M
i∈B
W(si)⊗bi where bi = Õ
l∈B,l6=i
l
(l−i)
If the conditions hold, j broadcasts a “commit” message. Otherwise, j broadcasts an “abort” message. A j
that does not send a timely “commit” message is assumed to have implicitly sent an “abort” message.
4. If at least 2m′ −1 j ∈ P′ broadcast “commit” messages, each j generates a new share s′
j :
s′
j = å
i∈B
bi ˆ si j where bi = Õ
l∈B,l6=i
l
(l−i)
and stores s′
j and W(k); all i ∈ P then erase their shares. A j that has received fewer than m subshares
generates a null share.
5. If at least m′ j ∈ P′ broadcast “abort” messages, all i ∈ B and all j ∈ P′ abort the protocol.
Figure 3.3: Protocol for the verifiable redistribution of shares of a secret from the authorized subset B in the
access structure G(m,n)
P to the access structure G(m′,n′)
P′ with Shamir’s threshold sharing scheme [49].
20 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
For the new shareholders to have a valid sharing of the secret after redistribution, two conditions,
called SHARES-VALID and SUBSHARES-VALID, must hold:
SHARES-VALID:
k = åi∈B bisi
SUBSHARES-VALID:
∃B′ ∈ G(m′,n′)
P′ : ∀i ∈ B : si = åj∈B′ b′
j ˆ si j
The VSR protocol must ensure that correct new shareholders have a valid sharing after protocol
execution, i.e., that there exists an authorized subset with shares that can be used to reconstruct
the original secret. Thus, I define a NEW-SHARES-VALID condition, which holds if an authorized
subset of new shareholders have valid shares of the secret. I prove in Section 3.2.4 that NEW-
-SHARES-VALID holds if SHARES-VALID and SUBSHARES-VALID hold. The definition of NEWSHARES-
VALID is similar to Equation (3.2):
NEW-SHARES-VALID:
∃B′ ∈ G(m′,n′)
P′ : k = åj∈B′ b′
js′
j
The definition of NEW-SHARES-VALID may seem counterintuitive, as one may expect any subset
of m′ new shareholders to have shares than can be used to reconstruct k. However, observe that
some new shareholders may not have valid shares. First, faulty new shareholders may not have
valid shares. Second, when 2m′ −1 shareholders broadcast a “commit” message (step 5 of REDIST),
there may exist new shareholders that have received fewer than m subshares, and thus cannot
generate new shares; such shareholders must store a null share. As I will prove in Section 3.2.4,
there will still exist at least m′ new shareholders that can generate new shares.
I use Feldman’s VSS scheme [17] to verify that SUBSHARES-VALID holds. The distribution of
ˆ si j from si (step 1 of REDIST) is just an application of Shamir’s scheme. Thus, each i ∈ B broadcasts
witnesses of its share and the coefficients of a′
i( j), W(si) and W(ai1) ... W(ai(m−1)), which each j
uses to verify the validity of ˆ si j (step 2).
The key insight embodied in the VSR protocol is that the na¨ıve extension of Desmedt and
Jajodia’s protocol with Feldman’s scheme does not in itself allow the new shareholders to verify that
NEW-SHARES-VALID holds. The difficulty arises because Feldman’s scheme only verifies that SUBSHARES-
VALID holds, which in the absence of SHARES-VALID is insufficient to verify that NEW-
-SHARES-VALID holds. Although Desmedt and Jajodia observe that the linear properties of their
3.2. THE VSR PROTOCOL 21
protocol and the properties of W(x) ensure that each j generates valid shares [16], they implicitly
assume that each i ∈ B distributes subshares of valid si. The VSS scheme only enables i to prove
that it distributed valid ˆ si j of some value. However, i may have distributed “subshares” of some
random value instead of subshares of si. Thus, one requires a sub-protocol for i to prove that it
distributed ˆ si j of si.
A similar flaw can be found in the proactive RSA scheme proposed by Frankel et al. [18]. Their
protocol uses a poly-to-sum redistribution from a polynomial sharing scheme to an additive sharing
scheme, and a sum-to-poly redistribution from the additive scheme back to a polynomial scheme.
They suggest that changes in the threshold and number of shareholders can be accommodated in
the poly-to-sum redistribution. Unfortunately, their verification checks hold only if one retains the
same set of shareholders, because their “proper secret” check relies on a witness (gsiL2 in their
paper) computed from information distributed in the preceding execution of the protocol.
To allow the new shareholders to verify that SHARES-VALID holds, the old shareholders in
the protocol broadcast a witness of the original secret k. Each i ∈ B stores W(k) received during
INITIAL and later broadcast it to all j ∈ P′. Recall that each j receives W(si) from each i to verify
that SUBSHARES-VALID holds. Once j receivesW(k), it verifies that each si is a valid share of k:
W(k) =M
i∈B
W(si)⊗bi (3.17)
which is the result of applying W(x) to Equation (3.2). Because inversion of W(x) is intractable,
no-one can learn k directly from the broadcast ofW(k).
3.2.1 Assumptions about faulty shareholders
During redistribution from the authorized subset B in access structure G(m,n)
P to access structure
G(m′,n′)
P′ with the VSR protocol, I assume that at least m shareholders in P are correct, and that at
most m−1 shareholders in P are faulty. I also assume that at least m′ shareholders in P′ are correct,
and that at most m′ −1 shareholders in P′ are faulty. I denote faulty shareholders and invalid
values with over-bars. A correct shareholder i ∈ P distributes valid subshares ˆ si j of its share si to
all shareholders j ∈ P′ and broadcasts W(k) corresponding to secret k (REDIST in Figure 3.3). A
faulty shareholder ı ∈ P may distribute invalid subshares ˆ sı j or broadcast W(k) not corresponding
to k; of course, ı may instead distribute valid subshares or valid witnesses.
22 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
One can derive a relationship between the old threshold scheme parameters m and n. If at least
m old shareholders must be correct and at most m−1 old shareholders may be faulty, it is required
that m+m−1 ≤ n, or
m ≤ ¹n+1
2 º (3.18)
One can also derive a relationship between the new threshold scheme parameters m′ and n′. To
avoid a race condition in which some new shareholders have received at least 2m′ −1 “commit”
messages and fewer than m′ “abort” messages, while others have received at least m′ “abort” messages
but fewer than 2m′−1 “commit” messages, it is required that (2m′−1)+(m′−1) ≥ n′. Also,
to ensure that new shareholders broadcast at least 2m′ −1 “commit” messages, it is required that
2m′−1 ≤ n′. Hence,
»n′+2
3 ¼≤ m′ ≤ ¹n′+1
2 º (3.19)
3.2.2 Detection of faulty old shareholders
The VSR protocol enables new shareholders to detect that some subset of the old shareholders are
faulty. However, depending on the actions taken by faulty shareholders, the new shareholders may
not be able to pinpoint the identity of the faulty shareholders. I consider two different scenarios for
redistribution between access structures G(m,n)
P and G(m′,n′)
P′ , and show the circumstances in which the
new shareholders can pinpoint a faulty old shareholder ı in an authorized subset B ∈ G(m,n)
P .
First, suppose that ı ∈ B broadcasts valid witnesses W(k), W(sı) and W(aı1) ... W(aı(m−1))
(step 2 of REDIST in Figure 3.3). If ı sends an invalid subshare ˆ sı j to j ∈ P′, j will find that
the SUBSHARES-VALID condition does not hold. j can pinpoint ı, because only ı can generate
the information used to verify whether or not SUBSHARES-VALID holds (i.e., W(sı), W(aı1) ...
W(aı(m−1)), and ˆ sı j). j will therefore broadcast an “abort” message (step 3 of REDIST).
On the other hand, suppose that ı broadcasts an invalid witnessW(k) 6=W(k) (or an invalid witnessW(
sı)), while the other shareholders i ∈ B broadcast the valid witnessW(k) (or valid witnesses
W(si)). j will find that the SHARES-VALID condition does not hold. However, j cannot pinpoint ı,
because j cannot distinguish between the case where ı broadcasts an invalid witness, and the case
where ı is correct and the other shareholders in B have conspired to broadcast invalid witnesses.
3.2. THE VSR PROTOCOL 23
Any randomly selected authorized subset B must contain at least one correct shareholder (consistent
with the assumption that at most m−1 old shareholders are faulty). If the new shareholders
find that one of the SHARES-VALID or SUBSHARES-VALID conditions does not hold, they can restart
the redistribution protocol with another authorized subset until both conditions hold. For G(m,n)
P , the
number of times the new shareholders must restart the redistribution protocol is bounded in the
worst case by the number of sets of size m containing at least one faulty shareholder, given m−1
faulty shareholders:
Ãn
m!−Ãn−m+1
m !=
m−1
å
i=1 Ãm−1
i !Ãn−m+1
m−i ! (3.20)
The VSR protocol does not specify a restart mechanism. A system that incorporates the VSR
protocol would need to implement a mechanism to detect “abort” messages (step 3 of REDIST in
Figure 3.3), and restart the protocol with a different authorized subset in G(m,n)
P .
3.2.3 Computation cost
The computation cost of verification for each old shareholder in the VSR protocol (step 2 of REDIST
in Figure 3.3) is O(m′)W(x) computations. Consider redistribution from the access structure G(m,n)
P
to the access structure G(m′,n′)
P′ . Each old shareholder i ∈ B (B ∈ G(m,n)
P ) computesW(si) for its share
si, andW(ai j) for each of the m′−1 coefficients ai j in the subshare generation polynomial ai(x), for
a total cost of O(m′).
The computation cost of verification for each new shareholder (step 3 of REDIST) is O(m)W(x)
computations, O(mm′) ⊕ operations, and O(mm′) ⊗ operations. Again, consider redistribution from
the access structure G(m,n)
P to the access structure G(m′,n′)
P′ . Each new shareholder j ∈ P′ computes
W( ˆ si j) to obtain a witness of the subshare ˆ si j from each i (i ∈ B, |B| = m), for a total cost of
O(m). Each j also performs m′ −1 ⊕ operations (B′ ∈ GP′ ; |B′| = m′) and m′ −1 ⊗ operations
for m old shareholders i ∈ B to verify that SUBSHARES-VALID holds (Equation (3.8)), for a total
cost of O(mm′). Finally, each j also performs m−1 ⊕ operations and m ⊗ operations to verify
that SHARES-VALID holds (Equation (3.17)), for a total cost of O(m); I exclude the (small) cost of
computing the powers of i because it is generally small compared to the cost of W(x), ⊕, and ⊗
(Chapter 5).
24 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
3.2.4 Protocol correctness on termination
For the verifiable redistribution of shares of a secret k from an authorized subset B in the access
structure G(m,n)
P to the access structure G(m′,n′)
P′ , I prove that if at least 2m′ −1 new shareholders
broadcast a “commit” message, SHARES-VALID and SUBSHARES-VALID both hold. I then prove
that SHARES-VALID and SUBSHARES-VALID are sufficient conditions for NEW-SHARES-VALID.
Lemma 1 If at least 2m′ −1 shareholders in P′ broadcast a “commit” message, SUBSHARESVALID
holds.
PROOF: Assume that at least 2m′ −1 shareholders in P′ broadcast a “commit” message. Also,
recall the assumption that, at most, m′ −1 shareholders in P′ are faulty. I then need to prove that
SUBSHARES-VALID holds.
Consider the shares si of old shareholders i ∈ B, and the subshares ˆ si j that are distributed to new
shareholders j ∈ P′. At most, m′ −1 “commit” messages originate from faulty new shareholders.
Because at least 2m′−1 new shareholders broadcast “commit” messages, at least m′ messages must
originate from correct shareholders that together constitute an authorized subset B′ ∈ G(m′,n′)
P′ . Each
j ∈ B′ broadcasts a “commit” message only if Equation (3.8) holds for its subshare ˆ si j from each i
(step 3 of REDIST in Figure 3.3), i.e.,
∀ j ∈ B′ : ∀i ∈ B :W( ˆ si j) ≡W(si)⊕(W(a′
i1)⊗ j)⊕. . .⊕(W(a′
i(m′−1))⊗ jm′−1)
which (from the homomorphic properties of the witness functionW(x)) is equivalent to
∀ j ∈ B′ : ∀i ∈ B : ˆ si j = si+a′
i1 j . . .a′
i(m′−1) jm′−1
The ˆ si j of j can be used to reconstruct si by Lagrange interpolation, i.e.,
∀i ∈ B : si = å
j∈B′
b′
j ˆ si j
¤
3.2. THE VSR PROTOCOL 25
Lemma 2 If at least m′ shareholders in P′ broadcast a “commit” message, SHARES-VALID holds.
PROOF: Assume that at least m′ shareholders in P′ broadcast a “commit” message. Also, recall the
assumption that, at most, m′ −1 shareholders in P′ are faulty. I then needs to prove that SHARESVALID
holds.
Consider the secret k, the shares si of old shareholders i ∈ B, and the witnesses W(si) of the
shares andW(k) of the secret. At most, m′−1 “commit” messages originate from faulty shareholders.
Any remaining “commit” messages must originate from correct shareholders j ∈ P′. Each j
broadcasts a “commit” message only if Equation (3.17) holds (step 3 of REDIST in Figure 3.3), i.e.,
W(k) =M
i∈B
W(si)⊗bi
which (from the homomorphic properties of the witness functionW(x)) is equivalent to
k = å
i∈B
bisi
Note that because the witnesses are sent by reliable broadcast, all new shareholders will see the
same values. Thus, if SHARES-VALID holds at one correct shareholder, it will hold at all correct
shareholders. ¤
Theorem 1 (VSR correctness on termination) If SHARES-VALID and SUBSHARES-VALID hold,
NEW-SHARES-VALID holds.
PROOF: Assume that SHARES-VALID and SUBSHARES-VALID hold. One then needs to prove that
NEW-SHARES-VALID holds.
The correctness proof is similar to that for Desmedt and Jajodia’s secret redistribution protocol
[16]. There exists B′ ∈ G(m′,n′)
P′ such that:
26 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
k = å
i∈B
bisi (Lemma 2)
= å
i∈BÃbi å
j∈B′
b′
j ˆ si j! (Lemma 1)
= å
i∈B
å
j∈B′
bib′
j ˆ si j (x(y+z) = xy+xz)
= å
i∈B
å
j∈B′
b′
jbi ˆ si j (xy = yx)
= å
j∈B′
å
i∈B
b′
jbi ˆ si j (x+y = y+x)
= å
j∈B′Ãb′
j å
i∈B
bi ˆ si j! (xy+xz = x(y+z))
= å
j∈B′
b′
js′
j (Equation (3.4))
¤
3.2.5 Protocol security
For the verifiable redistribution of shares of a secret k from an authorized subset B in the access
structure G(m,n)
P to the access structure G(m′,n′)
P′ , I show that an adversary cannot reconstruct k from
a combination of shares from the old and new sets of shareholders. In particular, I prove a lemma,
Lemma 7, that an adversary cannot combine the shares of shareholders in the non-authorized subset
B /∈ G(m,n)
P (|B| < m) and the shares of shareholders in the non-authorized subset B′ /∈ G(m′,n′)
P′
(|B′| < m′) to uniquely determine k. I have not been able to prove a second lemma, Lemma 8, that
a computationally-bound adversary cannot use the shares of shareholders in B with the witnesses
W(k),W(s1), . . . ,W(sm) to uniquely determine k. I do, however, provide a conjecture (and corresponding
proof sketch) that would follow from Lemmas 7 and 8: a computationally-bound adversary
cannot use the shares of shareholders in B and B′, and the witnesses W(k),W(s1), . . . ,W(sm)
to uniquely determine k.
3.2. THE VSR PROTOCOL 27
I require some lemmas presented by Beaumont [4] and Kostrikin [36] for systems of u linear
equations in v unknowns of the form
m11x1+m12x2+ · · · +m1vxv = b1
m21x1+m22x2+ · · · +m2vxv = b2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
mu1x1+mu2x2+ · · · +muvxv = bu
(3.21)
Let M, x, and b denote
M=
m11 · · · m1v
...
. . .
...
mu1 · · · muv
, x =
x1
...
xv
, b =
b1
...
bu
let [
M
|
b
]
denote the augmented matrix
[M|b] =
m11 · · · m1v b1
...
. . .
...
...
mu1 · · · muv bu
let rank(M) denote the rank of M (number of linearly independent columns in M), and let det(M)
denote the determinant of M.
Lemma 3 rank(M) = rank(MT ).
Lemma 4 (Kronecker-Capelli theorem) Iff rank(M) = rank([M|b]), Equation (3.21) has a solution
for x. Furthermore, if rank(M) < v, Equation (3.21) has no unique solution for x.
Lemma 5 (Cramer’s rule) If u = v and det(M) 6=0, Equation (3.21) has a unique solution for x.
Lemma 6 For u×u matrix A, v×v matrix B, and u×v matrix C,
detÃ"A C
0 B#!= det(A)det(B)
PROOF: By Kostrikin [36]. ¤
28 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
Lemma 7 (VSR share security) An adversary cannot combine the shares si of shareholders i in
any non-authorized subset B /∈ G(m,n)
P (|B| < m) with the shares s′
j of shareholders j in any nonauthorized
subset B′ /∈ G(m′,n′)
P′ (|B′| < m′) to uniquely determine k.
PROOF: Assume that there is a unique solution for k from the shares of shareholders in B and B′. I
will show that this assumption leads to a contradiction.
Suppose that |B|=m−1 and |B′|=m′−1, and suppose that the adversary has obtained si of i ∈
B and s′
j of j ∈ B′. Without loss of generality, suppose that the shares are s1, . . . , sm−1, s′
1, . . . , s′
m′−1.
Equation (3.1) can be used to construct the system of equations
1 1 · · · 1m−1 0 · · · 0
...
...
· · ·
...
...
...
1 i · · · im−1
...
. . .
...
...
...
· · ·
...
...
...
1 (m−1) · · · (m−1)m−1 0 · · · 0
1 0 · · · 0 1 · · · 1m′−1
1
...
...
...
· · ·
...
1
...
. . .
...
j · · · jm′−1
1
...
...
...
· · ·
...
1 0 · · · 0 (m′−1) · · · (m′−1)m′−1
k
a1
...
am−1
a′
1
...
a′
m′−1
=
s1
...
si
...
sm−1
s′
1
...
s′
j
...
s′
m′−1
(3.22)
LetMdenote the left-hand matrix in Equation (3.22), a the coefficient vector k, a1 ... a′
m′−1, and
s the share vector. The maximum possible value for rank(M) is the number of rows (m+m′−2, by
Lemma 3), which is less than the number of values in a (m+m′−1). Also, rank(M) = rank([M|s])
since s is a linear combination of the columns ofM(by the method of share generation). Thus, there
are no unique solutions for a in Equation (3.22) (by Lemma 4). One arrives at the same conclusion
for any B′′
/∈ G(m′,n′)
P′ such that |B′′
| < m′−1.
3.2. THE VSR PROTOCOL 29
By assumption, there is a unique solution for k, thus Equation (3.22) can be re-written as
1 · · · 1m−1 0 · · · 0
...
· · ·
...
...
...
i · · · im−1
...
. . .
...
...
· · ·
...
...
...
(m−1) · · · (m−1)m−1 0 · · · 0
0 · · · 0 1 · · · 1m′−1
...
...
...
· · ·
...
...
. . .
...
j · · · jm′−1
...
...
...
· · ·
...
0 · · · 0 (m′−1) · · · (m′−1)m′−1
a1
...
am−1
a′
1
...
a′
m′−1
=
s1−k
...
si−k
...
sm−1−k
s′
1−k
...
s′
j −k
...
s′
m′−1−k
(3.23)
Let Mk denote the left-hand matrix in Equation (3.23), and let ak denote the coefficient vector
a1 ... a′
m′−1. Let MUL
k and MLR
k denote the upper-left and lower-right square sub-matrices of Mk
MUL
k =
1 · · · 1m−1
...
. . .
...
(m−1) · · · (m−1)m−1
and
MLR
k =
1 · · · 1m′−1
...
. . .
...
(m′−1) · · · (m′−1)m′−1
One can express det(MUL
k ) as
det(MUL
k ) = 1· · · (m−1)¯¯¯¯¯¯¯¯
1 · · · 1m−2
...
. . .
...
1 · · · (m−1)m−2
¯¯¯¯¯¯¯¯
= 1· · · (m−1) Õ
1≤i, j≤m−1;i>j
(i− j)
and observe immediately that det(MUL
k ) and det(MLR
k ) are non-zero. Thus, det(Mk) is non-zero
since det(Mk) = det(MUL
k )det(MLR
k ) (by Lemma 6).
30 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
Equation (3.23) has a unique solution for ak, because det(Mk) is non-zero (by Lemma 5). If
Equation (3.23) has a unique solution for ak, Equation (3.22) has a unique solution for a, because
there is a unique solution for k. But it has been established that there is no unique solutions for a,
so assuming that there is a unique solution for k has led to a contradiction. ¤
Lemma 8 (VSR witness security) A computationally-bound adversary cannot use the shares of
shareholders in any non-authorized subset B /∈ G(m,n)
P (|B| < m) with the witnesses W(k), W(s1),
. . .,W(sm) to uniquely determine k.
PROOF SKETCH: To prove the lemma, I would prove that determining k from the shares of shareholders
in B and the witnesses W(k),W(s1), . . . ,W(sm) is computationally equivalent to the intractable
problem of determining k fromW(k)
Conjecture 1 (VSR security) A computationally-bound adversary cannot use the shares of shareholders
in any non-authorized subset B /∈ G(m,n)
P (|B| < m), the shares of shareholders in any nonauthorized
subset B′ /∈ G(m′,n′)
P′ (|B′| < m′), and the witnesses W(k),W(s1), . . . ,W(sm) to uniquely
determine k.
PROOF SKETCH: From Lemma 7, the adversary cannot uniquely determine k from the shares of
shareholders in B and B′. From Lemma 8, the adversary cannot uniquely determine k from the
shares of shareholders in B andW(k),W(s1), . . . ,W(sm).
3.3 The mobile adversary and the VSR protocol
Here, I show that the VSR protocol fulfills requirements from Chapter 2 for a recovery protocol.
• It generates new shares for the next epoch such that they can be used to reconstruct the secret
(Theorem 1), and such that they cannot be combined with shares from the current epoch to
reconstruct the secret (Lemma 7).
• It includes mechanisms to prevent the adversary from corrupting protocol execution, because
the adversary may still control some servers (i.e., shareholders) during the update phase. The
old servers broadcast witnesses for the SHARES-VALID and SUBSHARES-VALID conditions
(step 2 of REDIST in Figure 3.3). The new servers then verify that the SHARES-VALID and
3.3. THE MOBILE ADVERSARY AND THE VSR PROTOCOL 31
SUBSHARES-VALID conditions hold before they generate new shares (step 3); if the conditions
hold, the new shares are valid (Theorem 1). The protocol does not reconstruct the
original secret, thus there is no server the adversary can subvert to obtain the secret directly.
• It erases the shares for the current epoch from server memories (step 4 of REDIST), to prevent
the adversary from ever obtaining any other current shares it needs.
• It must allow the system to change the threshold parameter of the underlying data distribution
scheme. The system can accomplish such a change when executing the protocol (step 1 of
REDIST).
I show how an abstract storage system with dynamic membership can use the VSR protocol
to counteract a mobile adversary. Consider the system shown in Figure 3.4. The figure shows
the memory contents of the servers and the adversary during two consecutive epochs t and t +1
and the intervening update phase. The system uses Shamir’s threshold sharing scheme [49], with
the parameters (m,n) in epoch t and (m′,n′) in epoch t +1. For simplicity, assume that the sets of
servers in the two epochs are disjoint. In each epoch, the adversary may only control a sub-threshold
number of servers.
Initially, in epoch t, the adversary has subverted servers 1 through m−1 and has obtained their
shares. During the update phase, m correct servers from epoch t (say, m through 2m−1) use the
VSR protocol to redistribute their shares to the servers from epoch t +1; note that the adversary
still controls servers 1 through m−1, but does not control any of the new servers 1′ through n′. At
the end of the update phase, servers from epoch t erase their shares. In epoch t +1, the adversary
subverts servers 1′ through m′−1 and obtains their shares.
The VSR protocol, by fulfilling the design requirements for a recovery protocol, is able to
prevent the mobile adversary from ever obtaining enough shares to reconstruct the original data. In
epoch t +1, the adversary will have the shares from all of the servers it subverted in all epochs,
i.e., at most m−1 shares from epoch t and m′ −1 shares from epoch t +1. However, it is unable
to combine the two sets of shares to reconstruct k (Section 3.2.5). The adversary can never obtain
more than m−1 shares from epoch t because all of the correct servers from epoch t erase their
shares (step 4 of REDIST). There is no server that the adversary can subvert to obtain k directly
because the protocol does not reconstruct the secret at any server. Finally, the adversary cannot
learn k directly from the broadcast ofW(k) (step 2 of REDIST), consistent with the assumption that
inversion of the witness function is intractable.
32 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
Servers
1 m−1
Adversary
1 m−1
Adversary
1 m−1
1’ m’−1
2m−1
m
m
1’
m
n’
n’
2m−1
1’
2m−1
1’ n’
Epoch t
1 m
n
m+1
m−1
m−2
Epoch t+1
1’
m’−2
m’+1
n’
m’
m’−1
Update phase t
Adversary
Subshares
Old shares
New shares
Servers
Figure 3.4: A storage system with dynamic membership that uses the VSR protocol to redistribute shares of
a secret from the access structure G(m,n)
P to the access structure G(m′,n′)
P′ , in the presence of a mobile adversary.
The memory contents (i.e., shares) of the adversary and servers are shown for two consecutive epochs t and
t +1 and the intervening update phase. The adversary may only control m−1 servers in epoch t, and m′−1
servers in epoch t +1. Crosshatched servers are under the control of the adversary. The sets of servers in
epochs t and t +1 are disjoint. The system executes the VSR protocol during the update phase to generate
new (marked by primes) shares for correct servers. New shares cannot be combined with current shares to
reconstruct the original data. The distribution of subshares during the update phase is shown, but the broadcast
of witnesses is omitted to simplify the figure. In epoch t, the adversary subverts servers 1 through m−1 and
obtains their shares. During the update phase, the system removes all servers from under the control of the
adversary. In epoch t+1, the adversary subverts servers 1′ through m′−1 and obtains their shares. However,
the adversary does not (and can never) obtain enough shares (current or new) to reconstruct the original data.
3.4. SUMMARY 33
3.4 Summary
I have presented the VSR protocol for Shamir’s sharing scheme. The VSR protocol operates by
having an authorized subset of old shareholders distribute subshares of their shares to new shareholders.
The new shareholders generate new shares from the subshares. A key insight embodied in
the VSR protocol is that the verification is a two-step process: not only must the new shareholders
verify the validity of the subshares they receive (the SUBSHARES-VALID condition), but they must
also verify the validity of the shares used to distribute those subshares (the SHARES-VALID condition).
To enable verification of the validity of old shares, the dealer of the original secret must
provide a witness of the secret to the old shareholders during the distribution of shares, and the old
shareholders in the authorized subset must all broadcast the witness to the new shareholders. The
old shareholders must also broadcast witnesses of their shares, and of the coefficients of the subshare
generation polynomials. By verifying the validity of shares and subshares, new shareholders
implicitly verify the validity of their new shares (the NEW-SHARES-VALID condition).
I have shown how a storage system can use the VSR protocol to counteract the mobile adversary
in a system with dynamic membership. If a faulty old shareholder (i.e., a shareholder under the control
of the adversary) sends invalid protocol information, the new shareholders will detect that some
subset of the old shareholders are faulty. The new shareholders in some cases can pinpoint faulty
shareholders that send invalid information (e.g., if a faulty shareholder sends an invalid subshare),
but otherwise can only detect that faulty shareholders exist.
34 CHAPTER 3. VERIFIABLE SECRET REDISTRIBUTION
Chapter 4
Hathor: An Experimental Storage
System
O thou beautiful Being, thou dost renew thyself in thy season in the form of the
Disk, within thy mother Hathor.
— Papyrus of Nekht, Brit. Mus. No. 10471, Sheet 2
In this chapter, I discuss the design and implementation of an experimental storage system called
Hathor1. The primary purpose of Hathor is to prove the thesis that recovery can be efficient. It is
an experimental platform for the evaluation of the end-to-end cost of storing, redistributing, and
retrieving data, using a variety of data distribution schemes. Hathor is designed to operate in the
system environment defined by the abstract system model described in Chapter 2: a client-server
environment in which clients are always correct, but in which some number of servers may be
controlled by an adversary—that is, they may be faulty. In what follows, I use “the client” as a
shorthand to refer to the portion of Hathor that executes on clients in the system, and “the server”
as a shorthand to refer to the portion that executes on servers.
1Hathor is the Greek transliteration of the Egyptian name Ht-Hrt. In Egyptian mythology, Hathor is the Goddess of
the Dead and the Protectoress of the City of the Dead in Thebes. I wanted to pick a system name that (for once) was
not a Greek or Roman name. Also, Hathor was originally conceived as a write-infrequently, read-mostly data archive
prototype, which made the name seem particularly appropriate.
36 CHAPTER 4. HATHOR: AN EXPERIMENTAL STORAGE SYSTEM
Shares + witness
Data
Key
Data
Data
Key
1
m
n
1
m
n
Encrypted replicas
+
Key shares + key witness
1
m
n
Encrypted replicas
REPLICA THRESHOLD HYBRID
Figure 4.1: REPLICA, THRESHOLD, and HYBRID data distribution schemes implemented in Hathor. In each
scheme, n servers stores a piece of the original data, and m pieces are required to recover the data. For both
REPLICA and HYBRID, the servers store identical encrypted replicas of the data; additionally for HYBRID, the
servers store different shares of the encryption key but identical key witnesses. For THRESHOLD, the servers
store different shares of the data but identical witnesses to the data. Each replica, data share, and data witness
is the same size as the original data; each key share and key witness is the same size as the key.
4.1 Data distribution schemes
Hathor implements three data distribution schemes: REPLICA, THRESHOLD, and HYBRID. I have
selected these schemes in order to model common schemes used in other survivable storage systems.
Thus, the performance evaluation in Chapter 5 should provide a rough guide to the cost of supporting
decentralized recovery in a variety of environments.
All of the schemes are m-of-n schemes. Figure 4.1 shows the contents of the pieces stored by
each server given a particular scheme. To store data, a client distributes a piece of the data to each of
the n servers; each piece contains a replica, a share, and a witness as appropriate for the distribution
scheme in use. To retrieve the data, the client retrieves the pieces from m servers and reconstructs
the data. In terms of the mobile adversary model, I assume that storage and retrieval operations
occur during an epoch (and not during a update phase), and that an adversary can subvert at most
m−1 servers in an epoch.
Informally, the process of distribution and reconstruction of data for each scheme is as follows:
• REPLICA: To store data, the client distributes an encrypted replica of the data to each server,
and stores the encryption key locally. Each server stores the encrypted replica. To retrieve
data, the client retrieves replicas from m servers, verifies that all of the replicas are identical
4.1. DATA DISTRIBUTION SCHEMES 37
(to prevent faulty servers from convincing it to accept an invalid replica), and decrypts one of
the replicas to recover the original data.
REPLICA is similar to the scheme used in the Farsite [1, 11], PAST [47], and Pond [37, 46]
survivable storage systems.
• THRESHOLD: To store data, the client uses the INITIAL phase of the VSR protocol (Figure
3.3) to generate n shares of the data and a witness of the data. Each server stores a share
and the witness; the servers will use the witness during the redistribution of shares (the REDIST
phase of the VSR protocol). To retrieve data, the client retrieves shares from m servers,
and uses Shamir’s scheme [49] to reconstruct the original data.
A variant of THRESHOLD without redistribution is implemented as one of the available
schemes in the PASIS survivable storage system [53].
• HYBRID: To store data, the client distributes an encrypted replica of the data to each server. It
also uses the INITIAL phase of the VSR protocol to generate n key shares and a key witness.
Each server stores the encrypted replica, a key share, and the key witness; the servers will
use the witness during the redistribution of key shares. To retrieve data, the client retrieves
shares and replicas from m servers, verifies that all of the replicas are identical (to prevent
faulty servers from convincing it to accept an invalid replica), and uses Shamir’s scheme to
reconstruct the original encryption key. It then decrypts one of the replicas to recover the
original data.
A variant of HYBRID without redistribution of key shares is similar to the scheme used in the
Publius robust publishing system [52]. It is also similar to the scheme used in the e-Vault data
repository [33]; the difference is that e-Vault stores IDA fragments [42] of the original data at
each server, instead of a complete replica.
Previous comparison studies show that REPLICA is generally faster than either THRESHOLD or
HYBRID [53], which raises the question as to why one would use schemes other than REPLICA. One
reason is that diversity in schemes can provide an additional defense against system compromise,
in the same way that diversity in server operating systems provides a defense: an adversary may
not be able to leverage the ability to compromise data stored with one scheme (e.g, REPLICA)
to compromise data stored with a different scheme (e.g., THRESHOLD). A second reason is that
THRESHOLD and HYBRID enable the client to store data without the need manage an encryption
key: with THRESHOLD, no key is required, while with HYBRID, the key is managed with the data.
38 CHAPTER 4. HATHOR: AN EXPERIMENTAL STORAGE SYSTEM
Always correct
Communication/
Event handling
Disk I/O
Redistribution
Client−server
channel
Other
client−server/
server−server
channels
Communication
(User application)
Distribution & reconstruction
Client Server
May be correct or faulty
group membership
Figure 4.2: Functional modules of the client and server implementations of Hathor. The broken line represents
the border between parts of Hathor that are always correct, and parts of Hathor that may be either
correct or faulty (i.e., the parts that may be controlled by an adversary). The user application is shown as
a module (though it is not strictly part of Hathor), and is included inside the border to emphasize the fact
that the application can only access the servers via the distribution and reconstruction module. The various
point-to-point channels are also included inside the border.
Though I will use REPLICA as the baseline scheme for the performance evaluation in Chapter 5,
one should bear in mind the non-performance reasons for using THRESHOLD or HYBRID.
4.2 Client and server implementation
The Hathor client and server implementations are fairly lightweight. The implementation supports
the REPLICA, THRESHOLD, and HYBRID schemes as described above, and also supports the protocols
necessary to redistribute pieces of data generated by the schemes. It supports enough of
a communications and disk storage infrastructure to measure the end-to-end cost of data storage,
redistribution, and retrieval operations.
The client and server implementations in Hathor are composed of abstract functional modules,
as shown in Figure 4.2. The dashed line around the modules represents the border between the parts
of the Hathor that are always correct and the parts that may potentially be faulty (i.e., the modules
that may be controlled by an adversary). I include the various point-to-point channels inside the
border to emphasize the assumptions about their privacy, reliability, and ordering properties.
4.3. I/O OPERATIONS 39
The Hathor client implements the functionality necessary to store and retrieve data. The distribution
and reconstruction module implements the threshold sharing and encryption algorithms used
by REPLICA, THRESHOLD, and HYBRID, and the communication module manages the client-server
channels. A user application stores and retrieves data through an interface that is exported by the
distribution module. I assume that the application can only access the servers via the distribution
module, and thus cannot disrupt server operation through the direct injection of invalid messages
into the client-server channel. To emphasize this constraint, I show the application as a module in
the client in Figure 4.2, and include it inside the border of the parts of Hathor that are always correct.
The Hathor server is more complex than the client. The redistribution module implements a
pair of state machines (Section 4.3.2) to manage the redistribution of pieces of data for REPLICA,
THRESHOLD, and HYBRID. A disk I/O module marshals pieces of data to and from the on-disk
representations. An event module parses the messages that drive the redistribution state machine; it
also passes client requests to store and retrieve pieces to the disk I/O module.
The server communication module is largely a wrapper around the Ensemble group communications
toolkit [29]. The module manages the client-server channels, as well as the server-server
channels. The module (in conjunction with the communication modules on other servers) also implements
a broadcast channel abstraction on top of the server-server channels. A group membership
service within the module maintains a view of active servers that are sending and receiving broadcast
messages. The service updates the view whenever a server joins or leaves Hathor, and assigns
a unique rank to each server in the view. The communication module is included inside the border
of modules that are always correct, consistent with the assumption in Chapter 2 that an adversary
cannot disrupt the broadcast channel (e.g., a set of servers controlled by the adversary cannot break
the property that a broadcast message M is either delivered to all servers or to none of them). Note
that the adversary can still see all broadcast messages even if the communication modules are always
correct: a server controlled by the adversary can simply pass on any messages received by the
server at the event module.
4.3 I/O operations
Hathor supports three basic I/O operations: STORE, REDISTRIBUTE, and RETRIEVE. STORE stores
pieces of data to a set of servers. RETRIEVE retrieves a sufficient number of pieces to reconstruct
the original data. REDISTRIBUTE redistributes pieces from an old set of servers to the new set of
servers, and verifies that any new pieces can be used to reconstruct the original data. In this section, I
40 CHAPTER 4. HATHOR: AN EXPERIMENTAL STORAGE SYSTEM
(WRITE)
DATA
dist
n ACK
RECV:
initial−client ack−wait done−client
initial−server recv done−server
SEND:
ACK
write
n PIECES,
n CHECKSUMS
SEND:
RECV:
PIECE,
CHECKSUM
SEND:
ACK
Client
Server
RECV:
Figure 4.3: STORE state machine for clients and servers. States correspond to intermediate computations,
while state transitions correspond to sent or received messages. Clients start in the initial-client state, and
servers start in the initial-server state. Solid arrows indicate transitions taken by correct clients and servers,
while dashed arrows indicate transitions that may be taken by faulty servers. Unlabeled dashed arrows that
loop to the same state indicate no-op self-transitions.
discuss what steps each operation takes for each data distribution scheme. In particular, I show how
the implementation of REDISTRIBUTE for the THRESHOLD scheme corresponds to the VSR protocol
in Section 3.2, and how the implementations of REDISTRIBUTE differ between the REPLICA,
THRESHOLD, and HYBRID schemes (which will help to explain the results in Section 5.3).
For each scheme, I consider storage and retrieval using an m-of-n scheme, and redistribution
to change the scheme parameters from m-of-n to m′-of-n′. As stated before, I assume during a
STORE or RETRIEVE that at most m−1 servers may be faulty. Additionally, I assume during a
REDISTRIBUTE that at least m old servers are correct, that at most m−1 old servers are faulty, that
at least m′ new servers are correct, and that at most m′ −1 new servers are faulty, consistent with
the assumptions about faulty servers in the presentation of the VSR protocol.
4.3.1 STORE
The state machines in Figure 4.3 show the sequences of steps at the client and server for a STORE
to an m-of-n scheme. Transitions between states are atomic, e.g., the client does not make a transition
from the ack-wait state to the done-client state until it has received all n acknowledgments. I
describe the sequence of steps for a client, a correct server, and a faulty server below.
4.3. I/O OPERATIONS 41
A client starts in the initial-client state. When the client receives data from the user application,
it makes a transition to the dist state, in which it obtains the view of active servers from the servers’
group membership service, and distributes a piece of the data to each of the servers. The client also
computes and sends an checksum of the data to each server; it will later use the checksum to verify
the validity of reconstructed data. The client next makes a transition to the ack-wait state, in which
it waits to receive an acknowledgment from each of the n servers; an acknowledgment indicates that
a server has written its piece to disk. The client then makes a final transition to the done-client state.
A correct server starts in the initial-server state. When the server receives a piece of data and a
checksum from a client, it makes a transition to the recv state. In the recv state, the server “sends”
a write request to itself, and immediately makes a transition to the write state. In the write state, the
server writes the piece and the checksum to disk, sends an acknowledgment back to the client, and
makes a transition to the done-server state.
A faulty server also starts in the initial-server state. When the server receives a piece of data
from a client, it makes a transition to the recv state. In the recv state, a faulty server may send an
acknowledgment to the client, and make a transition to the done-server state. The client will think
the server has stored its piece even though the server has not written the piece to disk. However, m
servers are guaranteed to write their pieces to disk (consistent with the assumption that at least m
servers are correct), thus subsequent REDISTRIBUTE or RETRIEVE operations can execute.
A faulty server may also cease to send messages while in the recv or write states, which results
in no-op self-transitions. If the server remains in these states forever, the client that sent the piece
(that triggered the transition from the initial-server state to dist state) will wait forever to receive an
acknowledgment from the server, and will thus wait forever to complete a STORE.
Strictly speaking, a client does not need to wait to receive acknowledgments from the servers,
because the servers are guaranteed to receive pieces sent by the client (consistent with the assumption
of reliable private channels in Chapter 2), and at least m correct servers are guaranteed to write
their pieces to disk. In practice, the client waits to receive acknowledgments to measure of the endto-
end cost of storing data: the client starts its timer when it receives data from the application, and
stops its timer when it receives acknowledgments from the n servers. The client thus runs the risk
that it will wait forever in the ack-wait state if a faulty server does not send an acknowledgment.
42 CHAPTER 4. HATHOR: AN EXPERIMENTAL STORAGE SYSTEM
SHARE
(ERASE)
RECV:
REDIST n’ COMMIT
RECV:
initial−old erase−wait erase
SEND:
SHARE
redist
SEND:
SHARE
SEND:
SUBSHARE,
CHECKSUM
BCAST:
WITNESSES
SEND:
INVALID SUBSHARE,
INVALID CHECKSUM
BCAST:
INVALID WITNESSES
SEND:
Figure 4.4: REDISTRIBUTE state machine for old servers. States correspond to intermediate computations,
while state transitions correspond to sent or received messages. Old servers start in the initial-old state. Solid
arrows indicate transitions taken by correct servers, while dashed arrows indicate transitions that may be
taken by faulty servers. Unlabeled dashed arrows that loop to the same state indicate no-op self-transitions.
4.3.2 REDISTRIBUTE
The state machines in Figure 4.4 and Figure 4.5 show the sequences of steps for a REDISTRIBUTE
when changing scheme parameters from m-of-n to m′-of-n′, in which m old servers redistribute
their pieces of the original data to n′ new servers. There is no client state machine because clients
do not participate in REDISTRIBUTE. As for STORE, transitions between states are atomic, e.g.,
an old server does not make the transition from the erase-wait state to the erase state until it has
received all n′ “commit” messages. For simplicity, I describe the steps when THRESHOLD is used
to distribute pieces of the original data, and then highlight how REPLICA or HYBRID differ.
In the current implementation of Hathor, m old servers and all n′ new servers participate in
REDISTRIBUTE. Moreover, Hathor requires all n′ new servers to send “commit” messages before
completing redistribution, to ensure that all non-faulty servers have valid pieces of data; thus, a
faulty old server or a faulty new server can cause all servers to either hang or abort. In an experimental
system whose primary purpose is to measure the end-to-end cost of redistribution, such
behavior is tolerable: a human operator can shut Hathor down and determine the cause of the fault.
However, such behavior would be unacceptable for a production system.
4.3. I/O OPERATIONS 43
Old server state machine
A correct old server follows the state machine transitions indicated by the solid arrows in Figure 4.4.
The server starts in the initial-old state with a share of the original data and a witness of the data.
When the server receives a redistribution request, it makes a transition to the redist state. There, the
server distributes subshares of its share (step 1 of REDIST in the VSR protocol of Figure 3.3) and
the checksum of the original data to each server. The server also broadcasts the witness of the data,
the witness of its share, and the witnesses of the coefficients of its subshare generation polynomial
(step 2 of REDIST). Once the server has sent subshares and witnesses, it makes a transition to the
erase-wait state. In the erase-wait state, the server waits to receive a “commit” message from each
of the n′ new servers. Upon receiving the n′ “commit” messages, the server makes a final transition
to the erase state, and erases its share.
A faulty old server may make the following additional state machine transitions, as indicated by
the dashed arrows in Figure 4.4:
• It may send its share to the adversary while in any state other than the erase state. Sending
a share results in a self-transition. The erase state does not have this self-transition since the
server will have erased its share in this state.
Note that the adversary will never obtain enough shares to reconstruct the original data, consistent
with the assumption that at most m−1 old servers are faulty.
• It may “send” an erase request to itself while in any state other than the erase state, which
results in a transition to the erase state.
• It may send invalid subshares, witnesses, or checksums while in the redist state. Recall from
Section 3.2.1 that invalid subshares do not reconstruct a consistent value, and that invalid witnesses
do not correspond to the original data, or to shares of the data, or to the coefficients of
the subshare generation polynomial. Sending invalid information results in a self-transition:
the server remains in the redist state.
• It may refuse to send subshares or witnesses while in the redist state, and thus remain in that
state forever (resulting in a no-op self-transition).
• It may refuse to “send” an erase request to itself while in the erase-wait state, and thus remain
in that state forever (resulting in a no-op self-transition).
44 CHAPTER 4. HATHOR: AN EXPERIMENTAL STORAGE SYSTEM
REDIST
initial−new verify−wait verify gen−wait
n’ COMMIT
RECV:
gen
BCAST:
COMMIT
abort
RECV:
ABORT
m (INVALID) CHECKSUMS
m (INVALID) SUBSHARES & WITNESSES,
RECV:
BCAST:
COMMIT
BCAST OR RECV:
ABORT
BCAST:
ABORT
RECV:
Figure 4.5: REDISTRIBUTE state machine for new servers. States correspond to intermediate computations,
while state transitions correspond to sent or received messages. New servers start in the initial-new state.
Solid arrows indicate transitions taken by correct servers, while dashed arrows indicate transitions that may
be taken by faulty servers. Unlabeled dashed arrows that loop to the same state indicate no-op self-transitions.
New server state machine
A new server follows the state machine transitions indicated by the solid arrows in Figure 4.5. The
server starts in the initial-new state. When the server receives a redistribution request, it makes a
transition to the verify-wait state, in which the server waits to receive a subshare, the corresponding
witnesses, and a checksum from each of the m old servers; note that a faulty old server may send
invalid information. Upon receiving the m subshares, witnesses, and checksums, the server makes
a transition to the verify state.
While in the verify state, a new server tries to verify that the SHARES-VALID and SUBSHARES-
-VALID conditions hold for the subshares and witnesses (step 3 of REDIST in the VSR protocol of
Figure 3.3). It also tries to verify that all m of the received checksums match (to prevent m−1 faulty
servers from convincing it to accept an invalid checksum). The server may then make one of several
possible transitions:
• If it verifies that the SHARES-VALID and SUBSHARES-VALID conditions hold, and that the
checksums match, it broadcasts a “commit” message, which results in a transition to the genwait
state.
• If it finds that the SHARES-VALID and SUBSHARES-VALID conditions do not both hold, or
that the checksums do not match, it broadcasts an “abort” message to indicate that it has
4.3. I/O OPERATIONS 45
received invalid information from at least one of the old servers (i.e., at least one of the m old
servers is faulty; recall from Section 3.2.2 that new servers may not be able to pinpoint faulty
old servers). Sending an “abort” message results in a transition to the abort state.
• If it receives an “abort” message, it makes a transition to the abort state.
When the server enters the gen-wait state, it waits to receive a “commit” message from each of
the n′ new servers (note that it will have received its own message). If the server receives an “abort”
message, it makes a transition to the abort state. Otherwise, upon receiving n′ “commit” messages,
the server makes a transition to the gen state, in which it generates a new share from the received
subshares, and writes the share, the witness to the original data, and the checksum to disk.
A faulty old server may make the following additional state machine transitions, as indicated by
the dashed arrow