I need the following after reviewing the paper:
Problem Statement – the issues the authors discuss
Approach & Design – how the authors approach the issue and what ideas they propose
Strengths and Weaknesses – the key strengths and weaknesses of the authors' proposed approach, design, and system, and of the paper itself
Evaluation (Performance) – how the authors evaluated the proposed system and which parameters they used to measure its performance
Conclusion (from the reader's perspective)
Along with these, I need a detailed section-by-section explanation of the paper. The sections are:
Abstract
Introduction
FUSE Semantics and API (in detail)
Applications
Liveness Checking Topologies
Implementation (in detail)
Experimental Evaluation (in detail)
Summary
The authors' conclusions about the proposed system
FUSE: Lightweight Guaranteed Distributed Failure Notification
John Dunagan, Nicholas J. A. Harvey, Michael B. Jones, Dejan Kostic,
Marvin Theimer, Alec Wolman
"I am not discouraged, because every failure is another step forward." – Thomas Edison
Abstract
FUSE is a lightweight failure notification service for
building distributed systems. Distributed systems built
with FUSE are guaranteed that failure notifications never
fail. Whenever a failure notification is triggered, all
live members of the FUSE group will hear a notifica-
tion within a bounded period of time, irrespective of node
or communication failures. In contrast to previous work
on failure detection, the responsibility for deciding that a
failure has occurred is shared between the FUSE service
and the distributed application. This allows applications
to implement their own definitions of failure. Our expe-
rience building a scalable distributed event delivery sys-
tem on an overlay network has convinced us of the use-
fulness of this service. Our results demonstrate that the
network costs of each FUSE group can be small; in par-
ticular, our overlay network implementation requires no
additional liveness-verifying ping traffic beyond that al-
ready needed to maintain the overlay, making the steady
state network load independent of the number of active
FUSE groups.
1 Introduction
This paper describes FUSE, a lightweight failure noti-
fication service. When building distributed systems, man-
aging failures is an important and often complex task.
Many different architectures, abstractions, and services
have been proposed to address this [5, 10, 13, 28, 30, 38,
42, 43, 44]. FUSE provides a new programming model
for failure management that simplifies the task of agree-
ing when failures have occurred in a distributed system,
thereby reducing the complexity faced by application de-
velopers. The most closely related prior work on coping
with failures has centered around failure detection ser-
vices. FUSE takes a somewhat different approach, where
detecting failures is a shared responsibility between FUSE
and the application. Applications create a FUSE group
with an immutable list of participants. FUSE monitors
this group until either FUSE or the application decides to
terminate the group, at which point all live participants are
guaranteed to learn of a group failure within a bounded
period of time. This focus on delivering failure notifica-
tions leads us to refer to FUSE as a failure notification
service. This is a very different approach than prior sys-
tems have adopted, and we will argue that it is a good
approach for wide-area Internet applications.
Applications make use of the FUSE abstraction as fol-
lows: the application asks FUSE to create a new group,
specifying the other participating nodes. When FUSE fin-
ishes constructing the group, it returns a unique identifier
for this group to the creator. The application then passes
this FUSE ID to the applications on all the other nodes
in this group, each of which registers a callback associ-
ated with the given FUSE ID. FUSE guarantees that every
group member will be reliably notified via this callback
whenever a failure condition affects the group. This fail-
ure notification may be triggered either explicitly, by the
application, or implicitly, when FUSE detects that com-
munication among group members is impaired.
Applications can create multiple FUSE groups for dif-
ferent purposes, even if those FUSE groups span the same
set of nodes. In the event that FUSE detects a low-level
communication failure, failure will be signalled on all the
FUSE groups using that path. However, on any individual
FUSE group the application may signal a failure without
affecting any of the other groups.
FUSE provides the guarantee that notifications will be
delivered within a bounded period of time, even in the
face of node crashes and arbitrary network failures. We
refer to these semantics as distributed one-way agree-
ment. One-way refers to the fact that there is only one
transition any group member can see: from "live" to
"failed". After failure notification on a group, detecting
future failures requires creating a new group.
By providing these semantics, FUSE ensures that fail-
ure notifications never fail. This greatly simplifies failure
handling among nodes that have state that they want to
handle in a coordinated fashion. FUSE efficiently handles
all the corner cases of guaranteeing that all members will
be notified of any failure condition affecting the group.
Applications built on top of FUSE do not need to worry
that a failure message did not get through or that orphaned
state remains in the system.
The primary target for FUSE is wide-area Internet ap-
plications, such as content delivery networks, peer-to-peer
applications, web services, and grid computing. FUSE
is not targeted at applications that require strong consis-
tency on replicated data, such as stock exchanges and
missile control systems. Techniques such as virtual syn-
chrony [6], Paxos [29], and BFT [10] have already proven
to be effective in such environments. However, these tech-
niques incur significant overhead and therefore have lim-
ited scalability. Furthermore, FUSE does not provide re-
silience to malicious participants, though it also does not
preclude solving this problem at a higher layer.
Previous work on failure detection services uses
membership as the fundamental abstraction: these ser-
vices typically provide a list of each component in the
system, and whether it is currently up or down. Member-
ship services have seen widespread success as a building
block for implementing higher level distributed services,
such as consensus, and have even been deployed in such
commercially important systems as the New York Stock
Exchange [5]. However, one disadvantage of the mem-
bership abstraction is that it does not allow application
components to have failed with respect to one action, but
not with respect to another. For example, suppose a node
is engaged in an operation with a peer and at some point
fails to receive a timely response. The failure manage-
ment service should support declaring that this operation
has failed without requiring that a node or process be de-
clared to have failed. The FUSE abstraction provides this
flexibility: FUSE tracks whether individual application
communication paths are currently working in a manner
that is acceptable to the application.
The scenario described above is more likely to oc-
cur with wide-area Internet applications, which face a
demanding operating environment: network congestion
leads to variations in network loss rate and delay; intran-
sitive connectivity failures (sometimes called partial con-
nectivity failures) occur due to router or firewall miscon-
figuration [27, 31]; and individual network components
such as links and routers can also fail. Under these con-
ditions, it is difficult for applications to make good deci-
sions based solely on the information provided by a mem-
bership service. A particular scenario illustrating why
applications need to participate in deciding when failure
has occurred is delivery of streaming content over the In-
ternet: failure to achieve a certain threshold bandwidth
may be unacceptable to such an application even though
other applications are perfectly happy with the connectiv-
ity provided by that same network path.
Our FUSE implementation is scalable, where scaling
is with respect to the number of groups: multiple failure
notification groups can share liveness checking messages.
Other implementations of the FUSE abstraction may sac-
rifice scalability in favor of increased security. Our FUSE
implementation is designed to support large numbers of
small to moderate size groups. We do not attempt to effi-
ciently support very large groups because we believe very
large groups will tend to suffer from too-frequent failure
notification, making them less useful.
Our FUSE implementation is particularly well-suited
to applications using scalable overlay networks. Scalable
overlay networks already do liveness checking to main-
tain their routing tables and FUSE can re-use this live-
ness checking traffic. In such a deployment, the network
traffic required to implement FUSE is independent of the
number of groups until a failure occurs; only creation
and teardown of a group introduce a per-group overhead.
FUSE can be implemented in the absence of a pre-existing
overlay network: the implementation can construct its
own overlay, or it can use an alternative liveness checking
topology.
We have implemented FUSE on top of the Skip-
Net [25] scalable overlay network, and we have built a
scalable event delivery application using FUSE. We eval-
uated our implementation using two main techniques: a
discrete event simulator to evaluate its scalability, and a
live system with 400 overlay participants (each running
in its own process) running on a cluster of 40 worksta-
tions to evaluate correctness and performance. Both the
simulator and the live system use an identical code base
except for the base messaging layer. Our live system eval-
uation shows that our FUSE implementation is indeed
lightweight; the latency of FUSE group creation is the la-
tency of an RPC call to the furthest group member; the la-
tency of explicit failure notification is similarly dominated
by network latency; and we show that our implementation
is robust to false positives caused by network packet loss.
In summary, the key contributions of this paper are:
• We present a novel abstraction, the FUSE failure notification group, that provides the semantics of distributed one-way agreement. These are desirable semantics in our chosen setting of wide-area Internet applications.
• We used FUSE in building a scalable event delivery service, and we describe how it significantly reduced the complexity of this task.
• We implemented FUSE on top of a scalable overlay network. This allowed us to support FUSE without adding additional liveness checking. We experimentally evaluated the performance of our implementation on a live system with 400 virtual nodes.
2 Related Work
Failure detection has been the subject of more than two
decades of research. This work can be broadly classified
into unreliable failure detectors, weakly consistent mem-
bership services, and strongly consistent membership ser-
vices. Unreliable failure detectors provide the weakest
semantics directly, but they are a standard building block
in constructing membership services with stronger se-
mantics. Both weakly-consistent and strongly-consistent
membership services are based on the abstraction of a
list of available or unavailable components (typically pro-
cesses or machines). In contrast, a FUSE group ID is not
bound to a process or machine and hence can be used in
many contexts. For example, it can correspond to a set
of several processes, or to related data stored on several
different machines. This abstraction allows FUSE to pro-
vide novel semantics for distributed agreement, a subject
we will elaborate on in our discussion of weakly consis-
tent membership services.
Chandra et al. formalized the concept of unreliable
failure detectors, and showed that one such detector was
the weakest failure detector for solving consensus [11,
12]. Such detectors typically provide periodic heartbeat-
ing and callbacks saying whether the component is re-
sponding or not responding. These callbacks may be
aggregate judgments based on sets of pings. Unreliable
failure detectors provide the following semantic guaran-
tee: fail-stop crashes will be identified as such within a
bounded amount of time. There has been extensive work
on such detectors, focusing on such aspects as interface
design, scalability, rapidity of failure detection, and mini-
mizing network load [1, 17, 21, 23, 38].
FUSE uses a similar lightweight mechanism (periodic
heartbeats) to unreliable failure detectors, but provides
stronger distributed agreement semantics. Unreliable fail-
ure detectors are typically used as a component in a mem-
bership service, and the membership service is responsi-
ble for implementing distributed agreement semantics.
Weakly consistent membership services have also been
the subject of an extensive body of work [2, 19, 42, 44].
This work can be broadly classified as differing in speed
of failure detection, accuracy (low rate of false positives),
message load, and completeness (failed nodes are agreed
to have failed by everyone in the system). Epidemic and
gossip-style algorithms have been used to build highly
scalable implementations of this service [19, 42]. Pre-
vious application areas that have been proposed for such
membership services are distributed collaborative appli-
cations, online games, large-scale publish-subscribe sys-
tems, and multiple varieties of Web Services [5, 19].
These are the same application domains targeted by
FUSE, and FUSE has similar overhead to a weakly con-
sistent membership service in these settings. Typical char-
acteristics of such applications are that many operations
are idempotent or can be straightforwardly undone, oper-
ations can be re-attempted with a different set of partici-
pants, or the decision to retry can be deferred to the user
(as in an instant messaging service).
One novel aspect of the FUSE abstraction is the ability
to handle arbitrary network failures. In contrast, weakly
consistent membership services provide semantic guaran-
tees assuming only a fail-stop model. One kind of net-
work failure where the FUSE abstraction is useful is an
intransitive connectivity failure: A can reach B, B can
reach C, but A cannot reach C. This class of network fail-
ures is hard for a weakly consistent membership service to
handle because the abstraction of a membership list lim-
its the service to one of three choices, each of which has
drawbacks:
• Declare one of the nodes experiencing the intransitive connectivity failure to have failed. This prevents the use of that node by any node that can reach it.
• Declare all of the nodes experiencing the intransitive failure to be alive because other nodes in the system can reach them. This may cause the application to block for the duration of the connectivity failure.
• Allow a persistent inconsistency among different nodes' views of the membership list. This forces the application to deal with inconsistency explicitly, and therefore the membership service is no longer reducing the complexity burden on the application developer.
FUSE appropriately handles intransitive connectivity
failures by allowing the application on a node experienc-
ing a failure to declare the corresponding FUSE group
to have failed. Other FUSE groups involving the same
node but not utilizing a failed communication path can
continue to operate. Application participation is required
to achieve this because FUSE may not have detected the
failure itself; like most weakly consistent membership
services, FUSE typically monitors only a subset of all
application-level communication paths.
FUSE would require application involvement in failure
detection even if it monitored all communication paths.
Consider a multi-tier service composed of a front-end,
middle-tier, and back-end. Suppose the middle-tier com-
ponent is available but misconfigured. FUSE allows the
front-end to declare a failure, and to then perform appro-
priate failure recovery, such as finding another middle-tier
from some pool of available machines. This difference in
usage between membership services and FUSE reflects
a difference in philosophy. Membership services try to
proactively decide at a system level whether or not nodes
and processes are available. FUSE provides a mecha-
nism that applications can use to declare failures when
application-level constraints (such as configuration) are
violated.
Another contrast between the two approaches is that
the FUSE abstraction enables fate-sharing among dis-
tributed data items. By associating these items with a sin-
gle FUSE group, application developers can enforce that
invalidating any one item will cause all the remaining data
items to be invalidated. Weakly consistent membership
services do not explicitly provide this tying together of
distributed data.
Strongly consistent membership services share the ab-
straction of a membership list, but they also guarantee that
all nodes in the system always see a consistent list through
the use of atomic updates. Such membership services are
an important component in building highly available and
reliable systems using virtual synchrony. Notable exam-
ples of systems built using this approach are the New York
and Swiss Stock Exchanges, the French Air Traffic Con-
trol System, and the AEGIS Warship [2, 5, 6]. However,
a limitation of virtual synchrony is that it has only been
shown to perform well at small scales, such as five node
systems [6].
Some network routing protocols, such as IS-IS [8],
OSPF [33], and AutoNet [35], use mechanisms simi-
lar to FUSE. One similar aspect of AutoNet is its use
of teardown and recreate to manage failures. Any link-
state change causes all AutoNet switches to discard their
link-state databases and rebuild the global routing table.
OSPF and IS-IS take local link observations and prop-
agate them throughout the network using link-state an-
nouncements. They tolerate arbitrary network failures
using timers and explicit refreshes to maintain the link-
state databases. FUSE also uses timers and keep-alives to
tolerate arbitrary network failures. However, FUSE uses
them to tie together sets of links that provide end-to-end
connectivity between group members, and to provide an
overall yes/no decision for whether connectivity is satis-
factory.
A more distantly related area of prior work is black-
box techniques for diagnosing failures. Such techniques
use statistics or machine learning to distinguish successful
and failed requests [9, 13, 14, 15, 16]. In contrast, FUSE
assumes application developer participation and provides
semantic guarantees to the developer. Another significant
distinction is that black-box techniques typically require
data aggregation in a central location for analysis; FUSE
has no such requirement.
Distributed transactions are a well-known abstrac-
tion for simplifying distributed systems. Because FUSE
provides weaker semantics than distributed transactions,
FUSE can maintain its semantic guarantees under net-
work failures that cause distributed transactions to block.
Theoretical results on consensus show that the possibility
of blocking is fundamental to any protocol for distributed
transactions [22, 24].
Two of the design choices we made in building FUSE
were also recommended by recent works dealing with the
architectural design of network protocols. Ji et al. [26]
surveyed hard-state and soft-state signaling mechanisms
across a broad class of network protocols, and recom-
mended a soft state approach combining timers with ex-
plicit revocation: FUSE does this. Mogul et al. [32] ar-
gued that state maintained by network protocol imple-
mentations should be exposed to clients of those proto-
cols. As described in Section 6, we modified our overlay
routing layer to expose a mechanism for FUSE to piggy-
back content on overlay maintenance traffic.
3 FUSE Semantics and API
We begin by describing one simple approach to im-
plementing the FUSE abstraction. Suppose every group
member periodically pings every other group member
with an "are you okay?" message. A group member that
is not okay for any reason, either because of node failure,
network disconnect, network partition, or transient over-
load, will fail to respond to some ping. The member that
initiated this missed ping will ensure that a failure noti-
fication propagates to the rest of the group by ceasing to
respond itself to all pings for that FUSE group. Any in-
dividual observation of a failure is thus converted into a
group failure notification. This mechanism allows failure
notifications to be delivered despite any pattern of discon-
nections, partitions, or node failures. This specific FUSE
implementation guarantees that failure notifications are
propagated to every party within twice the periodic ping-
ing interval. Our implementation uses a different liveness
checking topology, discussed in Section 5. It also uses a
different ping retry policy; the retry policy and the guar-
antees on failure notification latency are discussed in Sec-
tion 3.3.
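To make the propagation argument concrete, the following is a minimal single-process sketch of the all-pairs pinging scheme just described. It is not the paper's implementation (which uses overlay routes rather than all-pairs pings); the Member class, the round-based timing, and the three-node group are illustrative assumptions. A member that observes a missed ping goes silent itself, converting the local observation into a group-wide notification within two pinging rounds.

// Minimal single-process sketch of the all-pairs pinging scheme described
// above; an illustrative assumption, not the paper's implementation.
using System;
using System.Collections.Generic;
using System.Linq;

class Member
{
    public string Name;
    public bool Crashed;              // simulated node failure
    public bool StoppedResponding;    // set once any failure is observed
    public bool Notified;             // local failure callback delivered

    // A member answers a ping only while it is alive and has not itself
    // observed a failure; silence is how notifications propagate.
    public bool RespondToPing() => !Crashed && !StoppedResponding;

    // One pinging round: if any peer fails to answer, convert that local
    // observation into a group-wide signal by going silent ourselves.
    public void PingRound(IEnumerable<Member> peers)
    {
        if (Crashed) return;
        foreach (var p in peers.Where(x => x != this))
        {
            if (!p.RespondToPing())
            {
                StoppedResponding = true;
                Notified = true;      // deliver the failure notification locally
                return;
            }
        }
    }
}

class AllPairsSketch
{
    static void Main()
    {
        var group = new List<Member>
        {
            new Member { Name = "A" },
            new Member { Name = "B" },
            new Member { Name = "C" },
        };
        group[1].Crashed = true;      // B fails silently

        // Within two pinging intervals every live member has heard about it.
        for (int round = 0; round < 2; round++)
            foreach (var m in group)
                m.PingRound(group);

        foreach (var m in group.Where(x => !x.Crashed))
            Console.WriteLine($"{m.Name} notified: {m.Notified}");
    }
}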
The name FUSE is derived from the analogy to laying
a fuse between the group members. Whenever any group
member wishes to signal failure it can light the fuse, and
this failure notification will propagate to all other group
members as the fuse burns. A connectivity failure or
node crash at any intermediate location along the fuse
will cause the fuse to be lit there as well, and the fuse
will then start burning in every direction away from the
failure. This ensures that communication failures will not
stop the progress of the failure notification. Also, once
the fuse has burnt, it cannot be relit, analogous to how the
FUSE facility only notifies the application once per FUSE
group.
Many different FUSE implementations are possible.
All implementations of the FUSE abstraction must pro-
vide distributed one-way agreement: failure notifica-
tions are delivered to all live group members under node
crashes and arbitrary network failures. Different FUSE
implementations may use different strategies for group
creation, liveness checking topology, retry, programming
interface, and persistence, with consequent variations in
performance. In this section, we describe the choices that
we made in our FUSE implementation, the resulting se-
mantics that application developers will need to under-
stand, and some of the alternative strategies that other
FUSE implementations could use.
3.1 Programming Interface
We now present the FUSE API for our implementation.
FUSE groups are created by calling CreateGroup with a
desired set of member nodes. This generates a FUSE ID
unique to this group, communicates it to the FUSE layers
on all the specified members, and then returns the ID to
the caller. Applications are subsequently expected to ex-
plicitly communicate the FUSE ID from the creator to the
other group members. Applications learning about this
FUSE ID register a handler for FUSE notifications us-
ing the RegisterFailureHandler function. In this design,
the FUSE layer is not responsible for communicating the
FUSE ID to applications on nodes other than the creator.
// Creates a FUSE notification group containing
// the nodes in the set
FuseId CreateGroup(NodeId[] set)
// Registers a callback function to be invoked
// when a notification occurs for the FUSE group
void RegisterFailureHandler
(Callback handler, FuseId id)
// Allows the application to explicitly cause
// FUSE failure notification
void SignalFailure(FuseId id)
Figure 1. The FUSE API
We believe the most likely use of FUSE is to allow
fate-sharing of distributed application state. Applications
should learn about FUSE IDs with sufficient context to
know what application state to associate with the FUSE
ID. Though failure handlers can simply perform garbage
collection of the associated application state, a handler is
also free to attempt to re-establish the application state
using a new FUSE group, or to execute arbitrary code. For
brevity, we refer to all of these permissible application-
level actions using the short-hand "garbage collection".
The application handler is invoked whenever the FUSE
layer believes a failure has occurred, either because of
a node or communication failure or because the appli-
cation explicitly signalled a failure event at one of the
group members. If RegisterFailureHandler is called with
a FUSE ID parameter that does not exist, perhaps because
it has already been signalled, the handler callback is in-
voked immediately. Applications that wish to explicitly
signal failure do so by calling the SignalFailure function.
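As a concrete illustration of these registration and signalling rules, the following is a small in-memory stand-in for the Figure 1 API. It performs no real liveness checking; the LocalFuseLayer class, the Action callbacks, and the Guid-based identifiers are assumptions made for this sketch, and only the three operation names come from Figure 1. It shows the behaviors described above: at most one notification per group, and immediate invocation when a handler is registered for a group that has already been signalled.

// In-memory sketch of the Figure 1 registration and signalling semantics;
// all types other than the three Figure 1 operation names are assumptions.
using System;
using System.Collections.Generic;

class FuseId { public Guid Value = Guid.NewGuid(); }

class LocalFuseLayer
{
    // Groups that are still live, with the handlers registered so far.
    private readonly Dictionary<FuseId, List<Action>> live = new Dictionary<FuseId, List<Action>>();
    private readonly HashSet<FuseId> failed = new HashSet<FuseId>();

    public FuseId CreateGroup()        // member list omitted in this local sketch
    {
        var id = new FuseId();
        live[id] = new List<Action>();
        return id;
    }

    public void RegisterFailureHandler(Action handler, FuseId id)
    {
        // If the group no longer exists (already signalled), invoke the
        // handler immediately, as Section 3.1 specifies.
        if (failed.Contains(id) || !live.ContainsKey(id)) { handler(); return; }
        live[id].Add(handler);
    }

    public void SignalFailure(FuseId id)
    {
        if (!live.TryGetValue(id, out var handlers)) return;   // at most one notification per group
        live.Remove(id);
        failed.Add(id);
        foreach (var h in handlers) h();                        // notify, then tear down group state
    }
}

class Demo
{
    static void Main()
    {
        var fuse = new LocalFuseLayer();
        var id = fuse.CreateGroup();

        fuse.RegisterFailureHandler(() => Console.WriteLine("member 1: garbage collect"), id);
        fuse.RegisterFailureHandler(() => Console.WriteLine("member 2: garbage collect"), id);

        fuse.SignalFailure(id);        // explicit application-triggered notification
        fuse.SignalFailure(id);        // second signal is a no-op: one-way agreement

        // Late registration on an already-failed group fires right away.
        fuse.RegisterFailureHandler(() => Console.WriteLine("late member: garbage collect"), id);
    }
}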
3.2 Group Creation
Group creation can be implemented in one of two
ways: it can return immediately, or it can block until
all nodes in the group have been contacted. Returning
immediately reduces latency, but because FUSE has not
checked that all group members are alive, the application
may perform expensive operations, only to have FUSE
signal failure a short time later. In contrast, blocking until
all members have been contacted increases creation la-
tency, but decreases the likelihood that the FUSE group
will immediately fail. We chose to implement blocking
create; this provides application developers the seman-
tic guarantee that if group creation returns successfully,
all the group members were alive and reachable. A high
enough rate of churn amongst group members could re-
peatedly prevent FUSE group creation from succeeding.
However, based on the low latency of FUSE group cre-
ation (Section 7), such a high churn rate is likely to cause
the system to fail in other ways before FUSE becomes a
bottleneck.
When CreateGroup is called, FUSE generates a glob-
ally unique FUSE ID for the group. Each node is then
contacted and asked to join the new FUSE group. If
any group member is unreachable, all nodes that already
learned of the new FUSE group are then notified of the
failure. Group members that learned of the group but sub-
sequently become unreachable similarly detect the group
failure through their inability to communicate with other
group members. If the FUSE layer is successfully con-
tacted on all members, the FUSE ID is returned to the
CreateGroup caller.
FUSE state is never orphaned by failures, even when
those failures occur just after group creation. An applica-
tion may receive a FUSE ID from the group creator, and
then attempt to associate a failure handler with this FUSE
group, only to find out that the group no longer exists be-
cause a failure has already been signalled. This causes the
failure handler to be invoked, just as if the notification had
arrived after the failure handler was registered.
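The sketch below walks through this blocking creation flow against a simulated reachability map. The TryInstallGroup and NotifyFailure helpers are hypothetical stand-ins for the real RPCs, and reporting a failed creation as a null result is an assumption; the point is the teardown path, which notifies every member that already learned of the group so that no state is orphaned.

// Sketch of the blocking CreateGroup flow from Section 3.2, using a simulated
// reachability map instead of real RPCs; helper names are assumptions.
using System;
using System.Collections.Generic;

class CreateGroupSketch
{
    // Simulated network: which nodes the creator can currently contact.
    static readonly Dictionary<string, bool> Reachable = new Dictionary<string, bool>
    {
        { "B", true }, { "C", false }, { "D", true },
    };

    static bool TryInstallGroup(string node, Guid id)
    {
        Console.WriteLine($"install {id:N} on {node}: {(Reachable[node] ? "ok" : "unreachable")}");
        return Reachable[node];
    }

    static void NotifyFailure(string node, Guid id) =>
        Console.WriteLine($"notify {node} that group {id:N} failed");

    // Returns the FUSE ID only if every member installed the group; otherwise
    // tears down the partially created group and reports failure to the caller.
    static Guid? CreateGroup(IList<string> members)
    {
        var id = Guid.NewGuid();
        var installed = new List<string>();
        foreach (var m in members)
        {
            if (!TryInstallGroup(m, id))
            {
                foreach (var done in installed) NotifyFailure(done, id);
                return null;               // creation failed; no orphaned state remains
            }
            installed.Add(m);
        }
        return id;
    }

    static void Main()
    {
        var result = CreateGroup(new[] { "B", "C", "D" });
        Console.WriteLine(result.HasValue ? "group created" : "creation failed");
    }
}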
3.3 Liveness Monitoring and Failure Notification
Once setup is complete, our FUSE implementation
monitors the liveness of the group members using a span-
ning tree whose individual branches follow the overlay
routes between the group creator and the group members.
Each link in the tree is monitored from both sides; if either
side decides a link has failed, it ceases to acknowledge
pings for the given FUSE group along all its links. When
this occurs, one could immediately signal group failure,
but our implementation instead attempts repair, as will be
explained in more detail in Section 6.
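A minimal simulation of this tree-based propagation is sketched below: each tree edge is watched from both ends, and a node that sees any of its edges fail stops acknowledging pings for the group on all of its edges, so silence spreads through the tree. The tree shape, the round-based timing, and the absence of a repair step are simplifying assumptions, not the paper's implementation.

// Sketch of spanning-tree liveness checking, simulated in one process;
// topology and timing are illustrative assumptions.
using System;
using System.Collections.Generic;
using System.Linq;

class TreeNode
{
    public string Name;
    public bool Crashed;
    public bool SawFailure;                          // observed locally or propagated
    public List<TreeNode> Neighbors = new List<TreeNode>();

    // A node stops acknowledging group pings once it has seen any failure.
    public bool AcksPings() => !Crashed && !SawFailure;
}

class SpanningTreeSketch
{
    static void Main()
    {
        // Creator A with tree branches A-B-D and A-C following overlay routes.
        var a = new TreeNode { Name = "A" };
        var b = new TreeNode { Name = "B" };
        var c = new TreeNode { Name = "C" };
        var d = new TreeNode { Name = "D" };
        Link(a, b); Link(b, d); Link(a, c);

        d.Crashed = true;                            // a leaf member fails

        var nodes = new[] { a, b, c, d };
        // Each round, every node pings its tree neighbors; a missing ack is
        // treated here as an immediate failure (no repair attempt modeled).
        for (int round = 0; round < nodes.Length; round++)
            foreach (var n in nodes.Where(x => !x.Crashed))
                if (n.Neighbors.Any(peer => !peer.AcksPings()))
                    n.SawFailure = true;

        foreach (var n in nodes.Where(x => !x.Crashed))
            Console.WriteLine($"{n.Name} heard the notification: {n.SawFailure}");
    }

    static void Link(TreeNode x, TreeNode y) { x.Neighbors.Add(y); y.Neighbors.Add(x); }
}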
This mechanism allows FUSE to guarantee that any
member of the group can cause a failure notification to
be received by every other live group member. FUSE in-
vokes the failure notification handler exactly once on a
node before tearing down the state for that FUSE group.
A node hearing the notification does not know whether it
was due to crash, network disconnect or partition, or if the
notification was explicitly triggered by some group mem-
ber. Explicit triggering is a necessary component of FUSE
because FUSE does not guarantee that it will notice all
persistent communication problems between group mem-
bers automatically; it only guarantees that a communica-
tion failure noticed by any group member will soon be
detected by all group members.
FUSE only guarantees delivery of failure notifications,
and only to nodes that have already been contacted during
group creation. Note that FUSE clients cannot use this
mechanism to implement general-purpose reliable com-
munication. Therefore, well-known impossibility results
for consensus do not apply to FUSE [22, 24]. A concrete
example illustrating this limitation is a network partition.
FUSE members on both sides of the partition will receive
failure notifications, but it is not possible to communicate
additional information, such as the cause of the failure,
across the partition.
FUSE will sometimes generate a notification to the en-
tire group even though all nodes are alive and the next at-
tempted communication would succeed. We refer to such
a notification as a false positive. It is easy to see how
false positives can occur: transient communication fail-
ures can trigger group notification. The possibility of false
positives is inherent in building distributed systems on top
of unreliable infrastructure. One can tune the timeout and
retry policy used by the liveness checking mechanism, but
there is a fundamental tradeoff between the latency of fail-
ure detection and the probability that timeouts generate
false positives. We do not provide an API mechanism for
applications to modify the FUSE timeout and retry pol-
icy. Providing such a mechanism would add complexity
to our implementation, while providing little benefit: ap-
plications already need to implement their own timeouts,
as dictated by their choice of transport layer. FUSE re-
quires this participation from applications because FUSE
does not necessarily monitor every link.
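The following back-of-the-envelope sketch (not from the paper) illustrates the tradeoff just mentioned under the simplifying assumption of independent per-ping loss with rate p: allowing k attempts before declaring failure drives the false-positive probability toward p^k while growing the worst-case detection latency to roughly k timeouts. The loss rate and timeout values are arbitrary assumptions.

// Illustration of the latency vs. false-positive tradeoff under an assumed
// independent-loss model; the numbers are not from the paper.
using System;

class RetryTradeoff
{
    static void Main()
    {
        double lossRate = 0.05;          // assumed per-ping loss probability
        double timeoutMs = 500;          // assumed per-attempt timeout

        for (int retries = 1; retries <= 4; retries++)
        {
            double falsePositive = Math.Pow(lossRate, retries);   // ~p^k under independence
            double worstLatency = retries * timeoutMs;            // ~k timeouts before declaring failure
            Console.WriteLine(
                $"retries={retries}  P(false positive)~{falsePositive:E2}  detection latency<={worstLatency} ms");
        }
    }
}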
Because we implement liveness checking using over-
lay routes, the maximum notification latency is propor-
tional to the diameter of the overlay: successive failures
at adversarially chosen times could cause each link to fail
exactly one failure timeout after the previous link. How-
ever, we expect notification after a failure to rarely require
more than a single failure timeout interval because our
FUSE implementation always attempts to aggressively
propagate failure notifications. Also, application develop-
ers do not need to know the maximum latency in order to
specify their timeouts. As mentioned above, sends should
be monitored using whatever timeout is appropriate to the
transport layer used by the application. If the sender times
out, it can signal the FUSE layer explicitly. If a node is
waiting to receive a message, specifying a timeout is diffi-
cult because only the sender knows when the transmission
is initiated. In this case, if the sender has crashed, devel-
opers should rely on the FUSE layer timeout to guarantee
the failure handler will be called.
FUSE failure notifications do not necessarily eliminate
all the race conditions that an application developer must
handle. For example, one group member might signal a
failure notification, and then initiate failure recovery by
sending a message to another group member. This failure
recovery message might arrive at the other group member
before it receives the FUSE failure notification. In our
experience, version-stamping the data associated with a
FUSE group was a simple and effective means of handling
these races.
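The sketch below shows one way such a race can be absorbed by version-stamping, in the spirit of the approach described above; the FateSharedState type and the specific check are illustrative assumptions, not the paper's code. Because the failure handler only discards state that still carries the version it protects, a recovery message that overtakes the FUSE notification does no harm.

// Sketch of version-stamping for the recovery-message race; names are assumptions.
using System;

class FateSharedState
{
    public int Version;       // bumped each time the state is re-established
    public string Data;
}

class RaceSketch
{
    static void Main()
    {
        var state = new FateSharedState { Version = 1, Data = "built under group 1" };

        // The failure handler registered for group 1 captures the version it protects.
        int protectedVersion = state.Version;
        Action onGroup1Failure = () =>
        {
            if (state.Version == protectedVersion)      // only GC state that is still version 1
                state.Data = null;
        };

        // Race: the peer's recovery message (carrying version 2) arrives first...
        state.Version = 2;
        state.Data = "rebuilt under group 2";

        // ...and the late FUSE notification for group 1 is now a harmless no-op.
        onGroup1Failure();
        Console.WriteLine(state.Data);   // prints: rebuilt under group 2
    }
}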
3.4 Fail-on-Send
FUSE does not guarantee that all communication fail-
ures between group members will be proactively detected.
For example, in wireless networks sometimes link condi-
tions will allow only small messages, such as liveness
ping messages, to get through while larger messages
cannot. In this case, the application will detect that the
communication path is not working, and explicitly signal
FUSE. We call this reason for explicitly signalling FUSE
fail-on-send.
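A hypothetical fail-on-send wrapper is sketched below: the application attempts its own send with its own timeout and, on failure, converts that local observation into a group-wide notification by calling SignalFailure. The SendOrSignal helper, the delegate types, and the timeout handling are assumptions; only SignalFailure comes from the Figure 1 API.

// Hypothetical fail-on-send wrapper; only SignalFailure is from Figure 1.
using System;
using System.Threading.Tasks;

class FailOnSend
{
    // Returns true on success; on any failure it signals the FUSE group so
    // every member garbage-collects the shared state.
    static async Task<bool> SendOrSignal(
        Func<Task> sendAsync, Action<Guid> signalFailure, Guid fuseId, TimeSpan timeout)
    {
        try
        {
            var send = sendAsync();
            var finished = await Task.WhenAny(send, Task.Delay(timeout));
            if (finished != send) throw new TimeoutException();
            await send;                      // surface any send exception
            return true;
        }
        catch (Exception)
        {
            signalFailure(fuseId);           // fail-on-send: application declares the failure
            return false;
        }
    }

    static async Task Main()
    {
        var groupId = Guid.NewGuid();
        bool ok = await SendOrSignal(
            sendAsync: () => Task.Delay(50),                 // stand-in for a large message send
            signalFailure: id => Console.WriteLine($"SignalFailure({id:N})"),
            fuseId: groupId,
            timeout: TimeSpan.FromMilliseconds(10));         // deliberately too short: triggers signalling
        Console.WriteLine($"send succeeded: {ok}");
    }
}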
There are two categories of failures that require fail-on-
send. The first is a communication path that successfully
transmits FUSE liveness checking messages but which
does not meet the needs of the application. The second
is a failed communication path the application is using,
but which