====================
Garlic Farm Protocol
====================

:Author: zzz
:Created: 2019-05-02
:Thread: http://zzz.i2p/topics/2234
:Last updated: 2019-05-20
:Status: Open

Overview
========

This is the spec for the Garlic Farm wire protocol,
based on JRaft, its "exts" code for implementation over TCP,
and its "dmprinter" sample application [JRAFT].
JRaft is an implementation of the Raft protocol [RAFT].

We were unable to find any implementation with a documented wire protocol.
However, the JRaft implementation is simple enough that we could
inspect the code and then document its protocol.
This proposal is the result of that effort.

This will be the backend for coordination of routers publishing
entries in a Meta LeaseSet. See proposal 123.

Goals
=====

- Small code size
- Based on existing implementation
- No serialized Java objects or any Java-specific features or encoding
- Any bootstrapping is out-of-scope. At least one other server is assumed
  to be hardcoded, or configured out-of-band of this protocol.
- Support both out-of-band and in-I2P use cases.

Design
======

The Raft protocol is not a concrete protocol; it defines only a state machine.
Therefore we document the concrete protocol of JRaft and base our protocol on it.
There are no changes to the JRaft protocol other than the addition of
an authentication handshake.

Raft elects a Leader whose job is to publish a log.
The log contains Raft Configuration data and Application data.
Application data contains the status of each Server's Router and the Destination
for the Meta LS2 cluster.
The servers use a common algorithm to determine the publisher and contents
of the Meta LS2.
The publisher of the Meta LS2 is NOT necessarily the Raft Leader.

Specification
=============

The wire protocol is over SSL sockets or non-SSL I2P sockets.
I2P sockets are proxied through the HTTP Proxy.
There is no support for clearnet non-SSL sockets.

Handshake and authentication
----------------------------

Not defined by JRaft.

Goals:

- User/password authentication method
- Version identifier
- Cluster identifier
- Extensible
- Ease of proxying when used for I2P sockets
- Do not unnecessarily expose server as a Garlic Farm server
- Simple protocol so a full web server implementation is not required
- Compatible with common standards, so implementations may use
  standard libraries if desired

We will use a WebSocket-like handshake [WEBSOCKET] and HTTP Digest authentication [RFC-2617].
RFC 2617 Basic authentication is NOT supported.
When proxying through the HTTP proxy, communicate with
the proxy as specified in [RFC-2616].

Credentials
```````````

Whether usernames and passwords are per-cluster, or
per-server, is implementation-dependent.

HTTP Request 1
``````````````

The originator will send the following.
All lines are terminated with CRLF as required by HTTP.

::

  GET /GarlicFarm/CLUSTER/VERSION/websocket HTTP/1.1
  Host: (ip):(port)
  Cache-Control: no-cache
  Connection: close
  (any other headers ignored)
  (blank line)

  CLUSTER is the name of the cluster (default "farm")
  VERSION is the Garlic Farm version (currently "1")

HTTP Response 1
```````````````

If the path is not correct, the recipient will send a standard "HTTP/1.1 404 Not Found" response,
as in [RFC-2616].

If the path is correct, the recipient will send a standard "HTTP/1.1 401 Unauthorized" response,
including the WWW-Authenticate HTTP digest authentication header,
as in [RFC-2617].

Both parties will then close the socket.

HTTP Request 2
``````````````

The originator will send the following,
as in [RFC-2617] and [WEBSOCKET].
All lines are terminated with CRLF as required by HTTP.
::

  GET /GarlicFarm/CLUSTER/VERSION/websocket HTTP/1.1
  Host: (ip):(port)
  Cache-Control: no-cache
  Connection: keep-alive, Upgrade
  Upgrade: websocket
  (Sec-Websocket-* headers if proxied)
  Authorization: (HTTP digest authorization header as in RFC 2617)
  (any other headers ignored)
  (blank line)

  CLUSTER is the name of the cluster (default "farm")
  VERSION is the Garlic Farm version (currently "1")

HTTP Response 2
```````````````

If the authentication is not correct, the recipient will send another standard
"HTTP/1.1 401 Unauthorized" response, as in [RFC-2617].

If the authentication is correct, the recipient will send the following response,
as in [WEBSOCKET].
All lines are terminated with CRLF as required by HTTP.

::

  HTTP/1.1 101 Switching Protocols
  Connection: Upgrade
  Upgrade: websocket
  (Sec-Websocket-* headers)
  (any other headers ignored)
  (blank line)

After this is received, the socket remains open.
The Raft protocol as defined below commences, on the same socket.

Caching
```````

Credentials shall be cached for at least one hour, so that
subsequent connections may jump directly to
"HTTP Request 2" above.

Message Types
-------------

There are two types of messages, requests and responses.
Requests may contain Log Entries, and are variable-sized;
responses do not contain Log Entries, and are fixed-size.

Message types 1-4 are the standard RPC messages defined by Raft.
This is the core Raft protocol.

Message types 5-15 are the extended RPC messages defined by
JRaft, to support clients, dynamic server changes, and
efficient log synchronization.

Message types 16-17 are the Log Compaction RPC messages defined
in Raft section 7.

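The numbering ranges above can be captured in a small lookup, sketched here in Python. The member names and numbers are taken from the message table in this section. The helper names ``is_response`` and ``group``, and the group labels they return, are hypothetical and not part of JRaft or this proposal. A receiver could use ``is_response`` to decide whether to read a fixed-size response or a request header followed by Log Entries.

```python
from enum import IntEnum


class MessageType(IntEnum):
    """Message numbers as listed in the message table of this section."""
    RequestVoteRequest = 1
    RequestVoteResponse = 2
    AppendEntriesRequest = 3
    AppendEntriesResponse = 4
    ClientRequest = 5
    AddServerRequest = 6
    AddServerResponse = 7
    RemoveServerRequest = 8
    RemoveServerResponse = 9
    SyncLogRequest = 10
    SyncLogResponse = 11
    JoinClusterRequest = 12
    JoinClusterResponse = 13
    LeaveClusterRequest = 14
    LeaveClusterResponse = 15
    InstallSnapshotRequest = 16
    InstallSnapshotResponse = 17


def is_response(msg_type: int) -> bool:
    """True for fixed-size responses, False for variable-size requests.

    Parity is not a reliable test (5 and 6 are both requests), so the
    decision is made from the message name instead.
    """
    return MessageType(msg_type).name.endswith("Response")


def group(msg_type: int) -> str:
    """Classify a message number; the labels are illustrative only."""
    value = int(MessageType(msg_type))  # raises ValueError if unknown
    if value <= 4:
        return "raft"            # core Raft RPCs
    if value <= 15:
        return "jraft-extended"  # JRaft extensions
    return "log-compaction"      # Raft section 7
```

Unknown message numbers raise ``ValueError``, which a caller would likely treat as a protocol error and close the socket; that handling is an assumption, as the proposal does not specify it.
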

========================  ======  ===========  =================  =====================================
Message                   Number  Sent By      Sent To            Notes
========================  ======  ===========  =================  =====================================
RequestVoteRequest        1       Candidate    Follower           Standard Raft RPC; must not contain log entries
RequestVoteResponse       2       Follower     Candidate          Standard Raft RPC
AppendEntriesRequest      3       Leader       Follower           Standard Raft RPC
AppendEntriesResponse     4       Follower     Leader / Client    Standard Raft RPC
ClientRequest             5       Client       Leader / Follower  Response is AppendEntriesResponse; must contain Application log entries only
AddServerRequest          6       Client       Leader             Must contain a single ClusterServer log entry only
AddServerResponse         7       Leader       Client             Leader will also send a JoinClusterRequest
RemoveServerRequest       8       Follower     Leader             Must contain a single ClusterServer log entry only
RemoveServerResponse      9       Leader       Follower
SyncLogRequest            10      Leader       Follower           Must contain a single LogPack log entry only
SyncLogResponse           11      Follower     Leader
JoinClusterRequest        12      Leader       New Server         Invitation to join; must contain a single Configuration log entry only
JoinClusterResponse       13      New Server   Leader
LeaveClusterRequest       14      Leader       Follower           Command to leave
LeaveClusterResponse      15      Follower     Leader
InstallSnapshotRequest    16      Leader       Follower           Raft Section 7; Must contain a single SnapshotSyncRequest log entry only
InstallSnapshotResponse   17      Follower     Leader             Raft Section 7
========================  ======  ===========  =================  =====================================

Establishment
-------------

After the HTTP handshake, the establishment sequence is as follows::

  New Server Alice              Random Follower Bob

    ClientRequest   ------->
            <---------   AppendEntriesResponse

If Bob says he is the leader, continue as below.
Else, Alice must disconnect from Bob and connect to the leader.
::

  New Server Alice              Leader Charlie

    ClientRequest   ------->
            <---------   AppendEntriesResponse
    AddServerRequest   ------->
            <---------   AddServerResponse
            <---------   JoinClusterRequest
    JoinClusterResponse  ------->
            <---------   SyncLogRequest
                         OR InstallSnapshotRequest
    SyncLogResponse  ------->
    OR InstallSnapshotResponse

Disconnect Sequence::

  Follower Alice              Leader Charlie

    RemoveServerRequest   ------->
            <---------   RemoveServerResponse
            <---------   LeaveClusterRequest
    LeaveClusterResponse  ------->

Election Sequence::

  Candidate Alice               Follower Bob

    RequestVoteRequest   ------->
            <---------   RequestVoteResponse

  if Alice wins election:

  Leader Alice                  Follower Bob

    AppendEntriesRequest   ------->
    (heartbeat)
            <---------   AppendEntriesResponse

Definitions
-----------

- Source: Identifies the originator of the message
- Destination: Identifies the recipient of the message
- Terms: See Raft. Initialized to 0, increases monotonically
- Indexes: See Raft. Initialized to 0, increases monotonically

Requests
--------

Requests contain a fixed-size header followed by zero or more
Log Entries of variable size.

Request Header
``````````````

The request header is 45 bytes, as follows.
All values are unsigned big-endian.

::

  Message type:      1 byte
  Source:            ID, 4 byte integer
  Destination:       ID, 4 byte integer
  Term:              Current term (see notes), 8 byte integer
  Last Log Term:     8 byte integer
  Last Log Index:    8 byte integer
  Commit Index:      8 byte integer
  Log entries size:  Total size in bytes, 4 byte integer
  Log entries:       see below, total length as specified

Notes
~~~~~

In the RequestVoteRequest, Term is the candidate's term.
Otherwise, it is the leader's current term.

In the AppendEntriesRequest, when the log entries size is zero,
this message is a heartbeat (keepalive) message.

Log Entries
```````````

The log contains zero or more log entries.
Each log entry is as follows.
All values are unsigned big-endian.
::

  Term:        8 byte integer
  Value type:  1 byte
  Entry size:  In bytes, 4 byte integer
  Entry:       length as specified

Log Contents
````````````

All values are unsigned big-endian.

========================  ======
Log Value Type            Number
========================  ======
Application               1
Configuration             2
ClusterServer             3
LogPack                   4
SnapshotSyncRequest       5
========================  ======

Application
~~~~~~~~~~~

Application contents are UTF-8 encoded [JSON].
See the Application Layer section below.

Configuration
~~~~~~~~~~~~~

This is used for the leader to serialize a new cluster configuration and replicate to peers.
It contains zero or more ClusterServer configurations.

::

  Log Index:       8 byte integer
  Last Log Index:  8 byte integer
  ClusterServer Data for each server:
    ID:                 4 byte integer
    Endpoint data len:  In bytes, 4 byte integer
    Endpoint data:      ASCII string of the form "tcp://localhost:9001", length as specified

ClusterServer
~~~~~~~~~~~~~

The configuration information for a server in a cluster.
This is included only in an AddServerRequest or RemoveServerRequest message.

When used in an AddServerRequest Message::

  ID:                 4 byte integer
  Endpoint data len:  In bytes, 4 byte integer
  Endpoint data:      ASCII string of the form "tcp://localhost:9001", length as specified

When used in a RemoveServerRequest Message::

  ID:                 4 byte integer

LogPack
~~~~~~~

This is included only in a SyncLogRequest message.

The following is gzipped before transmission::

  Index data len:  In bytes, 4 byte integer
  Log data len:    In bytes, 4 byte integer
  Index data:      8 bytes for each index, length as specified
  Log data:        length as specified

SnapshotSyncRequest
~~~~~~~~~~~~~~~~~~~

This is included only in an InstallSnapshotRequest message.
::

  Last Log Index:   8 byte integer
  Last Log Term:    8 byte integer
  Config data len:  In bytes, 4 byte integer
  Config data:      length as specified
  Offset:           The offset of the data in the database, in bytes, 8 byte integer
  Data len:         In bytes, 4 byte integer
  Data:             length as specified
  Is Done:          1 if done, 0 if not done (1 byte)

Responses
---------

All responses are 26 bytes, as follows.
All values are unsigned big-endian.

::

  Message type:  1 byte
  Source:        ID, 4 byte integer
  Destination:   Usually the actual destination ID (see notes), 4 byte integer
  Term:          Current term, 8 byte integer
  Next Index:    Initialized to leader last log index + 1, 8 byte integer
  Is Accepted:   1 if accepted, 0 if not accepted (see notes), 1 byte

Notes
`````

The Destination ID is usually the actual destination for this message.
However, for AppendEntriesResponse, AddServerResponse, and RemoveServerResponse,
it is the ID of the current leader.

In the RequestVoteResponse, Is Accepted is 1 for a vote for the candidate (requestor),
and 0 for no vote.

Application Layer
=================

Each Server periodically posts Application data to the log in a ClientRequest.
Application data contains the status of each Server's Router and the Destination
for the Meta LS2 cluster.
The servers use a common algorithm to determine the publisher and contents
of the Meta LS2.
The server with the "best" recent status in the log is the Meta LS2 publisher.
The publisher of the Meta LS2 is NOT necessarily the Raft Leader.

Application Data Contents
-------------------------

Application contents are UTF-8 encoded [JSON],
for simplicity and extensibility.
The full specification is TBD.
The goal is to provide enough data to write an algorithm to determine the "best"
router to publish the Meta LS2, and for the publisher to have sufficient information
to weight the Destinations in the Meta LS2.
The data will contain both router and Destination statistics.

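Since the full specification is TBD, the following Python sketch is only an illustration of what a UTF-8 encoded JSON Application entry might look like. It uses only the map keys that this section gives explicitly in "name: value" form. Nesting the ``meta``, ``router``, and ``destinations`` groups as keys of the top-level map is an assumption, as are the function name ``build_application_data``, its parameters, and the placeholder base64 strings.

```python
import json
import time


def build_application_data(cluster, raft_id, meta_destination, publishing,
                           router_uptime_ms, destinations, now_ms=None):
    """Return a UTF-8 encoded JSON document for an Application log entry.

    Hypothetical helper: the layout below is a guess at a format that
    the proposal leaves TBD, not a definition of it.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # date: ms since the epoch
    data = {
        "cluster": cluster,   # cluster name
        "date": now_ms,
        "id": raft_id,        # Raft ID of the posting server
        "meta": {
            "destination": meta_destination,  # base64 (placeholder here)
            "lastPublishedTime": 0,           # 0 means never published
            "publishConfig": "auto",          # off/on/auto
            "publishing": publishing,         # boolean
        },
        "router": {"uptime": router_uptime_ms},
        "destinations": [
            {"destination": dest, "uptime": uptime_ms}
            for dest, uptime_ms in destinations
        ],
    }
    return json.dumps(data, separators=(",", ":")).encode("utf-8")


# Example with placeholder values; "AAAA" and "BBBB" are not real Destinations.
entry = build_application_data("farm", 3, "AAAA", False, 60000,
                               [("BBBB", 1000)], now_ms=1558310400000)
```

The resulting bytes would be carried as the Entry of a log entry with value type 1 (Application) inside a ClientRequest.
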

The data may optionally contain remote sensing data on the health of the
other servers, and the ability to fetch the Meta LS.
These data would not be supported in the first release.

The data may optionally contain configuration information posted
by an administrator client.
These data would not be supported in the first release.

If "name: value" is listed, that specifies the JSON map key and value.
Otherwise, specification is TBD.

Cluster data (top level):

- cluster: Cluster name
- date: Date of this data (long, ms since the epoch)
- id: Raft ID (integer)

Configuration data (config):

- Any configuration parameters

MetaLS publishing status (meta):

- destination: the MetaLS destination, base64
- lastPublishedLS: if present, base64 encoding of the last published MetaLS
- lastPublishedTime: in ms, or 0 if never
- publishConfig: Publisher config status off/on/auto
- publishing: MetaLS publisher status boolean true/false

Router data (router):

- lastPublishedRI: if present, base64 encoding of the last published router info
- uptime: Uptime in ms
- Job lag
- Exploratory tunnels
- Participating tunnels
- Configured bandwidth
- Current bandwidth

Destinations (destinations):
List

Destination data:

- destination: the destination, base64
- uptime: Uptime in ms
- Configured tunnels
- Current tunnels
- Configured bandwidth
- Current bandwidth
- Configured connections
- Current connections
- Blacklist data

Remote router sensing data:

- Last RI version seen
- LS Fetch time
- Connection test data
- Closest floodfills profile data
  for time periods yesterday, today, and tomorrow

Remote destination sensing data:

- Last LS version seen
- LS Fetch time
- Connection test data
- Closest floodfills profile data
  for time periods yesterday, today, and tomorrow

Meta LS sensing data:

- Last version seen
- Fetch time
- Closest floodfills profile data
  for time periods yesterday, today, and tomorrow

Administration Interface
========================

TBD, possibly a separate proposal.
Not required for the first release.

Requirements of an admin interface:

- Support for multiple master destinations, i.e. multiple virtual clusters (farms)
- Provide comprehensive view of shared cluster state - all stats published by members, who is the current leader, etc.
- Ability to force removal of a participant or leader from the cluster
- Ability to force publish metaLS (if current node is publisher)
- Ability to exclude hashes from metaLS (if current node is publisher)
- Configuration import/export functionality for bulk deployments

Router Interface
================

TBD, possibly a separate proposal.
i2pcontrol is not required for the first release and detailed changes will be included in a separate proposal.

Requirements for the Garlic Farm to router API (in-JVM Java or i2pcontrol):

- getLocalRouterStatus()
- getLocalLeafHash(Hash masterHash)
- getLocalLeafStatus(Hash leaf)
- getRemoteMeasuredStatus(Hash masterOrLeaf) // probably not in MVP
- publishMetaLS(Hash masterHash, List contents) // or signed MetaLeaseSet? Who signs?
- stopPublishingMetaLS(Hash masterHash)
- authentication TBD?

Justification
=============

Atomix is too large and won't allow customization for us to route
the protocol over I2P. Also, its wire format is undocumented, and depends
on Java serialization.

Notes
=====

Issues
======

- There's no way for a client to find out about and connect to an unknown leader.
  It would be a minor change for a Follower to send the Configuration as a Log Entry in the AppendEntriesResponse.

Migration
=========

No backward compatibility issues.

References
==========

.. [JRAFT]
    https://github.com/datatechnology/jraft

.. [JSON]
    https://json.org/

.. [RAFT]
    https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf

.. [RFC-2616]
    https://tools.ietf.org/html/rfc2616

.. [RFC-2617]
    https://tools.ietf.org/html/rfc2617

.. [WEBSOCKET]
    https://en.wikipedia.org/wiki/WebSocket