Network Working Group                                       T. Klingberg
Request for Comments: NNNN                                   R. Manfredi
Category: Informational                                        June 2002



                        Gnutella 0.6


Status of this Memo

   This is a draft.


Copyright Notice

   Copyright (C) 2002, Tor Klingberg & Raphael Manfredi
   All Rights Reserved.

   Permission is granted to make verbatim copies of this document, 
   provided the Copyright Notice is preserved.


Rights of Free Implementation

   The authors of the various proposals that make up this document
   grant the rights to anyone to freely implement those proposals.
   Gnutella is an Open Protocol, where the specifications are
   public and free of any patent.


Table of Contents

   1   Introduction
   1.1   Purpose
   1.2   Requirements
   1.3   Terminology
   1.4   Extending the protocol
   2   Protocol Definition
   2.1   Initiating a Connection
   2.2   Gnutella Messages
   2.2.1   Message Header
   2.2.2   Ping (0x00)
   2.2.3   Pong (0x01)
   2.2.4   Use of Ping and Pong messages
   2.2.4.1   A simple pong caching scheme
   2.2.4.2   Other pong caching schemes
   2.2.5   Query (0x80)
   2.2.6   Query Hit
   2.2.7   Use of Query and Query Hit
   2.2.7.1   Forwarding and routing of Query and Query Hit messages
   2.2.7.2   When and how to send new Query messages.
   2.2.7.3   When and how to respond with Query Hit messages.
   2.2.8   Push (0x40)
   2.2.9   Bye (0x02)
   2.3   GGEP Extension blocks
   2.3.1   GGEP Format
   2.3.2   Creating Extension IDs
   3   Protocol Usage
   3.1   Flow Control
   3.2   Network Structure
   3.2.1   Ultrapeer system
   3.2.2   Query Routing Protocol (unfinished)
   4   File Transfer
   4.1   Normal File Transfer
   4.2   Firewalled servents
   4.3   Busy Servents
   4.4   Sharing
   5   Security Considerations
   5.1   Threats against individual Gnutella participants
   5.2   Threats against the Gnutella network
   5.3   Threats against third parties
   6   Credits
   Appendix 1   HUGE (Hash/URN Gnutella Extensions)
   Appendix 2   XML
   Appendix 3   Finding a Gnutella host
   Appendix 4   When to open or accept new Gnutella connections
   Appendix 5   Gnutella network traffic compression


1 Introduction

1.1 Purpose     

Gnutella is a decentralized peer-to-peer system. It allows the
participants to share resources from their system for others to
see and get, and locate resources shared by others on the network.

Resources can be anything: mappings to other resources, cryptographic
keys, files of any type, meta-information on keyable resources, etc.
However, the semantics for locating and handling resources other than
plain files are not specified in this document.

Each participant launches a Gnutella program, which will seek out 
other Gnutella nodes to connect to.  This set of connected nodes 
carries the Gnutella traffic, which is essentially made of queries, 
replies to those queries, and also other control messages to 
facilitate the discovery of other nodes.

Users interact with the nodes by supplying them with the list of
resources they wish to share on the network, can enter searches for
other's resources, will hopefully get results from those searches,
and can then select those resources amongst the results: if those
resources are files, for instance, they can download them.  But one
can imagine other types of resources that, once fetched, will bring
more than their content value.

Resource data exchanges between nodes are negotiated using the 
standard HTTP protocol.  The Gnutella network is only used to locate 
the nodes sharing those resources.

This document is intended for readers with a fair knowledge of 
network programming, but do not require any previous Gnutella 
experience.  Still, other implementations of this protocol will give
useful information about implementation techniques that is not 
included in this document.  A list of Gnutella programs can be found 
at http://www.gnutelliums.com


1.2 Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [34].


1.2.1 The Gnutella Development Forum (the GDF)

The Gnutella Development Forum is a good place to find more Gnutella 
documentation, proposals about changes and extensions and to discuss 
Gnutella development with other developers. The message archive is 
also a good source for information about the protocol and its 
implementation. Some of the links in this document requires 
membership in the Gnutella Development Forum. Everyone is, of course,
allowed to become a member. The GDF is located at 
http://groups.yahoo.com/group/the_gdf

There are many other forums for discussing Gnutella development as 
well.


1.3 Terminology

Servent         A program participating in the Gnutella network is 
                called a servent. The words "peer", "node" and "host"
                have similar meanings, but refers to a network 
                participant rather than a program. When a servent 
                have a clear client or server role the words "client"
                or "server" may be used. The word "client" is 
                sometimes used as a synonym for servent. This is a 
                contraction of "SERVer" and "cliENT", Some other 
                documents use the word "servant" instead of servent.

Message         Messages are the entity in which information is
                transmitted over the network. Sometimes the word 
                "packet" is used with the same meaning. Some other 
                documents use the word "descriptor"

GUID            Globally Unique IDentifier.  This is a 16-byte long
                value made of random bytes, whose purpose it is to
                identify servents and messages.  This identification
                is not a signature, just a way to identify network
                entities in a unique manner.

1.4 Extending the protocol

This document is the definition of the Gnutella 0.6 protocol. 
Servents MAY extend the protocol or even change parts of it (for 
example by compressing or encrypting the messages), but servents 
MUST always stay compatible with servents that follow this 
specification. 

If a servent, for example, wants to compress the Gnutella messages, 
it MUST first make sure the remote host of a connection can 
decompress the stream (during handshake), and otherwise leave the 
messages uncompressed. Servents MAY chose not to accept a connection 
with a servent that does not support a feature, but MUST always make 
sure that the Gnutella network is not split into separate networks. 

Separate networks for special purposes are, of course, allowed but 
then it is no longer the Gnutella network, but another network. 

This protocol also allows for extensions inside many messages. Such 
extensions can pass through servents that do not know about the 
extension to reach servents that do.


2 Protocol Definition

The Gnutella protocol defines the way in which servents communicate 
over the network. It consists of a set of messages used for 
communicating data between servents and a set of rules governing 
the inter-servent exchange of messages. Currently, the following 
messages are defined:

Ping            Used to actively discover hosts on the network. A 
                servent receiving a Ping message is expected to 
                respond with one or more Pong messages.

Pong            The response to a Ping. Includes the address of a 
                connected Gnutella servent, the listening port of
                that servent, and information regarding the amount
                of data it is making available to the network.

Query           The primary mechanism for searching the distributed 
                network. A servent receiving a Query message will
                respond with a Query Hit if a match is found against 
                its local data set.

QueryHit        The response to a Query. This message provides the
                recipient with enough information to acquire the data
                matching the corresponding Query.

Push            A mechanism that allows a firewalled servent to 
                contribute file-based data to the network.

Bye             An optional message used to inform the remote host 
                that you are closing the connection, and your reason 
                for doing so.


2.1 Initiating a Connection

A Gnutella servent connects itself to the network by establishing a 
connection with another servent currently on the network. 
Techniques for finding the first host are described in Appendix 3. 
Once the first connection is established, the addresses of more hosts
will be supplied over the network. The default Gnutella port is 6346,
but servents MAY use any unused port. If the desired port is used 
(probably by another Gnutella servent) the servent SHOULD attempt to 
listen on another port.  This listening port is advertised by the
servent through the Pong messages.

Techniques and rules for how to select what other Gnutella hosts to 
connect to and when to accept connection requests can be found in 
Appendix 4.

Once the address of another servent on the network is obtained, a 
TCP/IP connection to the servent is created, and a handshaking 
sequence is initiated. The client is the host initiating the 
connection and the server is the host receiving it. "" refers 
to ASCII character 13 (carriage return), and "" to 10 (new line).

   1. The client establishes a TCP connection with the server.
   2. The client sends "GNUTELLA CONNECT/0.6".
   3. The client sends all capability headers--except for
      vendor-specific headers--each terminated by "", with
      an extra "" at the end.
   4. The server responds with "GNUTELLA/0.6 200 ".
       SHOULD be "OK", but servents SHOULD just look for the
      "200" code.
   5. The server sends all its headers, in the same format as in (3).
   6. The client sends "GNUTELLA/0.6 200 OK, as in (4) if
      after parsing the server's headers, it still wishes to connect.
      Otherwise, it needs to reply with an error code and close the
      connection.
   7. The client sends any vendor-specific headers as needed, in the
      same format as (3).
   8. Both client and server send binary messages at will, using the
      information gained in (3) and (5).  

All headers SHOULD be registered with the GDF database at
http://groups.yahoo.com/group/the_gdf/database?method=reportRows&tbl=9
(Requires GDF membership)

Headers follow the standards described in RFC822 and RFC2616.  Each
header is made of a field name, followed by a colon, and then the 
value.  Each line ends with the  sequence, and the end of the
headers is marked by a single  line.  Each line normally 
starts a new header, unless it begins with a space or an horizontal 
tab (ASCII codes 32 and 9 in decimal, respectively), in which case it
continues the preceding header line.  The extra spaces and tabs may 
be collapsed into a single space as far as the header value goes.  
For instance:

    First-Field: this is the value of the first field
    Second-Field: this is the value
        of the
        second field
    

The header above is made of two fields, "First-Field" and "Second-
Field" whose values are respectively "this is the value of the first 
field" and "this is the value of the second field" (leading spaces of
the continuation were collapsed).  Note that the leading space 
between the ":" ending the field name and the start of the value 
string does not count.

Multiple header lines with the same field name are identical to one
header line where all the values of the fields would be separated by 
",". This means:

    Field: first
    Field: second

is strictly equivalent to saying:

    Field: first,second

In other words, order matters in that case.

Here is a sample interaction between a client and a server.  Data 
sent from client to server is shown on the left; data sent from 
server to client is shown on the right.

     Client                           Server
     -----------------------------------------------------------
     GNUTELLA CONNECT/0.6
     User-Agent: BearShare/1.0
     Pong-Caching: 0.1
     GGEP: 0.5
     
                                      GNUTELLA/0.6 200 OK
                                      User-Agent: BearShare/1.0
                                      Pong-Caching: 0.1
                                      GGEP: 0.5
                                      Private-Data: 5ef89a
                                      
     GNUTELLA/0.6 200 OK
     Private-Data: a04fce
     

     [binary messages]                [binary messages]

A few notes about the responses: first, the client (server) SHOULD
disconnect if receiving any response other than "200" at step 4
(6).  There is no need to define these error codes now.  Second,
servents SHOULD ignore higher version numbers in steps (2), (4), and
(6).  For example, it is perfectly legal for a future client to
connect to a server and send "GNUTELLA CONNECT/0.7".  The server
SHOULD respond with "GNUTELLA/0.7 200 OK" if it supports the 0.7
protocol, or "GNUTELLA/0.6 200 OK" otherwise.

A few notes about the headers: servents SHOULD use standard HTTP
headers whenever appropriate.  For example, servents SHOULD use the
standard "User-Agent" header rather than make up a "Servent-Vendor"
header.  However, it is perfectly legal to add new headers (e.g.,
"Query-Routing") when no appropriate HTTP header exists, as long as
they follow HTTP syntax. Headers unknown to the servent MUST be 
ignored.

Some older servents will initiate the handshake by sending 
"GNUTELLA CONNECT/0.4". The server SHOULD then reply with 
"GNUTELLA OK" followed by binary messages, if it can accept 
the connection. Servents MAY retry using the 0.4 connect string if
the 0.6 connection attempt were rejected. No handshaking headers can 
be used in 0.4 handshaking.

When rejecting a connection, a servent MUST, if possible, provide the
remote host with a list of other Gnutella hosts, so it can try 
connecting to them. This SHOULD be done using the X-Try header.

An X-Try header can look like:

        X-Try:1.2.3.4:1234,3.4.5.6:3456

There MAY be a space after the colon and after each comma. There MAY 
be multiple X-Try headers in one header set. The header MAY end with
an extra comma.  The header MAY be formatted on several lines using
continuations.

Each item in the X-Try header gives the IP address of a servent
and its listening port number.  This is sometimes referred to as
being a "connection pong".  If the server sending the X-Try 
implements Pong-Caching, then the connection pongs being sent must be
fresh ones.

The normal status code for rejecting a connection because the servent
is busy is "503 " followed by "Busy" or another description string.


2.2 Gnutella Messages

Once a servent has connected successfully to the network, it 
communicates with other servents by sending and receiving Gnutella 
protocol messages. Each message is preceded by a Message Header with 
the byte structure given below.

Note 1: One IP packet may contain several Gnutella messages, and 
one Gnutella message may be split up on multiple IP-packets. This 
means one can never assume a Gnutella message ends when the chunk of 
data read from the socket ends.

Note 2: All fields in the following structures are in little-endian 
byte order unless otherwise specified.

Note 3: All IP addresses in the following structures are in IPv4 
format. For example, the IPv4 byte array

    0xD0     0x11     0x32     0x04
    byte 0   byte 1   byte 2   byte 3

represents the dotted address 208.17.50.4.


2.2.1 Message Header

The message header is 23 bytes divided into the following fields.

    Bytes:  Description:
    0-15    Message ID/GUID (Globally Unique ID)
    16      Payload Type
    17      TTL (Time To Live)
    18      Hops
    19-22   Payload Length

Message ID      A 16-byte string (GUID) uniquely identifying the
                message on the network. 
                        
                Servents SHOULD store all 1's (0xff) in byte 8 of the
                GUID.  (Bytes are numbered 0-15, inclusive.) This 
                serves to tag the GUID as being from a modern 
                servent.
        
                Servents SHOULD initially store all 0's in byte 15 of
                the GUID. This is reserved for future use.

                The other bytes SHOULD have random values.

Payload         Indicates the type of message
Type            0x00 = Ping
                0x01 = Pong
                0x02 = Bye
                0x40 = Push
                0x80 = Query
                0x81 = Query Hit

                Other Gnutella messages can be used, but if so the
                servent MUST first make sure that the remote host 
                supports this new message type. This can be done 
                using handshaking headers.

TTL             Time To Live. The number of times the message 
                will be forwarded by Gnutella servents before it is 
                removed from the network. Each servent will decrement
                the TTL before passing it on to another servent. When
                the TTL reaches 0, the message will no longer be 
                forwarded (and MUST not).

Hops            The number of times the message has been forwarded.
                As a message is passed from servent to servent, the
                TTL and Hops fields of the header must satisfy the 
                following condition:
                TTL(0) = TTL(i) + Hops(i)
                Where TTL(i) and Hops(i) are the value of the TTL and
                Hops fields of the message, and TTL(0) is maximum 
                number of hops a message will travel (usually 7).

Payload         The length of the message immediately following 
Length          this header. The next message header is located 
                exactly this number of bytes from the end of this 
                header i.e. there are no gaps or pad bytes in the 
                Gnutella data stream. Messages SHOULD NOT be larger
                than 4 kB.

The Payload Length field is the only reliable way for a servent to 
find the beginning of the next message in the input stream. 
Therefore, servents SHOULD rigorously validate the Payload Length 
field for each message received.  If a servent becomes out of synch 
with its input stream, it SHOULD close the connection associated with
the stream since the upstream servent is either generating, or 
forwarding, invalid messages.

Abuse of the TTL field in broadcasted messages (Query) will lead to 
an unnecessary amount of network traffic and poor network 
performance.  Therefore, servents SHOULD carefully check the TTL 
fields of received query messages and lower them as necessary.  
Assuming the servent's maximum admissible Query message life is 7 
hops, then if TTL + Hops > 7, TTL SHOULD be decreased so that TTL + 
Hops = 7.  Broadcasted messages with very high TTL values (>15) 
SHOULD be dropped.

Immediately following the message header, is a payload consisting 
of one of the following messages.


2.2.2 Ping (0x00)

Ping messages MAY contain a GGEP extension block (see Section 2.3),
but no other payload.


2.2.3 Pong (0x01)

Pong messages contains information about a Gnutella host. The 
message has the following fields

    Bytes:  Description:
    0-1     Port number. The port number on which the responding
            host can accept incoming connections.
    2-5     IP Address. The IP address of the responding host.
            Note: This field is in big-endian format.
    6-9     Number of shared files. The number of files that the
            servent with the given IP address and port is sharing
            on the network.
    10-13   Number of kilobytes shared. The number of kilobytes
            of data that the servent with the given IP address and
            port is sharing on the network.
    14-     OPTIONAL GGEP extension block. (see Section 2.3)

Pong messages are only sent in response to an incoming Ping 
message. It is valid for more than one Pong message to be sent in 
response to a single Ping message. This enables host caches to send 
cached servent address information in response to a Ping request.

The Message ID of a Pong message MUST be the Message ID of the Ping 
message it is sent in reply to.

The fields specifying the number of shared files and the number of 
kilobytes shared was intended to allow one to measure the amount of 
data available on the network.  With a very large Gnutella network, 
and minimized Ping and Pong message traffic, this can no longer be 
done.  Still, these fields SHOULD be filled out correctly. 


2.2.4 Use of Ping and Pong messages

In early versions Gnutella, Ping messages were broadcasted over the
network. Pong messages were then routed back to the originator of 
the Ping message the same way as Query Hits messages are routed
(se section 2.2.7). That system consumed a lot of network bandwidth, 
so modern Gnutella servents cache Pong messages, or use other means 
of minimizing the bandwidth used by Ping and Pong messages. 

There are different systems for handling Ping and Pong messages, 
but what they have in common is:

    * When a Ping message is received (TTL>1 and it was at least one 
      second since another Ping was received on that connection), a 
      servent MUST, if possible, respond with a number of Pong 
      Messages. These pongs MUST have the same message ID as the 
      incoming ping, and a TTL no lower than the hops value of the 
      ping. The number of pongs returned may vary, but 10 is a 
      reasonable number. Servents that are able to accept incoming 
      Gnutella SHOULD reply to these Ping messages.

    * The pongs sent SHOULD have a good quality. That includes
      high probability that they are connectable and a good spread 
      of hosts from across the network

    * The bandwidth used by Ping and Pong messages SHOULD be 
      minimized. Servents MUST never output very high quantities of 
      Ping and Pong messages.

    * An incoming Ping message with TTL = 1 and Hops = 0 or 1 is 
      used to probe the remote host of a connection, and MUST 
      always be replied to with a pong having information about the 
      host who received the ping.

    * An incoming Ping message with TTL = 2 and Hops = 0 is a 
      "Crawler Ping" used to scan the network. It and SHOULD be 
      replied to with pongs containing information about the host 
      receiving the ping and all other hosts it is connected to. The
      information about neighbour nodes can be provided either by 
      creating pongs on their behalf, or by forwarding the ping to 
      them, and forward the pongs returned to the crawler.

Servents fulfilling these requirements MUST provide a the header
"Pong-Caching: 0.1" (or a higher number if a later version is used) 
during the handshake. That allows other nodes to know if pong 
caching in any form is supported. Note that this applies to servents 
do not really cache pong messages as well, as long as the rules above
applies. Servents are strongly RECOMMENDED to follow the rules above,
and provide the Pong-Caching header.

When storing or forwarding Pong messages, any GGEP payload SHOULD be
included. When sending a Ping message, one cannot know if it will 
reach only the neighbour host, or many hosts on the network. It 
depends on what system for handling Ping and Pong messages other 
servents are using. Servents MUST NOT make assumptions of how far 
a Ping message (and its payload) will reach.


2.2.4.1 A simple pong caching scheme

This is one system for handling Ping and Pong messages. There are 
others available (see sect. 2.2.4.2), and any system that abides to 
the rules in sect. 2.2.4 is ok.

For each connection an array of Pong Messages are stored. 10 may 
be a good number. When a pong comes in, it overwrites the oldest 
stored pong in array of he connection the pong came from. The 
information that must be stored for each pong is:

 * IP Address
 * Port number
 * Number of files shared
 * Number of kilobytes shared
 * GGEP extension block (if present)
 * Hops value, i.e. how far away on the network the host using the 
   stored address is

When a Ping message, called P, is received over connection C, and
it has been at least one second since last time a ping was received 
over C, the servent will return a number of pongs (10 for example)
from its stored pongs. The pongs will be pick from all connections
except from C, since it would be no good sending pongs back where 
they came from. A servent should also return a pong with information 
about itself, if it can accept incoming connections.

The outgoing pong will have the same message ID as P, not the message
ID it had when the pong was received. The Hops is set to the stored 
hops value + 1, and TTL so that TTL+Hops=7. If the TTL is less than 
P's Hops value, the current stored pong will not be sent. This also 
means that pongs whose Hops value already is 7 will not be propagated
any further.

Exactly how to select which of the stored pongs to send in response 
to an incoming ping is up to each servent. A good idea is to pick 
pongs from different connections and with varying stored Hops values.

To keep the cache fresh, a ping (TTL=7, Hops=0) is sent over all 
connections at small interval (like every 3 seconds). This look like
very often, but remember that the neighbour servents will just 
respond with pongs from its own cache. The short time ensures that 
pongs are always fresh. To neighbour hosts who has not indicated 
that they support pong caching (using the Pong-Caching handshaking 
header), one ping per minute might be a better number.

Incoming pings with TTL=1 and Hops=0 or 1 (see above section 2.2.4) 
is replied to with a single pong containing information about the 
local host. Pings with TTL=2 and Hops=0 are replied to with one pong 
about the local host, and one about each other host the local host is
connected to. Information about the neighbour hosts is retrieved when
a new connection is started by sending a TTL=1, Hops=0 ping and 
storing the pong returned. This can be done using handshaking headers
instead.

The bandwidth used by this scheme is very limited. Assuming a ping is
sent every 3 seconds and that 10 pongs are returned to every ping. 
Since a (without extensions) is 23 bytes and a pong (without 
extensions) is 37 bytes, the amount of bandwidth used per connection 
is (23+10*37)/3 = 131 bytes/sec/connection. If extensions are used in
ping and/or pong messages, the bandwidth usage will increase, but 
will still be kept on an acceptable level. If the bandwidth usage 
must re decreased further, the interval between update pings could be
increased.


2.2.4.2 Other pong caching schemes

A slightly more advanced scheme for pong caching is available at
http://www.limewire.com/index.jsp/pingpong

A different, but compatible scheme can be found at
http://groups.yahoo.com/group/the_gdf/files/Proposals/PONG/Variants/
        pingreduce.txt

Other schemes might have been created after this was written.


2.2.5 Query (0x80)

Since Query messages are broadcasted to many nodes, the total size 
of the message SHOULD not be larger than 256 bytes. Servents MAY drop
Query messages larger that 256 bytes, and SHOULD drop Query messages 
larger than 4 kB.

A Query message has the following fields:

Bytes:  Description:
0-1     Minimum Speed. The minimum speed (in kb/second) of servents
        that should respond to this message. A servent receiving a 
        Query message with a Minimum Speed field of n kb/s SHOULD 
        only respond with a Query Hit if it is able to communicate at
        a speed >= n kb/s.
        
2-      Search Criteria. This field is terminated by a NUL (0x00).

        See section 2.2.7.3 for rules and information on how to 
        interpret the Search Criteria
        
Rest    OPTIONAL extensions block. The rest of the query message is
        used for extensions to the original query format. The allowed
        extension types are GGEP, HUGE and XML (see Section 2.3 and 
        Appendixes 1 and 2).
        
        If two or more of these extension types exist together, 
        they are separated by a 0x1C (file separator) byte. Since 
        GGEP blocks can contain 0x1C bytes, the GGEP block, if 
        present, MUST be located after any HUGE and XML blocks.
        
        The type of each block can be determined by looking for the 
        prefixes "urn:" for a HUGE block, "<" or "{" for XML and 0xC3 for
        GGEP.

        The extension block SHOULD NOT be followed by a null (0x00) 
        byte, but some servents wrongly do that.


2.2.6 Query Hit

Query Hit messages has the following fields:

Bytes:  Description:
0       Number of Hits. The number of query hits in the result set 
        (see below).
        
1-2     Port. The port number on which the responding host can accept
        incoming HTTP file requests. This is usually the same port as
        is used for Gnutella network traffic, but any port MAY be 
        used.
        
3-6     IP Address. The IP address of the responding host. 
        Note: This field is in big-endian format.

7-10    Speed The speed (in kb/second) of the responding host.
        
11-     Result Set. A set of responses to the corresponding Query. 
        This set contains Number_of_Hits elements, each with the 
        following structure:
        
        Bytes:  Description:
        0-3     File Index. A number, assigned by the responding 
                host, which is used to uniquely identify the file 
                matching the corresponding query.

        4-7     File Size. The size (in bytes) of the file whose 
                index is File_Index.

        8-      File Name. The name of the file whose index is 
                File_Index. Terminated by a null (i.e. 0x00) 

        x       Extensions block. Allowed extension types are HUGE, 
                GGEP and plain text metadata. This field is 
                terminated by a null (0x00), even if there are no 
                extensions (resulting in a double null). Also, the
                extensions block itself MUST NOT contain any null 
                bytes.
        
                If two or more of these extension types exist 
                together, they are separated by a 0x1C (file 
                separator) byte. Since GGEP blocks can contain 0x1C 
                bytes, the GGEP block, if present, MUST be located 
                after any HUGE and plan text blocks.
                
                The type of each block can be determined by looking 
                for the prefixes "urn:" for a HUGE block, 0xC3 for 
                GGEP and anything else is probably plain text 
                metadata.
        
                Plain text metadata is intended to be displayed 
                directly to the user. It was first invented by 
                Gnotella (a now discontinued Gnutella servent) to tag
                MP3 files. Examples:
                "192 kbps 44 kHz 3:23"
                "120 kbps(VBR) 44kHz 3:55" (variable bitrate)
                Other plan text formats MAY be used.
        
x       RECOMMENDED extra block. This block is not required, but 
        strongly recommended. It is sometimes called EQHD, or 
        (incorrectly) just QHD. It has the following format:
        
        Bytes:
        0-3     Vendor Code. Four case-insensitive characters 
                representing a vendor code. For example "LIME" for 
                LimeWire. See registered codes and register yours at
                http://groups.yahoo.com/group/the_gdf/database?
                        method=reportRows&tbl=6 
                        (Requires GDF membership)

        4       Open Data Size. Contains the length (in bytes) of the
                Open Data field. Set to 2 in most current 
                implementations, and 4 in those that support XML 
                metadata outside GGEP (see Section 2.3 and Appendix 2). 
                The Open Data area MAY be larger to allow future 
                extensions.
        
        x       Open Data. Contains two 1-byte flags fields with the 
                following layout and in the specified order:
                
                bit:    Description:
                7,6     Reserved for future use
                5       flagGGEP
                4       flagUploadSpeed
                3       flagHaveUploaded
                2       flagBusy
                1       Reserved for future use
                0       flagPush

                The first flag byte can be viewed as an "enabler" for
                the flags in the second byte, the "setter".  Only
                those bits that were enabled must be considered by
                the servent as being valid.  This logic is reversed
                for flagPush, which is set in the first byte and
                enabled in the second.  The enabling byte allows
                you to know which flags are supported by a given
                servent.

                Bits 5,4,3,2 in the first byte MUST be set if and 
                only if the corresponding flag in the second byte is 
                meaningful.

                Bit 0 in the second byte MUST be set if and only 
                if the corresponding flag in the second byte is 
                meaningful. Yes, the order is reversed for this flag.

                flagGGEP is set is set if and only if the private
                data block (see below) contains a GGEP block.

                flagUploadSpeed is set if and only if the Speed field
                of the QueryHit message contains the highest 
                average transfer rate (in kbps) of the last 10 
                uploads. Otherwise Speed field contains the hosts 
                total upload speed as set by the user, and therefore 
                less reliable.

                flagHaveUploaded is set if and only if the servent 
                has successfully uploaded at least one file.

                flagBusy is set if and only if the all of the 
                servent's upload slots are currently full.

                flagPush is set if and only if the servent is 
                firewalled or cannot accept incoming TCP connections 
                for any other reason.

                The reserved flags MUST not be set, unless they are 
                used for a future extension.

                If XML metadata (Appendix 2) is included in the 
                current Query Hit message, the following 2 bytes of
                Open Data area will contain the size of the XML 
                block. The XML block itself is placed in the private 
                area (see below).

x       Private Data. Undocumented vendor-specific data. This field 
        continues till the servent Identifier, which uses the last 16
        bytes of the message.

        If the flagGGEP in the open data block is set, this block
        contains a GGEP (see Section 2.3) extension block. The GGEP 
        block starts with a 0xC3 byte. Any data before or after the 
        GGEP block is vendor-specific data, and MUST be ignored, if 
        not recognized.

        Servents are NOT RECOMMENDED to use the private data area for
        vendor specific data. Servents SHOULD use GGEP extensions 
        instead.

        If the Open Data area indicates an XML block is will also be 
        placed in the private area (see Appendix 2). Assuming that 
        the two bytes in the Open Data area specifies an XML block of
        m bytes, that block can be found by extracting the last m 
        bytes of the private area. Both GGEP and XML can exist in the
        same Private Data area, but XML SHOULD be implemented inside 
        GGEP.
        [TODO: How about the nul after the XMP block? What is it good for?]

Last 16 Servent Identifier. A 16-byte string uniquely identifying the
        responding servent on the network. This SHOULD be constant 
        for all Query Hit messages emitted by a servent and is 
        typically some function of the servent's network address. The
        servent Identifier is mainly used for routing the Push 
        Message (see below).


2.2.7 Use of Query and Query Hit

2.2.7.1 Forwarding and routing of Query and Query Hit messages

A servent SHOULD forward incoming Query messages to all of its 
directly connected servents, except the one that delivered the 
incoming Query. Servents using Flow control or Ultrapeers (sections 
3.1 and 3.2) will not always forward every Query over every 
connection.

A servent MUST decrement a message header's TTL field, and 
increment its Hops field, before it forwards the message to any 
directly connected servent. If, after decrementing the header's TTL 
field, the TTL field is found to be zero, the message MUST NOT 
be forwarded along any connection.

A servent receiving a message with the same Payload Message and 
Message ID as one it has received before, MUST discard the 
message. It means the message has already been seen.

QueryHit messages MUST only be sent along the same path that 
carried the incoming Query message. This ensures that only those 
servents that routed the Query message will see the QueryHit 
message in response. A servent that receives a QueryHit message
with  Message ID = n, but has not seen a Query message with 
Message ID = n SHOULD remove the QueryHit message from the 
network.


2.2.7.2 When and how to send new Query messages.

Query messages are usually sent when the user initiates a search. A
servent MAY also create Queries automatically, to find more locations
of a resource for example. If doing so the servent MUST be very 
careful not overload the network. A servent SHOULD not send more than
one automatic query per hour.

Servents SHOULD NOT allow the user to create a large amount of 
queries by repeatedly clicking on a button.

Servents SHOULD watch queries originating from its neighbours
(Hops=0) If those queries are too frequent, are duplicates or 
indicate bad servents behavior in any other way, the servents SHOULD 
drop those queries or even close the connection.

The TTL value of a new query created by a servent SHOULD NOT be 
higher than 7, and MUST NOT be higher than 10. The hops value MUST be
set to 0.


2.2.7.3 When and how to respond with Query Hit messages.

When a servent receives an incoming Query message it SHOULD match 
the Search Criteria of the query against its local shared files. 

The Search Criteria is text, and it has never been specified which 
charset that text was encoded with.  Therefore, servents MUST assume 
it is pure ASCII only.  If any byte with the 7th bit set (high bit) 
is found, then either there is a GGEP extension specifying the 
encoding used, or the servent SHOULD guess the proper encoding.
Most likely, it will be ISO-latin-1 or UTF-8.

Exactly how to interpret the Search Criteria is not specified 
either, but here are some guidelines for interoperability between 
servents:

The Search Criteria is a string of keywords.  A servent SHOULD only 
respond with files that has all the keywords.  It is RECOMMENDED to 
break up the words on any non-alphanumeric characters (anything but 
letters and numbers).  A space is the standard separator between 
words.

Servents MAY also require that all matching terms be present in the 
same number and order as in the query.

Regular expressions are not supported and common regexp "meta-
characters" such as "*" or "." will either stand for themselves or be
ignored. The matching SHOULD be case insensitive.  Empty queries or 
queries containing only 1-letter words SHOULD be ignored.

GGEP extensions MAY be used to provide details on how to parse the 
Search Criteria (such as specifying that regular expressions matching
should be used), but a servent can never be sure other servents will 
understand the GGEP extension.

Servents MAY ignore queries whose Search Criteria is shorter than 
a chosen length. The reason is to ignore too broad searches.

Query messages with TTL=1, hops=0 and Search Criteria="    " (four
spaces) are used to index all files a host is sharing. Servents 
SHOULD reply to such queries with all its shared files. Multiple 
Query Hit messages SHOULD be used if sharing many files. Allowed 
reasons not to respond to index queries include privacy and 
bandwidth. 

Query Hit messages MUST have the same Message ID as the Query message
it is sent in reply to. The TTL SHOULD be set to at least the hops
value of the corresponding query plus 2, to allow the Query Hit to
take a longer route back, if necessary. The TTL value MUST be at
least the hops value of the corresponding query, and the initial
hops value of the Query Hit message MUST (as usual) be set to 0.
Some servents use a TTL of (2 * Query_TTL + 2) in their replies to
be sure that the reply will reach its destination.  Replies with
high TTL level SHOULD be allowed to pass through.


2.2.8 Push (0x40)

A Push message has the following fields:
Bytes:  Description:

0-15    Servent Identifier. The 16-byte string uniquely identifying 
        the servent on the network who is being requested to push the
        file with index File_Index. The servent initiating the push 
        request MUST set this field to the Servent_Identifier 
        returned in the corresponding QueryHit message. This is 
        used to route the Push message to the sender of the Query 
        Hit message.

16-19   File Index. The index uniquely identifying the file to be 
        pushed from the target servent. The servent initiating the 
        push request MUST set this field to the value of one of the 
        File_Index fields from the Result Set in the corresponding 
        QueryHit message.

20-23   IP Address. The IP address of the host to which the file with
        File_Index should be pushed. This field is in big-endian 
        format.

24-25   Port. The port number the receiver of this message should 
        push to.

26-     OPTIONAL GGEP extension block. (see Section 2.3)

A servent may send a Push message if it receives a QueryHit 
message from a servent that doesn't support incoming connections. 
This might occur when the servent sending the QueryHit message is 
behind a firewall.  When a servent receives a Push message, it SHOULD
act upon the push request if and only if the servent_Identifier field
contains the value of its servent identifier.  The Message_Id field
in the Message Header of the Push message SHOULD not contain the same
value as that of the associated QueryHit message, but SHOULD contain 
a new value generated by the servent's Message_Id generation 
algorithm.

Push messages are forwarded back to the originator of the Query Hits 
message using the Servent Identifier value.  This means multiple Push
messages can have the same Servent Identifier.  Push messages MUST 
only be considered as duplicates if the Message ID in the header is 
the same.  Since Push messages are not broadcasted, duplicate 
messages should be very rare.


2.2.9 Bye (0x02)

The Bye message is an OPTIONAL message used to inform the 
servent you are connected to that you are closing the connection. 

Servents supporting the Bye message MUST indicate that by sending 
the following header in the handshaking sequence:

        Bye-Packet: 0.1

Servents MUST NOT send Bye messages to hosts that has not indicated 
support using the above header.  Future versions will be backwards 
compatible, so Bye messages MAY also be sent to hosts providing the 
above header with a later version number.

A Bye packet MUST be sent with TTL=1 (to avoid accidental propagation
by an unaware servent), and hops=0 (of course).

A servent receiving a Bye message MUST close he connection 
immediately. The servent that sent the packet MUST wait a few 
seconds for the remote host to close the connection before closing 
it.  Other data MUST NOT be sent after the Bye message.  Make sure 
any send queues are cleared. 

The servent that sent by Bye message MAY also call shutdown() with 
'how' set to 1 after sending the Bye message, partially closing the 
connection.  Doing a full close() immediately after sending the Bye 
messages would prevent the remote host from possibly seeing the Bye 
message.

After sending the Bye message, and during the "grace period" when
we don't immediately close the connection, the servent MUST read
all incoming messages, and drop them unless they are Query Hits
or Push, which MAY still be forwarded (it would be nice to the
network).  The connection will be closed as soon as the servent
gets an EOF condition when reading, or when the "grace period"
expires.

A Bye message has the following fields:
Bytes:  Description:

0-1     Code. The presence of the Code allows for automated processing
        of the message, and the regular SMTP classification of error 
        code ranges should apply. Of particular interests are the 
        200..299, 400..499 and 500..599 ranges.
        Here is the general classification ("User" here refers to the 
        remote node that we are disconnecting from):

        2xx     The User did nothing wrong, but the servent chose to 
                close the connection: it is either exiting normally 
                (200), or the local manager of the servent requested 
                an explicit close of the connection (201).

        4xx     The User did something wrong, as far as the servent is
                concerned. It can send packets deemed too big (400), 
                too many duplicate messages (401), relay improper 
                queries (402), relay messages deemed excessively long-
                lived [hop+TTL > max] (403), send too many unknown 
                messages (404), the node reached its inactivity 
                timeout (405), it failed to reply to a ping with TTL=1
                (406), or it is not sharing enough (407).

        5xx     The servent noticed an error, but it is an "internal" 
                one. It can be an I/O error or other bad error (500), 
                a protocol desynchronization (501), the send queue 
                became full (502).

2-      NULL-terminated Description String. The format of the String 
        is the following ( refers to "\r" and  to "\n"):

                Error message, as descriptive as possible

        or optionally, something more qualified, with HTTP-like 
        headers giving out more information:

                Error message, as descriptive as possible
                Server: some server/version
                X-Gnutella-XXX: some specific Gnutella header
                for instance telling the host about alternate
                nodes it could connect to
                

        The presence of a  at the end of message indicates 
        that HTTP-like headers are present.  The absence of any 
         indicates that the short error message form was used.

        Unless circumstances making that impossible (urgent 
        disconnection due to a memory fault), the HTTP-like headers 
        version SHOULD be used, with at least a Server: header, 
        allowing better tracing and debugging.

For further information about the Bye message, please refer to the
original documentation located at:
http://groups.yahoo.com/group/the_gdf/files/Proposals/BYE/


2.3 GGEP Extension blocks

The Gnutella Generic Extension Protocol (GGEP) allows arbitary 
extensions in Gnutella messages. A GGEP block is a framework for 
other extensions. If you wish to implement a new extension to a 
packet, you MUST do so inside GGEP. Some extensions that were 
invented before GGEP (XML metadata for example) are allowed to 
existoutside GGEP. 

Servents are RECOMMENDED to implement GGEP.  However, all servents 
MUST pass on GGEP extension blocks inside Gnutella messages. servents
that have support the forwarding of all packets that contain GGEP 
extensions (whether or not they can process them), MUST include a new
header in the Gnutella 0.6 connection handshake indicating this 
support.  This will allow other servents to know what types of 
packets this servent can accept.  The format of this header is 

GGEP : majorversion.minorversion

As the current version of GGEP is 0.5 when this was written the 
header would be

GGEP: 0.5

Servents SHOULD remove any GGEP blocks from Ping, Pong and Push 
messages before sending those messages to hosts that have not 
indicated GGEP support.

For the original GGEP documentation see
http://groups.yahoo.com/group/the_gdf/files/Proposals/GGEP/


2.3.1 GGEP Format

A GGEP block always starts with a magic byte used to help distinguish
GGEP extensions from legacy data which may exist.  It must be set to 
the value 0xC3.

When a GGEP block is used between the nulls in a result in a Query 
Hits message, it is not allowed to contain any null bytes. This 
requires some special tricks in the field format.

The magic byte is followed by any number of extensions. They SHOULD 
be processed in the order in which they appear. The following is the 
format of each extension:

Bytes used:     Field Name:
1               Flags
1-15            ID
1-3             Data Length
x               Extension Data

Flags:          These are options which describe the encoding of the 
                extension header and data.

                Bit    Name             
                7      Last Extension.  When set, this is the last 
                       extension in the GGEP block.

                6      Encoding.  The value contained in this field 
                       dictates the type of encoding which should be 
                       applied to the extension data (after possible 
                       compression).

                       0 = There is no encoding on the data. 
                       1 = The data is encoded using the COBS scheme.

                       Details about the COBS encoding scheme can be 
                       found at http://www.acm.org/sigcomm/sigcomm97/
                                    papers/p062.pdf  

                5      Compression.  The value contained in this 
                       field dictates the type of compression that 
                       should be applied to the extension data. 

                       0 = The extension data has not been compressed.
                       1 = The extension data should be decompressed 
                           using the deflate algorithm. 

                       One should only compress data if doing so will
                       make a material difference in the resulting 
                       packet size.                       

                       Details about the Deflate compression scheme 
                       may be found at http://www.gzip.org/zlib/
                       and http://www.faqs.org/rfcs/rfc1951.html 

                4      Reserved. This field is currently reserved for
                       future use.  It must be set to 0. 

                3-0    ID Len Value 1-15 can be stored here.  Since 
                       this will not be zero, it ensures this byte 
                       will not be 0x0. 

ID:             The raw binary data in this field is the extension ID.
                The length of this field can range between 1 and 15 
                bytes, and is determined by the Flags field. See 
                section 2.3.2 below on suggestions and rules for 
                creating extension IDs.  No byte in the extension 
                header may be 0x0.

Data Length:    This is the length of the raw extension data.  Please
                note that most Gnutella clients will drop messages, 
                and possibly connections if the message size is 
                larger than a certain threshold (which varies 
                according to message type).  Please pay attention to 
                these limits when creating and bundling new 
                extensions. 

                This field uses an encoding technique that ensures 
                that 0x0 is never the value of any byte.  Steps were 
                also taken to ensure that the encoding is compact. In
                this technique, a length field is the concatenation 
                of length chunks.  The format of each length chunk 
                (which contains 6 bits of length info) is described 
                in bit level below:

                Format:
                76543210
                MLxxxxxx
                        
                M = 1 if there is another length chunk in the 
                sequence, else 0

                L = 1 if this is the last length chunk in the 
                sequence, else 0

                xxxxxx = 6 bits of data.

                01aaaaaa ==> aaaaaa (2^6 values = 0-63)

                10bbbbbb 01aaaaaa ==> bbbbbbaaaaaa 
                (2^12 values = 0-4095)

                10ccccccc 10bbbbbb 01aaaaaa ==> ccccccbbbbbbaaaaaa 
                (2^18 values = 0-262143)

                Boundary Cases:
                0      =                       01 000000b = 0x40 
                63     =                       01 111111b = 0x7f 
                64     =            10 000001  01 000000b = 0x8140 
                4095   =            10 111111  01 111111b = 0xbf7f 
                4096   = 10 000001  10 000000  01 000000b = 0x818040
                262143 = 10 111111  10 111111  01 111111b = 0xbfbf7f
                
                As you see, when the bits are concatenated, the 
                number is in big endian format.

Extension Data  The actual extension data.  The format of this field 
                varies between extensions.  A servent that does not 
                recognize and extension will not be able to parse the
                Extension data, but since the length of this field is
                specified by Data Length, it can still skip to the 
                next extension.  Note that extensions MAY be empty.


2.3.2 Creating Extension IDs

The Extension ID field in the GGEP header is a binary field 
consisting of between 1 and 15 bytes.  It cannot contain the 
byte 0x0, and one must be able to compare IDs with a simple binary 
comparison.  Aside from those rules, GGEP does not mandate any 
particular format, but does encourage the creation of short IDs that 
are free from conflicts.  One should also note that Extension IDs are
meant to be consumed by machines.  Still, the following rules apply.

GDF Registered Extensions:
Any Extension ID of less than 4 bytes MUST be stored in the 
appropriate GDF database.  Any Extension ID of less than 3 bytes must
also be approved by the GDF.  The format of the extension data must 
also be registered.

VendorID Extensions
This simple technique allows for the creation of ExtensionIDs based 
upon uses the following format VendorID.BinaryID

VendorID for a Gnutella servent is a 4 byte value that has been 
registered in the GDF Peer Codes database.  In the QueryHit 
Descriptor, this case is case insensitive.  With ExtensionIDs, the 
case matters, as one must be able to perform a binary comparison on 
the ID.  This means an ExtensionID of "SWAP.1" and "swap.1" are 
different, but both "belong" the vendor who ones the code "SWAP."

This technique may be good for experimental and strictly vendor-
specific extensions, but should be avoided for extension that may be
useful for other vendors as well. Marking an extension by a vendor ID
makes it harder for other vendors to use the extension in their 
servents.

Extension implementers SHOULD publish the ID, format, and expected
data size for their extensions in the GDF database called 
"GGEP Extensions." located at
http://groups.yahoo.com/group/the_gdf/database?method=reportRows&tbl=10
(Requires GDF membership)


3 Protocol Usage

Apart from the protocol definition in section 2, there are also 
some guidelines on how to use the protocol. These are not absolutely
necessary to participate in the network, but very important for an
effective network.


3.1 Flow Control

It is very important that all servents have a system for regulating 
the data that passes through a connection.

The most simple way is to close a connection if it gets overloaded.
A better way is to drop broadcasted packets to reduce the amount of
bandwidth used.  A much better way is to do the following:

Implement an output queue, listing pending outgoing messages in
FIFO order.  As long as the queue is less than, say, 25% of its
max size (in bytes queued, not in amount of messages), do nothing.
If the queue gets filled above 50%, enter flow-control mode.  You
stay in flow-control mode (FC mode for short) as long as the queue
does not drop below 25%.  This is called "hysteresis".

The queue size SHOULD be at least 150% of the maximum admissible
message size.

In FC mode, all incoming queries on the connection are dropped.
The rationale is that we would not want to queue back potentially
large results for this connection since it has a throughput problem.

Messages to be sent to the node (i.e. queued on the output queue)
are prioritized:

* For broadcasted messages, the more hops the packet has traveled,
  the less prioritary it is.  Or the less hops, the more prioritary.
  This means your own queries are the most prioritary (hops = 0).

* For replies (query hits), the more hops the packet has traveled,
  the more prioritary it is.  This is to maximize network usefulness.
  The packet was relayed by many hosts, so it should not be dropped
  or the bandwidth it used would become truly wasted.

* Individual messages are prioritized thusly, from the most
  prioritary to the least: Push, Query Hit, Pong, Query, Ping.
  The Bye message being special, it is always sent (i.e. the queue
  cannot be in FC mode since it needs to be cleared before sending
  Bye).

Normally, all messages are accepted.  However, when the message to
enqueue would make the queue fill to more than 100% of its maximum
size, any queued message less prioritary in the queue is dropped.
If enough room could be made, enqueue the packet.  Otherwise, if the
message is a Query, a Pong or a Ping, drop it.  If not, send a
Bye 502 (Send Queue Overflow) message.

Other known flow-control algorithms:

SACHRIFC is a simple, but still very effective flow control system 
that drops less important packets first. It can be found at:
http://groups.yahoo.com/group/the_gdf/message/5726

A more advanced flow control system can be found at:
http://www.grouter.net/gnutella/


3.2 Network Structure

[TODO: Ultrapeer is so important that the information required to 
implement it will be included here.]

Originally, all Gnutella nodes were connected to each other randomly.
It worked fine for users with broadband connections, but not for 
users with slow modems. That problem can be solved by organizing the 
network in a more structured form. 


3.2.1 Ultrapeer system

[TODO: Describe ultrapeer system here. Handshaking etc. Reference to QRP when used]
[TODO: Ultrapeer marked pongs: size field = power of 2]

The Ultrapeer system has been found effective for this purpose. It is
a scheme to have a hierarchical Gnutella network by categorizing the 
nodes on the network as leaves and ultrapeers. A leaf keeps only a 
small number of connections open, and that is to ultrapeers. An 
ultrapeer acts as a proxy to the Gnutella network for the leaves 
connected to it. This has an effect of making the Gnutella network 
scale, by reducing the number of nodes on the network involved in 
message handling and routing, as well as reducing the actual traffic 
among them.

An ultrapeer only forwards a query to a leaf if it believes the leaf 
can answer it, and leaves never relay queries between ultrapeers. 
Ultrapeers are connected to each other and to "normal" Gnutella hosts
(hosts that do not implement the Ultrapeer system).

An ultrapeer decides what queries to forward to leaf nodes using the
Query Routing Protocol, QRP, which is described in section 3.2.2 
below. If both an ultrapeer and a leaf node supports another protocol
for deciding which queries are forwarded that MAY be used instead.
QRP routing is not used between ultrapeers/normal hosts.

It is RECOMMENDED that servents implement the Ultrapeer system, or 
any future system for decreasing the bandwidth load on modem users.

For more information please read the original specification at:
http://www.limewire.com/developer/Ultrapeers.html


3.2.1.1 Ultrapeer election

Since Gnutella is a decentralized system, ultrapeers are elected 
without the use of a central server. It is up to each node to 
determine if it is to become an ultrapeer or a shielded leaf node.
First, there are some basic requirements that must be satisfied to 
even consider becoming an ultrapeer.

* Not firewalled.  This can usually be approximated by looking at 
whether the host has received incoming connections. 

* Suitable operating system.  Some operating systems handle large 
numbers of sockets better than others.  Linux, Windows 2000/NT/XP, 
and Mac OS/X will typically make better ultrapeers than Windows 
95/98/ME or Mac Classic. 

* Sufficient bandwidth.  At least 15KB/s downstream and 10KB/s 
upstream bandwidth is recommended.  This can be approximated by 
looking at the maximum upload and download throughput. 

* Sufficient uptime.  Ultrapeers should have long expected uptimes.  
A reasonable heuristic is that the expected future uptime is 
proportional to the current uptime.  That is, nodes should not become
ultrapeers until they have been running at least a few hours.

* Sufficient RAM and CPU speed.  Ultrapeers need memory for storing 
routing tables and CPU speed for outing all incoming queries. Exactly
how much is needed depends how efficiently it is implemented and must
be experimented with.

If the above criterias are met, a node is said to be ultrapeer 
capable.  Note that this is not the same as actually being an 
ultrapeer.

Wheneither an ultrapeer capable node will actually become an 
ultrapeer depends on if there is need for more ultrapeers on the
network, and on how well the above criterias are met.  The need for
ultrapeers can be estimated from the noumber of ultrapeers found, and
can be communicated when new connections are established (see below).


3.2.1.2 Ultrapeer Handshaking

Ultrapeer capatibilities and information is exchanges during the 
handskaking sequence when trying to establishing a new Gnutella
connection (see section 2.1). The following new headers are used:

* X-Ultrapeer: "True" signals that node is an ultrapeer, "False" 
signals that the node wants to be a shielded leaf node.

* X-Ultrapeer-Needed: Used to balance the number of ultrapeers. [TODO: Write more about this one]

* X-Try-Ultrapeers: Like X-Try (see section 2.1), but contains only 
addresses of ultrapeers.

* X-Query-Routing: Signals support for the Query Routing Protocol
(section 3.2.2). The header value is the QRP version (curretly 0.1).

It is important to note that headers can be sent in any order.  
Also, case is ignored in "True" and "False".

Here is a sample interaction where a leaf connects to an ultrapeer.

   Leaf                             Ultrapeer
   -----------------------------------------------------------
   GNUTELLA CONNECT/0.6
   User-Agent: LimeWire/1.0
   X-Ultrapeer: False
   X-Query-Routing: 0.1
   
                                    GNUTELLA/0.6 200 OK
                                    User-Agent: LimeWire/1.0
                                    X-Ultrapeer: False
                                    X-Ultrapeer-Needed: False
                                    X-Query-Routing: 0.1
                                    X-Try: 24.37.144:6346,
                                     193.205.63.22:6346
                                    X-Try-Ultrapeers: 23.35.1.7:6346,
                                     18.207.63.25:6347
                                    
   GNUTELLA/0.6 200 OK
   

   [binary messages]                [binary messages]

The leaf is now a shielded node of the ultrapeer. The leaf should 
drop any non ultrapeer connections and send a QRP routing table 
(assuming QRP is used).

If a shielded leaf node receives a connection request, it will refuse
to accept the connection by returning a 503 error code together with 
X-Try and X-Try-Ultrapeer headers to redirect to remote host to other
addresses. For example, when a leaf tries to connect to another leaf 
it may look like this. Non-essential headers have been removed in 
this and the following examples.

   Leaf1                            Leaf2
   -----------------------------------------------------------
   GNUTELLA CONNECT/0.6
   X-Ultrapeer: False
   
                                    GNUTELLA/0.6 503 I am a leaf
                                    X-Ultrapeer: False
                                    X-Try: 24.37.144:6346
                                    X-Try-Ultrapeers: 23.35.1.7:6346
                                    

                                    [Terminates connection]


Sometimes nodes will be ultrapeer-incapable but unable to find an 
ultrapeer.  In this case, they behave exactly like old, unrouted 
Gnutella 0.4 connections.

   Leaf1                            Leaf2
   -----------------------------------------------------------
   GNUTELLA CONNECT/0.6
   X-Ultrapeer: False
   
                                    GNUTELLA/0.6 200 OK
                                    X-Ultrapeer: False
                                    
   GNUTELLA/0.6 200 OK
   

   [binary messages]                [binary messages]


When two ultrapeers meet, both set X-Ultrapeer: true.  If both have 
leaf nodes, they will remain ultrapeers after the interaction.  Note 
that no QRP route table is sent between ultrapeers after the 
connection is established. Example handshake:

   UltrapeerA                       UltrapeerB
   -----------------------------------------------------------
   GNUTELLA CONNECT/0.6
   X-Ultrapeer: True
   
                                    GNUTELLA/0.6 200 OK
                                    X-Ultrapeer: True
                                    

   GNUTELLA/0.6 200 OK
   

   [binary messages]                [binary messages]


Sometimes there will be too many ultrapeer-capable nodes on the 
network.  Consider the case of an ultrapeer A connecting to an 
ultrapeer B.  If B doesn�t have enough leaves, it may direct A to 
become a leaf node.  If A has no leaf connections, it stops fetching 
new connections, drops any Gnutella 0.4 connections, and sends a QRP 
table to B.  Then B will shield A from all traffic.  If A has leaf c
onnections, it ignores the guidance, as in the above case.

   UltrapeerA                       UltrapeerB
   -----------------------------------------------------------
   GNUTELLA CONNECT/0.6
   X-Ultrapeer: True
   
                                    GNUTELLA/0.6 200 OK
                                    X-Ultrapeer: True
                                    X-Ultrapeer-Needed: False
                                    

   GNUTELLA/0.6 200 OK
   X-Ultrapeer: False
   

   [binary messages]                [binary messages]


3.2.2 Query Routing Protocol

The Query Routing Protocol (QRP for short) is an essential part of
the Ultrapeer specification: it governs how the Ultrapeer will filter
queries and only forward those to the leaf nodes most likely to have
a match.  This is done without even knowing the resource names, by
looking the query words through a big hash table, that is sent by the
leaf node to its Ultrapeer.

The aim of the QRP is to avoid forwarding a query that cannot match,
it is not to forward only those queries that will match.

The overall operation goes thusly:

* At the leaf node level:

  + Break all the resource names into individual words.  A word is
    made of a consecutive sequence of letters and digits.

  + Hash each word with a well-known hash function and insert a
    "present" flag in the corresponding hash table slot.
    Note that this hash table is a big array, and we don't store
    the key, only the fact that a key ended up filling some slot.
    All words are lower-cased and all accents are removed from
    them, i.e. "d�j�" is transformed into "deja", so that only
    ASCII characters remain.  Only those words that are made of
    at least 3 letters are retained.

  + All words are re-hashed with their trailing 1, 2, or 3 letters
    removed, provided the word length after such trimming is at
    least 3 letter long.  This is a simple attempt to remove plural
    from words.  Optionally, nodes can chop off more letters from the
    end, provided that each hashed word is at least 3 character long.

  + The "boolean vector" built at later stage is optionally 
    compressed, broken up in small messages, and sent mixed with 
    regular Gnet traffic to the ultrapeer.

* At the Ultrapeer level:

  + Until the whole "boolean vector" is received from a leaf node, 
    all queries are forwarded to that node.

  + When the "boolean vector" is fully received, it is going to be
    used as the Query Routing table for that leaf node: queries are
    broken into individual words, all accentuated letters are 
    removed.

  + For each leaf node with a Query Routing table:

    . Each word is then hashed and looked up in the Query Routing
      table.
  
    . Depending on the query matching rules (see 2.2.7.3), either ALL
      the words will be required to be found in the Query Routing, or
      only some of them, to declare a Query Routing Hit.

    . Only those queries that were declared a Hit at the previous
      stage will be forwarded to a given leaf node.

The remaining sections define the hashing function, the mechanism
used to build up the "boolean vector" and compress it, the protocol
to transmit the vector to the Ultrapeer, and finally give operating
hints for the table sizing.

[TODO: finish the QRP description --RAM]

The Query Routing Protocol (QRP) used in Ultrapeer can be found at:
http://www.limewire.com/developer/query_routing/keyword%20routing.htm


4 File Transfer

4.1 Normal File Transfer

Once a servent receives a QueryHit message, it may initiate the 
direct download of one of the files described by the message's Result
Set. Files are downloaded out-of-network i.e. a direct connection 
between the source and target servent is established in order to 
perform the data transfer. File data is never transferred over the 
Gnutella network.

The file download protocol is HTTP. It is RECOMMENDED to use HTTP 1.1
(RFC 2616), but HTTP 1.0 (RFC 1945) can be used instead. The full 
specifications are available in those RFCs. The following includes 
only the basic things. The following examples assumes that HTTP 1.1 
is used. 

The servent initiating the download sends a request string on the 
following form to the target server:

    GET /get// HTTP/1.1
    User-Agent: Gnutella
    Host: 123.123.123.123:6346
    Connection: Keep-Alive
    Range: bytes=0-
    

where  and  are one of the File Index/File 
Name pairs from a QueryHit message's Result Set. For example, if the 
Result Set from a QueryHit message contained the entry

    File Index: 2468
    File Size: 4356789
    File Name: Foobar.mp3

then a download request for the file described by this entry would be
initiated as follows:

    GET /get/2468/Foobar.mp3 HTTP/1.1
    User-Agent: Gnutella
    Host: 123.123.123.123:6346
    Connection: Keep-Alive
    Range: bytes=0-
    

Servents MUST encode the filename in GET requests according the 
standard URL/URI encoding rules. Servents MUST accept URL-encoded GET
requests. Since some old servents does not support encoding, servents
SHOULD accept non-encoded requests and MAY try a non-encoded requests
if a 404 Not Found error is returned for the initial request. 

The Host header is required by HTTP 1.1 and specifies what address 
you have connected to. It is usually not used by the receiving 
servent, but its presence is required by the protocol.

The allowable values of the User-Agent string are defined by the HTTP
standard. Servent developers cannot make any assumptions about the 
value here. The use of 'Gnutella' is for illustration purposes only. 

The server receiving this download request responds with HTTP 1.1 
compliant headers such as

    HTTP/1.1 200 OK
    Server: Gnutella
    Content-type: application/binary
    Content-length: 4356789
    

The file data then follows and should be read up to, and including, 
the number of bytes specified in the Content-length provided in the 
server's HTTP response.

Note: Servents SHOULD use HTTP version 1.1 for file transfer, but 
some support only HTTP version 1.0. Servents MUST accept incoming 
HTTP/1.0 requests, and SHOULD retry with HTTP/1.0 if the remote host
is not HTTP/1.1 compliant.

Though it is strongly RECOMMENDED to have full HTTP/1.1 
support, some servents do not. The most important features for 
Gnutella, range requests and Persistent Connections MUST be 
supported. Some old servents, however, do not.

Range requests are on the form

    GET /get/2468/Foobar.mp3 HTTP/1.1
    User-Agent: Gnutella
    Host: 123.123.123.123:6346
    Connection: Keep-Alive
    Range: bytes=4932766-5066083
    

Note that the Range header does not have to specify both start and 
end positions. The response is on the form

    HTTP/1.1 206 Partial Content
    Server: Gnutella
    Content-Type: audio/mpeg
    Content-Length: 133318
    Content-Range: bytes 4932766-5066083/5332732
    

The Connection header tells the remote host if the connection should 
be closed when the transfer is finished or not. "Connection: close" 
means that the connection MUST be closed after the transfer. 
"Connection: Keep-Alive" or no Connection header means the connection
MUST be kept open. The client MAY then issue another request for 
another range or another file. The request MAY be sent before the 
previous transfer is finished. Persistent Connections is described in
section 8.1 of RFC 2616. 

Headers unknown to the servent MUST be quietly ignored.

Servents SHOULD NOT attempt to download multiple files from the same 
source at once. Files SHOULD be locally queued instead.

Servents are also RECOMENDED to use and understand the http extension
described in HUGE. (see Appendix 1)


4.2 Firewalled servents

It is not always possible to establish a direct connection to a 
Gnutella servent in an attempt to initiate a file download. The 
servent may, for example, be behind a firewall that does not permit
incoming connections to its Gnutella port. If a direct connection 
cannot be established, the servent attempting the file download may 
request that the servent sharing the file "push" the file instead. A 
servent can request a file push by routing a Push request back to the
servent that sent the QueryHit message describing the target file.
The servent that is the target of the Push request (identified by the
Servent Identifier field of the Push message) SHOULD, upon receipt
of the Push message, attempt to establish a new TCP/IP connection 
to the requesting servent (identified by the IP Address and Port 
fields of the Push message). If this direct connection cannot be 
established, then it is likely that the servent that issued the Push 
request is itself behind a firewall. In this case, file transfer 
cannot take place by the means of what is described in this document.

If a direct connection can be established from the firewalled servent
to the servent that initiated the Push request, the firewalled 
servent should immediately send the following:

    GIV :/

Where  and  are the values of the 
File Index and Servent Identifier fields respectively from the Push 
request received, and  is the name of the file in the 
local file table whose file index number is . The File 
Name MAY be url/uri encoded. The servent that receives the GIV (the
servent that wants to receive a file) SHOULD ignore the File Index 
and File Name, and request the file it wants to download. The 
servent that sent the GIV MUST allow the client to request any 
file, and not just the one specified in the Push message.  The GET 
request and the remainder of the file download process is identical 
to that described in the section 4.1 (Normal File Transfer) above.

The  is formatted as hexadecimal, and must
be read case-insensitively.  For instance:

    GIV 36:809BC12168A1852CFF5D7A785833F600/Foo.txt
    GIV 124:d51dff817f895598ff0065537c09d503/Bar.html

If the TCP connection is lost during a Push initiated file transfer, 
it is strongly RECOMMENDED that the servent who initiated the TCP 
connection (the servent providing the file) attempt to re-connect. 
That is important, since the servent receiving the file might not be
able to get another Push message to the servent providing the file.


4.3 Busy Servents

Servents whose upload bandwidth is already saturated with transfers 
MAY reject a download request by returning the 503 response code. 
Servents MAY simply have a fixed number of available upload slots, 
but SHOULD use a system that utilizes upload bandwidth better. 
Allowing new downloads as long as 20% of total upload bandwidth is 
unused is one possibility.

Busy servents receiving a Push message SHOULD connect to the host 
requesting a push, and return the 503 Busy code when the remote host 
has requested the file. 

Servents MAY try requesting a download again when the servent 
providing the file returns the busy code, but MUST not do so more 
often than once per minute and file source. That means a Servent 
MUST NOT open new connections to a remote host more than once per 
minute. Servents SHOULD prevent other servents breaking the above 
rule from increasing their chanses to downlaoad a file. This can for 
example be archived by refusing any connection attempts from a 
particular host if a download request has been denies less than 50 
seconds ago, or by adding hosts that request too often to a ban list.

Servents MAY use queuing systems to allow downloaders to stand in 
queue to download a file, but that is outside the scope of this 
document.

If a transfer is interrupted, the serving servent SHOULD keep the 
allocated slot/bandwidth reserved for at least one minute. The 
downloader would then be allowed to reconnect and resume the 
transfer.


4.3.1 Upload queueing

[TODO: Write about Shareaza style upload queues (optional)]
[TODO: Rewrite this (copied from GDF post)]

Clients which support queues send "X-Queue: 0.1", which simply tags 
the request as a candidate for queuing.  If this header is not 
received, the requesting client is assumed to follow normal Gnutella 
behavior in the event of a busy response.

If there is an upload "slot" available, the download begins as 
normal with a 200 or 206 response.  If not, the request is placed at 
the end of the queue and a 503 response is returned with the 
additional X-Queue header, of the form:

X-Queue: position=2,length=5,limit=4,pollMin=45,pollMax=120

Clearly this header includes several pieces of information separated 
by commas in the usual manner.  Every part is optional, and if 
desired it can be broken into multiple headers, etc.  Anyway, the 
parts:

The "position" key indicates the request's position in the queue, 
where position 1 is next in line for an available slot.  
The "length" key indicates the current length of the queue, for 
informational purposes.  Likewise the "limit" key specifies the 
number of concurrent uploads allowed.  All of this information is 
completely optional, and is only used for display within the client.

Finally, "pollMin" and "pollMax" provide hints to the requesting 
client as to how often it should re-request the file (in seconds).  
Requesting more often than pollMin will be seen as flooding, and 
cause a disconnection.  Failing to issue a request before pollMax 
will be seen as a dropped connection.  Once again these items are 
optional and need not be present in the header, in which case a 
default retry interval can be used.

Upon receiving a 503 response with an X-Queue header, the downloader 
displays any information it received to the user and waits for an 
appropriate period before reissuing the request.  The default retry 
period is adjusted to lie comfortably within pollMin and pollMax if 
they were present in the response, which allows a particularly busy 
server to adjust its parameters and reduce load.  When the request 
finally succeeds, it does so in the normal way.

[TODO: End of rewrite this]


4.4 Sharing

Servents that are able to download files MUST also be able to share 
files with others. Servents SHOULD encourage users to share files.

Servents SHOULD attempt to prevent programs that are not able to 
share files from downloading files. This means that servent SHOULD 
not allow uploads to web browsers and download accelerators. The 
User-Agent http header tells what program the remote host is running.
Many servents return a html page instead, telling the user how 
Gnutella works, and where to get a servent.

Servents MUST NOT give precedence to other users using the same 
servent. They MUST answer Query messages and accept file download
requests using the same rules for all servents. Servents MAY,
however, attempt to block servents that do not follow the rules in 
this protocol in way that seriously hurts others experience of the
Gnutella network.

Servents SHOULD, by default, share the directory where downloaded 
files are placed. Servents SHOULD also share new downloaded files 
without waiting for the servent to be restarted. Servents SHOULD 
avoid changing the index numbers of shared files.

Servents MUST NOT share partially downloaded (incomplete) files as if
they were complete. This is often done by using a separate directory 
for incomplete downloads. When the download finishes, the file is 
move to the downloads directory (that should be shared). Partial 
files MAY be shared in a way the makes it clear to other servents 
that the file is incomplete. 


5 Security Considerations

[TODO: This section is very incomplete. Any suggestions are welcome.]


5.1 Threats against individual Gnutella participants

[TODO: Write about threats against individual Gnutella participants Such as flooding, 
fake files, DoS, etc. Flooding hostcache with faked pongs]
[TODO: How one can protect oneself and other gnet users]

Inexperienced users might share sensitive information, such as cookie
or password files, on the Gnutella network. Servents SHOULD warn 
users who try share such information.

Malicious file, such as viruses and trojans, might be shared on the 
Gnutella network by malicious or unexpecting users. Servents SHOULD 
encourage users to scan downloaded files for viruses etc. but this is
outside the scope of the Gnutella protocol.


5.2 Threats against the Gnutella network

[TODO: Write about threats against the Gnutella network
Such as query flooding, fake files, DoS, fake random pongs etc.]
[TODO: What a servent should do to protect the network]

[TODO: A solution is fair sharing of bandwidth between connections and message types]

5.3 Threats against third parties

[TODO: Write about threats against third parties. Such (D)DoS, etc.]
[TODO: How to avoid]

Would it be possible to use the power of the Gnutella network to 
attack internet hosts? That issue will be discussed in this section.

The ways of doing so an attacker might try is:

    * Responding to Ping messages with Pong messages containing the
      IP address and port of the target host. This would cause other
      Gnutella servents to attempt to connect to the target host, 
      thinking it is a Gnutella host.
      [TODO: Can this really be an effective attack? Would the target receive that many connection attempts?]

    * Responding to Query messages with Query Hit messages containing 
      the IP address and port of the target host. 
      [TODO: This might actually be an issue. How about recommending servents to attempt to drop faked QHs somehow?]

    * Responding to Query Hit messages with Push messages containing 
      the IP address and port of the target host. This would cause 
      the servent receiving the Push message to attempt a connection
      to the target host. It would, however, not be more than one 
      connection per Push message, so it could not be used to launch 
      large Denial-of-Service attacks. 
      [TODO: Is that correct, or is it a real threat?]


6 Credits

The authors would like to thank:
    [TODO fill]

New features mentioned in this document, not present in the original 
0.4 Gnutella specification document, published by the now defunct 
Clip2 company should be credited thusly:

    0.6 Handshaking Protocol            LimeWire LLC
    X-Try Header                        Mike Green
    Bye Packet                          Raphael Manfredi
    Pong Caching                        LimeWire LLC and others
    EQHD Block                          Free Peers Inc.
    GGEP                                Jason Thomas
    HUGE                                Gordon Mohr
    Ultrapeers                          LimeWire LLC
    Query Routing Protocol              LimeWire LLC
    XML queries                         LimeWire LLC
    Negotiation of Gnet Compression     Raphael Manfredi

    [TODO missing?]


Appendix 1: HUGE (Hash/URN Gnutella Extensions)

HUGE is used to provide capability for hashes (numbers uniquely 
identifying files) and other urn:s to Gnutella. HUGE SHOULD be 
implemented inside GGEP (Section 2.3), but can also be used as a 
stand-alone extension block. When inside GGEP, the GGEP extension-
identifier for HUGE info is 'u'. When used as a stand-alone 
extension, HUGE blocks start with "urn:". There MAY be multiple HUGE 
blocks in one Gnutella message (separated by 0x1C bytes).

HUGE also extends the direct file transfer between hosts, to allow
communication or urn:s, and to build "download meshes" that inform
servents of other locations of a file.

The HUGE documentation is available at:
http://groups.yahoo.com/group/the_gdf/files/Proposals/HUGE

Servents are RECOMMENDED to implement HUGE.


Appendix 2: XML

XML blocks can be used to issue rich queries and to include metadata 
about files in query hit messages. XML SHOULD be implemented inside 
GGEP, but can also be used as a stand-alone extension block. 

If there is a "{deflate}" before the XML block it is compressed using
the defalte algorithm. "{plaintext}" or no such prefix means 
uncompressed plaintext XML. Any other prefix mean the XML block is 
compressed using an algorithm that is not known at this time. 

XML blocks can be recognized by them starting with "<" or "{".
(Do not rely on "

Note that although we only specify "deflate" here, the servent MAY
advertise the set of various compression algorithms it knows,
subsequent items being separated by a ",".

And to accept compression, the other side acknowledges by sending:

   Content-Encoding: deflate

The servent just picks the compression scheme it supports amongst
the ones advertised by the remote end in the Accept-Encoding line.
The Content-Encoding MUST contain only one value.

This also means that compression settings is asymmetric: a node can
send compressed data but receive uncompressed data.

Here's an example where both nodes support compression, comments
starting with "--", and ending  removed for clarity:

   GNUTELLA CONNECT/0.6
   Accept-Encoding: deflate       -- OK for reception of compressed data

       GNUTELLA/0.6 200 OK
       Accept-Encoding: deflate   -- I can also receive compressed data
       Content-Encoding: deflate  -- And I will send compressed data

   GNUTELLA/0.6 200 OK
   Content-Encoding: deflate      -- OK, will also compress data

   

Here's an example where compression will only be made on the 
transmission side of the first node (A is the node initiating the 
handshake, B is the node replying):

   GNUTELLA CONNECT/0.6
   Accept-Encoding: deflate      -- OK for reception of compressed data

       GNUTELLA/0.6 200 OK
       Accept-Encoding: deflate  -- I can also receive compressed data
                                 -- I refuse to compress data, sorry

   GNUTELLA/0.6 200 OK
   Content-Encoding: deflate     -- OK, I will compress data sent
                                 -- But I will receive uncompressed data

   B is compressed, flow from B->A is not>

Even though GGEP payloads (see Section 2.3) can be compressed, and 
this information is visible in the GGEP header, it is not advisable 
to decompress those payloads before sending them to the compressing 
layer.  The deflate algorithm does not expand already-compressed data
by a large factor and emits them as clearly marked non-compressible 
data (the overhead is limited to roughly 0.1%). If connection 
compression is widely used on the Gnutella network, individual GGEP 
extensions SHOULD NOT be compressed.