Advanced HTTP and TCP proxy configuration

HTTP runs on top of TCP, but an HTTP message carries extra information about its destination. For that reason, HTTP proxies and TCP proxies are configured differently.

HTTP messages carry the destination host and port, and are sent over a TCP connection to a TCP endpoint, that is, a specific host and port. Without a proxy, the endpoint named in the HTTP message is the same one to which the underlying TCP connection is made. For example, if an HTTP request is sent to http://192.0.2.1:8080/operation, the request carries "192.0.2.1:8080" in its "Host" header and is sent over a TCP connection to port 8080 on host 192.0.2.1.

However, if you configure the HTTP client to use a proxy, the underlying TCP connection goes to the TCP endpoint of the proxy, while the messages still name the original TCP endpoint. For example, if you configure the client to send its messages to a proxy at 198.51.100.1 port 3128 and the client sends a request for http://192.0.2.1:8080/operation, the message still contains "192.0.2.1:8080" in the "Host" header, and the "Request-Line" now contains the full URL, including that host and port, instead of only the path. The message itself travels over a TCP connection to the proxy at 198.51.100.1:3128. In this way, the HTTP proxy can receive messages on a single port and forward them to many different services, based on the destination information in each message.
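The difference between a direct request and a proxied request can be sketched in Python. This is a minimal illustration, not how any particular client is implemented: the function name build_request and the use of a bare GET request are assumptions made for the example. The function returns the TCP endpoint that the client would connect to, together with the bytes of the request head.

```python
from urllib.parse import urlsplit


def build_request(url, proxy=None):
    """Return (tcp_endpoint, request_head) for a GET of `url`.

    Hypothetical helper: `proxy` is an optional (host, port) tuple.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    if proxy is None:
        # Direct: connect to the host and port in the URL; send only the path.
        endpoint = (parts.hostname, parts.port or 80)
        target = path
    else:
        # Proxied: connect to the proxy; send the full URL so that the proxy
        # can see the final destination.
        endpoint = proxy
        target = url

    # In both cases the Host header names the original destination.
    head = f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
    return endpoint, head.encode("ascii")


for proxy in (None, ("198.51.100.1", 3128)):
    endpoint, head = build_request("http://192.0.2.1:8080/operation", proxy)
    print(endpoint, head.decode("ascii").splitlines()[0])
# ('192.0.2.1', 8080) GET /operation HTTP/1.1
# ('198.51.100.1', 3128) GET http://192.0.2.1:8080/operation HTTP/1.1
```

Only two things change when the proxy is added: the TCP endpoint and the Request-Line. The "Host" header is identical in both requests.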

Note: The "Host" header was introduced in HTTP/1.1, and HTTP/1.0 requests are not required to include it. An HTTP/1.0 request without a "Host" header that does not pass through a proxy therefore carries no destination host and port. However, an HTTP/1.0 request that is sent to a proxy still contains the destination host and port in the "Request-Line", so the absence of a "Host" header does not cause a problem for proxies.
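How a proxy might recover the destination from a request can be sketched as follows, assuming that the raw request head is already in memory. The function request_destination is hypothetical. It prefers the full URL in the Request-Line, falls back to the "Host" header, and returns None for an HTTP/1.0 request that has neither.

```python
from urllib.parse import urlsplit


def request_destination(raw_request, default_port=80):
    """Return the (host, port) a request is addressed to, or None if unknown.

    Hypothetical helper: assumes `raw_request` holds at least the full
    request head (Request-Line and headers) as bytes.
    """
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    _method, target, _version = lines[0].split(" ", 2)

    # Requests sent to a proxy carry the full URL in the Request-Line,
    # in both HTTP/1.0 and HTTP/1.1.
    if "://" in target:
        parts = urlsplit(target)
        return parts.hostname, parts.port or default_port

    # Otherwise only the path is present; use the Host header if there is one.
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            parts = urlsplit("//" + value.strip())
            return parts.hostname, parts.port or default_port

    # HTTP/1.0 request with no Host header: the destination is not in the message.
    return None


print(request_destination(b"GET http://192.0.2.1:8080/operation HTTP/1.0\r\n\r\n"))
# ('192.0.2.1', 8080)
print(request_destination(b"GET /operation HTTP/1.0\r\n\r\n"))
# None
```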

To enable a TCP proxy, you change the client configuration so that the client connects to the TCP endpoint of the proxy instead of the TCP endpoint of the live system. Unlike HTTP, TCP has no built-in support for proxies: when a client connects to a proxy over TCP, no mechanism is defined for telling the proxy the intended final destination. A TCP proxy that allows connections to multiple live systems (that is, final destinations, or onward endpoints) without knowing what traffic will be sent over those connections can do so only by listening on a separate port for each live system and recording which of its ports corresponds to which onward endpoint. The client is then configured with the proxy port that corresponds to each live system it needs to communicate with.

The TCP proxy ports to listen on, and their corresponding onward endpoints, are configured in <forward> elements in the proxy configuration file, RTCP_install_dir/httptcp/registration.xml. In the following example, 198.51.100.1 is the IP address of the proxy, and any traffic sent to port 3333 on the proxy is forwarded to port 80 at www.example.com:

<forward bind="198.51.100.1:3333" destination="www.example.com:80"/>
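For illustration, a proxy could load these elements into a lookup table keyed by the bind endpoint. The following Python sketch uses the standard xml.etree.ElementTree module. The <config> root element in the sample is an assumption, because the overall structure of registration.xml is not described here; so is treating an element without a type attribute as plain TCP. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_endpoint(value):
    """Split "host:port" into (host, int(port))."""
    host, _, port = value.rpartition(":")
    return host, int(port)


def load_forwards(xml_text):
    """Map each bind endpoint to (destination endpoint, traffic type)."""
    table = {}
    for fwd in ET.fromstring(xml_text).iter("forward"):
        bind = split_endpoint(fwd.get("bind"))
        destination = split_endpoint(fwd.get("destination"))
        # Assumption: an element without a type attribute forwards raw TCP.
        table[bind] = (destination, fwd.get("type", "TCP"))
    return table


# Hypothetical file content; only the <forward> element comes from the text.
SAMPLE = """<config>
  <forward bind="198.51.100.1:3333" destination="www.example.com:80"/>
</config>"""

print(load_forwards(SAMPLE))
# {('198.51.100.1', 3333): (('www.example.com', 80), 'TCP')}
```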

You must therefore add a <forward> element to the proxy configuration file, and point the client at the corresponding proxy port, whenever you add a new destination for proxy traffic. This restriction does not apply to HTTP proxies.

To understand how port numbers are handled differently by the HTTP proxy and the TCP proxy, assume that you have two services, one at 192.0.2.1:8080 and one at 192.0.2.1:8081, and a proxy that is running on 198.51.100.1. (If the two services differed in IP address rather than in port number, this example would be the same except for the IP address of each service.) If these two services expect HTTP traffic, only a single HTTP proxy port (such as 3128) needs to be opened, and requests for both TCP endpoints can be sent to that port. When the HTTP proxy sees that a message is addressed to 192.0.2.1:8080, the proxy either forwards the message to that address or applies any rules that it has for that service. The same procedure applies to 192.0.2.1:8081, using the same proxy port.

If these two services instead expect TCP traffic, two TCP proxy ports must be opened, defined by two <forward> elements in the configuration file:

<forward bind="198.51.100.1:3333" destination="192.0.2.1:8080"/>
<forward bind="198.51.100.1:3334" destination="192.0.2.1:8081"/>
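On the client side, the configuration change amounts to a reverse lookup over the same table: for each live endpoint, find the proxy port whose <forward> element points at it. In this hypothetical Python sketch, FORWARDS mirrors the two elements above, and client_endpoint_for is an illustrative name, not part of any product API.

```python
# Hypothetical table mirroring the two <forward> elements:
# proxy (bind) endpoint -> live (destination) endpoint.
FORWARDS = {
    ("198.51.100.1", 3333): ("192.0.2.1", 8080),
    ("198.51.100.1", 3334): ("192.0.2.1", 8081),
}


def client_endpoint_for(service, forwards):
    """Return the proxy endpoint a client must use instead of `service`."""
    for bind, destination in forwards.items():
        if destination == service:
            return bind
    # No element forwards to this service: a <forward> element must be added first.
    raise LookupError(f"no <forward> element has destination {service}")


for service in (("192.0.2.1", 8080), ("192.0.2.1", 8081)):
    print(service, "->", client_endpoint_for(service, FORWARDS))
```

Each live endpoint needs its own entry, which is why the TCP proxy opens one port per service while the HTTP proxy can serve both from port 3128.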

The client configuration for the first service changes from "192.0.2.1:8080" to "198.51.100.1:3333", and for the second service from "192.0.2.1:8081" to "198.51.100.1:3334". To reach the first service, the client now opens a TCP connection to 198.51.100.1:3333. The proxy accepts the connection on port 3333 but does not know what data will be sent over it; all it knows is that the connection was made to port 3333. The proxy therefore consults its configuration and finds that traffic to that port must be forwarded to 192.0.2.1:8080 (or that a rule for that service must be applied to it).
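The port-based forwarding described above can be sketched with Python's asyncio streams. This is a minimal relay, assuming one listener per <forward> element; the names pipe, handle, and start_forward are hypothetical, and the sketch omits the rule processing (such as routing to a stub) that a real proxy would apply. The handle coroutine never looks at the bytes it relays: the listening port alone selects the onward endpoint.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then half-close writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()  # pass the end-of-stream on to the other side
    except OSError:
        pass  # either side went away; the caller closes both connections


async def handle(client_reader, client_writer, destination):
    """Relay one accepted connection to the fixed onward endpoint."""
    host, port = destination
    try:
        up_reader, up_writer = await asyncio.open_connection(host, port)
    except OSError:
        client_writer.close()  # live system unreachable: drop the client
        return
    # The relayed bytes are never inspected; only the bind port chose `destination`.
    await asyncio.gather(pipe(client_reader, up_writer),
                         pipe(up_reader, client_writer))
    up_writer.close()
    client_writer.close()


async def start_forward(bind, destination):
    """Listen on `bind` and relay every connection to `destination`.

    Roughly what a single <forward> element describes.
    """
    host, port = bind
    return await asyncio.start_server(
        lambda r, w: handle(r, w, destination), host, port)
```

For the first service in the example, the proxy would await start_forward(("198.51.100.1", 3333), ("192.0.2.1", 8080)) inside a running event loop, and make a second call for port 3334.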

If you cannot route your HTTP traffic through an HTTP proxy because the client cannot be configured to use one, you must use a reverse HTTP proxy. With a reverse HTTP proxy, you change the destination URL in the client instead of configuring a proxy. The setup is similar to that for a TCP proxy: in the client system, you specify the proxy as the TCP endpoint for the message, and in the proxy, you create a forwarding rule. The difference is that you add a type attribute that specifies HTTP to the rule, as in the following example:

<forward bind="198.51.100.1:3333" destination="192.0.2.1:8080" type="HTTP"/>

Because the proxy now expects only HTTP traffic on the designated port (3333 in the example), it can apply the richer filtering of the HTTP proxy to messages that are addressed to stubs. For example, it can filter out traffic to the stub that does not have a certain path in its URL, or that does not use a certain HTTP method, such as POST. However, because a stub is not always running, the proxy still needs the destination from the <forward> element so that it can send traffic to the live system.

For example, assume that a client needs to connect to a service on 192.0.2.1:8080 and uses a reverse HTTP proxy on 198.51.100.1:3333. Before the client can use the proxy, the client configuration for that service must be changed from a URL such as http://192.0.2.1:8080/operation to http://198.51.100.1:3333/operation. A request that is sent to the new URL reaches the proxy. Its "Host" header contains the TCP endpoint of the proxy (198.51.100.1:3333) rather than the address of the live system, because the client is not aware that it is sending the message to a proxy rather than to a normal server. This unawareness on the client side is what defines a reverse proxy. The proxy therefore uses the <forward> elements to determine that a request that arrives on port 3333 requires one of the following actions:

The request must be forwarded to the live system at 192.0.2.1:8080, and the "Host" header in the message must be updated to specify that live system.
Any rules for that service must be applied to the message, such as routing it to a stub instead.
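The first action, forwarding with an updated "Host" header, can be sketched as a byte-level rewrite of the request head. This Python sketch is one possible approach, not the product's implementation; rewrite_host is a hypothetical name, and the function adds a "Host" header if the request (for example, an HTTP/1.0 request) has none. The Request-Line keeps only the path, which is what the live system expects from a non-proxied client.

```python
def rewrite_host(raw_request, destination):
    """Point the Host header of `raw_request` at `destination` (host, port)."""
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    request_line, *headers = head.split(b"\r\n")
    host, port = destination
    new_host = f"Host: {host}:{port}".encode("ascii")

    out = [request_line]
    replaced = False
    for line in headers:
        name = line.split(b":", 1)[0].strip().lower()
        if name == b"host":
            out.append(new_host)   # replace the proxy's own endpoint
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_host)       # e.g. an HTTP/1.0 request with no Host header
    return b"\r\n".join(out) + sep + body


request = b"GET /operation HTTP/1.1\r\nHost: 198.51.100.1:3333\r\n\r\n"
print(rewrite_host(request, ("192.0.2.1", 8080)))
# b'GET /operation HTTP/1.1\r\nHost: 192.0.2.1:8080\r\n\r\n'
```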

In conclusion, for efficiency and ease of configuration, use the standard HTTP proxy whenever possible. When you cannot, use the reverse proxy. Use the TCP proxy when you work with TCP traffic that is not HTTP.