Winsock Programmer’s FAQ: The Lame List

The Lame List

Introduction

I have reproduced The Lame List here because it is so valuable. This text is cut-and-pasted directly from Appendix C of version 2.2.2 of the Windows Sockets 2 Application Programming Interface. The list originally started out as a list of complaints by Winsock stack vendors about wrongheaded applications created back when Winsock was new and not as well understood. Despite that, these items are still valuable because newbie Winsockers still make the same wrongheaded mistakes. Avoiding the items on this list will take you a long way along the road toward Winsock guruhood.

This version of the List is slightly different from the original: I have changed some punctuation, minor bits of phrasing, etc. And, of course, I have added all the pretty HTML formatting.

The original introduction to the List:

Keith Moore of Microsoft gets the credit for starting this, but other folks have begun contributing as well. Bob Quinn, from sockets.com, is the kind soul who provided the elaborations on why these things are lame and what to do instead. This is a snapshot of the list as we went to print (plus a few extras thrown in at the last minute).

The Windows Sockets Lame List
(or What’s Weak This Week)

Brought to you by The Windows Sockets Vendor Community

1. Calling connect() on a non-blocking socket, getting WSAEWOULDBLOCK, then immediately calling recv() and expecting WSAEWOULDBLOCK before the connection has been established. Lame.

Reason: This assumes that the connection will never be established by the time the application calls recv(). Lame assumption.

Alternative: Don’t do that. An application using a non-blocking socket must handle the WSAEWOULDBLOCK error value, but must not depend on the error occurring.

2. Calling select() with three empty fd_sets and a valid timeval timeout structure as a sleazy delay function. Inexcusably lame.

Reason: The select() function is intended as a network function, not a general purpose timer.

Alternative: Use a legitimate system timer service, such as Sleep() or SetTimer() in Win32.
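
A minimal sketch of a proper delay, assuming C and the Win32 Sleep() call. The delay_ms() name is hypothetical, and the non-Windows branch uses POSIX nanosleep() only so the snippet also builds elsewhere.

```c
#ifdef _WIN32
#include <windows.h>
#else
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose nanosleep() under strict C modes */
#endif
#include <errno.h>
#include <time.h>
#endif

/* Wait about `ms` milliseconds using the OS timer service instead of
   abusing select() with empty fd_sets. */
void delay_ms(unsigned int ms)
{
#ifdef _WIN32
    Sleep(ms);                                /* Win32 timer service */
#else
    struct timespec req, rem;
    req.tv_sec  = (time_t)(ms / 1000);
    req.tv_nsec = (long)(ms % 1000) * 1000000L;
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;                            /* resume after a signal */
#endif
}
```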

3. Polling with connect() on a non-blocking socket to determine when the connection has been established. Dog lame.

Reason: The Winsock 1.1 spec does not define an error for connect() when a non-blocking connection is pending, so the error value returned may vary.

Alternative: Using asynchronous notification of connection completion is the recommended alternative. An application that prefers synchronous operation mode could use the select() function (but see item 23).

Non-Alternative: Changing a non-blocking socket to blocking mode to block on send() or recv() is even more lame than polling on connect().

4. Assuming socket handles are always less than 16. Mired in a sweaty mass of lameness.

Reason: The only invalid socket handle value is defined by the winsock.h file as INVALID_SOCKET. Any other value the SOCKET type can handle is fair game, and an application must handle it. In any case, socket handles are supposed to be opaque, so applications shouldn’t depend on specific values for any reason.

Alternative: Expect a socket handle of any value, including 0. And don’t expect socket handle values to change with each successive call to socket() or WSASocket(). Socket handles may be reused by the Winsock implementation.
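
To make the point concrete, here is a hypothetical connection table (conns, conns_init() and conns_add() are illustrative names, not part of any API). The free-slot marker is INVALID_SOCKET, never 0, and any other handle value is accepted. The #else branch is an assumption that lets the sketch build with POSIX sockets, where INVALID_SOCKET does not exist.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
typedef int SOCKET;               /* POSIX stand-in for the Winsock type */
#define INVALID_SOCKET (-1)
#endif

#define MAX_CONNS 64              /* hypothetical table size */

struct conn {
    SOCKET s;
};
static struct conn conns[MAX_CONNS];

/* Mark every slot free with INVALID_SOCKET. Zero-filling the table
   (or relying on static zero-initialization) would be wrong, because
   0 can be a perfectly valid socket handle. */
void conns_init(void)
{
    for (int i = 0; i < MAX_CONNS; i++)
        conns[i].s = INVALID_SOCKET;
}

/* Store a socket in the first free slot. Any value other than
   INVALID_SOCKET is accepted, whether it is 0, 5, or 50000.
   Returns the slot index, or -1 if `s` is INVALID_SOCKET or the
   table is full. */
int conns_add(SOCKET s)
{
    if (s == INVALID_SOCKET)
        return -1;
    for (int i = 0; i < MAX_CONNS; i++) {
        if (conns[i].s == INVALID_SOCKET) {
            conns[i].s = s;
            return i;
        }
    }
    return -1;
}
```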

5. Polling with select() and a zero timeout in Win16’s non-preemptive environment. Nauseatingly lame.

Reason: With any non-zero timeout, select() will call the current blocking hook function, so an application anticipating an event will yield to other processes executing in a 16-bit Windows environment. However, with a zero timeout an application will not yield to other processes, and may not even allow network operations to occur (so it will loop forever).

Alternative: Use a small non-zero timeout. Better yet, use asynchronous notification instead of using select().

6. Calling WSAAsyncSelect() with a zero Event mask just to make the socket non-blocking. Lame. Lame. Lame. Lame. Lame.

Reason: WSAAsyncSelect() is designed to allow an application to register for asynchronous notification of network events. The Winsock 1.1 specification didn’t define an error for a zero event mask, so an implementation may interpret it as an invalid input argument (and fail with WSAEINVAL), or silently ignore the request.

Alternative: To make a socket non-blocking without registering for asynchronous notification, use ioctlsocket(FIONBIO). That’s what it’s for.
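
A minimal sketch, assuming a helper named set_nonblocking() (hypothetical). On Windows it does exactly what this item says, ioctlsocket() with FIONBIO and a nonzero argument; the POSIX branch uses the fcntl() O_NONBLOCK equivalent only so the snippet is portable.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <fcntl.h>
typedef int SOCKET;                      /* POSIX stand-in for the Winsock type */
#endif

/* Put a socket into non-blocking mode without registering for any
   asynchronous notification. Returns 0 on success, -1 on failure. */
int set_nonblocking(SOCKET s)
{
#ifdef _WIN32
    u_long on = 1;                       /* nonzero enables non-blocking mode */
    return ioctlsocket(s, FIONBIO, &on) == 0 ? 0 : -1;
#else
    int flags = fcntl(s, F_GETFL, 0);    /* POSIX equivalent */
    if (flags == -1)
        return -1;
    return fcntl(s, F_SETFL, flags | O_NONBLOCK) == 0 ? 0 : -1;
#endif
}
```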

7. Telnet applications that neither enable SO_OOBINLINE nor read OOB data. Violently lame.

Reason: It is not uncommon for Telnet servers to generate urgent data, for example when a Telnet client sends a Telnet BREAK or Interrupt Process command. The server then employs a "Synch" mechanism, which consists of a TCP Urgent notification coupled with the Telnet DATA MARK command. If the Telnet client doesn’t read the urgent data, then it won’t get any more normal data. Not ever, ever, ever, ever.

Alternative: Every Telnet client should be able to read and/or detect OOB data. It should either enable inline OOB data by calling setsockopt(SO_OOBINLINE), or detect OOB data arrival using WSAAsyncSelect() (or WSAEventSelect()) with FD_OOB, or select() with exceptfds, and call recv()/WSARecv() with MSG_OOB in response.
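
The first option is a single setsockopt() call. This sketch (enable_oob_inline() is a hypothetical name) only makes sure urgent bytes arrive in the normal data stream; handling the Telnet Synch sequence and DATA MARK is still up to the client’s protocol code. The #else branch is an assumption so the snippet also builds with POSIX sockets.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
typedef int SOCKET;                 /* POSIX stand-in for the Winsock type */
#endif

/* Ask the stack to deliver TCP urgent (OOB) data inline with normal
   data, so an ordinary recv() loop reads it and the stream never stalls.
   Returns 0 on success, -1 on failure. */
int enable_oob_inline(SOCKET s)
{
#ifdef _WIN32
    BOOL on = TRUE;                 /* Winsock documents SO_OOBINLINE as BOOL */
#else
    int on = 1;
#endif
    return setsockopt(s, SOL_SOCKET, SO_OOBINLINE,
                      (const char *)&on, sizeof on) == 0 ? 0 : -1;
}
```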

8. Assuming 0 is an invalid socket handle value. Uncontrollably lame.

Reason and Alternative: See item 4.

9. Applications that don’t properly shut down when the user closes the main window while a blocking API is in progress. Totally lame.

Reason: Winsock applications that don’t close their sockets and call WSACleanup() may prevent the Winsock implementation from reclaiming the resources they used. Resource leakage can eventually starve all other Winsock applications of resources (i.e. network system failure).

Alternative: While a blocking API is in progress in a 16-bit Winsock 1.1 application, the proper way to abort is to:
- Call WSACancelBlockingCall()
- Wait until the pending function returns. If the cancellation occurs before the operation completes, the pending function will fail with the WSAEINTR error, but applications must also be prepared for success, due to the race condition involved with cancellation.
- Close this socket and all other sockets. Note: the proper closure of a connected stream socket involves:
  - call shutdown() with how equal to 1 (SD_SEND in Winsock 2)
  - loop on recv() until it returns 0 or fails with any error
  - call closesocket()
- Call WSACleanup()

This procedure is not relevant to 32-bit Winsock 2 applications: their blocking calls really block the calling thread, so calling WSACancelBlockingCall() from that same thread is impossible. (Therefore, this call is deprecated under Winsock 2.) However, the shutdown procedure above is still useful.
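
The stream-socket closure steps might be sketched like this, assuming a blocking socket and a hypothetical helper named graceful_close(). The recv() loop blocks until the peer closes its side, so a production version would want a timeout. The POSIX branch is an assumption for portability and maps SD_SEND to SHUT_WR.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;                 /* POSIX stand-ins for Winsock names */
#define SD_SEND SHUT_WR
#define closesocket close
#endif

/* Gracefully close a connected stream socket:
   1. shutdown(how = 1) tells the peer we will send no more data;
   2. read until recv() returns 0 (peer closed) or any error;
   3. release the handle with closesocket().
   Call WSACleanup() once, after all sockets are closed. */
int graceful_close(SOCKET s)
{
    char buf[512];
    int n;

    shutdown(s, SD_SEND);                      /* how == 1 */
    do {
        n = recv(s, buf, (int)sizeof buf, 0);  /* discard remaining data */
    } while (n > 0);                           /* stop on 0 or on error */
    return closesocket(s);
}
```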

10. Out of band data. Intensely lame.

Reason: TCP can’t do Out of Band (OOB) data reliably. If that isn’t enough, there are incompatible differences in the implementation at the protocol level (in the urgent pointer offset). Berkeley (BSD) Unix implements RFC 793 literally, and many others implement the corrected RFC 1122 version. (Some versions also allow multiple OOB data bytes by using the start of the MAC frame as the starting point for the offset.) If two TCP hosts have different OOB versions, they cannot send OOB data to each other.

Alternative: Ideally, use a separate socket for urgent data instead of OOB data, although in reality OOB data is sometimes inescapable. Some protocols require it (see item 7), in which case you need to minimize your dependence on it, or beef up your technical support staff to handle user calls.

11. Calling strlen() on a hostent structure’s IP address, then truncating it to four bytes, thereby overwriting part of malloc()’s heap header. In all my years of observing lameness, I have seldom seen something this lame.

Reason: This doesn’t really need a reason, does it?

Alternative: Clearly, the only alternative is a brain transplant.

12. Polling with recv(MSG_PEEK) to determine when a complete message has arrived. Thrashing in a sea of lameness.

Reason: A stream socket (TCP) does not preserve message boundaries (see item 20). An application that uses recv(MSG_PEEK) or ioctlsocket(FIONREAD) to wait for a complete message to arrive may never succeed. One reason might be the service provider’s internal buffering: if the bytes in a "message" straddle a system buffer boundary, the Winsock implementation may never report the bytes that exist in other buffers.

Alternative: Don’t use peek reads. Always read data into your application buffers, and examine the data there.
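
As an illustration of examining data in your own buffer, here is a sketch that assumes a hypothetical framing: each message is a 2-byte big-endian length followed by that many payload bytes. The struct rxbuf type and the rxbuf_append() and rxbuf_take() helpers are made up for this example. Bytes from each recv() are appended, and a message is only handed out once all of it is in the buffer, no matter how the stream chopped it up.

```c
#include <stddef.h>
#include <string.h>

/* Application-side receive buffer (hypothetical). */
struct rxbuf {
    unsigned char data[4096];
    size_t len;                      /* bytes currently buffered */
};

/* Append whatever recv() returned. Returns 0, or -1 if it will not fit. */
int rxbuf_append(struct rxbuf *b, const unsigned char *p, size_t n)
{
    if (n > sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, p, n);
    b->len += n;
    return 0;
}

/* If a complete message is buffered, copy its payload to `out`
   (capacity `cap`), remove it from the buffer, and return its length.
   Returns -1 if more bytes are needed, or -2 if the message can never
   be delivered (too big for `out` or for the buffer itself). */
long rxbuf_take(struct rxbuf *b, unsigned char *out, size_t cap)
{
    size_t msglen;

    if (b->len < 2)
        return -1;                                  /* header incomplete */
    msglen = ((size_t)b->data[0] << 8) | b->data[1];
    if (msglen > cap || 2 + msglen > sizeof b->data)
        return -2;
    if (b->len < 2 + msglen)
        return -1;                                  /* payload incomplete */
    memcpy(out, b->data + 2, msglen);
    memmove(b->data, b->data + 2 + msglen, b->len - 2 - msglen);
    b->len -= 2 + msglen;
    return (long)msglen;
}
```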

13. Passing a longer buffer length than the actual buffer size since you know you won’t receive more than the actual buffer size. Universally lame.

Reason: Winsock implementations often check buffers for readability or writability before using them, to avoid Protection Faults. When the length you pass is longer than the actual buffer, this check will fail, so the function call will fail with WSAEFAULT.

Alternative: Always pass a legitimate buffer length.

14. Bounding every set of operations with calls to WSAStartup() and WSACleanup(). Pushing the lameness envelope.

Reason: This is not illegal, as long as each WSAStartup() has a matching call to WSACleanup(), but it is more work than necessary.

Alternative: In a DLL, custom control or class library, it is possible to register the calling client based on a unique task handle or process ID. This allows automatic registration without duplication. Automatic de-registration can occur when a process closes its last socket. This is even easier if you use the process notification mechanisms available in the 32-bit environment.
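
A simpler cousin of that idea, sketched here with hypothetical names (net_acquire(), net_release()), is a per-process usage count: the library calls WSAStartup() for its first user and WSACleanup() when its last user goes away, instead of bracketing every operation. This sketch is not thread-safe; a real library would guard the counter with a lock. On non-Windows builds the Winsock calls are simply compiled out.

```c
#ifdef _WIN32
#include <winsock2.h>
#endif

/* Hypothetical per-process usage count (not thread-safe). */
static int net_users = 0;

/* Call WSAStartup() only for the first user in this process.
   Returns 0 on success, -1 if Winsock could not be initialized. */
int net_acquire(void)
{
    if (net_users == 0) {
#ifdef _WIN32
        WSADATA wsa;
        if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
            return -1;
#endif
    }
    net_users++;
    return 0;
}

/* Call WSACleanup() only when the last user releases.
   Extra releases are ignored. */
void net_release(void)
{
    if (net_users > 0 && --net_users == 0) {
#ifdef _WIN32
        WSACleanup();
#endif
    }
}
```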

15. Ignoring API errors. Glaringly lame.

Reason: Error values are your friends! When a function fails, the error value returned by WSAGetLastError() or included in an asynchronous message can tell you why it failed. Based on the function that failed, and the socket state, you can often infer what happened, why, and what to do about it.

Alternative: Check for error values, write your applications to anticipate them, and handle them gracefully when appropriate. When a fatal error occurs, always display an error message that shows:
- the function that failed
- the Winsock error number and/or macro name
- a short description of the error meaning
- suggestions for how to remedy, when possible
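
A sketch of such a message builder follows. The handful of error codes in the table are real Winsock values, but the descriptions are paraphrased, and format_wsa_error() is a hypothetical name. In a real program err would come from WSAGetLastError() immediately after the call that failed.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* A few Winsock error codes (values as defined in winsock2.h) for
   illustration; a real table would cover the full list. */
static const struct {
    int code;
    const char *name;
    const char *desc;
} wsa_errors[] = {
    { 10004, "WSAEINTR",        "Blocking call interrupted" },
    { 10014, "WSAEFAULT",       "Bad address (check buffer pointers and lengths)" },
    { 10022, "WSAEINVAL",       "Invalid argument" },
    { 10035, "WSAEWOULDBLOCK",  "Operation would block" },
    { 10048, "WSAEADDRINUSE",   "Address already in use" },
    { 10054, "WSAECONNRESET",   "Connection reset by peer" },
    { 10060, "WSAETIMEDOUT",    "Connection timed out" },
    { 10061, "WSAECONNREFUSED", "Connection refused" },
};

/* Build a message naming the failed function, the error number and
   macro, and a short description. `hint` (may be NULL) is a remedy
   suggestion. Returns what snprintf() returns. */
int format_wsa_error(char *buf, size_t cap, const char *func, int err,
                     const char *hint)
{
    const char *name = "unknown";
    const char *desc = "no description available";

    for (size_t i = 0; i < sizeof wsa_errors / sizeof wsa_errors[0]; i++) {
        if (wsa_errors[i].code == err) {
            name = wsa_errors[i].name;
            desc = wsa_errors[i].desc;
            break;
        }
    }
    return snprintf(buf, cap, "%s() failed: error %d (%s): %s.%s%s",
                    func, err, name, desc,
                    hint ? " " : "", hint ? hint : "");
}
```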

16. Calling recv(MSG_PEEK) in response to an FD_READ async notification message. Profoundly lame.

Reason: It’s redundant. It’s redundant.

Alternative: Make a plain recv() call in response to an FD_READ message. Even if it fails with WSAEWOULDBLOCK, that error is easy to ignore, and since the recv() call re-enables FD_READ notification, you are guaranteed to get another FD_READ message later whenever data is pending.

17. Installing an empty blocking hook that just returns FALSE. Floundering in an endless desert of lameness.

Reason: One of the primary purposes of the blocking hook function was to provide a mechanism for an application with a pending blocking operation to yield. By returning FALSE from the blocking hook function, you defeat this purpose and your application will prevent multitasking in the non-preemptive 16-bit Windows environment. This may also prevent some Winsock implementations from completing the pending network operation.

Alternative: Typically this hack is done to try to prevent reentrant messages. There are better ways to do this, like subclassing the active window, although, admittedly, reentrant messages are not an easy problem to avoid.

Note that this is not an issue for Winsock 2 applications, since blocking hooks are now a thing of the past! (Good riddance.)

18. Client applications that bind to a specific port. Suffocating in self lameness.

Reason: By definition, client applications actively initiate a network communication, unlike server applications, which passively wait for communication. A server must bind() to a specific port which is known to clients that need to use the service; however, a client need not bind() its socket to a specific port in order to communicate with a server.

Binding a client to a specific port number is not only unnecessary for all but a very few application protocols, it is also dangerous: the port may conflict with another socket that is already using it, which would cause the call to bind() to fail with WSAEADDRINUSE.

Alternative: Simply let the Winsock implementation assign the local port number implicitly when you call connect() (on stream or datagram sockets), or sendto() (on datagram sockets).
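
A minimal sketch of a client that skips bind() entirely, assuming IPv4; connect_client() is a hypothetical name, and the #else branch only exists so the snippet builds with POSIX sockets. After connect() succeeds, getsockname() on the socket shows the local port the stack picked. On Windows, WSAStartup() must have been called first.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
typedef int SOCKET;                 /* POSIX stand-ins for Winsock names */
#define INVALID_SOCKET (-1)
#define SOCKET_ERROR   (-1)
#define closesocket close
#endif
#include <string.h>

/* Connect a TCP client to ip:port (both in host byte order) WITHOUT
   calling bind() first; the stack assigns an unused local port. */
SOCKET connect_client(unsigned long ip, unsigned short port)
{
    struct sockaddr_in server;
    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (s == INVALID_SOCKET)
        return INVALID_SOCKET;

    memset(&server, 0, sizeof server);
    server.sin_family      = AF_INET;
    server.sin_addr.s_addr = htonl(ip);
    server.sin_port        = htons(port);

    /* Note: no bind() call. connect() picks the local address and port. */
    if (connect(s, (struct sockaddr *)&server, sizeof server) == SOCKET_ERROR) {
        closesocket(s);
        return INVALID_SOCKET;
    }
    return s;
}
```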

19. Nagle challenged applications. Perilously teetering on the edge of a vast chasm of lameness.

Reason: The Nagle algorithm reduces trivial network traffic. In a nutshell, the algorithm says don’t send a TCP segment until either:
- all outstanding TCP segments have been acknowledged; or
- there’s a full TCP segment ready to send

A "Nagle challenged application" is one that cannot wait until either of these conditions occurs, but has such time-critical data that it must send continuously. This results in wasteful network traffic.

Alternative: Don’t write applications that depend on the immediate data echo from the remote TCP host.

20. Assuming stream sockets maintain message frame boundaries. Mind bogglingly lame.

Reason: Stream sockets (TCP) are called stream sockets because they provide data streams (duh). As such, the largest message size an application can ever depend on is one byte in length. No more, no less. This means that with any call to send() or recv(), the Winsock implementation may transfer any number of bytes up to the buffer length specified.

Alternative: Whether you use a blocking or non-blocking socket, on success you should always compare the return from send() or recv() with the value you expected. If it is less than you expected, you need to adjust the buffer length, and pointer, for another function call (which may occur asynchronously, if you are using asynchronous operation mode).
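
The adjust-length-and-pointer loop is short enough to show in full. These sketches (send_all() and recv_all() are hypothetical names) assume blocking sockets; with non-blocking sockets you would also have to handle WSAEWOULDBLOCK and resume when the next notification arrives. The #else branch is an assumption for POSIX portability.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;                 /* POSIX stand-ins for Winsock names */
#define SOCKET_ERROR (-1)
#define closesocket close
#endif
#include <string.h>

/* send() may accept fewer bytes than requested: keep calling it,
   advancing the pointer and shrinking the length, until all is sent.
   Returns 0 on success, -1 on error. */
int send_all(SOCKET s, const char *buf, int len)
{
    while (len > 0) {
        int n = send(s, buf, len, 0);
        if (n == SOCKET_ERROR)
            return -1;
        buf += n;
        len -= n;
    }
    return 0;
}

/* recv() may return fewer bytes than requested: loop until exactly
   `len` bytes arrive. Returns 0 on success, -1 on error or if the
   peer closed the connection first. */
int recv_all(SOCKET s, char *buf, int len)
{
    while (len > 0) {
        int n = recv(s, buf, len, 0);
        if (n == SOCKET_ERROR || n == 0)
            return -1;
        buf += n;
        len -= n;
    }
    return 0;
}
```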

21. 16-bit DLLs that call WSACleanup() from their WEP. Inconceivably lame.

Reason: WEP() is lame, ergo depending on it is lame. Seriously, 16-bit Windows did not guarantee that WEP() would always be called, and the Windows subsystem was often in such a hairy state that doing anything in WEP() was dangerous.

Alternative: Stay away from WEP().

22. Single byte send()s and recv()s. Festering in a pool of lameness.

Reason: Couple one-byte sends with Nagle disabled, and you have at best a 40:1 overhead-to-data ratio. Can you say wasted bandwidth? I thought you could.

As for one-byte receives, think of the effort and inefficiency involved with trying to drink a Guinness Stout through a hypodermic needle. That’s about how your application would feel "drinking" data one byte at a time.

Alternative: Consider Postel’s RFC 793 words to live by: "Be conservative in what you do, be liberal in what you accept from others." In other words, send modest amounts, and receive as much as possible.

23. select(). Self abusively lame.

Reason: Consider the steps involved in using select(). You need to use the macros to clear the 3 fd_sets, then set the appropriate fd_sets for each socket, then set the timer, then call select().

Then after select() returns with the number of sockets that have done something, you need to go through all the fd_sets and all the sockets using the macros to find the event that occurred, and even then the (lack of) resolution is such that you need to infer the event from the current socket state.

Alternative: Use asynchronous operation mode (e.g. WSAAsyncSelect() or WSAEventSelect()).

24. Applications that call gethostbyname() before calling inet_addr(). Words fail to express such all-consuming lameness.

Reason: Some users prefer to use network addresses rather than hostnames at times. The Winsock 1.1 specification does not say what gethostbyname() should do with an IP address in standard ASCII dotted IP notation. As a result, it may succeed and do an (unnecessary) reverse-lookup, or it may fail.

Alternative: With any destination input by a user (which may be a hostname or a dotted IP address), you should call inet_addr() first to check for an IP address, and if that fails, call gethostbyname() to try to resolve it.

Furthermore, in some applications you may want to explicitly check the input string for the broadcast address "255.255.255.255", since the value inet_addr() returns for this address is INADDR_NONE, the same value it returns on failure.
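
Here is one way to code that order of operations, as a sketch with a hypothetical resolve_ipv4() helper. It checks for the broadcast string first, then tries inet_addr(), and only then falls back to gethostbyname(). Newer code would typically use getaddrinfo(), which accepts both forms, but the sketch sticks to the functions this item discusses. Note the memcpy() of exactly one address (no strlen() anywhere). The #else branch is an assumption for POSIX portability.

```c
#ifdef _WIN32
#define _WINSOCK_DEPRECATED_NO_WARNINGS   /* inet_addr/gethostbyname are deprecated in newer SDKs */
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#endif
#include <string.h>

/* Turn user input (dotted IP or hostname) into an IPv4 address in
   network byte order. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *input, struct in_addr *out)
{
    unsigned long addr;
    struct hostent *he;

    /* inet_addr() returns INADDR_NONE for this string, which looks
       exactly like failure, so handle it explicitly. */
    if (strcmp(input, "255.255.255.255") == 0) {
        out->s_addr = INADDR_BROADCAST;
        return 0;
    }

    addr = inet_addr(input);                     /* dotted notation? */
    if (addr != INADDR_NONE) {
        out->s_addr = addr;
        return 0;
    }

    he = gethostbyname(input);                   /* hostname lookup */
    if (he == NULL || he->h_addrtype != AF_INET || he->h_addr_list[0] == NULL)
        return -1;
    memcpy(out, he->h_addr_list[0], sizeof *out); /* copy 4 bytes exactly */
    return 0;
}
```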

25. Win32 applications that install blocking hooks. Grossly lame.

Reason: Besides yielding to other applications (see item 17), blocking hook functions were originally designed to allow concurrent processing within a task while there was a blocking operation pending. In Win32, there’s threading.

Alternative: Use threads.

26. Polling with ioctlsocket(FIONREAD) on a stream socket until a complete "message" arrives. Exceeds the bounds of earthly lameness.

Reason and Alternative: See item 12.

27. Assuming that a UDP datagram of any length may be sent. Criminally lame.

Reason: Various networks all have their own limits on maximum transmission unit (MTU). As a result, large datagrams will be fragmented, and this increases the likelihood of a corrupted datagram (more pieces to lose or corrupt). Also, the TCP/IP service providers at the receiving end may not be capable of reassembling a large, fragmented datagram.

Alternative: Check for the maximum datagram size with the SO_MAX_MSG_SIZE socket option, and don’t send anything larger. Better yet, be even more conservative. A maximum of 8K is a good rule of thumb.

28. Assuming UDP transmissions (especially multicast transmissions) are reliable. Sinking in a morass of lameness.

Reason: UDP has no reliability mechanisms (that’s why we have TCP).

Alternative: Use TCP and keep track of your own message boundaries.

29. Applications that require vendor-specific extensions, and cannot run (or worse yet, load) without them. Stooping to unspeakable depths of lameness.

Reason: If you can’t figure out the reason, it’s time to hang up your keyboard.

Alternative: Have a fallback position that uses only base capabilities for when the extension functions are not present.

30. Expecting errors when UDP datagrams are dropped by the sender, receiver, or any router along the way. Seeping lameness from every crack and crevice.

Reason: UDP is unreliable. TCP/IP stacks don’t have to tell you when they throw your datagrams away (a sender or receiver may do this when it doesn’t have buffer space available, and a receiver will do it if it cannot reassemble a large fragmented datagram).

Alternative: Expect to lose datagrams, and deal. Implement reliability in your application protocol, if you need it (or use TCP, if your application allows it).
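
Implementing reliability usually starts with noticing what was lost. This sketch assumes a hypothetical application header in which the sender stamps each datagram with a 32-bit sequence number starting at 0; seq_observe() classifies each arrival. It covers detection only; acknowledgements and retransmission would be layered on top.

```c
#include <stdint.h>

/* Receiver-side state for the hypothetical sequence-number header. */
struct seq_tracker {
    uint32_t expected;   /* next sequence number we expect */
    uint32_t lost;       /* datagrams skipped over (possibly just late) */
};

enum seq_result { SEQ_IN_ORDER, SEQ_GAP, SEQ_OLD };

/* Classify an arriving datagram by its sequence number. UDP itself
   never reports drops, so gaps are the application's only signal. */
enum seq_result seq_observe(struct seq_tracker *t, uint32_t seq)
{
    if (seq == t->expected) {
        t->expected++;
        return SEQ_IN_ORDER;
    }
    /* Serial-number comparison so the counter can wrap around. */
    if ((int32_t)(seq - t->expected) > 0) {   /* ahead: some were dropped */
        t->lost += seq - t->expected;
        t->expected = seq + 1;
        return SEQ_GAP;
    }
    return SEQ_OLD;                           /* duplicate or late arrival */
}
```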

Copyright owned by the authors of the Lame List items, including, but not necessarily limited to, the people mentioned in the introductory matter at the beginning of this article.
