Winsock Programmer’s FAQ Articles: The Lame List
I have reproduced The Lame List here because it is so valuable. This text is cut-and-pasted directly from Appendix C of version 2.2.2 of the Windows Sockets 2 Application Programming Interface. The list originally started out as a list of complaints by Winsock stack vendors about wrongheaded applications created back when Winsock was new and not as well understood. Despite that, these items are still valuable because newbie Winsockers still make the same wrongheaded mistakes. Avoiding the items on this list will take you a long way along the road toward Winsock guruhood.
This version of the List is slightly different from the original: I have changed some punctuation, minor bits of phrasing, etc. And, of course, I have added all the pretty HTML formatting.
Keith Moore of Microsoft gets the credit for starting this, but other folks have begun contributing as well. Bob Quinn, from sockets.com, is the kind soul who provided the elaborations on why these things are lame and what to do instead. This is a snapshot of the list as we went to print (plus a few extras thrown in at the last minute).
Brought to you by The Windows Sockets Vendor Community
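1. Calling connect() on a non-blocking socket, getting WSAEWOULDBLOCK, then immediately calling recv() and expecting WSAEWOULDBLOCK before the connection has been established. Lame.
Reason: This assumes that the connection will never be established by the time the application calls recv(). Lame assumption.
Alternative: An application using a non-blocking socket must handle the WSAEWOULDBLOCK error value, but must not depend on occurrence of the error.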
2. Calling select() with three empty fd_sets and a valid TIMEOUT structure as a sleazy delay function. Inexcusably lame.
Reason: The select() function is intended as a network function, not a general purpose timer.
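3. Polling with connect() on a non-blocking socket to determine when the connection has been established. Dog lame.
Reason: The Winsock 1.1 specification does not define an error for connect() when a non-blocking connection is pending, so the error value returned may vary.
Alternative: Use asynchronous notification of connection completion, or the select() function (but see item 23).
… send() or recv() is even more lame than polling on connect().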
Alternative: Use a small non-zero timeout. Better yet, use asynchronous notification instead of using select().
6. Calling WSAAsyncSelect() with a zero Event mask just to make the socket non-blocking. Lame. Lame. Lame. Lame. Lame.
Reason: WSAAsyncSelect() is designed to allow an application to register for asynchronous notification of network events. The Winsock 1.1 specification didn’t specify an error for a zero event mask, but an implementation may interpret it as an invalid input argument (so it may fail with WSAEINVAL), or silently ignore the request.
Alternative: To make a socket non-blocking, use ioctlsocket(FIONBIO). That’s what it’s for.
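A minimal sketch of that alternative, under a couple of assumptions: set_nonblocking() is a hypothetical helper name, and the #else branch is only there so the sketch also builds against BSD-style sockets, where ioctl() stands in for ioctlsocket().

```c
#ifdef _WIN32
#include <winsock2.h>
typedef u_long nb_arg_t;        /* Winsock's ioctlsocket() takes a u_long* */
#else
#include <sys/ioctl.h>
#include <sys/socket.h>
typedef int SOCKET;
typedef int nb_arg_t;           /* BSD-style ioctl(FIONBIO) takes an int* */
#define ioctlsocket ioctl
#endif

/* Put socket s into non-blocking mode (enable != 0) or back into
 * blocking mode (enable == 0). Returns 0 on success, -1 on failure. */
int set_nonblocking(SOCKET s, int enable)
{
    nb_arg_t mode = enable ? 1 : 0;
    return ioctlsocket(s, FIONBIO, &mode) == 0 ? 0 : -1;
}
```

Call it once, right after creating the socket. No window handle or event mask is involved, so there is nothing for the stack to misinterpret.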
7. Telnet applications that neither enable SO_OOBINLINE, nor read OOB data. Violently lame.
Reason: It is not uncommon for Telnet servers to generate urgent data, such as when a Telnet client sends a Telnet BREAK command or Interrupt Process command. The server then employs a "Synch" mechanism which consists of a TCP Urgent notification coupled with the Telnet DATA MARK command. If the Telnet client doesn’t read the urgent data, then it won’t get any more normal data. Not ever, ever, ever, ever.
Alternative: Every Telnet client should be able to read and/or detect OOB data. They should either enable inline OOB data by calling setsockopt(SO_OOBINLINE), or use WSAAsyncSelect() (or WSAEventSelect()) with FD_OOB or select() using except_fds to detect OOB data arrival, and call recv()/WSARecv() with MSG_OOB in response.
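The first of those options is nearly a one-liner. This is only a sketch: enable_oob_inline() is a hypothetical helper name, and the #else branch is there so it also compiles against BSD-style sockets.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
typedef int SOCKET;
#endif

/* Ask the stack to deliver urgent (OOB) data in the normal data stream,
 * so that ordinary recv() calls consume it along with everything else.
 * Returns 0 on success, -1 on failure. */
int enable_oob_inline(SOCKET s)
{
    int on = 1;
    return setsockopt(s, SOL_SOCKET, SO_OOBINLINE,
                      (const char *)&on, sizeof(on)) == 0 ? 0 : -1;
}
```

With the option set, the client keeps reading the stream as usual and never leaves urgent data unread.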
Reason and Alternative: See item 4.
Reason: Winsock applications that don’t close their sockets and call WSACleanup() may not allow a Winsock implementation to reclaim resources used by the application. Resource leakage can eventually result in resource starvation by all other Winsock applications (i.e. network system failure).
Alternative: While a blocking API is in progress in a 16-bit Winsock 1.1 application, the proper way to abort is to:
  - Call WSACancelBlockingCall().
  - Wait for the pending call to return. It normally fails with the WSAEINTR error, but applications must also be prepared for success, due to the race condition involved with cancellation.
  - Call shutdown() with the how equal to 1.
  - Call recv() until it returns 0 or fails with any error.
  - Call closesocket().
  - Call WSACleanup().
This procedure is not relevant to 32-bit Winsock 2 applications, since they really block, so calling WSACancelBlockingCall() from the same thread is impossible. (Therefore, this call is deprecated under Winsock 2.) However, the shutdown procedure above is still useful.
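The socket-closing part of that procedure (shutdown(), drain with recv(), closesocket()) can be sketched as below. graceful_close() is a hypothetical helper name, and the #else branch only exists so the sketch also builds against BSD-style sockets. Note that on a blocking socket the recv() loop waits until the peer closes its side.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;
#define closesocket close
#endif

/* Gracefully close a connected stream socket:
 *   1. shutdown() with how == 1, so the peer sees end-of-stream,
 *   2. recv() until it returns 0 (peer closed) or fails with any error,
 *   3. closesocket().
 * Returns whatever closesocket() returns (0 on success). */
int graceful_close(SOCKET s)
{
    char discard[512];

    shutdown(s, 1);                 /* 1 == SD_SEND (SHUT_WR on BSD) */
    while (recv(s, discard, sizeof(discard), 0) > 0) {
        /* Drain and ignore whatever the peer still had in flight. */
    }
    return closesocket(s);
}
```

An application would call this for every open socket and only then call WSACleanup().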
Reason: TCP can’t do Out of Band (OOB) data reliably. If that isn’t enough, there are incompatible differences in the implementation at the protocol level (in the urgent pointer offset). Berkeley (BSD) Unix implements RFC 793 literally, and many others implement the corrected RFC 1122 version. (Some versions also allow multiple OOB data bytes by using the start of the MAC frame as the starting point for the offset.) If two TCP hosts have different OOB versions, they cannot send OOB data to each other.
Alternative: Ideally, use a separate socket for urgent data, although in reality OOB data is sometimes inescapable. Some protocols require it (see item 7), in which case you need to minimize your dependence, or beef up your technical support staff to handle user calls.
11. Calling strlen() on a hostent structure’s IP address, then truncating it to four bytes, thereby overwriting part of malloc()’s heap header. In all my years of observing lameness, I have seldom seen something this lame.
Reason: This doesn’t really need a reason, does it?
Alternative: Clearly, the only alternative is a brain transplant.
12. Polling with recv(MSG_PEEK) to determine when a complete message has arrived. Thrashing in a sea of lameness.
Reason: A stream socket (TCP) does not preserve message boundaries (see item 20). An application that uses recv(MSG_PEEK) or ioctlsocket(FIONREAD) to wait for a complete message to arrive may never succeed. One reason might be the service provider’s internal buffering; if the bytes in a "message" straddle a system buffer boundary, the Winsock implementation may never report the bytes that exist in other buffers.
Alternative: Don’t use peek reads. Always read data into your application buffers, and examine the data there.
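One way to follow that advice is to append whatever recv() returns to a buffer you own, then pull complete messages out of it. The sketch below assumes a framing that is not part of the original list (a 2-byte big-endian length followed by the payload), and the frame_buf names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Big enough for one maximum-size message: 2-byte header + 65535 bytes. */
#define FRAME_BUF_SIZE (2 + 65535)

/* Application-side buffer that accumulates raw bytes from recv(). */
typedef struct {
    unsigned char data[FRAME_BUF_SIZE];
    size_t        used;             /* set to 0 before first use */
} frame_buf;

/* Append the n bytes that recv() just returned.
 * Returns 0 on success, -1 if they do not fit. */
int frame_buf_append(frame_buf *fb, const void *bytes, size_t n)
{
    if (n > FRAME_BUF_SIZE - fb->used)
        return -1;
    memcpy(fb->data + fb->used, bytes, n);
    fb->used += n;
    return 0;
}

/* If a complete message is buffered, copy its payload to out, remove it
 * from the buffer and return its length (which may be 0). Returns -1 if
 * more bytes are needed, -2 if the payload does not fit in out. */
int frame_buf_next(frame_buf *fb, unsigned char *out, size_t out_size)
{
    size_t len;

    if (fb->used < 2)
        return -1;                  /* length header not complete yet */
    len = ((size_t)fb->data[0] << 8) | fb->data[1];
    if (fb->used < 2 + len)
        return -1;                  /* payload not complete yet */
    if (len > out_size)
        return -2;
    memcpy(out, fb->data + 2, len);
    fb->used -= 2 + len;
    memmove(fb->data, fb->data + 2 + len, fb->used);  /* keep the leftovers */
    return (int)len;
}
```

After each successful recv(), call frame_buf_append() and then loop on frame_buf_next() until it returns -1. The socket is only ever read with plain recv() calls, and all the "is the message complete yet?" logic lives in your own memory.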
Reason: Winsock implementations often check buffers for readability or writability before using them to avoid Protection Faults. When the buffer length passed is longer than the actual buffer, this check will fail, so the function call will fail with WSAEFAULT.
Alternative: Always pass a legitimate buffer length.
14. Bounding every set of operations with calls to WSAStartup() and WSACleanup(). Pushing the lameness envelope.
Reason: This is not illegal, as long as each WSAStartup() has a matching call to WSACleanup(), but it is more work than necessary.
Alternative: In a DLL, custom control or class library, it is possible to register the calling client based on a unique task handle or process ID. This allows automatic registration without duplication. Automatic de-registration can occur when a process closes its last socket. This is even easier if you use the process notification mechanisms available in the 32-bit environment.
Reason: Error values are your friends! When a function fails, the error value returned by WSAGetLastError() or included in an asynchronous message can tell you why it failed. Based on the function that failed, and the socket state, you can often infer what happened, why, and what to do about it.
Alternative: Check for error values, and write your applications to anticipate them, and handle them gracefully when appropriate. When a fatal error occurs, always display an error message that shows:
16. Calling recv(MSG_PEEK) in response to an FD_READ async notification message. Profoundly lame.
Reason: It’s redundant. It’s redundant.
Alternative: Make a plain recv() call in response to an FD_READ message. Even if it fails with WSAEWOULDBLOCK, that error is easy to ignore, and you are guaranteed to get another FD_READ message later since there is data pending.
17. Installing a blocking hook that just returns FALSE. Floundering in an endless desert of lameness.
Reason: One of the primary purposes of the blocking hook function was to provide a mechanism for an application with a pending blocking operation to yield. By returning FALSE from the blocking hook function, you defeat this purpose and your application will prevent multitasking in the non-preemptive 16-bit Windows environment. This may also prevent some Winsock implementations from completing the pending network operation.
Alternative: Typically this hack is done to try to prevent reentrant messages. There are better ways to do this, like subclassing the active window, although, admittedly, reentrant messages are not an easy problem to avoid.
Note that this is not an issue for Winsock 2 applications, since blocking hooks are now a thing of the past! (Good riddance.)
Reason: By definition, client applications actively initiate a network communication, unlike server applications which passively wait for communication. A server must bind() to a specific port which is known to clients that need to use the service; however, a client need not bind() its socket to a specific port in order to communicate with a server.
Not only is it unnecessary for all but a very few application protocols, it is dangerous for a client to bind() to a specific port number. There is a danger in conflicting with another socket that is already using the port number, which would cause the call to bind() to fail with WSAEADDRINUSE.
Alternative: Simply let the Winsock implementation assign the local port number implicitly when you call connect() (on stream or datagram sockets), or sendto() (on datagram sockets).
Reason: The Nagle algorithm reduces trivial network traffic. In a nutshell, the algorithm says don’t send a TCP segment until either:
  - all outstanding data has been acknowledged, or
  - there is a full segment’s worth (MSS) of data to send.
A "Nagle challenged application" is one that cannot wait until either of these conditions occurs, but has such time-critical data that it must send continuously. This results in wasteful network traffic.
Alternative: Don’t write applications that depend on the immediate data echo from the remote TCP host.
Reason: Stream sockets (TCP) are called stream sockets, because they provide data streams (duh). As such, the largest message size an application can ever depend on is one byte in length. No more, no less. This means that with any call to send() or recv(), the Winsock implementation may transfer any number of bytes less than the buffer length specified.
Alternative: Whether you use a blocking or non-blocking socket, on success you should always compare the return from send() or recv() with the value you expected. If it is less than you expected, you need to adjust the buffer length, and pointer, for another function call (which may occur asynchronously, if you are using asynchronous operation mode).
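For a blocking socket, the "adjust the buffer length, and pointer, for another function call" advice boils down to a short loop. This is a sketch with a hypothetical helper name, send_all(); the #else branch is only there so it also builds against BSD-style sockets. On a non-blocking socket you would resume from the same offset when the stack tells you it can accept more data, rather than looping.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
typedef int SOCKET;
#endif

/* Send exactly len bytes on a blocking stream socket, calling send()
 * again with an adjusted pointer and length after each partial transfer.
 * Returns len on success, or -1 if send() fails or makes no progress. */
int send_all(SOCKET s, const char *buf, int len)
{
    int sent = 0;

    while (sent < len) {
        int n = (int)send(s, buf + sent, len - sent, 0);
        if (n <= 0)             /* SOCKET_ERROR, or nothing was accepted */
            return -1;
        sent += n;              /* may be less than what we asked for */
    }
    return len;
}
```

The receiving side needs the same treatment: keep calling recv() and advancing the pointer until you have as many bytes as the application protocol says the message contains.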
21. 16-bit DLLs that call WSACleanup() from their WEP. Inconceivably lame.
Reason: WEP() is lame, ergo depending on it is lame. Seriously, 16-bit Windows did not guarantee that WEP() would always be called, and the Windows subsystem was often in such a hairy state that doing anything in WEP() was dangerous.
Alternative: Stay away from WEP().
22. Single-byte send()s and recv()s. Festering in a pool of lameness.
Reason: Couple one-byte sends with Nagle disabled, and you have at best a 40:1 overhead-to-data ratio. Can you say wasted bandwidth? I thought you could.
As for one-byte receives, think of the effort and inefficiency involved with trying to drink a Guinness Stout through a hypodermic needle. That’s about how your application would feel "drinking" data one byte at a time.
Alternative: Consider Postel’s RFC 793 words to live by: "Be conservative in what you do, be liberal in what you accept from others." In other words, send modest amounts, and receive as much as possible.
23. select(). Self abusively lame.
Reason: Consider the steps involved in using select(). You need to use the macros to clear the 3 fd_sets, then set the appropriate fd_sets for each socket, then set the timer, then call select().
Then after select() returns with the number of sockets that have done something, you need to go through all the fd_sets and all the sockets using the macros to find the event that occurred, and even then the (lack of) resolution is such that you need to infer the event from the current socket state.
Alternative: Use asynchronous operation mode (e.g. WSAAsyncSelect() or WSAEventSelect()).
24. Calling gethostbyname() before calling inet_addr(). Words fail to express such all-consuming lameness.
Reason: Some users prefer to use network addresses rather than hostnames at times. The Winsock 1.1 specification does not say what gethostbyname() should do with an IP address in standard ASCII dotted IP notation. As a result, it may succeed and do an (unnecessary) reverse-lookup, or it may fail.
Alternative: With any destination input by a user (which may be a hostname or dotted IP address), you should call inet_addr() first to check for an IP address, and if that fails call gethostbyname() to try to resolve it.
Furthermore, in some applications, you may want to explicitly check the input string for the broadcast address "255.255.255.255", since the return value from inet_addr() for this address is the same as SOCKET_ERROR.
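That lookup order can be sketched as follows. resolve_ipv4() is a hypothetical helper name, the sketch handles IPv4 only, and under Winsock it assumes WSAStartup() has already succeeded. The #else branch is only there so it also builds against BSD-style sockets.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#endif
#include <string.h>

/* Resolve a user-supplied destination that may be a dotted IP address
 * or a hostname. On success stores the IPv4 address (network byte
 * order) in *addr and returns 0; returns -1 if it cannot be resolved. */
int resolve_ipv4(const char *dest, struct in_addr *addr)
{
    unsigned long ip;
    struct hostent *he;

    /* inet_addr() returns INADDR_NONE for the broadcast address as
     * well as for garbage, so check for that string explicitly. */
    if (strcmp(dest, "255.255.255.255") == 0) {
        addr->s_addr = INADDR_BROADCAST;
        return 0;
    }

    ip = inet_addr(dest);           /* try dotted notation first */
    if (ip != INADDR_NONE) {
        addr->s_addr = ip;
        return 0;
    }

    he = gethostbyname(dest);       /* not an IP address: try a lookup */
    if (he == NULL || he->h_addrtype != AF_INET)
        return -1;
    memcpy(addr, he->h_addr_list[0], sizeof(*addr));  /* 4 bytes, no strlen() */
    return 0;
}
```

The name lookup only happens when the input is not already an address, so a user who types a dotted IP address never pays for (or is failed by) a resolver round trip.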
Reason: Besides yielding to other applications (see item 17), blocking hook functions were originally designed to allow concurrent processing within a task while there was a blocking operation pending. In Win32, there’s threading.
Alternative: Use threads.
26. Polling with ioctlsocket(FIONREAD) on a stream socket until a complete "message" arrives. Exceeds the bounds of earthly lameness.
Reason and Alternative: See item 12.
Reason: Every network has its own limit on maximum transmission unit (MTU). As a result, a large datagram will be fragmented, and this increases the likelihood of a corrupted datagram (more pieces to lose or corrupt). Also, the TCP/IP service providers at the receiving end may not be capable of re-assembling a large, fragmented datagram.
Alternative: Check for the maximum datagram size with the SO_MAX_MSG_SIZE socket option, and don’t send anything larger. Better yet, be even more conservative. A max of 8K is a good rule-of-thumb.
Reason: UDP has no reliability mechanisms (that’s why we have TCP).
Alternative: Use TCP and keep track of your own message boundaries.
Reason: If you can’t figure out the reason, it’s time to hang up your keyboard.
Alternative: Have a fallback position that uses only base capabilities for when the extension functions are not present.
Reason: UDP is unreliable. TCP/IP stacks don’t have to tell you when they throw your datagrams away (a sender or receiver may do this when it doesn’t have buffer space available, and a receiver will do it if it cannot reassemble a large fragmented datagram).
Alternative: Expect to lose datagrams, and deal. Implement reliability in your application protocol, if you need it (or use TCP, if your application allows it).
Copyright owned by the authors of the Lame List items, including, but not necessarily limited to, the people mentioned in the introductory matter at the beginning of this article.
Updated Fri Dec 16 2022 12:23 MST