The TCP/IP Protocol Model


Many works on TCP/IP networking begin with a discussion of the seven-layer Open Systems Interconnection (OSI) model and then map the TCP/IP model to the OSI model. This approach fails because TCP/IP does not neatly map into the OSI model and because many application models do not map to the OSI application layers. If the OSI model fails to correctly reflect the nature of TCP/IP, what is the best model? Over the years, different authors have described TCP/IP with three-, four-, and five-layer models. Because it most accurately reflects the nature of TCP/IP, the following discussion of TCP/IP uses the four-layer model as depicted in Figure 1.4.

Figure 1.4 : The four-layer TCP/IP model.

The TCP/IP protocols have their roots in the Network Control Program (NCP) protocol of the early ARPAnet. In the early 1970s, the Transmission Control Protocol (TCP) emerged without the IP layer. Thus the three-layer model, consisting of the physical layer, transport layer, and application layer, was created. By the time OSI published its seven-layer model in 1977, IP was a separate layer, the Network layer. Some publications add a Data Link layer between the Network layer and Physical layer. However, such an expansion is not necessary because the TCP/IP model treats the Physical and Application layers as open-ended, thereby enabling each layer to have several different models. The following sections explore the layers of the TCP/IP model in more depth.

In addition to describing the flow of information, the four-layer model (physical, network, transport, and application) also organizes the large number of protocols into a structure that makes them easier to understand. In Internet jargon, a Request for Comments (RFC) describes a protocol. Today, more than 1,900 RFCs describe protocols, provide information, or serve as tutorials. In part, the success of the Internet is a result of the open nature of its organization. The Internet Activities Board (IAB) provides overall direction and management (see Figure 1.5 for the organizational chart), and task forces manage the development of new protocols. The Internet Research Task Force (IRTF) deals with the long-term direction of the Internet, and the Internet Engineering Task Force (IETF) handles implementation and engineering problems. The Internet Engineering Steering Group (IESG) coordinates the activities of the many working groups. Membership is not controlled by any organization: anyone who so desires can participate in a working group, and anyone can submit an RFC. In many ways, the Internet is a grassroots organization that is beyond the control of any single organization or government.

Figure 1.5 : The organization of the Internet Activities Board.

To get the latest status of the Internet or to obtain a copy of an RFC, the fastest route is via http://www.internic.net. In particular, RFC 1920 (or its replacement) describes the organization of the IAB in more detail, provides the status of RFCs, and gives instructions on how to submit an RFC.

The Physical Layer

As the lowest layer in the stack, the Physical layer has the task of transmitting the IP datagram over the physical media. Unlike the other layers, this layer has specific knowledge of the underlying network. As a result, detailed models of this layer depend on the transmission protocol being used. The sections “LAN Topologies” and “Internetworking-Linking LANs Together” explain these protocols in more detail.

In general, the Physical layer

  • Encapsulates the IP datagram into the frames that are transmitted by the network
  • Maps IP addresses to the physical addresses used by the network (the MAC address for Ethernet, token ring, and FDDI)
  • Performs the operations necessary to transmit the frame over a particular medium (such as thick cable, thin cable, telephone wire, or optical fiber)
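To make the first bullet concrete, the following Python sketch wraps an IP datagram in an Ethernet II frame: a 6-octet destination MAC address, a 6-octet source MAC address, and a 2-octet type field, followed by the datagram itself. The MAC addresses and payload are hypothetical placeholders, and a real interface would also append a frame check sequence.

import struct

ETHERTYPE_IPV4 = 0x0800                       # type value that marks an IP payload

def ethernet_frame(dst_mac, src_mac, ip_datagram):
    # 14-octet Ethernet II header: destination MAC, source MAC, type
    header = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IPV4)
    return header + ip_datagram               # the frame the network transmits

frame = ethernet_frame(b"\xff" * 6,           # broadcast destination
                       b"\x00\x11\x22\x33\x44\x55",
                       b"...IP datagram...")
print(len(frame))                             # 14 octets of header plus the payload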

The Network Layer

The Network layer is sometimes called the Internetworking layer, which is a more descriptive term for this layer’s main function. Three protocols are defined in this layer: the Internet Protocol, Internet Control Message Protocol (ICMP), and the Address Resolution Protocol (ARP). Of these, IP is the most important protocol.

Internet Protocol

RFC 791 defines IP as a connectionless and unreliable protocol. It is connectionless because a connection to the destination host does not have to be made prior to sending an IP datagram. The header of the IP datagram contains sufficient information for it to be transported to its destination over either connectionless or connection-oriented transmission protocols. It is an unreliable protocol because it performs no error detection or recovery on the data it carries. All error checking, if required, is the responsibility of a higher layer protocol.
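To illustrate what "sufficient information" means, here is a minimal Python sketch that unpacks the fixed 20-octet header RFC 791 defines. Everything needed to deliver the datagram, including the source and destination addresses, travels in the datagram itself; the field layout is the standard one, but the sketch ignores IP options.

import struct

def parse_ipv4_header(datagram):
    # The fixed portion of the RFC 791 header is 20 octets
    (v_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, cksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", datagram[:20])
    return {
        "version": v_ihl >> 4,
        "header_length": (v_ihl & 0x0F) * 4,   # IHL counts 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,                      # 1 = ICMP, 6 = TCP, 17 = UDP
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
    }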

What then is the purpose of IP? As the heart of the Internet, the functions of IP include

  • Creating a virtual network for the user
  • Performing fragmentation and reassembly of datagrams
  • Routing datagrams

By creating a virtual network, IP hides the Physical layer and its underlying subnetwork from the end user. The user application needs to know only the destination IP address; IP hides the mechanics of delivery. It does this by fragmenting the packet received from the Transport layer to fit the Protocol Data Unit (PDU) of the transmission protocol. For example, the PDU for Ethernet is 1,500 octets; for SNAP, it is 1,492 octets; for X.25, it is 128 to 256 octets. IP makes this fragmentation decision for each transmission protocol encountered on the datagram's journey; only at the final destination does IP reassemble the fragments into a packet for the Transport layer.
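The following toy Python fragment shows the effect of the PDU sizes just mentioned. It is only a sketch: real IP fragmentation copies the header into each fragment and counts offsets in 8-octet units, details omitted here.

def fragment(payload, pdu_size):
    # Split a Transport-layer packet into pieces that fit the PDU
    return [payload[i:i + pdu_size] for i in range(0, len(payload), pdu_size)]

packet = bytes(4000)                  # a hypothetical 4,000-octet packet
for pdu in (1500, 1492, 256):         # Ethernet, SNAP, X.25 sizes from the text
    print(pdu, [len(f) for f in fragment(packet, pdu)])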

IP also handles routing of the datagram to its destination. IP accomplishes this by passing to the Physical layer the IP address of the next hop, where a hop is the distance between a device and a gateway or the distance between two gateways. The following are the rules IP uses for determining the next hop:

  • For IP addresses on a local network, send the datagram directly to the host.
  • For other addresses, check the routing table for the gateway IP address toward the destination network.
  • For all other addresses, send the datagram to the default gateway.

A gateway is any device that connects two or more networks. The gateway then follows the same rules to make a routing decision. Because gateways make the routing decisions, the movement of the datagram from source to destination is independent of any transmission protocol. In addition, each datagram contains sufficient information to enable it to follow a separate path to the destination.
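The three next-hop rules translate almost directly into code. The following Python sketch uses the standard ipaddress module and a hypothetical routing table; real IP implementations keep this table in the protocol stack, not in application code.

import ipaddress

LOCAL_NET = ipaddress.ip_network("192.168.1.0/24")               # directly attached
ROUTES = {ipaddress.ip_network("10.0.0.0/8"): "192.168.1.254"}   # toward other networks
DEFAULT_GATEWAY = "192.168.1.1"

def next_hop(destination):
    addr = ipaddress.ip_address(destination)
    if addr in LOCAL_NET:                  # rule 1: deliver directly to the host
        return destination
    for network, gateway in ROUTES.items():
        if addr in network:                # rule 2: routing-table entry
            return gateway
    return DEFAULT_GATEWAY                 # rule 3: default gateway

print(next_hop("192.168.1.7"))     # direct delivery
print(next_hop("10.1.2.3"))        # via 192.168.1.254
print(next_hop("198.51.100.9"))    # via the default gateway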

Internet Control Message Protocol (ICMP)

ICMP (RFC 792) performs the error reporting, flow control, and informational functions for IP. Following are the major characteristics of ICMP:

  • ICMP data units are encapsulated by IP for submission to the Physical layer.
  • ICMP is a required protocol.
  • ICMP does not make IP reliable, because it only reports errors.
  • ICMP reports errors on IP datagrams but not on ICMP data units.
  • ICMP reports an error only on the first fragment of a fragmented IP datagram.
  • ICMP is not required to report errors on datagrams.

ICMP reports the following types of messages:

  • ICMP sends a source quench message to the host or gateway to indicate that the IP buffers are full. However, if the message flows through a gateway, the originating host does not receive the message. This ICMP message goes only to the gateway that is one hop away.
  • ICMP sends a destination unreachable message to the originating host when the network is unreachable, the host is unreachable, the protocol is unavailable, or the port is unavailable. (See the section, “The Transport Layer,” for an explanation of ports.)
  • A gateway sends a redirection message to the originating host to tell it to use another gateway.
  • An echo request message can be sent to an IP address to verify that IP is running. The destination responds with an echo reply message. The Ping command uses this type of ICMP message.
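As a sketch of the last message type, the following Python fragment builds the echo request packet that Ping sends: type 8, code 0, a checksum, an identifier, and a sequence number. Actually transmitting it requires a raw socket and usually root privilege, so only packet construction is shown.

import struct

def internet_checksum(data):
    # One's-complement sum of 16-bit words, as IP and ICMP use
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def echo_request(ident, seq, payload=b"ping"):
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum 0 while summing
    cksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, cksum, ident, seq) + payload

print(echo_request(ident=1, seq=1).hex())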

Address Resolution Protocol

Address Resolution Protocol (ARP) and Reverse Address Resolution Protocol (RARP) present an interesting problem: to which layer should they be assigned? Because they are used only by the multinode transmission protocols (Ethernet, token ring, and FDDI), ARP and RARP arguably belong to the Physical layer. However, because the transmission protocol encapsulates their packets, they arguably belong to the Network layer. To keep the encapsulation boundary clear, I have included them in the discussion of the Network layer.

ARP (RFC 826) resolves IP addresses to MAC addresses (the “Ethernet LANs” section provides additional information about the MAC address) by broadcasting an ARP request to the attached LAN. The device that recognizes the IP address as its own returns an ARP reply along with its MAC address. The result is stored in the ARP cache. On subsequent requests, the transmission protocol needs to check only the ARP cache. To allow for the dynamic nature of networks, the system removes any ARP entry in the cache that has not been used in the last 20 minutes.
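The cache behavior is easy to model. Here is a minimal Python sketch of an ARP cache with the 20-minute expiry described above; a real implementation lives in the protocol stack rather than in application code.

import time

ARP_TIMEOUT = 20 * 60                   # seconds: the 20-minute rule

class ArpCache:
    def __init__(self):
        self._entries = {}              # IP address -> (MAC address, last use)

    def add(self, ip, mac):
        self._entries[ip] = (mac, time.time())

    def lookup(self, ip):
        entry = self._entries.get(ip)
        if entry is None:
            return None                 # miss: broadcast an ARP request
        mac, last_used = entry
        if time.time() - last_used > ARP_TIMEOUT:
            del self._entries[ip]       # stale: force a fresh ARP request
            return None
        self._entries[ip] = (mac, time.time())   # refresh on use
        return mac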

RARP (RFC 903) performs a totally different function. If a device does not know its IP address (such as a diskless workstation), it broadcasts a RARP request asking for an IP address. A RARP server responds with the IP address.

The Transport Layer

The Transport layer provides two protocols that link the Application layer with the IP layer. TCP provides a reliable data delivery service with end-to-end error detection and correction. The User Datagram Protocol (UDP), on the other hand, provides an application with a connectionless datagram delivery service.

 

Applications can use TCP, UDP, or both. The needs of the application determine the choice of protocols.

 

For both TCP and UDP, the port number defines the interface between the Transport layer and the Application layer. The port number is a 16-bit value that identifies a particular application. The services file (/etc/services in UNIX) maps application names to port numbers and their associated protocols. Following is a sample of the contents of the services file:

# Format:
#
# <service name>    <port number>/<protocol>
#
echo        7/udp
echo        7/tcp
systat      11/tcp
netstat     15/tcp
ftp-data    20/tcp
ftp         21/tcp
telnet      23/tcp
smtp        25/tcp
time        37/tcp
time        37/udp

Port numbers between 1 and 255 are the “well-known services,” the port numbers reserved for common application protocols. The Internet Assigned Numbers Authority assigns the numbers for the well-known services, and RFC 1700 contains the current list of assigned numbers. Port numbers between 256 and 1024 previously referred to UNIX applications, although a number of these now exist on other platforms. BSD UNIX requires root permission to use port numbers below 1024. The remaining numbers are assigned to experimental applications and to user-developed applications. The packets sent by TCP and UDP contain both the source port number and the destination port number in the packet header. The section “Sockets and Socket APIs” covers the use of port numbers in more detail.
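Programs do not parse the services file themselves; the socket library exposes it. On a system with a standard services database, the following Python calls return the mappings shown above:

import socket

print(socket.getservbyname("ftp", "tcp"))      # -> 21
print(socket.getservbyname("telnet", "tcp"))   # -> 23
print(socket.getservbyport(25, "tcp"))         # -> 'smtp'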

The Transport layer is stream-oriented and not block-oriented. The Application layer sends data to the Transport layer byte by byte. The Transport layer assembles the bytes into segments and then passes the segments to the Network layer. TCP and UDP use different methods to determine the segment size, as explained in the following sections.

Transmission Control Protocol (TCP)

The key to understanding TCP lies in the end-to-end principle. In a nutshell, the principle says that only the ends can be trusted because the carrier is unreliable. The ends, in this case, are the TCP protocol of the source and the destination hosts. Thus any error checking performed by the Physical layer is redundant.

Use TCP when an application must have reliable data delivery.

TCP uses a technique called positive acknowledgment and retransmission (PAR) to ensure reliability. After waiting for a time-out, the sending host (or source host) retransmits a segment (the name for a TCP packet) unless it receives an acknowledgment from the destination host. The destination host acknowledges only segments that are error free and discards any segments that contain errors. Because an acknowledgment can arrive after the host resends the segment, the receiving host must discard duplicate segments. Also, because segments can arrive out of sequence (each IP datagram can take a different route between the sending host and the destination host), the destination host must resequence the segments.

The stream-oriented nature of TCP means that the responsibility for management of blocks, frames, or records lies in the application.

TCP sequentially numbers each byte in the input data stream and uses this sequence number to track the acknowledgments of data received by the destination host. To initiate the connection and establish the initial sequence number, the source host sends a sync packet to the destination host. The destination host responds with a sync acknowledgment that also contains the initial window size and, optionally, the maximum segment size. The window size represents the maximum amount of data that can be sent before receiving an acknowledgment. Because each acknowledgment contains the window size, the destination controls the flow of information. This form of flow control is separate from the flow control mentioned in the ICMP section. ICMP refers to IP buffers, whereas window size refers to TCP buffers. ICMP flow control affects the device one hop away (which may be a gateway or the source host); window size affects only the source host.

The acknowledgment message contains the highest consecutive octet received as well as the new window size. The destination host does not acknowledge out-of-sequence segments until it receives the intervening segments. After the time-out period elapses, the source host retransmits the unacknowledged portion of the window. This sliding window provides for end-to-end flow control and minimizes IP traffic by acknowledging more than one segment.

Figure 1.6 illustrates the principle of the sliding window. For the sake of simplicity, the segment size used in Figure 1.6 is 1,000 octets. The initial window size is 5,000 octets, as specified in the sync acknowledgment message. The source host transmits segments until the window size is reached. The destination host acknowledges receiving 2,000 contiguous octets and returns a new window size of 5,000, which enables the source host to send another 2,000 octets. Should the time-out period expire before the source host receives another acknowledgment, the source host retransmits the entire 5,000 octets remaining in the window. Upon receiving the duplicate segments, the destination host trashes these duplicates.

Figure 1.6 : TCP sliding window.
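The exchange in Figure 1.6 can be modeled in a few lines. This Python sketch tracks only the send side: the source transmits while unacknowledged data fits in the window, and an arriving acknowledgment slides the window forward.

SEGMENT = 1000        # octets per segment, as in Figure 1.6
window = 5000         # window size from the sync acknowledgment
acked = 0             # highest contiguous octet acknowledged
next_seq = 0          # first unsent octet

def send_while_window_open():
    global next_seq
    while next_seq < acked + window:
        print("send octets %d..%d" % (next_seq, next_seq + SEGMENT - 1))
        next_seq += SEGMENT

send_while_window_open()        # transmits the first 5,000 octets
acked, window = 2000, 5000      # acknowledgment: 2,000 octets received, window 5,000
send_while_window_open()        # the source may now send another 2,000 octets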

Upon receiving segments, the destination host passes the acknowledged segments to the receiving port number as a stream of octets. The application on the receiving host has the job of turning this stream into a format required by the application.

Just as the process to open a connection involves a three-way handshake, the closing process (fin) uses a three-way handshake. Upon receiving a close request from the application, the source host sends a fin request to the destination host, with the sequence number set to that of the next segment. Actually closing the connection requires the acknowledgment of all segments sent and a fin acknowledgment from the destination host. The source host acknowledges the fin acknowledgment and notifies the application of the closure.

As I explained, the PAR method used by TCP minimizes the IP traffic over the connection. Furthermore, it does not depend on the reliability of the carrier. On the contrary, a carrier that spends too much time achieving reliability causes TCP to time out segments. For TCP/IP, a well-behaved transmission protocol trashes, rather than corrects, bad datagrams.

User Datagram Protocol (UDP)

UDP (RFC 768) provides an application interface to a connectionless, unreliable datagram delivery service. As mentioned at the beginning of “The Transport Layer” section, the protocol provides no mechanism for determining whether the destination received the datagram or whether it contained errors.

 

Because UDP has very little overhead, applications not requiring a connection-oriented, reliable delivery service should use UDP instead of TCP. Such applications include those that are message oriented.

The UDP packet reflects its simplicity as a protocol. The packet contains only the source and destination port numbers, the length of the packet, a checksum, and the data. It needs nothing else because UDP treats every packet as an individual entity.
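The whole header fits in one line of Python. The sketch below leaves the checksum at zero, which RFC 768 permits when the checksum is unused; the port numbers are hypothetical.

import struct

def udp_packet(src_port, dst_port, data):
    # Source port, destination port, length (header + data), checksum
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(data), 0) + data

packet = udp_packet(1024, 7, b"hello")   # a datagram for the echo service
print(len(packet))                       # -> 13: 8-octet header plus 5 octets of data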

Applications that use UDP include those that

  • Provide their own mechanism for connections, flow control, and error checking
  • Produce less retransmission overhead
  • Use a query/response model

For each type of application, UDP eliminates unnecessary overhead.
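A query/response application over UDP typically supplies its own time-out and retransmission, precisely because the protocol does not. The following Python sketch shows the pattern; the host and port are hypothetical placeholders.

import socket

def query(host, port, request, retries=3, timeout=2.0):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        for attempt in range(retries):
            sock.sendto(request, (host, port))      # one datagram, one query
            try:
                reply, _ = sock.recvfrom(2048)
                return reply                        # one datagram, one response
            except socket.timeout:
                continue                            # the application retransmits
        raise TimeoutError("no response after %d tries" % retries)
    finally:
        sock.close()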

The Application Layer

The Application layer is the richest layer in terms of the number of protocols. This section covers the most popular protocols. The “Routing in an Internetwork” section covers the routing protocols, and “The Client/Server Model” section examines various models for application protocols. Despite this broad coverage, this chapter considers only a sampling of the many application protocols that have an RFC. An even greater number of intranet applications do not have an RFC. Applications need only follow the rules for interfacing to the Transport layer, and the socket APIs (covered in the “Sockets and Socket APIs” section) hide from the programmer the details of this interface.

It is important to remember that every TCP/IP application is a client/server application. When it comes to the well-known applications discussed in the following sections, the server side exists on every major hardware platform. Except for revisions to old application protocols and the development of new application protocols, much of the work in software development for the well-known protocols centers around the client side of the equation. This situation applies even to the wild world of HTTP. The once lightweight Web browser now performs many tasks that once required the support of the server. Client-side image mapping, Java applets, and scripting languages such as JavaScript and VBScript make the Web browser the workhorse. This transfer of processing to the client side leaves the server free to respond to more requests.

 

Telnet

Telnet is one of the oldest and most complicated of the application protocols. Telnet provides network terminal emulation capabilities. Although telnet standards support a wide variety of terminals, the most common emulation used in the Internet is the vt100 emulation. Besides terminal emulation, telnet includes standards for a remote login protocol, which is not the same as the Berkeley rlogin protocol. The full telnet specification encompasses 40 separate RFCs. The Internet Official Protocol Standards RFC (RFC 1920 as of this writing) contains a separate section that lists all current telnet RFCs. However, telnet clients usually implement a subset of these RFCs.

Telnet remains the primary protocol for remotely logging into a host. Although the terminal emulation features are important, other protocols use the remote login portion of the standard to provide authentication services to the remote host. In addition to the traditional remote login standard, some telnet clients and servers support Kerberos (the security method developed by the MIT Athena Project and used in DCE) to provide a secure login capability.

File Transfer Protocol (FTP)

FTP (RFC 959) is another old-time protocol. It is unusual in that it maintains two simultaneous connections. The first connection uses the telnet remote login protocol to log the client into an account and process commands via the protocol interpreter. The second connection is used for the data transfer process. Whereas the first connection is maintained throughout the FTP session, the second connection is opened and closed for each file transfer. The FTP protocol also enables an FTP client to establish connections with two servers and to act as the third-party agent in transferring files between the two servers.

FTP servers rarely change, but new FTP clients appear on a regular basis. These clients vary widely in the number of FTP commands they implement. Very few clients implement the third-party transfer feature, and most of the PC clients implement only a small subset of the FTP commands. Although FTP is a command-line oriented protocol, the new generation of FTP clients hides this orientation under a GUI environment.

Trivial File Transfer Protocol (TFTP)

The trivial part of the name refers not to the protocol itself, but to the small amount of code required to implement the protocol. Because TFTP does not include an authentication procedure, it is limited to reading and writing publicly accessible files. Consequently, TFTP is a security risk and must never be used to transfer data through a firewall.

TFTP uses a flip-flop (stop-and-wait) protocol to transfer data: it sends a block of data and then waits for an acknowledgment before sending the next block. It runs over UDP and therefore performs its own integrity checks and establishes a very minimal connection. In contrast to the richness of FTP, TFTP defines only five types of packets: read request, write request, data, acknowledgment, and error. Because TFTP is so limited, only a few applications (such as router management software and X-terminal software) use it.
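The five packet types are simple enough to build by hand. This Python sketch constructs two of them using the standard TFTP formats; the filename and block number are placeholders.

import struct

def read_request(filename, mode="octet"):
    # Opcode 1, then the NUL-terminated filename and transfer mode
    return struct.pack("!H", 1) + filename.encode() + b"\0" + mode.encode() + b"\0"

def acknowledgment(block):
    # Opcode 4 plus the block number being acknowledged
    return struct.pack("!HH", 4, block)

print(read_request("boot.cfg").hex())   # opcodes: 1 read, 2 write, 3 data, 4 ack, 5 error
print(acknowledgment(1).hex())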

Simple Mail Transfer Protocol (SMTP)

Figure 1.7 : Components of a mail handling system.

The word simple in Simple Mail Transfer Protocol refers to the protocol and definitely not to the software required to implement the protocol. Like telnet, many RFCs define SMTP. The two core RFCs are 821 and 822. RFC 821 defines the protocol for transfer of mail between two machines. RFC 822 defines the structure of the mail message. The mail handling system (MHS) model describes the components needed to support e-mail. As shown in Figure 1.7, the MHS consists of the mail transfer agent (MTA), the mail store (MS or mailbox), and the user agent (UA).

The MTA (such as sendmail) receives mail from the UA and forwards it to another MTA. Because the MTA for SMTP receives all mail through port number 25, it makes no distinction between mail arriving from another MTA or from a UA. The MS stores all mail destined for the MTA. The UA reads mail from the MS and sends mail to the MTA.

A UA executing on the MTA host needs only to read the MS as a regular file. However, a network UA requires a protocol to read and manage the MS. The most popular protocol for reading remote mail is the Post Office Protocol version 3 (POP-3), defined by RFC 1725 (which makes RFC 1460 obsolete). To enhance the security of POP-3, RFC 1734 describes an optional authentication extension. POP-3 transfers all of the mail to the UA and enables the mailbox to be kept or cleared. Because keeping the mail in the MS means that it is downloaded every time, most users choose the clear-mailbox option. This option works when you have only one client workstation that always reads the mail. Users who need a more sophisticated approach to mail management should consider the Internet Mail Access Protocol (IMAP). RFC 1732 describes the specifications for version 4 of IMAP. With IMAP, the mail server becomes the mail database manager. This method enables a user to read mail from various workstations and still see one mail database that is maintained on the mail server.
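Python's standard poplib module implements the UA side of POP-3. The following sketch reads a remote MS; the host name and credentials are hypothetical, and the dele call is commented out to show the keep-mailbox option.

import poplib

mailbox = poplib.POP3("mail.example.com")   # the MS lives on the mail server
mailbox.user("alice")
mailbox.pass_("secret")
count, total_octets = mailbox.stat()        # how much mail is waiting
for number in range(1, count + 1):
    response, lines, octets = mailbox.retr(number)   # download one message
    # mailbox.dele(number)                  # uncomment for the clear-mailbox option
mailbox.quit()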

Network News Transfer Protocol (NNTP)

In 1986, the year that Brian Kantor and Phil Lapsley wrote the Network News Transfer Protocol (NNTP) standard (RFC 977), ARPAnet used list servers for news distribution, and Usenet was still a separate UUCP network. With this standard, the door opened to establish Usenet news groups on ARPAnet. The NNTP standard provides for a news server that maintains a news database. News clients use NNTP to read articles from and post articles to the news server. In addition, the news server communicates with other news servers via NNTP.

Gopher Protocol

The popularity of the Gopher protocol is without question. Because it provides a simple mechanism for presenting textual documents on the Internet, Gopher servers provide access to over one million documents. Nevertheless, RFC 1436, the only Gopher protocol RFC, is an informational RFC.

Gopher uses a hierarchical menu system, which resembles a UNIX file system, to store documents. The Gopher client enables the user to navigate this menu system and to display or download stored documents. Before the popularity of the Web browser, Gopher clients were popular as separate applications. Today, the Web browser is the most common client used to display Gopher documents.

HyperText Transfer Protocol

Although HTTP dates back to 1990, it still doesn’t have an RFC. However, several Internet drafts reflect the “work in progress.” Although the Web uses HTTP, HTTP’s potential uses include name servers and distributed object management systems. HTTP achieves this flexibility by being a generic, stateless, object-oriented protocol. It is generic because it merely transfers data according to the URL and handles different Multipurpose Internet Mail Extension (MIME) types (see the following section for details about MIME types). Because it treats every request as an isolated event, HTTP is stateless.

Although an HTTP server is small and efficient, the client bears the burden of the work. The client processes the HTML-encoded document and manages all the tasks involved in retrieval and presentation of the retrieved objects. The URL and MIME open the doors to a degree of flexibility not found in any other document retrieval protocol.
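Statelessness is visible at the socket level: one connection carries one self-contained request and one response. This Python sketch issues an HTTP/1.0 GET by hand; www.example.com stands in for any Web server.

import socket

sock = socket.create_connection(("www.example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n")

response = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:                      # the server closes when the response ends
        break
    response += chunk
sock.close()

print(response.split(b"\r\n", 1)[0])   # the status line, e.g. b'HTTP/1.0 200 OK'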

Multipurpose Internet Mail Extension

The original MIME standard set out to resolve the limitations of the ASCII text messages defined in RFC 822. Using the content header standard (RFC 1049), the MIME standard defines the general content type with a subtype to define a particular message format. MIME is an extensible standard, and RFC 1896 describes the latest extensions. Once again, the client (the UA for an MHS) must take care of building and displaying MIME attachments to mail messages.

The simplicity and extensibility of the MIME standard led to its use in extending the content of other protocols beyond the limits of ASCII text. The notable uses are in the Gopher protocol and HTTP.

A Short History of the Internet

Most historical reviews of the Internet imply that networking began with ARPAnet. Yet in a sense, digital transmission of data began when Samuel B. Morse publicly demonstrated the telegraph in 1844. In 1874, Thomas Edison invented the idea of multiplexing two signals in each direction over a single wire. With higher speeds and multiplexing, the teletype replaced Morse's manual system; a few teletype installations still exist today.

NOTE
In 1837 both Sir Charles Wheatstone in Great Britain and Samuel B. Morse in the United States announced their telegraphic inventions.

The early telegraph systems were, in modern terms, point-to-point links. As the industry grew, switching centers acted as relay stations and paper tape was the medium that the human routers used to relay information from one link to another. Figure 1.1 illustrates a simple single-layer telegraphic network configuration. Figure 1.2 shows a more complex multilayered network.

Figure 1.1 : A simple asynchronous network.
Figure 1.2 : A multilayered asynchronous network.

The links of these networks were point-to-point asynchronous serial connections. For a paper tape network, the incoming information was punched on paper tape by high-speed paper tape punches and was then manually loaded on an outgoing paper tape reader.

Although this activity might seem like ancient history to younger readers, let us put this story into a more understandable framework. In early 1962, Paul Baran and his colleagues at the Rand Corporation were tackling the problem of how to build a computer network that would survive a nuclear war.

The year 1969 was a year of milestones. Not only did NASA place the first astronauts on the moon but also, and with much less fanfare, the Department of Defense's Advanced Research Projects Agency (ARPA) contracted with Bolt, Beranek, and Newman (BBN) to develop a packet-switched network based on Paul Baran's ideas. The initial project linked computers at the University of California at Los Angeles (UCLA), Stanford Research Institute (SRI) in Menlo Park, California, the University of California at Santa Barbara, and the University of Utah in Salt Lake City, Utah. On the other side of the continent from the ARPAnet action, Ken Thompson and Dennis M. Ritchie brought UNIX to life at Bell Labs (now Lucent Technologies) in Murray Hill, New Jersey.

Even though message switching was well known, the original ARPAnet provided only three services: remote login (telnet), file transfer, and remote printing. In 1972, when ARPAnet consisted of 37 sites, e-mail joined the ranks of ARPAnet services. In October 1972 ARPAnet was demonstrated to the public at the International Conference on Computer Communications in Washington, D.C. In the following year, TCP/IP was proposed as a standard for ARPAnet.

The amount of military-related traffic continued to increase on ARPAnet. In 1975 the Defense Communications Agency (DCA) took control of ARPAnet; ARPA itself had been renamed DARPA (Defense Advanced Research Projects Agency). Many non-government organizations wanted to connect to ARPAnet, but DARPA limited private sector connections to defense-related organizations. This policy led to the formation of other networks, such as BBN's commercial network Telenet.

The year 1975 marked the beginning of the personal computer industry's rapid growth. In those days when you bought a microcomputer, you received bags of parts that you then assembled. Assembling a computer was a lot of work, for a simple 8KB memory card required over 1,000 solder connections. Only serious electronic hobbyists, such as those who attended the Homebrew Computer Club meetings at the Stanford Linear Accelerator Center on Wednesday nights, built computers.

In 1976, four years after the initial public announcement that ARPAnet would use packet-switching technology, telephone companies from around the world through the auspices of CCITT (Consultative Committee for International Telegraphy and Telephony) announced the X.25 standard. Although both ARPAnet and X.25 used packet switching, there was a crucial difference in the implementations. As the precursor of TCP/IP, the ARPAnet protocol was based on the end-to-end principle; that is, only the ends are trusted and the carrier is considered unreliable.

On the other hand, the telephone companies preferred a more controllable protocol. They wanted to build packet-switched networks that used a trusted carrier, and they (the phone companies) wanted to control the input of network traffic. Therefore, CCITT based the X.25 protocol on the hop-to-hop principle, in which each hop verifies that it received the packet correctly. CCITT also reduced per-packet overhead by creating virtual circuits.

In contrast to ARPAnet, in which every packet contained enough information to take its own path, with the X.25 protocol the first packet contains the path information and establishes a virtual circuit. After the initial packet, every other packet follows the same virtual circuit. Although this optimizes the flow of traffic over slow links, it means that the connection depends on the continued existence of the virtual circuit.

The end-to-end principle of TCP/IP and the hop-to-hop principle of X.25 represent opposing views of the data transfer process between the source and destination. TCP/IP assumes that the carrier is unreliable and that every packet takes a different route to the destination, and does not worry about the amount of traffic flowing through the various paths to the destination. On the other hand, X.25 corrects errors at every hop to the destination, creates a single virtual path for all packets, and regulates the amount of traffic a device sends to the X.25 network.

The year 1979 was another milestone year for the future of the Internet. Computer scientists from all over the world met to establish a research computer network called Usenet. Usenet was a dial-up network using UUCP (UNIX-to-UNIX copy). It offered Usenet News and mail servers. The mail service required a user to enter the entire path to the destination machine using the UUCP bang addressing wherein the names of the different machines were separated by exclamation marks (bangs). Even though I sent mail on a regular basis, I always had problems getting the address right. Only a few UUCP networks are left today, but Usenet News continues as NetNews. Also in 1979, Onyx Systems released the first commercial version of UNIX on a microcomputer.

The most crucial event for TCP/IP occurred on January 1, 1983, when TCP/IP became the standard protocol for ARPAnet, which provided connections to 500 sites. On that day the Internet was born. Since the late 1970s, many government, research, and academic networks had been using TCP/IP; but with the final conversion of ARPAnet, the various TCP/IP networks had a protocol that facilitated internetworking. In the same year, the military part of ARPAnet split off to form MILNET. As the result of funding from DARPA, the University of California's Berkeley Software Distribution released BSD 4.2 UNIX with a TCP/IP stack. In addition, Novell released NetWare based on the XNS protocol developed at Xerox PARC, Proteon shipped a software-based router using the PDP-11, and C++ was transformed from an idea into a viable language.

That was the year in which the idea of building local-area networks (LANs) was new and hot. With the introduction of LANs, the topology of networks changed from the representation shown in Figure 1.2, which ties legacy systems together, to that shown in Figure 1.3, which ties LANs together.

Figure 1.3 : A LAN-based model for internetworks.

With the growth in the number of organizations connecting to ARPAnet and the increasing number of LANs connected to ARPAnet, another problem surfaced: TCP/IP routes traffic according to the destination's IP address.

The IP address is a 32-bit number divided into four octets for the sake of human readability. Whereas computers work with numbers, humans remember names better than numbers. When ARPAnet was small, systems used the host file (in UNIX the file is /etc/hosts) to resolve names to Internet Protocol (IP) addresses. The Network Information Center (NIC) maintained the master file, and individual sites periodically downloaded the file. As the size of the ARPAnet grew, this arrangement became unmanageable in a fast-growing and dynamic network.

In 1984 the domain name system (DNS) replaced downloading the host file from NIC (the section “IP Addresses and Domain Names” discusses the relationship between the two in more detail). With the implementation of DNS, the management of mapping names to addresses moved out to the sites themselves.

For the next seven years, the Internet entered a growth phase. In 1987 the National Science Foundation created NSFNET to link supercomputing centers via a high-speed (56Kbps) backbone. Although NSFNET was strictly noncommercial, it enabled organizations to obtain an Internet connection without having to meet ARPAnet's defense-oriented policy. By 1990 organizations connected to ARPAnet had completed their move to NSFNET, and ARPAnet ceased to exist. NSFNET closed its doors five years later, and commercial providers took over the Internet world.

Until 1990 the primary Internet applications were e-mail, listserv, telnet, and FTP. In 1990, McGill University introduced Archie, an FTP search tool for the Internet. In 1991, the University of Minnesota released Gopher.


Gopher’s hierarchical menu structure helped users organize documents for presentation over the Internet. Gopher servers became so popular that by 1993 thousands of Gopher servers contained over a million documents. To find these documents, a person used the Gopher search tool Veronica (very easy rodent-oriented netwide index to computerized archives). These search tools are important, but they are not the ones that sparked the Internet explosion.

In 1991 Tim Berners-Lee, a physicist at CERN in Geneva, Switzerland, released the protocols for the World Wide Web (WWW). Seeking a way to link scientific documents together, he created the Hypertext Markup Language (HTML), which is a subset of the Standard Generalized Markup Language (SGML). In developing the WWW, he drew from the 1965 work of Ted Nelson, who coined the word hypertext. However, the event that really fueled the Internet explosion was the release of Mosaic by the National Center for Supercomputing Applications (NCSA) in 1993.

From a standard for textual documents, HTML has grown to include images, sound, video, and interactive screens via the common gateway interface (CGI), Microsoft's ActiveX (previously called OLE controls), and Sun Microsystems' Java. The changes occur so fast that the standards lag behind the market.

How large is the Internet today?

That is a good question. We could measure the size of the Internet by the number of network addresses granted by InterNIC, but these addresses can be “subnetted,” so the number of networks is much larger than the InterNIC figures suggest. We could measure the size of the Internet by the number of domain names, yet some of these names are vanity names (a domain name assigned to an organization but supported by servers that host multiple domain names) and others are aliases. Vanity names and aliases result in a name count higher than the number of IP addresses, because multiple names point to the same IP address.

Starting in the fall of 1995, companies and organizations began to include their uniform resource locator (URL), along with their street address, telephone number, and fax number, in television ads, newspaper ads, and consumer newsletters. Therefore, a company's presence on the Internet, as represented by its Web address (the URL), reached a new level of general acceptance. The Internet emerged from academia to become a household word.

The question arises as to where all this technology is going. Because my crystal ball is broken, please don’t hold me to what I say.