Before you can send data over a network, you need lots of things like protocols, ports and addresses. Although there is another section named "The Protocols", that doesn't mean you won't find protocols here as well. People very often forget about the basic protocols of a network, although those are the most important ones. But before we start talking about protocols, let's first talk about addresses.
If you connect two PCs via a cable, you don't need any addresses. All data from PC A is sent to PC B and the other way round. As soon as you have more than two PCs, you need addresses to make sure the packets arrive at the right PC.
The basic addresses of today's networks are the MAC addresses. Every network card that you can buy for PCs has such a MAC address written on it. By "written on it" I don't mean that it's printed on the card itself; I mean it's stored in a chip on the card.
Those hardware addresses are assigned when the card is produced (the card already has one when you buy it, and you can't delete it, although many drivers let you override it in software) and thus must be unique worldwide. To ensure this uniqueness, every company that produces such cards has a unique identifier, and after the identifier it places a unique number (e.g. 10-00-5A is the identifier of IBM, followed by the production number of the card).
Users don't have to care about MAC addresses, which is why I won't give further details here. All that users should know is that data in fact always travels between two hardware addresses, as those are the only addresses that hardware devices understand.
Example for a MAC address:
00-00-1B-02-8A-58
The first three bytes tell us that this card was produced by Novell (company ID); the other three represent the number of the card.
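To make the split between company ID and card number concrete, here is a minimal Python sketch. The function name `split_mac` and the tiny `VENDORS` table are my own inventions for illustration; the table only contains the two company IDs mentioned above, while a real lookup would need the full registry.

```python
def split_mac(mac):
    """Split a MAC address into (company ID, card number)."""
    parts = mac.upper().replace(":", "-").split("-")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a MAC address: {mac!r}")
    for p in parts:
        int(p, 16)  # raises ValueError if a byte is not hexadecimal
    return "-".join(parts[:3]), "-".join(parts[3:])

# Hypothetical mini table: only the two IDs named in the text.
VENDORS = {"00-00-1B": "Novell", "10-00-5A": "IBM"}

vendor_id, card_number = split_mac("00-00-1B-02-8A-58")
print(vendor_id, VENDORS.get(vendor_id, "unknown"), card_number)
```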
For various reasons it was a clever idea to introduce software addresses. The advantages of software addresses are:
Nowadays users are usually only confronted with software addresses. The problem with those addresses is that hardware devices don't understand them, so before you can send a piece of data on its way, you must first figure out the current hardware address of the PC that has the software address you entered as destination.
If the PC is in your local network, your PC can send out a broadcast (a packet that all PCs will accept, regardless of their hardware or software address) whose content is more or less "Who is the PC with software address XYZ?". If one PC recognizes itself here, it'll reply, and when the reply arrives, your PC will know the MAC address of the other PC (it's written in the header of the reply packet). BTW, your PC isn't forced to do that over and over again, as the MAC addresses of local PCs are cached for some time. They can't be cached forever, because the older a cache entry is, the more likely it is that it's no longer correct.
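The caching idea can be sketched in a few lines of Python. This is only an illustration of "remember the answer, but not forever"; the class name `AddressCache`, the 60 second lifetime and the injectable clock are my assumptions, not how any particular operating system does it.

```python
import time

class AddressCache:
    """Remembers software address -> MAC address for a limited time."""

    def __init__(self, max_age_seconds=60.0, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock      # injectable so the example can fake time
        self.entries = {}       # software address -> (mac, time stored)

    def store(self, address, mac):
        self.entries[address] = (mac, self.clock())

    def lookup(self, address):
        """Return the cached MAC, or None if unknown or too old."""
        entry = self.entries.get(address)
        if entry is None:
            return None
        mac, stored_at = entry
        if self.clock() - stored_at > self.max_age:
            del self.entries[address]  # stale: a new broadcast is needed
            return None
        return mac

now = [0.0]                            # fake clock for the demonstration
cache = AddressCache(max_age_seconds=60.0, clock=lambda: now[0])
cache.store("1.2.3.4", "00-00-1B-02-8A-58")
fresh = cache.lookup("1.2.3.4")        # still young enough
now[0] = 61.0
stale = cache.lookup("1.2.3.4")        # too old, entry is dropped
print(fresh, stale)
```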
If the other PC is not in your local network, it gets more complicated. That's what routers are good for. If your PC can't find the requested software address in your local network, it'll forward the packet to a router (in case such a device exists in your network). To forward it, your PC will send out the packet with the MAC address of the router. When the router receives the packet, it can easily recognize that this packet hasn't really reached its destination yet: while it has the router's hardware address, it doesn't have the router's software address.
The router (which is usually connected to at least two different networks) will now try to find the best path to reach the real destination of this packet. Sometimes your router can simply forward it to the correct network; sometimes your packet must pass several other routers on the Internet before it finally enters a network with the right PC. It would take too long to explain here how routers work; let's just say they have their own protocols and their task is to keep a map of the surrounding networks and routers, so they always know where to forward a packet, and in case they don't know, they always know the address of another router which may know.
Most packets today must pass between 6 and 30 routers to reach their destination, with an average that is closer to 6 than to 30. BTW, you'll often hear the term "hop", which simply means passing a router. If a PC is 8 hops away, it means your packets must pass 8 routers to get there.
Software addresses must be unique worldwide, but there is one exception: There are so-called "private" addresses that everyone can use, e.g. in a home network. Routers on the Internet don't forward those addresses, and thus they can be assigned to several PCs at once (not in the same local network, of course). While you can easily reach such PCs inside your local network, nobody from outside can, as routers will immediately throw away packets that are addressed to them.
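If you want to check whether an address falls into such a private range, Python's standard `ipaddress` module can tell you. Take this as a small sketch: the helper name `is_private` is mine, 192.168.0.5 is just an example address I picked, and note that the module's notion of "private" also covers a few other special ranges (such as loopback), not only the home-network ones.

```python
import ipaddress

def is_private(address):
    """True if the address lies in a range that is not routed publicly."""
    return ipaddress.ip_address(address).is_private

print(is_private("192.168.0.5"))     # a typical home-network address
print(is_private("204.152.167.20"))  # a public address
```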
Routers slow down the transfer of a packet, as they must first perform a table lookup to find out where to send the packet next, so the more routers you must pass to reach a PC, the longer it will take. Nevertheless, there is a good reason why the Internet is divided into smaller networks connected via routers: Whenever two PCs communicate inside a network, the lines are busy and all the other PCs in this network have to wait until those two have finished their transfer. When you split up the Net into sub-nets, two PCs that communicate with each other will only cause busy lines in the sub-networks that their traffic passes through (e.g. 7 networks); in all the other sub-networks (several thousand) the lines aren't busy and can be used for other transfers. In a big network like the Internet, I don't have to tell you how important that is.
IP addresses are currently 4 bytes long. Another important number is the subnet mask; it tells a PC which PCs are local and which PCs can only be contacted via a router. Thanks to this number, your PC doesn't need to memorize which addresses are in the local network and which ones aren't; it just has to apply the subnet mask to its own address and to the destination address and compare the results.
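The subnet mask comparison is nothing more than a bitwise AND. A minimal sketch, assuming a home PC with the address 192.168.1.10 and the mask 255.255.255.0 (both values are made up for the example, as is the function name `is_local`):

```python
import ipaddress

def is_local(own_ip, subnet_mask, destination_ip):
    """True if the destination lies in the same sub-net as our own address."""
    own = int(ipaddress.IPv4Address(own_ip))
    mask = int(ipaddress.IPv4Address(subnet_mask))
    dest = int(ipaddress.IPv4Address(destination_ip))
    # Keep only the network part of both addresses and compare.
    return (own & mask) == (dest & mask)

print(is_local("192.168.1.10", "255.255.255.0", "192.168.1.77"))    # local
print(is_local("192.168.1.10", "255.255.255.0", "204.152.167.20"))  # via router
```

If the function returns False, the packet has to be handed to a router.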
Example for an IP address:
204.152.167.20
This software address is owned by Netscape.com
While hardware devices are happy with MAC addresses and applications are happy with IP addresses, neither of them is easy for users to memorize. So another address was born, the DNS address (DNS = Domain Name System).
The Internet has many DNS servers, and all DNS servers together form a huge database. Every single DNS server only knows the names of the PCs in its own network (similar to most routers). In case it is asked for a name that isn't local, it will pass that request on to the next DNS server, and this continues until a DNS server is found that knows the name. Once the name has been found, the (current) IP address of the PC with that name is returned.
Just like for routers, one reason to split up the DNS database is to keep it up to date. If the IP address of a PC has changed, the owner only needs to update the DNS entry for it on the local DNS server (instead of updating several thousand servers). Of course DNS servers also have a cache, which can speed up the whole process enormously, but it's also evident that they can't cache data forever, otherwise it would soon be completely out of date.
DNS lookups are usually really fast (at most a few seconds, even if you must pass many DNS servers to get a reply) and you only need to perform them once, on your first transaction with a server, because once your local software knows the IP address, it'll continue to work with that address instead of the DNS address.
Many people misunderstand the DNS addresses. Let's take a look at an easy one:
www.microsoft.com
Now what's the name of the PC? No, it's not www.microsoft.com; the name of the PC is "www". It's a PC named "www" that is placed in a network named "microsoft", which is a sub-hierarchy of "com". Another PC in that network is named "ftp" (ftp.microsoft.com), which can be the same PC as before or a completely different one. Just like a server with only one MAC address can have several IP addresses, a server with only one IP address can have several DNS addresses.
A more complicated DNS address is:
www.informatik.tu-muenchen.de
It's in the de-hierarchy (country hierarchy for Germany), in a sub-net named
"tu-muenchen" (that's a university in Munich, Germany), in a sub-net
of this university named "informatik" (=computer science) and the
name of the PC is "www".
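Reading a DNS address from right to left can be sketched like this. The function `split_dns_name` is a hypothetical helper that simply follows the reading used in the two examples above (first label is the PC, the rest is the hierarchy); it doesn't ask any DNS server.

```python
def split_dns_name(name):
    """Return (PC name, hierarchy from the top level downwards)."""
    labels = name.split(".")
    host = labels[0]
    hierarchy = labels[:0:-1]  # all remaining labels, top level first
    return host, hierarchy

print(split_dns_name("www.microsoft.com"))
print(split_dns_name("www.informatik.tu-muenchen.de"))
```

For the second address this gives "www" as the PC and the path de, tu-muenchen, informatik.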
On many systems it doesn't matter whether you type "www.tgos.org" or "tgos.org", because the TGOS sub-net only has a single PC and all requests are redirected to this PC. With some services that are used by millions of people daily, you will see that when you enter "www.whatever.com", you are redirected to "www4.whatever.com", the fourth www server under that address, because a single server couldn't handle all the requests. In that case the server named "www" is only there to welcome people and redirect them to another server that isn't very busy right now.
The IP protocol is the basic protocol of the worldwide Internet. Many people misunderstand the task of this protocol: The task of the IP protocol is to send a packet on its way from PC A to PC B. It's not the task of the IP protocol to verify that the packet really arrives there, nor is it its task to make sure the packet arrived error-free and in the correct order (if it was split into smaller pieces), or to check whether PC B even exists and is online. This doesn't mean that IP is an unreliable protocol, because what it is supposed to do, it does very reliably; the rest just isn't part of its task, that's all.
The IP header contains the IP addresses of the source and destination PC, the length of the data packet, some additional information important for routers (e.g. status flags and fragmentation information) and the TTL (Time To Live). The TTL is a number that is decreased by one whenever the packet passes a router. The router that decreases it to zero throws the packet away.
The TTL is important to prevent a packet from being sent back and forth between two routers for eternity, which could happen if router A believes the best way to reach the destination is via router B and router B thinks the best way is via router A. Such a ping-pong effect happens once in a while between routers until one of them is finally updated via a router protocol. Furthermore, it is possible that the destination PC doesn't exist at all or isn't currently online; in that case packets may travel completely aimlessly through the Net, but once their TTL is zero, they are stopped. Usually the TTL of IP protocol implementations is high enough to reach any PC on the Net, but most operating systems allow you to manually increase it if really necessary.
Hosts use the IP header mainly to look at the destination address and find out whether a packet is meant for them; routers mainly use the IP header data to find out where to send the packet next. That's essentially all the IP protocol is used for.
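The ping-pong effect and the way the TTL ends it can be simulated in a few lines. This is a toy model, not real routing: the routers are just letters, `next_hop` is a made-up table saying where each router forwards to, and every router decreases the TTL before forwarding.

```python
def forward(ttl, next_hop, start, destination):
    """Follow the next_hop table until arrival or until the TTL is used up."""
    router = start
    hops = 0
    while True:
        if router == destination:
            return "delivered", hops
        ttl -= 1                    # every router decreases the TTL by one
        if ttl <= 0:
            return "dropped", hops  # this router throws the packet away
        router = next_hop[router]
        hops += 1

# A and B each believe the other one is the best way to reach C.
ping_pong = {"A": "B", "B": "A"}
print(forward(8, ping_pong, start="A", destination="C"))

# With correct tables the packet simply arrives.
working = {"A": "B", "B": "C"}
print(forward(8, working, start="A", destination="C"))
```

Without the TTL check, the first call would loop forever.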
If a packet arrives at a host and there is only one program running on this host that is expecting packets from the network, there is no problem, but what if there are two such programs currently running? For whom is the packet that just arrived? In a multitasking operating system that's a difficult question!
Every PC usually has only a single IP address, but various applications may have sent out requests and are now all waiting for replies. Let's compare that to an office building. In an office building there are several offices, and in every office sits a single worker. If I now send a letter to the street address of this office building, they must first open it to find out which of the people working there is responsible for the letter, and often that can't easily be told from its content.
It's much better if I write not only the street address of the office building onto the envelope, but also the name of the person in that building who should get the letter. Such buildings often have a local mailbox for every worker, and the letter will simply be placed in the right mailbox. Now in case I'm sitting in an office building myself, I would write not only the street address of my office building as sender address, but also my name, so the answer is sent directly to me.
The same concept exists on PCs and it's called ports. A PC may only have one IP address, but it always has 65,536 ports. When it sends a request to another PC, the application can tell the server not only to reply to IP 1.2.3.4, but also to use port 1024 for it. When the reply packet arrives, the port number will ensure that it is forwarded to the right application (regardless of how many other applications are waiting for replies, as no application can use a port number that is already taken by another one).
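The mailbox idea translates almost directly into code. The following sketch is a toy model of one host, not a real network stack; the class `Host` and its methods are names I made up for the illustration.

```python
class Host:
    """A host with one address and a table of the ports currently in use."""

    def __init__(self):
        self.inboxes = {}  # port number -> list of received data

    def open_port(self, port):
        if not 0 <= port <= 65535:
            raise ValueError("port numbers range from 0 to 65535")
        if port in self.inboxes:
            raise OSError(f"port {port} is already taken")
        self.inboxes[port] = []
        return self.inboxes[port]

    def deliver(self, port, data):
        """Hand data to whoever opened the port; False if nobody did."""
        inbox = self.inboxes.get(port)
        if inbox is None:
            return False
        inbox.append(data)
        return True

host = Host()
browser = host.open_port(1024)  # mailbox of one application
mail = host.open_port(1025)     # mailbox of another one
host.deliver(1024, "web page")
host.deliver(1025, "new mail")
print(browser, mail)
```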
If you let applications choose port numbers themselves (from what is currently available), you won't have a problem directing replies, but what about requests? If a server is a WWW and an FTP server at once and you request data, how does it know which service you intend to use? Sure, by using the right port: the FTP server will be waiting for requests at one port and the WWW server at another port. The problem: How can your application know which port? The answer is simple: It can't!
Your application would be forced to try all possible ports to find the right one. To avoid such a disaster, there are well-known ports. E.g. the well-known port for WWW requests is port 80; all standard WWW servers wait for requests on port 80. The well-known port for FTP is port 21. Those ports are standards, but not mandatory. If you run your own FTP server, you can run it on any port you like, but if it isn't 21, people can only connect to it if they know the port number.
Applications will always assume the standard ports unless you tell them otherwise. If you enter http://www.tgos.org, your browser in fact understands that as http://www.tgos.org:80. The port number is always added at the end of the address after a colon, and it can be added to both DNS and IP addresses (1.2.3.4:21 or www.someserver.com:80). If you want your browser to connect to an FTP server (ftp.somehost.net) at port 500 (e.g. the owner of the FTP server mailed you that it is port 500 and not 21), you would have to type ftp://ftp.somehost.net:500, because if you leave out the port, your browser will use 21 and print an error message because it won't find an FTP server at that port.
While request ports have standard values, reply ports usually don't. No matter at what port you requested what kind of service, the server will send the reply to the port that was specified as reply port in the request header.
You may ask why I didn't mention ports while explaining the IP protocol. The answer is easy: the IP protocol doesn't know any ports. IP is for sending data packets between two PCs, not between single applications/services on those PCs. The next two protocols I explain below make use of ports; that's why I explained the concept first, to make it easier for you to understand those two protocols.
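How an application fills in the standard port can be sketched with Python's `urllib.parse`. The table `WELL_KNOWN_PORTS` only holds the two ports mentioned here and `host_and_port` is a hypothetical helper; a real browser knows many more protocols.

```python
from urllib.parse import urlsplit

# Only the two well-known ports named in the text.
WELL_KNOWN_PORTS = {"http": 80, "ftp": 21}

def host_and_port(url):
    """Return (host, port), using the well-known port if none is given."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else WELL_KNOWN_PORTS[parts.scheme]
    return parts.hostname, port

print(host_and_port("http://www.tgos.org"))         # port 80 is assumed
print(host_and_port("ftp://ftp.somehost.net:500"))  # explicit port wins
```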
The TCP protocol (Transmission Control Protocol) sits on top of the IP protocol. The tasks of the TCP protocol are:
Before TCP starts transferring any data to another PC, it initiates a session. This is done to find out whether the other PC is online and ready to receive data. So unlike IP, which sends data to an address regardless of whether it exists or whether the host is online, TCP will first make sure of both with a handshake.
Once the connection is established, TCP starts sending the data. For every packet that arrives successfully at the other host (successfully means that it did arrive and that, according to its checksum, it didn't contain any errors), the host will send out an acknowledgment block. If such a block doesn't arrive in time, the sender assumes that the block either never arrived at the other host or only arrived with errors, and will send it a second time. This is called positive acknowledgment (only successfully received blocks are acknowledged).
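Positive acknowledgment with resending can be illustrated by a small simulation. Be aware of the simplifications: this sketch sends one block at a time and waits for its acknowledgment before sending the next, whereas real TCP keeps several blocks in flight, and the "network" here is just a set `lost_attempts` that says which attempts get lost, so the run is repeatable.

```python
def send_with_acks(blocks, lost_attempts, max_tries=5):
    """Send blocks one by one; resend a block whose acknowledgment is missing.

    lost_attempts is a set of (block number, attempt) pairs that the
    simulated network drops.
    """
    received = []
    log = []
    for number, data in enumerate(blocks):
        for attempt in range(1, max_tries + 1):
            log.append((number, attempt))
            arrived = (number, attempt) not in lost_attempts
            if arrived:
                received.append(data)  # receiver stores it and acknowledges
                break                  # acknowledgment seen: next block
            # no acknowledgment in time: the loop sends the block again
        else:
            raise TimeoutError(f"block {number} was never acknowledged")
    return received, log

received, log = send_with_acks(["a", "b", "c"], lost_attempts={(1, 1)})
print(received)
print(log)  # block 1 shows up twice because its first attempt was lost
```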
The opposite would be negative acknowledgment, which means that blocks that did not arrive successfully are reported (in that case it's better to speak of "requested a second time"). The reasons why positive acknowledgment was chosen for the Internet are:
TCP is aware of ports, so every transfer is not only between two IP addresses, but also between two ports (one on the sender and one on the recipient). If the recipient can't process the received data as fast as the sender is sending it, the recipient can temporarily stop the transmission. This is done by a simple trick: Every acknowledgment block has a header field named "Window Size", which is the amount of network buffer left on the recipient's computer. If it reaches zero, it means the other PC can't receive any more blocks (no RAM is left to store them, so the blocks would be lost), and the transfer is stopped until another block arrives with a "Window Size" greater than zero (e.g. because the other PC processed some of the data and that way freed up some RAM).
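Here is a toy simulation of the "Window Size" trick. Everything in it is an assumption made for the sake of the example: time advances in ticks, the sender can send one block per tick, the window is counted in blocks instead of bytes, and the receiving application consumes one block every few ticks.

```python
def simulate_flow_control(total_blocks, buffer_slots, ticks_per_processed_block):
    """Return (ticks needed, ticks the sender spent paused)."""
    buffered = 0       # blocks waiting in the receiver's buffer
    sent = 0
    paused_ticks = 0
    tick = 0
    while sent < total_blocks:
        tick += 1
        # Receiver: the application consumes one block every few ticks.
        if buffered > 0 and tick % ticks_per_processed_block == 0:
            buffered -= 1
        window = buffer_slots - buffered  # the advertised "Window Size"
        # Sender: only sends while the advertised window is above zero.
        if window > 0:
            buffered += 1
            sent += 1
        else:
            paused_ticks += 1
    return tick, paused_ticks

print(simulate_flow_control(6, buffer_slots=2, ticks_per_processed_block=3))
print(simulate_flow_control(6, buffer_slots=10, ticks_per_processed_block=3))
```

With only two buffer slots the sender has to pause repeatedly; with ten slots it never has to wait.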
TCP blocks have numbers. They are needed for acknowledgment (so your PC knows which block was just acknowledged) and they are needed for sorting, as IP doesn't care about the order of blocks and may route packets over different paths, which may cause them to arrive completely out of order. It's the task of TCP to put them in order and then merge them (in case they had been split).
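Sorting and merging by block number can be sketched as follows. This is a simplification: I number whole blocks starting at 0, while real TCP counts bytes, and `reassemble` is a hypothetical helper, not part of any library.

```python
def reassemble(blocks):
    """Merge (number, data) pairs that arrived in any order into one stream."""
    by_number = {}
    for number, data in blocks:
        by_number.setdefault(number, data)  # ignore duplicates from resends
    expected = range(len(by_number))
    missing = [n for n in expected if n not in by_number]
    if missing:
        raise ValueError(f"blocks still missing: {missing}")
    return b"".join(by_number[n] for n in expected)

arrived = [(2, b"lo "), (0, b"He"), (3, b"Net"), (1, b"l"), (0, b"He")]
print(reassemble(arrived))
```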
TCP is a connection-oriented protocol (it first makes a connection/session to another PC before it sends any data), while IP is a connectionless protocol (it will send data regardless of whether the other PC even exists). Since a TCP connection is opened by a handshake (both PCs make sure that the other PC is online and ready to send/receive data) that creates a session, this session must also be closed. To close a session, the sender sends out a special TCP block after the transfer and requests to close the session; if the other PC replies to that block, both PCs close their connection "peacefully". In case there is no transfer on a TCP connection for some time, your PC assumes that the other PC went offline, crashed or isn't reachable anymore, and thus closes the session without waiting for a reply to its "close session" request (the session is closed forcefully).
TCP itself isn't bound to IP; it could run on top of any protocol that supports routing. But TCP can't transfer data alone; there must always be a low-level transfer protocol below it. On the Internet you often see it written as TCP/IP: TCP for preparing data, correcting errors and creating sessions for safe transfer, and IP for routing the TCP blocks between the individual sub-nets of the Internet.
The UDP protocol (User Datagram Protocol) is often seen as the little brother of the TCP protocol. Just like TCP it knows ports, just like TCP it has a checksum and just like TCP it runs on top of IP (UDP/IP). Block numbers, however, are not part of UDP itself. There are some further differences compared to TCP:
You may now ask: why use UDP at all and not IP directly? Two good reasons: The first reason is the support of ports, which allows multiple UDP connections at once (IP alone would only allow one connection to another PC at a time), and the other good reason is the checksum. Thanks to the checksum, damaged blocks are recognized and discarded, and although UDP itself never asks for a block again, the application above can add its own block numbers to request blocks a second time (this time it's negative acknowledgment) or to order blocks on its own.
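How an application might put its own block numbers on top of UDP can be sketched like this. The 4-byte number in front of the payload is a format I made up for the example (it is not part of UDP), and no real sockets are involved; the list `arrived` just plays the role of the datagrams that made it through.

```python
import struct

def make_datagram(number, payload):
    """Prefix the payload with a 4-byte block number (network byte order)."""
    return struct.pack("!I", number) + payload

def parse_datagram(datagram):
    (number,) = struct.unpack("!I", datagram[:4])
    return number, datagram[4:]

def missing_blocks(received_datagrams, total):
    """Block numbers the application would have to request a second time."""
    seen = {parse_datagram(d)[0] for d in received_datagrams}
    return [n for n in range(total) if n not in seen]

sent = [make_datagram(i, b"frame %d" % i) for i in range(5)]
arrived = [sent[0], sent[1], sent[4], sent[3]]  # block 2 lost, 3 and 4 swapped
print(missing_blocks(arrived, total=5))
```

The receiver would then ask only for the missing numbers, which is exactly the negative acknowledgment mentioned above.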
So when is UDP used?
Theoretically you can use it for every connection that also runs over TCP, the
connection just isn't reliable anymore, but sometimes that's no problem. The
main advantage of UDP is speed. UDP is a lot faster than TCP because
it needs a lot less processing power on both sides. It's often used for applications
where speed is more important than reliability.
E.g. real-time video. If you send video material in real time to another PC, what happens if one block doesn't arrive? Well, no big deal; actually, waiting for it to be resent would take too long (it would arrive seconds after it was needed in the video stream, and the video can't pause that long, since the server keeps sending new data all the time), and if you miss one frame out of 25 or 30 frames a second, nobody will really notice. The same goes for errors: if two or four pixels don't have the correct color because of a few wrong bytes, it isn't really important. It's no different for sound; a few missing or incorrect bytes are often not noticeable.
In other words, you would use UDP for transferring data whenever you can accept small errors once in a while and don't plan to save the data to your HD later on, because if you plan to save it, you'd like it to be absolutely flawless; otherwise errors would add up if you redistribute it again via UDP.
Games that you can play over a network also often use the UDP protocol, as it's more important to keep sending data all the time (upholding the flow of data) than to correct a few incorrect or missing blocks (the game would already be in a completely different situation by the time the blocks arrive after being resent; there is no use for that old information anymore). And of course nobody saves the network traffic caused by games, so errors wouldn't add up in any way.
You can assume that the majority of Internet traffic that is not TCP is UDP traffic. There are some other Internet protocols and thus also Internet traffic that is neither TCP nor UDP, but this is only a small piece of the big traffic cake.
Last edited 31.03.2001 by TGOS