The Internet in a Nutshell

The internet is simply a collection of computers that are connected via wires and routers and that communicate using the Internet Protocol (IP).

Every computer that is connected to the Internet is assigned a unique numeric IP address.  This includes our personal computers.  In fact, if we search Google for "my IP address", Google will display our computer’s public IP address in dotted-decimal form.  71.63.32.48, for example, is a dotted-decimal IP address.  Web servers, the computers that serve up the web pages of web sites, also have unique IP addresses.  For example, if we enter 172.217.13.78 in our browser, the google.com web server will return to our browser the google.com home page.
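To make the dotted-decimal form concrete, here is a minimal sketch using Python's standard ipaddress module.  It takes the example address from above and shows that the four decimal numbers are just the four bytes of a single 32-bit number.  The variable names are our own choices for illustration.

```python
import ipaddress

# The example address from the text, written in dotted-decimal form.
addr = ipaddress.IPv4Address("71.63.32.48")

# Each of the four decimal numbers is one byte (0-255) of the address.
octets = list(addr.packed)

# The same address viewed as a single 32-bit integer.
as_integer = int(addr)

print(octets)
print(as_integer)
```

Converting the integer back with ipaddress.IPv4Address(as_integer) gives the original dotted-decimal address again.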

When a program on one computer wants to send a message to a program on another computer (for example, to request a web page from a web server), it encapsulates the message in what is called a packet.  The packet contains both the source and destination IP addresses, the message, and other information.
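As a rough illustration of encapsulation, the sketch below models a packet as a small Python class.  This is a toy model, not the real IP packet layout: the Packet class, the encapsulate function, and the ttl field (standing in for the "other information") are names we made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Toy model of a packet; a real IP header carries more fields."""
    source_ip: str
    destination_ip: str
    payload: bytes
    ttl: int = 64  # hypothetical stand-in for the "other information"


def encapsulate(message: str, source_ip: str, destination_ip: str) -> Packet:
    """Wrap a message together with the addresses needed to deliver it."""
    return Packet(source_ip, destination_ip, message.encode("utf-8"))


# Our computer (source) asks a web server (destination) for its home page.
packet = encapsulate("GET / HTTP/1.1", "71.63.32.48", "172.217.13.78")
print(packet)
```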

Since sequences of letters like google.com are easier to remember than IP addresses, individuals, organizations, and companies register domain names to identify their servers on the internet.  A domain name consists of a sequence of domains separated by dots, for example, www.n0code.net.  The last domain (net in our example) is called the top-level domain (TLD).  There are many top-level domains.  The original TLDs are com, org, edu, gov, net, int, and mil.  The organization that oversees Internet domain names (ICANN) has more recently approved over a thousand new TLDs, like ninja, black, and xyz.  A current list of TLDs can be found at https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains.
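The structure of a domain name is easy to see in code.  The short sketch below splits the example name into its dot-separated domains and picks out the top-level domain.  The function name split_domain is our own.

```python
def split_domain(name: str) -> list[str]:
    """Split a domain name into its dot-separated domains, left to right."""
    # Domain names are case-insensitive, and a trailing dot is optional.
    return name.lower().rstrip(".").split(".")


labels = split_domain("www.n0code.net")
tld = labels[-1]  # the last domain is the top-level domain

print(labels)
print(tld)
```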

Domain names are mapped to IP addresses in the Domain Name System (DNS).  The DNS is a collection of name servers on the Internet whose main purpose is to provide a DNS resolver with the IP address that is mapped to a given domain name.

When we want to request a web page of a web site, we enter a Uniform Resource Locator (URL) in the address bar of a browser and press enter.  A URL contains the domain name followed by other optional information that tells the server what resource (e.g. file) we are requesting as well as other data that we want to pass to the server.  Since the packets the browser creates must contain the IP address of the destination server, not the domain name, the browser must resolve the IP address from the domain name using a DNS resolver.

The domain name resolution process begins with the client contacting its DNS resolver and requesting a lookup.  If the DNS resolver doesn’t have the IP address in its cache, it sends the request to a root DNS server.  The root DNS server does not hold the address itself; it returns the IP address of the DNS server for the top-level domain, and the DNS resolver sends another lookup request to that server.  The top-level DNS server in turn returns the IP address of the next-level DNS server, and so on until a server that holds the record returns the IP address.  When the DNS resolver resolves the domain name, it caches the result and sends the IP address to the client.
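The lookup process above can be sketched as a small simulation.  Nothing here touches the network: the three toy servers, their names, and the address 203.0.113.10 (taken from a range reserved for documentation) are all made up for illustration, and a real resolver exchanges DNS messages rather than reading a dictionary.

```python
# Toy name servers. Each maps a domain suffix to either a referral (the name
# of the next server to ask) or an answer (an IP address). All names and the
# address 203.0.113.10 are made up for illustration.
SERVERS = {
    "root": {"net": ("referral", "net-tld")},
    "net-tld": {"n0code.net": ("referral", "n0code-ns")},
    "n0code-ns": {"www.n0code.net": ("answer", "203.0.113.10")},
}

cache = {}    # the resolver's cache: domain name -> IP address
queried = []  # which servers the resolver contacted, in order


def resolve(domain):
    """Return the IP address for domain, following referrals from the root."""
    if domain in cache:
        return cache[domain]
    server = "root"
    while True:
        queried.append(server)
        records = SERVERS[server]
        # Find a record on this server that covers the requested name.
        match = next(
            (suffix for suffix in records
             if domain == suffix or domain.endswith("." + suffix)),
            None,
        )
        if match is None:
            raise LookupError(f"{domain} could not be resolved")
        kind, value = records[match]
        if kind == "answer":
            cache[domain] = value  # remember the result for next time
            return value
        server = value  # follow the referral to the next-level server


print(resolve("www.n0code.net"))
print(queried)
```

A second call to resolve("www.n0code.net") is answered from the cache, so no server is contacted again.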

Once the client has the IP address of the destination server, it forms and sends the packet(s) to the computer’s gateway router.  Routers route packets through the web of routers and wires that make up the internet.  When a router receives an incoming packet, it inspects the packet to determine its destination IP address and uses its routing table to determine which wire to send the packet out on.  After it determines the best route to the destination, the router updates the packet and sends it on its way.  The packet then hops from router to router until it gets to the destination computer.
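To illustrate how a routing table can be used, the sketch below picks an outgoing wire for a destination address.  It assumes the router prefers the most specific matching entry (longest-prefix matching, a common approach that the text above does not spell out).  The table entries and the wire names are invented for this example.

```python
import ipaddress

# Hypothetical routing table: (block of destination addresses, outgoing wire).
# 0.0.0.0/0 matches every address and acts as the default route.
ROUTING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "wire-0"),
    (ipaddress.ip_network("172.217.0.0/16"), "wire-1"),
    (ipaddress.ip_network("172.217.13.0/24"), "wire-2"),
]


def choose_wire(destination: str) -> str:
    """Pick the wire for the most specific table entry that matches."""
    address = ipaddress.ip_address(destination)
    matches = [(network, wire) for network, wire in ROUTING_TABLE
               if address in network]
    # A longer prefix means a smaller, more specific block of addresses.
    _, wire = max(matches, key=lambda entry: entry[0].prefixlen)
    return wire


print(choose_wire("172.217.13.78"))
print(choose_wire("71.63.32.48"))
```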

When the destination server receives the request from the client, it responds by sending packets containing the website data back to the client.

URLs

When a user enters text in the address bar of a browser, the text must consist of a well-formed Uniform Resource Identifier (URI).  If the URI identifies a web page, we refer to the URI as a Uniform Resource Locator (URL).

A URI has the following general form:

 scheme://[user[:password]@]host[:port][/path][?query][#fragment]

Brackets [] mark the optional parts of a URI.  This implies that the only required parts are scheme://host.

  • scheme refers to the name of the protocol being used.  When requesting web pages, this will often be http or https.  The scheme is followed by a colon and two slashes.
  • user and password identify credentials on the server.
  • host refers to the domain name of the server on which the resource is located.
  • port refers to the port number on which the web server is listening.  Web servers usually listen on port 80 for http and port 443 for https.
  • path identifies a specific resource on the server.  The format of path depends on the type of web server that is running on the server.  In our case, we’re using an Apache web server which stores the web documents on the underlying hierarchical file system, so the path to a resource is the path to the resource from the root directory for the domain (public_html).  The path may or may not include a file name.  If a file name is not included in the path then the server searches for a file named index.html or index.php.
  • query specifies a string of data that can be passed to a server side script file.
  • fragment specifies a secondary resource identifier.  If the resource is a web page this often refers to an element in the page which the browser can scroll down to.
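To see these parts in practice, here is a short sketch that uses Python's standard urllib.parse module to take a URL apart.  The URL itself is invented for the example (the user, password, port, query, and fragment are not real), but each piece lines up with one part of the general form above.

```python
from urllib.parse import urlsplit, parse_qs

# A made-up URL that uses every part of the general form.
url = ("https://alice:secret@n0code.net:8080"
       "/work/teaching/index.html?term=fall&year=2020#schedule")

parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # n0code.net
print(parts.port)      # 8080
print(parts.path)      # /work/teaching/index.html
print(parts.query)     # term=fall&year=2020
print(parts.fragment)  # schedule

# The query string can be decoded into name/value pairs.
query_data = parse_qs(parts.query)
print(query_data)
```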

Organizing Server Resources

Resource Types

A single web site contains numerous types of resources that are fetched by web browsers and displayed on the screen.  These include:

  • HyperText Markup Language (HTML) files
  • Cascading Style Sheet (CSS) files
  • JavaScript (client-side script) files
  • Image files (jpg, gif, png, svg)

Since a web site can contain hundreds and even thousands of resources, it is important to organize the files on the underlying file system.


Directory Structure

As mentioned in the previous lecture, when a user enters a URL in the address bar of a browser, part of the URL string specifies a path to the particular resource that they would like displayed in the browser.  Since we’re using an Apache web server and the server stores all of the resources in a hierarchical file system, the path to a resource (if valid) will contain a list of subdirectories from the domain’s root directory (public_html), separated by slashes, followed by an optional file name.

For example, on n0code.net I have the following directory structure:

.../public_html/
.../public_html/work/
.../public_html/work/teaching/
.../public_html/work/teaching/courses/
.../public_html/work/teaching/courses/csci240/
.../public_html/work/service/
.../public_html/play/

Note that the .../ that precedes public_html just indicates that although public_html is the root directory for the domain, it is not necessarily the root directory of the file system.

When designing the directory structure for my website I took into account the content that I wanted to display on my webpage.  By organizing the directory structure in this way I allow users to go directly to the content that they are interested in.  For example, if a user wants to see the course web page for CSCI-240, they enter the URL http://n0code.net/work/teaching/courses/csci240/ into their browser.
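The mapping from a URL to a file under public_html can be sketched in a few lines.  This is a simplification of what a web server actually does, written under our own assumptions: the function name resource_path is made up, only index.html is tried when no file name is given (not index.php), a path is treated as a directory only when it ends with a slash, and no checks are made that the file exists.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# The domain's root directory, relative to wherever it lives on the server.
DOCUMENT_ROOT = PurePosixPath("public_html")


def resource_path(url: str) -> PurePosixPath:
    """Map the path portion of a URL to a file under the document root."""
    path = urlsplit(url).path
    target = DOCUMENT_ROOT / PurePosixPath(path.lstrip("/"))
    # No file name in the path: fall back to the directory's index file.
    if path == "" or path.endswith("/"):
        target = target / "index.html"
    return target


print(resource_path("http://n0code.net/work/teaching/courses/csci240/"))
print(resource_path("http://n0code.net"))
```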

As mentioned earlier, if the path portion of the URL does not include a file name, the web server looks for a file named index.html or index.php.  Therefore each subdirectory should have its own index.html file.  A subdirectory may contain more than one html file as well.  For example, the csci240 directory contains a file named index.html as well as a file named student_sites.html which contains links to all of my students’ websites.  A browser can load each of these files by specifying the name of the file at the end of the path portion of the URL, for example http://n0code.net/work/teaching/courses/csci240/student_sites.html.

Since we can have any number of html, css, image, and javascript files in a single directory, it is a good idea to organize the files in each directory by file type. For example, in my domain’s root directory I have the following subdirectories:

.../public_html/css/
.../public_html/javascript/
.../public_html/images/

I include in these subdirectories resources that are shared across my website. Notice that I do not create a separate subdirectory for html files like I do for the other types of files.
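The convention above can be written down as a small lookup.  The sketch below decides where a shared file would go based on its extension.  The function name shared_location and the example file names are made up, and the .js extension for JavaScript files is our assumption.

```python
from pathlib import PurePosixPath

# Which subdirectory of public_html holds each type of shared resource.
SUBDIRECTORY_FOR_EXTENSION = {
    ".css": "css",
    ".js": "javascript",
    ".jpg": "images",
    ".gif": "images",
    ".png": "images",
    ".svg": "images",
}


def shared_location(filename: str) -> PurePosixPath:
    """Return where a shared file belongs under the domain's root directory."""
    root = PurePosixPath("public_html")
    extension = PurePosixPath(filename).suffix.lower()
    subdirectory = SUBDIRECTORY_FOR_EXTENSION.get(extension)
    if subdirectory is None:
        # html files (and anything unrecognized) get no separate subdirectory.
        return root / filename
    return root / subdirectory / filename


print(shared_location("main.css"))
print(shared_location("logo.png"))
print(shared_location("index.html"))
```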


Creating Directories in WebStorm

To create a subdirectory in WebStorm, simply right-click the directory in which you want to create the subdirectory (e.g. /public_html) and choose New -> Directory.  Then enter the name of the subdirectory and press OK.

Although the steps above create a new directory on your local file system, they do not alter the file system on your server.  To have the subdirectory created on your server as well, right-click on the new directory and choose Upload to your-server, where your-server is the name of your server.
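The local half of this process (creating a directory on your own file system) can also be done from a script rather than from the WebStorm menus.  The sketch below is one possible way to do it with Python's standard pathlib module.  It works inside a temporary scratch directory so that it does not touch a real project, the function name create_subdirectory is made up, and, just like the WebStorm steps, it does nothing to the server.

```python
import tempfile
from pathlib import Path


def create_subdirectory(parent: Path, name: str) -> Path:
    """Create parent/name on the local file system and return its path."""
    new_directory = parent / name
    # parents=True also creates the parent directory if it is missing;
    # exist_ok=True makes the call harmless if the directory already exists.
    new_directory.mkdir(parents=True, exist_ok=True)
    return new_directory


# Work in a throwaway directory that is deleted when the block ends.
with tempfile.TemporaryDirectory() as scratch:
    public_html = Path(scratch) / "public_html"
    created = create_subdirectory(public_html, "images")
    created_exists = created.is_dir()

print(created.name, created_exists)
```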