Introduction to HTML

File Extensions and Encodings

We said earlier that the content of a web page is usually specified in a .html file.  When a browser is given a file, it has to decide how to render it.  For a file opened from the local disk it looks at the extension; for a file sent by a web server it relies on the content type reported by the server, which the server usually derives from the extension.  Either way, if it is an .html file the browser interprets the contents of the file as HTML code.  For example, click here.  This file is titled ass.kee.html.  The browser interprets the contents as HTML, but since the file doesn’t contain HTML code (it’s just plain old ASCII text) the browser doesn’t display it properly.

Viewing a Web Page’s Source Code

You can view the source code for a web page using the browser.  This will be a useful technique when we start debugging.  To view the source code, navigate to a web page with your browser, right-click on the page, and choose the View Page Source option (the exact wording varies by browser).  What you see is the raw text that was served up by the web server.  Try it on the example link given above.

Character Encoding

The default character encoding for HTML5 is UTF-8.  UTF-8 is a variable-length text encoding that includes ASCII as a subset, so a file containing only ASCII characters is already valid UTF-8.  This means we can use any ordinary text editor, like Notepad, to create and edit HTML files (and CSS and JavaScript files, for that matter).


Elements

As its name suggests, HTML is a markup language.  A markup language consists of a set of tags that are used to mark up content.  Tags provide context for the content they surround.  Content is marked up by surrounding it with opening and closing tags.

HTML provides tags that inform the browser how the content should be displayed.  For example, a browser can be informed that a sequence of characters should be interpreted as a paragraph by surrounding the sequence of characters with an opening paragraph tag <p> and closing paragraph tag </p> as shown below.

<p>Four score and seven years ago ...</p>

A pair of opening and closing tags, together with any content between them, is referred to as an element.  For example, the code given above defines a p element.


Element Attributes

Elements may also set the value of attributes within the opening tag.  An attribute is set by specifying the attribute name followed by = and a string of characters surrounded by double or single quotes.
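
As a small illustration of the quoting rule, the two elements below set an attribute using double quotes and single quotes respectively.  The lang and title attributes are standard HTML attributes; the values are made up for this sketch.

```html
<!-- Attribute value surrounded by double quotes -->
<p lang="en">Four score and seven years ago ...</p>

<!-- Attribute value surrounded by single quotes, which is handy
     when the value itself contains a double quote -->
<p title='The "Gettysburg Address"'>Four score and seven years ago ...</p>
```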

ID and Class Attributes

Two of the most common element attributes are id and class.  Ids must be unique.  That is, no two elements within a web page can have the same id value.  More than one element in a web page, however, may have the same class value, and an element can have more than one class.  When an element belongs to more than one class, the class names are separated by spaces.

Id and class attributes are used by CSS stylesheets and JavaScript to refer to a single element (id) or a group of elements (class) in a web page.  We’ll then be able to modify the style or other properties of the element(s).

In the example below, the p element has an attribute named id that is set to “introduction”.

<p id="introduction">Four score and seven years ago ...</p>
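
The class attribute is written the same way.  In the sketch below, the id values (opening, closing) and class names (quote, highlight) are made up for illustration.  Both p elements share the class quote, and the first also belongs to a second class.

```html
<!-- id values are unique within the page; class values may repeat -->
<p id="opening" class="quote highlight">Four score and seven years ago ...</p>
<p id="closing" class="quote">... shall not perish from the earth.</p>
```

A stylesheet or script could then refer to the first paragraph alone through its id, or to both paragraphs at once through the shared class.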

 


A Basic HTML File

Most HTML files include, at a minimum, the following code.

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>My test page</title>
    </head>
    <body>
        <!-- this is a comment -->
    </body>
</html>

Here, the html element (often referred to as the root element) is the outermost element and contains child elements, namely the head element and the body element, that are nested within the html element.  Similarly, the head element contains two children: a meta element and a title element.

This nesting allows a browser, before it renders the page, to create a tree structure in memory called the Document Object Model (DOM).  The nodes of the tree are objects that represent the elements that are to be rendered.  The browser can style the nodes using CSS and can dynamically modify the nodes using JavaScript.  More on this later.

Let’s dissect the code above.

  • The DOCTYPE declaration tells the browser that the file is an HTML5 document.
  • The html element is the root element.  It contains one head element and one body element.  Having a single root will be helpful when we want to find a particular element using JavaScript code.
  • The head element includes information that is used by the browser to display the content in the body element.
  • Meta elements hold metadata, that is, data that describes data.  In the meta element above we stipulate that the HTML file uses the UTF-8 character encoding.  This is the default encoding for HTML5.
  • The title element specifies the string of characters that is displayed in the browser’s tab.
  • The body element contains the content that is displayed to the user.
  • Any text (including elements and executable code) that is placed between <!-- and --> is treated as a comment and is ignored by the browser when rendering the page.
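
Putting the pieces from this section together, a slightly fuller page might look like the sketch below.  The title text, the id, and the class name are placeholders chosen for illustration, not anything required by HTML.

```html
<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Gettysburg Address</title>
    </head>
    <body>
        <!-- Comments like this one are not rendered -->
        <p id="introduction" class="quote">Four score and seven years ago ...</p>
    </body>
</html>
```

Saving this text in a file with a .html extension and opening it in a browser should show the paragraph in the page and the title in the browser’s tab.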