Introduction to HTML

File Extensions and Encodings

We said earlier that the content of a web page is usually specified in a .html file. When a browser receives a file, it decides how to render it based on the file’s type, which web servers (and the browser itself, for files opened from disk) usually determine from the file’s extension. If the file is an .html file, the browser interprets its contents as HTML code. For example, open the file ass.kee.html. The browser interprets its contents as HTML, but since the file doesn’t contain HTML code (it’s just plain old ASCII text), the browser doesn’t display it as intended: any line breaks and runs of spaces in the text are collapsed into single spaces.

Viewing a Web Page’s Source Code

You can view the source code for a web page using the browser, a technique that will be useful when we start debugging. To view a page’s source code, navigate to the page with your browser, right-click on it, and choose the view page source option. What you see is the raw text that was served up by the web server. Try it on the ass.kee.html example above.

Character Encoding

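HTML5 documents should be encoded in UTF-8; in fact, the current HTML specification requires it. UTF-8 is a variable-length encoding that uses one to four bytes per character, and it includes ASCII as a subset: every ASCII character is stored as the same single byte it has in ASCII. Because of this, we can create and edit HTML files (and CSS and JavaScript files, for that matter) with any ordinary text editor, such as Notepad.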


Elements

As its name suggests, HTML (HyperText Markup Language) is a markup language. A markup language consists of a set of tags that are used to mark up content. Tags provide context for the content they surround: content is marked up by enclosing it between an opening tag and a closing tag.

HTML provides tags that inform the browser how the content should be displayed.  For example, a browser can be informed that a sequence of characters should be interpreted as a paragraph by surrounding the sequence of characters with an opening paragraph tag <p> and closing paragraph tag </p> as shown below.

<p>Four score and seven years ago ...</p>

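An opening tag, its matching closing tag, and any content between them together form an element. For example, the code given above defines a p element. (A few elements, such as br and meta, consist of a single tag with no closing tag; we’ll see them below.)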


Element Attributes

An element’s opening tag may also set the values of attributes. An attribute is set by writing the attribute name, followed by = and a value enclosed in double or single quotes.
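To make the syntax concrete, here is a small sketch showing both quoting styles. The title attribute and its values are illustrative choices, not part of the original text; most browsers display an element’s title as a tooltip when the mouse hovers over it. Single quotes are convenient when the value itself contains double quotes.

```html
<!-- Attribute value enclosed in double quotes -->
<p title="A famous opening line">Four score and seven years ago ...</p>

<!-- Single quotes allow the value to contain double quotes -->
<p title='Lincoln said "Four score"'>Four score and seven years ago ...</p>
```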

ID and Class Attributes

Two of the most common element attributes are id and class. An id must be unique; that is, no two elements within a web page can have the same id value. More than one element in a web page, however, may have the same class value, and an element can belong to more than one class. When an element belongs to several classes, the class names are separated by spaces within its class attribute.

CSS stylesheets and JavaScript code use id and class attributes to select a single element (by id) or a group of elements (by class) in a web page. We’ll then be able to modify the style or other properties of the selected element(s).

In the example below, the p element has an attribute named id that is set to “introduction”.

<p id="introduction">Four score and seven years ago ...</p>
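The example above sets only an id. The sketch below adds the class attribute; the class names highlight and note are hypothetical, chosen only for illustration.

```html
<!-- Only one element on the page may have the id "introduction" -->
<p id="introduction" class="highlight">Four score and seven years ago ...</p>

<!-- Several elements may share a class, and an element may list several classes -->
<p class="highlight note">Now we are engaged in a great civil war ...</p>
<p class="note">This paragraph belongs only to the note class.</p>
```

A CSS rule or JavaScript code that selects the class highlight would apply to the first two paragraphs, one that selects the class note would apply to the last two, and the id introduction identifies only the first.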


A Basic HTML File

Most HTML files include, at a minimum, the following code.

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>My test page</title>
    </head>
    <body>
        <!-- this is a comment -->
    </body>
</html>

Here, the html element (often referred to as the root element) is the outermost element. It contains two child elements nested within it: the head element and the body element. Similarly, the head element contains two children: a meta element and a title element.

This nesting allows a browser, before it renders the page, to build a tree structure in memory called the Document Object Model (DOM). The nodes of the tree are objects that represent the elements (and the text within them) to be rendered. The browser applies CSS styles to the nodes, and JavaScript code can modify the nodes dynamically. More on this later.

Let’s dissect the code above.

  • The DOCTYPE declaration tells the browser that the file contains HTML code (specifically HTML5).
  • The html element is the root element: every other element is nested inside it. This will be helpful when we want to find a particular element using JavaScript code. It contains one head element followed by one body element.
  • The head element contains information about the page (metadata), such as its character set and title, rather than content to be displayed in the page itself.
  • Meta elements contain metadata, that is, data that describes other data. The meta element above declares that the HTML file uses the UTF-8 character set, the encoding HTML5 documents are expected to use.
  • The title element specifies the string of characters that is displayed in the browser’s tab.
  • The body element contains the content that is displayed to the user.
  • Any text (including elements and executable code) placed between <!-- and --> is treated as a comment and is ignored by the browser when it renders the page.
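As a small sketch of the last point (the paragraph text is placeholder content), wrapping an element in a comment is one way to hide it temporarily while debugging:

```html
<body>
    <p>This paragraph is displayed.</p>
    <!--
    <p>This paragraph is commented out, so it is not displayed.</p>
    -->
</body>
```

The commented-out paragraph is not rendered, but it still appears when you view the page’s source.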

Block and Inline Elements

HTML elements that contain content occupy rectangular areas within the web page.  These elements can be classified into two disjoint sets: block elements and inline elements.

A block element always begins on a new line, regardless of the element that precedes it, and the element that follows it is also placed on a new line. Block elements are used to structure the document. Many of them, such as section, can contain both block and inline elements, although some, such as p and the heading elements, may contain only inline elements.

An inline element is placed to the right of the content that precedes it (wrapping to the next line only when it runs out of room) and cannot contain block elements. Inline elements are usually used to format text.

Example

The code below demonstrates the difference between block and inline elements.

<section>Lets begin...<h4>Introduction</h4><p>Hello <strong>World!</strong></p></section>

This code produces the following content in a web page.

Lets begin…

Introduction

Hello World!

The section, h4 (heading), and p (paragraph) elements are block elements, so each begins on a new line. The strong element, however, is an inline element within the p element, so it is placed to the right of the text that precedes it (and, by default, rendered in bold).


Formatting HTML Code

It is customary to write HTML code with line breaks and indentation that reflect how the elements are nested and positioned on the page. This makes the code more readable and easier to debug. For example, it is better to write the above HTML code as follows:

<section>
    Lets begin...
    <h4>Introduction</h4>
    <p>Hello <strong>World!</strong></p>
</section>

Notice that when a block element contains other block elements, as the section element does, we place each inner element on its own line and indent it (four spaces are used here; a tab works as well). The code above represents more clearly what will appear on the web page and is easier to read.

Structuring Text

Although web pages today can display various forms of visual and auditory content, textual content is still the most prevalent. As such, HTML provides a number of elements for structuring text on the page. A few of the most commonly used ones are described below.


Headings

Heading elements (h1, h2, h3, h4, h5, h6) are rendered by the browser in bold with different font sizes: h1 is the largest, and each subsequent level is progressively smaller. A heading element can contain any phrasing content (inline elements).

<h1>Heading</h1>
<h2>Heading</h2>
<h3>Heading</h3>
<h4>Heading</h4>
<h5>Heading</h5>
<h6>Heading</h6>

The above code produces the following:

Heading

Heading

Heading

Heading

Heading

Heading

Paragraphs

The paragraph element (p) is used to separate a paragraph of text from adjacent blocks with vertical space. It can contain any phrasing content (inline elements).

<p>paragraph 1</p>
some other text
<p>paragraph 2</p>

The above code produces the following output:

paragraph 1

some other text

paragraph 2


Line Breaks

The break element (br) produces a line break. It cannot contain any content, so it consists of a single tag and must not have an end tag.

one<br>two<br>three

The above code produces the following output:

one
two
three


Ordered Lists

The ordered list element (ol) creates a numbered list of items. Each item is given by an li (list item) element. An li element can contain any flow content, including nested ol and ul elements.

<ol>
    <li>One</li>
    <li>Two</li>
    <li>Three</li>
</ol>

The above code produces the following output:

  1. One
  2. Two
  3. Three
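Because an li element can contain flow content, lists can be nested. In the sketch below (the item text is placeholder), an unordered list is placed inside the second li element. The inner list goes inside an li rather than directly inside the ol, since the direct children of an ol should be li elements.

```html
<ol>
    <li>One</li>
    <li>Two
        <ul>
            <li>Two-a</li>
            <li>Two-b</li>
        </ul>
    </li>
    <li>Three</li>
</ol>
```

Browsers typically render the inner list indented beneath Two, with bullets rather than numbers.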

Unordered Lists

The unordered list element (ul) creates a bulleted list of items. Each item is given by an li element. An li element can contain any flow content, including nested ol and ul elements.

<ul>
    <li>One</li>
    <li>Two</li>
    <li>Three</li>
</ul>

The above code produces the following output:

  • One
  • Two
  • Three

Description Lists

The description list element (dl) produces a list of items, each consisting of a term, given by a dt element, followed by its description, given by a dd element. Browsers typically display each description indented on the line after its term.

<dl>
    <dt>One</dt>
    <dd>This is one</dd>
    <dt>Two</dt>
    <dd>This is two</dd>
    <dt>Three</dt>
    <dd>This is three</dd>
</dl>

The above code produces the following output:

One
    This is one
Two
    This is two
Three
    This is three

Block Quotations

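The block quotation element (blockquote) produces a block that is separated from adjacent elements by vertical space and is usually indented. The source of the quotation can be recorded as a URL in the blockquote’s cite attribute, but browsers do not display that attribute; a visible attribution can be provided with the cite element, as in the example below.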

<blockquote cite="http://n0code.net">
    <p>A little nonsense now and then, is cherished by the wisest men.</p>
    <p>-<cite>Anonymous</cite></p>
</blockquote>

The above code produces the following output:

A little nonsense now and then, is cherished by the wisest men.

-Anonymous


Character Entities

The greater than (>) and less than (<) characters, when written in an HTML document, may be interpreted as part of element tags rather than rendered as text. To display these characters as text, we need to use their character entities. Character entities begin with an ampersand (&) and end with a semicolon (;). Between the ampersand and the semicolon is one of the following:

  • a keyword
  • # and a decimal code
  • #x and a hexadecimal code

A full list of named character entities is given in the HTML specification. For example, < can be written as text in an HTML document using any one of the following:

&lt;
&LT;
&#x0003C;
&#60;

The above code produces the following output:

<
<
<
<
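Entities are most often needed when a page displays HTML code as text. A minimal sketch: &gt; is the named entity for >, and the numeric forms use the decimal code 62 or the hexadecimal code 3E for the same character (alongside 60 and 3C for <).

```html
<!-- Each paragraph displays the text: Use <p> to start a paragraph. -->
<p>Use &lt;p&gt; to start a paragraph.</p>
<p>Use &#60;p&#62; to start a paragraph.</p>
<p>Use &#x3C;p&#x3E; to start a paragraph.</p>
```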

White Space

Browsers interpret multiple adjacent white space characters (space, tab, newline) as a single space character.  Consider the following HTML code.  There are multiple spaces on the first line, a tab at the beginning of the second line, and newline characters after the first and second lines.

one   two   three
        four
five

The above code produces the following output:

one two three four five

There are a number of white space character entities in the HTML specification. The &nbsp; entity produces a non-breaking space, and we can use several &nbsp; entities in a row to render multiple adjacent spaces. Unfortunately, the &Tab; and &NewLine; entities only have a visible effect inside a pre (preformatted) element, which renders its text with white space preserved, usually in a monospace font; elsewhere they are collapsed like any other white space. For example, all of the gray boxes in this tutorial are created with pre elements. To approximate a tab outside a pre element, the best we can do is use wider space entities: &ensp; produces an en space, half an em wide (about two ordinary spaces in many fonts), and &emsp; produces an em space, one em wide (about three to four ordinary spaces). To produce a new line we must use the br element.

1&nbsp;!<br>

2&ensp;!<br>

3&emsp;!<br>

The above code produces the following output:

1 !

2  !

3   !
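Because &Tab; and &NewLine; only take effect where white space is preserved, the pre element can be the simpler choice when spacing matters. A minimal sketch, reusing the white space example from above: inside pre, the spaces and line breaks are rendered exactly as written, usually in a monospace font.

```html
<pre>
one   two   three
        four
five
</pre>
```

This renders three lines with the original spacing intact, rather than the single line one two three four five. (The HTML parser ignores a single line break that immediately follows the opening pre tag, so the displayed text starts with one.)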

Ampersand

Since the ampersand marks the beginning of a character entity, we should use &amp; to display a literal ampersand.

one &amp; two

The above code produces the following output:

one & two