• A character set is a mapping in which every character is assigned a number (its code point)
• An encoding scheme defines how a number (code point) is encoded into a sequence of bytes
• A character encoding is the combination of a character set and an encoding scheme (i.e. the mapping of characters to a sequence of bytes, and vice versa)
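
The three definitions above can be illustrated with a minimal Python sketch. The sample character and the variable names are chosen here purely for illustration; Python's built-in `ord`, `str.encode` and `bytes.decode` stand in for the character set and encoding scheme steps.

```python
# Character set: the character is assigned a number (its code point).
char = "\u00e9"                 # LATIN SMALL LETTER E WITH ACUTE
code_point = ord(char)          # 233, i.e. U+00E9 in Unicode

# Encoding scheme: the code point is turned into a sequence of bytes.
utf8_bytes = char.encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = char.encode("latin-1")  # one byte:  0xE9

# "And vice versa": decoding maps the bytes back to the character,
# provided the same encoding is used in both directions.
round_trip = utf8_bytes.decode("utf-8")

print(f"U+{code_point:04X}", utf8_bytes.hex(), latin1_bytes.hex(), round_trip == char)
```

The same code point yields different bytes under different encoding schemes, which is why the encoding has to be known before bytes can be interpreted as text.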

Character Sets and/or Encoding Schemes

Common Standard | Character Set | Encoding Scheme | Description
----------------|---------------|-----------------|------------
ASCII | ✔ | | mapping of characters to numbers
ISO 8859-1 (Latin-1) | ✔ | ✔ | mapping of characters to a single byte (a superset of ASCII)
Unicode | ✔ | | mapping of characters to numbers (a superset of ASCII)
Unicode Transformation Format (UTF-8) | | ✔ | mapping of numbers to one or more bytes
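
As a rough illustration of the "superset of ASCII" and "one or more bytes" entries above, the following Python sketch compares how many bytes a few sample characters need under each encoding. The helper `encoded_length` and the sample characters are hypothetical, introduced only for this example.

```python
def encoded_length(ch, encoding):
    """Return the number of bytes ch needs in encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None

# ASCII characters produce the same single byte in all three encodings.
assert "A".encode("ascii") == "A".encode("latin-1") == "A".encode("utf-8")

samples = ["A", "\u00e9", "\u20ac"]  # A, e with acute accent, euro sign
for ch in samples:
    lengths = {enc: encoded_length(ch, enc) for enc in ("ascii", "latin-1", "utf-8")}
    print(f"U+{ord(ch):04X}", lengths)
```

Characters outside ASCII cannot be encoded in ASCII at all, Latin-1 covers only code points up to U+00FF with one byte each, and UTF-8 uses a variable number of bytes to cover all of Unicode.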

Other Encoding Schemes

Encoding Scheme | Description
----------------|------------
HTML Encoding | HTML encoding represents characters that have a special meaning in HTML (such as <, > and &) as entities (for example, < becomes &lt;) so that they can be used safely within an HTML document.
URL Encoding | URLs can only contain printable ASCII characters (characters with ASCII codes between decimal 32 and 126, i.e. hex 0x20 to 0x7E). However, some characters within this range have special meanings within the URL or within HTTP. URL encoding is used when a character with a special meaning must appear literally, or when a character lies outside the printable range. To URL-encode a character, prefix the two-digit hex value of each of its bytes with a % (for example, a space becomes %20).
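
Both schemes above can be sketched with Python's standard library. This is a minimal sketch, not a complete treatment: the sample strings and the helper `percent_encode` are hypothetical, and encoding non-ASCII characters via their UTF-8 bytes is an assumption (it is the common convention and the default of `urllib.parse.quote`).

```python
import html
from urllib.parse import quote, unquote

# HTML encoding: characters with special meaning become entities.
raw_html = '<a href="x">Tom & Jerry</a>'
escaped = html.escape(raw_html)
print(escaped)

# URL encoding by hand: "%" followed by the two-digit hex value of each byte.
def percent_encode(ch):
    """Percent-encode a single character via its UTF-8 bytes (illustrative helper)."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))

print(percent_encode(" "))       # space is 0x20
print(percent_encode("\u00e9"))  # non-ASCII: one %XX per UTF-8 byte

# The standard library applies the same rule, leaving unreserved characters as-is.
raw_value = "a b&c=d/\u00e9"
encoded = quote(raw_value, safe="")  # safe="" also encodes "/"
print(encoded)
print(unquote(encoded) == raw_value)
```

Passing `safe=""` makes `quote` encode `/` as well, which is usually what is wanted for a single query-string value; the default leaves `/` untouched so that whole paths can be encoded.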

Resources