2 concerns:

the receiver must be able to extract the same message from the signal sent by the sender
the encoding should be as efficient as possible - data compression

the 2 sides need to agree on a presentation/message format

Data Manipulation Functions

presentation formatting
data compression

7.1 Presentation Formatting

presentation formatting - the transformation between application data and a form suitable for transmission over a network
encoding/marshalling - application data to network data
decoding/unmarshalling - network data to application data

Issues in Argument Marshalling:

What Data Types Does the System Support

data type levels (lowest to highest):

base types - typically includes: integers, floating-point numbers, characters, ordinal types, booleans, etc
flat types - structures and arrays
complex types - types that are built using pointers/memory addresses

Data Conversion Between Different Machine Architectures

conversion strategies:

canonical intermediate form
- both sender and receiver settle on an external representation for each type
- in networking, this is known as the N-by-1 solution: each of the N machine architectures must be able to convert between its own representation and the single external representation
receiver-makes-right
- the sender transmits data in its own internal format; the receiver is then responsible for translating the data from the sender’s format into its own format
- in networking, this is known as the N-by-N solution: each of the N machine architectures must be able to handle the representations of all N architectures
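
A minimal sketch of the two strategies for a single 32-bit integer, using Python's `struct` module. The function names and the one-byte architecture tag (`B` or `L`) are made up for illustration and do not belong to any real protocol.

```python
import struct
import sys

def encode_canonical(value: int) -> bytes:
    """Canonical intermediate form: always send a 32-bit big-endian integer."""
    return struct.pack(">i", value)

def decode_canonical(data: bytes) -> int:
    """The receiver converts from the agreed external form to its own."""
    return struct.unpack(">i", data)[0]

def encode_native(value: int) -> bytes:
    """Receiver-makes-right: send the native byte order plus a tag naming it."""
    tag = b"B" if sys.byteorder == "big" else b"L"  # hypothetical one-byte tag
    return tag + struct.pack("=i", value)

def decode_receiver_makes_right(data: bytes) -> int:
    """The receiver reads the tag and picks the matching byte order."""
    order = ">" if data[:1] == b"B" else "<"
    return struct.unpack(order + "i", data[1:])[0]

print(encode_canonical(1).hex())  # 00000001
```

With the canonical form every sender pays for a conversion even when both machines share an architecture; with receiver-makes-right two identical machines exchange bytes unchanged, but every receiver must understand every sender's format.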

Identifying Kinds of Data Within the Message

2 common approaches:

tagged data - the message carries tags; a tag is any additional information included in a message - beyond the concrete representation of the base types - that helps the receiver decode the message (e.g. a type tag or a length tag)
untagged data - sender and receiver agree in advance on the type of each data item, so the message carries no tags

Stubs

a stub is a piece of code that implements argument marshalling
on the client side, the stub marshals the procedure arguments into a linear message
on the server side, the stub unmarshals the linear message back into the arguments
stubs can be either interpreted or compiled

Network Data Representation Examples (XDR, ASN.1, NDR)

External Data Representation (XDR)

is the network format used with SunRPC
supports the entire C-type system with the exception of function pointers
defines a canonical intermediate form
does not use tags (except to indicate array lengths)
uses compiled stubs
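
A rough sketch of what untagged XDR data looks like on the wire: integers are 4-byte big-endian values, and variable-length items carry only a length prefix, padded to a multiple of 4 bytes. The helper names are hypothetical; real XDR stubs are generated by a compiler rather than written by hand.

```python
import struct

def xdr_int(value: int) -> bytes:
    """A 32-bit signed integer: 4 bytes, big-endian, no type tag."""
    return struct.pack(">i", value)

def xdr_string(text: str) -> bytes:
    """Length prefix, the bytes themselves, then zero padding to a 4-byte boundary."""
    raw = text.encode("ascii")
    padding = (4 - len(raw) % 4) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding

def xdr_int_array(values) -> bytes:
    """Variable-length array: element count followed by the encoded elements."""
    return struct.pack(">I", len(values)) + b"".join(xdr_int(v) for v in values)

print(xdr_string("hello").hex())  # 0000000568656c6c6f000000
```

Because nothing in the bytes says "this is a string", the receiver can only decode the message if it already knows the expected sequence of types.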

Abstract Syntax Notation One (ASN.1)

is an ISO standard
supports C-type system without function pointers
defines a canonical intermediate form
uses type tags
stubs can be either: compiled or interpreted
used in the Internet standard Simple Network Management Protocol (SNMP)
represents each data item with a triple of the form: ⟨tag, length, value⟩
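
A small sketch of the ⟨tag, length, value⟩ triple for one type, an integer in the Basic Encoding Rules style. It only handles the short length form (lengths below 128), and the function names are invented for illustration.

```python
INTEGER_TAG = 0x02  # universal tag for INTEGER

def encode_integer(value: int) -> bytes:
    """Encode an int as tag, length, value (two's complement, big-endian)."""
    magnitude = value if value >= 0 else ~value
    length = magnitude.bit_length() // 8 + 1  # smallest size that keeps the sign bit
    if length > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([INTEGER_TAG, length]) + value.to_bytes(length, "big", signed=True)

def decode_tlv(data: bytes):
    """Split one encoded item back into its (tag, length, value) parts."""
    tag, length = data[0], data[1]
    return tag, length, data[2:2 + length]

tag, length, value = decode_tlv(encode_integer(300))
print(tag, length, int.from_bytes(value, "big", signed=True))  # 2 2 300
```

Unlike the untagged approach, the receiver can walk this message without knowing its layout in advance, at the cost of two extra bytes per item here.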

Network Data Representation (NDR)

data-encoding standard used in the Distributed Computing Environment (DCE)
uses receiver-makes-right (unlike XDR and ASN.1) by inserting an architecture tag at the front of each message; individual data items are untagged
uses a compiler to generate stubs
supports the C-type system

Markup Languages

Extensible Markup Language (XML) - takes the tagged approach to the extreme
XML Schema - a schema language defined for XML; an individual schema written in it is an XML Schema Document (XSD)
Efficient XML Interchange (EXI) - binary XML format for greater compactness and faster parsing

XML Namespaces - solve the problem of name clashes

7.2 Multimedia Data

since multimedia data are consumed by humans, we want to keep the information that is most important to a human while getting rid of anything that doesn’t improve the human’s perception of the visual or auditory experience

compression types:

lossless compression - guarantees data received is exactly the same as data sent
lossy compression - does not guarantee data received is exactly the same as data sent

Lossless Compression Techniques

Huffman codes

encode symbols that occur more frequently with fewer bits and symbols that occur less frequently with more bits
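
One way to build such a code is the classic greedy merge of the two least frequent subtrees. The sketch below is illustrative only; `huffman_code` is a made-up name, and the exact bit patterns depend on how ties are broken.

```python
import heapq
from collections import Counter

def huffman_code(text: str) -> dict:
    """Return a {symbol: bit string} table built from symbol frequencies."""
    counts = Counter(text)
    # Heap entries are (weight, unique id, partial code table); the id breaks ties.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

codes = huffman_code("AAABBCDDDD")
encoded = "".join(codes[ch] for ch in "AAABBCDDDD")
print(codes, len(encoded))
```

For AAABBCDDDD the most frequent symbol, D, gets a 1-bit code and the rarest, C, gets a 3-bit code, so the whole string takes 19 bits instead of 80 at 8 bits per character.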

run length encoding (RLE)

the idea is to replace consecutive occurrences of a given symbol with only one copy of the symbol, plus a count of how many times that symbol occurs
e.g. the string AAABBCDDDD would be encoded as 3A2B1C4D
works well on images that contain large homogeneous regions
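
The example above can be reproduced with a short sketch. It assumes the symbols themselves are never digits, since the decoder tells counts and symbols apart by checking for digits.

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    """Emit <count><symbol> for every run of identical symbols."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))

def rle_decode(encoded: str) -> str:
    """Rebuild the original string from <count><symbol> pairs."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

print(rle_encode("AAABBCDDDD"))  # 3A2B1C4D
```

Note that a string with no repeats gets longer (ABC becomes 1A1B1C), which is why RLE only pays off on data with long runs.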

differential pulse code modulation (DPCM)

the idea here is to first output a reference symbol and then, for each symbol in the data, to output the difference between that symbol and the reference symbol
e.g. using symbol A as the reference symbol, the string AAABBCDDDD would be encoded as A0001123333 because A is the same as the reference symbol, B has a difference of 1 from the reference symbol, and so on
when the differences are small they can be encoded with fewer bits than the symbol itself. In this example, the range of differences, 0–3, can be represented with 2 bits each, rather than the 7 or 8 bits required by the full character. As soon as the difference becomes too large, a new reference symbol is selected
works better than RLE for most digital imagery, because adjacent pixels are usually similar
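
A sketch of the DPCM idea on character data. It assumes differences are stored as unsigned values in the range 0 to `max_diff` (3 fits in 2 bits) and starts a new block with a new reference symbol whenever a difference falls outside that range; both choices are assumptions for illustration.

```python
def dpcm_encode(text: str, max_diff: int = 3):
    """Return a list of (reference symbol, [differences]) blocks."""
    blocks = []
    reference, diffs = None, []
    for ch in text:
        diff = None if reference is None else ord(ch) - ord(reference)
        if diff is None or not 0 <= diff <= max_diff:
            if reference is not None:
                blocks.append((reference, diffs))
            reference, diffs, diff = ch, [], 0  # ch becomes the new reference
        diffs.append(diff)
    if reference is not None:
        blocks.append((reference, diffs))
    return blocks

def dpcm_decode(blocks) -> str:
    """Add each difference back onto its block's reference symbol."""
    return "".join(chr(ord(ref) + d) for ref, diffs in blocks for d in diffs)

print(dpcm_encode("AAABBCDDDD"))  # [('A', [0, 0, 0, 1, 1, 2, 3, 3, 3, 3])]
```

Written out flat, that single block is the A0001123333 from the example above.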

delta encoding

simply encodes each symbol as the difference from the previous one
e.g. AAABBCDDDD would be represented as A001011000
also possible to perform RLE after delta encoding, since we might find long strings of 0s if there are many similar symbols next to each other
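
A sketch of delta encoding followed by RLE over the differences. The function names are made up, and the run-length step returns (value, count) pairs rather than a packed byte format.

```python
from itertools import accumulate, groupby

def delta_encode(text: str):
    """Return the first symbol and the difference of each symbol from its predecessor."""
    if not text:
        return "", []
    return text[0], [ord(b) - ord(a) for a, b in zip(text, text[1:])]

def delta_decode(first: str, deltas) -> str:
    """Rebuild the symbols by summing the differences from the first symbol."""
    if not first:
        return ""
    return "".join(chr(c) for c in accumulate(deltas, initial=ord(first)))

def run_lengths(values):
    """RLE over the differences: (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

first, deltas = delta_encode("AAABBCDDDD")
print(first, deltas)        # A [0, 0, 1, 0, 1, 1, 0, 0, 0]
print(run_lengths(deltas))  # [(0, 2), (1, 1), (0, 1), (1, 2), (0, 3)]
```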

dictionary-based methods

the Lempel-Ziv (LZ) compression algorithm is the best-known dictionary-based compression method
the idea is to build a dictionary/table of variable-length strings that are expected to occur in the data, then replace each of these strings with the corresponding index
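
To make the dictionary idea concrete, here is a sketch of LZW, one well-known member of the LZ family, which builds the dictionary adaptively while reading the data instead of fixing it in advance. It assumes every input character has a code point below 256 and outputs a list of integer indexes rather than packed bits.

```python
def lzw_compress(text: str) -> list:
    """Replace the longest already-seen strings with their dictionary index."""
    dictionary = {chr(i): i for i in range(256)}
    current, output = "", []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate
        else:
            output.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new string
            current = ch
    if current:
        output.append(dictionary[current])
    return output

def lzw_decompress(codes: list) -> str:
    """Rebuild the same dictionary on the fly while decoding."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):  # string defined by the step being decoded
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(output)

codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
print(len(codes), lzw_decompress(codes))
```

The receiver never needs to be sent the dictionary: it reconstructs the same entries in the same order from the index stream.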

7.2.2 Image Representation and Compression

images are composed of pixels
pixels are either: grayscale or color
there are various color spaces:
- RGB - red, green, blue values
- YUV - Y is luminance (brightness); U and V contain chrominance (color information)

Graphics Interchange Format (GIF)

starts with an RGB color space: 8 bits for each of the 3 color dimensions, for a total of 24 bits
then reduces the 24-bit color image to an 8-bit color image by picking the 256 colors that most closely approximate the colors used in the picture
all 256 colors are stored in a table and each indexed with an 8-bit number
then the value for each pixel is replaced by the appropriate index
then runs an LZ variant over the result, treating common sequences of pixels as the strings that make up the dictionary
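
A toy sketch of the palette step only. Picking the most frequent colors is a crude stand-in for the color selection a real encoder performs, and all function names are hypothetical.

```python
from collections import Counter

def build_palette(pixels, size=256):
    """Pick the `size` most frequent RGB colors (a simplification)."""
    return [color for color, _ in Counter(pixels).most_common(size)]

def nearest_index(color, palette):
    """Index of the palette entry closest to `color` in squared RGB distance."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])))

def index_image(pixels, palette):
    """Replace every 24-bit pixel value with its palette index."""
    return [nearest_index(p, palette) for p in pixels]

pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)] * 2 + [(250, 0, 0)]
palette = build_palette(pixels, size=2)
print(index_image(pixels, palette))  # [0, 0, 0, 1, 1, 0]
```

The slightly different red (250, 0, 0) is mapped to the same index as (255, 0, 0), which is where GIF loses information for images with more than 256 colors. The resulting index stream is what the LZ step then compresses.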

Joint Photographic Experts Group (JPEG)

starts off by transforming colors from RGB space into YUV space using linear equations:
- Y = 0.299R + 0.587G + 0.114B
- U = (B-Y) * 0.565
- V = (R-Y) * 0.713
we compress the U and V components more aggressively, because human eyes are less sensitive to chrominance than to luminance
subsample - take a number of adjacent pixels, calculate the average U or V value, and transmit that. the luminance (Y) component is not subsampled
after subsampling the image, we feed it through 3 phases one 8x8 block at a time:
- discrete cosine transform (DCT) - transforms the image from a signal in the spatial domain into an equivalent signal in the spatial frequency domain. this is a lossless operation
- quantization - quantizes the resulting signal, which discards the least significant information contained in that signal. this is the lossy step
- encoding - encodes the final result. this is a lossless step (a form of run length encoding plus Huffman coding)
decompression follows these same 3 phases in reverse order
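
The color transform, subsampling, DCT and quantization steps can be sketched in plain Python. This is a simplified illustration, not a JPEG codec: it uses a single uniform quantum of 16 instead of a quantization table (an assumption), skips the encoding phase, and all names are made up.

```python
import math

N = 8  # JPEG works on 8x8 blocks

def rgb_to_yuv(r, g, b):
    """Linear color transform using the coefficients listed above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, (b - y) * 0.565, (r - y) * 0.713

def subsample(plane, size=2):
    """Replace each size x size group of U or V values with its average."""
    out = []
    for r in range(0, len(plane), size):
        row = []
        for c in range(0, len(plane[0]), size):
            cells = [v for line in plane[r:r + size] for v in line[c:c + size]]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out

def _scale(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def _basis(a, k):
    return math.cos((2 * a + 1) * k * math.pi / (2 * N))

def dct(block):
    """Forward 2-D DCT of an 8x8 block (spatial to frequency domain)."""
    return [[_scale(i) * _scale(j) / math.sqrt(2 * N)
             * sum(block[x][y] * _basis(x, i) * _basis(y, j)
                   for x in range(N) for y in range(N))
             for j in range(N)] for i in range(N)]

def idct(coeffs):
    """Inverse 2-D DCT (frequency back to spatial domain)."""
    return [[sum(_scale(i) * _scale(j) * coeffs[i][j] * _basis(x, i) * _basis(y, j)
                 for i in range(N) for j in range(N)) / math.sqrt(2 * N)
             for y in range(N)] for x in range(N)]

def quantize(coeffs, quantum=16):
    """Lossy step: keep only the nearest integer multiple of the quantum."""
    return [[round(v / quantum) for v in row] for row in coeffs]

def dequantize(levels, quantum=16):
    return [[v * quantum for v in row] for row in levels]

block = [[100 + x + y for y in range(N)] for x in range(N)]  # smooth gradient
levels = quantize(dct(block))
restored = idct(dequantize(levels))
print(sum(1 for row in levels for v in row if v), "non-zero coefficients of 64")
```

For a smooth block like this one almost all quantized coefficients are zero, which is what makes the final encoding phase effective; `restored` is close to `block` but not identical.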

7.2.3 Video Compression

Moving Picture Experts Group (MPEG)

each frame can be compressed using the same DCT-based technique used in JPEG
then takes a sequence of video frames and compresses them into 3 types of frames (each frame is compressed into one of these frame types):
- I frames (intrapicture) - can be thought of as self-contained reference frames
- P frames (predicted picture) - not self-contained, specifies the differences from the previous I frame
- B frames (bidirectional predicted picture) - not self-contained, gives an interpolation between the previous and subsequent I or P frames

7.2.4 Transmitting MPEG over a Network

TODO page 614

7.2.5 Audio Compression

MP3

MP3 Compression Rates
Coding	Bit Rates	Compression Factor
Layer I	384 kbps	4
Layer II	192 kbps	8
Layer III	128 kbps	12
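
The table does not say what uncompressed rate the factors are measured against. The numbers are consistent with a 48 kHz, 16-bit, stereo source (1536 kbps); that source format is an assumption in the sketch below, not something stated in these notes.

```python
SAMPLE_RATE_HZ = 48_000  # assumed sampling rate
BITS_PER_SAMPLE = 16     # assumed sample size
CHANNELS = 2             # assumed stereo

source_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

layer_kbps = {"Layer I": 384, "Layer II": 192, "Layer III": 128}
factors = {layer: source_kbps / kbps for layer, kbps in layer_kbps.items()}

print(source_kbps)  # 1536.0
print(factors)      # {'Layer I': 4.0, 'Layer II': 8.0, 'Layer III': 12.0}
```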

／var／log marcus chiu

CN - Chapter 7 - End-to-End Data
