Sunday 11 October 2009

DITA 02 - Text & HTML


This post covers digital data representation and the organisation of data using markup languages. We performed a number of simple experiments using MS Word, Notepad and Internet Explorer to see how the format of data representation changes with each file type. A JPEG image was embedded in a Word document and then referenced from an HTML file, to show the difference between "file centred" (embedded in Word) and "document centred" (referenced by a link in HTML) approaches to organising additional file data.
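The contrast between embedding and linking can be sketched in a few lines of Python. This is only an analogy, not how Word actually stores an embedded picture: the "file centred" case copies the image bytes into the HTML itself as a base64 data URI, while the "document centred" case just points at a separate file. The file name photo.jpg and the placeholder bytes are hypothetical stand-ins for the real photo used in the experiment.

```python
import base64

# Hypothetical image payload: the JPEG start-of-image marker plus padding,
# standing in for a real photo so the sketch runs on its own.
jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16


def document_centred(filename: str) -> str:
    """Reference the image by name; its bytes stay in a separate file."""
    return f'<img src="{filename}" alt="photo">'


def file_centred(data: bytes) -> str:
    """Embed the image bytes inline as a base64 data URI (an HTML analogue of embedding)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f'<img src="data:image/jpeg;base64,{encoded}" alt="photo">'


linked = document_centred("photo.jpg")
embedded = file_centred(jpeg_bytes)

print(len(linked), linked)
print(len(embedded), embedded)
```

The linked version stays the same size however large the image is, but breaks if photo.jpg is moved; the embedded version grows with the image but travels as a single self-contained file.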

Digital computers use a binary representation, implemented in hardware that stores and processes on/off states (bits). The basic unit of memory storage is the byte, which is 8 bits; a megabyte (MB) is a million bytes (or 1,048,576 bytes in the older binary convention). We explored data formats using the 7-bit ASCII code, as used in early teleprinters, which represents each text character as a number from 0 to 127. Digital data is usually organised systematically into files of related data, and files into directories or catalogues (depending upon the computer system).
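To make the ASCII idea concrete, here is a small Python sketch that shows each character's code, its 7-bit pattern, and the same value padded into an 8-bit byte. The sample text "DITA" is just an illustrative choice; the megabyte figures at the end show the decimal and binary conventions side by side.

```python
text = "DITA"

rows = []
for ch in text:
    code = ord(ch)  # the character's numeric code
    if code > 127:
        raise ValueError(f"{ch!r} is outside 7-bit ASCII")
    seven_bit = format(code, "07b")  # the 7 bits ASCII actually needs
    eight_bit = format(code, "08b")  # the same value stored in an 8-bit byte
    rows.append((ch, code, seven_bit, eight_bit))

for row in rows:
    print(*row)  # first line: D 68 1000100 01000100

encoded = text.encode("ascii")  # one byte per character
print(len(encoded), "bytes =", len(encoded) * 8, "bits")

# Megabyte arithmetic: decimal (SI) vs binary convention.
MB_DECIMAL = 10**6
MB_BINARY = 2**20
print(MB_DECIMAL, MB_BINARY)
```

Because ASCII fits in 7 bits, storing it in 8-bit bytes leaves the top bit as 0, which is why every 8-bit pattern above starts with a leading zero.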

Metadata is used to organise data by inserting additional information into a file that describes either the meaning (semantics) or the presentation of its contents. This metadata is coded into the file as markup. Semantic markup carries information about the context and meaning of the data held in the file, so it is more useful than a presentation-only approach: software can act on what the content means, not just on how it should look.
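A minimal sketch of why semantic markup is more useful, using Python's standard html.parser module. The two HTML fragments are hypothetical examples: both would typically render in italics, but only the one using <cite> tells a program that "Moby Dick" is the title of a work. The TitleFinder class and find_titles function are illustrative names, not part of any library.

```python
from html.parser import HTMLParser

# Hypothetical fragments: <i> says only "italic"; <cite> and <em> say what the text *is*.
presentational = "<p>I am reading <i>Moby Dick</i> and <i>really</i> enjoying it.</p>"
semantic = "<p>I am reading <cite>Moby Dick</cite> and <em>really</em> enjoying it.</p>"


class TitleFinder(HTMLParser):
    """Collect the text inside <cite> elements, i.e. titles of cited works."""

    def __init__(self):
        super().__init__()
        self.in_cite = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self.in_cite = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "cite":
            self.in_cite = False

    def handle_data(self, data):
        if self.in_cite:
            self.titles[-1] += data


def find_titles(html: str) -> list:
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return finder.titles


print(find_titles(presentational))  # [] : the <i> tags carry no meaning
print(find_titles(semantic))        # ['Moby Dick']
```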

Hierarchical file structures can be created, branching out from a single "root" directory. Data organisation can also be "file centred", where additional data is imported and embedded in the main file, or "document centred", where additional files are held externally and referenced via a link from a main file.
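A hierarchical structure can be sketched by building a small tree under a temporary directory that stands in for the "root" and then walking it with os.walk. The folder and file names are hypothetical; page.html references images/photo.jpg by a relative link, which is the "document centred" arrangement.

```python
import os
import tempfile
from pathlib import Path


def build_example_tree(root: Path) -> None:
    # Hypothetical layout: names are illustrative only.
    (root / "dita" / "images").mkdir(parents=True)
    (root / "dita" / "notes.txt").write_text("Text & HTML")
    (root / "dita" / "page.html").write_text('<img src="images/photo.jpg">')
    (root / "dita" / "images" / "photo.jpg").write_bytes(b"\xff\xd8")


def tree_lines(root: Path) -> list:
    """Return an indented listing: each directory, then its files, one level deeper."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        depth = len(Path(dirpath).relative_to(root).parts)
        name = "/" if depth == 0 else Path(dirpath).name + "/"
        lines.append("    " * depth + name)
        for filename in sorted(filenames):
            lines.append("    " * (depth + 1) + filename)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stands in for the "root" directory
    build_example_tree(root)
    lines = tree_lines(root)

print("\n".join(lines))
```

Every path in the tree can be traced back to the single root, which is what makes the structure hierarchical.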
