2 - Data Organization

2.1 Scope of the Standard

The Digital Log Interchange Standard defines data organization on two levels: the Logical Format level and the Physical Format level. The Logical Format concerns the syntactic and semantic organization of a sequence of data characters. The Physical Format concerns the location and physical organization of data on various media.

The Logical Format is embedded in a Physical Format, but the two formats are very loosely bound to support reasonable access efficiency without unduly constraining the elements of the Logical Format. Consequently, before an application on a given system can process DLIS data recorded on a given medium, it must

2.2 Logical Format

The DLIS Logical Format consists of the following elements:

The relationship and ordering of these elements are illustrated in Figure 2-1. At the byte level, DLIS data is an ordered stream of 8-bit bytes, in which byte k precedes byte k+1. Within each byte, bit 1 is the high-order bit.


Figure 2-1. Logical Format

2.2.1 Representation Codes
Each distinct piece of information in the Logical Format has a well-defined representation that extends across one or more bytes. All permissible representations are listed in Appendix B both by symbolic name and by the one-byte Representation Code used to designate the representation explicitly in the Logical Format.

Symbolic names of the Representation Codes are provided for use in this specification. It is likely that these names will correspond to identifiers in program code, and they are restricted to six characters for this reason. A two-byte signed integer quantity, then, is said to have Representation Code SNORM, whereas a single precision floating point number has Representation Code FSINGL, a variable-length string of text has Representation Code ASCII, and so on.

2.2.2 Logical Record (LR)
Logical Records form the basic coherent bodies of information in the DLIS Logical Format. They encapsulate semantically related information within a Logical File. Each Logical Record consists of one or more consecutive Logical Record Segments, which provide the interface between the Logical Format and the Physical Format. Logical Record Segmentation is dependent on the type of Physical Format.

For example, it is the responsibility of the Logical Record Segments, not the Logical Records, to align with Physical Format boundaries. Segmentation also permits processing Logical Records of indefinite length.

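A Logical Record Segment is composed of four mutually disjoint parts: the Logical Record Segment Header, the Logical Record Segment Encryption Packet (optional), the Logical Record Segment Body, and the Logical Record Segment Trailer (optional).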

The term "Logical Record" distinguishes the DLIS element from a physical disk or tape record.
2.2.2.1 Logical Record Segment Header (LRSH)
Each Logical Record Segment begins with a Logical Record Segment Header. The LRSH format is defined in Figure 2-2.

Where information in the LRSH applies to a Logical Record, rather than to a Logical Record Segment, that information must be used consistently in all Segments of a Logical Record. Redundant recording of such information permits a uniform structure for the LRSH and provides knowledge about the Logical Record (e.g., what kind of information has been lost) that may not be otherwise available if, for example, the first Logical Record Segment is damaged.

Entry                              Representation Code  Comments
Logical Record Segment Length      UNORM                1
Logical Record Segment Attributes  (not defined)        2
Logical Record Type                USHORT               3
Figure 2-2. Logical Record Segment Header
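
As a rough illustration, the four bytes of an LRSH could be decoded as follows. This sketch assumes that UNORM is a 2-byte unsigned integer and USHORT a 1-byte unsigned integer, both stored high-order byte first (the representations themselves are defined in Appendix B, not in this section), and that the Attributes field occupies a single byte. The names SegmentHeader and parse_lrsh are hypothetical.

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    length: int       # Logical Record Segment Length (UNORM)
    attributes: int   # Logical Record Segment Attributes (raw byte)
    record_type: int  # Logical Record Type (USHORT)


def parse_lrsh(data: bytes) -> SegmentHeader:
    """Decode the first four bytes of a Logical Record Segment.

    Assumption: big-endian 2-byte length, then two single bytes.
    """
    if len(data) < 4:
        raise ValueError("an LRSH needs at least 4 bytes")
    length, attributes, record_type = struct.unpack(">HBB", data[:4])
    return SegmentHeader(length, attributes, record_type)
```

A caller would typically read four bytes, call parse_lrsh, and then use the decoded length to locate the end of the Segment.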

Comments:

Bit  Description                                                 Comments
1    Logical Record Structure                                    1
       0 = Indirectly Formatted Logical Record
       1 = Explicitly Formatted Logical Record
2    Predecessor                                                 2
       0 = This is the first Segment of the Logical Record
       1 = This is not the first Segment of the Logical Record
3    Successor                                                   3
       0 = This is the last Segment of the Logical Record
       1 = This is not the last Segment of the Logical Record
4    Encryption                                                  4
       0 = No encryption
       1 = Logical Record is encrypted
5    Encryption Packet                                           5
       0 = No Logical Record Segment Encryption Packet
       1 = Logical Record Segment Encryption Packet is present
6    Checksum                                                    6
       0 = No checksum
       1 = A checksum is present in the LRST
7    Trailing Length                                             7
       0 = No Trailing Length
       1 = A copy of the LRS length is present in the LRST
8    Padding                                                     8
       0 = No record padding
       1 = Pad bytes are present in the LRST
Figure 2-3. Logical Record Segment Attributes
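
The bit assignments of Figure 2-3 can be sketched as a small decoder. Because bit 1 is the high-order bit (§2.2), bit n corresponds to the mask 0x80 >> (n - 1). The function name decode_attributes and the flag names are hypothetical labels chosen for this example only.

```python
# Flag names in bit order 1..8, following Figure 2-3 (names are illustrative).
ATTRIBUTE_NAMES = (
    "explicit_formatting",    # bit 1: Logical Record Structure
    "has_predecessor",        # bit 2: Predecessor
    "has_successor",          # bit 3: Successor
    "encrypted",              # bit 4: Encryption
    "has_encryption_packet",  # bit 5: Encryption Packet
    "has_checksum",           # bit 6: Checksum
    "has_trailing_length",    # bit 7: Trailing Length
    "has_padding",            # bit 8: Padding
)


def decode_attributes(value: int) -> dict:
    """Map an attributes byte to named flags; bit 1 is the high-order bit."""
    if not 0 <= value <= 0xFF:
        raise ValueError("attributes must fit in one byte")
    return {
        name: bool(value & (0x80 >> index))
        for index, name in enumerate(ATTRIBUTE_NAMES)
    }
```

For instance, the byte 10100110 decodes as an Explicitly Formatted first Segment that has a successor, a checksum, and a Trailing Length, and no padding.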

Comments:

2.2.2.2 Logical Record Segment Encryption Packet (LRSEP)
The Logical Record Segment Encryption Packet, if present, immediately follows the Logical Record Segment Header. The format of the Encryption Packet is described in Figure 2-4.

Bytes  Description                                 Comments
1-2    Size of Encryption Packet in bytes (UNORM)  1
3-4    Producer's Company Code (UNORM)             2
5-end  Encryption information                      3
Figure 2-4. Definition of Logical Record Segment Encryption Packet
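
A minimal sketch of separating an Encryption Packet from the bytes that follow it is given below. It assumes that UNORM is a 2-byte unsigned integer stored high-order byte first, and that the size in bytes 1-2 counts the whole packet, including the four bytes that hold the size and Company Code; neither assumption is stated in this section. The function name split_encryption_packet is hypothetical.

```python
import struct


def split_encryption_packet(data: bytes):
    """Return (company_code, encryption_info, remainder).

    `data` starts at the first byte after the LRSH.
    Assumption: the size field counts the entire packet.
    """
    if len(data) < 4:
        raise ValueError("encryption packet needs at least 4 bytes")
    size, company_code = struct.unpack(">HH", data[:4])
    if size < 4 or size > len(data):
        raise ValueError("encryption packet size is inconsistent")
    return company_code, data[4:size], data[size:]
```

The remainder returned here is where the Logical Record Segment Body begins.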

Comments:

2.2.2.3 Logical Record Segment Body (LRSB)
The Logical Record Segment Body is an ordered set of 8-bit bytes that immediately follow the Logical Record Segment Encryption Packet, when the Encryption Packet is present, or otherwise immediately follow the Logical Record Segment Header.
2.2.2.4 Logical Record Segment Trailer (LRST)
The Logical Record Segment Trailer, if present, immediately follows the Logical Record Segment Body and consists of any combination of: Padding, a 2-byte checksum (see Appendix E), and/or a Trailing Length. Padding precedes a checksum or a Trailing Length. A checksum precedes a Trailing Length.
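
Since the trailer parts appear in a fixed order, a reader can peel them off from the end of the Segment. The sketch below works on the bytes that follow the header (and Encryption Packet, if any). It assumes that the Trailing Length is a 2-byte unsigned integer stored high-order byte first, and that the last pad byte holds the total number of pad bytes; the latter comes from the comments to Figure 2-3 and is not restated here, so treat it as an assumption. The name split_trailer is hypothetical.

```python
def split_trailer(data: bytes, has_padding: bool, has_checksum: bool,
                  has_trailing_length: bool):
    """Return (body, pad_count, checksum, trailing_length).

    Parts are removed from the end: Trailing Length, checksum, padding.
    """
    end = len(data)
    trailing_length = checksum = None
    if has_trailing_length:
        if end < 2:
            raise ValueError("no room for a Trailing Length")
        trailing_length = int.from_bytes(data[end - 2:end], "big")
        end -= 2
    if has_checksum:
        if end < 2:
            raise ValueError("no room for a checksum")
        checksum = int.from_bytes(data[end - 2:end], "big")
        end -= 2
    pad_count = 0
    if has_padding:
        if end < 1:
            raise ValueError("no room for padding")
        pad_count = data[end - 1]  # assumption: last pad byte is the count
        if pad_count < 1 or pad_count > end:
            raise ValueError("pad count is inconsistent")
        end -= pad_count
    return data[:end], pad_count, checksum, trailing_length
```

Working backward avoids having to know the body length in advance, which matters because padding has no marker at its start.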
2.2.2.5 Logical Record Body (LRB)
The Logical Record Body consists of the ordered union of the Logical Record Segment Bodies of all Logical Record Segments that make up the Logical Record. Figure 2-5 illustrates a sample Logical Record decomposed into three Logical Record Segments.

Segment  Layout
1        LRSL | 10100110 | LRT | body | CHECKSUM | TRAILING LRSL
2        LRSL | 11100110 | LRT | body | CHECKSUM | TRAILING LRSL
3        LRSL | 11000111 | LRT | body | PADDING | CHECKSUM | TRAILING LRSL
Figure 2-5. Illustration of a Three-Segment Logical Record
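
The assembly of a Logical Record Body from its Segments can be sketched as follows, using the Predecessor and Successor bits (bits 2 and 3, masks 0x40 and 0x20 when bit 1 is the high-order bit) as a consistency check. The function name assemble_record_body and the (attributes, body) pair layout are hypothetical choices for this example.

```python
def assemble_record_body(segments):
    """Concatenate Segment Bodies into one Logical Record Body.

    `segments` is a list of (attributes_byte, body_bytes) pairs in order.
    """
    if not segments:
        raise ValueError("a Logical Record has at least one Segment")
    last = len(segments) - 1
    parts = []
    for index, (attributes, body) in enumerate(segments):
        has_predecessor = bool(attributes & 0x40)  # bit 2
        has_successor = bool(attributes & 0x20)    # bit 3
        if has_predecessor != (index > 0):
            raise ValueError(f"segment {index}: Predecessor bit is wrong")
        if has_successor != (index < last):
            raise ValueError(f"segment {index}: Successor bit is wrong")
        parts.append(body)
    return b"".join(parts)
```

With the three attribute bytes of Figure 2-5 (10100110, 11100110, 11000111), the checks pass: only the first Segment lacks a predecessor and only the third lacks a successor.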

2.2.3 Logical File (LF)
A Logical File consists of a sequence of one or more Logical Records, beginning with a File Header Logical Record (FHLR, see §5.1 and Appendix A), and containing no other FHLRs. A Logical File is terminated when another FHLR is encountered or when no more Logical Records are available for the Logical File.
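
The termination rule above amounts to starting a new Logical File at every FHLR. A minimal sketch follows; how an FHLR is recognized is defined elsewhere (§5.1 and Appendix A), so the caller supplies that test as the hypothetical predicate is_fhlr. The name split_logical_files is also hypothetical.

```python
def split_logical_files(records, is_fhlr):
    """Group a sequence of Logical Records into Logical Files.

    A new Logical File starts at each record for which is_fhlr(record)
    is true; every other record joins the current file.
    """
    files = []
    for record in records:
        if is_fhlr(record):
            files.append([record])
        elif files:
            files[-1].append(record)
        else:
            raise ValueError("Logical Record found before any FHLR")
    return files
```

For example, given the records F, a, b, F, c where F marks an FHLR, the result is two Logical Files: [F, a, b] and [F, c].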

The term "Logical File" distinguishes the DLIS element from a physical disk or tape file.

2.2.3.1 File Header Logical Record (FHLR)
Any FHLR must consist of exactly one Logical Record Segment. It is useful to be able to handle an FHLR as a file label, and this is one of the requirements necessary to make that possible.

2.3 Physical Format

Physical Format is the way in which recorded data is located and organized on a physical medium, such as a magnetic tape or disk. The specific binding of Logical Format to Physical Format depends on the medium and the access mechanism.

This section defines bindings for Record-Structured Physical Formats, including the industry-standard 9-track magnetic tapes as a special case. Bindings for other Physical Formats are not defined here.

2.3.1 Terminology
The term Storage Unit is defined loosely as something that contains recorded data and that is manageable as a unit at the human level. When applied to magnetic tape, Storage Unit refers to a single physical reel of tape. When applied to disks, Storage Unit refers to a single file. The term is used only when no distinction between different media is intended; the common terms "tape" and "file" are used when the context is targeted strictly at magnetic tapes or at disk files, respectively.

A sequence of Logical Files can reside on a single Storage Unit or a single Logical File can span multiple Storage Units. When a Logical File begins on one Storage Unit and ends on another, the Storage Units that it intersects constitute part of a Storage Set. Further definition of a Storage Set is provided in §2.3.4, "Storage Set Requirements."

All access mechanisms apply a structure to data recorded on a Storage Unit. The following structure is covered in this specification:

A Physical Format can be partitioned into three mutually disjoint parts: the Logical Format, the Invisible Envelope, and the Visible Envelope. These parts are illustrated in Figure 2-6. The Logical Format is data that is of interest to applications. The Invisible Envelope is data that is managed by the access mechanism and is not part of normal data read and write transactions. Invisible Envelope data is typically part of the control interface between the access mechanism and applications or is available through special queries. For example, most disk operating systems maintain file header control information that is separate from the file data. Record Structure files can also contain record lengths that are passed as control between the operating system and the application but are not passed as data. The Visible Envelope is information that is passed as data and is important in defining a particular Physical Format, but that is not part of the Logical Format.

Except for industry-standard magnetic tapes, a specification of the Invisible Envelope is beyond the scope of this document.


Figure 2-6. Partitions of a Physical Format

2.3.2 Storage Unit Label (SUL)
The first 80 bytes of the Visible Envelope consist of ASCII characters and constitute a Storage Unit Label. Figure 2-7 defines the format of the SUL.

Field                         Size in Bytes  Comments
Storage Unit Sequence Number  4              1
DLIS Version                  5              2
Storage Unit Structure        6              3
Maximum Record Length         5              4
Storage Set Identifier        60             5
Figure 2-7. Format of Storage Unit Label
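
The fixed field widths of Figure 2-7 (4 + 5 + 6 + 5 + 60 = 80 bytes) lend themselves to a simple slicing routine. The sketch below only splits the label into its raw text fields and does not interpret or validate their contents. The names SUL_FIELDS and parse_sul, and the dictionary keys, are hypothetical.

```python
# (field name, size in bytes), in the order given by Figure 2-7.
SUL_FIELDS = (
    ("sequence_number", 4),
    ("dlis_version", 5),
    ("structure", 6),
    ("max_record_length", 5),
    ("storage_set_identifier", 60),
)


def parse_sul(label: bytes) -> dict:
    """Split the first 80 bytes of a Storage Unit into SUL text fields."""
    if len(label) < 80:
        raise ValueError("a Storage Unit Label is 80 bytes long")
    text = label[:80].decode("ascii")  # raises if a byte is not ASCII
    fields = {}
    offset = 0
    for name, size in SUL_FIELDS:
        fields[name] = text[offset:offset + size]
        offset += size
    return fields
```

The field values are left as fixed-width strings so that a caller can apply whatever justification and padding rules the field comments prescribe.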

Comments:

2.3.3 Storage Unit Requirements
A Storage Unit must have exactly one Storage Unit Label, which must appear before any Logical Format data. The first record in the Visible Envelope of a Record Storage Unit must consist of an SUL.

A Storage Unit must contain an integer number of Logical Record Segments. It need not contain an integer number of Logical Records.

2.3.4 Storage Set Requirements
A Storage Set was introduced in §2.3.1 as a group of Storage Units across at least two of which resides a single Logical File. With the introduction of an SUL, it is now possible to complete the definition of Storage Set. A Storage Set is a group of one or more Storage Units that satisfies the following conditions:

The Storage Set is provided to cover situations in which a Logical File overflows a Storage Unit and must be continued on another. This typically need only occur with magnetic tapes, although it is permitted to occur with any type of Storage Unit. Notice, however, that the actual requirements stated above do not demand Logical File continuation across members of a Storage Set, neither do they demand that the Storage Set Identifier be distinct for all Storage Sets. The implementation of Storage Sets and Storage Set Identifiers is left to users who shall decide how they can best suit the users' needs.
2.3.5 Storage Unit Terminators
A Storage Unit may simply run out of data, which is one way for it to terminate.

On industry-standard 9-track magnetic tapes, a Storage Unit is terminated by two consecutive Tape Marks (see §2.3.7.1). These marks belong to the Invisible Envelope of the tape.
2.3.6 Record Structure Requirements
A Visible Record on a Record Storage Unit consists of all data bytes passed to an application as a result of a normal record read operation.

On a Record Storage Unit, each Visible Record other than those in the Invisible Envelope or those that contain a Storage Unit Label must contain a positive integer number of Logical Record Segments, and all Segments must belong to the same Logical File. That is, a Logical Record Segment cannot span Visible Records and Visible Records cannot intersect more than one Logical File.

This requirement permits a DLIS reader always to locate the beginning of the next Logical Record on the Storage Unit. If Trailing Lengths are recorded, it permits backward recovery of Logical Record Segments when the first Logical Record Segment Length in the Visible Record is damaged.
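
Because Segments never span Visible Records, a reader can step through the Segments of one Visible Record using only the Segment lengths. The sketch below walks the Segment bytes of a single Visible Record (that is, the data after any Visible Envelope fields). It assumes the Logical Record Segment Length is a 2-byte unsigned integer stored high-order byte first that counts the whole Segment, header included; the 16-byte minimum is the one stated in §2.3.6.4. The name iter_segments is hypothetical.

```python
MIN_SEGMENT_LENGTH = 16  # minimum Segment length stated in section 2.3.6.4


def iter_segments(payload: bytes):
    """Yield each Logical Record Segment found in `payload`, in order."""
    offset = 0
    while offset < len(payload):
        remaining = len(payload) - offset
        # Assumption: the length field counts the entire Segment.
        length = int.from_bytes(payload[offset:offset + 2], "big")
        if remaining < MIN_SEGMENT_LENGTH or not (
            MIN_SEGMENT_LENGTH <= length <= remaining
        ):
            raise ValueError(f"bad Segment length at offset {offset}")
        yield payload[offset:offset + length]
        offset += length
```

A Segment whose stated length runs past the end of the Visible Record is reported as an error rather than silently truncated, since the requirement above forbids it.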

2.3.6.1 Visible Record Length
According to sections 2.3.1 and 2.3.6, a Visible Record consists of Visible Envelope data plus one or more Logical Record Segments. Other than the Storage Unit Label, a Visible Record on a Record Storage Unit must contain the following parts in the order described:

The Visible Record Length specifies the sum of the lengths in bytes of these three parts.
2.3.6.2 Format Version
Following the Visible Record Length in each Visible Record is a two-byte field, called the Format Version. This belongs to the Visible Envelope. The first byte is FF (hex), which distinguishes the Visible Record from records of other, older formats. The second byte is an integer (USHORT) specifying the major version number of the format, which for this specification is the value 1.
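
As an illustration, the leading fields of a Visible Record could be checked as follows. The sketch assumes the Visible Record Length is a 2-byte unsigned integer stored high-order byte first and that it counts the length field, the Format Version, and the Segments together. The 20-byte minimum and 16,384-byte maximum are the limits given in §2.3.6.4 and §2.3.6.5. The name parse_visible_record is hypothetical.

```python
MIN_VISIBLE_RECORD_LENGTH = 20     # implicit minimum, section 2.3.6.4
MAX_VISIBLE_RECORD_LENGTH = 16384  # maximum, section 2.3.6.5


def parse_visible_record(record: bytes) -> bytes:
    """Validate the Visible Envelope fields and return the Segment bytes."""
    if len(record) < MIN_VISIBLE_RECORD_LENGTH:
        raise ValueError("Visible Record is shorter than 20 bytes")
    length = int.from_bytes(record[0:2], "big")  # assumption: 2-byte length
    if length != len(record) or length > MAX_VISIBLE_RECORD_LENGTH:
        raise ValueError("Visible Record Length is inconsistent")
    if record[2] != 0xFF:
        raise ValueError("first Format Version byte is not FF (hex)")
    if record[3] != 1:
        raise ValueError("unsupported major version")
    return record[4:]
```

The returned bytes are the Logical Record Segments of the Visible Record, ready to be walked one Segment at a time.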
2.3.6.3 Visible Envelope
For a Storage Set consisting of Record Storage Units, the Visible Envelope consists of
2.3.6.4 Minimum Visible Record Length
No explicit minimum record length is required. Note that Logical Record Segments must be at least 16 bytes long. When the Visible Record Length and Format Version are included, the combination yields an implicit minimum length of 20 bytes, which is sufficient for known devices to handle.
2.3.6.5 Maximum Visible Record Length
The maximum Visible Record Length permitted on a Record Storage Unit is 16,384 bytes.

2.3.7 Industry-Standard 9-Track Magnetic Tape
No constraint is imposed on the type of magnetic media that a company uses to record DLIS information for its private use. However, any standard DLIS tape access utility may be required to read or write DLIS information recorded only on industry-standard 9-track tapes that are written at a density of 800, 1600, or 6250 bits per inch.

To ensure uniformity of access to tape, which is a removable medium, the Invisible Envelope of a Physical Format that is recorded on industry-standard 9-track magnetic tape must consist of physical tape marks. The complete Physical Format encountered on such a tape, then, consists of tape marks, Storage Unit Label, Visible Record Lengths, Format Versions, and Logical Record Segments. This is illustrated in Figure 2-8. The use of tape marks is described in "Physical Tape Marks," which is the following section.


Figure 2-8. Illustration of Magnetic Tape Physical Format (1st Reel)

2.3.7.1 Physical Tape Marks
Physical tape marks constitute the Invisible Envelope on industry-standard 9-track magnetic tapes. Such tapes contain two indelible marks, called BOT and ETW in Figure 2-8. BOT is near the physical beginning of the tape and indicates the start of the region in which recorded information is permitted. ETW is required to be a minimum distance from the physical end of the tape and serves as a warning; with many systems ETW can be sensed only when writing.

Industry-standard 9-track magnetic tapes also employ a form of tape mark that is not indelible and can appear multiple times. Marks of this type are called TM in Figure 2-8. A TM is a distinct form of recorded information and takes the place of a physical record. When an industry-standard 9-track magnetic tape serves as a Storage Unit, the TM shall be used as follows:

2.3.8 Considerations on Moving DLIS Between Media
Programs that move DLIS information from one Physical Format to another need to carry enough knowledge of the standard to ensure that the result is a valid DLIS Physical Format. The knowledge required depends on the sophistication of the program and can include one or more of the following: