1-Step Format

Iomega 1-Step Backup File Format

For an overview, see the introductory comments in the 1-Step Backup summary page. This page describes the physical file format I have worked out by inspection for Version 5.3 backups.

I got interested in this because I was bored and enjoy an occasional puzzle.
I've done some previous reverse engineering work on the MsBackup format used in Win9x. I've reverse engineered this Iomega format sufficiently to parse sample backup files I created with 1 Step Backup version 5.3 installed on a couple of my Win9x systems equipped with Zip drives. The following description is preliminary and has many holes in it, as noted below, but the information provided is sufficient to parse the files and recover the original data from them. A program written in C was created to test and verify this information; for a description of the program see rd113 Info. Program source code can be downloaded from rd113-src.tar.gz.

The file header appears to be 0x200 bytes long. Typically the raw file data section immediately follows this header. A single backup could span multiple target disks using a set of removable Iomega media. Only one *.1-Step file was allowed on each of the media disks. Often this file filled the disk, but there was an option to leave existing files and just use the remaining space on the media. The backup files had names which embedded the job number, disk number, and date in a string like:
"Backup Job 7, Disk 1, 16-10-27"
ie "Backup Job #, Disk #, date time .1-Step"

After creation these files can obviously be renamed, but the name format above is what the program initially creates. The Job # is maintained in a database by the 1 Step Backup program on the target machine; each new backup increments the job # for the machine. The Disk # is the number of the disk in the backup set. A one disk set will have Disk # = 1, and a multiple disk set will have Disk # running from 1 to the number of disks used. The job and disk numbers are also stored in each file's header as shown below.

The numeric data format in these files is Intel specific little endian. The sample code I have written is targeted at 32 bit systems, which is what I believe this software ran on. The only 64 bit value I've seen may be the date/time stamp used in the header. Oddly I have not been able to decipher its format, but since this time stamp is also displayed as an ASCII string in the catalog region there is no compelling reason to understand it. It can be used as an identifier in backups that span multiple media disks, as it remains constant across all the file headers.

A dump of a typical header is shown below:
This is the file header at file offset 0

 00000: CD AB CD AB 00 02 00 00 02 00 01 00 B3 C3 D4 25 |...............%
 00010: DA D5 E4 40 00 00 00 00 07 00 02 00 04 93 EA 00 |...@............
 00020: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................
 00030: 01 00 00 00 74 65 73 74 38 20 32 20 64 69 73 6B |....test8 2 disk
 00040: 20 75 6E 63 6F 6D 70 72 65 73 73 65 64 00 00 00 | uncompressed...

I only know what some of the fields above represent, but it seems to be
enough to parse the file.
offset    bytes          use
0x0         4        appears to be a signature always bytes shown above
0xc         8        appears to be a date/time stamp, 8 bytes of undeciphered binary
0x18        2        job number
0x1a        2        disk number
0x1c        4        offset to catalog section in file
0x34     0-0x1cc     optional descriptive text string entered by user
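As a concrete illustration, the known header fields can be pulled out of the 0x200 byte header with a few little endian reads. This is a minimal sketch assuming only the offsets in the table above; the struct and function names are mine, not anything from Iomega:

```c
#include <stdint.h>

/* Little endian reads, since the format is Intel little endian. */
static uint16_t get_u16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t get_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

struct hdr_info {
    uint16_t job;      /* offset 0x18 */
    uint16_t disk;     /* offset 0x1a */
    uint32_t catalog;  /* offset 0x1c, file offset of the catalog */
};

/* Returns 0 on success, -1 if the 4 signature bytes do not match. */
static int parse_header(const uint8_t *h /* 0x200 byte header */, struct hdr_info *out)
{
    static const uint8_t sig[4] = { 0xCD, 0xAB, 0xCD, 0xAB };
    for (int i = 0; i < 4; i++)
        if (h[i] != sig[i])
            return -1;
    out->job     = get_u16(h + 0x18);
    out->disk    = get_u16(h + 0x1a);
    out->catalog = get_u32(h + 0x1c);
    return 0;
}
```

Run against the dump above this recovers job 7, disk 2, and the catalog offset from the bytes at 0x1c.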

File data is concatenated into files on sequential disks in the backup
until all the data has been written, then the catalog is written at the
end of the last file in the backup.  The data region for each backup file
normally begins at 0x200, although I believe it is possible for there to
be no data in a file in which case the catalog region would start at 0x200
on a multi disk backup. I believe that there were nominally two user
modes, either compression was activated or no compression was used.
When compression was active, gzip compression appears to be used on a file
by file basis.  However the program seems to have some native intelligence
and small files (typically less than 34 bytes in length) were not compressed.
If compression does not reduce the file size, some of its data may be stored uncompressed.
As described below the catalog includes information about whether compression
is active and if it is, the compression details for each file.
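Since gzip appears to be the per file compression, one way to sanity check an entry is to look for the standard gzip magic bytes (0x1F 0x8B, followed by method 8 for deflate) at the start of its data. This is only a cross check against the catalog's compression flags, not something the format itself is known to guarantee:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: gzip streams begin with magic bytes 0x1F 0x8B followed by the
   compression method byte (8 = deflate).  Checking these bytes at the start
   of a stored file's data is a cheap way to tell a compressed entry from a
   raw one, independent of the catalog. */
static int looks_like_gzip(const uint8_t *p, size_t n)
{
    return n >= 3 && p[0] == 0x1F && p[1] == 0x8B && p[2] == 8;
}
```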

To date most of my reverse engineering has been aimed at the catalog.  It
seems fairly strange compared to other backup formats I have looked at. Much
of the key data in the catalog is in text format rather than binary. I do not
understand a significant amount of the catalog contents, but do understand
enough to parse the data required to extract files from it. In my
parse of this information I just skip the data I do not understand. I
have a few notes about what some of it might do, but it's speculation.
There are other regions I am not even willing to speculate about!

One possible explanation for this text representation of structures and
data is to make it transparent to different cpu chips and operating systems.
However other parts are clearly little endian binary Intel format structures.
I'd like to know if any Mac users have seen a 1-Step backup program or a
relative of it from the OS6 days.

Structure Definition:
The key sub structure I more or less understand is what I call the
structure definition section and the data that follows it. There appear
to be 7 of these definition sections followed by the data for each of these
definitions. All but a small binary header at the beginning of each
structure definition is in text format. The binary header is at least
0x20 bytes long, and maybe 0x30. For now I assume 0x30, but the first
0x10 bytes may be a terminator for the proceeding section.
It doesn't matter a lot the only WORD I identify starts at offset
0x14 in the 0x30 byte binary header block. It is the number of data
entries in the data section. I find I can ignore this header and still
parse the files, so currently that is what I am doing. However if I can
work it out it might help in the traverse of these files.

This header is followed by an arbitrary number of Field definitions, each
0x20 bytes long.  These are ascii records.  They contain a Field name,
Field type, Field length and start offset. See structure below.
To parse the region I read and save the data until I reach a block which
begins with the byte 0xD which terminates the list.

Let nfields be the number of fields found for this data definition.
The start offset for the first field plus the sum of the field lengths
is the data size of each of the entries that follows. The start offset
is embedded in the field header described below, but has been 1 in all the
sample files I've looked at so I assume this in my program. The data types
used appear to be 'N' for numeric and 'C' for character data. The data
is presented as ascii chars, but this data type indicates how the string
should be used.

It seems odd to use a free format definition of fields, but it appears to be
what they have done. In my parsing program I use fixed structures, but
validate that they conform. The easiest way to display this is to annotate
a listing from my test program. My program does not display the start offset,
as it's always been 1 for the first field and then just incremented by the
length of the preceding field for each field in the record.

   My Name       # of fields        description
1  drives        13          generic backup data, drive names and letters
2  directories    4          maps directory names to numeric id
3  files          9          maps file names to drives and directories by #
4  comp          11          compression info; even in a compressed backup some files are not compressed
5  job           11          target machine job # and whether this is a compressed volume
                             apparently only one such record in the catalog

6  paths          3          maps file name to path to file
                             Records appear to be in the order that the files above are
                             listed; normally two records per file, the 1st has a path string,
                             the 2nd the associated file name. In some of my samples the last
                             path has no file names after it, implying the remaining
                             files are all in the last path/directory.

7  session        6          one record for each media file in data set

Following each binary structure definition header is an ascii region
consisting of 0x20 byte blocks which start with the name of each field.
Other than the name most bytes in this block are zero. Guessing at the
3 other non-zero bytes:
  0      11 ascii chars creating a field name, some may be NUL
  0xb    data type 0x4E => Numeric, 0x43=>Character, 1 byte long
  0xc    binary offset from start of data for this entry to start of field  
         value in data block, monotonically increasing from 1 for each field
  0x10   binary length of field, ie sum of lengths is byte length for each entry.
This list of fields is terminated with an 0xd byte where the next
field name would start. The ascii data region follows. It is in turn
terminated with an 0x1a byte.  
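Putting the byte offsets above together, a field list can be parsed by walking 0x20 byte blocks until the 0xD terminator is hit. A sketch in C follows; the names are mine, and MAX_FIELDS is an arbitrary cap since the real limit is unknown:

```c
#include <stdint.h>
#include <string.h>

#define MAX_FIELDS 16   /* arbitrary cap; no structure seen so far has more than 13 */

struct field {
    char name[12];   /* up to 11 ascii chars, NUL padded */
    char type;       /* 'N' numeric or 'C' character */
    int  offset;     /* start offset of the value within a data record */
    int  length;     /* field length in bytes */
};

/* Walks 0x20 byte field descriptors until a 0x0D byte appears where the next
   name would start.  Returns the number of fields parsed; *rec_len gets the
   record data size, i.e. the first field's start offset plus the sum of the
   field lengths. */
static int parse_fields(const uint8_t *p, struct field *f, int *rec_len)
{
    int n = 0, sum = 0;

    while (p[0] != 0x0D && n < MAX_FIELDS) {
        memcpy(f[n].name, p, 11);        /* name at offset 0 */
        f[n].name[11] = '\0';
        f[n].type   = (char)p[0x0b];     /* 'N' or 'C' */
        f[n].offset = p[0x0c];           /* monotonically increasing from 1 */
        f[n].length = p[0x10];
        sum += f[n].length;
        n++;
        p += 0x20;
    }
    *rec_len = (n > 0 ? f[0].offset : 0) + sum;
    return n;
}
```

The ascii data records that follow the 0xD terminator can then be sliced up using the accumulated offsets and lengths.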

Below is sample output from my test program showing the 7 key structures.
The offsets change from file to file as the amount of data varies, but the
same 7 data groups occur in all.  It's pretty clear what most are, but there
seems to be some redundancy.  For instance why does one need the Dir listing
in addition to the Path listing?  They both appear to contain the same
information in different formats.  Better safe than sorry.
Note there are moderately large holes in the listing below and much of
the data in these 'holes' seems to be zeros.  I have not made a very significant
attempt at understanding these binary regions as I seem to have more information
than I need already to recover file data.

Note each structure begins with a SERIAL field, this appears to be a 0
based record number in the array of records.  As mentioned above record
0 contains the number of additional records in the array. The following
records contain data related to the files in the backup.  In the discussion
below I ignore record 0 and just talk about the data records which follow it.
Backup name: test1
Job 3  Backup Disk 1
Catalog at  offset 0x44944

attempt to step through catalog and locate structure definition regions
;one data record for each drive traversed in the backup
start of Structure Definition  1: Disk     at 0x45a14 
1  SERIAL     type N len 12       ; record #   
2  NUMDIRS    type N len 12       ; ? typically '0'
3  NUMFILES   type N len 12       ; ? typically '0'
4  VLSRDW     type N len 12       ; ? 
5  USED_HI    type N len 12       ; 2 fields for # of bytes used?
6  USED_LO    type N len 12
7  FREE_HI    type N len 12       ; 2 fields for # of bytes available?
8  FREE_LO    type N len 12
9  LABEL      type C len 12        ; Drive Volume name
10 VLSRDT     type C len 14        ; time stamp as a string (possibly previous backup date)
11 DATETIME   type C len 14        ; time stamp as a string (looks like current date)
12 DRV_LTR    type C len 2         ; drive letter followed by a colon
13 HASDSKST   type N len 12        ; ?  typically '1'
note: so far I have ignored fields 5 and 7 and used a long int for the _LO value
in my program.  For the sample files I have looked at the _HI value is always 0
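If the _HI fields really are the upper 32 bits of 64 bit values, the pairs would combine as below. This is an unverified assumption, since _HI has been 0 in every sample I've seen:

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch: the paired _HI/_LO fields are 12-char ascii numerics.  Assuming
   _HI holds the upper 32 bits (unverified, it has always been 0 in my
   samples), the combined 64 bit value would be: */
static uint64_t combine_hi_lo(const char *hi, const char *lo)
{
    return ((uint64_t)strtoul(hi, NULL, 10) << 32) +
            (uint64_t)strtoul(lo, NULL, 10);
}
```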

;one data record for each directory traversed in the backup
start of Structure Definition  2: Dir      at 0x47b04  
1  SERIAL     type N len 12        ; record #
2  DISKSER    type N len 12        ; record # in Disks array, ie source disk for data
3  DIRSER     type N len 12        ; record # of parent directory
4  NAME       type C len 240       ; directory name

;one data record for each file in backup
start of Structure Definition  3: File     at 0x497fa 
1  SERIAL     type N len 12        ; record #
2  DIRSER     type N len 12        ; ndx of dir in Structure Definition  2: Dir
3  STATUS     type N len 12
4  DISKSER    type N len 12        ; ndx of disk in Structure Definition  1: Disk
5  ATTRIB     type N len 12        ; file attribute, typically 32
6  SIZE_HI    type N len 12        ; 2 field # of bytes in the file
7  SIZE_LO    type N len 12
8  DATETIME   type C len 14        ; file timestamp as an ascii string
9  NAME       type C len 240       ; file name

  at least one Comp data record per file in the backup, and records may be continued
  as described below; the maximum ORGSIZE and COMPSIZE appears to be 65535

start of Structure Definition  4: Comp     at 0x4be7a
1  SERIAL     type N len 12       ; record #
2  ORGSER     type N len 12       ; record # in files array
3  SEQUENCE   type N len 12       ; sequence # in this record set >= 1
4  ORGSIZE    type N len 12       ; original file size (if 65535 its part of a set)
5  COMPSIZE   type N len 12       ; compressed file size       
6  ARCDSKSE   type N len 12       ; ? 
7  CHK_SUM    type N len 12       ; ? probably a check sum for compression used
8  COMP_LVL   type N len 12       ; ? type of compression ? if compressed has been 4
9  OFFS_HI    type N len 12       ; 2 field cumulative offset into backup data to start of this file
10 OFFS_LO    type N len 12
11 IS_LAST    type N len 12       ; boolean 0 unless its the last record of this group
note: only the first record exists if the file is not compressed; if compressed
there is a minimum of 1 of these records for each file record, and there may be
multiple records for a file. Records for a given file continue until IS_LAST = 1
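The rule above can be sketched as a loop over one file's Comp records: accumulate chunk sizes in SEQUENCE order until IS_LAST is set. The struct is my own stand in for the parsed ascii records:

```c
/* Stand-in for a parsed Comp data record (field names mirror the listing
   above; the struct itself is mine). */
struct comp_rec {
    long orgser;    /* record # in the files array */
    long sequence;  /* sequence # within this file's group, >= 1 */
    long orgsize;   /* original size of this chunk (65535 => continued) */
    long compsize;  /* compressed size of this chunk */
    long is_last;   /* 1 on the final record of the group */
};

/* Totals the original and compressed sizes for one file by summing its
   chunks until IS_LAST is seen. */
static void total_sizes(const struct comp_rec *r, int nrec, long file_ser,
                        long *org_total, long *comp_total)
{
    *org_total = *comp_total = 0;
    for (int i = 0; i < nrec; i++) {
        if (r[i].orgser != file_ser)
            continue;
        *org_total  += r[i].orgsize;
        *comp_total += r[i].compsize;
        if (r[i].is_last)
            break;
    }
}
```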

start of Structure Definition  5: Job      at 0x4ddb3
1  SERIAL     type N len 12       ; record #
2  JOBNUM     type N len 12       ; Job # from machine backup was run on
3  NUMDISKS   type N len 12       ; number of media disks used in this backup
4  TGDRV      type N len 12       ; ? possibly drive # of target Iomega drive
5  TGDRVT     type N len 12       ; size of target media, Zip drives use 100
6  HASPSW     type N len 12       ; ?
7  ISCUST     type N len 12       ; ? appears to be boolean, if 1 a customized backup selection
8  ISCOMP     type N len 12       ; 0 if not compressed, 1 if compression is used
9  CUSTDATE   type C len 14       ; ascii time stamp for backup
10 PASSWORD   type C len 32       ; apparently a password could be used, has been all spaces in my samples
11 DESCR      type C len 256      ; descriptive string input by user at time of backup
note: I believe there is only one Job data record related to the current backup
      it is always record #1 following the initial configuration record # 0

This appears to be a list of the paths selected for backup; it does not include the
subdirectories which may have been included below each path.  To cover all possible
directories accessed use the Dir Structure Definition 2
start of Structure Definition  6: Path     at 0x4ee53 
1  SERIAL     type N len 12       ; record #
2  DATATYPE   type N len 12       ; 1 if path name, 2 if file name
3  DATATEXT   type C len 240      ; ascii path or file name
note: typically two Path data records for each file in Backup, one for the path
      with datatype 1 and one for a file name with datatype 2. In some of my
      samples the list ends with the last path name and no following file
      name, in this case all additional files in the backup set go in this path.
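The path/file pairing described above can be resolved with a single pass that remembers the most recent path record; the trailing path case falls out naturally. A sketch, with struct and function names of my own:

```c
#include <string.h>

/* Stand-in for a parsed Path data record. */
struct path_rec {
    int  datatype;       /* 1 = path name, 2 = file name */
    char datatext[240];
};

/* Copies the path for the nth file record (0 based) into out.
   Returns 0 on success, -1 if there are fewer than n+1 file records and
   no trailing path to fall back on. */
static int path_for_file(const struct path_rec *r, int nrec, int n, char *out)
{
    const char *cur = NULL;  /* most recent path record seen */
    int seen = 0;

    for (int i = 0; i < nrec; i++) {
        if (r[i].datatype == 1)
            cur = r[i].datatext;
        else if (r[i].datatype == 2 && seen++ == n) {
            strcpy(out, cur ? cur : "");
            return 0;
        }
    }
    /* Ran off the end: any remaining files belong to the last path seen. */
    if (cur) {
        strcpy(out, cur);
        return 0;
    }
    return -1;
}
```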

start of Structure Definition  7: Session  at 0x51480 
1  SERIAL     type N len 12       ; record #
2  SESSFROM   type N len 12       ; starting buffer read count in this media file
3  SESSTO     type N len 12       ; last buffer read count in this media file
4  JOBNUM     type N len 12       ; Job # from machine backup was run on
5  DISKNUM    type N len 12       ; # of backup file with catalog => # of disks in set
6  SESSDATE   type C len 14       ; ascii date string

Note: there appears to be one Session record for each media disk (file) in the
backup set. It helps map the files to a specific media disk number.  I believe the program
uses a fixed size buffer, and advances the read count by 1 each time it refreshes this
buffer as it reads the cumulative data from the media disks.  It apparently fills the
buffer each time until all the data is read, but writes it out on a file by file basis using
the file specific length required in the write (based on the file length if not compressed,
or otherwise the compressed length).  This means that a file's data may well span more than
one disk, ie there is no guarantee the data area on any disk except the first begins
at the start of a file.  It may well begin with a continuation of a file's data from the
previous disk.  My session data display via the -vs7 command line option shows the session
data and the total number of files in the backup as a comparison.  I currently only have two
examples with two media files.  This would be more useful if I knew the buffer size used,
but I am still trying to work this out!

Currently I am calculating the data length on a media disk from the file size less the file
header length, 0x200 bytes, for all but the last disk which contains the catalog. On the last
disk I use the catalog offset less the header length as the data length.
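That calculation is simple enough to state as code; HDR_LEN and the function name are mine:

```c
#define HDR_LEN 0x200L  /* file header length on every media disk */

/* Data length on one media disk, per the rule above: file size minus the
   header on every disk except the last; on the last disk (which carries
   the catalog), the catalog offset minus the header length. */
static long disk_data_len(long file_size, long catalog_off, int is_last_disk)
{
    return (is_last_disk ? catalog_off : file_size) - HDR_LEN;
}
```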