1-Step Format


Iomega 1-Step Backup File Format


For an overview, see the introductory comments in the 1-Step Backup summary page. This page describes the physical file format I have worked out by inspection for Version 5.3 backups.

I got interested in this as I was bored and enjoy an occasional puzzle.
I've done some previous reverse engineering work on the MsBackup format
used in Win9x. I've reverse engineered this Iomega format sufficiently to
parse sample backup files I created with the 1 Step Backup version 5.3
installed on a couple of my Win9x systems equipped with Zip drives. The
following description is preliminary and has many holes in it, as described
below, but it is sufficient to parse the files and recover the original
data from them. A program written in C was created to test and verify this
information; for a description of the program see rd113 Info. Program
source code can be downloaded from rd113-src.tar.gz.

The file header appears to be 0x200 bytes long. Typically the raw file
data section immediately follows this header. A single backup could span
multiple target disks using a set of removable Iomega media. Only one
*.1-Step file was allowed on each media disk. Often this file filled the
disk, but there was an option to leave existing files in place and just
use the remaining space on the media. The backup files have names which
embed the job number, disk number, and date in a string like:
"Backup Job 7, Disk 1, 16-10-27 19.36.39.1-Step"
ie "Backup Job #, Disk #, date time .1-Step"

After creation these files can obviously be renamed, but the name format
above is what the program initially creates. The Job # is maintained in a
database by the 1 Step Backup program on the target machine; each new
backup increments the job # for that machine. The Disk # is the number of
the disk in the backup set. A one disk set will have Disk # = 1; a
multiple disk set will have Disk #s running from 1 to the number of disks
used. The job and disk numbers are also stored in each file's header as
shown below.

The numeric data format in these files is Intel-style little endian. The
sample code I have written is targeted at 32 bit systems, which is what I
believe this software ran on. The only 64 bit value I've seen may be the
date/time stamp used in the header. Oddly, I have not been able to
decipher the format used for it, but since this time stamp is also
displayed as an ASCII string in the catalog region there is no compelling
reason to understand it. It can be used as an identifier in backups that
span multiple media disks, as it remains constant across all the file
headers.
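
Since the numeric fields are little endian, a portable parser should read
them byte by byte rather than overlaying structs. A minimal sketch of the
helper functions assumed by the examples below (the names are mine, not
from the original program):

  #include <stdint.h>

  /* Read little-endian values a byte at a time so the result is
     correct regardless of the host machine's own byte order. */
  static uint16_t rd_le16(const uint8_t *p)
  {
      return (uint16_t)(p[0] | (p[1] << 8));
  }

  static uint32_t rd_le32(const uint8_t *p)
  {
      return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
             ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
  }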

A dump of a typical header is shown below. This is the file header at
file offset 0:

 00000: CD AB CD AB 00 02 00 00 02 00 01 00 B3 C3 D4 25 |...............%
 00010: DA D5 E4 40 00 00 00 00 07 00 02 00 04 93 EA 00 |...@............
 00020: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................
 00030: 01 00 00 00 74 65 73 74 38 20 32 20 64 69 73 6B |....test8 2 disk
 00040: 20 75 6E 63 6F 6D 70 72 65 73 73 65 64 00 00 00 | uncompressed...

I only know what some of the fields above represent, but it seems to be
enough to parse the file.
offset    bytes          use
0x0         4        appears to be a signature, always the bytes shown above
0xc         8        appears to be a date/time stamp, format unknown, 8 binary bytes
0x18        2        job number
0x1a        2        disk number
0x1c        4        offset to catalog section in file
0x34     0-0x1cc     optional descriptive text string entered by user
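
As a worked example, here is a hedged C sketch of reading the known
header fields using the helpers above. The struct and function names are
mine, and the unknown header regions are simply skipped:

  #include <stdio.h>
  #include <string.h>

  #define HDR_SIZE 0x200

  struct onestep_hdr {
      uint16_t job;          /* job number at offset 0x18 */
      uint16_t disk;         /* disk number at offset 0x1a */
      uint32_t cat_offs;     /* catalog file offset at offset 0x1c */
      char     descr[0x1cd]; /* optional user text at offset 0x34 */
  };

  /* Returns 0 on success, -1 on a short read or a bad signature. */
  static int read_hdr(FILE *fp, struct onestep_hdr *h)
  {
      uint8_t buf[HDR_SIZE];
      static const uint8_t sig[4] = { 0xCD, 0xAB, 0xCD, 0xAB };

      if (fread(buf, 1, HDR_SIZE, fp) != HDR_SIZE)
          return -1;
      if (memcmp(buf, sig, sizeof sig) != 0)
          return -1;
      h->job      = rd_le16(buf + 0x18);
      h->disk     = rd_le16(buf + 0x1a);
      h->cat_offs = rd_le32(buf + 0x1c);
      memcpy(h->descr, buf + 0x34, 0x1cc);
      h->descr[0x1cc] = '\0';
      return 0;
  }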

File data is concatenated into files on sequential disks in the backup
until all the data has been written, then the catalog is written at the
end of the last file in the backup.  The data region for each backup file
normally begins at 0x200, although I believe it is possible for there to
be no data in a file, in which case the catalog region would start at
0x200 on a multi disk backup. I believe that there were nominally two user
modes: either compression was activated or no compression was used.
When compression was active, gzip compression appears to be used on a file
by file basis.  However the program seems to have some native intelligence,
and small files (typically less than 34 bytes in length) were not
compressed. If compression does not reduce the file size, some of a file's
data may be stored uncompressed. As described below, the catalog includes
information about whether compression is active and, if it is, the
compression details for each file.
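
One practical consequence when extracting: even in a compressed backup
you cannot assume every stored file is gzip data. A simple heuristic
(mine, not anything documented by the format) is to test for the standard
gzip magic bytes at the start of each stored blob:

  /* gzip streams begin with the magic bytes 0x1F 0x8B.  If a stored
     blob lacks them, treat it as raw, uncompressed file data. */
  static int looks_gzipped(const uint8_t *blob, size_t len)
  {
      return len >= 2 && blob[0] == 0x1F && blob[1] == 0x8B;
  }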

To date most of my reverse engineering has been aimed at the catalog.  It
seems fairly strange compared to other backup formats I have looked at.
Much of the key data in the catalog is in text format rather than binary.
I do not understand a significant amount of the catalog contents, but I do
understand enough to parse the data required to extract files from it. In
my parse of this information I just skip the data I do not understand. I
have a few notes about what some of it might do, but it's speculation.
There are other regions I am not even willing to speculate about!

One possible explanation for this text representation of structures and
data is to make it transparent to different cpu chips and operating
systems. However other parts are clearly little endian binary Intel
format structures. I'd like to know if any Mac users have seen a 1-Step
backup program or a relative of it from the OS6 days.

Structure Definition:
The key sub structure I more or less understand is what I call the
structure definition section and the data that follows it. There appear
to be 7 of these definition sections, each followed by the data for that
definition. All but a small binary header at the beginning of each
structure definition is in text format. The binary header is at least
0x20 bytes long, and maybe 0x30. For now I assume 0x30, but the first
0x10 bytes may be a terminator for the preceding section.
It doesn't matter much; the only WORD I have identified starts at offset
0x14 in the 0x30 byte binary header block. It is the number of data
entries in the data section. I find I can ignore this header and still
parse the files, so currently that is what I am doing. However if I can
work it out it might help in traversing these files.

This header is followed by an arbitrary number of Field definitions, each
0x20 bytes long.  These are ASCII records.  They contain a Field name,
Field type, Field length and start offset; see the structure below.
To parse the region I read and save the data until I reach a block which
begins with the byte 0xD, which terminates the list.

Let nfields be the number of fields found for this data definition.
The start offset for the first field plus the sum of the field lengths
is the data size of each of the entries that follow. The start offset
is embedded in the field header described below, but it has been 1 in all
the sample files I've looked at, so I assume this in my program. The data
types used appear to be 'N' for numeric and 'C' for character data. The
data is presented as ASCII chars, and this data type indicates how the
string should be interpreted.

It seems odd to use a free format definition of fields, but it appears to
be what they have done. In my parsing program I use fixed structures, but
validate that they conform. The easiest way to display this is to annotate
a listing from my test program. My program does not display the start
offset, as it's always been 1 for the first field and then just increments
by the length of the preceding field in the record.


   My Name       # of fields        description
1  drives        13          generic backup data, drive names and letters
2  directories    4          maps directory names to numeric id
3  files          9          maps file names to drives and directories by #
4  comp          11          compression info; even in a compressed backup
                             some files are not compressed
5  job           11          target machine job # and whether it is a
                             compressed volume; apparently only one such
                             record in the catalog

6  paths          3          maps file names to the path of each file.
                             Records appear to be in the order the files
                             above are listed, normally two records per
                             file: the 1st has a path string, the 2nd an
                             associated file name. In some of my samples
                             the last path has no file names after it,
                             implying the remaining files are all in the
                             last path/directory.

7  session        6          one record for each media file in the data set

Following each binary structure definition header is an ASCII region
consisting of 0x20 byte blocks which start with the name of each field.
Other than the name, most bytes in this block are zero. Guessing at the
3 other non-zero bytes:
  0      11 ASCII chars giving a field name, some may be NUL
  0xb    data type, 1 byte long: 0x4E => Numeric, 0x43 => Character
  0xc    binary offset from the start of this entry's data to the start
         of the field value in the data block, monotonically increasing
         from 1 for each field
  0x10   binary length of field, ie the sum of the lengths is the byte
         length of each entry.
  
This list of fields is terminated with a 0xD byte where the next field
name would start. The ASCII data region follows. It is in turn terminated
with a 0x1A byte.
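
A minimal sketch of how this field list can be walked, under the layout
guessed above (the struct and variable names are mine, and the one-byte
width of the offset and length fields is my guess):

  #include <stdio.h>
  #include <string.h>
  #include <stdint.h>

  struct field_def {        /* one 0x20 byte field definition block */
      char    name[12];     /* 11 ASCII chars plus a terminator */
      char    type;         /* 'N' numeric or 'C' character, at 0xb */
      uint8_t start;        /* 1 based offset of value in entry, at 0xc */
      uint8_t length;       /* field length in bytes, at 0x10 */
  };

  /* Read field definitions until a 0xD byte, where the next name would
     start, ends the list.  Returns the number of fields found; *entry_sz
     is the size of each data entry that follows, per the formula above
     (first field's start offset plus the sum of the field lengths). */
  static int read_fields(FILE *fp, struct field_def *f, int max,
                         size_t *entry_sz)
  {
      uint8_t blk[0x20];
      int n = 0, c;
      size_t sum = 0;

      while (n < max && (c = fgetc(fp)) != EOF && c != 0x0D) {
          blk[0] = (uint8_t)c;
          if (fread(blk + 1, 1, sizeof blk - 1, fp) != sizeof blk - 1)
              break;
          memcpy(f[n].name, blk, 11);
          f[n].name[11] = '\0';
          f[n].type   = (char)blk[0xb];
          f[n].start  = blk[0xc];
          f[n].length = blk[0x10];
          sum += f[n].length;
          n++;
      }
      *entry_sz = (n > 0 ? f[0].start : 0) + sum;
      return n;
  }

After the loop exits on the 0xD byte, the file position is at the start
of the ASCII data region that follows.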

Below is sample output from my test program showing the 7 key structures.
The offsets change from file to file as the amount of data varies, but the
same 7 data groups occur in all.  It's pretty clear what most are, but
there seems to be some redundancy.  For instance, why does one need the
Dir listing in addition to the Path listing?  They both appear to contain
the same information in different formats.  Better safe than sorry.
Note there are moderately large holes in the listing below, and much of
the data in these 'holes' seems to be zeros.  I have not made a very
significant attempt at understanding these binary regions, as I seem to
have more information than I need already to recover file data.

Note each structure begins with a SERIAL field; this appears to be a 0
based record number in the array of records.  As mentioned above, record
0 contains the number of additional records in the array. The following
records contain data related to the files in the backup.  In the
discussion below I ignore record 0 and just talk about the data records
which follow it.
-------------------------------------------------------
Backup name: test1
Job 3  Backup Disk 1
Catalog at  offset 0x44944

attempt to step through catalog and locate structure definition regions
;one data record for each drive traversed in the backup
start of Structure Definition  1: Disk     at 0x45a14 
1  SERIAL     type N len 12       ; record #   
2  NUMDIRS    type N len 12       ; ? typically '0'
3  NUMFILES   type N len 12       ; ? typically '0'
4  VLSRDW     type N len 12       ; ? 
5  USED_HI    type N len 12       ; 2 fields for # of bytes used?
6  USED_LO    type N len 12
7  FREE_HI    type N len 12       ; 2 fields for # of bytes available?
8  FREE_LO    type N len 12
9  LABEL      type C len 12        ; Drive Volume name
10 VLSRDT     type C len 14        ; time stamp as a string (possibly previous backup date)
11 DATETIME   type C len 14        ; time stamp as a string (looks like current date)
12 DRV_LTR    type C len 2         ; drive letter followed by a colon
13 HASDSKST   type N len 12        ; ?  typically '1'
note: so far I have ignored fields 5 and 7 and used a long int for the _LO value
in my program.  For the sample files I have looked at the _HI value is always 0

;one data record for each directory traversed in the backup
start of Structure Definition  2: Dir      at 0x47b04  
1  SERIAL     type N len 12        ; record #
2  DISKSER    type N len 12        ; record # in Disks array, ie source disk for data
3  DIRSER     type N len 12        ; record # of parent directory
4  NAME       type C len 240       ; directory name

;one data record for each file in backup
start of Structure Definition  3: File     at 0x497fa 
1  SERIAL     type N len 12        ; record #
2  DIRSER     type N len 12        ; ndx of dir in Structure Definition  2: Dir
3  STATUS     type N len 12
4  DISKSER    type N len 12        ; ndx of disk in Structure Definition  1: Disk
5  ATTRIB     type N len 12        ; file attribute, typically 32
6  SIZE_HI    type N len 12        ; 2 field # of bytes in the file
7  SIZE_LO    type N len 12
8  DATETIME   type C len 14        ; file timestamp as an ascii string
9  NAME       type C len 240       ; file name

  At least one of the Comp data records below exists per File record in
  the backup, and records may be continued as described below; the maximum
  ORGSIZE and COMPSIZE appears to be 65535.

start of Structure Definition  4: Comp     at 0x4be7a
1  SERIAL     type N len 12       ; record #
2  ORGSER     type N len 12       ; record # in files array
3  SEQUENCE   type N len 12       ; sequence # in this record set >= 1
4  ORGSIZE    type N len 12       ; original file size (if 65535 it's part of a set)
5  COMPSIZE   type N len 12       ; compressed file size       
6  ARCDSKSE   type N len 12       ; ? 
7  CHK_SUM    type N len 12       ; ? probably a check sum for compression used
8  COMP_LVL   type N len 12       ; ? type of compression ? if compressed has been 4
9  OFFS_HI    type N len 12       ; 2 field cumulative offset into backup data to start of this file
10 OFFS_LO    type N len 12
11 IS_LAST    type N len 12       ; boolean 0 unless its the last record of this group
note: only the first record exists if the file is not compressed; if
compressed there is a minimum of 1 of these records for each file record,
and there may be multiple records for a file. Records for a given file
continue until IS_LAST = 1
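
My reading of these fields suggests a file's data can be reassembled by
walking its Comp records in SEQUENCE order until IS_LAST is set, summing
the sizes to learn how many bytes to pull from the cumulative data
stream. A hedged sketch (the field names mirror the listing above; the C
types and the handling of the uncompressed case are my assumptions):

  struct comp_rec {
      long orgser;    /* record # in the files array */
      long sequence;  /* sequence # within this file's set, >= 1 */
      long orgsize;   /* original size of this piece (65535 => a set) */
      long compsize;  /* compressed size of this piece */
      long offs_lo;   /* cumulative offset into the backup data */
      int  is_last;   /* 1 on the final record of the group */
  };

  /* Sum the stored byte count for the file whose File record # is
     'fser'.  For an uncompressed file only the first record exists,
     and I assume ORGSIZE gives the stored length in that case. */
  static long stored_bytes(const struct comp_rec *c, int n, long fser,
                           int is_comp)
  {
      long total = 0;
      for (int i = 0; i < n; i++) {
          if (c[i].orgser != fser)
              continue;
          total += is_comp ? c[i].compsize : c[i].orgsize;
          if (c[i].is_last)
              break;
      }
      return total;
  }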

start of Structure Definition  5: Job      at 0x4ddb3
1  SERIAL     type N len 12       ; record #
2  JOBNUM     type N len 12       ; Job # from machine backup was run on
3  NUMDISKS   type N len 12       ; number of media disks used in this backup
4  TGDRV      type N len 12       ; ? possibly drive # of target Iomega drive
5  TGDRVT     type N len 12       ; size of target media, Zip drive use 100
6  HASPSW     type N len 12       ; ?
7  ISCUST     type N len 12       ; ? appears to be boolean, if 1 a customized backup selection
8  ISCOMP     type N len 12       ; 0 if not compressed, 1 if compression is used
9  CUSTDATE   type C len 14       ; ascii time stamp for backup
10 PASSWORD   type C len 32       ; apparently a password could be used, has been all spaces in my samples
11 DESCR      type C len 256      ; descriptive string input by user at time of backup
note: I believe there is only one Job data record related to the current
      backup; it is always record #1, following the initial configuration
      record #0

This appears to be a list of the paths selected for backup; it does not
include the subdirectories which may have been included below each path.
To cover all possible directories accessed, use the Dir structure
(definition 2).
start of Structure Definition  6: Path     at 0x4ee53 
1  SERIAL     type N len 12       ; record #
2  DATATYPE   type N len 12       ; 1 if path name, 2 if file name
3  DATATEXT   type C len 240      ; ascii path or file name
note: typically two Path data records for each file in Backup, one for the path
      with datatype 1 and one for a file name with datatype 2. In some of my
      samples the list ends with the last path name and no following file
      name, in this case all additional files in the backup set go in this path.
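
A sketch of the pairing rule I use when rebuilding full names (the struct
is mine; the trailing-path case falls out of remembering the last path
seen):

  #include <stdio.h>

  struct path_rec {
      int  datatype;        /* 1 = path name, 2 = file name */
      char datatext[241];   /* ASCII path or file name */
  };

  /* Walk the Path records in order, remembering the most recent path;
     each file name record belongs to that path.  A trailing path with
     no file names after it covers all remaining files in the backup. */
  static void print_paths(const struct path_rec *p, int n)
  {
      const char *cur = "";
      for (int i = 0; i < n; i++) {
          if (p[i].datatype == 1)
              cur = p[i].datatext;
          else if (p[i].datatype == 2)
              printf("%s\\%s\n", cur, p[i].datatext);
      }
  }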

start of Structure Definition  7: Session  at 0x51480 
1  SERIAL     type N len 12       ; record #
2  SESSFROM   type N len 12       ; starting buffer read count in this media file
3  SESSTO     type N len 12       ; last buffer read count in this media file
4  JOBNUM     type N len 12       ; Job # from machine backup was run on
5  DISKNUM    type N len 12       ; # of backup file with catalog => # of disks in set
6  SESSDATE   type C len 14       ; ascii date string

Note: there appears to be one Session record for each media disk (file)
in the backup set. It helps map the files to a specific media disk number.
I believe the program uses a fixed size buffer, and advances the read
count by 1 each time it refreshes this buffer as it reads the cumulative
data from the media disks.  It apparently fills the buffer each time until
all the data is read, but writes it out on a file by file basis using the
file specific length required in the write (based on the file length if
not compressed, or otherwise the compressed length).  This means that a
file's data may well span more than one disk, ie there is no guarantee the
data area on any disk except the first begins at the start of a file.  It
may well contain the continuation of a file's data from the previous disk.
My session data display via the -vs7 command line option shows the session
data and the total number of files in the backup as a comparison.  I
currently only have two examples with two media files.  This would be more
useful if I knew the buffer size used, but I am still trying to work it
out!

Currently I am calculating the data length on a media disk as the file
size less the file header length (0x200 bytes) for all but the last disk,
which contains the catalog. On the last disk I use the catalog offset less
the header length as the data length.
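
Expressed as code, the per-disk data length I use is just (the variable
names are mine):

  /* Data bytes on one media disk: everything between the 0x200 byte
     header and either the end of the file (earlier disks) or the
     start of the catalog (last disk in the set). */
  static long disk_data_len(long file_size, long cat_offs, int is_last)
  {
      return (is_last ? cat_offs : file_size) - 0x200;
  }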


Comments