Automated differential backup using 7-Zip for Linux/Windows

There are a lot of ways of doing differential backups and tons of software for it, freeware or shareware. I want to share one I didn't know about and liked very much. It is unfair that so little information can be found on the Net about the 7-Zip archiving tool, which handles differential backups with ease. This article is for CLI geeks who understand where to push these commands :)

What are Differential and Incremental backups?


Incremental backup

is a backup method in which multiple backups are kept (not just the last one). These backups will be incremental if each original piece of backed up information is stored only once, and then successive backups contain only the information that changed since a previous backup.

Differential backup

is a cumulative backup of all changes made since the last full or normal backup, i.e., the differences since the last full backup.

I am talking about differential backup, which consists of one FULL archive and several DIFFERENTIAL archives, each taken on a different date.

Using 7zip for automated backup

7-Zip is a really great tool for archiving files: Linux and Win32 platform support, cross-platform archives, multi-threading support.

7zip installation

  1. Windows, download
  2. Ubuntu: aptitude install p7zip-full (this package provides the 7za command)

7zip commands to create a backup of files

The first step is to create a full backup, which is fairly easy:

7za a c:\archive.7z  c:\folder_to_archive

Next, create a differential backup named diff1.7z:

7za u c:\archive.7z  c:\folder_to_archive  -ms=off -mx=9 -t7z -u- -up0q3r2x2y2z0w2!c:\diff1.7z
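Where the command switches stand for:
  • u – update command
  • -ms=off – disable solid mode
  • -mx=9 – best compression
  • -t7z – 7z archive type
  • -u- – do not update the base archive (archive.7z stays unchanged)
  • -up0q3r2x2y2z0w2!c:\diff1.7z – write the changes into a new archive, diff1.7z (explained below)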
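
To automate the two steps above (for example from cron on Linux), a small shell function can create the full backup on its first run and a dated differential archive on every later run. This is a sketch under stated assumptions: the backup function, the /backup folder layout and the diff-YYYY-MM-DD.7z naming are hypothetical; the 7za switches are the ones from the commands above. Setting RUN=echo prints the commands instead of running them, which is handy for a dry run.

```shell
#!/bin/sh
# Sketch of an automated differential backup (e.g. called from cron).
# Assumptions: the backup() helper, archive.7z and the hypothetical
# diff-YYYY-MM-DD.7z naming; one run per day (the date-only name would clash).
# Set RUN=echo to print the 7za commands instead of executing them.

# backup SRC DIR: full backup on the first run, dated differential afterwards.
backup() {
  src=$1
  dir=$2
  full="$dir/archive.7z"
  diff_archive="$dir/diff-$(date +%Y-%m-%d).7z"

  if [ ! -f "$full" ]; then
    # No full backup yet: create it.
    ${RUN:-} 7za a "$full" "$src"
  else
    # Full backup exists: leave it untouched (-u-) and write every change
    # since the full backup into today's differential archive.
    # Single quotes keep "!" safe from interactive history expansion.
    ${RUN:-} 7za u "$full" "$src" -ms=off -mx=9 -t7z -u- \
      '-up0q3r2x2y2z0w2!'"$diff_archive"
  fi
}
```

For example, a script scheduled with cron could call backup /home/user/folder_to_archive /backup. Because archive.7z is never modified, each dated differential holds all changes since the full backup, so a restore needs only archive.7z plus one differential.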


WTF is “-up0q3r2x2y2z0w2!c:\diff1.7z”?

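It is an action mask that determines 7z behavior. After -u come pairs of a file-state letter and an action number; the part after ! is the name of the new archive that receives the result. The file states are: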

p - File exists in archive, but is not matched with wildcard.
q - File exists in archive, but doesn't exist on disk.
r - File doesn't exist in archive, but exists on disk.
x - File in archive is newer than the file on disk.
y - File in archive is older than the file on disk.
z - File in archive is same as the file on disk
w - It cannot be detected which file is newer (times are the same, sizes are different)

The number after each letter is the action:

0	Ignore file (don't create item in new archive for this file)

1	Copy file (copy from old archive to new)

2	Compress (compress file from disk to new archive)

3	Create anti-item (an item that will delete the file or directory during extraction). This feature is supported only in the 7z format.

More details on this switch here:

http://www.bugaco.com/7zip/MANUAL/switches/update.htm
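
To read a mask like the one above without looking up each letter, the two tables can be turned into a small decoder. explain_up_mask is a hypothetical helper written for illustration, not part of 7-Zip; it only reads the letter/number pairs and ignores the !newArchiveName part.

```shell
#!/bin/sh
# explain_up_mask MASK: prints one "pair: state -> action" line per pair of
# a -u mask, using the state and action tables above (hypothetical helper).
explain_up_mask() {
  mask=${1%%'!'*}   # drop the "!newArchiveName" part, if present
  mask=${mask#-u}   # drop the leading "-u", if present
  # Walk the mask two characters at a time; a trailing odd letter is ignored.
  while [ "${#mask}" -ge 2 ]; do
    rest=${mask#??}          # everything after the first two characters
    pair=${mask%"$rest"}     # the first two characters, e.g. "q3"
    state=${pair%?}
    action=${pair#?}
    case $state in
      p) s="in archive, not matched by wildcard" ;;
      q) s="in archive, not on disk (deleted)" ;;
      r) s="on disk, not in archive (new)" ;;
      x) s="archive copy is newer than disk" ;;
      y) s="archive copy is older than disk (changed)" ;;
      z) s="same in archive and on disk" ;;
      w) s="cannot tell which is newer" ;;
      *) s="unknown state" ;;
    esac
    case $action in
      0) a="ignore" ;;
      1) a="copy from old archive" ;;
      2) a="compress from disk" ;;
      3) a="create anti-item" ;;
      *) a="unknown action" ;;
    esac
    printf '%s: %s -> %s\n' "$pair" "$s" "$a"
    mask=$rest
  done
}
```

Running explain_up_mask '-up0q3r2x2y2z0w2!c:\diff1.7z' prints seven lines, for example "r2: on disk, not in archive (new) -> compress from disk". Read together, the mask sends new and changed files into the differential (r2, y2, x2, w2), records deleted files as anti-items (q3) and skips unchanged ones (z0).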

How to extract files from 7zip differential backup

First, extract the full backup archive:

7za.exe x c:\archive.7z -oc:\recovery_path\

Next, extract the differential backup you need on top of it, into the same folder:

7za.exe x c:\diff1.7z -aoa -y -oc:\recovery_path\

-aoa – overwrite all existing files without prompt

-y – assume Yes on all queries

After extraction, the destination folder will contain the exact structure and files as of the date of that differential backup!
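
The restore steps can be scripted the same way. A minimal sketch, assuming the hypothetical diff-YYYY-MM-DD.7z naming (so that sorting the names also sorts them by date) and a hypothetical restore helper: it extracts the full archive and then only the newest differential, since each differential already contains every change made since the full backup. RUN=echo again prints the commands instead of executing them.

```shell
#!/bin/sh
# restore DIR DEST: full archive first, then the newest differential on top.
# Assumptions: archive.7z plus hypothetical diff-YYYY-MM-DD.7z files in DIR.
# Set RUN=echo to print the 7za commands instead of executing them.
restore() {
  dir=$1
  dest=$2

  # Globs expand in sorted order, so the last match is the newest diff.
  latest=
  for f in "$dir"/diff-*.7z; do
    [ -e "$f" ] && latest=$f
  done

  # Step 1: the full backup (-y avoids prompts in unattended runs).
  ${RUN:-} 7za x "$dir/archive.7z" -y -o"$dest"

  # Step 2: the newest differential; its anti-items delete removed files.
  if [ -n "$latest" ]; then
    ${RUN:-} 7za x "$latest" -aoa -y -o"$dest"
  fi
}
```

For example, restore /backup /home/user/recovery_path rebuilds the folder as of the newest differential. To go back to an older date, extract archive.7z and then that day's diff file by hand.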

What is 7zip anti-item

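When creating a differential archive, 7-Zip detects files that have been deleted since the full backup (state q in the mask) and, with action 3, writes an anti-item entry for each of them. An anti-item tells the 7-Zip extractor to actually delete that file when the differential archive is extracted over the full one. That's why the resulting recovery folder looks the same as the source folder did at archiving time.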


7zip backup limitations

DO NOT USE the 7-zip format on Linux/Unix for system backup purposes, because 7-Zip does not store the owner/group of the file.

On Linux/Unix, in order to back up directories with their owner/group, you should use tar and let 7za only compress the tar stream.

To back up a directory:

tar cf - directory | 7za a -si directory.tar.7z

To restore your backup:

7za x -so directory.tar.7z | tar xf -
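
The tar pipeline can be wrapped in two helper functions, so that tar keeps owner, group and permissions while 7za only compresses the stream. tar_backup and tar_restore are hypothetical names; tar's -C option chooses the directory to unpack into. This is a sketch, and it produces one compressed tar stream per run: the -u differential switches above work on individual files inside a 7z archive, not on a tar stream. (GNU tar restores the original owner only when run as root.)

```shell
#!/bin/sh
# Hypothetical helpers around the tar + 7za pipeline from the 7za man page.

# tar_backup DIR ARCHIVE: tar records owner/group/mode, 7za compresses stdin.
tar_backup() {
  tar cf - "$1" | 7za a -si "$2"
}

# tar_restore ARCHIVE DEST: 7za writes the tar stream to stdout,
# tar unpacks it into DEST (created if missing).
tar_restore() {
  mkdir -p "$2"
  7za x -so "$1" | tar xf - -C "$2"
}
```

For example, cd /home/user && tar_backup www /backup/www.tar.7z, and later tar_restore /backup/www.tar.7z /home/user/restore.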

 


15 comments

  1. Awesome.

    Minor error in section “Extracting differential archive on top to the same folder”

    Thanks :)

  2. Thanks, but there is a problem for linux backup.
    man 7za:
    Backup and limitations
    DO NOT USE the 7-zip format for backup purpose on Linux/Unix because :
    – 7-zip does not store the owner/group of the file.

    On Linux/Unix, in order to backup directories you must use tar :
    – to backup a directory : tar cf - directory | 7za a -si directory.tar.7z
    – to restore your backup : 7za x -so directory.tar.7z | tar xf -

  3. Yeah, 7zip is not suitable for a full OS backup or for owner/mode-sensitive data. I think of it as a backup for personal data, web sites, or big chunks of data where owner or permissions do not really matter.

    thanks for an update!

  4. I found this article when searching for a way to do a differential backup, and in the process of implementing it, realized a flaw in the method:

    If you attempt to do sequential differential backups, e.g. a nightly backup of the contents of a particular directory, then the first backup will have the entire contents of the source directory in it, but the second backup will have only the differential files with respect to the first backup.

    Now what happens when you do the third backup? If you’re diffing the current directory contents against the most recent previous backup – i.e. the second one in this example – then you’re taking the diff of the diff, which now backs up the entire directory once again, except for those files that changed after the first backup but not again after the second.

    So you wind up with alternating large and small incremental backups. But this still yields a correct copy of the directory when all of the backup files are restored in sequence.

    You could attempt to always diff against the original base backup, which will always produce relatively small files – at least until the directory contents have diverged significantly from the original contents – but then you wind up with an inauthentic directory when restoring each archive in sequence: anything that was added after the original base backup and then subsequently removed would never get an anti-item created, so what you’re restoring is actually the most recent version of anything that was ever in the directory.

    So, the first method still cuts space usage roughly in half, since you’re only re-archiving the entire directory every other backup, which makes it superior to simply archiving the entire directory every time, and not bothering with diffs, but I’d really like a way of having something similar to the second method, but which parses through the entire sequence of backups to generate a valid file list.

    I suppose the only effective way to do this would be to separate the differential function from the 7-zip archiving, and keep a text file with the directory tree and file hashes associated with each backup, which can be processed to generate an archival file list every time the tool is run.

  5. What you call “diff of the diff” is actually an incremental backup, where each new piece of backup depends on the previous one, like in a chain.

    7ZIP does not support incremental backups.

    It supports only differential. That means each new backup depends only on the first one. To restore the data you need the main (first) archive and any one of the later ones.

  6. That’s basically what I was getting at – the “diff of the diff” is re-archiving everything that’s never changed at all in the source directory. I think having a script which maintained a sequential log of every backup, and parsing it each time a new backup is run, could accomplish incremental backup.

    But after I posted that, I realized that you’d of course only extract the base archive and then only the most recent differential backup, and not restore them in sequence as I described above.

    Thanks for a useful article.

  7. Thanks for this post, but this line is wrong, I suppose:

    7za.exe x c:\archive.7z -oc:\recovery_path\
    Next, to extract needed differential backup on top to the same folder
    --> wrong line 7za.exe x c:\archive.7z -aoa -y -oc:\recovery_path
    --> correct line 7za.exe x c:\diff1.7z -aoa -y -oc:\recovery_path
    Is that right?

  8. Can someone help, please? I don't understand how this script works. I ran several tests but I'm still confused.
    As a first step, I made a full backup with the following files:
    test1.txt
    test2.txt
    test3.txt

    Then, before performing a differential backup, I added test4.txt.
    Then I performed a differential backup into diff1.7z,
    and I saw test4.txt in my differential backup.

    Then I added another file, test5.txt, and performed another differential backup
    into diff2.7z. Then I saw
    test4.txt
    test5.txt
    Apparently it shows all changes since the initial full backup.

    Where I get confused: I deleted all files from the initial backup and created 2 folders
    folder1
    folder2
    When I did the differential backup
    into diff3.7z, I saw
    all the text files I had deleted and the 2 newly created folders.

    Why are the deleted txt files showing up in the differential?

  9. Hi, how can I add logging?
    For example, for the differential backup
    I would like to see the new/updated files in my log.
    Thanks for your help.

  10. Hi, can you help, please?
    I did the full backup and have a big 7z backup file, about 40 GB.
    I would like to make a differential.

    The differential command works fine when testing with a small 7z file,
    but when I try to make a differential against the 40 GB file, I get an error message saying
    backup.7z is not supported archive

    Thanks for your help.
