Samuel Tardieu @ rfc1149.net

How recoverjpeg saved my day


People sometimes do stupid things. I do, at least. After I experienced a fatal disk crash a few days ago (the disk could not even be seen in the BIOS), I congratulated myself for having made, a few days before the event, a full online backup on my second hard disk of the thousands of pictures I had taken over the years with my digital cameras.

I bought two new serial ATA disks[1] and reinstalled the system (first FreeBSD, then Debian GNU/Linux, as support for serial ATA is much better with the latter), set up software RAID-1 redundancy to avoid losing my system disk the next time a hard drive fails, and my computer was up and running again. When I was done, I decided to test other operating systems on my reshaped computer and installed Microsoft Windows XP Pro on my older hard disk[2] on a newly created 10GB partition, with the intention of playing with it for a few hours and deleting it afterwards, as I have no use for it.

Then I realized that… I had not transferred my digital pictures to my new disks; the only online copy was located on the disk I had just reconfigured. Sure, I could remember burning two DVDs as a backup three or four months before, but I was unable to locate them in my apartment. The pictures were buried somewhere under or around the new XP installation.

I happened to have written a small Python program a few weeks before to recover JPEG pictures from a friend's CompactFlash memory card, which would not list any of the images he had taken during his African trip. On most filesystems, chances are good that a picture is stored in consecutive disk sectors, as this is the simplest thing to do. Of course, some pictures will get stored in the holes made by removing pictures interactively, and some may have been overwritten by newly shot ones.
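To illustrate the idea, here is a minimal Python sketch of this kind of recovery. It is not the actual recoverjpeg code: the function names, the min_size and max_size parameters and their defaults are assumptions made for this example. It looks for the JPEG start-of-image marker in a raw dump, walks the length-prefixed segments that follow, then scans the compressed data up to the end-of-image marker, assuming the whole picture sits in consecutive bytes.

```python
def _has_length(marker):
    # Markers followed by a 2-byte big-endian length: 0xC0..0xFE,
    # except RST0-RST7 (0xD0-0xD7), SOI (0xD8) and EOI (0xD9).
    return 0xC0 <= marker <= 0xFE and not 0xD0 <= marker <= 0xD9


def jpeg_length(buf, start, max_size=50 * 1024 * 1024):
    """Return the length of a plausible JPEG at buf[start:], or None."""
    if buf[start:start + 3] != b"\xff\xd8\xff":
        return None
    limit = min(len(buf), start + max_size)  # max_size is an arbitrary cap
    pos = start + 2
    in_scan = False  # True once the first Start Of Scan segment is passed
    while pos + 1 < limit:
        if in_scan:
            # Inside compressed data: jump to the next 0xFF byte.
            pos = buf.find(b"\xff", pos, limit - 1)
            if pos < 0:
                return None
        elif buf[pos] != 0xFF:
            return None  # headers must be back-to-back segments
        marker = buf[pos + 1]
        if in_scan and marker == 0xD9:
            return pos + 2 - start  # End Of Image reached
        if in_scan and (marker == 0x00 or 0xD0 <= marker <= 0xD7):
            pos += 2  # stuffed 0xFF byte or restart marker
        elif in_scan and marker == 0xFF:
            pos += 1  # fill byte
        elif _has_length(marker) and pos + 4 <= limit:
            seg_len = int.from_bytes(buf[pos + 2:pos + 4], "big")
            if seg_len < 2:
                return None
            pos += 2 + seg_len  # skip marker, length and payload
            in_scan = in_scan or marker == 0xDA
        else:
            return None
    return None


def carve_jpegs(buf, min_size=0):
    """Yield (offset, length) for each plausible JPEG found in buf."""
    pos = 0
    while True:
        pos = buf.find(b"\xff\xd8\xff", pos)
        if pos < 0:
            return
        length = jpeg_length(buf, pos)
        if length is None:
            pos += 1  # false positive, keep searching
        else:
            if length >= min_size:
                yield pos, length
            pos += length  # resume after the recovered picture
```

Because the header segments are skipped as a whole, a thumbnail embedded in a picture's metadata is not reported as a separate picture. The check is purely structural: a file can pass it and still decode to garbage if some of its sectors were overwritten.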

While the program did a good job on a 128MB file (a copy of the failing memory card), using it on an 80GB drive was going to be very painful, especially since I expected to have to refine the algorithm in order to recover as many pictures as possible. The pictures had been taken with several brands of cameras, so I had to stay as close as possible to the JFIF file format while maintaining a high speed.

I decided to take a few hours to rewrite my program in C and to reduce the number of system calls to a minimum (the Python program was using tons of read() calls). I also wrote a small shell script to be run on top of the recovered pictures, which would sort them into directories named after the date the pictures had been taken, using the EXIF tags.
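The system call reduction can be sketched in Python too. The example below is only an illustration under assumptions of mine, not the C code: find_markers, its chunk_size default and the file name in the usage note are all hypothetical. It reads the source in large chunks and keeps a few bytes of overlap between them, so that a marker straddling two chunks is still found.

```python
def find_markers(stream, pattern=b"\xff\xd8\xff", chunk_size=64 * 1024 * 1024):
    """Yield the absolute offset of every occurrence of pattern in stream.

    stream is any binary file object; chunk_size (an arbitrary default)
    is the amount of data requested per read() call.
    """
    overlap = len(pattern) - 1
    base = 0    # absolute offset of window[0]
    tail = b""  # last bytes of the previous window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file or device
        window = tail + chunk
        pos = window.find(pattern)
        while pos >= 0:
            yield base + pos
            pos = window.find(pattern, pos + 1)
        # Keep just enough bytes to catch a pattern split across chunks.
        # The tail is shorter than the pattern, so no match is seen twice.
        tail = window[-overlap:] if overlap else b""
        base += len(window) - len(tail)
```

Opening the source with open("disk.img", "rb", buffering=0), where disk.img is a hypothetical dump of the drive, makes each read() on the Python side map to roughly one system call, so an 80GB drive needs on the order of a thousand calls with the default chunk size.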

Amazingly enough, it did a very good job. The outcome of running the program on my 80GB drive with 10GB being used for the XP installation was:

  • 9538 pictures sorted by date (a few of them were corrupted in a way that no software can detect as they are valid JFIF files) and taken on 337 different days
  • 1310 pictures without date (some of them were correct pictures whose EXIF data had been corrupted)
  • 8301 pictures too small to be real digital pictures (no error there, most of them were thumbnails of real pictures previously made by software such as gqview)
  • 71 invalid JFIF files
  • 4 pictures recorded at a date of 0000-00-00 (probably a bug in a friend’s Olympus camera used to take the pictures)

That makes it a total of 19222 pictures, using 11GB worth of disk space. I could find pictures for every single major event I was able to remember. Needless to say, I was, and still am today, very happy. I sent the program to a few friends for testing[3] and released it under the name recoverjpeg under the GNU General Public License.

I hope it will work out for you as well as it did for me. If it does, do not hesitate to send me a few pictures that have been recovered using it (800 x 600 format) so that I can put them on the recoverjpeg WWW page.

[1] OK, I admit, when I was in the shop, I also bought a new motherboard, a new CPU and more RAM.

[2] At this point, I was happy that Windows XP did not recognize the serial ATA drives as I was sure it could not trash them.

[3] This way, we found out that mmap()-ing block devices was not supported under FreeBSD, while it worked fine under Linux or Solaris. The program was adapted to use huge read() chunks to increase portability.
