September 29, 2005

JFS + Unicode/Extended-ASCII

So, my Linux box recently suffered a hardware failure. An unfortunate side effect of my current operating environment was that my JFS filesystem would no longer let me access any file whose name contained Unicode or extended-ASCII characters. The files were there, but could not be stat'ed. Luckily, I got an excellent reply from Dave Kleikamp at IBM on what went wrong and how to fix it. I include his reply here for posterity.

Wes:

I've been using JFS for my home directories, and recently had a hardware failure. Everything is back up now, but JFS seems to have trashed any file that contained Unicode or extended ASCII characters (e.g. ö or 刀) in the filename. I can see these files exist via an ls, but any attempt to stat or delete them fails.

Unfortunately, not only am I left with a bunch of lost data, but I also cannot clean up the dead files that are lying around. fsck.jfs doesn't help.

Dave:

Are you running on a different kernel, or did the mount options in /etc/fstab change?

The quick answer is to mount with the option iocharset=utf8. This should make any file accessible.
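(An aside that is not part of Dave's reply: a minimal sketch of what that remount could look like when scripted. The device /dev/sdb1 and the mount point /home are placeholders I made up, and the helper jfs_mount_command is hypothetical. It only builds and prints the command; it does not run it.)

```python
import shlex


def jfs_mount_command(device, mount_point, iocharset="utf8"):
    """Build the argv for mounting a JFS volume with an explicit iocharset.

    device and mount_point are supplied by the caller; nothing is executed here.
    """
    return ["mount", "-t", "jfs", "-o", f"iocharset={iocharset}", device, mount_point]


# Placeholder device and mount point, purely for illustration.
cmd = jfs_mount_command("/dev/sdb1", "/home")
print(shlex.join(cmd))
```

The same option can go in the options column of the matching /etc/fstab entry so that it survives a reboot.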

The long answer is that the default character mapping behavior changed between the 2.4 and 2.6 kernels, and that the default 2.4 behavior depended on the setting of CONFIG_NLS_DEFAULT when the kernel was built.

For historic reasons, jfs stores pathnames in 16-bit Unicode. Since there is no reliable way for the kernel to know what character set the pathnames are truly in, jfs now (in 2.6) stores every character as a 16-bit value with the high-order byte zeroed. (This is equivalent to iso8859-1.) This works well when the files have been consistently created this way. If a pathname exists that has a non-zero high-order byte, the default character conversion doesn't handle it correctly. (You should see some syslog messages suggesting mounting with iocharset=utf8.)
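(Again not part of Dave's reply: a rough Python model of the default 2.6 behavior as described above. The functions store_default and read_default are names I made up; this is a sketch of the idea, not the kernel's actual code.)

```python
def store_default(name_bytes):
    """Widen each filename byte to a 16-bit code unit with the high byte zeroed."""
    return [b for b in name_bytes]


def read_default(units):
    """Narrow 16-bit code units back to bytes; fail on a non-zero high byte."""
    out = bytearray()
    for u in units:
        if u > 0xFF:
            raise ValueError(f"code unit 0x{u:04X} has a non-zero high-order byte")
        out.append(u)
    return bytes(out)


# Widening bytes this way matches decoding them as iso8859-1.
raw = "ö".encode("utf-8")  # two bytes: 0xC3 0xB6
assert store_default(raw) == [ord(c) for c in raw.decode("iso-8859-1")]

# A name created this way round-trips byte for byte.
assert read_default(store_default(raw)) == raw

# A name stored as real Unicode (for example 刀, U+5200) cannot be narrowed.
try:
    read_default([ord(c) for c in "刀"])
except ValueError as err:
    print("default conversion fails:", err)
```

The last case is the one that bit me: the directory entry exists, but the default conversion has no single byte to hand back for it.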

So if files had been created on a 2.4 kernel, where CONFIG_NLS_DEFAULT was something other than iso8859-1, or if files were created when the partition was mounted with the iocharset flag, you may encounter the problems you describe. The problem can also be seen between 2.4 kernels when the iocharset differs.
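(One more aside of my own: a small sketch of the mismatch Dave describes, using Python's codecs as stand-ins for the kernel's NLS tables. That substitution is an assumption, and stored_name and visible_bytes are hypothetical helpers.)

```python
def stored_name(name_bytes, iocharset):
    """Model what lands on disk: the bytes decoded to Unicode with the mount's charset."""
    return name_bytes.decode(iocharset)


def visible_bytes(stored, iocharset):
    """Model what a later mount hands back; None if the name cannot be converted."""
    try:
        return stored.encode(iocharset)
    except UnicodeEncodeError:
        return None


for name in ("ö.txt", "刀.txt"):
    raw = name.encode("utf-8")             # bytes the application passed in
    on_disk = stored_name(raw, "utf-8")    # created while mounted with utf8
    same = visible_bytes(on_disk, "utf-8")
    other = visible_bytes(on_disk, "iso-8859-1")
    print(name, "| same charset round-trips:", same == raw, "| iso8859-1 view:", other)
```

In this model, ö comes back under iso8859-1 as a different byte sequence than the one it was created with, and 刀 cannot be converted at all.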

Posted by josuah at September 29, 2005 7:08 AM UTC+00:00
