View Ticket
Ticket UUID: 93166ec5cc5b429090137e6d65a8fc9f3cf31118
Title: disk I/O error on commit (AIX)
Status: Closed
Type: Code_Defect
Severity: Severe
Priority:
Subsystem:
Resolution: Not_A_Bug
Last Modified: 2010-02-25 12:39:38
Version Found In: 0f27a59808
Description & Comments:
I have been getting a 'disk I/O error' when performing commits on an AIX system. The error prevents the autosync-push from taking place. A manual push works just fine.

After a fair bit of tracking down, I found it is caused by the fsync(fd) call in unixDelete().

DBA0035:/home/dba0035/code/fossil-scm.org >fsl commit -m "where is>
New_Version: ba79ed2621c11aad28ed30d6d3de9f922a5c445c
dt: unixDelete(1) rc:0 zPath:/home/dba0035/code/fossil-scm.org/_FOSSIL_-mj544AAF2B fd:7
dt: unixDelete(2) rc:1290 errno:9
fossil: disk I/O error
COMMIT

If you have recently updated your fossil executable, you might
need to run "fossil all rebuild" to bring the repository
schemas up to date.
DBA0035:/home/dba0035/code/fossil-scm.org >

I tracked this down using the following debugging patch:

--- src/sqlite3.c
+++ src/sqlite3.c
@@ -25516,20 +25516,18 @@
   unlink(zPath);
 #ifndef SQLITE_DISABLE_DIRSYNC
   if( dirSync ){
     int fd;
     rc = openDirectory(zPath, &fd);
+printf("dt: unixDelete(1) rc:%d zPath:%s fd:%d\n", rc, zPath, fd);
     if( rc==SQLITE_OK ){
 #if OS_VXWORKS
       if( fsync(fd)==-1 )
 #else
       if( fsync(fd) )
 #endif
       {
         rc = SQLITE_IOERR_DIR_FSYNC;
+printf("dt: unixDelete(2) rc:%d errno:%d\n", rc, errno);
       }
       if( close(fd)&&!rc ){
         rc = SQLITE_IOERR_DIR_CLOSE;
       }
     }

The errno of 9 is EBADF (bad file descriptor):

DBA0035:/usr/include >grep EBADF *
errno.h:#define EBADF   9       /* Bad file descriptor                  */
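
The failure can probably be reproduced outside of fossil with a few lines of C. The sketch below opens a directory read-only and calls fsync() on the descriptor, which is what the patched code path above appears to do. The helper names try_dir_fsync() and report_dir_fsync() are made up for this example and are not part of SQLite or fossil. It also assumes the directory is opened with O_RDONLY; if AIX requires a descriptor open for writing before it will fsync(), that would explain the EBADF, but this ticket does not confirm it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of SQLite or fossil): open a directory
** read-only and call fsync() on the resulting descriptor.
** Returns 0 if fsync() succeeded, the errno value if fsync() failed,
** or -1 if the directory could not be opened at all. */
int try_dir_fsync(const char *zDir){
  int fd;
  int rc = 0;
  assert( zDir!=0 );
  fd = open(zDir, O_RDONLY);
  if( fd<0 ) return -1;
  if( fsync(fd)!=0 ){
    rc = errno;            /* save errno before close() can change it */
  }
  close(fd);
  return rc;
}

/* Print the outcome in roughly the same style as the "dt:" trace above. */
void report_dir_fsync(const char *zDir){
  int rc = try_dir_fsync(zDir);
  if( rc<0 ){
    printf("dt: open(%s) failed\n", zDir);
  }else if( rc==0 ){
    printf("dt: fsync(%s) ok\n", zDir);
  }else{
    printf("dt: fsync(%s) failed errno:%d%s\n",
           zDir, rc, rc==EBADF ? " (EBADF)" : "");
  }
}
```

Calling report_dir_fsync(".") from a small main() on the affected machine should show whether a directory fsync() fails there independently of SQLite.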

To get around this I have applied the following patch to bypass the dirSync processing... Not sure of the implications but nothing has broken yet!

--- sqlite3.c
+++ sqlite3.c
@@ -48918,13 +48918,12 @@
     }

     /* Delete the master journal file. This commits the transaction. After
     ** doing this the directory is synced again before any individual
     ** transaction files are deleted.
+    rc = sqlite3OsDelete(pVfs, zMaster, 1);
+    */
+    rc = sqlite3OsDelete(pVfs, zMaster, 0);
-    */
-    rc = sqlite3OsDelete(pVfs, zMaster, 1);
     sqlite3DbFree(db, zMaster);
     zMaster = 0;
     if( rc ){
       return rc;
     }

drh added on 2010-02-24 21:51:43:
Seems like an easier fix is to simply recompile with the -DSQLITE_DISABLE_DIRSYNC compile-time option.
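
For readers unfamiliar with the option, here is a rough sketch of how a compile-time switch like this removes the directory sync from a delete routine. It is modelled on the #ifndef SQLITE_DISABLE_DIRSYNC block visible in the patch above, but it is not SQLite's actual code: delete_file() and its return codes are invented for this example, and the directory is found by a simple search for the last '/'.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, simplified delete routine (not SQLite's unixDelete()).
** Returns 0 on success, -1 if unlink() fails, and -2 if the directory
** sync fails.  When compiled with -DSQLITE_DISABLE_DIRSYNC the whole
** sync block is left out and the dirSync argument is ignored. */
int delete_file(const char *zPath, int dirSync){
  assert( zPath!=0 );
  if( unlink(zPath)!=0 ) return -1;
#ifndef SQLITE_DISABLE_DIRSYNC
  if( dirSync ){
    char zDir[1024];
    char *zSlash;
    int fd;
    /* Derive the containing directory from the file name. */
    snprintf(zDir, sizeof(zDir), "%s", zPath);
    zSlash = strrchr(zDir, '/');
    if( zSlash==0 ){
      strcpy(zDir, ".");        /* no slash: current directory */
    }else if( zSlash==zDir ){
      zDir[1] = 0;              /* file directly under "/" */
    }else{
      *zSlash = 0;              /* cut off the file name */
    }
    fd = open(zDir, O_RDONLY);
    if( fd<0 ) return -2;
    if( fsync(fd)!=0 ){         /* the call that fails in this ticket */
      close(fd);
      return -2;
    }
    close(fd);
  }
#else
  (void)dirSync;                /* directory sync compiled out */
#endif
  return 0;
}
```

With the macro defined, the delete still happens but the directory entry change is not explicitly flushed to disk, which is the same trade-off as passing 0 for dirSync in the workaround patch.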


anonymous added on 2010-02-25 08:40:04:
Sometimes you can't see the answer for looking... :)

Recompiled as suggested and autosync now working beautifully!!! Suddenly fossil comes alive.

Autosync:  http://nnn.nnn.nnn.nnn:8080/
                Bytes      Cards  Artifacts     Deltas
Send:             130          1          0          0
Received:        1196         26          0          0
Total network traffic: 315 bytes sent, 831 bytes received
New_Version: 4d9585118174ba091078318d13942715b662c86b
Autosync:  http://nnn.nnn.nnn.nnn:8080/
                Bytes      Cards  Artifacts     Deltas
Send:            2763         31          1          2
Received:        1334         29          0          0
Total network traffic: 1713 bytes sent, 903 bytes received