Modified pages/atomiccommit.in from [e2dbe7a0d2] to [906d4edd42].

Atomic Commit In SQLite

hd_keywords atomic commit *Atomic Commit

1.0 Introduction

An important feature of transactional databases like SQLite is "atomic commit". Atomic commit means that either all database changes within a single transaction occur or none of them occur. With atomic commit, it is as if many different writes to different sections of the database file occur instantaneously and simultaneously.

[...]

the purpose of detecting corruption or I/O errors. SQLite assumes that the data it reads is exactly the same data that it previously wrote.

By default, SQLite assumes that an operating system call to write a range of bytes will not damage or alter any bytes outside of that range even if a power loss or OS crash occurs during that write. We call this the "[PSOW powersafe overwrite]" property. Prior to [version 3.7.9] ([dateof:3.7.9]), SQLite did not assume powersafe overwrite. But with the standard sector size increasing from 512 to 4096 bytes on most disk drives, it has become necessary to assume powersafe overwrite in order to maintain historical performance levels, and so powersafe overwrite is assumed by default in recent versions of SQLite. The assumption of the powersafe overwrite property can be disabled at compile-time or at run-time if desired. See the [PSOW powersafe overwrite documentation] for further details.

3.0 Single File Commit

We begin with an overview of the steps SQLite takes in order to perform an atomic commit of a transaction against a single database file.
The details of file formats used to guard against damage from power failures and techniques for performing an atomic commit across multiple databases are discussed in later sections.

hd_fragment initstate

3.1 Initial State

The state of the computer when a database connection is first opened is shown conceptually by the diagram at the right. The area of the diagram on the extreme right (labeled "Disk") represents

[...]

the process that is using SQLite. The database connection has just been opened and no information has been read yet, so the user space is empty.

hd_fragment rdlck

3.2 Acquiring A Read Lock

Before SQLite can write to a database, it must first read the database to see what is there already. Even if it is just appending new data, SQLite still has to read in the database schema from the sqlite_master table so that it can know

[...]

operating system crashes or if there is a power loss. It is usually also the case that the lock will vanish if the process that created the lock exits.

3.3 Reading Information Out Of The Database

After the shared lock is acquired, we can begin reading information from the database file. In this scenario, we are assuming a cold cache, so information must first be read from mass storage into the operating system cache then

[...]
pages out of eight being read. In a typical application, a database will have thousands of pages and a query will normally only touch a small percentage of those pages.

hd_fragment rsvdlock

3.4 Obtaining A Reserved Lock

Before making changes to the database, SQLite first obtains a "reserved" lock on the database file. A reserved lock is similar to a shared lock in that both a reserved lock and shared lock allow other processes to read from the database

[...]

And because the modifications have not yet started, other processes can continue to read from the database. However, no other process should also begin trying to write to the database.

3.5 Creating A Rollback Journal File

Prior to making any changes to the database file, SQLite first creates a separate rollback journal file and writes into the rollback journal the original content of the database pages that are to be altered. The idea behind the rollback journal is that it contains

[...]
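The reserved-lock behavior described in section 3.4 can be observed from Python's standard `sqlite3` module. The following is a minimal sketch, not part of the original text: the function name, the scratch file `demo.db`, and the use of `BEGIN IMMEDIATE` to take the reserved lock up front are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def reserved_lock_demo():
    """Return (rows seen by a second connection, whether a second writer was refused)."""
    # Hypothetical scratch database in a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=0: report "database is locked" at once instead of waiting.
    writer = sqlite3.connect(path, isolation_level=None, timeout=0)
    other = sqlite3.connect(path, isolation_level=None, timeout=0)
    try:
        writer.execute("PRAGMA journal_mode=DELETE")  # rollback-journal mode
        writer.execute("CREATE TABLE t(x)")
        writer.execute("INSERT INTO t VALUES (1)")

        # BEGIN IMMEDIATE asks for the reserved lock right away.
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("INSERT INTO t VALUES (2)")  # not yet committed

        # Reading is still allowed while the reserved lock is held.
        rows = other.execute("SELECT x FROM t").fetchall()

        # A second would-be writer cannot obtain its own reserved lock.
        try:
            other.execute("BEGIN IMMEDIATE")
            other.execute("ROLLBACK")
            blocked = False
        except sqlite3.OperationalError:
            blocked = True

        writer.execute("COMMIT")
        return rows, blocked
    finally:
        writer.close()
        other.close()
```

Calling `reserved_lock_demo()` should show the second connection still reading the old content, `[(1,)]`, while its own attempt to start a write transaction is refused.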
is possible when doing real disk I/O. We illustrate this idea in the diagram to the right by showing that the new rollback journal appears in the operating system disk cache only and not on the disk itself.

3.6 Changing Database Pages In User Space

After the original page content has been saved in the rollback journal, the pages can be modified in user memory. Each database connection has its own private copy of user space, so the changes that are made in user space are only visible to the database connection that is making the changes. Other database connections still see the information in operating system disk cache buffers which have not yet been changed. And so even though one process is busy modifying the database, other processes can continue to read their own copies of the original database content.

3.7 Flushing The Rollback Journal File To Mass Storage

The next step is to flush the content of the rollback journal file to nonvolatile storage. As we will see later, this is a critical step in ensuring that the database can survive an unexpected power loss.

[...]
rollback journal is modified to show the number of pages in the rollback journal. Then the header is flushed to disk. The details on why we do this header modification and extra flush are provided in a later section of this paper.

3.8 Obtaining An Exclusive Lock

Prior to making changes to the database file itself, we must obtain an exclusive lock on the database file. Obtaining an exclusive lock is really a two-step process. First SQLite obtains a "pending" lock. Then it escalates the pending lock to an exclusive lock.

[...]
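The need for an exclusive lock at commit time (section 3.8) can be illustrated with two connections from Python's standard `sqlite3` module. This is a rough sketch under stated assumptions: the function name and scratch file are hypothetical, and `timeout=0` is chosen so that a busy database is reported immediately rather than retried.

```python
import os
import sqlite3
import tempfile


def exclusive_lock_demo():
    """Return (did the first COMMIT succeed, final row count)."""
    # Hypothetical scratch database in a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path, isolation_level=None, timeout=0)
    reader = sqlite3.connect(path, isolation_level=None, timeout=0)
    try:
        writer.execute("PRAGMA journal_mode=DELETE")  # rollback-journal mode
        writer.execute("CREATE TABLE t(x)")

        # The reader opens a transaction and reads, so it keeps a shared lock.
        reader.execute("BEGIN")
        reader.execute("SELECT count(*) FROM t").fetchall()

        # The writer changes a page in its own cache under a reserved lock.
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("INSERT INTO t VALUES (1)")

        # COMMIT needs the exclusive lock, which the reader's shared lock prevents.
        try:
            writer.execute("COMMIT")
            first_commit_ok = True
        except sqlite3.OperationalError:
            first_commit_ok = False

        # Once the reader ends its transaction, the shared lock clears.
        reader.execute("COMMIT")
        if not first_commit_ok:
            writer.execute("COMMIT")  # the transaction is still open; retry

        count = writer.execute("SELECT count(*) FROM t").fetchone()[0]
        return first_commit_ok, count
    finally:
        writer.close()
        reader.close()
```

In this sketch the first `COMMIT` is expected to fail with "database is locked" while the reader still holds its shared lock, and the retry is expected to succeed once the reader has finished. A real application would normally set a busy timeout instead of retrying by hand.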
that cycle by allowing existing shared locks to proceed but blocking new shared locks from being established. Eventually all shared locks will clear and the pending lock will then be able to escalate into an exclusive lock.

3.9 Writing Changes To The Database File

Once an exclusive lock is held, we know that no other processes are reading from the database file and it is safe to write changes into the database file. Usually those changes only go as far as the operating system's disk cache and do not make it all the way to mass storage.

3.10 Flushing Changes To Mass Storage

Another flush must occur to make sure that all the database changes are written into nonvolatile storage. This is a critical step to ensure that the database will survive a power loss without damage. However, because of the inherent slowness of writing to disk or flash memory, this step together with the rollback journal file flush in section 3.7 above takes up most of the time required to complete a transaction commit in SQLite.

3.11 Deleting The Rollback Journal

After the database changes are all safely on the mass storage device, the rollback journal file is deleted. This is the instant where the transaction commits. If a power failure or system crash occurs prior to this point, then recovery processes to be described later make

[...]
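The life cycle of the rollback journal file can be watched from outside SQLite. The sketch below is illustrative only and rests on assumptions not stated in this excerpt: that the journal lives next to the database under the name of the database file plus `-journal`, that the connection uses the default `DELETE` journal mode, and that the build does not keep the journal in memory. The function name and scratch file are hypothetical.

```python
import os
import sqlite3
import tempfile


def journal_lifecycle():
    """Return whether the journal file exists (before, during, after) a transaction."""
    # Hypothetical scratch database in a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    journal = path + "-journal"  # assumed journal file name
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("PRAGMA journal_mode=DELETE")  # delete the journal on commit
        conn.execute("CREATE TABLE t(x)")
        before = os.path.exists(journal)

        conn.execute("BEGIN")
        conn.execute("INSERT INTO t VALUES (1)")  # original page content is journaled
        during = os.path.exists(journal)

        conn.execute("COMMIT")  # the commit ends with the journal being removed
        after = os.path.exists(journal)
        return before, during, after
    finally:
        conn.close()
```

On an ordinary file system the journal should be present only while the write transaction is open, which matches the description of the journal's deletion as the commit point.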
part of the header is malformed the journal will not roll back. Hence, one can say that the commit occurs as soon as the header is sufficiently changed to make it invalid. Typically this happens as soon as the first byte of the header is zeroed.

3.12 Releasing The Lock

The last step in the commit process is to release the exclusive lock so that other processes can once again start accessing the database file.

In the diagram at the right, we show that the information

[...]
database by checking that counter. If the database was modified, then the user space cache must be cleared and reread. But it is commonly the case that no changes have been made and the user space cache can be reused for a significant performance savings.

hd_fragment rollback

4.0 Rollback

An atomic commit is supposed to happen instantaneously. But the processing described above clearly takes a finite amount of time. Suppose the power to the computer were cut part way through the commit operation described above. In order to maintain the illusion that the changes were instantaneous, we have to "rollback" any partial changes and restore the database to the state it was in prior to the beginning of the transaction.

hd_fragment crisis

4.1 When Something Goes Wrong...

Suppose the power loss occurred during step 3.10 above, while the database changes were being written to disk. After power is restored, the situation might be something like what is shown to the right. We were trying to change three pages of the database file but only one page was successfully written. Another page was partially written and a third page was not written at all.

The rollback journal is complete and intact on disk when the power is restored. This is a key point.
The reason for the flush operation in step 3.7 is to make absolutely sure that all of the rollback journal is safely on nonvolatile storage prior to making any changes to the database file itself.

4.2 Hot Rollback Journals

The first time that any SQLite process attempts to access the database file, it obtains a shared lock as described in section 3.2 above. But then it notices that there is a rollback journal file present. SQLite then checks to see if

[...]
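A much weaker cousin of the power-loss scenario can be reproduced by killing a process in the middle of a transaction. The sketch below is only an approximation under stated assumptions: a process exit is not a power failure, the interrupted changes have most likely never left user space, and the helper names, the `CHILD` script, and the scratch file are all hypothetical. It shows only that the next connection sees the database as it was before the interrupted transaction began.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Script run in a child process: start a write transaction, then die abruptly.
CHILD = """
import os, sqlite3, sys
conn = sqlite3.connect(sys.argv[1], isolation_level=None)
conn.execute("BEGIN IMMEDIATE")
conn.execute("INSERT INTO t VALUES (2)")
os._exit(1)  # no COMMIT and no clean shutdown
"""


def interrupted_transaction_demo():
    """Return the rows visible after a writer died in mid-transaction."""
    # Hypothetical scratch database in a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE t(x)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.close()

    # The child exits with a nonzero status on purpose, so do not check it.
    subprocess.run([sys.executable, "-c", CHILD, path], check=False)

    # The next connection finds the database in its pre-transaction state.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        return conn.execute("SELECT x FROM t ORDER BY x").fetchall()
    finally:
        conn.close()
```

Only the committed row should be returned. Any journal file left behind by the child is dealt with by SQLite itself when the database is next accessed, and the sketch deliberately makes no claim about that file's contents.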