Talk:Three-phase commit protocol

The protocol presented on the page at present conforms to the Skeen article which actually differs slightly from the description given at [1]. Specifically, the state transition on the cohort from prepared to committed only happens when receiving a commit message from the coordinator in the original article. Was there a change to the protocol in the meantime?

Agreed, and it's definitely not a slight discrepancy; the description of the cohort states matches neither the diagram shown nor the state diagram in the source material. I'd rather someone more knowledgeable about the subject matter commit a change, though. --130.15.80.105 (talk) 16:07, 31 March 2009 (UTC)

I reformatted the protocol description at the bottom of the page to look similar to two-phase commit. Hope nobody minds. gba 18:56, 4 March 2006 (UTC)

Atomicity reliability

Since this is the first time I have posted a message to a Wikipedia discussion, I won't edit the article myself. I suggest this to the original author. Please correct me if I'm wrong!

You could make the two-phase commit protocol non-blocking in the same way as with the three-phase: by introducing timeouts. The problem with both the blocking and the non-blocking variant is the same: you can never be sure of the atomicity.

Consider the [2]. If Cohort(i) sends an ACK message that gets lost because the link to Cohort breaks, the Cohort will time out and commit the local transaction, while the Coordinator, not having received the ACK, will time out and abort. Even if the link is restored, you can't later abort (roll back) the part that was already committed on the Cohort.

The basic problem with most kinds of commit protocols is known as the Two Generals' Problem. If you add more and more layers of acknowledgements (acknowledgements of acknowledgements), the system gets more reliable but never perfect. On the downside, execution gets slower and slower.

Regards, Igorecan 13:14, 11 April 2006 (UTC)
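
The scenario in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two timeout rules (the coordinator aborts when no ACK arrives, the cohort commits when no further message arrives) are the commenter's reading of the state diagram, not a confirmed statement of the protocol, and the names `Outcome`, `simulate` and `link_up` are hypothetical.

```python
from enum import Enum
from typing import Tuple


class Outcome(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def simulate(link_up: bool) -> Tuple[Outcome, Outcome]:
    """Return (coordinator outcome, cohort outcome) after the pre-commit phase.

    Assumption: the cohort has already sent its ACK for the pre-commit, and
    link_up says whether messages can still travel in either direction.
    """
    # Assumed coordinator rule: waiting for the ACK, a timeout leads to abort.
    coordinator = Outcome.COMMIT if link_up else Outcome.ABORT

    if link_up:
        # The cohort receives the coordinator's final message and follows it.
        cohort = coordinator
    else:
        # Assumed cohort rule: waiting for the final message, a timeout
        # leads to commit. The coordinator's abort never arrives.
        cohort = Outcome.COMMIT

    return coordinator, cohort


if __name__ == "__main__":
    for link_up in (True, False):
        coordinator, cohort = simulate(link_up)
        print(f"link_up={link_up}: coordinator={coordinator.value}, cohort={cohort.value}")
```

With the link up, both sides commit. With the link down, the sketch returns an abort for the coordinator and a commit for the cohort, which is the divergence the comment is worried about. Whether the real protocol has exactly these timeout transitions is the open question in this thread.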

Regarding your comments, Igorecan, I have to disagree. In the situation you mention, in 2PC with timeouts, this is how I believe it would go (according to my reading of Lampson93):

Cohort(i) sends an ACK message that gets lost. By the time any cohort can send an ACK, it has already been decided by the coordinator whether the transaction is committing or aborting. So this Cohort is ACKing a commit message in your scenario.
The Coordinator, knowing that this is a committed transaction, will time out on Cohort(i)'s response, and will again send a COMMIT message to Cohort(i).
Cohort(i), upon receiving a COMMIT message for a transaction that has already been committed, will know its ACK message was lost, and will resend the ACK.
This process might repeat many times until both the COMMIT message and the ACK message have been transmitted.

Thanks, Nels Beckman 14:28, 7 September 2006 (UTC)
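
The retransmission steps in the comment above can be sketched as a small loop. This is a minimal illustration, not an implementation of 2PC: it assumes the coordinator has already decided to commit, models message loss with two scripted sequences of booleans, and uses hypothetical names (`resend_until_acked`, `commit_arrives`, `ack_arrives`, `max_rounds`).

```python
from typing import Iterable, Tuple


def resend_until_acked(
    commit_arrives: Iterable[bool],
    ack_arrives: Iterable[bool],
    max_rounds: int = 100,
) -> Tuple[int, int]:
    """Return (COMMIT messages sent, local commits performed by the cohort).

    commit_arrives[i] says whether the i-th COMMIT reaches the cohort.
    ack_arrives[j] says whether the j-th ACK reaches the coordinator.
    Once a script runs out, every later message counts as lost.
    """
    commits = iter(commit_arrives)
    acks = iter(ack_arrives)
    cohort_committed = False
    local_commits = 0

    for sends in range(1, max_rounds + 1):
        # The coordinator (re)sends COMMIT; the decision itself never changes.
        if not next(commits, False):
            continue  # COMMIT lost: the coordinator times out and resends.

        # The cohort commits only once; a duplicate COMMIT just means
        # its earlier ACK was lost, so it only sends the ACK again.
        if not cohort_committed:
            cohort_committed = True
            local_commits += 1

        if next(acks, False):
            return sends, local_commits  # ACK received: the coordinator is done.
        # ACK lost: the coordinator times out and resends COMMIT.

    raise RuntimeError("no ACK received within max_rounds")


if __name__ == "__main__":
    # First COMMIT lost, second COMMIT delivered but its ACK lost,
    # third COMMIT delivered and ACKed.
    print(resend_until_acked([False, True, True], [False, True]))
```

The point of the sketch is that retries change how many messages are sent, not the outcome: the cohort applies the commit exactly once, and the coordinator never switches to abort while waiting.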

Hello again after a long time! First of all, I was speaking about 3PC, as described by the state automata on the [3]. Notice the P1 (coordinator) and the Pi (cohort) states. For one, the timeout leads to abort, and for the other, it leads to commit. Isn't that wrong? I didn't study your link, but this confuses me: "By the time any cohort can send an ACK, it has already been decided by the coordinator whether the transaction is committing or aborting" - if the coordinator has decided, then what is the need for further ACKs or NACKs? I believe the timeouts necessarily introduce a degree of uncertainty about whether both will actually abort or commit.

Regards, Igorecan (talk) 00:20, 15 November 2008 (UTC)

Figure

The figure is nice, but is inconsistent with the text, in that it shows "participants" rather than "cohorts". --Preceding unsigned comment added by Yagibear (talk • contribs) 22:51, 31 October 2007 (UTC)

Ah, good point. I'll update it sometime within the next few days. If I forget, feel free to send me an email to remind me. --Tjohns ✎ 18:46, 11 March 2008 (UTC)

Fixed. Feel free to let me know if any other changes need to be made. Tjohns ✎ 06:49, 21 March 2008 (UTC)

- It appears that the figure is still inconsistent with the text. In particular, the figure seems to indicate that, for a cohort that has ACK'd a pre-commit but not received a do-commit, a timeout will cause a commit to take place. However, the text says "In the prepared state, if the cohort receives an abort message from the coordinator, fails, or times out waiting for a commit, it aborts." 98.212.216.20 (talk) 18:38, 22 April 2008 (UTC)

modes of failure

I wanted to use this article as a brief introduction to the kinds of problems that must be considered in distributed consensus, but was disappointed by the brevity of the explanation of how this is an improvement over the two-phase commit. I think the discussion is fine as a definition for those already familiar with the domain, but needs a little more justification for pedagogical use. I will take a shot at this, and would welcome improvement from anyone.

MarkKampe (talk) 18:55, 13 March 2010 (UTC)