Talk:Universally unique identifier
This is the talk page for discussing improvements to the Universally unique identifier article.
This is not a forum for general discussion of the article's subject.
The contents of the Globally unique identifier page were merged into Universally unique identifier on 16 January 2017. For the contribution history and old versions of the redirected page, please see ; for the discussion at that location, see its talk page.
This talk page is automatically archived by Lowercase sigmabot III. Any threads with no replies in 30 days may be automatically moved. Sections without timestamps are not archived.
Number of atoms in the universe
I think that the comparison between the number of possible UUIDs and the (estimated) number of atoms in the universe is at best useless and at worst misleading. They differ by 42 orders of magnitude! — Preceding unsigned comment added by 184.108.40.206 (talk) 02:33, 14 April 2015 (UTC)
- Agreed. The "as many ____ as atoms in the universe" analogy is overused, anyway. – voidxor 22:12, 16 January 2017 (UTC)
I mostly like the article now that it has been merged with GUID. However, the current version of the article lacks two things that the previous version had:
- an explanation of why RFC 4122 recommends version 5 (SHA1) over version 3 (MD5) and counsels against either in security applications.
- discussion of the probability of collisions and duplication.
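For context on the first point: versions 3 and 5 are built the same way and differ only in the hash function (MD5 for version 3, SHA-1 for version 5). The sketch below hashes the namespace UUID's bytes followed by the name, keeps the first 16 bytes, and overwrites the version and variant bits. It is an illustration, not a reference implementation; `name_based_uuid` is a hypothetical helper, and the only thing it relies on is that Python's standard `uuid.uuid3`/`uuid.uuid5` should produce the same result.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Hypothetical helper: derive a name-based UUID from hash(namespace + name)."""
    data = namespace.bytes + name.encode("utf-8")
    if version == 3:
        digest = hashlib.md5(data).digest()   # 16-byte MD5 digest
    elif version == 5:
        digest = hashlib.sha1(data).digest()  # 20-byte SHA-1 digest
    else:
        raise ValueError("only versions 3 and 5 are name-based")
    b = bytearray(digest[:16])                # keep the first 128 bits
    b[6] = (b[6] & 0x0F) | (version << 4)     # version number in the high nibble of byte 6
    b[8] = (b[8] & 0x3F) | 0x80               # RFC 4122 variant: top two bits of byte 8 = 10
    return uuid.UUID(bytes=bytes(b))


# Cross-check against the standard library for both versions.
for v, stdlib in ((3, uuid.uuid3), (5, uuid.uuid5)):
    ours = name_based_uuid(uuid.NAMESPACE_DNS, "python.org", v)
    assert ours == stdlib(uuid.NAMESPACE_DNS, "python.org")
    print(v, ours)
```

Because the same namespace and name always give the same UUID, anyone who can guess the name can recompute it. That is one reason name-based UUIDs are a poor fit for anything secret, whichever hash is used.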
I'll come back to the first of these another time, but I'd like to bring up the second point now. The analysis of collision probability with UUIDs is generally done using the birthday problem. The previous version(s) of this article, going back years, mentioned this and provided examples. I understand why this was removed: there were no references supporting it.
Ironically, there are numerous references on the Internet that discuss the birthday problem and UUIDs, and that do provide example calculations. Almost all of them are blogs and forums like Stack Overflow, which is a problem. More aggravating is that almost all of them got their information from one of the previous incarnations of the UUID article on Wikipedia! I can find legitimate references, such as academic journal articles, regarding the relationship between UUIDs and the birthday problem; unfortunately, they do not provide concrete examples.
voidxor, question for you: Assuming I had a valid reference which stated that the birthday problem was the correct way of analyzing UUID collisions, would it be allowed under Wikipedia policies for me to then plug the numbers into the birthday problem formula, so as to provide concrete examples, such as I did in the text which you didn't carry forward during the merge? Or would this be considered "original research"? It is just arithmetic, after all.
In short, I have a good-ish reference for the relationship between UUIDs and the birthday problem; but to make it concrete, I would have to do the calculations myself. Original research, or no? 220.127.116.11 (talk) 17:05, 18 January 2017 (UTC)
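The arithmetic in question is short. Below is a sketch using the common approximation p(n) ≈ 1 − exp(−n²/2N). It assumes N = 2^122 equally likely values, which corresponds to version 4 UUIDs from an ideal random source. It says nothing about faulty generators, which later comments point out are the practical risk. The function names are hypothetical.

```python
import math

# Assumption: version 4 UUIDs with 122 truly random bits (128 minus 6 fixed bits).
N = 2 ** 122


def collision_probability(n: int, space: int = N) -> float:
    """Approximate chance of at least one duplicate among n random UUIDs:
    p(n) ~ 1 - exp(-n^2 / (2 * space)). expm1 keeps precision for tiny p."""
    return -math.expm1(-(n * n) / (2 * space))


def uuids_for_probability(p: float, space: int = N) -> float:
    """Invert the approximation: n ~ sqrt(2 * space * ln(1 / (1 - p)))."""
    return math.sqrt(-2 * space * math.log1p(-p))


print(f"{uuids_for_probability(0.5):.3e} UUIDs for a 50% chance of a collision")
for n in (10**9, 10**12, 10**15):
    print(f"{n:.0e} UUIDs -> p ~ {collision_probability(n):.3e}")
```

With these assumptions, `uuids_for_probability(0.5)` comes out at roughly 2.7 × 10^18.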
- Last time I looked at this (which was some years ago), there was no chance of an overlap (ignoring clock rollover, which is longer than the Mayan calendar). What there was instead was a rate limit on how fast they could be allocated.
- As my problem involved generating them (for database keys) in a way that the burst rate of their consumption could exceed the allocation rate, our fix was to install multiple NICs in the server, giving multiple MAC addresses and so a faster potential allocation rate. I dimly recall that MS SQL Server did this for us automatically, so it was a much easier fix than any sort of buffering.
- For a good discussion of the virtues of UUIDs as database keys, particularly for databases with distributed record creation, look at Kimball's Data Warehouse Toolkit ISBN 0471200247 Andy Dingley (talk) 17:23, 18 January 2017 (UTC)
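For scale on the rollover mentioned above: RFC 4122 version 1 timestamps count 100-nanosecond intervals since 00:00 on 15 October 1582 (the start of the Gregorian calendar), in a 60-bit field. The sketch below works out when that counter wraps. It uses Python's proleptic Gregorian `datetime`; the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Version 1 epoch: start of the Gregorian calendar; one tick = 100 ns.
GREGORIAN_EPOCH = datetime(1582, 10, 15)
TICKS_PER_SECOND = 10_000_000


def rollover_date(bits: int = 60) -> datetime:
    """Date at which a `bits`-wide counter of 100-ns ticks from the epoch wraps."""
    microseconds = (2 ** bits) // 10  # 10 ticks per microsecond
    return GREGORIAN_EPOCH + timedelta(microseconds=microseconds)


def years_of_range(bits: int = 60) -> float:
    """Span of the counter in mean Gregorian years (365.2425 days)."""
    return 2 ** bits / TICKS_PER_SECOND / (365.2425 * 86400)


print(rollover_date(), round(years_of_range()))
```

The 60-bit field covers about 3,650 years, so the wrap falls in the year 5236.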
For version 1 UUIDs using MAC addresses, there is no chance of duplication if they are "correctly" generated. Part of "correctly" means not exceeding a maximum average rate of generation of 16384 per 100 nanoseconds per MAC address or node ID. With proper programming, this can be an average rate, sometimes exceeded, because there are techniques for "pocketing" unused ones for later when the rate temporarily goes above 16384/100 nanoseconds. However, the key words here are "correctly generated". There are a lot of ways to generate them incorrectly, such as having a network card which duplicates the MAC address on another network card (known to have happened). You can have bugs. And even if you generate your UUIDs perfectly, for "universal" uniqueness you are also depending on all the other guys to generate their UUIDs correctly too, and of course your perfection does not prevent the other guys from having problems. And one thing we have learned from the Internet is that if somebody can figure out how to exploit your trusting everybody-should-just-play-nice-and-follow-the-rules scheme to his advantage or even just for his amusement, it will be exploited. So even the version 1 UUIDs come down to probabilities. Version 2 is similar, but the maximum average rate of generation per node and domain ID is lower. Versions 1 and 2 using randomly generated node IDs, versions 3 and 5 (hash-based) and version 4 (random) do have a chance of collision, even when generated perfectly. It is "almost" a zero chance, and how close to zero it actually is can be determined using the birthday problem formula. 18.104.22.168 (talk) 17:45, 18 January 2017 (UTC)
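The 16384 figure comes from the 14-bit clock sequence. The sketch below packs a 60-bit timestamp, 14-bit clock sequence and 48-bit node ID into the version 1 field layout. It then shows that one tick on one node yields exactly 2^14 distinct values. Note that RFC 4122 mainly describes the clock sequence as a guard against the clock being set backwards or the node ID changing. Using it as a per-tick counter is the reading in the comment above, not something this sketch establishes. `make_v1`, `TICK` and `NODE` are hypothetical.

```python
import uuid


def make_v1(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Hypothetical helper: pack version 1 fields into a UUID."""
    if not (0 <= timestamp < 2**60 and 0 <= clock_seq < 2**14 and 0 <= node < 2**48):
        raise ValueError("field out of range")
    time_low = timestamp & 0xFFFFFFFF                           # low 32 timestamp bits
    time_mid = (timestamp >> 32) & 0xFFFF                       # next 16 bits
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | (1 << 12)  # top 12 bits + version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80     # top 6 bits + variant 10
    clock_seq_low = clock_seq & 0xFF                            # low 8 clock-sequence bits
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


# Hypothetical values: one 100-ns tick on one node.
TICK = 0x1EC9414C232AB00
NODE = 0x0A1B2C3D4E5F

# Each clock-sequence value gives a different UUID for the same tick and node.
batch = {make_v1(TICK, cs, NODE) for cs in range(2**14)}
print(len(batch))  # 16384
```

Once all 2^14 values for a tick are used, the only fresh inputs left are a later timestamp or a different node ID.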
- 1) One of you guys said you actually generate these extremely fast in your IT operation. Do you have a dedicated UUID server, or do you just call a function in your database software to spit out a new UUID?
- Note that if you do it inside the database program, it becomes impossible to overrun the UUID generator, which seems to be a big concern here.
- 2) Some standard said to use a hash of the generated UUID. That's a really great idea, but how could they possibly fail to notice that hashing each one would slow your generator by, what, 1,000 clocks per UUID?
- 3) Using the MAC address in the UUID is brilliant. It guarantees no collisions between computers. But within a PC, you just set almost half the bits in your preferably-random number to a constant.
- If you have to use a 16,384 counter to ensure uniqueness, then you're close to being fu cked by future technology. Instead of incrementing a 16,384 counter, increment the MAC address field with each UUID. That field is invulnerable to the timestamp stall that happens on really fast processors.
- If you're worried about the incremented MAC address colliding with another MAC address in the same batch of NIC cards, just hash it before you start. --Verdana♥Bold 19:35, 13 March 2017 (UTC)
Hi, Verdana. Probably not kosher to be having this discussion on a Wikipedia Talk page, unless it can be related back to the article, but I'll add that if you are in a situation where version 1 or 2 UUIDs have to be generated so fast that you can't rely on the "uniquifying" clock sequence to keep them unique, and need to start incrementing the MAC address too, you are probably better off just using a version 4 random UUID, or one of the flavors (version 3 or 5) of hash-based UUIDs. The MAC address is a 48-bit number, and even if you start at a random point, if you are generating UUIDs so fast that you use a large number of sequentially assigned MAC addresses, you are creating a larger and larger target for someone else using random node assignment to accidentally hit. Might as well throw in the towel, and make the whole UUID random. That way you get 122 bits of randomness (128 minus the 6 bits which tell you what kind of UUID it is). Personally, I don't see much point in version 1 or 2 with anything other than an actual MAC address from a manufacturer who can be relied upon to generate unique MAC addresses, and you are in a domain where you can be reasonably assured that other UUID generators aren't going to screw around (or you don't think you'll ever care). If you don't have that, then one of the other UUID types is probably better for your use case. Person54 (talk) 22:06, 13 March 2017 (UTC)
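The "122 bits of randomness" above can be seen directly. A version 4 UUID is 16 random bytes in which 4 bits are overwritten with the version number and 2 bits with the variant. The sketch below does this by hand with `os.urandom`; as far as I know, the standard library's `uuid.uuid4()` performs the equivalent step. `random_uuid` is a hypothetical name.

```python
import os
import uuid


def random_uuid() -> uuid.UUID:
    """Hypothetical helper: build a version 4 UUID from 16 random bytes."""
    b = bytearray(os.urandom(16))  # 128 bits from the OS random source
    b[6] = (b[6] & 0x0F) | 0x40    # 4 version bits: 0100 (version 4)
    b[8] = (b[8] & 0x3F) | 0x80    # 2 variant bits: 10 (RFC 4122)
    return uuid.UUID(bytes=bytes(b))


FIXED_BITS = 4 + 2               # version nibble + variant bits
RANDOM_BITS = 128 - FIXED_BITS   # 122 bits left to chance

u = random_uuid()
print(u, u.version, RANDOM_BITS)
```

In the text form, the version shows up as the first digit of the third group (always `4`), and the variant as the first digit of the fourth group (one of `8`, `9`, `a`, `b`).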
The article says "anyone can set a UUID". But it also says it requires 128 bits. It doesn't allude to the fact that software is required, WHICH IT IS.
The article doesn't say that a UUID can't be stored just anywhere on a disk, but only within the header of a filesystem (which must be supported) on a partition (which must also be supported) of a disk.
Personally, I just tried to set a UUID on a flash stick and got a blank <none>, but other software (an emulator) insists I must "use UUID" to conform with its "booting methods".
"Anyone can set a UUID" is far from true.
MOREOVER, saying that anything 128-bit (that isn't human-readable and has no real standard except an ever-changing one) is equivalent to "being a UUID" is absolutely asinine. By that standard, any 128 bits collected from anywhere get categorized under a new name. But the proper name for that is, guess what? 128 bits. — Preceding unsigned comment added by 2600:8806:400:B090:4DD4:234D:C19F:DC85 (talk) 18:58, 20 June 2018 (UTC)
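On the last point, an arbitrary 128-bit value does not automatically have the RFC 4122 layout. The variant bits and the version field can be inspected. The sketch below uses Python's `uuid` module to report which variant a 128-bit integer falls into. It checks bit layout only, not how the value was generated. `describe` is a hypothetical helper.

```python
import uuid


def describe(value: int) -> str:
    """Hypothetical helper: classify a 128-bit integer by its UUID variant bits."""
    u = uuid.UUID(int=value)
    if u.variant != uuid.RFC_4122:
        return f"not an RFC 4122 UUID (variant: {u.variant})"
    return f"RFC 4122 UUID, version field = {u.version}"


# 16 arbitrary ASCII bytes: byte 8 has its top bit clear, so the NCS variant.
print(describe(int.from_bytes(b"arbitrary bytes!", "big")))
print(describe(uuid.uuid4().int))  # RFC 4122 UUID, version field = 4
```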