Talk:FASTQ format

WikiProject Computational Biology (Rated B-class, Mid-importance)
This article is within the scope of WikiProject Computational Biology, a collaborative effort to improve the coverage of Computational Biology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-Class on the quality scale and as Mid-importance on the importance scale.

Technically the FASTQ format is multi-line, but its use in short-read sequencing, where each read fits on a single line, disguises this.

Hence sequences may be line-wrapped, and quality values too. Given that @ is a legal quality value and may occur just after a newline in a line-wrapped quality string, care must be taken when parsing it. The ideal solution is simply to count the number of bases in the sequence lines and then parse the quality lines expecting the same number of quality characters. (If a new sequence header does not start immediately after the quality string, the file is malformed.)

Unfortunately many people have implemented broken parsers, so you'll sometimes see ghastly messes where the first quality value on each line has been changed to zero (ASCII '!'). This is just a bug!

193.62.203.214 (talk) 15:36, 16 April 2009 (UTC) jkb
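
The counting approach described above can be sketched in Python. This is only an illustration under stated assumptions, not a reference implementation: the function name `parse_fastq` is hypothetical, the '+' separator line is assumed to be the first line after the header that starts with '+', and blank lines between records are tolerated.

```python
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from possibly line-wrapped FASTQ."""
    it = iter(lines)
    for line in it:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines between records
        if not line.startswith("@"):
            raise ValueError(f"expected '@' header, got {line!r}")
        header = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in it:
            line = line.rstrip("\r\n")
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError(f"record {header!r}: missing '+' separator")
        seq = "".join(seq_parts)

        # Read quality lines by COUNT, never by looking for '@':
        # a wrapped quality line may legitimately begin with '@'.
        qual_parts, count = [], 0
        while count < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError(f"record {header!r}: truncated quality")
            line = line.rstrip("\r\n")
            qual_parts.append(line)
            count += len(line)
        if count != len(seq):
            raise ValueError(f"record {header!r}: quality length mismatch")
        yield header, seq, "".join(qual_parts)
```

For example, feeding it the lines `@r1`, `ACGT`, `ACGT`, `+`, `IIII`, `@III` yields one record whose quality string is `IIII@III`; the `@III` line is consumed as quality rather than mistaken for a header, because only four of the eight expected quality characters had been read at that point.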

The Celera Assembler implements yet another quality format based on this theme...

The input for the Celera Assembler is a 'frg' file [1]

Apparently they take the (presumably Phred-style) quality score and add 48 before converting to ASCII for storage in the frg file, i.e. "chr(ord(0)+$qual)". (In Perl, ord(0) is 48, the ASCII code of the character '0'.)

--Dan|(talk) 15:27, 30 July 2009 (UTC)
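
For illustration, the +48 encoding described above could look like this in Python. This is a sketch based only on the description in this comment; the function names `encode_plus48` and `decode_plus48` are hypothetical, and the behaviour has not been checked against the Celera Assembler documentation.

```python
def encode_plus48(quals):
    """Encode integer quality scores as characters offset from '0' (ASCII 48)."""
    return "".join(chr(ord("0") + q) for q in quals)


def decode_plus48(text):
    """Invert encode_plus48: map each character back to its integer score."""
    return [ord(c) - ord("0") for c in text]


print(encode_plus48([0, 10, 40]))  # prints 0:X
print(decode_plus48("0:X"))        # prints [0, 10, 40]
```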

The AMOS .afg format uses the same encoding

IonTorrent quality range

I've seen some IonTorrent quality values and they seem to have a different range from Sanger or Illumina. However, I don't have access to such a machine or its output, so I can't be sure. Can anyone with the machine confirm and put the range up? -- Preceding unsigned comment added by Hena wp (talk · contribs) 18:25, 30 April 2013 (UTC)

Would adding color to the FASTQ versions test make it clearer?

  SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
  ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
 33                        59   64       73                            104                   126
  0........................26...31.......40                                
                           -5....0........9.............................40 
                                 0........9.............................40 
                                    3.....9.............................40 
  0........................26...31........41                               

 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
    with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (bold) 
    (Note: See discussion above).
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)
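
As a rough sketch of how the offsets in this legend are applied, the Python below subtracts the variant's offset from each character code. The variant keys and the name `decode_quality` are made up for this example; note that the Solexa variant yields Solexa-scale scores, which are not converted to Phred here.

```python
# Offsets taken from the legend above; the dictionary keys are illustrative.
OFFSETS = {
    "sanger": 33,        # S: Phred+33
    "solexa": 64,        # X: Solexa+64 (Solexa scale, not Phred)
    "illumina-1.3": 64,  # I: Phred+64
    "illumina-1.5": 64,  # J: Phred+64
    "illumina-1.8": 33,  # L: Phred+33
}


def decode_quality(qual, variant):
    """Return integer scores for a quality string under the given variant."""
    offset = OFFSETS[variant]
    return [ord(c) - offset for c in qual]


print(decode_quality("!I", "sanger"))       # prints [0, 40]
print(decode_quality(";h", "solexa"))       # prints [-5, 40]
print(decode_quality("J", "illumina-1.8"))  # prints [41]
```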

Colors picked at random, and I don't absolutely guarantee that the alignment is correct. And there appears to be a problem with the J alignment in the original figure.

Tnabtaf (talk) 02:17, 22 October 2012 (UTC)

Got no comments; posting to page.

Tnabtaf (talk) 05:59, 22 January 2013 (UTC)

The Sanger FASTQ format is not limited to 40: it goes all the way up to ~ (93). After all, there is no upper limit on either the Phred or the Solexa quality scale. The same is probably true of the Solexa/Illumina<1.8 versions too, although the sequencing machines never gave a value above some maximum X because they could never be *that* confident. It is unlikely that X is 40 for all of these tools. Moreover, it's incorrect to say that the FORMAT doesn't support values larger than 40 just because the tools that produced the files do not emit them. -- Preceding unsigned comment added by 2A02:8071:B1C0:C01:84E:7023:C079:6527 (talk) 17:45, 11 December 2016 (UTC)
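
The arithmetic behind the figure of 93 can be checked with a few lines of Python. This assumes quality characters are restricted to printable ASCII ending at '~' (126), as in the diagram above; `max_score` is a name invented for this sketch.

```python
LAST_PRINTABLE = ord("~")  # 126, the last character in the diagram's ASCII row


def max_score(offset):
    """Largest score representable with a given ASCII offset."""
    return LAST_PRINTABLE - offset


print(max_score(33))  # prints 93 (Phred+33, Sanger and Illumina 1.8+)
print(max_score(64))  # prints 62 (offset 64, Solexa and Illumina 1.3 to 1.7)
```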

Sequence letter definitions?

I'm writing a FASTQ parser for Illumina exome data, and I found this article very useful! Thanks for writing it. The only thing I see missing from this article that would help me complete the parser is sequence letter definitions. I see ACTG throughout the Illumina data, which makes sense, but I don't know what 'N' stands for. I'll figure it out, but it would be cool if sequence letters were documented here. WaywardGeek (talk) 12:00, 5 August 2013 (UTC)

External links modified

Hello fellow Wikipedians,

I have just added archive links to one external link on FASTQ format. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

You may set the |checked=, on this template, to true or failed to let other editors know you reviewed the change. If you find any errors, please use the tools below to fix them or call an editor by setting |needhelp= to your help request.

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

If you are unable to use these tools, you may set |needhelp=<your help request> on this template to request help from an experienced user. Please include details about your problem, to help other editors.

Cheers.—cyberbot IITalk to my owner:Online 19:21, 27 January 2016 (UTC)