
Target (project)

From Wikipedia, the free encyclopedia

Revision as of 16:52, 11 November 2014

Target
Mission statement: Development of Big Data information systems
Location: University of Groningen, Netherlands
Established: January 2009
Funding: Funded by the European Fund for Regional Development & partners
Website: www.rug.nl/target

Target[1] is a public-private project, initiated in 2009, in the area of large-scale data management and information systems. It is run by a consortium of ten academic and IT industry partners coordinated by the University of Groningen, the Netherlands. Target conducts research and development focused on the design of intelligent information systems that can efficiently process, and extract information from, extremely large and structurally diverse datasets. The work of the Target data center in Groningen covers:

  • data management of (inter)national science projects in the area of astronomy, life sciences, artificial intelligence, medical diagnosis and more
  • initiation of market-driven R&D activities in collaboration with IT businesses that can lead to competitive Big Data solutions and applications.

The computer center is hosted by the Donald Smits Center for Information Technology at the University of Groningen. It consists of more than 10 petabytes of GPFS-based storage, a high-performance supercomputing cluster[2] and a Grid cluster, which is part of Big Grid and the European Grid Infrastructure.

History

The project grew out of the insight that astronomers' expertise in massive data processing could be applied to other areas of science. Target's core philosophy builds on the scalable distributed environment Astro-WISE,[3] whose approach to data management differs considerably from the classical technique of predefined processing pipelines.[4] This approach links data processing, analysis and storage into an intelligent information system that allows improved data management techniques to be integrated continuously and in a controlled way.[5] The Target project was launched in 2009 after receiving 32 million euros[6] of funding for a period of five years from the European Fund for Regional Development, the Dutch Ministry of Economic Affairs (Pieken in de Delta), and the provinces of Groningen and Drenthe. The project runs under the auspices of the Northern Netherlands Provinces Alliance (SNN) and the Groningen municipality.

Technological findings

At the start of the project, the goal was to develop a single large file system at the multi-petabyte scale. During the first years, it became apparent that the requirements of the various e-Science disciplines differ. Some areas, such as LOFAR, involve massive data streaming. In astronomy, the number of data objects may run into the billions, with a limited number of data columns. In genomics, the number of rows is small, but the number of columns can be huge, in the hundreds of thousands. Other areas sit in an intermediate position, with hundreds of millions of rows and thousands of dimensions. One example is visual text retrieval in the Monk search engine for historical manuscripts. Furthermore, genomics applications often require stringent access control, whereas other disciplines have no privacy issues.

The computational requirements are also diverse, ranging from Grid computing and general Linux compute clusters to dedicated compute servers with very large memory. During the project, it proved important to avoid common bottlenecks while still exploiting the advantages of a scalable environment. The storage pool was therefore functionally partitioned among the e-Science partners. The project has shown that a POSIX file-system environment can serve, for example, one billion inodes under a single mount point. The diversity of I/O patterns also inspired the development of dynamic multimodal storage schemes. These range from memory-based RAM disks, SSDs and fast SCSI disks to slow commodity hard disks and tape-robot storage.

Target Partners

The Target data center is hosted by the Donald Smits Center for Information Technology, located at the University of Groningen, the Netherlands.

Target is a consortium of ten partners drawn from academic and research institutions, emerging local businesses and well-established global IT companies.

  • OmegaCEN Research Group – Part of the Kapteyn Astronomical Institute at the University of Groningen, the OmegaCEN group is headed by Prof. Edwin A. Valentijn. The group conducts research in wide-field astronomical imaging and astronomical information technology. Prof. Valentijn is also the founder and coordinator of Target.
  • Donald Smits Center for Information Technology (CIT) – CIT is one of the Dutch academic ICT centers for High Performance Computing. CIT hosts and manages the integrated ICT infrastructure of Target.
  • University Medical Center Groningen (UMCG) – The UMCG is one of the largest hospitals in the Netherlands. IT researchers from the UMCG work with Target on the development of a flexible data management system for LifeLines. LifeLines is a large initiative in the north of the Netherlands that aims to collect data on 165,000 people over 30 years in order to identify and analyze aging patterns.[7]
  • Artificial Intelligence Institute (ALICE) - ALICE is located at the University of Groningen and conducts research in artificial intelligence and cognitive engineering. Target collaborates with the institute on (i) handwritten text recognition, (ii) scalability of large data files, and (iii) analysis of unstructured information. The Monk system for handwritten text recognition is hosted by the Target data center.[8]
  • ASTRON - Target and ASTRON work together to develop the LOFAR long-term archive.[9]
  • IBM - The multinational company provides most of the hardware infrastructure for the Target center. Its experts are involved in developing a robust and reliable architectural design that can meet the often disparate requirements of all Target users.
  • Oracle - Oracle participates in joint research with Target on very large distributed databases and scalable access to tables with billions of entries.
  • Nspyre - Nspyre is one of the IT business partners in the Target project that specializes in innovative software solutions for high-tech automation of large-scale industrial services.
  • Elkoog/Heeii - Elkoog/Heeii is another IT business partner in the Target project. It is a fast-growing Groningen-based company that offers innovative internet search services and recommendation engines for web browsers.
  • Target Holding - Target Holding stimulates and guides knowledge transfer from the Target expertise center to interested commercial parties.

Areas of R&D

Target conducts multidisciplinary R&D in the following areas:

  • scalable distributed data storage
  • information system workflows
  • massive data production and quality control
  • scalable distributed database systems
  • data visualization.

Projects

Target participates in a number of data-intensive scientific projects in astronomy, handwritten text recognition algorithms, medical research on healthy aging, development of diagnostic tools for Parkinson’s disease and more.

LOFAR Long-term Archive

Target has developed and maintains the LOFAR Long-term Archive. The telescope will generate petabytes of data, to be stored at the Target data center in Groningen and at several other data centers in Europe. Image credit: ASTRON

Much of the data from the LOFAR telescope is stored in, accessed from and archived on the LOFAR Long-Term Archive, designed by ASTRON and Target.[9] The data will be hosted at the Target data center and at several other European centers.

Monk

The Monk system for handwritten text recognition is hosted on the Target facilities. Image credit: Photographed by Peter Maas, manuscript written by Marie Eugène François Thomas Dubois

Monk is a system developed by Prof. Schomaker and his group at the Artificial Intelligence Institute (ALICE) at the University of Groningen. It uses sophisticated algorithms for handwritten text recognition in a variety of existing archives.[8][10][11] Several books from the Dutch National Archives have been ingested into Monk. So have more than 70 international historical collections, ranging from Western and medieval documents to handwritten Chinese manuscripts. The system applies continuous ('24/7') machine learning over the internet, which has yielded fundamental scientific results.[12][13] Monk is also a scientific user of the Target infrastructure.

LifeLines

LifeLines is a long-term medical research project run by the University Medical Center Groningen (UMCG). An array of genotype and phenotype data will be gathered from 165,000 people once every five years over a total period of thirty years. Researchers and medical specialists will use the accumulated data to gain insight into the processes related to aging. They also aim to understand why age-related health decline varies so widely.[7] Target provides LifeLines with the infrastructure for data storage, access and processing.

Data from LifeLines, together with the SURFsara and Target infrastructure, were used in the Genome of the Netherlands project. The project was run by a consortium of the UMCG, LUMC, Erasmus MC, UMCU and the Free University of Amsterdam. It used whole-genome sequencing to deduce the population structure and demographic history of the Dutch population, and its results were published in Nature Genetics in June 2014.[14][15]

GLIMPS

Run by Dr. K. Leenders, a professor of neurology at the UMCG, GLIMPS is a research project that aims to find faster and more reliable diagnostic tools for Parkinson’s disease.[16] GLIMPS explores the use of complex image-based algorithms and PET scans for the early detection of Parkinson’s.[17] To test the effectiveness of such algorithms, GLIMPS is building a large database of PET scans supplied by numerous hospitals in the Netherlands. This database will be used to improve and refine the software algorithms and to compare their output with existing clinical diagnoses. Target is responsible for building and maintaining the GLIMPS database. It also ensures that the image-based algorithms run smoothly on its computing facilities.

Others

Target is also involved in data management for other astronomical projects. These include the KiDS/VIKING astronomical surveys, ESO’s MUSE[18] instrument (mounted on the Very Large Telescope) and MICADO (to be mounted on the E-ELT). Furthermore, ESA’s Euclid mission has adopted the data-centric approach to data management promoted by Target.[19] Target Holding also manages a number of commercial projects that use the expertise or ICT infrastructure of Target. These projects usually result from collaborations with emerging and/or well-established public and private businesses in the north of the Netherlands.

References

  1. ^ "Nederlands project kan 1,5 petabytes verwerken en opslaan" [Dutch project can process and store 1.5 petabytes]. nu.nl (in Dutch). 2 September 2010.
  2. ^ Kepinski, Witold (19 November 2010). "Gronings ICT-project klaar voor petabytes data" [Groningen ICT project ready for petabytes of data]. Computable (in Dutch).
  3. ^ Begeman, Kor (January 2013). "The Astro-WISE data centric information system". Experimental Astronomy. 35 (1–2): 1. doi:10.1007/s10686-012-9311-4.
  4. ^ Verdoes Kleijn, Gijs (January 2013). "The data zoo in Astro-WISE". Experimental Astronomy. 35 (1–2): 187. doi:10.1007/s10686-012-9314-1.
  5. ^ Mwebaze, Johnson (2012). Extreme Data Lineage in Ad-hoc Astronomical Data Processing. University of Groningen: PhD Dissertation. ISBN 9789036757591.
  6. ^ Edelman, Peter (20 July 2009). "Miljoenensubsidie voor Noord Nederland Dataminingprogram" [Millions in subsidies for Northern Netherlands data-mining programme]. Bits and Chips (in Dutch). p. 45.
  7. ^ a b Stolk, Ronald P; Rosmalen JG; Postma DS; de Boer RA; Navis G; Slaets JP; Ormel J; Wolffenbuttel BH (January 2008). "Universal risk factors for multifactorial diseases: LifeLines: a three-generation population-based study". European Journal of Epidemiology. 23 (1): 67–74. doi:10.1007/s10654-007-9204-4. PMID 18075776.
  8. ^ a b van der Zant, T (2009). "Where are the Search Engines for Handwritten Documents?". Interdisciplinary Science Reviews. 34 (2–3): 224–235. doi:10.1179/174327909X441126.
  9. ^ a b Belikov, A (2011). "Target for LOFAR Long Term Archive: Architecture and Implementation". Proc. of ADASS XXI, ASP Conf. Series. arXiv:1111.6443.
  10. ^ van der Zant, T (January 28, 2008). Yanikoglu, Berrin A; Berkner, Kathrin (eds.). "Large-scale parallel document-image processing". Proceedings of Document Recognition and Retrieval XV, IS&T/SPIE International Symposium on Electronic Imaging. Document Recognition and Retrieval XV. 6815: 68150N. doi:10.1117/12.765482.
  11. ^ Schomaker, L.R.B. (January 28, 2008). "Word mining in a sparsely-labeled handwritten collection". Proceedings of Document Recognition and Retrieval XV, IS&T/SPIE International Symposium on Electronic Imaging: 6815–6823. van der Zant, T. "Handwritten-word spotting using biologically inspired features". IEEE Transactions on Pattern Analysis and Machine Intelligence. 30 (11): 1945–1957.
  12. ^ Cite error: The named reference oosten was invoked but never defined (see the help page).
  13. ^ van Oosten, J.-P.; Schomaker, L.R.B. "Separability versus Prototypicality in Handwritten Word-Image Retrieval". Pattern Recognition. 47 (3): 1031–1038. doi:10.1016/j.patcog.2013.09.006.
  14. ^ Francioli, Laurent; Menelaou, Androniki; et al. (29 June 2014). "Whole-genome sequence variation, population structure and demographic history of the Dutch population". Nature Genetics. 46: 818–825. doi:10.1038/ng.3021.
  15. ^ van Wijngaarden, Arend (June 30, 2014). "Genoom Nederlandse volk ontrafeld" [Genome of the Dutch people unravelled]. Dagblad van het Noorden (in Dutch).
  16. ^ Teune, Laura Klaaske (2013). Glucose metabolic patterns in neurodegenerative brain diseases. PhD Dissertation.
  17. ^ Teune, Laura (2013). "FDG-PET Imaging in Neurodegenerative Brain Diseases", chapter 22 of Functional Brain Mapping and the Endeavor to Understand the Working Brain. InTech. ISBN 978-953-51-1160-3.
  18. ^ Weilbacher, Peter (September 2012). "Design and capabilities of the MUSE data reduction software and pipeline". Proc. SPIE 8451. 4581. doi:10.1117/12.925114.
  19. ^ Pasian, Fabio (September 2012). "Science ground segment for the ESA Euclid Mission". Proc. SPIE 8451. 4581. doi:10.1117/12.926026.