File:Processor families in TOP500 supercomputers.svg

From Wikipedia, the free encyclopedia
Full resolution (SVG file, nominally 1,800 × 1,203 pixels, file size: 108 KB)

Summary

Description:
English: Area chart showing the representation of different families of microprocessors in the TOP500 supercomputer ranking list, from 1992 to 2009. (Based on the public data from http://www.top500.org/stats)
French (translated): A chart showing the distribution of the different families of microprocessors in the TOP500 supercomputers over the years 1992-2009. (Based on the public data available at http://www.top500.org/stats)
Date: 17:47, 26 August 2009 (UTC)
Source: Own work, intended as a vector version of File:Top500.procfamily.png
Author: Moxfyre
Other versions: File:Top500.procfamily.png

Source code

The chart's data were downloaded and plotted with a pair of Python scripts written for Python 2. They require NumPy, Matplotlib, and the BeautifulSoup (version 3) HTML parsing library.

top500stats.py

The first script, top500stats.py, downloads the data from http://www.top500.org/stats. It should be run like this:

python top500stats.py procfam procfam.npz
# which downloads the procfam data of each list from http://www.top500.org/stats/list/<N>/procfam (N = 1, 2, ...) and saves it in procfam.npz

#!/usr/bin/python
 
import BeautifulSoup
import urllib, re, sys
import numpy as np
from itertools import count
 
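def extend_axis(arr, axis=-1, by=1, value=0):
    """Return a copy of arr with `by` extra entries appended along `axis`,
    filled with `value` (left uninitialized if value is None)."""
    sh = list(arr.shape)
    sh[axis] = by
    extension = np.empty(sh, dtype=arr.dtype)
    if value is not None: extension.fill(value)
    return np.append(arr, extension, axis)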
 
category, output = sys.argv[1:]
 
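# stats[i, j, k] = value of column k on date dates[i] for choice choices[j]
# thus: stats.shape = ( len(dates), len(choices), 4 )
# the 4 columns kept are table columns 1, 3, 4 and 5; top500plot.py reads
# them as system count, Rmax, Rpeak and number of processors
stats = np.zeros((0,0,4), dtype=int) # n_dates X n_choices X 4-columns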
dates = []
choices = []
columns = []
 
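# list pages are numbered 1, 2, 3, ...; the loop stops at the first page
# whose title does not match the expected pattern (see the break below)
for ii in count(1):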
    url = 'http://www.top500.org/stats/list/%d/%s' % (ii, category)
    print "Loading %s..." % url
    html = urllib.urlopen(url).read()
    print "Scraping data..."
    soup = BeautifulSoup.BeautifulSoup(html)
 
    # get month and year from title
    title = soup.findChild('h1', {'class':'title'}).string
    m = re.search(r'(.+) share for (\d+)/(\d+)', title)
    if m:
        catname, month, year = m.groups()
        date = float(year) + float(month)/12.0
        print "Got '%s' data for date %g" % (catname, date)
    else:
        break
 
    # get data table and make sure it's what we expect
    table = soup.findChild('table')
    columns = [ x.string for x in table.findChild('thead').findChildren(('th','td')) ]
    assert columns[0]==catname
 
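    # get the data in numeric form, skipping any row without <td> cells
    # (such as a header row made only of <th> cells)
    array = np.array(
        [[x.string for x in row.findChildren('td')]
         for row in table.findChildren('tr') if row.findChildren('td')] )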
    these_choices = array[:, 0]
    these_stats = array[:, (1,3,4,5)].astype(int)
 
    ############################################################
 
    # figure out how the choices correspond to those already seen
    # (a separate loop variable avoids shadowing the outer ii)
    choice_map = np.zeros(len(these_choices), dtype=int)
    for jj, choice in enumerate(these_choices):
        try:
            choice_map[jj] = choices.index(choice)
        except ValueError:
            choices.append(choice)
            choice_map[jj] = len(choices) - 1

    # extend axis one of stats to accommodate new choices
    stats = extend_axis(stats, 1, len(choices)-stats.shape[1])

    # add a zero-filled row for this date, then fill in the choices that appear in it
    stats = extend_axis(stats, 0, 1)
    stats[-1, choice_map] = these_stats
 
    # add the current date to the list
    dates.append(date)
 
np.savez(output, dates=dates, choices=choices, stats=stats)
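The bookkeeping in the loop above (choice_map plus extend_axis) keeps one slot per processor family across all lists and leaves zeros for families missing from a given list. A minimal pure-Python sketch of the same alignment, assuming each list has already been scraped into a (month, year, rows) triple, with rows a list of (family name, 4-tuple) pairs in table order. The helper name merge_lists and the family names "A", "B", "C" are hypothetical, and the numbers are made up for illustration:

```python
def merge_lists(lists):
    """Align per-list category tables into one dense table (hypothetical helper).

    lists: iterable of (month, year, rows), where rows is a list of
    (category_name, (c0, c1, c2, c3)) pairs in table order.
    Returns (dates, choices, stats) with stats[i][j] holding the 4 columns
    for dates[i] and choices[j], or zeros if choices[j] was absent that date.
    """
    zero = (0, 0, 0, 0)
    dates, choices, stats = [], [], []
    index = {}  # category name -> position j in choices
    for month, year, rows in lists:
        # same date encoding as top500stats.py: year + month/12
        dates.append(year + month / 12.0)
        for name, _ in rows:
            if name not in index:
                index[name] = len(choices)
                choices.append(name)
                # widen every earlier date with a zero entry for the new category
                for earlier in stats:
                    earlier.append(zero)
        row = [zero] * len(choices)
        for name, values in rows:
            row[index[name]] = tuple(values)
        stats.append(row)
    return dates, choices, stats


# Toy input: two made-up lists; "C" first appears in the second one.
dates, choices, stats = merge_lists([
    (6, 1993, [("A", (3, 1, 2, 8)), ("B", (1, 1, 1, 4))]),
    (11, 1993, [("C", (2, 5, 6, 9)), ("A", (4, 2, 3, 10))]),
])
print(choices)   # ['A', 'B', 'C']
print(stats[0])  # [(3, 1, 2, 8), (1, 1, 1, 4), (0, 0, 0, 0)]
```

A dict lookup stands in for the list.index search used in the script; the result is the same, but each lookup stays constant-time as the number of families grows.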

top500plot.py

The second script, top500plot.py, plots data saved in the format written by the first script. It should be run like this:

python top500plot.py procfam.npz count "Number of systems" "Processor families in TOP500 supercomputers"
# which takes data from procfam.npz,
# plots time-series according to the number of systems using each architecture,
# labels the y-axis with "Number of systems",
# and titles the chart "Processor families in TOP500 supercomputers"

#!/usr/bin/python
 
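import sys
from itertools import cycle
import numpy as np
import pylab as pl
from matplotlib.font_manager import FontProperties

def scale_font_size(key, factor):
    # rcParams font sizes may be numbers or names such as 'medium'
    # (the default for legend and tick labels in recent Matplotlib);
    # convert to points first so that both forms can be scaled
    size = FontProperties(size=pl.rcParams[key]).get_size_in_points()
    pl.rcParams[key] = size * factor

# scale the named sizes before font.size, because they are relative to it
scale_font_size('legend.fontsize', 1.1)
scale_font_size('xtick.labelsize', 1.25)
scale_font_size('ytick.labelsize', 1.25)
scale_font_size('font.size', 1.1)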
 
 
colors = cycle( list('bgrcmyw')
                + ['orange', 'pink', 'grey', 'brown', 'lightblue', 'lightgreen', 'turquoise', 'navy', 'purple', 'gold',
                   'aqua', 'silver', 'tan', 'tomato', 'steelblue', 'lightcoral', 'chocolate', 'fuchsia', 'indianred'])
 
infile, var, ylabel, title = sys.argv[1:]

# get the data
bag = np.load(infile)
dates = bag['dates']
choices = bag['choices']
stats = bag['stats'] # n_dates X n_choices X 4-columns

def share(k):
    # fraction of column k held by each choice on each date; the float
    # conversion avoids integer division under Python 2
    col = stats[:, :, k].astype(float)
    return col / col.sum(axis=1)[:, None]

if var=='count':
    timeseries = stats[:, :, 0]
    default_ylabel = "Number of systems"
elif var=='rmaxshare':
    timeseries = share(1)
    default_ylabel = "Share of Rmax performance"
elif var=='rpeakshare':
    timeseries = share(2)
    default_ylabel = "Share of Rpeak performance"
elif var=='procshare':
    timeseries = share(3)
    default_ylabel = "Share of processors"
elif var=='procsum':
    timeseries = stats[:, :, 3]
    default_ylabel = "Number of processors"
else:
    raise ValueError("unknown variable %r (expected count, rmaxshare, rpeakshare, procshare or procsum)" % var)

# use the y-axis label from the command line; fall back to the default if it is empty
if not ylabel:
    ylabel = default_ylabel
order = np.argsort(timeseries[-1])
 
# plot it
pl.figure()
base = 0
patches, labels = [], []
for ii in order:
    data = timeseries[:, ii]
    choice = choices[ii]
 
    facecolor = next(colors)
    pl.fill_between(dates, base, base+data, edgecolor='k', facecolor=facecolor, label=choice)
    patches.append( pl.Rectangle((0,0), 1, 1, edgecolor='k', facecolor=facecolor) )
    labels.append( choice )
 
    base += data
 
# show legend and labels
pl.legend(patches, labels, loc='upper right')
pl.xlabel("Year")
pl.ylabel(ylabel)
pl.title(title)
 
pl.show()
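The share variables (rmaxshare, rpeakshare, procshare) divide each date's values by that date's total over all families, and the plotting loop stacks the families by carrying a running base, so the top edge of the last band equals the per-date total. A minimal pure-Python sketch of both steps, assuming timeseries is a list of per-date rows with one value per family (the script itself works on a NumPy array); the function names shares, order_by_last and stack_bounds are hypothetical, and the toy values are made up:

```python
def shares(timeseries):
    """Divide each date's row by its total (0.0 where the total is zero)."""
    result = []
    for row in timeseries:
        total = float(sum(row))
        result.append([v / total if total else 0.0 for v in row])
    return result


def order_by_last(timeseries):
    """Family indices sorted by their value on the last date, smallest first,
    playing the role of np.argsort(timeseries[-1]) (ties may be ordered differently)."""
    last = timeseries[-1]
    return sorted(range(len(last)), key=lambda j: last[j])


def stack_bounds(timeseries, order):
    """Return one (lower, upper) pair of edge lists per family, in plotting
    order, mirroring the `base += data` loop in top500plot.py."""
    base = [0.0] * len(timeseries)
    bands = []
    for j in order:
        top = [b + row[j] for b, row in zip(base, timeseries)]
        bands.append((base, top))
        base = top
    return bands


# Toy data (made up): two dates, two families.
ts = [[1, 3], [3, 1]]
print(shares(ts))                       # [[0.25, 0.75], [0.75, 0.25]]
order = order_by_last(ts)               # [1, 0]
print(stack_bounds(ts, order)[-1][1])   # [4.0, 4.0], the per-date totals
```

Passing each (lower, upper) pair to fill_between reproduces the stacked bands; newer Matplotlib versions also offer stackplot, which does the stacking internally.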

Licensing

I, the copyright holder of this work, hereby publish it under the following licenses:
This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
  • share alike – If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.

You may select the license of your choice.

File history

Date/Time | Dimensions | User | Comment
17:47, 26 August 2009 (current) | 1,800 × 1,203 (108 KB) | Moxfyre | Area chart showing the representation of different families of microprocessors in the TOP500 supercomputer ranking list, from 1992-2009. (Based on the public data from http://www.top500.org/stats)