
SAS (software): Difference between revisions

From Wikipedia, the free encyclopedia
Pegua (talk | contribs)
m Disambiguated wikilink RTF
Revision as of 13:45, 3 July 2006

The SAS System, originally Statistical Analysis System, is an integrated system of software products provided by SAS Institute that enables the programmer to perform:

In addition, the SAS System integrates with many SAS business solutions that enable large scale software solutions for areas such as human resource management, financial management, business intelligence, customer relationship management and more.

Description of SAS

SAS 8 on an IBM Mainframe under 3270 emulation
Developer: SAS Institute

SAS is driven by SAS programs that define a sequence of operations to be performed on data stored as tables. Although non-programmer graphical user interfaces to SAS exist (such as the SAS Enterprise Guide), most of the time these GUIs are just a front-end to automate or facilitate generation of SAS programs. SAS components expose their functionalities via application programming interfaces, in the form of statements and procedures.

A SAS program is composed of three major parts: the DATA step, procedure steps (effectively, everything that is not enclosed in a DATA step), and a macro language. SAS Library Engines and Remote Library Services allow access to data stored in external data structures and on remote computer platforms.

The DATA step section of a SAS program, like other database-oriented fourth-generation programming languages such as SQL or Focus, assumes a default file structure, and automates the process of identifying files to the operating system, opening the input file, reading the next record, opening the output file, writing the next record, and closing the files. This allows the user/programmer to concentrate on the details of working with the data within each record, in effect working almost entirely within an implicit program loop that runs for each record.
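The implicit loop described above can be sketched as follows. This is only an illustration: the file name visits.txt, the data set name, and the variables id, charge and surcharge are assumptions, not part of any real system.

```sas
/* Hypothetical input: a text file with two space-separated numbers per line */
data work.visits;
    infile 'visits.txt';         /* SAS locates and opens the file; the path is an assumption */
    input id charge;             /* each pass of the implicit loop reads the next record */
    surcharge = charge * 0.1;    /* computed for the current record only */
run;                             /* each record is written to work.visits automatically;
                                    the step ends when the input file is exhausted */
```

No explicit open, read, write or close calls appear; the statements between DATA and RUN are executed once for every input record.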

All other tasks are accomplished by procedures that operate on the dataset (SAS' terminology for "table") as a whole. Typical tasks include printing or performing statistical analysis, and may just require the user/programmer to identify the dataset. Procedures are not restricted to only one behavior and thus allow extensive customization, controlled by mini-languages defined within the procedures. SAS also has an extensive SQL procedure, allowing SQL programmers to use the system with little additional knowledge.
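As a rough sketch of both points, the first step below only names a data set, while the second uses the SQL procedure to build a subset. The data set names AAA and BBB and the variable charge are assumed for illustration.

```sas
/* A procedure that needs nothing more than the data set name */
proc print data=AAA;
run;

/* The SQL procedure: create BBB from the rows of AAA with charge above 100 */
proc sql;
    create table BBB as
    select *
    from AAA
    where charge > 100;
quit;                            /* PROC SQL is ended with QUIT rather than RUN */
```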

Macro programming extensions allow repetitive sections of a program to be rationalized. Proper imperative and procedural programming constructs can be simulated by use of the "open code" macros or the SAS/IML component.

Macro code in a SAS program, if any, undergoes preprocessing. At runtime, DATA steps are compiled and procedures are interpreted and run in the sequence they appear in the SAS program. A SAS program requires the SAS System to run.
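A minimal sketch of the preprocessing stage, assuming a hypothetical macro variable named threshold and data sets AAA and BBB with a numeric variable charge:

```sas
%let threshold = 100;            /* macro variable; the name is hypothetical */

data BBB;
    set AAA;
    if charge > &threshold;      /* the macro processor substitutes 100 for &threshold
                                    before the DATA step is compiled */
run;
```

By the time the DATA step is compiled it contains only ordinary SAS statements; the macro reference has already been replaced by text.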

SAS 9 on Microsoft Windows
Developer: SAS Institute

Compared to general-purpose programming languages, this structure allows the user/programmer to be less familiar with the technical details of the data and how it is stored, and relatively more familiar with the information contained in the data. This blurs the line between user and programmer, appealing to individuals who fall more into the 'business' or 'research' area and less in the 'information technology' area, since SAS does not enforce (although SAS recommends) a structured, centralized approach to data and infrastructure management.

The SAS System runs on IBM mainframes, Unix machines, OpenVMS Alpha, and Microsoft Windows; and code is almost transparently moved between these environments. Older versions have supported PC-DOS, the Apple Macintosh, VMS, VM/CMS and OS/2.

Components

The SAS system consists of a number of components, which organisations separately license and install as required.

SAS Add-In for Microsoft Office
A component of the SAS Enterprise Business Intelligence Server, designed to provide access to data and analytics for non-technical workers (such as business analysts, power users, domain experts and decision makers) via menus and toolbars integrated into Office applications.
Base SAS
The core of the SAS System is the so-called Base SAS Software, which is used to manage data. SAS procedures software analyzes and reports the data. The SQL procedure allows SQL programming in lieu of data step and procedure programming. Library Engines allow transparent access to common data structures such as Oracle, as well as pass-through of SQL to be executed by such data structures. The Macro facility is a tool for extending and customizing SAS software programs and reducing overall program verbosity. The DATA step debugger is a programming tool that helps find logic problems in DATA step programs. The Output Delivery System (ODS) is a system that delivers output in a variety of formats, such as SAS data sets, listing files, RTF, PDF, or HTML. The SAS windowing environment is an interactive, graphical user interface used to run and test SAS programs.
SAS Enterprise Business Intelligence Server
Includes both a suite of business intelligence (BI) tools and a platform to provide uniform access to data. The goal of this product is to compete with Business Objects and Cognos' offerings.
Enterprise Computing Offer (ECO)
Not to be confused with Enterprise Guide or Enterprise Miner, ECO is a product bundle.
Enterprise Guide
SAS Enterprise Guide is a Microsoft Windows client application that provides a guided mechanism to use SAS and publish dynamic results throughout an organization in a uniform way. It is marketed as the default interface to SAS for business analysts, statisticians, and programmers.
Enterprise Miner
A data mining tool.
ETL
Provides extract, transform, load (ETL) services.
SAS/ACCESS
Provides the ability for SAS to transparently share data with non-native data sources.
SAS/ACCESS for PC Files
Allows SAS to transparently share data with personal computer applications including MS Access and Microsoft Office Excel.
SAS/AF
Applications facility, a set of application development tools to create customized applications.
SAS/ASSIST
An early point-and-click interface to the SAS System; it has since been superseded by SAS Enterprise Guide.
SAS/C
SAS/CONNECT
Provides ability for SAS sessions on different platforms to communicate with each other.
SAS/DMI
A programming interface between interactive SAS and ISPF/PDF applications. Obsolete since version 5.
SAS/EIS
A menu-driven system for developing, running, and maintaining enterprise information systems.
SAS/ETS
Provides econometrics and time series analysis.
SAS/FSP
SAS/GIS
An interactive desktop Geographic Information System for mapping applications.
SAS/GRAPH
Although base SAS includes primitive graphing capabilities, SAS/GRAPH is needed for charting on graphical media.
SAS/IML
Matrix-handling SAS script extensions.
SAS/INSIGHT
Dynamic tool for data mining. Allows examination of univariate distributions, visualization of multivariate data, and model fitting using regression, analysis of variance, and the generalized linear model.
SAS/IntrNet
Extends SAS’ data retrieval and analysis functionality to the Web with a suite of CGI and Java tools.
SAS/LAB
Superseded by SAS Enterprise Guide.
SAS/OR
Operations Research
SAS/PH-Clinical
Defunct product
SAS/QC
Quality Control provides quality improvement tools.
SAS/SHARE
A data server that allows multiple users to gain simultaneous access to SAS files.
SAS/STAT
Statistical Analysis with a number of procedures, providing statistical information such as analysis of variance, regression, multivariate analysis, and categorical data analysis.
SAS/TOOLKIT
SAS/Warehouse Administrator
Superseded in SAS 9 by SAS ETL Server.
SAS Web Report Studio
Part of the SAS Enterprise Business Intelligence Server, provides access to query and reporting capabilities on the Web. Aimed at non-technical users.

Terminology

Where many other languages refer to tables, rows, and columns/fields, SAS uses the terms data sets, observations, and variables respectively. This usage derives from its statistical heritage, and is shared by SPSS, another statistical package of similar vintage.

There are only two kinds of variables in SAS: numeric and character (string). By default all numeric variables are stored as floating-point (real) values, although the storage length, and with it the precision, can be reduced. Date and date-time variables are numeric variables that follow the C tradition of counting from a fixed epoch: they are stored as either the number of days (for date variables) or the number of seconds (for date-time variables) from a certain date and time (01-January-1960 00:00:00 hr.).
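The numeric nature of dates can be seen in a short sketch. The variable names d and dt are arbitrary; the literals and formats used are standard Base SAS.

```sas
data _null_;                          /* run the step without creating a data set */
    d  = '01JAN1960'd;                /* date literal: the epoch itself, stored as 0 */
    dt = '02JAN1960:00:00:00'dt;      /* datetime literal: one day after the epoch */
    put d= dt=;                       /* raw values in the log: d=0 dt=86400 */
    put d= date9. dt= datetime20.;    /* the same numbers displayed through formats */
run;
```

The value 86400 is the number of seconds in one day, which shows that date-time values count seconds while date values count days.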

Features

  • Read and write many different file formats.
  • Process data in many different formats.
  • Many built-in statistical and random number functions.
  • Interaction with database products through SQL (and ability to use SQL internally to manipulate SAS data sets).
  • Direct output of reports to CSV, HTML, PCL, PDF, PostScript, RTF, XML, and more using ODS.
  • Interaction with the operating system (for example, Piping on Unix and Windows and DDE on Windows).
  • Fast development time, particularly from the many built-in procedures.
  • Hundreds of built-in functions for manipulating character and numeric variables.
  • An Integrated development environment.
  • Dynamic data driven code generation using the SAS Macro language.
  • Can process files containing millions of rows and thousands of columns of data.
  • University research centers often offer SAS code for advanced statistical techniques, especially in fields such as Political Science, Economics and Business Administration.
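As one illustration of the ODS feature listed above, the sketch below routes procedure output to an HTML file. The file name charges.html, the data set BBB and the variable charge are assumptions made for the example.

```sas
ods html file='charges.html';    /* open the HTML destination; the file name is an assumption */

proc freq data=BBB;
    tables charge;               /* frequency table of the variable charge */
run;

ods html close;                  /* close the destination so the file is completed */
```

Other destinations, such as RTF or PDF, follow the same open and close pattern.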

Example SAS code

SAS uses data steps and procedures to analyze and manipulate data. A data step iterates through each observation in a dataset (an observation being analogous to a row in an SQL table).

This data step creates a new data set BBB that includes those observations from dataset AAA that had charges greater than 100.

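data BBB;
    set AAA;            /* read each observation from AAA in turn */
    if charge > 100;    /* subsetting IF: keep only observations with charge above 100 */
run;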

Procedures that can summarize data are available in SAS. The FREQ procedure (proc freq) shows a frequency distribution of a given variable in a dataset.

proc freq data=BBB;
    table charge;
run;

SAS features a macro language, which can be used to generate SAS code. For instance, the above example could be re-used in many pieces of code by rewriting it as a macro:

%macro freqtable(table, variable);
  proc freq data = &table;
    table &variable;
  run;
%mend freqtable;

%freqtable(BBB, charge);

Version history

? – 79.3 - 82.4

SAS was first released in 1972 for the IBM mainframe only. 1980 saw the addition of SAS/GRAPH, a graphing component; and SAS/ETS for econometric and time series analysis. In 1981 SAS/FSP followed, providing full-screen interactive data entry, editing, browsing, retrieval, and letter writing.

In 1983 full-screen spreadsheet capabilities were introduced (PROC FSCALC).

Version 5 series

Version 5 was the release used through the late 1980s and lingered on in many institutions well into the 1990s. It was the last version written mainly in PL/I.

Version 6 series

Version 6 represented a major milestone for SAS. Although it looked much the same to the user, the major change was "under the hood", where the software was rewritten. After its FORTRAN origins, followed by PL/I and mainframe assembly language, the SAS System was rewritten in C for version 6, to provide enhanced portability between operating systems, as well as access to a growing pool of C programmers as the pool of PL/I programmers shrank.

In 1984 SAS 6.03 and 6.04 were released for PC-DOS. A project management component was added (SAS/OR?).

In 1985 SAS/AF software, an econometrics and time series analysis (SAS/DMI) component, and interactive matrix programming (SAS/IML) software were introduced. The capability for MS-DOS SAS to link to mainframe SAS was introduced.

In 1986 a statistical quality improvement component (SAS/QC software) was added; SAS/IML and SAS/STAT software was released for personal computers.

1987 saw concurrent update access provided for SAS data sets with SAS/SHARE software. Database interfaces were introduced for DB2 and SQL/DS.

In 1988 the MultiVendor Architecture (MVA) concept was introduced and SAS/ACCESS software was released. Support for UNIX-based hardware was announced. SAS/ASSIST software for building user-friendly front-end menus was introduced. SAS/CPE software for computer performance evaluation was also introduced.

Release 6.06 for MVS, CMS, and OpenVMS was announced in 1990.

Data visualization capabilities were added in 1991 with SAS/INSIGHT software.

In 1992 SAS/CALC, SAS/TOOLKIT, SAS/PH-Clinical, and SAS/LAB software was released.

In 1993 software for building customized executive information systems (EIS) was introduced. Release 6.08 for MVS, CMS, VMS, VSE, OS/2, and Windows was announced.

1994 saw the addition of ODBC support, plus SAS/SPECTRAVIEW and SAS/SHARE*NET components.

Release 6.09 saw the addition of a data step debugger.

Release 6.09E was for MVS.

Release 6.10 in 1995 was a Microsoft Windows release and the first release for the Apple Macintosh. Version 6 was the first and last series to run on the Macintosh.

Also in 1995, 6.11 (codenamed Orlando) was released for Windows 95, Windows NT, and UNIX.

Release 6.12 was issued for Unix and Microsoft Windows (and more?).

(Some of the following milestones in this sub-section may belong under version 7 or 8.)

In 1996 SAS announced Web enablement of SAS software. The Scalable Performance Data Server was introduced.

In 1997 SAS/Warehouse Administrator and SAS/IntrNet software went into production.

1998 saw SAS introduce a customer relationship management (CRM) solution and an ERP access interface, the SAS/ACCESS interface for SAP R/3. SAS was also the first to release OLE-DB for OLAP, and released a HOLAP solution. Balanced scorecard, SAS/Enterprise Reporter, and HR Vision were released, as was the first release of SAS Enterprise Miner.

1999 saw the releases of HR Vision software, the first end-to-end decision-support system for human resources reporting and analysis; and Risk Dimensions software, an end-to-end risk-management solution.

In 2000 SAS shipped Enterprise Guide and ported its software to Linux.

Version 7 series

The Output Delivery System debuted in version 7; as did long variable names (from 8 to 32 characters); storage of long character strings in variables (from 200 to 32,767); and a much improved built-in text editor, the Enhanced Editor.

Version 7 saw the synchronisation of features between the various platforms for a particular version number (which previously hadn't been the case).

Version 7 was a precursor to version 8. It was believed that SAS Institute released a snapshot of its version 8 development to meet a promised deadline. SAS Institute recommended that sites wait until version 8 before deploying the new software.

Version 8 series

Released about 1999; 8.0, 8.1, 8.2 were Unix, Microsoft Windows, and z/OS releases.

Version 9 series

In version 9, SAS Institute added the SAS Management Console, parallel processing, JavaObj, ODS OO (experimental as opposed to alpha), and National Language Support.

Again SAS Institute recommended that sites delay deployment until 9.1.

SAS Version 9 runs on Windows (32 & 64 bit), Unix (64 bit), Linux, and z/OS.

SAS 9.1 was released in 2003.

SAS 9.1.3 Service Pack 4 is available (April 2006).
