Stata

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 131.172.99.15 (talk) at 01:03, 6 October 2010. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Stata
Original author(s): Bill Gould
Developer(s): StataCorp
Initial release: 1985
Stable release: 11.1 / June 8, 2010
Written in: C
Operating system: Windows, Mac OS X, Unix, Linux
Type: statistical analysis
License: proprietary
Website: www.stata.com

Stata is a general-purpose statistical software package created in 1985 by StataCorp (4905 Lakeway Drive, College Station, Texas 77845, USA). It is used by many businesses and academic institutions around the world. Most of its users work in research, especially in the fields of economics, sociology, political science, biomedicine and epidemiology.

It is used as teaching software at many academic institutions, including the London School of Hygiene & Tropical Medicine and the Faculty of Medicine, National University of Singapore.

Stata's full range of capabilities includes:

  • Data management
  • Statistical analysis
  • Graphics
  • Simulations
  • Custom programming

The name "Stata" is a portmanteau of the words "statistics" and "data"; it is not an acronym and therefore should not appear with all letters capitalized (i.e. as STATA).[1] The correct English pronunciation of "Stata" "must remain a mystery;" any of "Stay-ta", "Sta-ta" or "Stah-ta" are considered acceptable.[2]

There are four major builds of each version of Stata [3]:

  • Stata/MP for multiprocessor computers (including dual-core and multicore processors)
  • Stata/SE for large databases
  • Stata/IC, the standard version
  • Small Stata, a smaller student version available for educational purchase only

User interface

Stata has always emphasized a command-line interface, which facilitates replicable analyses. Since version 8.0, however, Stata has included a graphical user interface which uses menus and dialog boxes to give access to nearly all built-in commands. Each dialog generates the corresponding command, which is always displayed, easing the transition to the command-line interface and the more flexible scripting language. The dataset can be viewed or edited in spreadsheet format. From version 11 onwards, other commands can be executed while the data browser or editor is open; this capability is not available in earlier versions.

Data structure and storage

Stata can only open a single dataset at any one time. Stata holds the entire dataset in (random-access or virtual) memory, which limits its use with extremely large datasets. This is mitigated to some extent by efficient internal storage, as there are integer storage types which occupy only one or two bytes rather than four, and single-precision (4 bytes) rather than double-precision (8 bytes) is the default for floating-point numbers.
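As a minimal sketch of the storage types described above, the following do-file fragment creates one variable of each numeric type and then asks Stata to shrink each variable to the smallest type that holds its values without loss. The variable names and the number of observations are illustrative assumptions, not taken from any particular dataset.

```stata
* Start with an empty dataset of 1000 observations
clear
set obs 1000

* Integer storage types: byte (1 byte), int (2 bytes), long (4 bytes)
generate byte   x_byte   = mod(_n, 100)
generate int    x_int    = _n
generate long   x_long   = _n * 1000

* Floating-point storage types: float (4 bytes, the default), double (8 bytes)
generate float  x_float  = runiform()
generate double x_double = runiform()

* Show the storage type of each variable
describe

* Demote variables to smaller storage types where no information is lost
compress
```

Running describe before and after compress shows which variables, if any, were moved to a smaller type.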

The dataset is always rectangular in format, that is, all variables hold the same number of observations (in more mathematical terms, all vectors have the same length, although some entries may be missing values).
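The rectangular layout can be illustrated with a small sketch in which one variable has a missing entry but still has the same number of observations as the others. The variable names x and y are illustrative assumptions.

```stata
* Five observations; every variable will have exactly five entries
clear
set obs 5

* x is defined for all observations
generate x = _n

* y is left missing (displayed as .) where the condition is false
generate y = x^2 if x != 3

* Both variables still span all five rows
list

* Count the observations in which y is missing
count if missing(y)
```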

Data format compatibility

Stata can import data in a variety of formats. This includes ASCII data formats (such as CSV or databank formats) and spreadsheet formats (including various Excel formats).
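A round trip through a CSV file might look like the sketch below. It assumes the insheet and outsheet commands of Stata releases from this period (later releases also offer import delimited and export delimited), and the file name auto.csv is a hypothetical choice.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Write it out as comma-separated text
outsheet using "auto.csv", comma replace

* Read the CSV file back in, replacing the data in memory
insheet using "auto.csv", comma clear

* Inspect the variables that were imported
describe
```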

Stata's proprietary file formats are platform independent, so users of different operating systems can easily exchange datasets and programs. Stata's data format has changed over time, although not every Stata release includes a new dataset format. Every version of Stata can read all older dataset formats, and can write both the current format and the most recent previous format, the latter using the saveold command[4]. Thus, the current Stata release can always open datasets that were created with older versions, but older versions cannot read newer format datasets.
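For example, a dataset can be saved once in the current format and once in the previous format for a colleague running an older release. This is a sketch only; both file names are hypothetical.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Save in the current dataset format
save "auto_current.dta", replace

* Save in the previous dataset format so that an older release can read it
saveold "auto_previous.dta", replace
```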

Stata can read and write SAS XPORT format datasets natively, using the fdause and fdasave commands.
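A hedged sketch of such a round trip follows, using the command names given in the text (later Stata releases provide the same functionality under different command names). The file name is hypothetical, and the dataset is first reduced to a few variables with short names, since the XPORT format restricts variable names to eight characters.

```stata
* Load an example dataset and keep a few variables with short names
sysuse auto, clear
keep make price mpg

* Write the data in SAS XPORT format
fdasave "auto_subset.xpt", replace

* Read the XPORT file back in, replacing the data in memory
fdause "auto_subset.xpt", clear
describe
```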

Some other econometric applications, including gretl, can directly import Stata file formats.

Extensibility

Stata is unusual among commercial statistics packages in allowing user-written commands, distributed as so-called ado-files, to be downloaded straightforwardly from the internet. Once installed, these commands are indistinguishable to the user from the built-in commands. In this respect, Stata combines the extensibility more often associated with open-source packages with features usually associated with commercial packages, such as software verification, technical support and professional documentation. Some user-written commands have later been adopted by StataCorp to become part of a subsequent official release after appropriate checking, certification and documentation.
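To give a sense of how a user-written command is defined, here is a minimal sketch. The command name hello is hypothetical; if the program were saved in a file named hello.ado in a directory on Stata's ado-path, it could be called like any built-in command.

```stata
* Define a small user-written command
program define hello
    version 11
    display "Hello from a user-written command"
end

* Call it exactly as one would call a built-in command
hello
```

Commands written by other users are typically installed with a single line such as ssc install estout, which downloads the named package from the SSC archive and requires an internet connection.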

User community

Stata has an active email list (Statalist, over 1000 messages per month), to which StataCorp employees regularly contribute. Statalist is maintained by Marcello Pagano of the Harvard School of Public Health, not by StataCorp itself. Articles about the use of Stata and new user-written commands are published in the quarterly peer-reviewed Stata Journal. User group meetings are held annually in the USA, the UK, Germany and Italy, and less frequently in several other countries.

Established under the Societies Act on 10 May 2008, the Singapore Stata Users Group (SGSUG) is the world's first government-approved users group (Registration No: 2048/2008; UEN: T08SS0091A). This was officially announced in the Stata News [5]. As a non-profit organisation, SGSUG does not organise regular meetings but provides programming and statistical advice to users in Singapore through informal means. Aiming to promote the proper application of statistics in research, SGSUG has the slogan "Shaping Data Meaningfully". The active members of SGSUG are mostly engaged in medical research.

Example Stata code

To perform logistic regression of y on x:

logistic y x

To display a scatter plot of y against x restricted to values of x below 10:

scatter y x if x < 10
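The same two commands can be tried on real data. The sketch below assumes the auto example dataset that ships with Stata; the choice of variables and the weight cut-off are illustrative.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Logistic regression of the binary variable foreign on mpg and weight
logistic foreign mpg weight

* Scatter plot of mpg against weight, restricted to cars lighter than 3000 lb
scatter mpg weight if weight < 3000
```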

Timeline of releases

In recent years, StataCorp has issued a new major release of Stata (incrementing the integer part of the version number) roughly every two years. Users must pay a fee if they wish to upgrade to the latest major release. Minor releases (incrementing the decimal part of the version number) are sometimes made available in between major releases. These are available as free downloadable updates to those who have a licence for the previous major release. Dates of all releases are available on the Stata website[6]. Stata's version control system is designed to give a very high degree of backward compatibility, ensuring that code written for previous releases continues to work. Stata 11 shipped on July 27, 2009.
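In practice, backward compatibility is usually requested with the version command at the top of a do-file. The following is a minimal sketch, in which the version number and the analysis are illustrative assumptions.

```stata
* Ask Stata to interpret the commands below as release 10 would
version 10

* The rest of the do-file runs under release 10 behaviour
sysuse auto, clear
regress mpg weight
```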

References

  1. ^ What is the correct way to write ‘Stata’?
  2. ^ What is the correct way to pronounce ‘Stata’?
  3. ^ "Which Stata is right for me?". Stata. Retrieved 2010-04-04.
  4. ^ Stata's 'help' entry for the save command
  5. ^ Stata News Vol. 23 No. 2; Apr-May-Jun 2008
  6. ^ History of Stata