
Unified Code Count

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Ucc wiki tool (talk | contribs) at 01:03, 18 November 2010. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Software engineering is concerned with theories, methods, and tools for professional software production. Its main goals are to produce high-quality software products that are delivered on time and within budget, satisfy the needs of the stakeholders (customers, owners, users, developers, etc.), and yield the highest return on investment (ROI). Estimating software development cost is difficult because many factors must be considered: software construction is human-intensive, and software size and complexity keep increasing. For example, Windows NT 3.1, released in 1993, had 4-5 million SLOC, whereas Windows Vista, released in 2007, had about 50 million SLOC.

One of the major problems in software estimation is sizing. Size is one of the most important attributes of a software product: it is not only the key indicator of software cost and time but also the base unit from which other metrics for project status and software quality measurement are derived. The size metric is an essential input for most cost estimation models, such as COCOMO, SLIM, SEER-SEM, and Price-S. Although source lines of code (SLOC) is a widely accepted sizing metric, there is in general a lack of standards that enforce consistency in what is counted and how.

The USC Center for Systems and Software Engineering (CSSE) has developed and released a code counting toolset called the Unified CodeCount, which ensures consistency across independent organizations in the rules used to count software lines of code and supports sizing software code for historical data collection and reporting purposes. The toolset is a collection of tools designed to automate the collection of source code sizing information. It implements the popular code counting standards published by the Software Engineering Institute (SEI) and adapted by COCOMO. Logical and physical SLOC are among the metrics generated by the toolset.

Source lines of code (SLOC) is a unit used to measure the size of a software program by counting its source code according to a defined set of rules. SLOC is a key input for estimating project effort and is also used to calculate productivity and other measurements. There are two types of SLOC:

  • Physical SLOC (PSLOC)- One physical SLOC is one line, starting with its first character and ending with a carriage return or an end-of-file marker on the same line. Blank and comment lines are not counted.
  • Logical SLOC (LSLOC)- Logical SLOC measure “statements”, which normally terminate with a semicolon (C/C++, Java, C#) or a carriage return (VB, Assembly). Logical SLOC are not sensitive to format and style conventions, but they are language dependent.
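The difference between the two SLOC definitions can be illustrated with a minimal Python sketch. It is not the UCC implementation and does not follow the published UCC counting rules: the function names are hypothetical, comments are stripped with simple regular expressions that ignore string literals, and a logical statement is assumed to be anything terminated by a semicolon (which would miscount constructs such as `for` loop headers).

```python
import re


def strip_comments(source: str) -> str:
    """Remove /* ... */ and // comments (string literals are NOT handled)."""
    # Keep the newlines inside block comments so line structure is preserved.
    source = re.sub(
        r"/\*.*?\*/",
        lambda m: "\n" * m.group(0).count("\n"),
        source,
        flags=re.S,
    )
    return re.sub(r"//[^\n]*", "", source)


def physical_sloc(source: str) -> int:
    """Count lines that are neither blank nor comment-only."""
    return sum(1 for line in strip_comments(source).splitlines() if line.strip())


def logical_sloc(source: str) -> int:
    """Naive statement count: one statement per semicolon (an assumption)."""
    return strip_comments(source).count(";")


# The same three statements written in two layouts.
COMPACT = "int a = 1; int b = 2; int c = a + b;\n"
SPREAD = "int a = 1;\nint b = 2;\nint c =\n    a + b;  // sum\n"

for name, code in (("compact", COMPACT), ("spread", SPREAD)):
    print(name, "physical:", physical_sloc(code), "logical:", logical_sloc(code))
```

Under these assumptions the two layouts give different physical counts but the same logical count, which is the format insensitivity described above.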

The main purpose for developing CodeCount was to ensure consistency across independent organizations in the rules used to count software lines of code. Later, the Unified CodeCount (UCC) was introduced as an enhanced version of the CodeCount toolset. The UCC is a code counting and differencing tool that unifies the SLOC counting capabilities of the previous CodeCount tools and the source differencing capabilities of the Difftool.

The Unified CodeCount (UCC) allows the user to count, compare, and collect logical differentials between two versions of the source code of a software product. The differencing capabilities allow users to count the number of added/new, deleted, modified, and unmodified logical SLOC in the current version compared with the previous version. With the counting capabilities, users can generate physical and logical SLOC counts and other sizing information, such as complexity, comment, and keyword counts, for the target program.
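As a rough illustration of what such differencing counts mean, the following Python sketch classifies statements from two versions as added, deleted, modified, or unmodified. It uses the standard library's `difflib.SequenceMatcher` as a stand-in; the matching algorithm actually used by UCC is not described here, and the function name `diff_counts` and the rule for splitting a replaced region into modified, added, and deleted statements are assumptions made for this example.

```python
from difflib import SequenceMatcher


def diff_counts(old: list[str], new: list[str]) -> dict[str, int]:
    """Classify logical statements of two versions (illustrative rule only)."""
    counts = {"added": 0, "deleted": 0, "modified": 0, "unmodified": 0}
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_old, n_new = i2 - i1, j2 - j1
        if tag == "equal":
            counts["unmodified"] += n_old
        elif tag == "delete":
            counts["deleted"] += n_old
        elif tag == "insert":
            counts["added"] += n_new
        else:  # "replace": pair statements up, the surplus is added or deleted
            paired = min(n_old, n_new)
            counts["modified"] += paired
            counts["deleted"] += n_old - paired
            counts["added"] += n_new - paired
    return counts


previous_version = ["a = 1;", "b = 2;", "c = 3;", "d = 4;"]
current_version = ["a = 1;", "b = 20;", "d = 4;", "e = 5;"]
print(diff_counts(previous_version, current_version))
```

For this pair of versions the sketch reports one added, one deleted, one modified, and two unmodified statements. A real tool would first reduce each file to logical statements before comparing them.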

System Requirements

A. Hardware

  • RAM: minimum 512 MB. Recommended: 1024 MB
  • HDD: minimum 100 MB disk space available. Recommended: 200 MB

B. Software Operating Systems

  • Linux 2.6.9
  • Unix
  • Mac OS X
  • Windows 9x/Me/XP/Vista
  • Solaris

C. Compilers Supported

  • MS Visual Studio 6, 2003, 2005, 2008, 2010
  • G++
  • Eclipse C/C++

History

Many different code counting tools existed in the early 2000s. However, due to the lack of standard counting rules and software accessibility issues, the National Reconnaissance Office (NRO) Cost Analysis and Improvement Group (NCAIG) identified the need for a new code counting tool to analyze software program costs. To avoid any industry bias, the CodeCount tool was developed at the USC Center for Systems and Software Engineering (USC CSSE) under the direction of Dr. Barry Boehm, Merilee Wheaton, and A. Windsor Brown, with independent verification and validation (IV&V) provided by The Aerospace Corporation. Many organizations, including Northrop Grumman and Boeing, donated several code counting tools to USC CSSE. The goal was to develop a public domain code counting tool that handles multiple languages and produces consistent results for large and small software systems.

Project plans are developed every semester, and USC graduate students doing directed research are assigned projects to update the CodeCount tool. Vu Nguyen, a PhD student at USC, led several semesters of student projects. All changes are verified and validated by the Aerospace Corporation IV&V team, which works closely with the USC instructor on the projects. The beta versions are tested by industry affiliates and then released to the public as open source code.

In 2006, work was carried out to develop a differencing tool that compares two software system baselines to determine the differences between two versions of software. The CodeCount toolset, the precursor of UCC, was released in 2007. It was a collection of standalone programs, written in a single language, that measured source code written in languages such as COBOL, Assembly, PL/1, Pascal, and Jovial.

Nguyen produced the Unified CodeCount (UCC) system design as a framework, and the existing code counters and differencing tool were merged into it. Additional features were also added, such as unified counting and differencing capabilities, detection of duplicate files, and support for text and CSV output files. A presentation on “Unified Code Count with Differencing Functionality” was given at the 24th International Forum on COCOMO in October 2009.

The UCC tool has been released to the public with a license enabling users to use and modify the code; if the modifications are to be distributed, the user must send a copy of the modifications to USC CSSE. The US Government has made UCC its standard for code counting, and it has been specified in many government software contracts.


Importance

The Unified CodeCount (UCC) is used to analyze existing projects for physical and logical SLOC counts, which directly relate to work accomplished. The data collected can then be used by software cost estimation models to estimate the time and cost needed to bring similar projects to a successful conclusion. Many code counting tools are available on the market; however, most have drawbacks such as:

  • Some are proprietary, others are public domain
  • Inconsistent or unpublished counting rules
  • May not be maintained
  • Each tool has different rules for counting giving inconsistent results

The University of Southern California Center for Systems and Software Engineering was approached by the NRO Cost Analysis and Improvement Group (NCAIG) to create a code counting solution, developed by an unbiased, industry-respected institution, that provides the following features:

  • Count software lines of code consistently and with documented standards
  • Ability to easily add new languages
  • Support and maintenance
  • Compare different baselines of software
  • Determine addition, modification, deletion
  • Identify duplicate files
  • Determine complexity
  • Platform independent
  • Command line interface
  • Modes: Code counting only or counting plus differencing
  • Counts multiple files and languages in a single pass
  • Output reports
  • Robust processing
  • Options to improve performance
  • Error log

The UCC is the result of that effort, and is available as open source to the general public.

Uses and Functionality of CodeCount:

About CodeCount:

The Unified CodeCount Toolset with Differencing Functionality (UCC) is a collection of tools designed to automate the collection of source code sizing and change information.

The UCC supports multiple programming languages and focuses on two possible Source Lines of Code (SLOC) definitions, physical and/or logical. The differencing functionality can be used to compare two baselines of software systems and determine change metrics: SLOC addition, deletion, modification, and non-modification counts.

The UCC toolset is copyright USC Center for Software Engineering but is made available under a Limited Public License, which allows anyone to modify the code. However, anyone who distributes modified code to others must return a copy to USC so that the toolset can be improved for the benefit of all.

Uses of CodeCount:

  • Counting Capabilities- UCC allows users to measure the size of a baseline of a source program by analyzing it and producing counts for: a) logical SLOC, b) physical SLOC, c) comments, d) executable and data declaration SLOC, e) compiler directive SLOC, and f) keywords.

  • Differencing Capabilities- UCC allows users to compare and measure the differences between two baselines of source programs. These differences are measured in terms of the number of logical SLOC added/new, deleted, modified, and unmodified. The differencing results can be saved to either plain text (.txt) or .csv files. The default is .csv, but .txt can be selected with the -ascii switch.
  • Counting and Differencing Directories- UCC allows users to count or compare source files by specifying the directories where the files are located.
  • Support for various Programming Languages- The counting and differencing capabilities accept source code written in C/C++, C#, Java, SQL, Ada, Perl, ASP, ASP.NET, JSP, CSS, HTML, XML, JavaScript, VB, PHP, VBScript, Bash, C Shell Script, ColdFusion, Fortran, XMidas, NeXtMidas, and Python.
  • Command Arguments- The tool accepts the user’s settings via command arguments. UCC is a command-line application and is compiled as a console application. On PC-based machines, the user can compile the source code with Visual Studio 2003, 2005, 2008, or 2010.
  • Duplication- For each baseline, two files are considered duplicates if they have the same content or if the difference between them is smaller than the threshold given through the command line switch -tdup. Two files may be identified as duplicates even though they have different filenames. Comments and blank lines are not considered during duplication processing.
  • Matching- When differencing, files from Baseline A are matched to files in Baseline B. Two files are matched if they have the same filename regardless of which directories they belong to. Remaining files are matched using a best-fit algorithm.
  • Complexity Count- UCC produces complexity counts for all source code files. The complexity counts may include the number of math, trig, and logarithm functions, calculations, conditionals, logicals, preprocessors, assignments, and pointers, as well as cyclomatic complexity. When counting, the complexity results are saved to the file “outfile_cplx.csv”; when differencing, the results are saved to the files “Baseline-A-outfile_cplx.csv” and “Baseline-B-outfile_cplx.csv”.
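The duplication rule described in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: how UCC measures the "difference" and in which unit the -tdup threshold is expressed are not specified here, so the sketch assumes a percentage derived from `difflib.SequenceMatcher.ratio()`, treats only lines beginning with `//` as comments, and uses the hypothetical function names `significant_lines`, `percent_different`, and `are_duplicates`.

```python
from difflib import SequenceMatcher


def significant_lines(text: str) -> list[str]:
    """Drop blank lines and, for this sketch only, lines starting with //."""
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line and not line.startswith("//")]


def percent_different(text_a: str, text_b: str) -> float:
    """Assumed difference measure: 100 * (1 - similarity ratio of the lines)."""
    a, b = significant_lines(text_a), significant_lines(text_b)
    if not a and not b:
        return 0.0
    similarity = SequenceMatcher(a=a, b=b, autojunk=False).ratio()
    return (1.0 - similarity) * 100.0


def are_duplicates(text_a: str, text_b: str, threshold_percent: float = 0.0) -> bool:
    """Files are duplicates if identical or within the given threshold."""
    return percent_different(text_a, text_b) <= threshold_percent


file_a = "// version 1\nint a = 1;\n\nint b = 2;\n"
file_b = "int a = 1;\nint b = 2;   \n// trailing note\n"
file_c = "int a = 1;\nint b = 3;\n"

print(are_duplicates(file_a, file_b))        # differ only in comments and blanks
print(are_duplicates(file_a, file_c))        # one statement differs
print(are_duplicates(file_a, file_c, 60.0))  # same pair with a loose threshold
```

Because comments and blank lines are discarded before comparison, `file_a` and `file_b` count as duplicates even with a threshold of zero, while `file_a` and `file_c` do so only once the threshold is loosened.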


  • File Extensions- The tool determines which code counter to use for each file from the file extension. This release supports the following languages and file extensions:

Functionality of CodeCount:

  • Execution speed:

CodeCount is written in C/C++ and uses relatively simple algorithms to recognize comments and physical/logical lines. Testing has shown that the UCC runs acceptably fast except in extreme situations. A number of switches are available to inhibit certain types of processing if needed.

  • Reliability and Correctness

CodeCount has been tested extensively in the laboratory, and is being used globally. There is a defect-reporting capability, and any defects reported are corrected promptly. It is not uncommon for users to add functionality or correct defects and notify the UCC managers along with providing the code for the changes.

  • Documentation

The UCC open source distribution contains Release Notes, User’s Manual, and Code Counting Standards for the language counters. The source code contains file headers and in-line comments. The UCC Software Development Plan, Software Requirements Specification, and Software Test Plan are available upon request.

  • Ease of general maintenance

The UCC is a monolithic, object-oriented toolset which facilitates ease of maintenance.

  • Ease of extension

The "CSCI" CodeCount flavor lends itself to ease of extension. Users are able to easily add another language counter on their own.

  • Compatibility

CodeCount is well suited to cases where compatibility with the COCOMO estimation model is required or desired, or where compatibility with organizations already using CodeCount is desired.

  • Portability

CodeCount has been tested on a wide variety of operating systems and hardware platforms and found to be portable to any environment that has an ANSI standard C++ compiler.

  • Availability of source code

Source code for CodeCount is available as a downloadable zip file.

  • Licensing

Source code for CodeCount is provided under the terms of the USC-CSE Limited Public License.


Standards for the Language

The main objective of the Unified CodeCount (UCC) is to provide counting methods that define a consistent and repeatable SLOC measurement. More than 20 SLOC counting applications exist, each producing different physical and logical SLOC counts, alongside some 75 commercially available software cost estimating tools. The differences in the cost results from the various tools expose the deficiencies of current techniques for estimating code size. This is particularly true for large projects, where cost estimation depends on automated procedures to generate reasonably accurate predictions. This led to the need for a universal SLOC counting standard that would produce consistent results.

SLOC serves as a main factor in cost estimation techniques. Although it is not the sole contributor to software cost estimation, it provides the foundation for a number of metrics that are derived throughout the software development life cycle. The SLOC counting procedure can be automated, requiring less time and effort to produce metrics. A well-defined set of rules identifies what to include in and exclude from SLOC counting measures. The two most accepted measures for SLOC are the number of physical and logical lines of code.

The program elements covered by the physical and logical SLOC measures are listed below:

Measurement unit: source statement type

  • Executable
  • Nonexecutable: declarations, compiler directives, comments, blank lines
  • Separate totals for each language

In CodeCount, logical SLOC measures the total number of source statements in a block of code. The three types of statements are executable, declaration, and compiler directive. Executable statements are eventually translated into machine code to cause runtime actions, while declaration and compiler directive statements affect the compiler’s actions.
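The three statement types can be illustrated with a small Python sketch that tallies already-separated C-like statements. The heuristics are assumptions made for this example and are far cruder than the UCC counting rules: a statement beginning with `#` is treated as a compiler directive, a statement whose first word is in a short, hypothetical keyword list is treated as a declaration, and everything else is treated as executable.

```python
from collections import Counter

# Hypothetical, deliberately incomplete list of words that open a declaration.
DECLARATION_KEYWORDS = {
    "int", "char", "float", "double", "long", "short",
    "unsigned", "struct", "typedef", "const", "static",
}


def classify(statement: str) -> str:
    """Assign one of the three statement types using first-token heuristics."""
    words = statement.strip().split()
    if words and words[0].startswith("#"):
        return "compiler directive"
    if words and words[0] in DECLARATION_KEYWORDS:
        return "declaration"
    return "executable"


def tally(statements: list[str]) -> Counter:
    """Count statements per type; their sum is the logical SLOC of the input."""
    return Counter(classify(s) for s in statements)


statements = [
    "#include <stdio.h>",
    "int count = 0;",
    "count = count + 1;",
    'printf("%d", count);',
]
print(tally(statements))
```

For the four sample statements the sketch reports one compiler directive, one declaration, and two executable statements, for a logical SLOC of four.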

The USC CodeCount tool treats source statements as independent units at the source code level, where a programmer constructs a statement and its sub-statements completely. The UCC assumes that the source code will compile; otherwise the results are unreliable. The next challenge was deciding where each statement ends when counting logical SLOC. Counting semicolons may sound appealing, but not all popular languages use the semicolon as a statement terminator (for example SQL, JavaScript, and UNIX scripting languages). The Software Engineering Institute (SEI) at Carnegie Mellon University and the COCOMO II SLOC definition provide a way to count ‘how many of what program elements’. Tables 1 and 2 summarize the SLOC counting rules for logical lines of code for the C/C++, Java, and C# programming languages. The UCC Code Counting Rules for each language are distributed with the open source release.

Measurement Unit     Order of Precedence     Physical SLOC
Executable lines
  Statements         1                       One per line




References