Criticism of C++

C++ is a general-purpose programming language with imperative, object-oriented, and generic programming features. Many criticisms have been leveled at C++'s design by well-known software developers including Linus Torvalds,[1] Richard Stallman,[2] Joshua Bloch, Rob Pike,[3] Ken Thompson,[4][5][6] and Donald Knuth.[7][8]

C++ is a multi-paradigm programming language[9] with extensive, but not complete, backward compatibility with C.[10] This article focuses not on C features like pointer arithmetic, operator precedence or preprocessor macros, but on pure C++ features that are often criticized. Bjarne Stroustrup, the inventor of the C++ language, stated:

There are only two kinds of languages: the ones people complain about and the ones nobody uses.[11]

Slow compile times

Header files are the natural interface between source files in C and C++. Each time a header file is modified, every source file that includes it must be recompiled. Header files are slow to process because they are textual and context-dependent as a consequence of the preprocessor.[12] C headers carry only a limited amount of information, the most important being struct declarations and function prototypes. C++ classes are declared in header files, which expose not only their public variables and public functions (like C with its structs and function prototypes) but also their private functions. Changing a private function therefore forces an otherwise unnecessary recompilation of every source file that includes the header. The problem is magnified when classes are written as templates, which forces all of their code into the header files, as is the case for much of the C++ standard library. Large C++ projects can therefore be relatively slow to compile.[13] The problem is mitigated by precompiled headers in modern compilers and by the module system added in C++20; the plan for C++23 included exposing the standard library itself through modules.[14]

Global format state of <iostream>

C++ <iostream>, unlike C <stdio.h>, relies on a global format state: formatting settings such as the numeric base are stored in the stream object, and streams such as std::cout are shared by the whole program. This fits poorly with exceptions, because a function may be interrupted by an error after it has changed the format state but before it has restored it. One fix is to save and restore the state using Resource Acquisition Is Initialization (RAII); such state savers are implemented in the Boost[15] libraries but are not part of the C++ Standard Library.

<iostream> uses static constructors, which cause overhead in every translation unit that includes it, even if the library is not used.[16] Another source of poor performance is the use of std::endl instead of \n for output, because std::endl also calls .flush(). C++ <iostream> is by default synchronized with <stdio.h>, which can cause performance problems in I/O-intensive command-line applications. Turning synchronization off can improve performance but gives up some ordering guarantees.

Here follows an example where an exception interrupts the function before std::cout can be restored from hexadecimal to decimal. The error number in the catch statement will be written out in hexadecimal which probably isn't what one wants:

#include <iostream>
#include <vector>

int main() {
  try {
    std::cout << std::hex
              << 0xFFFFFFFF << '\n';
    // The oversized request makes the constructor throw here
    // (std::bad_alloc or std::length_error, depending on the implementation):
    std::vector<int> vector(0xFFFFFFFFFFFFFFFFull);
    std::cout << std::dec; // Never reached
                           // (using scope guards would have fixed that issue
                           //  and made the code more expressive)
  }
  catch (const std::exception&) {
    std::cout << "Error number: " << 10 << '\n';  // Printed in hexadecimal as "a"
  }
}

Some members of the C++ standards body acknowledge[17] that <iostream> is an aging interface that will eventually need to be replaced. Its design forces library implementers to adopt solutions that greatly impact performance.[citation needed]

C++20 added std::format, which eliminates the global formatting state and addresses other issues in iostreams.[18] For example, the catch clause can now be written as

std::cout << std::format("Error number: {}\n", 10);

which is not affected by the stream state. However, it might introduce security issues and overhead, because the actual formatting is done at run time.

Iterators

The philosophy of the Standard Template Library (STL) embedded in the C++ Standard Library is to use generic algorithms in the form of templates that operate on iterators. Early compilers optimized small objects such as iterators poorly, which Alexander Stepanov characterized as the "abstraction penalty", although modern compilers optimize away such small abstractions well.[19] The interface using pairs of iterators to denote ranges of elements has also been criticized.[20][21] The introduction of ranges in the C++20 standard library is intended to address this problem.[22]

One major problem is that iterators often refer to heap-allocated data inside the C++ containers and become invalid if the containers move that data. Functions that change the size of a container often invalidate all iterators pointing into it, creating dangerous cases of undefined behavior.[23][24] Here is an example where the iterator in the for loop is invalidated because the std::string container changes its size on the heap:

#include <iostream>
#include <string>

int main() {
  std::string text = "One\nTwo\nThree\nFour\n";
  // Let's add an '!' where we find newlines
  for (auto it = text.begin(); it != text.end(); ++it) {
    if (*it == '\n') {
      // Bug: the iterator returned by insert() is discarded.
      // The fix would be: it = text.insert(it, '!') + 1;
      text.insert(it, '!');
      // Without updating the iterator this program has
      // undefined behavior and will likely crash
    }
  }
  std::cout << text;
}

Uniform initialization syntax

The C++11 uniform initialization syntax and std::initializer_list construction share the same brace syntax, and which of the two is triggered depends on the constructors a class declares. If the class has a suitable std::initializer_list constructor, that constructor is called. Otherwise the normal constructors are called with the uniform initialization syntax. This can be confusing for beginners and experts alike:[25][16]

#include <iostream>
#include <vector>

int main() {
  int integer1{10};                 // int
  int integer2(10);                 // int
  std::vector<int> vector1{10, 0};  // std::initializer_list
  std::vector<int> vector2(10, 0);  // std::size_t, int

  std::cout << "Will print 10\n" << integer1 << '\n';
  std::cout << "Will print 10\n" << integer2 << '\n';

  std::cout << "Will print 10,0,\n";

  for (const auto& item : vector1) {
    std::cout << item << ',';
  }

  std::cout << "\nWill print 0,0,0,0,0,0,0,0,0,0,\n";

  for (const auto& item : vector2) {
    std::cout << item << ',';
  }
}

Exceptions

There have been concerns that the zero-overhead principle[26] isn't compatible with exceptions.[16] Most modern implementations have zero performance overhead when exceptions are enabled but not used, but do have an overhead during exception handling and in binary size due to the need for unwind tables. Many compilers support disabling exceptions to save the binary overhead. Exceptions have also been criticized for making state handling unsafe. This safety issue has led to the invention of the RAII idiom,[27] which has proven useful beyond making C++ exceptions safe.

Encoding of string literals in source-code

C++ string literals, like those of C, do not consider the character encoding of the text within them: they are merely a sequence of bytes, and the C++ string class follows the same principle. Although source code can (since C++11) request an encoding for a literal, the compiler does not attempt to validate that the chosen encoding of the source literal is "correct" for the bytes being put into it, and the runtime does not enforce character encoding. Programmers who are used to other languages such as Java, Python or C# which try to enforce character encodings often consider this to be a defect of the language.

The example program below illustrates the phenomenon.

#include <iostream>
#include <string>
// note that this code is no longer valid in C++20, where u8 literals have type const char8_t[]
int main() {
  // all strings are declared with the UTF-8 prefix

  // file encoding determines the encoding of å and Ö
  std::string auto_enc = u8"Vår gård på Öland!";
  // this text is well-formed in both ISO-8859-1 and UTF-8
  std::string ascii = u8"Var gard pa Oland!";
  // explicitly use the ISO-8859-1 byte-values for å and Ö
  // this is invalid UTF-8
  std::string iso8859_1 = u8"V\xE5r g\xE5rd p\xE5 \xD6land!";
  // explicitly use the UTF-8 byte sequences for å and Ö
  // this will display incorrectly in ISO-8859-1
  std::string utf8 = u8"V\xC3\xA5r g\xC3\xA5rd p\xC3\xA5 \xC3\x96land!";

  std::cout << "byte-count of automatically-chosen, [" << auto_enc
            << "] = " << auto_enc.length() << '\n';
  std::cout << "byte-count of ASCII-only [" << ascii << "] = " << ascii.length()
            << '\n';
  std::cout << "byte-count of explicit ISO-8859-1 bytes [" << iso8859_1
            << "] = " << iso8859_1.length() << '\n';
  std::cout << "byte-count of explicit UTF-8 bytes [" << utf8
            << "] = " << utf8.length() << '\n';
}

Despite the presence of the C++11 'u8' prefix, meaning "Unicode UTF-8 string literal", the output of this program actually depends on the source file's text encoding (or on the compiler's settings, since most compilers can be told to convert source files to a specific encoding before compiling them). When the source file is encoded using UTF-8, and the program is run on a terminal that is configured to treat its input as UTF-8, the following output is obtained:

byte-count of automatically-chosen, [Vår gård på Öland!] = 22
byte-count of ASCII-only [Var gard pa Oland!] = 18
byte-count of explicit ISO-8859-1 bytes [Vr grd p land!] = 18
byte-count of explicit UTF-8 bytes [Vår gård på Öland!] = 22

The output terminal has stripped the invalid UTF-8 bytes from display in the ISO-8859-1 example string. Passing the program's output through a hex dump utility will reveal that they are still present in the program output, and that it is the terminal application that removed them.

However, when the same source file is instead saved in ISO-8859-1 and re-compiled, the output of the program on the same terminal becomes:

byte-count of automatically-chosen, [Vr grd p land!] = 18
byte-count of ASCII-only [Var gard pa Oland!] = 18
byte-count of explicit ISO-8859-1 bytes [Vr grd p land!] = 18
byte-count of explicit UTF-8 bytes [Vår gård på Öland!] = 22

One proposed solution is to make the source encoding reliable across all compilers.

Code bloat

Some older implementations of C++ have been accused of generating code bloat.[28]:177

References

  1. "Re: [RFC] Convert builin-mailinfo.c to use The Better String Library" (Mailing list). 6 September 2007. Retrieved 31 March 2015.
  2. "Re: Efforts to attract more users?" (Mailing list). 12 July 2010. Retrieved 31 March 2015.
  3. Pike, Rob (2012). "Less is exponentially more".
  4. Andrew Binstock (18 May 2011). "Dr. Dobb's: Interview with Ken Thompson". Retrieved 7 February 2014.
  5. Peter Seibel (16 September 2009). Coders at Work: Reflections on the Craft of Programming. Apress. pp. 475–476. ISBN 978-1-4302-1948-4.
  6. "C++ in Coders at Work". 16 October 2009.
  7. Woehr, Jack (April 1996). "An Interview with Donald Knuth: DDJ chats with one of the world's leading computer scientists" (PDF). Dr. Dobb's Journal of Software Tools. 21 (4): 33–37. ISSN 1044-789X. C++ has a lot of good features, but it has a lot of dirty corners. If you don’t mind those, and you stick to stuff that can be counted well-portable, it's just fine. There are many constructions that are ambiguous, there's no way to parse them and decide what they mean, that you can't trust the compiler to do. For example, you use the 'less-than' and 'greater-than' signs not only to mean less-than and greater-than but also in templates. There are lots of little things like this, and many things in the implementation, that you can't be sure the compiler will do anything reasonable with.
  8. "Donald Knuth — Computer Literacy Bookshop Interview". Computer Literacy Bookshops. 7 December 1993. The problem that I have with them today is that... C++ is too complicated. At the moment, it's impossible for me to write portable code that I believe would work on lots of different systems, unless I avoid all exotic features. Whenever the C++ language designers had two competing ideas as to how they should solve some problem, they said "OK, we'll do them both". So the language is too baroque for my taste. But each user of C++ has a favorite subset, and that's fine.
  9. Stroustrup, Bjarne. "Bjarne Stroustrup's FAQ: What is "multiparadigm programming"?". www.stroustrup.com. Retrieved 21 May 2021.
  10. Stroustrup, Bjarne. "Bjarne Stroustrup's FAQ: Are there any features you'd like to remove from C++". www.stroustrup.com. Retrieved 21 May 2021.
  11. Stroustrup, Bjarne. "Bjarne Stroustrup Quotes". www.stroustrup.com. Retrieved 21 May 2021.
  12. Walter Bright. "C++ compilation speed".
  13. Rob Pike. "Less is exponentially more". Back around September 2007, I was doing some minor but central work on an enormous Google C++ program, one you've all interacted with, and my compilations were taking about 45 minutes on our huge distributed compile cluster.
  14. Ville Voutilainen. "To boldly suggest an overall plan for C++23".
  15. "I/O Stream-State Saver Library - 1.60.0". www.boost.org.
  16. "LLVM Coding Standards — LLVM 12 documentation". llvm.org.
  17. "N4412: Shortcomings of iostreams". open-std.org. Retrieved 3 May 2016.
  18. "P0645: Text Formatting". open-std.org. Retrieved 20 May 2021.
  19. Alexander Stepanov. "Stepanov Benchmark". The final number printed by the benchmark is a geometric mean of the performance degradation factors of individual tests. It claims to represent the factor by which you will be punished by your compiler if you attempt to use C++ data abstraction features. I call this number "Abstraction Penalty." As with any benchmark it is hard to prove such a claim; some people told me that it does not represent typical C++ usage. It is, however, a noteworthy fact that majority of the people who so object are responsible for C++ compilers with disproportionately large Abstraction Penalty.
  20. Andrei Alexandrescu. "Iterators Must Go" (PDF).
  21. Andrei Alexandrescu. "Generic Programming Must Go" (PDF).
  22. "Ranges library (C++20) - cppreference.com". en.cppreference.com.
  23. Scott Meyers. Effective STL. Given all that allocation, deallocation, copying, and destruction. It should not stun you to learn that these steps can be expensive. Naturally, you don't want to perform them any more frequently than you have to. If that doesn't strike you as natural, perhaps it will when you consider that each time these steps occur, all iterators, pointers, and references into the vector or string are invalidated. That means that the simple act of inserting an element into a vector or string may also require updating other data structures that use iterators, pointers, or references into the vector or string being expanded.
  24. Angelika Langer. "Invalidation of STL Iterators" (PDF).
  25. Scott Meyers. "Thoughts on the Vagaries of C++ Initialization".
  26. Bjarne Stroustrup. "Foundations of C++" (PDF).
  27. Stroustrup 1994, 16.5 Resource Management, pp. 388–89.
  28. Joyner, Ian (1999). Objects Unencapsulated: Java, Eiffel, and C++?? (Object and Component Technology). Prentice Hall PTR; 1st edition. ISBN 978-0130142696.

Works cited

Further reading

  • Ian Joyner (1999). Objects Unencapsulated: Java, Eiffel, and C++?? (Object and Component Technology). Prentice Hall PTR; 1st edition. ISBN 978-0130142696.
  • Peter Seibel (2009). Coders at Work: Reflections on the Craft of Programming. Apress. ISBN 978-1430219484.
