The mutability of std::basic_string<>, together with its interface, does not allow implementers to make it a lightweight value object. Efforts to provide a cheap copy operation through string representation sharing and the copy-on-write technique while maintaining thread safety have not succeeded so far [1]. Modern implementations of std::basic_string<> (such as Dinkumware's and STLPort's) tend to give up representation sharing for thread safety. Other implementations (such as the one that comes with g++) do share the string representation but are not really thread safe.
There is a surprisingly large number of string use cases where mutability is not actually needed. Moreover, it seems that applications can be designed without using mutable strings at all. The most prominent examples are the Python and Java languages, where the built-in string is immutable.
The goal of boost::const_string<> is to address this problem by giving up mutability to make the string a real value object, avoiding dynamic allocations during construction and copying. It also uses expression templates for concatenation, effectively eliminating the overhead of creating intermediate temporary objects. Its immutability allows for thread-safe, reference-counted representation sharing.
boost::const_string<> provides a subset of the std::basic_string<> interface. All constructors and non-mutating operations of std::basic_string<> are included, as well as those mutating ones that make sense for an immutable string. operator[] is const and returns by value. The following member functions are omitted:
resize(), capacity(), reserve(), insert(), erase(), replace(), get_allocator()
Strictly speaking, insert(), erase() and replace() could be included, but they would always result in a reallocation. If you need these functions, use std::basic_string<>, which in general can avoid reallocation when performing them.
Extension constructors are provided to allow creating the string from string literals and std::basic_string<> objects using reference semantics. This avoids allocating any memory and copying the source while constructing the string. It makes the string a good replacement for plain char const* everywhere, including the cases where char const* is used as a common denominator in function interfaces to avoid an otherwise probably costly conversion of the argument to the function's string type. Here is an example:
    void f(char const*);
    void g(std::basic_string<char> const&);
    void h(boost::const_string<char> const&);
Function f() can be called efficiently with a C-style string or with an object of any string type that provides a conversion to a C-style string, as long as that conversion is a zero-cost operation. Function g() can be called efficiently with a std::string object only; calling it with a string of any other type will most likely allocate and will certainly copy the argument. Function h(), when reference semantics is used, has the same performance guarantees as f() and yet offers the convenience that a plain char const* lacks. Reference semantics kicks in when the first argument of the string's constructor is wrapped in a boost::reference_wrapper<> obtained from a call to boost::cref(). Here is an example of how h() can be called using reference semantics:
    char const* s = "abc";
    h(boost::cref(s));
    std::string str("abc");
    h(boost::cref(str));
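To make the constructor selection concrete, the following is a minimal sketch of how an overload taking a reference wrapper can switch a string type to reference semantics. It is not the const_string<> implementation: toy_string and its members are hypothetical names, and std::reference_wrapper with std::cref() stands in for boost::reference_wrapper<> and boost::cref() so that the sketch builds without Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>  // std::cref, std::reference_wrapper (stand-ins for the Boost ones)
#include <string>

// Hypothetical toy type, NOT boost::const_string<>. It only illustrates how a
// constructor overload can choose between copying and referencing the source.
class toy_string {
public:
    // Value semantics: the characters are copied into storage owned by the object.
    toy_string(char const* s)
        : owned_(s), ref_(nullptr), size_(owned_.size()), owns_(true) {}

    // Reference semantics: only the pointer and the length are remembered.
    // The caller must keep the source alive for as long as this object is used.
    toy_string(std::reference_wrapper<char const* const> s)
        : ref_(s.get()), size_(std::strlen(ref_)), owns_(false) {}
    toy_string(std::reference_wrapper<std::string const> s)
        : ref_(s.get().data()), size_(s.get().size()), owns_(false) {}

    char const* data() const { return owns_ ? owned_.data() : ref_; }
    std::size_t size() const { return size_; }
    bool owns_copy() const { return owns_; }

private:
    std::string owned_;  // used only with value semantics
    char const* ref_;    // used only with reference semantics
    std::size_t size_;
    bool owns_;
};

// Counterpart of h() from the text, taking the toy type instead.
inline std::size_t h(toy_string const& s) { return s.size(); }

// Usage mirroring the calls in the text.
inline void demo() {
    char const* s = "abc";
    assert(h(std::cref(s)) == 3);    // no copy of "abc" is made
    std::string str("abc");
    assert(h(std::cref(str)) == 3);  // refers to str's characters directly
}
```

The design point is that the wrapper type, not a run-time flag passed by the caller, selects the overload, so a plain argument keeps the safe copying behaviour by default.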
String concatenation is implemented using the expression templates technique. This makes concatenation a very efficient operation that requires only one memory allocation to store the result. As a consequence, the trick commonly used for concatenating std::basic_string<> objects, namely using operator+= or append() instead of the more natural operator+, is unnecessary and, in fact, less efficient for this string, since its operator+=() always results in one memory allocation.
The string provides the strong exception safety guarantee. The copy constructor and the assignment from another const_string<> do not allocate and are nothrow operations. This makes the generic std::swap() a nothrow operation.
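The idea behind expression-template concatenation can be sketched as follows. This is an illustration under stated assumptions, not the library's code: expr, piece, concat and materialize are hypothetical names, and std::string is used only as a convenient result buffer. operator+ builds a lightweight object describing the concatenation; the result is sized once and filled in a single pass when the expression is finally evaluated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical types for illustration only; not the classes used by const_string<>.

// CRTP base that marks a type as a concatenation operand.
template<class Derived>
struct expr {
    Derived const& self() const { return static_cast<Derived const&>(*this); }
};

// Leaf: a non-owning view of existing characters.
struct piece : expr<piece> {
    char const* p;
    std::size_t n;
    explicit piece(char const* s) : p(s), n(std::strlen(s)) {}
    std::size_t size() const { return n; }
    char* copy_to(char* out) const { std::memcpy(out, p, n); return out + n; }
};

// Node: records "lhs + rhs" without building any string.
template<class L, class R>
struct concat : expr<concat<L, R> > {
    L lhs;
    R rhs;
    concat(L const& l, R const& r) : lhs(l), rhs(r) {}
    std::size_t size() const { return lhs.size() + rhs.size(); }
    char* copy_to(char* out) const { return rhs.copy_to(lhs.copy_to(out)); }
};

// operator+ only composes nodes; no characters are copied here.
template<class L, class R>
concat<L, R> operator+(expr<L> const& l, expr<R> const& r) {
    return concat<L, R>(l.self(), r.self());
}

// Evaluate the whole expression: size the result once, then copy each piece into it.
template<class E>
std::string materialize(expr<E> const& e) {
    std::string out(e.self().size(), '\0');  // the only buffer the result needs
    char* end = e.self().copy_to(&out[0]);
    assert(end == &out[0] + out.size());     // every byte was written exactly once
    (void)end;
    return out;
}
```

With these definitions, materialize(piece("foo") + piece("bar") + piece("baz")) walks a nested concat object and writes all nine characters into one buffer, whereas a chain of eager operator+ calls would create an intermediate string for "foobar" first.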
All operations are thread safe.
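A minimal sketch of why immutability makes cheap, nothrow copying possible is given below. It assumes std::shared_ptr as the sharing mechanism, which is a stand-in and not necessarily what const_string<> uses internally; shared_text and its members are hypothetical names. Since the characters never change after construction, every copy can point at the same representation, and copying only adjusts a reference count that std::shared_ptr updates atomically.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-in, NOT the library's storage class.
class shared_text {
public:
    // The only allocating operation: build the immutable representation once.
    explicit shared_text(char const* s)
        : rep_(std::make_shared<std::string>(s)) {}

    // Copy construction and copy assignment are compiler-generated. They copy
    // the shared_ptr, which adjusts the reference count and allocates nothing.

    char const* data() const noexcept { return rep_->data(); }
    std::size_t size() const noexcept { return rep_->size(); }
    long owners() const noexcept { return rep_.use_count(); }

private:
    std::shared_ptr<std::string const> rep_;  // never modified after construction
};

static_assert(std::is_nothrow_copy_constructible<shared_text>::value,
              "copying must not throw");
static_assert(std::is_nothrow_copy_assignable<shared_text>::value,
              "assignment must not throw");

inline void demo_sharing() {
    shared_text a("hello");
    shared_text b(a);               // shares a's characters
    assert(a.data() == b.data());   // same representation, nothing was copied
    assert(a.owners() == 2);
}
```

Because no operation can modify the shared characters, no copy-on-write step is needed, which is the part that makes sharing hard to combine with thread safety for a mutable string.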
All constructors, except those with reference semantics (those that take boost::reference_wrapper<>), allocate memory and copy the source string. Constructors with reference semantics neither allocate nor copy the source string.
The string uses the small string optimization, that is, the string has an internal buffer to store strings shorter than a certain length. The size of the buffer can be specified by the user. The size of the string object can be as little as sizeof(CharT*) + sizeof(size_t), which on modern x86 platforms is 8 bytes. The sizeof(CharT*) part of the string representation is also used as a part of its internal buffer, so the buffer is at least sizeof(CharT*) bytes in size. By default, the string object size is 16 bytes, which provides a 12-byte internal buffer.
For comparison, objects of Dinkumware's implementation of std::basic_string<>, bundled with MS VS .NET 2003, have a size of 28 bytes and provide a 16-byte internal buffer. STLPort's implementation has a size of 8 bytes plus a 16-character internal buffer, which for std::basic_string<char> amounts to 24 bytes. The GNU ISO C++ implementation that comes with g++ 3.2 does not use the small string optimization but does share the string representation through reference counting, and its objects have a size of 4 bytes.
Please note that these size figures are exact only for modern x86 platforms and only when a stateless allocator (i.e. one without data members) is used.
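The overlap between the pointer and the internal buffer described above can be sketched with a union. This is an assumed layout for illustration only: sso_layout, make_short and the rule that a short string leaves room for a terminating '\0' are hypothetical and are not taken from the const_string<> sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical layout, NOT the library's storage class.
template<class CharT, std::size_t BufferSize>
struct sso_layout {
    std::size_t size;
    union {
        CharT const* ptr;          // used when the string does not fit in the buffer
        CharT buffer[BufferSize];  // used for short strings; shares bytes with ptr
    };

    // Assumption of this sketch: short strings keep room for a terminating '\0'.
    bool is_short() const { return size < BufferSize; }
    CharT const* data() const { return is_short() ? buffer : ptr; }
};

// Store a short C string directly inside the object, without any allocation.
template<std::size_t BufferSize>
sso_layout<char, BufferSize> make_short(char const* s) {
    sso_layout<char, BufferSize> r;
    r.size = std::strlen(s);
    assert(r.size < BufferSize);               // caller must pass a short string
    std::memcpy(r.buffer, s, r.size + 1);      // copy the characters and the '\0'
    return r;
}
```

With BufferSize equal to sizeof(CharT*) the buffer adds nothing beyond the pointer it overlays, so on common platforms the object is sizeof(CharT*) + sizeof(size_t) bytes; a larger BufferSize grows the object by the difference, subject to padding.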
The code uses standard C++ and should work with any standard-conforming compiler. To date it has been tested with:
    #include "boost/const_string/const_string_fwd.hpp" // forward declarations
    #include "boost/const_string/const_string.hpp"     // const_string class definition and implementation
    #include "boost/const_string/concatenation.hpp"    // operator+ definition and implementation
    #include "boost/const_string/format.hpp"           // format functions definition and implementation
    #include "boost/const_string/io.hpp"               // stream << and >> operators, getline functions definition and implementation

    template<
        class CharT
      , class TraitsT = std::char_traits<CharT>
      , class StorageT = const_string_storage<TraitsT>
    >
    class const_string;
The class implements a subset of the std::basic_string<> interface. The first and the second template parameters have the same meaning as for std::basic_string<>. The third parameter is not the allocator type but the string's storage implementation. Please note that the storage should not be considered a policy. It is a template parameter solely to allow customizing the template parameters of the low-level storage. The const_string<> class is a stateless interface wrapper over the storage.
All functions taken from std::basic_string<> have the same semantics and signatures as their std::basic_string<> counterparts. For exception specifications, please refer to the const_string<> sources.
    template<
        class TraitsT
      , class AllocatorT = std::allocator<typename TraitsT::char_type>
      , size_t buffer_size = (16 - sizeof(size_t)) / sizeof(typename TraitsT::char_type)
      , size_t buffer_alignment = 0
    >
    class const_string_storage;
The class implements string memory management and the core string operations in terms of which the const_string<> interface is implemented. The first two parameters are self-explanatory. The buffer_size parameter specifies the size of the string's internal buffer in bytes. Its default value is chosen so that string objects occupy 16 bytes (all the state is in the storage class; the string itself does not have any data members). buffer_alignment specifies the internal buffer alignment in bytes (this parameter may be removed in the future).
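As a worked example of the default buffer_size expression, the hypothetical helper below (default_buffer_size is not part of the library) evaluates the same formula for a given character type. With a 4-byte size_t, as on 32-bit x86, it yields 12 for char, which matches the 12-byte buffer mentioned earlier; with an 8-byte size_t it yields 8.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper that repeats the default argument of buffer_size.
template<class CharT>
constexpr std::size_t default_buffer_size() {
    return (16 - sizeof(std::size_t)) / sizeof(CharT);
}

inline void check_defaults() {
    // For char, the buffer takes whatever 16 bytes leave after one size_t.
    assert(default_buffer_size<char>() + sizeof(std::size_t) == 16);
    // For wider characters the value shrinks so that the total still fits in 16 bytes.
    assert(default_buffer_size<wchar_t>() * sizeof(wchar_t) + sizeof(std::size_t) <= 16);
}
```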
To be continued…
http://sourceforge.net/projects/conststring/
[1] "Vindicated? Sutter and COW Strings" thread at comp.lang.c++.moderated.
© 2004 Maxim Yegorushkin
Use, modification, and distribution are subject to the Boost Software License, Version 1.0.