Extending collate by derivation

The behavior of collate can still be customized if you are on a platform that does not support a file system, or if you do not wish to use data files for other reasons. You can derive from collate and override each of the virtual methods in a portable manner, as specified by the C++ standard. Alternatively, if portability is not a concern, you can make the job easier by using the EWL C++ specific protected interface of collate_byname.
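
As an illustration of the portable route, the following is a minimal sketch of a facet that derives from std::collate<char> and overrides the three virtual methods defined by the C++ standard (do_compare, do_transform and do_hash). The class name ci_collate and the helper fold are hypothetical names chosen for this example; they are not part of EWL C++. The sketch assumes a simple ASCII case fold via std::tolower and is not meant as a complete collation scheme.

```cpp
#include <cctype>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical facet (not part of EWL C++): compares char strings without
// regard to case, using only the virtual interface the standard specifies.
class ci_collate
   : public std::collate<char>
{
public:
   explicit ci_collate(std::size_t refs = 0)
      : std::collate<char>(refs) {}

protected:
   // Fold one character to lower case; assumes the default "C" C locale.
   static unsigned char fold(char c)
   {
      return static_cast<unsigned char>(
                std::tolower(static_cast<unsigned char>(c)));
   }

   // Returns -1, 0 or 1, as the standard requires of compare.
   virtual int do_compare(const char* low1, const char* high1,
                          const char* low2, const char* high2) const
   {
      for (; low1 != high1 && low2 != high2; ++low1, ++low2)
      {
         const unsigned char a = fold(*low1);
         const unsigned char b = fold(*low2);
         if (a < b) return -1;
         if (b < a) return 1;
      }
      if (low2 != high2) return -1;   // first string is a proper prefix
      if (low1 != high1) return 1;    // second string is a proper prefix
      return 0;
   }

   // The sort key is the folded string, so comparing two keys
   // gives the same order as do_compare on the original strings.
   virtual std::string do_transform(const char* low, const char* high) const
   {
      std::string key;
      for (; low != high; ++low)
         key += static_cast<char>(fold(*low));
      return key;
   }

   // Strings that compare equal must hash equal, so hash the folded text.
   virtual long do_hash(const char* low, const char* high) const
   {
      unsigned long h = 0;
      for (; low != high; ++low)
         h = h * 31u + fold(*low);
      return static_cast<long>(h);
   }
};
```

Such a facet can be installed with std::locale loc(std::locale(), new ci_collate); after that, loc(s1, s2) and std::use_facet<std::collate<char> >(loc) both use the overridden methods.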

The class collate_byname has one protected data member:

  __collation_rule<charT> rule_;
  
Listing: The class std::__collation_rule interface:
template <class charT>
class __collation_rule
{
   struct value
   {
      charT primary;
      charT secondary;
      charT tertiary;
   };
public:
   struct entry
      : value
   {
      unsigned char length;
   };

   __collation_rule();
   explicit __collation_rule(const basic_string<charT>& rule);
   void set_rule(const basic_string<charT>& rule);
   entry operator()(const charT* low,
            const charT* high, int& state) const;
   bool is_french() const;
   bool empty() const;
};

Most of this interface exists to support collate_byname. If you simply derive from collate_byname, set the rule with a string, and let collate_byname do all the work, then there is very little you have to know about __collation_rule.

A __collation_rule can be empty (contain no rule). In that case, collate_byname uses collate's sorting rule. This is also the case if collate_byname is constructed with "C". Once constructed, a __collation_rule's rule can be set or changed with set_rule. That is all you need to know to take advantage of all this horsepower!

Listing: Example of a __collation_rule:
#include <iostream>
#include <locale>
#include <string>

struct my_collate
   : public std::collate_byname<char>
{
   my_collate();
};

my_collate::my_collate()
   : std::collate_byname<char>("C")
{
   // Replace the default rule with a case-insensitive one:
   // each lower case letter is made equal to its upper case letter.
   rule_.set_rule("< a = A < b = B < c = C"
                  "< d = D < e = E < f = F"
                  "< g = G < h = H < i = I"
                  "< j = J < k = K < l = L"
                  "< m = M < n = N < o = O"
                  "< p = P < q = Q < r = R"
                  "< s = S < t = T < u = U"
                  "< v = V < w = W < x = X"
                  "< y = Y < z = Z");
}

int main()
{
   // Install the custom facet in a copy of the global locale.
   std::locale loc(std::locale(), new my_collate);
   std::string s1("Arnold");
   std::string s2("arnold");
   // locale::operator() compares two strings using the collate facet.
   if (loc(s1, s2))
      std::cout << s1 << " <  " << s2 << '\n';
   else if (loc(s2, s1))
      std::cout << s1 << " >  " << s2 << '\n';
   else
      std::cout << s1 << " == " << s2 << '\n';
}

The custom facet my_collate derives from std::collate_byname<char> and sets the rule in its constructor. That's all it has to do. For this example, a case-insensitive rule has been constructed. The output of this program is:

  Arnold == arnold
  

Alternatively, you could use my_collate directly (this is exactly what EWL C++'s locale does):

Listing: Example of custom facet my_collate:
// my_collate and the #include directives are as in the previous listing.
int main()
{
   my_collate col;
   std::string s1("Arnold");
   std::string s2("arnold");
   // compare returns -1, 0 or 1.
   switch (col.compare(s1.data(), s1.data()+s1.size(),
                       s2.data(), s2.data()+s2.size()))
   {
   case -1:
      std::cout << s1 << " <  " << s2 << '\n';
      break;
   case  0:
      std::cout << s1 << " == " << s2 << '\n';
      break;
   case  1:
      std::cout << s1 << " >  " << s2 << '\n';
      break;
   }
}

The output of this program is also:

  Arnold == arnold
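
The two programs agree because the C++ standard specifies std::locale::operator() in terms of the collate facet's compare method. The following sketch spells out that relationship using only standard calls; the function name collates_before is a hypothetical helper introduced for this illustration.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: returns true when s1 collates before s2 under loc.
// This mirrors what the standard specifies for loc(s1, s2).
bool collates_before(const std::locale& loc,
                     const std::string& s1, const std::string& s2)
{
   // Fetch whichever collate<char> facet is installed in loc.
   const std::collate<char>& col = std::use_facet<std::collate<char> >(loc);
   return col.compare(s1.data(), s1.data() + s1.size(),
                      s2.data(), s2.data() + s2.size()) < 0;
}
```

For any locale loc, collates_before(loc, s1, s2) and loc(s1, s2) should give the same answer, whether the installed facet is the default collate or a custom one such as my_collate.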