
.NET Framework Strings

From WikiOD

Remarks

In .NET, a string (System.String) is a sequence of System.Char values, and each System.Char is a UTF-16 code unit. This distinction is important because the everyday definition of a character differs from the definition used by .NET (and many other languages).

What a reader perceives as one character is properly called a grapheme: it is displayed as a glyph and is defined by one or more Unicode code points. Each code point is in turn encoded as a sequence of one or more code units. This is why a single System.Char does not always represent a grapheme. Here is how they differ in practice:

  • One grapheme, because of combining characters, may consist of two or more code points: à can be composed of two code points, U+0061 LATIN SMALL LETTER A and U+0300 COMBINING GRAVE ACCENT. This is the most common mistake because "\u0061\u0300".Length == 2 while you may expect 1.
  • There are duplicated characters: for example, à may be the single code point U+00E0 LATIN SMALL LETTER A WITH GRAVE or the two code points described above. The two forms should be treated as equal even though "\u00e0".Length != "\u0061\u0300".Length. Note that the == operator performs an ordinal comparison, so "\u00e0" == "\u0061\u0300" is false; bring both strings to the same normalization form with the String.Normalize() method first, or use a culture-sensitive comparison.
  • A Unicode string may contain a precomposed or a decomposed sequence: for example, the character 한 U+D55C HANGUL SYLLABLE HAN may be a single code point (encoded as a single code unit in UTF-16) or a decomposed sequence of its jamo ᄒ, ᅡ and ᆫ. The two forms must compare as equal.
  • One code point may be encoded as more than one code unit: the character 𠂊 U+2008A CJK UNIFIED IDEOGRAPH-2008A is encoded as two System.Char ("\ud840\udc8a") even though it is just one code point. UTF-16 is not a fixed-size encoding! This is a source of countless bugs (including serious security bugs): if, for example, your application enforces a maximum length and blindly truncates a string at that point, it may create an invalid string.
  • Some languages have digraphs and trigraphs: for example, in Czech ch is a standalone letter (after h and before i), so when ordering a list of strings you will have fyzika before chemie.
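The first cases in the list can be checked directly. The following is a minimal sketch: the class UnicodeDemo and its method names are made up for illustration, while String.Length, String.Normalize() and StringInfo.LengthInTextElements are standard .NET members.

```csharp
using System;
using System.Globalization;

public static class UnicodeDemo
{
    // Number of UTF-16 code units, which is what String.Length reports.
    public static int CodeUnits(string s) => s.Length;

    // Number of text elements, which is closer to what a reader calls "characters".
    public static int TextElements(string s) => new StringInfo(s).LengthInTextElements;

    // Ordinal equality after bringing both strings to the same
    // normalization form (String.Normalize() defaults to form C).
    public static bool EqualAfterNormalization(string a, string b) =>
        string.Equals(a.Normalize(), b.Normalize(), StringComparison.Ordinal);
}
```

With these helpers, UnicodeDemo.CodeUnits("\u0061\u0300") is 2 while UnicodeDemo.TextElements("\u0061\u0300") is 1, and the surrogate pair "\ud840\udc8a" likewise has two code units but one text element.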

There are many more issues with text handling; see for example How can I perform a Unicode aware character by character comparison? for a broader introduction and more links to related topics.

In general, when dealing with international text you may use this simple function to enumerate the text elements in a string (without breaking Unicode surrogate pairs or combining sequences):

// Requires: using System.Collections.Generic; using System.Globalization;
public static class StringExtensions
{
    public static IEnumerable<string> EnumerateCharacters(this string s)
    {
        // An iterator method cannot mix "return <value>" with "yield return"
        if (s == null)
            yield break;

        var enumerator = StringInfo.GetTextElementEnumerator(s.Normalize());
        while (enumerator.MoveNext())
            yield return enumerator.GetTextElement();
    }
}

Count characters

If you need to count characters then, for the reasons explained in the Remarks section, you can't simply use the Length property: it is the length of the array of System.Char, which are code units (not Unicode code points or graphemes). The correct code is:

int length = text.EnumerateCharacters().Count();

As a small optimization, you may rewrite the EnumerateCharacters() extension method specifically for this purpose:

public static class StringExtensions
{
    public static int CountCharacters(this string text)
    {
        if (String.IsNullOrEmpty(text))
            return 0;

        int count = 0;
        var enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
            ++count; // one text element per iteration

        return count;
    }
}

Count distinct characters

If you need to count distinct characters then, for the reasons explained in the Remarks section, you can't simply work on System.Char values, because they are code units (not Unicode code points or graphemes). If, for example, you simply write text.Distinct().Count() you will get incorrect results. The correct code is:

int distinctCharactersCount = text.EnumerateCharacters().Distinct().Count();

One step further is to count the occurrences of each character. If performance isn't an issue, you may simply do it like this (in this example, regardless of case):

var frequencies = text.EnumerateCharacters()
    .GroupBy(x => x, StringComparer.CurrentCultureIgnoreCase)
    .Select(x => new { Character = x.Key, Count = x.Count() });

Convert string to/from another encoding

.NET strings contain System.Char values (UTF-16 code units). If you want to save (or manage) text in another encoding, you have to work with an array of System.Byte.

Conversions are performed by classes derived from System.Text.Encoder and System.Text.Decoder which, together, can convert to and from another encoding (from a byte[] array encoded in encoding X to a UTF-16 encoded System.String, and vice versa).

Because an encoder and its decoder are usually used together, they are grouped in a class derived from System.Text.Encoding; the derived classes offer conversions to and from popular encodings (UTF-8, UTF-16 and so on).
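To make the difference between encodings concrete, here is a minimal sketch that encodes the same text with two System.Text.Encoding instances. The class EncodingDemo and its method names are hypothetical; Encoding.UTF8 and Encoding.Unicode (UTF-16, little endian) are the standard properties.

```csharp
using System.Text;

public static class EncodingDemo
{
    // Encode a .NET string (UTF-16 code units) into UTF-8 bytes.
    public static byte[] ToUtf8(string text) => Encoding.UTF8.GetBytes(text);

    // Encode a .NET string into UTF-16 little-endian bytes (no byte order mark).
    public static byte[] ToUtf16(string text) => Encoding.Unicode.GetBytes(text);

    // Decode UTF-8 bytes back into a .NET string.
    public static string FromUtf8(byte[] data) => Encoding.UTF8.GetString(data);
}
```

For example, the euro sign "\u20ac" is a single System.Char but takes three bytes in UTF-8 and two bytes in UTF-16, so the byte count depends on the encoding and not on the string's Length.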

Examples:

Convert a string to UTF-8

byte[] data = Encoding.UTF8.GetBytes("This is my text");

Convert UTF-8 data to a string

var text = Encoding.UTF8.GetString(data);

Change encoding of an existing text file

This code reads the content of a UTF-8 encoded text file and saves it back encoded as UTF-16. Note that this code is not optimal if the file is big because it reads all of its content into memory:

var content = File.ReadAllText(path, Encoding.UTF8);
// Encoding.Unicode is UTF-16 (little endian); WriteAllText takes the path first
File.WriteAllText(path, content, Encoding.Unicode);
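For big files, one possible alternative is to stream the conversion in chunks instead of reading everything at once. This is only a sketch under assumptions: the class EncodingConverter, the method Utf8ToUtf16 and the 4096-character buffer size are illustrative choices, and the output goes to a separate target file because the source cannot be overwritten while it is still being read.

```csharp
using System.IO;
using System.Text;

public static class EncodingConverter
{
    // Re-encodes a UTF-8 text file as UTF-16 (little endian) chunk by chunk.
    // sourcePath and targetPath must refer to different files.
    public static void Utf8ToUtf16(string sourcePath, string targetPath)
    {
        using (var reader = new StreamReader(sourcePath, Encoding.UTF8))
        using (var writer = new StreamWriter(targetPath, false, Encoding.Unicode))
        {
            var buffer = new char[4096]; // arbitrary chunk size
            int read;
            // StreamReader and StreamWriter keep decoder/encoder state between
            // calls, so sequences split across chunk boundaries are handled.
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                writer.Write(buffer, 0, read);
        }
    }
}
```

Memory use stays bounded by the buffer size regardless of the file size, at the cost of needing a second file on disk.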

Comparing strings

Although String is a reference type, the == operator compares string values rather than references.

As you may know, a string is just an array of characters. However, not every comparison is made character by character. The == operator and String.Equals(string) perform an ordinal comparison of the code units, while String.Compare and CompareTo are culture-sensitive by default (see Remarks above): under a culture-sensitive comparison, some character sequences can be treated as equal depending on the culture.

Think twice before short-circuiting an equality check by comparing the Length properties of two strings: under a culture-sensitive comparison, strings of different lengths may be equal!

If you need to change the default behavior, use the overloads of the String.Equals method that accept an additional StringComparison enumeration value.
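A minimal sketch of stating the comparison rules explicitly follows. The class ComparisonDemo and its method names are hypothetical; the String.Equals(string, string, StringComparison) overload and the StringComparison values are standard.

```csharp
using System;

public static class ComparisonDemo
{
    // Code-unit by code-unit comparison, independent of culture.
    public static bool SameOrdinal(string a, string b) =>
        string.Equals(a, b, StringComparison.Ordinal);

    // Ordinal comparison that ignores case.
    public static bool SameOrdinalIgnoreCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // Comparison according to the rules of the current thread's culture.
    public static bool SameInCurrentCulture(string a, string b) =>
        string.Equals(a, b, StringComparison.CurrentCulture);
}
```

ComparisonDemo.SameOrdinal("abc", "ABC") is false while ComparisonDemo.SameOrdinalIgnoreCase("abc", "ABC") is true. The result of SameInCurrentCulture depends on the culture the code runs under, which is why passing the StringComparison value explicitly makes the intent visible at the call site.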

Count occurrences of a character

For the reasons explained in the Remarks section, you can't simply do this (unless you want to count occurrences of a specific code unit):

int count = text.Count(x => x == ch);

You need a more complex function:

public static int CountOccurrencesOf(this string text, string character)
{
    // String.Equals takes a StringComparison value (not a StringComparer)
    return text.EnumerateCharacters()
        .Count(x => String.Equals(x, character, StringComparison.CurrentCulture));
}

Note that string comparison (in contrast to character comparison, which is culture invariant) must always be performed according to the rules of a specific culture.

Split string into fixed length blocks

We cannot break a string at arbitrary points (a System.Char may not be valid on its own because it is a combining character or part of a surrogate pair), so the code must take that into account (note that by length I mean the number of graphemes, not the number of code units):

public static IEnumerable<string> Split(this string value, int desiredLength)
{
    var characters = StringInfo.GetTextElementEnumerator(value);
    while (characters.MoveNext())
        yield return String.Concat(Take(characters, desiredLength));
}

private static IEnumerable<string> Take(TextElementEnumerator enumerator, int count)
{
    for (int i = 0; i < count; ++i)
    {
        yield return (string)enumerator.Current;

        // Advance only within the current block: the caller's loop moves to
        // the first element of the next block, so advancing here after the
        // last element would skip one text element.
        if (i + 1 < count && !enumerator.MoveNext())
            yield break;
    }
}

Object.ToString() virtual method

Everything in .NET is an object, hence every type has a ToString() method, defined in the Object class, which can be overridden. The default implementation of this method just returns the fully qualified name of the type:

public class Foo
{
}

var foo = new Foo();
Console.WriteLine(foo); // outputs Foo

ToString() is implicitly called when concatenating a value with a string:

public class Foo
{
    public override string ToString()
    {
        return "I am Foo";
    }
}

var foo = new Foo();
Console.WriteLine("I am bar and " + foo); // outputs I am bar and I am Foo

The result of this method is also used extensively by debugging tools. If, for some reason, you do not want to override this method but want to customize how the debugger shows the value of your type, use the DebuggerDisplay attribute (MSDN):

// [DebuggerDisplay("Person = FN {FirstName}, LN {LastName}")]
[DebuggerDisplay("Person = FN {" + nameof(Person.FirstName) + "}, LN {" + nameof(Person.LastName) + "}")]
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    // ...
}

Immutability of strings

Strings are immutable: you cannot change an existing string. Any operation on a string creates a new instance holding the new value. This means that if you need to replace a single character in a very long string, memory will be allocated for a whole new value.

string veryLongString = ...
// memory is allocated
string newString = veryLongString.Remove(0,1); // removes first character of the string.

If you need to perform many operations on a string value, use the StringBuilder class, which is designed for efficient string manipulation:

var sb = new StringBuilder(someInitialString);
foreach (var str in manyManyStrings)
{
    sb.Append(str); // appends in place, no new string per iteration
}
var finalString = sb.ToString();