Garbage collection (GC) is the process of automatic memory management. It manages the allocation and release of memory for applications by attempting to reclaim allocated but no longer referenced memory. This memory is called garbage.
You can find GC in platforms and languages such as .NET, Java, and Python. It addresses common memory-management problems by intelligently allocating and deallocating memory.
When you instantiate an object by using the new keyword, the runtime finds a place in memory to store it and returns a reference to that memory location. GC later deallocates the memory automatically. This process is nondeterministic: the same program can behave differently on different runs, even for the same input. As a result, you don’t have control over when the GC cleans up. All you know is that it occurs whenever the runtime deems it necessary, and it typically runs more often when the system is low on memory.
You can trigger a collection manually (in .NET, by calling GC.Collect), and for some scenarios this is enough, although it's usually better to let the runtime decide when to collect. The bigger problem arises when your code allocates a lot of memory without freeing it. This increases the GC’s workload, stealing CPU cycles from your main threads. Eventually, the application is busier with GC than with the actual code. The diversion of CPU cycles from an application to GC is called GC pressure.
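One rough way to observe GC pressure is to count how many collections occur while a piece of code runs. The following is a minimal sketch, not a profiling tool: GcPressureProbe and AllocateShortLivedArrays are hypothetical names introduced for illustration, while GC.CollectionCount is the standard .NET API that reports how many collections of a given generation have occurred.

```csharp
using System;

public static class GcPressureProbe
{
    // Runs the given work and returns how many generation 0 collections
    // occurred while it was running.
    public static int CountGen0Collections(Action work)
    {
        int before = GC.CollectionCount(0);
        work();
        return GC.CollectionCount(0) - before;
    }

    // Hypothetical workload: allocates many short-lived arrays and
    // returns the total number of bytes requested.
    public static long AllocateShortLivedArrays(int count)
    {
        long total = 0;
        for (int i = 0; i < count; i++)
        {
            byte[] buffer = new byte[1024]; // becomes garbage after each iteration
            total += buffer.Length;
        }
        return total;
    }
}
```

Calling GcPressureProbe.CountGen0Collections(() => GcPressureProbe.AllocateShortLivedArrays(100_000)) returns a count that varies with the machine and the runtime's GC settings, so treat it as a relative indicator rather than an exact measurement.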
This section outlines some best practices for reducing GC pressure.
The IDisposable pattern can reduce GC pressure by releasing resources deterministically. Imagine that a class opens a file handle. If the file isn’t closed, the handle stays open and keeps consuming resources even after the application no longer references the object. This goes for all unmanaged resources, such as database connections and network handles. In these cases, you can use a destructor on the class. A destructor (called a finalizer in .NET) is a method that the runtime invokes automatically before an object's memory is reclaimed.
When the GC finds that an object with a destructor is no longer referenced, it places the object in the finalizer queue instead of reclaiming it immediately. A finalizer is a special method that performs finalization, generally some form of cleanup. The runtime then calls the destructor, where you can close files, and the memory is reclaimed in a later GC. However, neither the timing nor the call itself is guaranteed (it's nondeterministic). The program may terminate without calling the destructors.
To understand how this pattern handles this problem, create a primitive logging class to see the issues with nondeterministic GC and how to solve them.
The Logger class creates a new file in its constructor. (A constructor is a special method called to create an object.) The class holds a file handle through the _logStream field, and that handle isn’t released automatically when the object goes out of scope:
using System;
using System.IO;

internal class Logger
{
    const string _logfilePath = @"c:\temp\GC.log";
    private readonly StreamWriter _logStream;

    public Logger()
    {
        // Creates (or overwrites) the log file; the handle stays open afterwards.
        _logStream = File.CreateText(_logfilePath);
    }

    public void LogInfo(string s)
    {
        _logStream.WriteLine("info\t" + s);
    }

    public void LogWarning(string s)
    {
        _logStream.WriteLine("warning\t" + s);
    }

    public void LogError(string s)
    {
        _logStream.WriteLine("error\t" + s);
    }
}

internal class Program
{
    static void Main(string[] args)
    {
        Logger _logger = new Logger();
        _logger.LogInfo("Hello world");
        Console.WriteLine("Hello, World!");
        // The stream is never flushed or closed here.
    }
}
In the constructor, create a new file called GC.log. In the Program.Main function, instantiate this class and write to the log file. When you run the program and check the log file, you see that the file has been created but is empty. This is because the file never closes, so it’s never flushed.
Enhance the Logger class as follows:
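internal class Logger
{
    const string _logfilePath = @"c:\temp\GC.log";
    private readonly StreamWriter _logStream;

    // The constructor and the Log* methods are unchanged.

    public void Close()
    {
        // Flushes buffered text and releases the file handle.
        _logStream.Dispose();
    }

    ~Logger()
    {
        // Fallback only: the runtime doesn't guarantee when, or whether, this runs.
        _logStream.Dispose();
    }
}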
When you call the Dispose method explicitly, it disposes of the log file. For a file handle, that means flushing and closing the file and releasing all the resources. Once Main calls _logger.Close, the log is written to disk deterministically, and the destructor serves only as a fallback for cases where Close is never called.
Enhance your code by adding a try finally block:
static void Main(string[] args)
{
    Logger _logger = null;
    try
    {
        _logger = new Logger();
        _logger.LogInfo("Hello world");
    }
    finally
    {
        _logger?.Close();
    }
}
Here are the stats after using the BenchmarkDotNet tool to analyze the code’s runtime performance:
Fig. 1: Performance when implementing the IDisposable pattern
The try finally block ensures that the code always calls the Dispose method. You can express this more concisely with a using block:
using (Logger _logger = new Logger())
{
    _logger.LogInfo("Hello world");
}
This does the same as the previous block of code, but on one condition: The Logger class must implement the IDisposable interface. This interface contains one method, void Dispose, so the changes to your class are minimal:
internal class Logger : IDisposable
{
    const string _logfilePath = @"c:\temp";
    private StreamWriter _logStream = null;

    public Logger()
    {
        string filename = "GC_" + Path.GetRandomFileName();
        _logStream = File.CreateText(Path.Combine(_logfilePath, filename));
        Console.WriteLine($"{filename} created");
    }

    // LogInfo, LogWarning, and LogError are unchanged.

    public void Dispose()
    {
        if (_logStream != null)
        {
            _logStream.Dispose();
            _logStream = null; // avoid problems when Dispose is called more than once
            System.GC.SuppressFinalize(this);
        }
    }

    ~Logger()
    {
        // _logStream is null if the constructor failed before assigning it.
        _logStream?.Dispose();
    }
}
Note that the Logger class should implement the IDisposable pattern because it owns the disposable field _logStream.
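As a further illustration, the following sketch shows an idempotent Dispose combined with a using declaration (available from C# 8), which disposes the object at the end of the enclosing scope without an extra block. ScopedLogger and ScopedLoggerDemo are hypothetical names, and the class writes to any TextWriter rather than to a file so that the example is self-contained. It is sealed and has no destructor because it holds no unmanaged resource directly, which is a simplification compared with the Logger class.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for Logger that writes to any TextWriter.
public sealed class ScopedLogger : IDisposable
{
    private TextWriter _writer;

    public bool IsDisposed => _writer == null;

    public ScopedLogger(TextWriter writer)
    {
        _writer = writer ?? throw new ArgumentNullException(nameof(writer));
    }

    public void LogInfo(string s)
    {
        if (_writer == null) throw new ObjectDisposedException(nameof(ScopedLogger));
        _writer.WriteLine("info\t" + s);
    }

    public void Dispose()
    {
        if (_writer == null) return; // safe to call more than once
        _writer.Dispose();           // flushes and closes the underlying writer
        _writer = null;
    }
}

public static class ScopedLoggerDemo
{
    // Logs one line inside a nested scope and returns what was written.
    public static string Run()
    {
        var buffer = new StringWriter();
        {
            using var logger = new ScopedLogger(buffer); // disposed at the closing brace
            logger.LogInfo("Hello world");
        }
        return buffer.ToString();
    }
}
```

The using declaration compiles to the same try finally structure as the using block, so the choice between them is mostly a matter of readability.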
There isn’t much difference in performance between the IDisposable pattern and the using block or the try finally block for logging:
Fig. 2: Performance when implementing the using or try finally blocks
It’s almost impossible to have memory leaks with value-typed variables. Value types comprise most primitive data types (int, float, et al.), enum data types, and structs.
Value types directly contain their data. For example:
int i = 5;
This reserves four bytes named i with an initial value of 5. It doesn’t allocate dynamic memory, because there’s no new keyword. For a local variable like this one, the storage typically lives on the stack, so the variable is automatically destroyed at the end of its scope.
With reference types, the variable holds only a reference. The object itself is allocated on the heap when the keyword new instantiates a class. When the reference goes out of scope, the allocated memory isn’t freed automatically. It remains until the GC collects it.
For memory purposes, a struct is more efficient than a class because it’s immediately destroyed when it goes out of scope. Unfortunately, structs don’t support inheritance.
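To make the difference concrete, here is a minimal sketch of a small value type used in a loop. ComplexValue and StructDemo are hypothetical names introduced for illustration. Because ComplexValue is a struct, each call to Add returns a copy rather than allocating a new object on the heap.

```csharp
using System;

// Hypothetical value type holding two doubles (16 bytes of data).
public readonly struct ComplexValue
{
    public double Real { get; }
    public double Imaginary { get; }

    public ComplexValue(double real, double imaginary)
    {
        Real = real;
        Imaginary = imaginary;
    }

    // Returns a new value; a struct result needs no heap allocation.
    public ComplexValue Add(ComplexValue x) =>
        new ComplexValue(Real + x.Real, Imaginary + x.Imaginary);
}

public static class StructDemo
{
    // Adds step to start the given number of times.
    public static ComplexValue SumSteps(ComplexValue start, ComplexValue step, int count)
    {
        ComplexValue c = start;
        for (int i = 0; i < count; i++)
        {
            c = c.Add(step); // copies the value instead of creating a heap object
        }
        return c;
    }
}
```

The trade-off is that large structs are expensive to copy, so this approach suits small types like the one above.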
Sometimes, it may be beneficial to make a class immutable, so that its members can be initialized only when the class is instantiated. After the initialization, you can’t modify the members but only read them, which can reduce the complexity of a program.
Here’s a simple example:
internal class Complex
{
    public double Real { get; init; } = 0.0;
    public double Imaginary { get; init; } = 0.0;

    public Complex(double real = 0.0, double imaginary = 0.0)
    {
        Real = real;
        Imaginary = imaginary;
    }

    public Complex Add(Complex x) => new Complex(this.Real + x.Real, this.Imaginary + x.Imaginary);

    public override string ToString()
    {
        return $"{Real:F} + {Imaginary:F}i";
    }
}
The two properties Real and Imaginary are read-only after initialization. There's also an Add function that returns a new Complex object. So, the original object is never modified, but every call creates a new instance:
Complex c = new Complex(1.1, 2.2);
Complex a = new Complex(0.1, 0.1);
for (int i = 0; i < 1000; i++)
{
    c = c.Add(a);
}
Console.WriteLine(c);
Here are the results from the BenchmarkDotNet tool on the above code:
Fig. 3: Performance stats when calling the Add method 1,000 times, which instantiates the Complex class each time. GC cleans unused objects from the heap
When you perform calculations on c, a new Complex object is instantiated in each iteration of the loop. In this case, 1,000 objects are created on the heap. These are deallocated later by the GC.
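You can change the Add function of the Complex class to update and return the existing object instead of instantiating a new object every time. Note that this requires replacing the init accessors on Real and Imaginary with set, so the class is no longer immutable: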
// Requires Real and Imaginary to be declared with set instead of init.
public Complex Add(Complex x)
{
    this.Real += x.Real;
    this.Imaginary += x.Imaginary;
    return this;
}
Here are the BenchmarkDotNet stats:
Fig. 4: In this case, GC didn’t play any role and the memory allocated is 168 B
The .NET string class is immutable. So, every time you append to a string with +=, a new string is instantiated. This is not visible in the code, but it may provoke many GCs. StringBuilder is a mutable class. It maintains an internal buffer for its contents and only allocates memory when the buffer isn’t large enough. This results in fewer memory allocations and fewer GC cycles.
The following example produces a multiplication table as HTML:
public string GenerateTable()
{
    string t = "<table>";
    for (int i = 0; i < _rows; i++)
    {
        t += "<tr>";
        for (int j = 0; j < _cols; j++)
        {
            t += "<td>" + (i * j) + "</td>";
        }
        t += "</tr>";
    }
    t += "</table>";
    return t;
}
Use StringBuilder to rewrite the code as follows:
public string GenerateTableWithStringBuilder()
{
    StringBuilder sb = new StringBuilder("<table>");
    for (int i = 0; i < _rows; i++)
    {
        sb.Append("<tr>");
        for (int j = 0; j < _cols; j++)
        {
            sb.AppendFormat("<td>{0}</td>", i * j);
        }
        sb.Append("</tr>");
    }
    sb.Append("</table>");
    return sb.ToString();
}
You can see from these stats that using StringBuilder optimizes speed and memory use:
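If you can estimate the size of the final string, you can also pass an initial capacity to StringBuilder so that its internal buffer needs to grow less often. The following is a minimal sketch of that idea. TableBuilder and the size estimate are assumptions made for illustration, not measurements from the benchmarks in this article.

```csharp
using System.Text;

public static class TableBuilder
{
    // Builds the same multiplication table, with a pre-sized buffer.
    public static string Generate(int rows, int cols)
    {
        // Rough guess (an assumption): about 16 characters per cell,
        // plus 9 for each row's tags and 16 for the table tags.
        int estimatedLength = 16 + rows * (9 + cols * 16);
        var sb = new StringBuilder("<table>", estimatedLength);

        for (int i = 0; i < rows; i++)
        {
            sb.Append("<tr>");
            for (int j = 0; j < cols; j++)
            {
                // Chained Append calls avoid parsing a format string.
                sb.Append("<td>").Append(i * j).Append("</td>");
            }
            sb.Append("</tr>");
        }

        sb.Append("</table>");
        return sb.ToString();
    }
}
```

If the estimate is too small, StringBuilder still grows as needed, so a wrong guess affects only performance, not correctness.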
Fig. 5: Performance when using StringBuilder
When allocating data structures such as arrays, it’s best to use fixed sizes instead of dynamic sizes. This minimizes the amount of GC that the system must perform because dynamic sizes can often lead to resizing, which can be costly in terms of both time and memory.
Fixed-size arrays are also easier to understand and debug, and they’re more likely to be optimized by the compiler. In general, it’s best to avoid resizing data structures whenever possible.
Here’s some sample code:
<data_type>[] array_identifier = new <data_type>[size_of_array];
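The same idea applies to resizable collections: when the number of items is known in advance, you can pass it as the initial capacity. The sketch below compares a pre-sized List<int> with one that grows on demand. CapacityDemo and its method names are hypothetical, while the List<T>(int capacity) constructor and the Capacity property are part of the standard library.

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Fills a list whose internal array was sized for every item up front.
    public static List<int> FillPresized(int count)
    {
        var list = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            list.Add(i); // never triggers a resize
        }
        return list;
    }

    // Fills a list that starts empty, so its internal array is
    // reallocated and copied each time it runs out of room.
    public static List<int> FillGrowing(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        return list;
    }
}
```

Each reallocation in FillGrowing leaves the previous internal array behind as garbage, which is the extra GC work that sizing up front avoids.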
GC in .NET does a lot to avoid memory problems. Unfortunately, it can’t handle unmanaged resources, such as file handles, database connections, and unmanaged memory. In these cases, you need to implement the IDisposable pattern.
This article showed you how and when to use this pattern properly. It also explained the difference between value and reference types and pointed out that a struct may be better than a class if you don’t need inheritance. Finally, it examined immutable classes, with string as the major example.
.NET implements strings as immutable classes for efficiency and to protect their internal buffers. If you require a lot of mutations on a string, it’s better to use StringBuilder for the intermediate work. This is faster and impacts the GC less.