Sometimes you would like to see how well your system is performing, but this information is hard to extract from all the data available to you. Processor usage, queue lengths, message rates, database transaction times and counts: they do not tell the full story. You, as the developer of the system, know best which parts of the system and which processes should be monitored, and how. You would like to measure the number of business operations performed and the time they take. But how do you collect this information from multiple machines and present it in graphical form? There are many options; here I will show how to do this using the Seq log collector.

Why Seq?

There are many other options, each with its strengths and weaknesses:

  • Windows performance counters - available on every Windows box, but reporting from them across machines is hard
  • ELK stack - powerful, yet challenging to set up
  • Application Insights - also powerful, but its latency and UX put many people off
  • a myriad of other systems, like New Relic

And then there is Seq, which I find really sleek, fast and easy to set up (unfortunately it's Windows-only). It works really well together with Serilog, a structured logging framework for .NET.

So let's assume you're already using Seq for log collection and browsing, and now you're wondering how to use it to visualize the performance of your system. Seq does not have a powerful analytics or visualization engine, but let's see what we can do.

Business transaction monitoring

First we need a piece of software to monitor as an example. I created a simple producer-consumer simulation, in which we'll monitor the queue length and the number of items processed per second.

The code is available here.

In a real system you'd be monitoring the duration of a business transaction triggered by a message, for instance the time it takes to compose an order and write it into the database. Observing this value over longer periods of time is especially interesting, because it could tell you the process is slowing down over time and you need to take some action (add a missing index?) before it's too late. Or maybe that new feature you added recently has a negative impact on the order-placing functionality, which is not something you expected. With proper monitoring you know these things up front and can prevent disasters from happening.
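
For example, timing such a transaction and emitting it as a structured event could look roughly like this (PlaceOrder, order and the OrderPlaced event name are illustrative, not part of the sample project):

var stopwatch = Stopwatch.StartNew();   // System.Diagnostics
await PlaceOrder(order);                // the actual business transaction
stopwatch.Stop();

Log.Debug("Event {EventName}, order was {OrderId}, duration was {DurationInMs} ms",
    "OrderPlaced", order.Id, stopwatch.ElapsedMilliseconds);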

The simulation

Let's produce some data.

static async Task ProduceData(BlockingCollection<int> queue, CancellationToken token) 
{
    var i = 0;
    try
    {
        while (!token.IsCancellationRequested)
        {
            // simulate a variable production rate
            var producingTime = GetCurrentDelay();
            await Task.Delay(producingTime, token);

            Log.Debug("Event {EventName}, item was {Item}, duration was {DurationInMs} ms", 
                "ItemProduced", i, producingTime.TotalMilliseconds);

            queue.Add(i++);
        }
    }
    catch (TaskCanceledException) { }

    // signal consumers that no more items will arrive
    queue.CompleteAdding(); 
}

Note the EventName and DurationInMs properties logged here. Thanks to the structured logging we're doing, we'll be able to filter and aggregate on those properties in our monitoring solution. No need to tediously parse text logs to extract this information.
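
For instance, once the events reach Seq, the same properties let us narrow the stream down to unusually slow production events with a query along these lines (the 500 ms threshold is arbitrary):

select count(*) 
from stream 
where EventName = 'ItemProduced' and DurationInMs > 500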

GetCurrentDelay adds some variance to the data production rate while following a particular pattern.
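
Its implementation is in the linked repository; one way to get this "variance with a pattern" effect would be a slow wave with random jitter on top (a sketch, not necessarily the actual code):

static readonly Random Rng = new Random();

static TimeSpan GetCurrentDelay()
{
    // base delay oscillates slowly between 250 and 750 ms
    var phase = DateTime.UtcNow.Second / 60.0 * 2 * Math.PI;
    var baseMs = 500 + 250 * Math.Sin(phase);

    // plus some random jitter
    var jitterMs = Rng.Next(-100, 100);

    return TimeSpan.FromMilliseconds(baseMs + jitterMs);
}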

Now we need a data consumer.

static async Task ConsumeData(BlockingCollection<int> queue, CancellationToken token) 
{
    try
    {
        var random = new Random();

        foreach (var item in queue.GetConsumingEnumerable(token)) 
        {
            // simulate the time needed to process an item
            var consumingTime = random.Next(MinProcessingTimeInMs, MaxProcessingTimeInMs);
            await Task.Delay(consumingTime, token);

            Log.Debug("Event {EventName}, item was {Item}, duration was {DurationInMs} ms", 
                "ItemConsumed", item, consumingTime);
        }
    }
    catch (TaskCanceledException) { }
    catch (OperationCanceledException) { }
}

We would also like to monitor the current length of the queue. Here we'll do this outside of the producer and consumer, sampling the value about twice a second.

static async Task ObserveQueueLength(BlockingCollection<int> queue, CancellationToken token)
{
    try {
        while(!token.IsCancellationRequested)
        {
            await Task.Delay(500, token);
            Log.Information("Metric {MetricName} = {MetricValue}", "QueueLength", queue.Count);
        }
    }
    catch (TaskCanceledException) { }
}

Now we need to put all of this together and configure logging to Seq.

Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Debug()
    .WriteTo.Seq("http://localhost:5341")
    .WriteTo.ColoredConsole()
    .CreateLogger();
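
(This assumes the Serilog, Serilog.Sinks.Seq and Serilog.Sinks.ColoredConsole NuGet packages are referenced, and a Seq instance listening on its default port 5341.)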

var cts = new CancellationTokenSource();
var queue = new BlockingCollection<int>();

var tasks = new[] 
{
    Task.Run(() => ConsumeData(queue, cts.Token)),
    Task.Run(() => ProduceData(queue, cts.Token)),
    Task.Run(() => ObserveQueueLength(queue, cts.Token))
};
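
The snippet above only starts the tasks; to shut the simulation down cleanly, a console host could finish along these lines (a minimal sketch, not necessarily what the sample project does):

Console.WriteLine("Press any key to stop...");
Console.ReadKey(true);

cts.Cancel();          // ask the producer, consumers and observer to stop
Task.WaitAll(tasks);   // wait until they finish
Log.CloseAndFlush();   // push any buffered events to Seq before exiting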

Making sense out of the logs

After running the simulation for a little while, we can browse the logs in the Seq web interface and do some simple filtering on the properties of the log entries.

[Image: log entries in the Seq web interface]

With simple SQL-like queries and visualizations (introduced in Seq 3.3) we can also see things like the number of items consumed per second:

select count(*) 
from stream 
where EventName = 'ItemConsumed'
group by time(1s)
order by time desc 
limit 300

[Image: chart of messages consumed per second]

The limit 300 clause, combined with ordering by time, gives us the 300 most recent results, which in this case means one result per second for the last 300 seconds.

Let's also take a look at the mean queue length per second:

select mean(MetricValue) 
from stream 
where MetricName = 'QueueLength'
group by time(1s)
order by time desc 
limit 300

[Image: chart of queue length]

Uh oh, seems like our queue is growing indefinitely. Let's fix this by adding another consumer.

var tasks = new[] 
{
    Task.Run(() => ConsumeData(queue, cts.Token)),
    Task.Run(() => ConsumeData(queue, cts.Token)), // add one more
    Task.Run(() => ProduceData(queue, cts.Token)),
    Task.Run(() => ObserveQueueLength(queue, cts.Token))
};

And now it looks much better :)

[Image: messages consumed per second, after adding a second consumer]

[Image: queue length, after adding a second consumer]

Summary

If you need advanced performance analytics for your system, Seq is not for you. If, however, you are already using it to collect logs in a distributed system, why not try to extract some valuable information out of them?

About the author:

Marcin Budny

Team lead of R&D at BT Skyrise. He works with Intelligent Transport Systems (ITS), where he faces both hardware and software challenges. He is passionate about application architecture, cloud computing and the quality of IT systems, and constantly keeps an eye out for new developments in software engineering. Marcin specializes in .NET, but looks for inspiration in other languages and platforms.

