Bosun Complete Examples

From WikiOD

Linux Bonding Health

Template Definition

template linux.bonding {
    subject = {{.Last.Status}}: {{.Eval .Alert.Vars.by_host}} bad bond(s) on {{.Group.host}}
    body = `{{template "header" .}}
    <h2>Bond Status</h2>
    <table>
    <tr><th>Bond</th><th>Slave</th><th>Status</th></tr>
    {{range $r := .EvalAll .Alert.Vars.slave_status}}
        {{if eq $.Group.host .Group.host}}
            <tr>
                <td>{{$r.Group.bond}}</td>
                <td>{{$r.Group.slave}}</td>
                <td {{if lt $r.Value 1.0}} style="color: red;" {{end}}>{{$r.Value}}</td>
            </tr>
        {{end}}
    {{end}}
    </table>
    `
}

Alert Definition

alert linux.bonding {
    template = linux.bonding
    macro = host_based
    $notes = This alert triggers when a bond only has a single interface, or the status of a slave in the bond is not up
    $slave_status = max(q("sum:linux.net.bond.slave.is_up{bond=*,host=*,slave=*}", "5m", ""))
    $slave_status_by_bond = sum(t($slave_status, "host,bond"))
    $slave_count = max(q("sum:linux.net.bond.slave.count{bond=*,host=*}", "5m", ""))
    $no_good = $slave_status_by_bond < $slave_count || $slave_count < 2
    $by_host = max(t($no_good, "host"))
    warn = $by_host
}
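
The expression chain in this alert can be hard to follow at first, so here is a rough Python sketch of the same logic. It is not how Bosun evaluates expressions; the function name and the dictionary layout are assumptions made for illustration only. Note that because `$by_host` uses `max`, the value per host is 1 when at least one bond is bad and 0 otherwise.

```python
from collections import defaultdict


def bad_bonds_by_host(slave_is_up, slave_count):
    """Hypothetical stand-in for the alert's expression chain.

    slave_is_up maps (host, bond, slave) -> 1 if the slave is up, else 0.
    slave_count maps (host, bond) -> number of slaves in the bond.
    """
    # $slave_status_by_bond: number of slaves that are up in each bond
    up_per_bond = defaultdict(int)
    for (host, bond, _slave), is_up in slave_is_up.items():
        up_per_bond[(host, bond)] += is_up

    # $no_good: fewer slaves up than the bond has, or fewer than 2 slaves
    # $by_host: a host is flagged if any one of its bonds is no good
    by_host = defaultdict(int)
    for (host, bond), count in slave_count.items():
        no_good = up_per_bond[(host, bond)] < count or count < 2
        by_host[host] = max(by_host[host], int(no_good))
    return dict(by_host)
```

A host with two slaves in `bond0` where one is down would map to 1 (warn), while a host with both slaves up would map to 0.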

Notification Preview

[Image: notification preview]

SSL Certs Expiring

This data is collected by httpunit and scollector. The alert warns when a certificate is going to expire within a certain number of days, and goes critical once the certificate has passed its expiration date. This follows the recommended default usage of warn and crit in Bosun (warn: something is going to fail, crit: something has failed).

Template Definition

template ssl.cert.expiring {
    subject = {{.Last.Status}}: SSL Cert Expiring in {{.Eval .Alert.Vars.daysLeft | printf "%.2f"}} Days for {{.Group.url_host}}
    body = `
    {{ template "header" . }}
    <table>
       <tr>
            <td>Url</td>
            <td>{{.Group.url_host}}</td>
       </tr>
       <tr>
            <td>IP Address Used for Test</td>
            <td>{{.Group.ip}}</td>
       </tr>
       <tr>
            <td>Days Remaining</td>
            <td>{{.Eval .Alert.Vars.daysLeft | printf "%.2f"}}</td>
       </tr>
       <tr>
            <td>Expiration Date</td>
            <td>{{.Last.Time.Add (parseDuration (.Eval .Alert.Vars.hoursLeft | printf "%vh")) }}</td>
       </tr>
    </table>
    `
}

Alert Definition

alert ssl.cert.expiring {
    template = ssl.cert.expiring
    ignoreUnknown = true
    $notes = This alert exists to notify us of any SSL certs that will be expiring for hosts monitored by our http unit test cases defined in the scollector configuration file.
    $expireEpoch = last(q("min:hu.cert.expires{host=ny-bosun01,url_host=*,ip=*}", "1h", ""))
    $hoursLeft = ($expireEpoch - epoch()) / d("1h")
    $daysLeft = $hoursLeft / 24
    warn = $daysLeft <= 50
    crit = $daysLeft <= 0
    warnNotification = default
    critNotification = default
}

Alert Explanation

  • q(..) queries OpenTSDB, one of Bosun's supported backends. It returns a type called a seriesSet (a set of time series, each identified by its tags).
  • last() takes the last value of each series in the seriesSet and returns a numberSet.
  • The metric hu.cert.expires returns the Unix timestamp of when the cert will expire.
  • epoch() returns the current Unix timestamp, so subtracting the current Unix timestamp from the expiration epoch gives us the remaining time in seconds.
  • d() returns the number of seconds represented by the duration string. The duration string uses the same units as OpenTSDB. Dividing the remaining seconds by d("1h") therefore gives the remaining hours.
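
The arithmetic described above can be sketched in plain Python. This is only an illustration of the calculation, not Bosun code: the function names and the `warn_days` parameter are assumptions, and the thresholds mirror the `warn` and `crit` lines of the alert.

```python
from datetime import datetime, timedelta, timezone


def cert_status(expire_epoch, now_epoch, warn_days=50):
    """Mirror $hoursLeft, $daysLeft and the warn/crit thresholds."""
    hours_left = (expire_epoch - now_epoch) / 3600  # d("1h") is 3600 seconds
    days_left = hours_left / 24
    if days_left <= 0:
        status = "critical"  # crit = $daysLeft <= 0
    elif days_left <= warn_days:
        status = "warning"   # warn = $daysLeft <= 50
    else:
        status = "normal"
    return hours_left, days_left, status


def expiration_date(last_time, hours_left):
    """Recover the expiration date the way the template does:
    the time of the last evaluation plus the remaining hours."""
    return last_time + timedelta(hours=hours_left)
```

For example, a certificate that expires ten days after the evaluation time yields 240 hours left, 10 days left, and a warning status.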

Notification Preview

[Image: notification preview]

Example Section of scollector.toml referencing the config for httpunit test cases:

[[HTTPUnit]]
  TOML = "/opt/httpunit/data/httpunit.toml"

Header Template

In Bosun, templates can reference other templates. For email notifications, you might have a header template that shows the things you want in all alerts.

Header Template

template header {
    body = `
    <style>
    td, th {
        padding-right: 10px;
    }
    </style>
    <p style="font-weight: bold; text-decoration: underline;">
        <a style="padding-right: 10px;" href="{{.Ack}}">Acknowledge</a>
        <a style="padding-right: 10px;" href="{{.Rule}}">View Alert in Bosun's Rule Editor</a>
        {{if .Group.host}}
            <a style="padding-right: 10px;" href="https://status.stackexchange.com/dashboard/node?node={{.Group.host}}">View {{.Group.host}} in Opserver</a>
            <a href="http://kibana.ds.stackexchange.com/app/kibana?#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-15m,mode:quick,to:now))&_a=(columns:!(_source),index:%5Blogstash-%5DYYYY.MM.DD,interval:auto,query:(query_string:(analyze_wildcard:!t,query:'logsource:{{.Group.host}}')),sort:!('@timestamp',desc))">View {{.Group.host}} in Kibana</a>
        {{end}}
    </p>
    <table>
        <tr>
            <td><strong>Key: </strong></td>
            <td>{{printf "%s%s" .Alert.Name  .Group }}</td>
        </tr>
        <tr>
            <td><strong>Incident: </strong></td>
            <td><a href="{{.Incident}}">#{{.Last.IncidentId}}</a></td>
        </tr>
    </table>
    <br/>
    {{if .Alert.Vars.notes}}
        <p><strong>Notes:</strong> {{html .Alert.Vars.notes}}</p>
    {{end}}
    {{if .Alert.Vars.additionalNotes}}
        <p>
        {{if not .Alert.Vars.notes}}
            <strong>Notes:</strong>
        {{end}}
        {{ html .Alert.Vars.additionalNotes }}</p>
    {{end}}
    `
}

Explanations:

  • <style>...: Although style blocks are not supported in email, Bosun processes style blocks and inlines them into the HTML. This makes the block shared CSS for any template that includes this template.
  • The .Ack link takes you to a Bosun view where you can acknowledge the alert. The .Rule link takes you to Bosun's rule editor with the template, rule, and time of the alert filled in, so you can modify the alert or run it at different times.
  • {{if .Group.host}}...: .Group is the tagset of the alert. When the warn or crit expression has tags like host=*, we know the alert refers to a specific host in our environment, so we show some links to host-specific things.
  • The alert name and key are included to ensure that at least the most basic information is in every alert.
  • .Alert.Vars.notes is included so that if someone defines the $notes variable in an alert, it will be shown in the notification. This encourages people to write notes explaining the purpose of the alert and how to interpret it.
  • .Alert.Vars.additionalNotes is there in case we want to define a macro with notes, and then have instances of that macro add more notes to the macro notes.
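
The interplay between notes and additionalNotes in the header template can be sketched in Python as follows. This is a rough approximation under stated assumptions: `render_notes` is a hypothetical helper, whitespace is not reproduced exactly, and Python's `html.escape` encodes quote characters slightly differently from the template's `html` function.

```python
import html


def render_notes(alert_vars):
    """Approximate the notes section of the header template.

    alert_vars is a dict standing in for .Alert.Vars.
    """
    notes = alert_vars.get("notes")
    additional = alert_vars.get("additionalNotes")
    parts = []
    if notes:
        parts.append(f"<p><strong>Notes:</strong> {html.escape(notes)}</p>")
    if additional:
        # The "Notes:" label is only repeated when no macro notes exist
        label = "" if notes else "<strong>Notes:</strong> "
        parts.append(f"<p>{label}{html.escape(additional)}</p>")
    return "\n".join(parts)
```

With both variables set, the label appears once, followed by the macro notes and then the instance-specific notes in their own paragraph.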

Credit:Stack_Overflow_Documentation