Theory and Design of the Automatic Statistical Backoff Algorithm for Agents Sending Events

The idea is to let the Partout Master control and statistically throttle the volume of events sent by large numbers of agents.

Event Content

	Field	Description
¹	agent_uuid	Agent's unique id
	hostname	Agent's hostname
	arch	Agent's architecture
	platform	Agent's platform
	os_release	Agent's Operating System release
	os_family	Agent's Operating System family
	os_dist_name	Agent's Operating System Distribution name
	os_dist_version_id	Agent's Operating System Distribution version identifier
²	module	Name of the agent module sending this event
³	object	Name or title of the object subject to this event
⁴	msg	Description of the Event or Action

¹ Primary agent identifier

⁴

Event Content Throttle Levels

Level	Aggregates
0	~~Agent alive msgs at Agent UUID level~~
1	Counts by agent_uuid
2	Counts by agent_uuid & module
3	Counts by agent_uuid & module & object
4	Counts by agent_uuid & module & object & msg

Default starts at level 4.

Event Throttling by Time Period

The Master tells the agents (via its responses to event sends, i.e. POSTs to /events) to use an event collection period based on:

E^m = Events per minute arriving at the master

P^s = Collection Period in seconds

D = Event rate divisor

P^min = Minimum floor collection period in seconds, e.g. 10 secs

P^max = Maximum collection period in seconds, whereupon the agents must implement event level aggregation

P^s = int(E^m / D)

if P^s < P^min then P^s = P^min

if P^s > P^max then P^s = P^max
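
The period rule above can be sketched as a small function. This is only an illustration: the function name and the default values for `divisor` and `pMax` are assumptions, not values from this design (only the 10 second floor is taken from the text).

```javascript
// Sketch of the collection-period rule: P^s = int(E^m / D), clamped to [P^min, P^max].
// divisor = 100 and pMax = 300 are illustrative assumptions;
// pMin = 10 follows the "e.g. 10 secs" in the text.
function collectionPeriodSecs(eventsPerMin, { divisor = 100, pMin = 10, pMax = 300 } = {}) {
  const raw = Math.trunc(eventsPerMin / divisor); // int(E^m / D)
  return Math.min(pMax, Math.max(pMin, raw));     // apply the floor, then the ceiling
}

console.log(collectionPeriodSecs(500));    // 5 is below the floor, so 10
console.log(collectionPeriodSecs(6000));   // 60
console.log(collectionPeriodSecs(100000)); // 1000 exceeds the ceiling, so 300
```

With these assumed values, the period grows linearly with the arrival rate between 1000 and 30000 events per minute and is pinned to the floor or ceiling outside that range.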

Event Throttling by Aggregate Level (≥ P^max)

if P^s ≥ P^max then ...

Thresholds of E^m events per minute:

T^obj = 500

T^mod = 2000

T^uuid = 5000

T^max = 10000

Calculate Aggregate Level:

if E^m < T^obj

A^level = 4 // all detail

if T^obj ≤ E^m < T^mod

A^level = 3 // aggregate at Object

if T^mod ≤ E^m < T^uuid

A^level = 2 // aggregate at Module

if T^uuid ≤ E^m < T^max

A^level = 1 // aggregate at Agent UUID

~~if E^m ≥ T^max~~

~~A^level = 0 // aggregate on Agent alive msgs (same content as level 1)~~
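
As a rough sketch, the level calculation could look like the following. The function and constant names are hypothetical. Because the level 0 rule is struck out, this sketch assumes rates at or above T^max simply stay at level 1; gating on P^s ≥ P^max is left to the caller.

```javascript
// Thresholds from the text, in events per minute arriving at the master.
const THRESHOLDS = { obj: 500, mod: 2000, uuid: 5000, max: 10000 };

// Map an arrival rate E^m to an aggregate level.
// Assumption: with level 0 struck out, anything >= T^uuid (including >= T^max)
// is treated as level 1.
function aggregateLevel(eventsPerMin, t = THRESHOLDS) {
  if (eventsPerMin < t.obj) return 4;  // all detail
  if (eventsPerMin < t.mod) return 3;  // aggregate at Object
  if (eventsPerMin < t.uuid) return 2; // aggregate at Module
  return 1;                            // aggregate at Agent UUID
}

console.log(aggregateLevel(499));  // 4
console.log(aggregateLevel(500));  // 3
console.log(aggregateLevel(2000)); // 2
console.log(aggregateLevel(5000)); // 1
```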

Examples of Aggregated Events

{
  agent: {  // level 0/1
    uuid: ...,
    count: n,
    modules: {  // level 2
      module_name: {
        count: n,
        objects: {  // level 3
          object_name_base64: {
            count: n,
            messages: {  // level 4
              msg_base64: {
                level: error|info,
                count: n
              }, ...
            }
          }, ...
        }
      }, ...
    }
  },
  period_secs: n
}

e.g.:

{
  agent: {
    uuid: '89609ae7-c955-4109-a2b8-4d0e7edcf460',
    count: 1,
    modules: {
      'file': {
        count: 1,
        objects: {
          '8765765jhgfjfhgf876587': {  // for special chars handling
            count: 1,
            messages: {
              '786576576576dsfsdfsfdsdf976876': {
                level: 'info',
                count: 1
              }
            }
          }
        }
      }
    }
  },
  period_secs: 10
}
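
One way an agent might build this structure from raw events is sketched below. The helper names are hypothetical, and the sketch assumes each raw event carries `module`, `object`, `msg` and a `level` field (the last is not listed under Event Content). It uses Node.js `Buffer` for the base64 keys and omits deeper levels of detail when the aggregate level is lower.

```javascript
// Base64-encode a key so special characters are safe in property names (Node.js Buffer).
const b64 = (s) => Buffer.from(String(s), 'utf8').toString('base64');

// Return container[key], creating it via init() on first use, and bump its count.
function bump(container, key, init = () => ({ count: 0 })) {
  if (!Object.prototype.hasOwnProperty.call(container, key)) container[key] = init();
  container[key].count += 1;
  return container[key];
}

// Roll up one agent's raw events, keeping only the detail that `level` (1 to 4) allows.
function aggregateEvents(uuid, events, level, periodSecs) {
  const agent = { uuid, count: 0 };
  for (const ev of events) {
    agent.count += 1;                                   // level 1: by agent_uuid
    if (level < 2) continue;
    agent.modules = agent.modules || {};
    const mod = bump(agent.modules, ev.module);         // level 2: by module
    if (level < 3) continue;
    mod.objects = mod.objects || {};
    const obj = bump(mod.objects, b64(ev.object));      // level 3: by object
    if (level < 4) continue;
    obj.messages = obj.messages || {};
    bump(obj.messages, b64(ev.msg), () => ({ level: ev.level, count: 0 })); // level 4: by msg
  }
  return { agent, period_secs: periodSecs };
}

const payload = aggregateEvents('89609ae7-c955-4109-a2b8-4d0e7edcf460', [
  { module: 'file', object: '/etc/motd', msg: 'created', level: 'info' },
  { module: 'file', object: '/etc/motd', msg: 'created', level: 'info' },
], 3, 10);
console.log(JSON.stringify(payload, null, 2));
```

At level 3 the two sample events collapse into a single object entry with a count of 2 and no `messages` map.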

RESTful API

POST /events with a JSON body as above

(Note: the /event API will be deprecated.)

Events Response for Throttling

The /events API always responds with the current aggregate throttling settings for the agents, e.g.:

res
.status(500)
.send({
  aggregate_period_secs: 60,
  aggregate_period_splay: 0.05,
  aggregate_level: 4,
  notify_alive_period_secs: 60 * 60 * 24
});

Where:

aggregate_period_secs = P^s
aggregate_level = 0 - 4
notify_alive_period_secs = final fallback level: how often, in seconds, the agent notifies the Master that it is alive (includes the above aggregate counts at uuid detail, level 0).
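
On the agent side, these settings might be turned into a send delay roughly as follows. The helper name is hypothetical, and the meaning of `aggregate_period_splay` is not defined here: this sketch assumes it is a plus-or-minus fraction of the period used as random jitter, so that many agents do not all send at the same instant.

```javascript
// Hypothetical agent-side helper: derive the delay before the next send from
// the master's response. Treating aggregate_period_splay as a +/- fraction of
// the period is an assumption; the design text does not define it.
function nextSendDelayMs(settings, random = Math.random) {
  const period = settings.aggregate_period_secs;
  const splay = settings.aggregate_period_splay || 0;
  const jitter = (random() * 2 - 1) * splay * period; // within +/- splay * period
  return Math.round((period + jitter) * 1000);
}

const settings = { aggregate_period_secs: 60, aggregate_period_splay: 0.05, aggregate_level: 4 };
console.log(nextSendDelayMs(settings)); // somewhere between 57000 and 63000 ms
```

Passing the random source as a parameter keeps the helper deterministic in tests.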

Aggregate Event Storage

Data will be stored in the ArangoDB database.

Detail Levels

Period	Detail Level
Current day	Raw aggregate data from agents
Past 7 days	Hourly aggregates
Past 31 days	Daily aggregates
Past 365 days (or more)	Weekly aggregates
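
The table above could be expressed as a lookup like the one below. The function name and return labels are hypothetical, and treating "current day" as data less than one day old is a simplifying assumption.

```javascript
// Sketch: choose the stored granularity from the age of the data in days.
// Assumption: "current day" is approximated as age < 1 day.
function storageGranularity(ageDays) {
  if (ageDays < 1) return 'raw';     // raw aggregate data from agents
  if (ageDays <= 7) return 'hourly'; // past 7 days
  if (ageDays <= 31) return 'daily'; // past 31 days
  return 'weekly';                   // past 365 days (or more)
}

console.log(storageGranularity(0.5)); // raw
console.log(storageGranularity(3));   // hourly
console.log(storageGranularity(20));  // daily
console.log(storageGranularity(200)); // weekly
```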

To Consider

Combination of per-Master traffic thresholds AND per-agent traffic thresholds?
