Data analysis

Estimated time: 15–25 minutes

This tutorial shows you how to detect anomalies in stock data.

You will obtain the last three months of closing prices for Tesla stock and simulate the experience of streaming real-time data and acting upon it. The service will calculate the stock price's trailing five-day average and deviation. Then, the service will log a message indicating whether to buy, sell, or hold, depending on whether the price is below, above, or within the normal range, or "control band."

In addition to anomaly detection, this tutorial provides an introduction to the concept of manipulating the timing of signals in streaming data.


Create service

This service will perform statistical analysis on streaming data to determine outliers from the norm.

  1. Click create new service.
  2. In the service name box, enter detect-outliers.

Request data

First, use an IdentityIntervalSimulator block to drive your service. Then, configure an HTTPRequests block so that whenever it receives a signal, it requests three months of historical Tesla stock data from the IEX API.

  1. Drag an IdentityIntervalSimulator block onto your canvas.
  2. Configure the block as follows:
    1. Name: Simulate 1 Signal Per Day
    2. Days: 1
    3. Seconds: 0
  3. Click accept.

Once each day, a signal will be generated to drive your service.

  1. Drag an HTTPRequests block onto your canvas.
  2. Configure the block as follows:
    1. Name: Get Tesla History 3m
    2. HTTP Method: GET
    3. URL Target: https://api.iextrading.com/1.0/stock/tsla/chart/3m
      Here is a breakdown of the above URL:
      • https://api.iextrading.com/1.0 is the base URL
      • /stock indicates the IEX API stock collection
      • /tsla is the stock symbol for Tesla
      • /chart/3m denotes time series data /chart for the last three months /3m
  3. Click accept.
  4. From the configured tab of the block library, drag the Logger block onto your canvas.
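The URL breakdown above can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable names are hypothetical and are not nio block properties, and the code builds the URL string without sending a request.

```python
# Assemble the request URL from the parts described in the breakdown.
# All variable names here are illustrative assumptions.
BASE_URL = "https://api.iextrading.com/1.0"
collection = "stock"   # the IEX API stock collection
symbol = "tsla"        # stock symbol for Tesla
time_range = "3m"      # last three months of chart data

url = f"{BASE_URL}/{collection}/{symbol}/chart/{time_range}"
print(url)
```

Changing `symbol` or `time_range` in this sketch shows which part of the URL Target you would edit to request a different stock or period.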

 

[info] Why only once a day?

Since the API data you are polling only changes once a day, producing a signal more frequently is not necessary. Don't worry, you won't have to wait days to see the results of your service!


Connect blocks and run service

  1. Connect the blocks according to the image at right.
  2. Click save.
  3. Click start.
  4. Click open logger panel to view the last three months of Tesla stock data.

The past three months of Tesla stock data are now available in your service. Each day arrives as a separate signal, but all of the signals arrive at once, in one big chunk.


Add timing blocks

As mentioned before, the data from the API arrives as individual signals, but all at once. This is not how streaming data usually works. You will now use the Queue block to break up this chunk of data and simulate streaming data over time, as if you were sent the closing stock price at the end of each day. In this case, you will simulate receiving one day of stock data each second.

First, configure an AttributeSelector block to trim your signal to include only the date and close attributes.

  1. Drag an AttributeSelector block onto your canvas.
  2. Configure the block as follows:
    1. Name: Select Date and Close
    2. Click the + Incoming signal attributes twice and enter the following values:
      • Incoming signal attributes box 1: date
      • Incoming signal attributes box 2: close
    3. Selector Mode: WHITELIST
  3. Click accept.
  4. Drag a Queue block onto your canvas.
  5. Configure the block as follows:
    1. Name: Stream One Signal Per Second
    2. Load from Persistence?: False (deselect the radio button)
    3. Notification Interval: 1 second
  6. Click accept.


[info] Understanding the Queue block

The basic configuration of the Queue block involves a Capacity, a Chunk Size, and a Notification Interval. Think of the queue as a tube that can hold a single column of balls. Each ball is a signal. The capacity of the queue is the length of the tube—how many signals, or balls, the queue can hold. The chunk size is the number of signals to release at once from the end of the tube, and the notification interval is the frequency with which each chunk of signals is emitted.


[info] Capacity and chunk size

The default Capacity of 100 is enough to hold three months of stock data (approximately 67 signals). The default Chunk Size of 1 will emit one signal at a time.
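The relationship between capacity, chunk size, and notification interval can be sketched as a toy model in Python. This is not the Queue block's implementation: the class name `SketchQueue`, its methods, and the choice to reject signals when the queue is full are all assumptions made for illustration. Each call to `emit()` stands in for one notification interval.

```python
from collections import deque


class SketchQueue:
    """Toy model of a capacity-limited queue that emits signals in chunks."""

    def __init__(self, capacity=100, chunk_size=1):
        self.capacity = capacity
        self.chunk_size = chunk_size
        self._signals = deque()

    def put(self, signal):
        # Assumption: a full queue rejects new signals.
        # The real block may handle overflow differently.
        if len(self._signals) >= self.capacity:
            return False
        self._signals.append(signal)
        return True

    def emit(self):
        # Stands in for one notification interval:
        # release up to chunk_size of the oldest signals.
        count = min(self.chunk_size, len(self._signals))
        return [self._signals.popleft() for _ in range(count)]


queue = SketchQueue(capacity=100, chunk_size=1)
for day in range(67):          # roughly three months of daily signals
    queue.put({"day": day})

first = queue.emit()           # the oldest signal
second = queue.emit()          # the next oldest signal
remaining = len(queue._signals)
```

With a chunk size of 1, each interval releases exactly one signal, oldest first, which is how the chunk from the HTTP request becomes a steady stream.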


[info] Why is timing important?

The Queue block enables you to control the timing of your signals. When you configure the Stream One Signal Per Second block to emit one signal every second, you break up the chunk of data received from the HTTP request into signals that will be emitted at known intervals.

Timing can be used in creative ways in stream processing. This service introduces timing as a way to spread the signals out so you know when they will be emitted. Timing has other uses in stream processing such as sorting and grouping signals.


Calculate rolling average

In systems containing natural variation, it can be difficult to determine what is "normal." Using a moving range can help. The long-term mean and standard deviation are not as relevant to the current trend as are the near-term mean and standard deviation. Every band of data has its own "normal." The ControlBands block will create statistics such as mean and standard deviation from a moving range.

  1. Drag a ControlBands block onto your canvas.
  2. Configure the block as follows:
    1. Name: Calculate Rolling Average Every 5 Seconds
    2. Load from Persistence?: False (deselect the radio button)
    3. Days: 0
    4. Seconds: 5
    5. Value: {{ $close }}
  3. Click accept.
  4. Connect the blocks as configured in the diagram.

In this block, because we controlled the timing of the signals, five seconds is the same as five signals (or five days) worth of streaming data. The ControlBands block calculates the mean and deviation of the signal's close value over a rolling window, in this case, the last five signals.
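A rolling mean and deviation over the last five values can be sketched in plain Python as follows. This is a minimal illustration, not the ControlBands block's code: it assumes the current value is included in the window and uses the population standard deviation, neither of which this tutorial confirms. The prices are made up.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_stats(values, window=5):
    """Yield (value, mean, deviation) over the trailing `window` values."""
    recent = deque(maxlen=window)  # old values fall off automatically
    for value in values:
        recent.append(value)
        # Assumption: the window includes the current value and
        # the deviation is a population standard deviation.
        yield value, mean(recent), pstdev(recent)


closes = [10.0, 12.0, 11.0, 13.0, 14.0, 20.0]  # made-up closing prices
stats = list(rolling_stats(closes))

for value, avg, dev in stats:
    print(f"close={value:.2f} mean={avg:.2f} deviation={dev:.2f}")
```

Until five values have arrived, the statistics in this sketch are computed from whatever history is available, so the earliest results rest on very little data.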

If you start your service, you will see "band_data" statistics appended to the signals in the logger panel. If each signal were formatted across separate lines, it would resemble this:

{
  "band_data": {
    "deviation": 0.5141843971631065,
    "deviations": 13.438758620689962,
    "mean": 340.41,
    "value": 347.32
  },
  "close": 347.32,
  "date": "2017-06-05"
}
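In this sample, the "deviations" value appears to be the distance of the value from the mean, measured in units of the deviation. That reading is inferred from the numbers above rather than stated in this tutorial, so treat the check below as an assumption to verify against your own output.

```python
# The band_data values from the sample signal above.
band_data = {
    "deviation": 0.5141843971631065,
    "deviations": 13.438758620689962,
    "mean": 340.41,
    "value": 347.32,
}

# Assumed relationship: deviations = (value - mean) / deviation
computed = (band_data["value"] - band_data["mean"]) / band_data["deviation"]
print(round(computed, 6), round(band_data["deviations"], 6))
```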

Define outliers and create message

You can use this band data to define what is "normal" for your system, and create a message that tells you whether the close value in each signal is within the normal range or is an outlier. A ConditionalModifier block is a good choice for this task because it lets you add an attribute, in this case a message, whose value depends on conditions that you define.

Following the old adage of "buy low, sell high," one message will be to buy when the price of the stock is trending below the norm and another message will be to sell when the price is trending above the norm. A final message, to "hold," will result when the stock value is within a normal range or "control band."

  1. Drag a ConditionalModifier block onto your canvas.
  2. Configure the block as follows:
    1. Name: Define Outliers
    2. Click + Field once
    3. Field: message
    4. Click + Lookup three times
    5. Enter the expression that needs to be true for the price to be considered above the normal range:
      • Formula Box 1: {{ $close > $band_data['mean'] + $band_data['deviation'] }}
        The values 'mean' and 'deviation' nested in $band_data can be accessed with bracket notation.
      • Value Box 1: Price is above normal, SELL!
    6. Enter the expression that needs to be true for the price to be considered below the normal range:
      • Formula Box 2: {{ $close < $band_data['mean'] - $band_data['deviation'] }}
      • Value Box 2: Price is below normal, BUY!
    7. If neither of the previous formulas evaluates to true, this message will be returned:
      • Formula Box 3: {{ True }} (do not change)
      • Value Box 3: Price within normal bounds, HOLD.
  3. Click accept.
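The three lookups configured above are evaluated in order, and the first one that is true supplies the message. The same logic can be sketched as a plain Python function. The function name `classify` is hypothetical, and this is an illustration of the conditions rather than the ConditionalModifier block's own code.

```python
def classify(signal):
    """Return the message for a signal that carries close and band_data."""
    close = signal["close"]
    band = signal["band_data"]
    upper = band["mean"] + band["deviation"]  # top of the control band
    lower = band["mean"] - band["deviation"]  # bottom of the control band

    if close > upper:
        return "Price is above normal, SELL!"
    if close < lower:
        return "Price is below normal, BUY!"
    # Fallback, equivalent to the {{ True }} lookup.
    return "Price within normal bounds, HOLD."


sample = {
    "band_data": {"deviation": 0.5141843971631065, "mean": 340.41},
    "close": 347.32,
    "date": "2017-06-05",
}
message = classify(sample)
print(message)
```

Because the fallback always matches, it must come last. Placing it first would hide the other two messages.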

 

[info] Normal range defined

In this case, the "normal range" is defined as the band from one standard deviation below the moving average (or "mean") to one standard deviation above it.

Format results

The final step is to reduce the outgoing signal to the date and the message attributes with a second AttributeSelector block.

  1. Drag an AttributeSelector block onto your canvas.
  2. Configure the block as follows:
    1. Name: Include Message and Date
    2. Click the + Incoming signal attributes twice and enter the following values:
      • Incoming signal attributes box 1: message
      • Incoming signal attributes box 2: date
    3. Selector Mode: WHITELIST
  3. Click accept.
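Whitelist selection keeps only the named attributes and drops everything else. A minimal Python sketch of that idea follows. The function name `select_attributes` and the sample signal are illustrative assumptions, not part of the AttributeSelector block.

```python
def select_attributes(signal, whitelist):
    """Return a copy of the signal containing only whitelisted attributes."""
    return {key: value for key, value in signal.items() if key in whitelist}


signal = {
    "band_data": {"mean": 340.41, "deviation": 0.5141843971631065},
    "close": 347.32,
    "date": "2017-06-05",
    "message": "Price is above normal, SELL!",
}
trimmed = select_attributes(signal, {"message", "date"})
print(trimmed)
```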

Connect blocks and run service

  1. Connect the blocks as configured in the diagram.
  2. To see your changes take effect, save and restart the service.
  3. Open the logger panel to see the messages that tell you, day by day up to the current day, whether to buy, sell, or hold your Tesla stock, and why.

Check your results against three months of Tesla stock data. The buy dates from the logs should be near a local minimum or valley and the sell dates should be near a local maximum or peak.

 

[info] Keep in mind

Statistics at the beginning of the service are more variable because they are calculated from a very short history. Results should settle down once the service has processed as many signals as the moving window holds (five in this tutorial).


Summary

In this tutorial you performed statistical analysis on streaming data to determine outliers. Identifying outliers is important for generating alerts and determining significant events.

For example, if you set up a service to detect when a piece of machinery started performing outside of its normal pattern, you could be alerted to perform preventative maintenance. Or you could send yourself a notification when tweets about a company either go viral or fall off the radar.


Extra credit

Here are some ideas to improve or expand the service you just created.

  • How might you improve the algorithm to detect anomalies? What additional parameters might you add for more accuracy?
  • How might you keep track of trends, or the number of consecutive days a stock price is above or below the normal range?
  • Drive this service with the twitter-winner service and modify it to get messages based on the stock of the twitter winner.
  • Could you use a similar, but inverse technique to find the signal in the noise?
  • How could timing be used to discern a signal in the noise?
  • How might you use the Queue and AggregateStreams blocks to calculate your own rolling average?
  • How could you have nio text your mobile phone when the message content changes? Hint: Look into the Twilio block.

Getting help

We're always happy to help with any questions you might have about the nio Platform. View the troubleshooting guide, search the documentation, or post your questions in the forum. You can also contact live support by clicking the chat icon in the lower-right corner of the nio System Designer.


 
