Nifi - Fun with Flow Management


2020/11/12

Eric


What is Nifi?

What is Nifi, you may ask? Nifi is a tool for creating and automating dataflows. It can be used for complex data routing, modification, and interchange between systems. Nifi features a slick UI in which data flows can be built visually in a point-and-click fashion.

Main Components

Below are some of the basic components Nifi uses:

| Component | Description |
| --- | --- |
| Processor | Processors do the work on our data, which is stored in flowfiles. That work can be transforming, routing, or sending to and receiving from an external system. |
| Flow File | A flow file is an object moving through the Nifi system. A flow file consists of content and attributes. |
| Connection | Connections tie processors together and are used for queuing data. |
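
To make the "content and attributes" idea concrete, here is a minimal Python sketch of a flow file. It is only an illustration of the concept, not Nifi's actual implementation; the `FlowFile` class, the `with_attribute` helper, and the attribute names used are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowFile:
    """Illustrative stand-in for a flow file: a payload plus key/value metadata."""
    content: bytes = b""
    attributes: Dict[str, str] = field(default_factory=dict)


def with_attribute(flowfile: FlowFile, key: str, value: str) -> FlowFile:
    """Return a copy with one attribute set, leaving the content untouched."""
    updated = dict(flowfile.attributes)
    updated[key] = value
    return FlowFile(content=flowfile.content, attributes=updated)


# A hypothetical flow file holding a web page, then tagged with one more attribute.
page = FlowFile(content=b"<html>search</html>", attributes={"filename": "index.html"})
tagged = with_attribute(page, "mime.type", "text/html")
print(tagged.attributes)
```

The point of the sketch is that attributes can change as a flow file moves along while the content stays the same.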

Setting Up Nifi

Starting Nifi is very quick and easy! The binaries can be downloaded directly from the Apache Nifi Project. After extracting the downloaded file, follow the instructions below to start Nifi.

Windows

  1. Navigate to the bin folder.
  2. Double click run-nifi.bat to start the service.
  3. When done, close the running window to stop Nifi.

Linux

  1. Open a terminal.
  2. Navigate to the bin folder.
  3. Run command ./nifi.sh start
  4. When done, run command ./nifi.sh stop to stop Nifi.

Creating our First Flow

After Nifi fully starts you will be able to navigate to http://localhost:8080/nifi/, where you will be greeted with the canvas. The canvas is where we will add all of our processors and connect them.
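
Nifi can take a little while to start, so the page may not load right away. If you would rather script the wait than refresh the browser, a small polling helper along these lines could work. This is a rough sketch using only the Python standard library; the function names, the retry count, and the delay are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success code, so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, and similar: not up yet.
        return False


def wait_until_up(
    url: str,
    probe: Callable[[str], bool] = http_probe,
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the URL until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_up("http://localhost:8080/nifi/")` would then return `True` once the UI responds, or `False` after roughly a minute of trying.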

The blank Nifi canvas

Now it's time to create our first flow!

Click and drag the processor icon onto the canvas grid to view the out-of-the-box data manipulations that are available.

First, let's take the InvokeHTTP processor and drop it onto our canvas. After it is placed, right-click it and select Configure to set up the processor.

Nifi processor context menu

Select the Properties tab and enter http://www.google.com as the Remote URL.

Next, in the Scheduling tab, change the Run Schedule to 60 sec. Finally, in the Settings tab, check the boxes next to Failure, No Retry, Original, and Retry. This auto-terminates those relationships, meaning we do not intend to connect them to another processor. Any data sent to an auto-terminated relationship is discarded.
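
To picture what auto-terminating does, here is a small Python sketch of how an outcome might be handed to relationships. It is a simplified model, not Nifi code. The mapping from HTTP status to relationship is my reading of the InvokeHTTP documentation (2xx to Original and Response, 5xx to Retry, other codes to No Retry, connection errors to Failure) and should be treated as an assumption; the function names are made up for this example.

```python
from typing import Dict, List, Optional, Set


def relationships_for(status: Optional[int]) -> List[str]:
    """Map an HTTP outcome to relationship names (assumed, simplified mapping).

    status=None models a timeout or connection error.
    """
    if status is None:
        return ["Failure"]
    if 200 <= status < 300:
        return ["Original", "Response"]
    if 500 <= status < 600:
        return ["Retry"]
    return ["No Retry"]


def deliver(
    status: Optional[int],
    connected: Dict[str, List[str]],
    auto_terminated: Set[str],
    payload: str,
) -> List[str]:
    """Queue the payload on connected relationships; return the ones that discarded it."""
    dropped = []
    for name in relationships_for(status):
        if name in connected:
            connected[name].append(payload)
        elif name in auto_terminated:
            dropped.append(name)  # auto-terminated: the data is simply discarded
        else:
            # A relationship must be either connected or auto-terminated.
            raise ValueError(f"relationship {name!r} is neither connected nor auto-terminated")
    return dropped


# Only Response is connected; the other four are auto-terminated as in the steps above.
queues = {"Response": []}
auto = {"Failure", "No Retry", "Original", "Retry"}
dropped = deliver(200, queues, auto, "<html>...</html>")
print(dropped, queues)
```

In this model a successful request leaves one item in the Response queue, while the copy sent to Original is dropped because that relationship was auto-terminated.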

Next create a RouteOnContent processor by dragging the processor icon into the canvas.

Nifi processors before connecting

Before configuring this processor we will select the InvokeHTTP processor and drag a connection onto the RouteOnContent processor.

When you are given the option, select Response as the relationship type. This will place all of the HTTP responses into the queue. You will notice the connection has an arrow indicating the direction of the data flow.

Next, right-click the RouteOnContent processor and select Configure. Under the Properties tab, change the Match Requirement to content must contain match, then click the + button to add our search criteria and relationship. In this example we are looking for data that contains the word search.
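
The routing decision itself is easy to sketch in Python. The snippet below is an illustrative approximation of the idea, not the processor's real code: each rule is a name plus a regular expression, "contain" is modeled with `re.search`, an exact match with `re.fullmatch`, and content that matches no rule goes to an `unmatched` route. The rule name `search` and the function name are assumptions for this example.

```python
import re
from typing import Dict, List

CONTAINS = "content must contain match"
EXACT = "content must match exactly"


def route_on_content(
    content: str,
    rules: Dict[str, str],
    match_requirement: str = CONTAINS,
) -> List[str]:
    """Return the names of the rules the content satisfies, or ['unmatched']."""
    if match_requirement == CONTAINS:
        matcher = re.search      # pattern may appear anywhere in the content
    elif match_requirement == EXACT:
        matcher = re.fullmatch   # pattern must cover the whole content
    else:
        raise ValueError(f"unknown match requirement: {match_requirement!r}")
    matched = [name for name, pattern in rules.items() if matcher(pattern, content)]
    return matched or ["unmatched"]


rules = {"search": "search"}
print(route_on_content("<title>Google search</title>", rules))
print(route_on_content("nothing to see here", rules))
```

With the "contain" setting, any response body that includes the word search anywhere is routed to the `search` relationship; everything else falls through to `unmatched`.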

Finally, we will drop our matched flowfiles into a funnel, which we will use as a queue to hold our data. Funnels can also be used to allow for multiple inputs and outputs of a queue. All data that flows in upstream is duplicated to all queues downstream. Add one by selecting the funnel icon and dragging it onto the canvas. Create a connection for our search criteria to one funnel, and one for anything that does not match to a different funnel. Your canvas should look similar to the one below:

Our completed Nifi Flow!

Right-click each processor and select Start to begin data processing. This flow simply queries Google and places each response that contains the word search in our search queue!

The last thing we will look at is how to view the data in our queues. This can be done by right-clicking a queue that holds data and selecting List queue. This shows us all the flowfiles stored in the queue. Each flowfile comes from one GET request made by the InvokeHTTP processor. We can view an individual flowfile by selecting its icon in the list and then using either the download or the view button.

In this example we quickly set up a flow to GET HTTP data from Google and route that data. Nifi's point-and-click interface makes it easy to design data flows visually and to understand how they work. It also makes it easy to embed retry and failure handling logic.