Nifi - Fun with Flow Management
2020/11/12
Eric

What is Nifi?
What is Nifi, you may ask? Nifi is a tool for creating and automating dataflows. It can be used for complex data routing, modification, and interchange between systems. Nifi features a slick UI in which dataflows can be constructed visually in a point-and-click fashion.
Main Components
Below are some of the basic components Nifi uses:
Component | Description |
---|---|
Processors | Processors do the work on our data, which is stored in flow files, whether that is transforming, routing, or sending to or receiving from an external system. |
Flow File | A flow file is an object moving through the Nifi system. A flow file consists of content and attributes. |
Connection | Connections tie processors together, and are used for queuing data. |
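To make the three components a bit more concrete, here is a minimal toy model in Python. It is only a sketch of the ideas in the table, not Nifi's actual implementation: the names `FlowFile`, `Connection`, and `uppercase_processor` are made up for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowFile:
    """Toy stand-in for a flow file: content plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """Toy stand-in for a connection: a FIFO queue between two processors."""

    def __init__(self) -> None:
        self._queue = deque()

    def put(self, flowfile: FlowFile) -> None:
        self._queue.append(flowfile)

    def take(self) -> FlowFile:
        return self._queue.popleft()

    def __len__(self) -> int:
        return len(self._queue)


def uppercase_processor(incoming: Connection, outgoing: Connection) -> None:
    """Toy processor: transforms every queued flow file and passes it on."""
    while len(incoming):
        flowfile = incoming.take()
        # Content is transformed; attributes are copied over unchanged.
        outgoing.put(FlowFile(flowfile.content.upper(), dict(flowfile.attributes)))
```

The point of the sketch is the division of labor: flow files carry the data, connections queue it, and processors are the only place where work happens.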
Setting Up Nifi
Starting Nifi is very quick and easy! The binaries can be downloaded directly from the Apache Nifi Project. After extracting the downloaded file, follow the instructions below to start Nifi.
Windows
- Navigate to the `bin` folder.
- Double-click `run-nifi.bat` to start the service.
- When done, close the running window to stop Nifi.
Linux
- Open a terminal.
- Navigate to the `bin` folder.
- Run the command `./nifi.sh start`.
- When done, run the command `./nifi.sh stop` to stop Nifi.
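Nifi can take a little while to come up after the start command returns. If you would rather not keep refreshing the browser, a small Python helper can poll the UI until it answers. This is a sketch only: it assumes the UI is served at http://localhost:8080/nifi/ (the address used in this walkthrough), and the function names, attempt count, and delay are arbitrary choices for illustration.

```python
import time
import urllib.error
import urllib.request


def nifi_is_up(url: str = "http://localhost:8080/nifi/", timeout: float = 2.0) -> bool:
    """Return True if the given URL answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_for_nifi(probe=nifi_is_up, attempts: int = 30, delay: float = 2.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_nifi()` with the defaults waits up to roughly a minute before giving up and returning `False`.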
Creating our First Flow
After Nifi has fully started, you can navigate to http://localhost:8080/nifi/, where you will be greeted by the canvas. The canvas is where we will add all of our processors and connect them.
The blank Nifi canvas
Now it's time to create our first flow!
Click and drag the processor icon onto the canvas grid to view the out-of-the-box data manipulations that are available.
First, let's take the `InvokeHTTP` processor and drop it onto our canvas. Once it is placed, right-click it and select `Configure` to set up the processor.

Nifi processor context menu
Select the `properties` tab and type `http://www.google.com` as the `Remote URL`.
Next, in the `scheduling` tab, change the `run schedule` to `60 sec`. Finally, in the `settings` tab, check the boxes next to `Failure`, `No Retry`, `Original`, and `Retry`. This will auto-terminate these relationships, meaning we do not intend to connect them to another processor. Any data sent to an auto-terminated relationship is discarded.
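To see what auto-termination means in practice, here is a rough Python sketch of how an HTTP outcome ends up on one of the relationships named above. The status-code ranges are a simplified reading of how `InvokeHTTP` is documented to behave, so treat them as an assumption; `relationship_for` and `deliver` are made-up helper names, and the real processor also emits the `Original` flow file on success, which this sketch leaves out.

```python
from typing import List, Optional

# The relationships we ticked in the settings tab.
AUTO_TERMINATED = {"Failure", "No Retry", "Original", "Retry"}


def relationship_for(status: Optional[int]) -> str:
    """Map an HTTP outcome to a relationship name (simplified assumption)."""
    if status is None:
        # No response at all, e.g. a timeout or connection failure.
        return "Failure"
    if 200 <= status < 300:
        return "Response"
    if 500 <= status < 600:
        # Server-side errors are usually worth retrying.
        return "Retry"
    return "No Retry"


def deliver(status: Optional[int], body: str, downstream: List[str]) -> str:
    """Queue the body downstream unless its relationship is auto-terminated."""
    relationship = relationship_for(status)
    if relationship not in AUTO_TERMINATED:
        downstream.append(body)
    return relationship
```

With this configuration only `Response` data survives; everything else is dropped on the floor, which is fine for a demo but probably not what you want in a production flow.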
Next, create a `RouteOnContent` processor by dragging the processor icon onto the canvas.

Nifi processors before connecting
Before configuring this processor, select the `InvokeHTTP` processor and drag a connection onto the `RouteOnContent` processor.
When prompted, select `Response` as the relationship type. This will place all of the HTTP responses into the queue. You will notice that the connection has an arrow indicating the direction of the data flow.
Next, right-click the `RouteOnContent` processor and select `Configure`. Under the `properties` tab, change the `Match Requirement` to `content must contain match`, then click the `+` button to add our search criteria and relationship. In this example we are looking for data that contains the word `search`.
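The routing rule we just configured boils down to "does the content contain a match for this pattern?". A minimal Python sketch of that idea looks like this. It is not how Nifi implements it: `route_on_content` is a hypothetical function, and the relationship names `search` and `unmatched` are assumed to mirror the property we added and the processor's default for everything else.

```python
import re
from typing import Dict, Iterable, List


def route_on_content(contents: Iterable[str], pattern: str = r"search") -> Dict[str, List[str]]:
    """Split contents into those containing a match for pattern and the rest."""
    regex = re.compile(pattern)
    routed: Dict[str, List[str]] = {"search": [], "unmatched": []}
    for body in contents:
        # "Contain match" semantics: the pattern may appear anywhere in the content.
        key = "search" if regex.search(body) else "unmatched"
        routed[key].append(body)
    return routed
```

Note that the match in this sketch is case-sensitive, so a page that only contains `Search` with a capital S would land in `unmatched`.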
Finally, we will drop our matched flowfiles into a funnel, which we will use as a queue to hold our data. Funnels can also be used to allow for multiple inputs and outputs of a queue. All data that flows in upstream is duplicated to all queues downstream. Do this by selecting the funnel icon and dragging it onto the canvas. Create a relationship for our search criteria to one funnel, and for anything that does not match to a different funnel. Your canvas should look similar to the one below:

Our completed Nifi Flow!
Right-click each processor and select `Start` to begin data processing. This flow simply queries Google and, for each response that contains the word search, puts it in our search queue!
The last thing we will look at is how to view the data in our queues. This can be done by right-clicking a queue with data and selecting `List queue`. This shows us all the flowfiles that are stored in the queue. Each flowfile comes from one GET request by the `InvokeHTTP` processor. We can view each individual flowfile by selecting the icon and then using either the download or the view button.
In this example we quickly set up a flow to GET HTTP data from Google and route that data. With Nifi’s point-and-click interface, it is very easy to design data flows visually and understand how they work. It also makes it very easy to embed retry and failure-handling logic.