NiFi - Fun with Flow Management
What is NiFi?
What is NiFi, you may ask? NiFi is a tool for creating and automating dataflows. It can be used for complex data routing, modification, and interchange between systems. NiFi features a slick UI in which dataflows can be visually constructed in a point-and-click fashion.
Below are some of the basic components NiFi uses:
| Component | Description |
| --- | --- |
| Processor | Processors do the work on our data, which is stored in flowfiles — whether that is transforming it, routing it, or sending it to / receiving it from an external system. |
| Flowfile | A flowfile is an object moving through the NiFi system. A flowfile consists of content and attributes. |
| Connection | Connections tie processors together and are used for queuing data. |
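The relationship between these components can be illustrated with a small sketch. This is plain Python modeling the concepts, not NiFi's actual API; the class and field names are hypothetical:

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical model of NiFi's core concepts -- not the real NiFi API.
@dataclass
class FlowFile:
    content: bytes                                   # the payload moving through the flow
    attributes: dict = field(default_factory=dict)   # key/value metadata about the payload

# A connection is essentially a queue of flowfiles between two processors.
connection = deque()
connection.append(FlowFile(b"<html>...</html>", {"mime.type": "text/html"}))

# A downstream processor pulls the next flowfile off the connection.
ff = connection.popleft()
print(ff.attributes["mime.type"])  # -> text/html
```

A processor, in this model, is just anything that pops flowfiles from one connection, works on them, and pushes them onto another.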
Setting Up NiFi
Starting NiFi is very quick and easy! The binaries can be downloaded directly from the Apache NiFi project. After extracting the downloaded file, follow the instructions below to start NiFi.
On Windows:
- Navigate to the bin directory inside the extracted folder.
- Double click `run-nifi.bat` to start the service.
- When done, close the running window to stop NiFi.

On Linux/macOS:
- Open a terminal.
- Navigate to the bin directory inside the extracted folder.
- Run `./nifi.sh start` to start NiFi.
- When done, run `./nifi.sh stop` to stop NiFi.
Creating our First Flow
After NiFi fully starts, you will be able to navigate to http://localhost:8080/nifi/, where you will be greeted with the canvas. The canvas is where we will add all of our processors and connect them.
Now it's time to create our first flow!
Click and drag the processor icon onto the canvas grid to view the available out-of-the-box data manipulations that can be performed.
First, let's take the InvokeHTTP processor and drop it onto our canvas. After it is placed, right click it and select Configure to set up the processor. Under the Properties tab, enter http://www.google.com as the Remote URL.
Next, in the Scheduling tab, change the Run Schedule to 60 sec. Finally, in the Settings tab, check the auto-terminate boxes next to the relationships we do not plan to use, such as Retry. Auto-terminating a relationship means we are not intending to connect it to another processor; any data routed to an auto-terminated relationship is discarded.
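Conceptually, each time the run schedule fires, InvokeHTTP performs the request and emits the response as a new flowfile on its Response relationship. A rough sketch of one such run, assuming the `fetch` function is a stand-in for the real HTTP client so it runs without network access:

```python
def invoke_http(fetch, url):
    """Mimic one scheduled run of InvokeHTTP: fetch the URL and
    wrap the response in a (content, attributes) flowfile pair.
    The attribute names here are illustrative, not guaranteed."""
    status, body = fetch(url)  # fetch stands in for the real HTTP call
    attributes = {
        "invokehttp.status.code": str(status),
        "invokehttp.request.url": url,
    }
    return body, attributes

# Stub "HTTP client" so the sketch is self-contained and offline.
def fake_fetch(url):
    return 200, b"<html>search results...</html>"

content, attrs = invoke_http(fake_fetch, "http://www.google.com")
print(attrs["invokehttp.status.code"])  # -> 200
```

With the 60 sec run schedule above, NiFi would effectively call something like this once a minute and queue each result downstream.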
Next, create a RouteOnContent processor by dragging the processor icon onto the canvas.
Before configuring this processor, select the InvokeHTTP processor and drag a connection onto the RouteOnContent processor. When you are given an option, select Response as the relationship type. This will place all of the HTTP responses into the queue. You will notice the connection has an arrow indicating the direction of the data flow.
Next, right click the RouteOnContent processor and select Configure. Under the Properties tab, change the Match Requirement to "content must contain match", then click the + button to add our search criteria and relationship. In this example we are looking for data that contains the word search.
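The "content must contain match" behavior boils down to a substring test that decides which relationship — and therefore which queue — a flowfile is routed to. A minimal sketch of that routing logic (the relationship names mirror this example, not fixed NiFi identifiers):

```python
def route_on_content(content: bytes, term: bytes) -> str:
    """Route to the 'search' relationship when the content contains
    the search term, otherwise to 'unmatched'."""
    return "search" if term in content else "unmatched"

print(route_on_content(b"...google search results...", b"search"))  # -> search
print(route_on_content(b"no hits here", b"search"))                 # -> unmatched
```

Each added property in RouteOnContent creates one such named relationship; anything that matches no property falls through to unmatched.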
Finally, we will drop our matched flowfiles into a funnel, which we will use as a queue to hold our data. Funnels can also be used to allow multiple inputs and outputs on a queue: all data that flows in upstream is duplicated to every queue downstream. Do this by selecting the funnel icon and dragging it onto the canvas. Create a relationship for our search criteria to one funnel, and route anything that does not match to a different funnel. Your canvas should look similar to the one below:
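The funnel behavior described above — many inputs merged, with each incoming flowfile copied to every downstream queue — can be sketched as follows (a conceptual model, not NiFi code):

```python
from collections import deque

def funnel(upstream_queues, downstream_queues):
    """Drain every upstream queue and copy each flowfile to
    every downstream queue, as described in the text above."""
    while any(upstream_queues):
        for q in upstream_queues:
            while q:
                ff = q.popleft()
                for out in downstream_queues:
                    out.append(ff)

# Two upstream connections feeding one funnel with two outputs.
a, b = deque([b"flowfile-1"]), deque([b"flowfile-2"])
out1, out2 = deque(), deque()
funnel([a, b], [out1, out2])
print(len(out1), len(out2))  # -> 2 2
```

In our flow each funnel has a single input and no output, so it simply acts as a terminal queue holding the routed data.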
Right click on each processor and select Start to begin data processing. This flow will simply query Google, and each response that contains the word search will be placed in our search queue!
The last thing we will look at is how to view the data in our queues. This can be done by right clicking on a queue with data and selecting List queue. This shows us all the flowfiles stored in the queue. Each flowfile comes from one GET request by the InvokeHTTP processor. We can view an individual flowfile by selecting its icon and then using the download or view buttons.
In this example we quickly set up a flow to GET HTTP data from Google and route that data. With NiFi's point-and-click interface it is very easy to visually design dataflows and understand how they work. It also makes it very easy to embed retry and failure-handling logic.