Today’s post is an introduction to the Twitter Streaming API and Redis in Java, using third-party libraries that make both easy to work with.

Project

I’ll use IntelliJ IDEA as the IDE throughout this tutorial; Eclipse and NetBeans work too. Start by creating a new Maven project, uncheck From archetype, then click Next. Provide the information about your project, then click Create.

The important file is pom.xml, which Maven reads to resolve your project’s dependencies. That’s where you declare the libraries your program needs. We’ll use several third-party libraries, for both the Twitter client and Redis access. Start by adding the dependencies for Hosebird Client, Jedis and Gson.

You may also specify a build configuration in the same file.

The last step is to set up a run configuration using Run > Edit Configurations. Add a new one of type Application; enter your main class and check Single instance only, so that you don’t have to stop and restart the program manually each time you redeploy.

You’re all set up!

Twitter Streaming API

The full specifications are in Twitter’s developer documentation. In a nutshell, unlike a standard REST API, which serves one response per request, a streaming API requires you to open a connection, over which the API then sends data continuously. You may have to reopen the connection if no data is transmitted during a pre-established time interval, but the overall behaviour is similar to a WebSocket.
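To make the difference from request/response concrete, here is a minimal, library-free sketch of the consumer side of a stream. It is only an illustration: the names `StallAwareConsumer` and `drain` and the timeout values are assumptions, and a `BlockingQueue` stands in for the open connection. A `null` from `poll` means nothing arrived within the timeout, which is the point where a real client would reopen the connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class StallAwareConsumer {

    /**
     * Reads messages until the stream stays silent for longer than the timeout
     * and returns everything received before that stall.
     */
    public static List<String> drain(BlockingQueue<String> stream, long timeoutMillis) {
        List<String> received = new ArrayList<>();
        while (true) {
            String message;
            try {
                // poll() waits up to the timeout and returns null if nothing arrived
                message = stream.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop reading
                Thread.currentThread().interrupt();
                return received;
            }
            if (message == null) {
                // Stall detected: a real client would reopen the connection here
                return received;
            }
            received.add(message);
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> stream = new LinkedBlockingQueue<>();
        // Hypothetical messages standing in for data pushed by the server
        stream.add("first message");
        stream.add("second message");
        List<String> received = drain(stream, 200);
        System.out.println(received.size() + " messages before the stall");
    }
}
```

The consumer never asks for the next message; it only waits for whatever the other side pushes, which is the key difference from a REST call.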

To avoid dealing with low-level implementation details, we’ll use a third-party library named Hosebird Client. You already declared this dependency in your pom.xml file, so we’ll ask Maven to download it. In the right side panel, open the Maven box and, under yourProject > Lifecycle, run clean, package and install in sequence. You can now use the library in your project.

Prerequisites: you have created a new application on the Twitter dev portal and generated a new access token. Save your app’s consumer key and secret, and your access token and access token secret. You’ll need them very soon.

Here is the code snippet that listens for new tweets (or retweets) containing the hashtag #nscurious:
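A minimal sketch of such a listener, modelled on the filter-stream example in the Hosebird Client README, could look like this. The class name `HashtagListener`, the queue capacity and the five-second poll timeout are assumptions, and the four credential strings are placeholders for the keys you saved earlier.

```java
import com.twitter.hbc.ClientBuilder;
import com.twitter.hbc.core.Client;
import com.twitter.hbc.core.Constants;
import com.twitter.hbc.core.endpoint.StatusesFilterEndpoint;
import com.twitter.hbc.core.processor.StringDelimitedProcessor;
import com.twitter.hbc.httpclient.auth.Authentication;
import com.twitter.hbc.httpclient.auth.OAuth1;

import java.util.Collections;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class HashtagListener {

    public static void main(String[] args) throws InterruptedException {
        // Hosebird pushes every raw JSON message it receives into this queue
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>(10000);

        // Only receive statuses that match the tracked term
        StatusesFilterEndpoint endpoint = new StatusesFilterEndpoint();
        endpoint.trackTerms(Collections.singletonList("#nscurious"));

        // Placeholders: replace with your own application credentials
        Authentication auth = new OAuth1(
                "CONSUMER_KEY", "CONSUMER_SECRET",
                "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET");

        Client client = new ClientBuilder()
                .hosts(Constants.STREAM_HOST)
                .endpoint(endpoint)
                .authentication(auth)
                .processor(new StringDelimitedProcessor(queue))
                .build();

        client.connect();
        try {
            while (!client.isDone()) {
                // Wait a few seconds for the next message instead of blocking forever
                String tweet = queue.poll(5, TimeUnit.SECONDS);
                if (tweet != null) {
                    System.out.println(tweet);
                }
            }
        } finally {
            client.stop();
        }
    }
}
```

Hosebird reads from the connection on its own threads, so the main loop only has to take messages off the queue.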

Run this code, post a new tweet containing the hashtag #nscurious and have a look at your console log. A massive string will be printed; this is the complete JSON object of your tweet, with its unique id, its content, the hashtags and mentions it contains, and so on. You can take a deeper look at the structure of a basic tweet on the Twitter dev portal. We’ll simply use its unique id in this tutorial, as it’s for demonstration purposes only.
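To give an idea of what that string contains, here is a small dependency-free sketch that pulls the id out of a heavily trimmed, made-up tweet. The field names (`id`, `id_str`, `text`, `entities.hashtags`) follow Twitter’s tweet object, but the values are invented, and the regular expression is only a stand-in for a real JSON parser; `TweetIdExtractor` and `extractId` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TweetIdExtractor {

    // Matches the first "id_str":"<digits>" pair in the raw JSON text
    private static final Pattern ID_STR =
            Pattern.compile("\"id_str\"\\s*:\\s*\"(\\d+)\"");

    /** Returns the first id_str found in the JSON text, or null if there is none. */
    public static String extractId(String tweetJson) {
        Matcher matcher = ID_STR.matcher(tweetJson);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Invented sample: a real tweet carries many more fields
        String sample = "{\"id\":1234567890,\"id_str\":\"1234567890\","
                + "\"text\":\"Hello #nscurious\","
                + "\"entities\":{\"hashtags\":[{\"text\":\"nscurious\"}]}}";
        System.out.println(extractId(sample)); // prints 1234567890
    }
}
```

The regular expression takes whichever `id_str` comes first in the text, so it relies on the tweet’s own id appearing before any nested one; a JSON library does not have that limitation.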

Save Them All (into Redis)

Now that you can track tweets live, you may want to store them in a database, for example to provide data visualisation later. This post does not deal with data analysis or data mining, so we’ll only store the full tweet structure so that it can be retrieved later. Start by downloading Redis from the official website and installing it on your local machine (or on a remote server). Run Redis; the default port is 6379. Your Redis is now live, and you can use Redis Desktop Manager as a GUI to manage your Redis databases.

Redis uses a master/slave replication model to scale reads and add redundancy. To summarise, write operations are done on the master instance (a single instance) and data is automatically replicated to the slaves (from zero to as many as you need), which can be hosted either on the same server or on remote servers. Read operations can then be served by the slaves to take load off the master, although the master can serve reads as well. Many configurations are possible in Redis; check the documentation on the official website for more information. For this tutorial, we’ll only use a master instance with no replication.

To access your Redis database from your Java application, we’ll use Jedis, a Java client for Redis, and Gson, a library written by Google to manipulate JSON objects. Jedis offers both a lightweight implementation, the one we’ll use, and more complex ways to handle Redis clusters. Now switch to your Java application.

We’ll first create a Jedis pool, which manages the connections to our Redis server and hands one out whenever we need it.

Good practice is to set a password on the master instance. If you set one in a config file, pass the password as the last argument when creating your JedisPool instance.

Now modify your previous implementation so that, instead of logging each tweet, it converts it to a JsonObject instance using Gson and inserts it into your Redis instance:
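A minimal sketch of the pool and the storing step might look like the following. The class name `TweetStore`, the `tweet:` key prefix and the choice of `id_str` as the key are assumptions made for illustration, and the try-with-resources form assumes Jedis 2.6 or later.

```java
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class TweetStore {

    private static final Gson GSON = new Gson();

    // Pool for a local Redis on the default port (6379).
    // With a password, use the overload that takes it as the last argument:
    // new JedisPool(new JedisPoolConfig(), "localhost", 6379, 2000, "your-password")
    private final JedisPool pool = new JedisPool(new JedisPoolConfig(), "localhost");

    /** Parses one raw message from the stream and stores it under its tweet id. */
    public void save(String rawTweet) {
        JsonObject tweet = GSON.fromJson(rawTweet, JsonObject.class);
        if (tweet == null || !tweet.has("id_str")) {
            // Not a tweet (for example an empty line or a control message): skip it
            return;
        }
        String id = tweet.get("id_str").getAsString();

        // Borrow a connection from the pool; close() hands it back
        try (Jedis jedis = pool.getResource()) {
            jedis.set("tweet:" + id, tweet.toString());
        }
    }

    /** Releases the pooled connections when the application shuts down. */
    public void close() {
        pool.destroy();
    }
}
```

In the listening loop, you would then call `save(tweet)` where the tweet was previously printed to the console.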

Run your program and post a tweet containing the hashtag #nscurious, then open your Redis GUI and look at the result. You should see one entry for each tweet that was captured.

That’s it. This code must not go to production as is, because it has no error handling or any of the other safeguards needed before deploying to a production environment. But it’s a good start for introducing Redis with a concrete example. As an anecdote, I ran this program on a trending hashtag (#MTVStars) and got 504 tweets in my database in less than 2 seconds. So use a rare hashtag while developing, and test your server capacity afterwards.

If you have any questions on this tutorial or on advanced Redis usage, please leave a comment on this post.
