Collect Business Intelligence from Twitter Feed | Boomerang Notification Services for SQL Server

Collect Business Intelligence from Twitter

Introduction

In this application scenario we are illustrating how to collect Business Intelligence from Twitter Feed and giving examples how to process and evaluate received data using simple SQL commands.  We collect busniness intelligence from Twitter feed by using the Twitter In service available the Boomerang Notification Framework. The Twitter In service supports two different Twitter API methods; track and follow.

When tracking you filter for a keyword or keywords in the Twitter stream. In follow you specify one or more Twitter accounts to follow i.e. to receive status updates. The two methods may be used at the same time. Tweets resulting from tracking and following will be stored in the IN_TWIT and IN_TWIT_HEADER tables.

Collect Business Intelligence from a Twitter Feed | Boomerang a Notification Framework

Track/Filter

Phrases of keywords are specified by a comma-separated list. Comma separated keywords and comma separated phrases are treated as logical ORs, quoted phrases are treated as logical ANDs. Words within phrases are delimited by spaces. A tweet matches if any phrase matches. A phrase matches if all of the words are present in the tweet. (e.g. ‘the twitter’ is the AND twitter, and ‘the, twitter’ is the OR twitter.). Terms are exact-matched ignoring punctuation. Each comma-separated term may be up to 60 characters long. The current limit imposed by Twitter is 400 keywords.

To track/filter on specific keywords open the Boomerang administration console and select the Twitter node. Under Status filter enter one or more keywords. In the example below all Twitts with the word Mexico, Sweden or USA will be captured.

To capture additional information from Twitter such as user location or user name check the checkbox Save incoming status headers.

Exercise caution in selecting filter keywords. Filtering on commonly used keywords may require high bandwidth and increased database storage space.

Twitter Feed | Admin Console | Boomerang a Notification Framework

Changes made to the Status filter may take up to 30 seconds before starting to work. To read more about the track/filter method please refer to the Twitter documentation found here.

Follow

The follow method returns the public statuses for the given users. Users are specified by a comma separated list of the Twitter user Id’s (integer). To find out the user id assigned to a particular user name click here.

In the below example status updates for two users will be received by Boomerang and stored in the IN_TWIT and IN_TWIT_HEADER tables.

Collect Business Intelligence from a Twitter Feed | Follow | Boomerang a Notification Framework

Examples

Track/filter for the word ’USA’ or ‘Canada’ and follow user id: 186574980 and 186566175.
Collect Business Intelligence from a Twitter Feed | Track and Follow | Boomerang a Notification Framework
Track/filter for the word ’USA’ or ‘Canada’
Collect Business Intelligence from a Twitter Feed | Track and Follow | Boomerang a Notification Framework
Track/filter for the word ‘USA’ and ‘Canada’ and follow user id: 186574980 and 186566175.
Collect Business Intelligence from a Twitter Feed | Track and Follow |

Post Processing

As with all Boomerang services, post processing is easy with the use of the available event handlers. In the case of the Twitter in services the stored procedure in question is sp_On_Twit_In. This store procedure will fire after each Twitter status update have been received. For example to discard Twitter messages which does not include a hyperlink, the event handler store procedure could be altered like this:

ALTER procedure [dbo].[sp_On_Twit_In]
(
  @rKey uniqueidentifier
)
as begin
        --- Collect Business Intelligence from Twitter Feed -----------------
	--- Delete incoming twitter status updtes that does not include a URL
	select null from IN_TWIT where Text like '%http://%' and rKey=@rKey
	if @@ROWCOUNT = 0
		begin 
		delete from IN_TWIT where rKey=@rKey
		end
 
end
GO

The above code sample will remove any Twitter status update from IN_TWITT and IN_TWITT_HEADER tables as they are being captured.

Evaluating Twitter Updates

Now that only Twitter status updates exist that contain a URL let us further processes these records by parsing the actual twitter messages. To get a top 10 list of the most twittered URL’s over the last 24 hours would look like this:

--- Collect Business Intelligence from Twitter Feed -----------------
--- Parse and aggregate top 10 URL last 24 hours
select top 10 TwitterURL , COUNT(*) NumberOfTwitts
into #_tmp_top_10_tweets
from 
	(
	select rtrim(case when  EndPos = 0 then SUBSTRING(TwitterURL, 1, 140) 
	else substring(TwitterURL,1, EndPos) end) as TwitterURL  from 
		(
		select PATINDEX('% %', substring(Text, StartPos, 140)) EndPos, StartPos, 
		substring(Text, StartPos, 140) as TwitterURL  from 
			(
			select PATINDEX('%http://%', Text) as StartPos, Text 
			from dbo.IN_TWIT it
                        where Created_On > GETDATE()-1  
			) s1
		) s2
	) s3
where LEN(TwitterURL) > 15
group by TwitterURL
order by 2 desc

Result:

TwitterURL                                         NumberOfTwitts
-------------------------------------------------- --------------
http://t.co/6U6B4UtN                               22
http://t.co/cGykr8xa                               13
http://t.co/Hucbg1LC                               13
http://t.co/aSWBTOQV                               10
http://t.co/AorOUOca                               9
http://t.co/Crly4SRv                               9
http://t.co/c2R2F11t                               7
http://t.co/Gxvqi0tA                               6
http://t.co/1aU7otWQ                               5
http://t.co/9TpSDwxt                               5
 
(10 row(s) affected)

Twitter Profile Data

All Twitter profile data available through the API is stored in IN_TWITT_HEADER. The information is organized by category (tag) and value.

select Name from IN_TWIT_HEADER group by Name

Result:

Name
---------------------------------------------------------------
created_at
geo
geo:coordinates
place
user:created_at
user:description
user:lang
user:location
user:name
user:profile_image_url
user:screen_name
user:time_zone
user:url
user:verified
 
(14 row(s) affected)

In order to only include Twitter status messages where the Twitter users language setting is set to English add the following:

--- Collect Business Intelligence from Twitter Feed -----------------
--- Parse and aggregate top 10 URL last 24 hours
select top 10 TwitterURL , COUNT(*) NumberOfTwitts
into #_tmp_top_10_tweets
from 
	(
	select rtrim(case when  EndPos = 0 then SUBSTRING(TwitterURL, 1, 140) 
	else substring(TwitterURL,1, EndPos) end) as TwitterURL  from 
		(
		select PATINDEX('% %', substring(Text, StartPos, 140)) EndPos, StartPos, 
		substring(Text, StartPos, 140) as TwitterURL  from 
			(
			select PATINDEX('%http://%', Text) as StartPos, Text 
			from dbo.IN_TWIT it inner join dbo.IN_TWIT_HEADER ith 
			on it.rKey=ith.rKey and ith.Name='user:lang' 
			where Created_On > GETDATE()-1 and ith.Value='en'
			) s1
		) s2
	) s3
where LEN(TwitterURL) > 15
group by TwitterURL
order by 2 desc

Manipulate Track/Follow values

To view or change the Twitter Track/Follow settings dynamically rather from the Boomerang administration console the SYSTEM_VARS table may be manipulated.

select CONVERT(Nvarchar (50),Value) from SYSTEM_VARS where [Key]='twit_status_filter'

Result:

--------------------------------------------------
track=sql
 
(1 row(s) affected)

Conclusion

Boomerang takes away the chore of writing complex code to interface with notification infrastructure like twitter, e-mail, printers, file and fax servers. By means of the Boomerang notification framework IT Professionals can create robust and dynamic notification solution. As demonstrated above we collect business intelligence from Twitter Feed and automatically parsed and filtered incoming tweets with a minimum amount of T-SQL code.

References

Download Sample

Twitter In Sample