Parsing Tweets with the TwitterString Class

While building a small Twitter aggregator for an upcoming conference, I needed a set of methods to create hyperlinks from three distinct elements that can appear in a tweet: links, usernames, and hashtags.

I found regular expressions that do the heavy lifting from various sources on the Web and wrapped them in a class that handles all the processing with a single method call. Here’s a working example, followed by the example code and the class itself.

Example SWF

[Embedded SWF demo; Flash Player 10.2 or greater is required.]

Example MXML

<?xml version="1.0" encoding="utf-8"?>
<s:Application xmlns:fx="http://ns.adobe.com/mxml/2009" 
			   xmlns:s="library://ns.adobe.com/flex/spark" 
			   xmlns:mx="library://ns.adobe.com/flex/mx" 
			   creationComplete="init()" width="300" height="100" 
			   backgroundColor="#1E1E1E" preloaderBaseColor="#989898">
 
	<fx:Script>
		<![CDATA[
			import flashx.textLayout.elements.Configuration;
			import flashx.textLayout.elements.TextFlow;
			import flashx.textLayout.conversion.TextConverter;
			import flashx.textLayout.formats.TextLayoutFormat;
			import com.fracturedvisionmedia.utils.TwitterString;
 
			private var myTxt:String = "Everyone should follow @josephlabrecque (http://bit.ly/7NkqrB) - really cool stuff and super-informative insights! #Awesome #Super #LOL";
 
			private function init():void {
				// Style TextFlow links so their color matches richTxt
				var cfg:Configuration = TextFlow.defaultConfiguration;
				var normalTLF:TextLayoutFormat = new TextLayoutFormat(cfg.defaultLinkNormalFormat);
				normalTLF.color = 0xDCD9D9;
				cfg.defaultLinkNormalFormat = normalTLF;
				TextFlow.defaultConfiguration = cfg;
 
				// Import tweet as HTML
				richTxt.textFlow = TextConverter.importToFlow(TwitterString.instance.parseTweet(myTxt), TextConverter.TEXT_FIELD_HTML_FORMAT);
			}
		]]>
	</fx:Script>
 
	<s:RichEditableText id="richTxt" selectable="false" editable="false" right="10" top="10" bottom="10" color="#DCD9D9" left="10"/>
 
</s:Application>

TwitterString Class

package com.fracturedvisionmedia.utils {
 
	/**
	 * The TwitterString class parses a tweet and wraps its links, hashtags,
	 * and usernames in hyperlinks.
	 * @author Joseph Labrecque
	 * v. 0.1.2
	 */ 
 
	public final class TwitterString {
		private static var _instance:TwitterString = new TwitterString();
 
		public function TwitterString(){
			if (_instance != null){
				throw new Error("TwitterString can only be accessed through TwitterString.instance");
			}
		}
 
		public static function get instance():TwitterString {
			return _instance;
		}
 
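		/**
		 * Runs all three passes over a tweet. Hyperlinks go first; otherwise
		 * the URL pass would match the href values inserted by the other two.
		 */
		public function parseTweet(t:String):String {
			var step1:String = parseHyperlinks(t);
			var step2:String = parseUsernames(step1);
			var step3:String = parseHashtags(step2);
			return step3;
		}
 
		// Links each @username to its Twitter profile; the "@" stays outside the link.
		private function parseUsernames(t:String):String {
			var result:String = t.replace(/(^|\s)@(\w+)/g, "$1@<a href='http://www.twitter.com/$2' target='_blank'>$2</a>");
			return result;
		}
 
		// Links each #hashtag to a Twitter search for the tag text; the "#" stays outside the link.
		private function parseHashtags(t:String):String {
			var result:String = t.replace(/(^|\s)#(\w+)/g, "$1#<a href='http://search.twitter.com/search?q=$2' target='_blank'>$2</a>");
			return result;
		}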
 
		// Wraps each ftp://, http://, or https:// URL in an anchor tag.
		private function parseHyperlinks(t:String):String {
			var urlPattern:RegExp = /((?:ftp|https?):\/\/[-a-zA-Z0-9@:%_+.~#?&\/=]+)/g;
			var result:String = t.replace(urlPattern, "<a href='$1' target='_blank'>$1</a>");
			return result;
		}
 
	}
}

Download TwitterString

8 thoughts on “Parsing Tweets with the TwitterString Class”

  1. Hi.

    Thank you for sharing this!
    I needed something like that, and I made a port to haXe.
    I noticed a couple of “mistakes” (don’t qualify as bugs, as it just works as it is).

    1) The method parseHashtags does in fact parse usernames
    2) The method parseUsernames, on the other hand, does parse hashtags.

    Also, I checked Twitter’s own parser, and the parsed hashtags maintain the # character in the search link.

    • Yeah, thanks. I expect there will be some “interesting” bits in there. It was written in a few hours last year out of immediate necessity :)

      Awesome that you ported to haXe! Is it public anywhere?
