My mentor, Prof. Kevin Scannell, made a pretty awesome website called Indigenous Tweets. It finds and ranks tweeters who tweet in any of more than 30 indexed languages. He’ll also be blogging to talk with some of the main tweeters in each of these languages and to help grow the online presence of these language communities.
My mentor and esteemed SLU Professor Kevin Scannell is at it again: he’s providing a way for members of language communities to harness the power of the Internet in order to connect with one another, this time by finding the top users of over 30 languages on Twitter and ranking both them and the languages on Indigenous Tweets.
Twitter describes itself as “the best way to discover what’s new in your world,”¹ but there is a fundamental issue with this: “world” is presently limited by the inclusion of only a handful of languages. Although people can tweet in any language on Twitter, finding users who speak the same language is a difficult, or even a seemingly impossible, task. This is especially true for the minority languages on which Prof. Scannell focuses. Further, while Twitter attempts to classify the language of every tweet, it does a poor job.
This isn’t to blame Twitter: classifying many of these languages is difficult because so little data exists for them. Prof. Scannell, however, has been working on similar problems for many years and has amassed large corpora with which to classify and analyze languages. With his data, Twitter’s API, and the magic of Perl at hand, he wrote a bot that crawls Twitter as far as the API allows, seeded by a search for common but distinctive words in each language. For every user encountered in a search, the bot considers not only that tweeter’s timeline for ranking, but also his or her following and follower graphs in an attempt to find other users of the language.
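For the curious, here is roughly how I picture that crawl. This is only a toy sketch in Python, not Prof. Scannell’s Perl bot: the names `crawl`, `rank`, and `is_in_language`, the tiny in-memory `TIMELINES` and `GRAPH` tables that stand in for Twitter’s API, and the two Irish marker words are all my own assumptions for illustration. In particular, the real classifier draws on his large corpora, whereas this one just checks for a distinctive word.

```python
from collections import deque

# Hypothetical marker words; the real bot relies on large corpora instead.
DISTINCTIVE = {"agus", "bhfuil"}

# Toy stand-ins for the Twitter API: user timelines and follow graph.
TIMELINES = {
    "alice": ["Tá sé go breá agus te inniu", "good morning"],
    "bob": ["hello world"],
    "carol": ["An bhfuil tú ann?", "agus arís", "bhfuil"],
    "dave": ["agus sin é"],
}
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "alice"],
    "dave": [],
}


def is_in_language(tweet):
    """Crude check: does the tweet contain any distinctive word?"""
    return any(word in DISTINCTIVE for word in tweet.lower().split())


def crawl(seed_users, max_users=1000):
    """Breadth-first crawl from the seeds.

    Returns {user: number of their tweets judged to be in the language}.
    Users with no matching tweets are neither counted nor expanded.
    """
    queue = deque(seed_users)
    seen = set(seed_users)
    counts = {}
    visited = 0
    while queue and visited < max_users:
        user = queue.popleft()
        visited += 1
        in_lang = sum(1 for t in TIMELINES.get(user, []) if is_in_language(t))
        if in_lang == 0:
            continue  # not a user of the language: skip their graph
        counts[user] = in_lang
        # Followers and followees are candidates for the same language.
        for other in GRAPH.get(user, []):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return counts


def rank(counts):
    """Sort users by matching-tweet count, highest first; ties by name."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank(crawl(["alice"]))
```

Starting from the single seed `alice`, the sketch reaches `carol` and `dave` through the follow graph and leaves out `bob`, who never tweets in the language. A real crawl would also have to respect the API’s rate limits, which is presumably part of what “as far as the API allows” means.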
Of course, as Twitter continues to grow, Indigenous Tweets aims to do the same. Twitter’s API was very helpful in gathering the data, Prof. Scannell has told me, but he knows that some tweeters were likely missed in the process. To counter this, every page is affixed with a form where the usernames of those thought to have been missed can be suggested, letting the community be directly involved in the website. As he continues to crawl Twitter, those suggested will be added to the queue for consideration.
He’s also created a blog (that he’ll definitely keep updated²). Through the blog, he plans to further engage the community, primarily by interviewing top tweeters in each language. He hopes that this in conjunction with the ranking system on Indigenous Tweets will put the need for increased Internet communication at the forefront of language communities’ minds.
It’s a really great service, so I should just stop talking about it so that you can go check it out!
¹ From Twitter’s about page.
² I visit his office frequently, so I’ll be sure to pester him if I don’t see a new post every once in a while.