Only one day ahead of schedule, when I had hoped to be three or more days ahead by now. Still, no need to fret: this remote task processing has completely changed the scale at which the app can handle new accounts with huge sets of followers.
Moving on now to the second of my big tasks, and one that’s far more tangible for the service’s users: an overhaul of the Report. Features I’m weighing right now (please weigh in!):
- Profiling of followers: histograms of how many other people they’re following, how many followers they have, and how recently they’ve been active on Twitter (measured by time since last tweet).
- Ability to explore a list of my account’s illegitimate followers (including the reason each one is classified as illegitimate).
- “Most relevant” followees of followers. Currently I show the most popular ones. For Starbucks, for example, about 11,200 of their 290K followers also follow Barack Obama. He’s the most popular “peer” of Starbucks within Starbucks’ set of followers. But everyone follows Obama (2M followers and counting), so that doesn’t really help Starbucks distinguish their followers from any other Twitter users. So I’ll create a ratio of also-follows to the popular user’s total number of followers. This gets us closer to the heart of the matter: who else really matters to Starbucks’ followers?
- Ability to explore the complete list of follow and unfollow events, by day, with paging (I currently limit the list to 100 follows and 300 unfollows).
- Comparisons of each individual account’s data with the aggregated data from all GraphEdge accounts. Example: the histogram of how many friends your followers have sits next to the same analysis for ALL followers across all GraphEdge accounts. This tells you how you’re doing against a broader benchmark.
- Other stuff, some minor, some less minor. Bug fixes, etc.
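The “most relevant” idea in the list above boils down to dividing each peer’s also-follow count by that peer’s total follower count and ranking on the result. Here is a minimal sketch, assuming both counts are already available as dictionaries keyed by screen name. The function name is hypothetical, and the `min_overlap` floor is an added assumption (not part of the plan above) to keep very small accounts, where one or two shared followers would give a huge ratio, from crowding the top of the list.

```python
from typing import Dict, List, Tuple


def rank_peers_by_relevance(
    also_follows: Dict[str, int],
    peer_follower_counts: Dict[str, int],
    min_overlap: int = 1,
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Rank peers by also-follows divided by the peer's total follower count.

    also_follows: peer -> number of this account's followers who also follow that peer.
    peer_follower_counts: peer -> that peer's total follower count on Twitter.
    min_overlap: hypothetical floor so tiny accounts with one or two shared
        followers don't dominate the ranking (an assumption, not from the post).
    """
    scored: List[Tuple[str, float]] = []
    for peer, overlap in also_follows.items():
        total = peer_follower_counts.get(peer, 0)
        if overlap < min_overlap or total <= 0:
            continue  # too little overlap, or no follower count to divide by
        scored.append((peer, overlap / total))
    # Highest ratio first; ties broken by the larger raw overlap.
    scored.sort(key=lambda item: (item[1], also_follows[item[0]]), reverse=True)
    return scored[:top_n]
```

With the rounded Starbucks figures above, Obama scores 11,200 / 2,000,000 = 0.0056, so a less famous peer followed by a large share of its own audience (say, a hypothetical account with 500 shared followers out of 1,000 total, scoring 0.5) would rank well above him.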
I don’t think I have time to do all that, plus break the report presentation into multiple pages, which is what it needs. But these are the things I’m looking at. As a roadmap, it’s not bad.
I think I’ll focus on what will make the biggest immediate impact on prospective new accounts, and that’s going to be charts. So here’s my punch list:
- Follower Profiles with histograms for friends, followers, and activity.
- Most-relevant peers. This may be ambitious, because I can’t generate this report without having data on all followers’ friends at crawl time. Right now I crawl only the followers themselves, not their networks, and I pick up data on the most-popular followees only after I know who they are. For most-relevant peers, I don’t know who they are until after I have their data, because that data is what determines who makes the list. My “rollers” (as I’ve started calling my remote processing agents) may be able to help me here, but growing my Twitter data table by 10–20X may have side effects.
- Plus the changes to the report page’s organization and presentation needed to accommodate these new features.
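The Follower Profiles item comes down to bucketing three per-follower numbers: friends count, followers count, and days since the last tweet. Below is a rough sketch of that bucketing. The bucket edges and all function names are hypothetical, because the post doesn’t specify how buckets are chosen. The `as_fractions` helper normalizes counts so that a small account’s histogram can be placed next to the aggregate across all GraphEdge accounts, as in the benchmark idea earlier.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Sequence

# Hypothetical bucket edges; the post does not say how buckets are chosen.
# Each bucket covers [edges[i], edges[i + 1]); the last one is open-ended.
COUNT_EDGES = [0, 10, 100, 1_000, 10_000, 100_000]  # friends or followers
DAYS_EDGES = [0, 1, 7, 30, 90, 365]  # days since last tweet


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values into the buckets defined by sorted `edges`."""
    counts = [0] * len(edges)
    for v in values:
        i = bisect_right(edges, v) - 1
        if i >= 0:  # values below edges[0] are skipped
            counts[i] += 1
    return counts


def days_since(last_tweet: Optional[datetime], now: datetime) -> Optional[float]:
    """Activity measure: days since the last tweet, or None if the user never tweeted."""
    if last_tweet is None:
        return None
    return (now - last_tweet).total_seconds() / 86_400


def as_fractions(counts: List[int]) -> List[float]:
    """Normalize counts so one account can sit next to the all-accounts benchmark."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

For the activity chart, followers who have never tweeted come back as `None` from `days_since` and would need to be filtered out (or shown as their own bar) before calling `histogram`.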
[2009:08:30 08:09:02] – - – - – Completed Tasks – - – - –
[2009:08:30 08:09:02] — 1 tasks of pri 201 – findOldAndNewAccounts [ID 1]
[2009:08:30 08:09:02] — 1 tasks of pri 190 – updateVoxBotFollowers [ID 2]
[2009:08:30 08:09:02] — 1 tasks of pri 187 – clearOrphanedTaskReservations [ID 31]
[2009:08:30 08:09:02] — 79 tasks of pri 182 – getAcctTwitterUserDataForPendingReport [ID 27]
[2009:08:30 08:09:02] — 117 tasks of pri 180 – getTwitterUserDataForPendingReport [ID 3]
[2009:08:30 08:09:02] — 79 tasks of pri 170 – findUserAddDropsForTwitterID [ID 4]
[2009:08:30 08:09:02] — 79 tasks of pri 160 – getUpdatedFollowersForTwitterID [ID 5]
[2009:08:30 08:09:02] — 1 tasks of pri 145 – findUserAddDropsForAllAccounts [ID 13]
[2009:08:30 08:09:02] — 12 tasks of pri 70 – processTaskResults [ID 32]
[2009:08:30 08:09:02] — 1 tasks of pri 64 – verifyTwitterDataInternally [ID 33]
[2009:08:30 08:09:02] — 1 tasks of pri 62 – queueTwitterUsersForDataVerification [ID 24]
[2009:08:30 08:09:02] — 36 tasks of pri 59 – collapseNetworkConnectionsToSummary [ID 25]
[2009:08:30 08:09:02] — 37 tasks of pri 58 – crawlSecondLevelConnections [ID 18]
[2009:08:30 08:09:02] — 1 tasks of pri 57 – runSecondLevelAnalysisForAcct [ID 19]
[2009:08:30 08:09:02] — 1 tasks of pri 55 – crawlAcctNetworkUserConnections [ID 17]
[2009:08:30 08:09:02] — 1 tasks of pri 50 – crawlAnotherAcctNetworkConnections [ID 16]
[2009:08:30 08:09:02] — 1882 tasks of pri 35 – verifyTwitterData [ID 23]
[2009:08:30 08:09:02] — 2330 total completed, of 17 types.
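The log above breaks completed tasks down by type. As a quick consistency check, a short script can re-derive the final totals line from the per-type lines. This is a sketch, assuming every per-type line has the `N tasks of pri P – name [ID k]` shape shown here. `summarize` and `LINE_RE` are hypothetical names, not part of GraphEdge.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches the per-type lines, e.g.
#   "79 tasks of pri 182 – getAcctTwitterUserDataForPendingReport [ID 27]".
# \S stands in for the dash separator, whichever dash character the log used.
LINE_RE = re.compile(r"(\d+) tasks of pri (\d+) \S (\w+) \[ID (\d+)\]")


def summarize(lines: Iterable[str]) -> Tuple[int, int, Dict[str, int]]:
    """Return (total tasks, number of task types, per-type counts)."""
    per_type: Dict[str, int] = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # header and totals lines don't match the pattern
        count, name = int(m.group(1)), m.group(3)
        per_type[name] = per_type.get(name, 0) + count
    return sum(per_type.values()), len(per_type), per_type
```

Given the 17 per-type lines above, it should return a total of 2330 across 17 types, matching the log’s last line.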