LANGREITER.COM plain, simple
  Friday, 4 August 2006

N-grams to the people: "We processed 1,011,582,453,213 words of running text and are publishing the counts for all 1,146,580,664 five-word sequences that appear at least 40 times. There are 13,653,070 unique words, after discarding words that appear less than 200 times."
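For a feel of what such a count involves, here is a toy sketch using the thresholds from the quote (five-word sequences, at least 40 occurrences, words seen at least 200 times). The function name `count_ngrams` and the `<UNK>` placeholder for rare words are assumptions made for illustration; the quote only says rare words were discarded, and nothing here reflects how the real pipeline was built.

```python
from collections import Counter

def count_ngrams(tokens, n=5, min_word_count=200, min_ngram_count=40, unk="<UNK>"):
    """Count n-word sequences, keeping only those seen at least min_ngram_count times."""
    # First pass: how often does each word occur?
    word_counts = Counter(tokens)
    # Assumption: rare words are mapped to a placeholder rather than dropped,
    # so that the remaining words keep their original neighbours.
    cleaned = [t if word_counts[t] >= min_word_count else unk for t in tokens]
    # Second pass: slide a window of n words over the cleaned text.
    ngrams = Counter(tuple(cleaned[i:i + n]) for i in range(len(cleaned) - n + 1))
    # Keep only the sequences that clear the frequency threshold.
    return {gram: c for gram, c in ngrams.items() if c >= min_ngram_count}

if __name__ == "__main__":
    text = "to be or not to be or not to be".split()
    # Tiny thresholds, since this is ten words rather than a trillion.
    print(count_ngrams(text, n=2, min_word_count=1, min_ngram_count=2))
```

At a trillion words nothing fits in one in-memory `Counter`, of course; this only shows the shape of the computation.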

"[...] [T]here's no data like more data [...]"

And in that spirit*, Greg Linden points to even more research data being made available from AOL Research (who'd have known ...).

* Misguided as it is in an infinity of non-tongue-in-cheek cases; about the best outcome of that particular data release would be that it raises awareness of what an incredibly detailed picture of their users search companies have ...

Subversion Best Practices (Thanks, Gavin!)

"The logbook is a vital instrument of the experimental physicist, and its importance cannot be underestimated."

Wikimania, the blog.

Signal to noise 2006.

Finite Simple Group (of Order Two). Hilarious.

Disclosure pursuant to §25 MedienG:
Christian Langreiter, Langkampfen