Inspiron 910 will hopefully come out on Friday. Does Inspiron mean cheap in Dell marketing? I guess it's better than Dell E. There's something wrong with looking forward to a Dell; after all, there's no Dell Insider or Inspiron Rumors.
SearchMonkey Tutorials. SearchMonkey uses an RDF-like format called DataRSS. The FAQ describes which parts of the Semantic Web work with SearchMonkey: no reasoning, just the triples.
One Machine Kevin Kelly predicting the next 5,000 days of the web. The specifications for the current web, the one machine, are interesting. It uses 5% of the electricity on Earth and holds and processes the equivalent of one human brain (1 HB). If technology continues growing at the current rate, 6 billion HB will be reached somewhere between 2020 and 2040, outstripping the current human population (and hopefully the population then as well). He also says that every bit will be a web bit, that all bits will go through the web, and you can see that happening with traditionally non-web data usage like word processing or mobile phone traffic (actually "internet" would probably be more accurate). This coincides with the convergence of the digital and atomic "worlds": rather than us moving from one to the other (like in "The Matrix"), he suggests that they'll be integrated and that we are the extension of the web rather than the other way around.
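To get a feel for what "the current rate" has to be, here is a rough worked calculation. It assumes, as my own simplification rather than anything stated in the talk, that the 1 HB figure applies to about 2008 and that capacity grows by steady doubling. Going from 1 HB to 6 billion HB then takes about 32.5 doublings, and the doubling time is the remaining years divided by that number:

```latex
\begin{aligned}
n &= \log_2\!\left(6 \times 10^{9}\right) = \log_2 6 + 9\log_2 10 \approx 2.58 + 29.90 \approx 32.5 \\
t_{\text{double}} &= T / n \\
T = 12\ \text{yr}\ (2008 \to 2020) &\Rightarrow t_{\text{double}} \approx 0.37\ \text{yr} \approx 4.4\ \text{months} \\
T = 32\ \text{yr}\ (2008 \to 2040) &\Rightarrow t_{\text{double}} \approx 0.99\ \text{yr}
\end{aligned}
```

So the 2040 end of the range corresponds to the web's capacity doubling roughly once a year, and the 2020 end to doubling about every four to five months.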
The second half of the talk is about the Semantic Web (found from here, here and here). He gives his own definition of the Semantic Web by showing how it fits into what has gone before. The stages of networking are: connecting by site, from one computer to another; connecting by page, linking between pages; and the last two stages (I didn't see a clear distinction between them), connecting by data, idea or item.
Real Developers don't use Ruby Hadoop: When grownups do open source. It's quite an amusing read, especially the part about the word count example on 9,000 blogs, the dig at Twitter, Starfish being practically useless (it uses MySQL and has no Reduce phase) and the bit about understanding something being harder than writing a Ruby version of it.
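For anyone who hasn't seen it, the word count example is the "hello world" of MapReduce. The sketch below simulates the three phases in memory with plain Java collections. It is not Hadoop's Mapper/Reducer API, and the class name and sample "blogs" are made up for illustration. It also shows what Starfish is missing: without the reduce step you'd be left with a pile of (word, 1) pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java simulation of the MapReduce word count example
 * (hypothetical class, not the Hadoop API).
 */
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one document.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : document.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    static Map<String, Integer> count(List<String> documents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String doc : documents) {
            pairs.addAll(map(doc));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> blogs = List.of("Hadoop is written in Java", "Starling is written in Ruby");
        System.out.println(count(blogs));
        // prints {hadoop=1, in=2, is=2, java=1, ruby=1, starling=1, written=2}
    }
}
```

In Hadoop the map calls run in parallel across the input splits and the framework does the shuffle; here everything happens in one thread, which is fine for seeing the shape of the computation.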
Twitter decided they would be cute and trendy. They wrote their code in Ruby: the official state language of the hipster-developer nation. Doug Cutting, on the other hand, decided he would get xxxx done, and wrote Hadoop in Java. Starling was hidden away in some corner and forgotten (it's hosted at RubyForge...). Hadoop lives prominently at the Apache Software Foundation. Starling is a re-hash of an existing Java Enterprise API called JMS that has several open source implementations. Hadoop is an implementation of Google's MapReduce, a system that publicly only existed on paper. Hadoop has the added benefit of actually working.
Ahh the joys of installing Visual Studio - enough time to install IntelliJ, run it up, and catch up on news.
Pigeon Programming IntelliJ 8 continues with "all you need is a space bar and meta combinations to program" functionality: "Pressing Ctrl-Shift-Space twice allows you to find values of the expected types which are 'two steps away' (can be retrieved through a chained method call)." It seems to encourage what I'd consider a code smell (chains of method calls).
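To make the complaint concrete, here is a sketch with hypothetical Order, Customer and Address classes. If an Address is the expected type, two-step completion will happily offer order.getCustomer().getAddress(), a message chain that ties the caller to how the objects are wired together. The delegating accessor is one conventional alternative.

```java
/**
 * Hypothetical classes showing the "message chain" that two-step completion
 * makes easy to write, next to a delegating alternative.
 */
public class MessageChainSketch {

    static final class Address {
        private final String city;
        Address(String city) { this.city = city; }
        String getCity() { return city; }
    }

    static final class Customer {
        private final Address address;
        Customer(Address address) { this.address = address; }
        Address getAddress() { return address; }
    }

    static final class Order {
        private final Customer customer;
        Order(Customer customer) { this.customer = customer; }
        Customer getCustomer() { return customer; }

        // Delegating accessor: callers ask the order, not the order's collaborators.
        Address getShippingAddress() { return customer.getAddress(); }
    }

    // An Address "two steps away": the caller must know how Order and Customer
    // are connected, so any change to that structure ripples out to here.
    static Address viaChain(Order order) {
        return order.getCustomer().getAddress();
    }

    // Only Order's own interface is used; the structure stays hidden.
    static Address viaDelegation(Order order) {
        return order.getShippingAddress();
    }

    public static void main(String[] args) {
        Order order = new Order(new Customer(new Address("Brisbane")));
        System.out.println(viaChain(order).getCity());      // prints Brisbane
        System.out.println(viaDelegation(order).getCity()); // prints Brisbane
    }
}
```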
Burnator Ack, burning ISOs on Windows XP falls into that awkward gap: I think I'll do it often enough to remember which software to use, but it's always long enough between times that I don't. So the two I've used the most are InfraRecorder and IsoRecorder (a secret).
JRDF is very crap (part 2) It didn't take as long as expected, which means it's probably wrong. For non-filtered queries it's three times faster, and for certain FILTER queries (with equals) it's 47 times faster (from 284 to 6 seconds). At least it's now within the same order of magnitude as most tools, and it's a tiny bit faster than some (although adding more features will probably slow it down again).
What changed:
* AttributeValuePair has been removed and replaced with maps (as discussed previously).
* Seeing as maps were used so much, hashCode and equals were optimized. As I've found before (I think), isAssignableFrom is slower than try/catch for equals (depending on your usage, of course).
* Queries now use unsorted, uncopied finds rather than the standard graph finds. I'd forgotten how much effort had gone into supporting remove and automatic sorting on iterators.
* A very simple optimizer was added (at the moment it only simplifies the FILTER constraints). Tree manipulation was painful - I resorted to mutating the tree in place.
* Better design. This is a bit hard to quantify, except that what was there was truly awful: objects being created in constructors and passing themselves in. The nice thing about IoC is that it's quite easy to see when you're not using objects at the same architectural level.
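The two equals styles mentioned in the list look roughly like this. NodeValue is a hypothetical stand-in, not JRDF's actual class, and nothing here is a benchmark. The try/catch version only pays off when the ClassCastException is rare, i.e. when almost every comparison is against the same type, which is presumably what "depending on your usage" comes down to. Caching the hash code is included because these objects end up as map keys constantly.

```java
/**
 * Two ways of writing equals for a heavily used, hypothetical value type
 * (not JRDF's actual code). hashCode is cached because instances are used
 * as map keys all the time.
 */
public class EqualsSketch {

    static final class NodeValue {
        private final String lexicalForm;
        private final int hash;

        NodeValue(String lexicalForm) {
            this.lexicalForm = lexicalForm;
            this.hash = lexicalForm.hashCode(); // computed once, not on every lookup
        }

        @Override
        public int hashCode() {
            return hash;
        }

        // Style 1: check the runtime class with isAssignableFrom before casting.
        boolean equalsWithClassCheck(Object obj) {
            if (this == obj) return true;
            if (obj == null || !NodeValue.class.isAssignableFrom(obj.getClass())) return false;
            NodeValue other = (NodeValue) obj;
            return hash == other.hash && lexicalForm.equals(other.lexicalForm);
        }

        // Style 2: cast optimistically and treat a ClassCastException as "not equal".
        // The cached hash gives a cheap early exit before the string comparison.
        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null) return false;
            try {
                NodeValue other = (NodeValue) obj;
                return hash == other.hash && lexicalForm.equals(other.lexicalForm);
            } catch (ClassCastException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) {
        NodeValue a = new NodeValue("foo");
        NodeValue b = new NodeValue("foo");
        System.out.println(a.equals(b) + " " + a.equalsWithClassCheck(b) + " " + a.equals("foo"));
        // prints true true false
    }
}
```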
JRDF is very crap (part 1) I've been spending some time looking at the querying part of JRDF, and it's quite bad. How bad? Well, I've been profiling it and noticed that a lot of time was spent comparing attribute value pairs. An attribute in JRDF consists of a name (a variable or a position in a triple) and a type (a position in a triple, or a literal, URI Reference or blank node). Comparisons are done during most operations (like joins), and they are done on sorted attribute values. This is incredibly dumb. It's much better to have a map of attributes to values: no sorting required and O(1) lookup - hurrah. The code around the comparisons also got a lot simpler and is obviously better. I think there's at least one other case of this at a different level, and potentially room for about an order of magnitude speed-up over the current release. Test queries are already 2-3 times faster.
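Here is a sketch of the map-based approach, assuming a hypothetical Attribute record and string values (JRDF's real classes differ). Two tuples join when every attribute they share has the same value, and each shared attribute is found with a hash lookup instead of walking two sorted lists. The join itself is a naive nested loop to keep the example short; the O(1) applies to the per-attribute lookup, not to the join as a whole.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Tuples as maps from attribute to value, joined on their shared attributes.
 * Attribute and the string values are hypothetical stand-ins, not JRDF's classes.
 */
public class MapJoinSketch {

    // An attribute: a name (variable or triple position) plus a type.
    record Attribute(String name, String type) { }

    // Compatible when every shared attribute has the same value; no sorting needed.
    static boolean compatible(Map<Attribute, String> left, Map<Attribute, String> right) {
        for (Map.Entry<Attribute, String> entry : left.entrySet()) {
            String other = right.get(entry.getKey()); // O(1) hash lookup
            if (other != null && !other.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Natural join, written as a simple nested loop.
    static List<Map<Attribute, String>> join(List<Map<Attribute, String>> left,
                                             List<Map<Attribute, String>> right) {
        List<Map<Attribute, String>> result = new ArrayList<>();
        for (Map<Attribute, String> l : left) {
            for (Map<Attribute, String> r : right) {
                if (compatible(l, r)) {
                    Map<Attribute, String> merged = new HashMap<>(l);
                    merged.putAll(r);
                    result.add(merged);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Attribute person = new Attribute("?person", "URIReference");
        Attribute name = new Attribute("?name", "Literal");
        Attribute email = new Attribute("?email", "Literal");
        List<Map<Attribute, String>> names = List.of(
                Map.of(person, "ex:alice", name, "Alice"),
                Map.of(person, "ex:bob", name, "Bob"));
        List<Map<Attribute, String>> emails = List.of(
                Map.of(person, "ex:alice", email, "alice@example.org"));
        List<Map<Attribute, String>> joined = join(names, emails);
        System.out.println(joined.size() + " " + joined.get(0).get(name)); // prints 1 Alice
    }
}
```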
The main reason for this rework, though, is that FILTER in JRDF currently runs about 10 times more slowly than a query using triple matching (this isn't a complexity measurement - it's based on a rather small set of triples). So a query with "?a <some:value> ?b . FILTER(str(?b) = 'foo')" is much slower than "?a <some:value> 'foo'". The queries aren't the same, but their performance shouldn't differ that much. However, before FILTER's performance can be improved, the code has to be refactored - hopefully into something simpler and faster.
FILTER is nicely functional - it seems a shame to implement it in Java, and the implementation is eye-poppingly bad at the moment. I was considering Functional Java, but instead of taking the gateway drug I was thinking of going straight to the hard stuff. FILTER works by creating different operations within a relation, which allows you to put ANDed FILTERs vertically across a relation (columns) and ORed FILTERs horizontally (rows). I don't know if anyone else implements it this way - it might turn out to be another bad idea.
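As an illustration of why FILTER feels functional, here is a sketch that treats FILTER expressions as composable predicates over solution rows, using java.util.function.Predicate's and/or/negate. This is a swapped-in functional approach, not the relation-based design described above, and the row representation (variable name to string value) and the strEquals helper are hypothetical simplifications of str(?x) = "...".

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * FILTER expressions as composable predicates over solution rows. A row is a
 * hypothetical map from variable name to the string form of its value.
 */
public class FilterSketch {

    // Roughly FILTER(str(?variable) = "value"), comparing string forms only.
    static Predicate<Map<String, String>> strEquals(String variable, String value) {
        return row -> value.equals(row.get(variable));
    }

    // Keep only the rows that satisfy the condition.
    static List<Map<String, String>> filter(List<Map<String, String>> rows,
                                            Predicate<Map<String, String>> condition) {
        return rows.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("a", "ex:1", "b", "foo"),
                Map.of("a", "ex:2", "b", "bar"),
                Map.of("a", "ex:3", "b", "baz"));
        // (str(?b) = "foo" || str(?b) = "bar") && !(str(?a) = "ex:2")
        Predicate<Map<String, String>> condition = strEquals("b", "foo")
                .or(strEquals("b", "bar"))
                .and(strEquals("a", "ex:2").negate());
        System.out.println(filter(rows, condition).size()); // prints 1
    }
}
```

The ORs and ANDs compose as ordinary values here, which is the appeal; the catch is that every row is tested one at a time, whereas pushing constraints into the relation (or into triple matching) can avoid generating the rows at all.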
YADS and RDF Molecules BNodes Out! discusses how usefully scalable systems don't use blank nodes. What is interesting is the comment on YADS (Yet Another DOI Service). The best reference is Tony's presentation, although it is mentioned in Jane's as well. "YADS implements a simple, safe and predictable recursive data model for describing resource collections. The aim is to assist in programming complex resource descriptions across multiple applications and to foster interoperability between them...So, the YADS model makes extensive use of bNodes to manage hierarchies of “fat” resources - i.e. resource islands, a resource decorated with properties. The bNodes are only used as a mechanism for managing containment."
This sounds a lot like RDF molecules and supports visualization (apparently). This seems like a good use of molecules that I hadn't previously thought of (Tony's talk gives an example of the London underground). The main homepage of YADS isn't around anymore - it'll be interesting to see if it's still being used/worked on.
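For the molecule comparison, here is a simplified sketch of the decomposition: triples connected through shared blank nodes (directly or via a chain of them) are grouped into one molecule, and triples without blank nodes stand alone. It uses a small union-find; the Triple record, the "_:" prefix test for blank nodes and the station data are all hypothetical, and this ignores refinements in the published molecule definitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified molecule-style decomposition using union-find. Terms starting
 * with "_:" are assumed to be blank nodes.
 */
public class MoleculeSketch {

    record Triple(String s, String p, String o) { }

    // Union-find over keys for triples ("t<index>") and blank nodes ("b<label>").
    private final Map<String, String> parent = new HashMap<>();

    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = parent.get(x);
        if (!root.equals(x)) {
            root = find(root);
            parent.put(x, root); // path compression
        }
        return root;
    }

    private void union(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (!rootA.equals(rootB)) {
            parent.put(rootA, rootB);
        }
    }

    private static boolean isBlank(String term) {
        return term.startsWith("_:");
    }

    static List<List<Triple>> molecules(List<Triple> triples) {
        MoleculeSketch sets = new MoleculeSketch();
        for (int i = 0; i < triples.size(); i++) {
            Triple t = triples.get(i);
            String key = "t" + i;
            sets.find(key); // register the triple even if it has no blank nodes
            if (isBlank(t.s())) sets.union(key, "b" + t.s());
            if (isBlank(t.o())) sets.union(key, "b" + t.o());
        }
        // Group triples by their set's root, in first-seen order.
        Map<String, List<Triple>> groups = new LinkedHashMap<>();
        for (int i = 0; i < triples.size(); i++) {
            groups.computeIfAbsent(sets.find("t" + i), k -> new ArrayList<>()).add(triples.get(i));
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
                new Triple("ex:line", "ex:hasStop", "_:s1"),
                new Triple("_:s1", "ex:name", "\"Station A\""),
                new Triple("_:s1", "ex:next", "_:s2"),
                new Triple("_:s2", "ex:name", "\"Station B\""),
                new Triple("ex:line", "rdfs:label", "\"Example line\""));
        for (List<Triple> molecule : molecules(triples)) {
            System.out.println(molecule.size() + " triple(s): " + molecule);
        }
    }
}
```

With this data the four triples hanging off the two station blank nodes form one molecule and the label triple forms another, which is roughly the "resource island" idea in the YADS quote.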
Hadoop and Microsoft Pluggable Hadoop lists some extensions to Hadoop in the pipeline: job scheduling (including one based on Linux's completely fair scheduler), block placement, instrumentation, serialization, component lifecycle, and code cleanup (the analysis used Structure101).
I found out why HQL was removed from HBase and moved to HRdfStore: to make way for a Ruby DSL and to make sure HBase isn't confused with an SQL database.