Wednesday, August 10, 2011

welcome...to...big...data


The following is excerpted from the hbase-user@apache.org mailing list...an instant classic!
=================================================================
Mongodb does an excellent job at single node scalability - they use mmap and many smart things and really kick ass ... ON A SINGLE NODE.

That single node must have raid (raid it going out of fashion btw), and you wont be able to scale without resorting to:
- replication (complex setup!)
- sharding

mongo claims to help on the last item, but it is still a risk point.

For really large data that must span multiple machines, there is no "clustered sql" type solution that isnt (a) borked in various ways (Oracle RAC I'm looking at you) or (b) stupid expensive (Oracle RAC, STILL looking at you)

Tools like HBase give you scalability at the cost of features (no automated secondary indexing, no query language).

Welcome... to... big... data.

-ryan

On Thu, Aug 11, 2011 at 12:44 AM, Edward Capriolo <> wrote:
> On Wed, Aug 10, 2011 at 4:26 PM, Li Pi <> wrote:
>> You'll have to build your own secondary indexes for now.
>> 
>> On Wed, Aug 10, 2011 at 1:15 PM, Laurent Hatier
>>
>> >wrote:
>> 
>> > Yes, i have heard this index but is it available on hbase 0.90.3 ?
>> >
>> > 2011/8/10 Chris Tarnas <>
>> >
>> > > Hi Laurent,
>> > >
>> > > Without more details on your schema and how you are finding that
>> > > number
>> > in
>> > > your table it is impossible to fully answer the question. I
>> > > suspect
>> what
>> > you
>> > > are seeing is mongo's native support for secondary indexes. If
>> > > you were
>> > to
>> > > add secondary indexes in HBase then retrieving that row should be
>> > > on
>> the
>> > > order of 3-30ms. If that is you main query method then you could
>> > reorganize
>> > > your table to make that long number your row key, then you would
>> > > get
>> even
>> > > faster reads.
>> > >
>> > > -chris
>> > >
>> > >
>> > > On Aug 10, 2011, at 10:02 AM, Laurent Hatier wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > I would like to know why MongoDB is faster than HBase to select
>> items.
>> > > > I explain my case :
>> > > > I've inserted 4'000'000 lines into HBase and MongoDB and i must
>> > calculate
>> > > > the geolocation with the IP. I calculate a Long number with the
>> > > > IP
>> and
>> > i
>> > > go
>> > > > to find it into the 4'000'000 lines.
>> > > > it's take 5 ms to select the right row with Mongo instead of
>> > > > HBase
>> > takes
>> > > 5
>> > > > seconds.
>> > > > I think that the reason is the method : cur.limit(1) with
>> > > > MongoDB but
>> > is
>> > > > there no function like this with HBase ?
>> > > >
>> > > > --
>> > > > Laurent HATIER
>> > > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI
>> > >
>> > >
>> >
>> >
>> > --
>> > Laurent HATIER
>> > Étudiant en 2e année du Cycle Ingénieur à l'EISTI
>> >
>>