Thursday, July 17, 2008

hackystat-sensorbase-postgres ftw!

so, i've been hacking on hackystat the last week or so. for the most part, i've been concentrating on sensorbase, more specifically the hackystat-sensorbase-postgres module. when i first started working on this, we saw numbers like this with shellperf (for a 100 entries):

  • Postgres trial 1: 78.6 Milliseconds/sensordata instance
  • Postgres trial 2: 111.25 Milliseconds/sensordata instance
  • Postgres trial 3: 78.75 Milliseconds/sensordata instance
  • Postgres trial 4: 62.34 Milliseconds/sensordata instance

    after a few days of hacking and an OS change to linux we got numbers like this

  • Postgres trial 5: 16.36 Milliseconds/sensordata instance

    and this number is with 400k sensordata and 1.3 million sensordata_properties entries in our database.

    queries
    one of the great things about using a database is the ability to run queries. here are a couple that i just wrote:

    this one gets all the data sensor data from today.

    select * from sensordata, hackyuser
    where hackyuser.email='emailaddress@hackystat.org'
    and sensordata.owner_id = hackyuser.id
    and resource like '%fooProject%'
    and tstamp > current_date

    this one gets the exact snapshots per day

    select date_trunc('day', runtime), tool, max(runtime) from (
    select distinct runtime, tool from sensordata, hackyuser
    where hackyuser.email='emailaddress@hackystat.org'
    and sensordata.owner_id = hackyuser.id
    and resource like '%fooProject%'
    ) runtime_tool
    group by date_trunc('day', runtime), tool
    order by date_trunc('day', runtime)

    (formatted output)
    "2008-06-27" "2008-06-27 00:39:33.458" "Checkstyle"
    "2008-06-27" "2008-06-27 00:40:02.802" "JavaNCSS"
    "2008-06-27" "2008-06-27 00:39:41.145" "JUnit"
    "2008-06-27" "2008-06-27 00:39:58.63" "PMD"
    "2008-06-27" "2008-06-27 00:40:09.834" "SCLC"
    "2008-06-29" "2008-06-29 14:15:26.607" "Checkstyle"
    "2008-06-29" "2008-06-29 12:27:54.546" "JavaNCSS"
    "2008-06-29" "2008-06-29 12:27:44.89" "JUnit"
    "2008-06-29" "2008-06-29 12:27:53.062" "PMD"
    "2008-06-29" "2008-06-29 12:27:57.156" "SCLC"

    the totally awesome thing is that even though there is hundreds of thousands of entries we can execute these queries in less than a second. its totally fast.

    issues
    there are always issues; here are a couple
  • deletes take forever - postgres isn't really optimized for deletes. so they take much longer than updates or even inserts. i've seen that even deleting 10 records can cause a http timout on another request. so i'm thinking the approach we should take is; disable deletes. deletes aren't really that important and seem like an administrator type function or at the very least asynchronous. anyway, the problem seems to be that delete is used in the test cases. so.. i left it in for now.
  • count (*) takes forever - postgres has some issues with counting a huge table. so, instead of count (*), i'm using
    select relname, n_live_tup, last_analyze 
    from pg_stat_user_tables
    where relname like '%'

    this is really fast, but is an estimate because the stats could be out of date.

    thats it for now. things seem to be all good with the sensorbase. at least for now.
  • 1 comment:

    synthesis said...

    You're not allowed to use FTW. You don't even game!!!