Thursday, July 31, 2008

hackystat-sensorbase-postgres size

we've been using the hackystat-sensorbase-postgres module for a couple of weeks now, and everything looks pretty good: no failed sensors and no need to restart the sensorbase (knock on wood).

i wanted to check out the statistics of the postgres tables, so i ran the analyze and vacuum maintenance executable and viewed the report. here is what i found.

hackystat-sensorbase-postgres stores about 750,000 sensor data entries in 100 MB.
hackystat-sensorbase-uh stores about 350,000 sensor data entries in 1 GB (according to philip in this email thread; i still need to validate that number).
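
to put those two numbers side by side, here is a rough back-of-the-envelope sketch. it assumes 1 MB = 10**6 bytes and 1 GB = 10**9 bytes, and it just takes the approximate figures above at face value (the uh number is still unvalidated), so treat the result as a ballpark and not a measurement.

```python
# rough bytes-per-entry comparison using the approximate figures quoted above.
# assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
MB = 10**6
GB = 10**9


def bytes_per_entry(total_bytes, entries):
    """Average storage used per sensor data entry."""
    return total_bytes / entries


postgres = bytes_per_entry(100 * MB, 750_000)  # hackystat-sensorbase-postgres
uh = bytes_per_entry(1 * GB, 350_000)          # hackystat-sensorbase-uh (unvalidated figure)

print(f"postgres: {postgres:.0f} bytes/entry")
print(f"uh:       {uh:.0f} bytes/entry")
print(f"ratio:    {uh / postgres:.1f}x")
```

under those assumptions the postgres module comes out at roughly 133 bytes per entry versus roughly 2,857 for uh, which is about a 21x difference.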

whoa! well, haha. i guess that should be expected. the awesome thing is that i think our database will keep performing well as the amount of data grows, and we should see fast query times all day, every day. :)

by the way, i'd imagine that we'd hit 2 million sensor data entries a month. that's just a guess; i'll let you know when we reach a month of uptime.

Wednesday, July 30, 2008

non-routine savants

here is a very useful post for all you students out there:

How do we find these non-routine savants? There are many factors, of course, but we primarily look for ...

... analytical reasoning. Google is a data-driven, analytic company. When an issue arises or a decision needs to be made, we start with data. That means we can talk about what we know, instead of what we think we know.

... communication skills. Marshalling and understanding the available evidence isn't useful unless you can effectively communicate your conclusions.

... a willingness to experiment. Non-routine problems call for non-routine solutions and there is no formula for success. A well-designed experiment calls for a range of treatments, explicit control groups, and careful post-treatment analysis. Sometimes an experiment kills off a pet theory, so you need a willingness to accept the evidence even if you don't like it.

... team players. Virtually every project at Google is run by a small team. People need to work well together and perform up to the team's expectations.

... passion and leadership. This could be professional or in other life experiences: learning languages or saving forests, for example. The main thing, to paraphrase Mr. Drucker, is to be motivated by a sense of importance about what you do.

These characteristics are not just important in our business, but in every business, as well as in government, philanthropy, and academia. The challenge for the up-and-coming generation is how to acquire them. It's easy to educate for the routine, and hard to educate for the novel.


here is how i would prioritize those factors:
1. Passion and Leadership
2. Team Players
3. Communication
4. Everything else.

be passionate and communicate! be a tigger!

Sunday, July 27, 2008

hackystat dev

i've been pretty busy working on hackystat during off time. it's pretty fun getting back into hackystat hacking. we've been doing pretty well. actually, it's not exactly hackystat hacking. we've been doing a mix of custom and open source hacking. for example, we've been accessing jira directly to get jira issue data. and we've been thinking of creating a mini-feed/stream of consciousness. that doesn't really map too well to hackystat.

anyway, i'm enjoying our latest hackystat development. we have a pretty good plan and we are making progress. one of the things that i enjoy is the collaboration. we do "night time internet": we are all online at night hacking away. we all know the problems and goals, so collaboration over chat is pretty easy. sometimes we also do lunch time dev, where we spend an hour hacking during "lunch". the other cool thing is that we have a requirement to do one blog post related to hackystat per week (this is in an internal blog). it keeps us on track. oh, oh, oh.... the other thing i do is keep on my guys to update their jira issues. i don't care if they only spent 30 minutes working on something, i want to know about it. this helps keep the project active and helps keep each other aware of what's going on.

we got one more interesting thing in the works... hopefully it will be successful. i'll blog about it later once we get things more developed.

anyway, i've been busy. haha. that's why i haven't posted too much. but, i will start it up again. i promise!

Friday, July 18, 2008

UW CS enrollment up and invest in students

i just read an interesting post: Computer Science Enrollments: The Real News by Ed Lazowska. he talks about a few things, but i particularly liked the sections about enrollment and students.

enrollment
One place where we can easily measure changes in student interest is in the enrollment in our first introductory course, which serves the entire university. Between 2003-04 and 2007-08 (a 4-year period), enrollment in this course is up by 27%. Enrollment by women is up by 45%. (Annual enrollment of women into the major is up by 64% over that same interval.)


that's good news. i'm especially impressed by the increase in enrollment by women.

Here is a spreadsheet with charts showing Bureau of Labor Statistics projections for employment between 2006 and 2016 for all fields in the sciences and engineering (including the social sciences). What it shows is that of all of these fields, between now and 2016:

  • 70% of all newly-created jobs will be in computer science.
  • 62% of all job openings (both newly-created jobs and jobs available due to retirements) will be in computer science.


wow, that's good news again!

    students
    here are some great comments about attracting students to computer science:

    What do we do at UW to attract students? Many many things. As one example, starting tomorrow at UW we’re running an annual 3-day workshop for high school teachers of math and science, sponsored by Google. The goal is to show these teachers that computer science is important to their fields, and is a great field to send their smartest students into. Information is available at http://cs4hs.cs.washington.edu/. (We do this jointly with Carnegie Mellon and UCLA.)

    We have a set of terrific videos that illustrate several important points:

    1. People enter the field of computer science for all sorts of aspirational reasons.
    2. People do all sorts of things with their computer science degrees in addition to working in the software industry.
    3. Working in the software industry is highly exciting and creative and interactive.

    You can take a look at these videos at http://www.cs.washington.edu/WhyCSE/.

    Most importantly, we really invest in our students. Word gets out. At the University of Washington, we have the strongest undergraduates, because students know they can get a great education here.

    How do we “calibrate” our program — make sure our students are ready for careers? Here is a Word document I prepared recently for another purpose. Every year we are a top-5 supplier to Microsoft, Google, and Amazon.com — our students are fantastic.


    two things jump out at me: "we invest in our students" and "we make sure our students are ready for careers". that's awesome!

    when i was in school i often felt the complete opposite. i was really clueless. i wasn't connected to the department; i certainly never felt an investment from the department in my education, and i certainly didn't feel like getting me ready for my career was a department goal. luckily, i found a few professors and students who got me out of going through the motions, paid attention to me, motivated me, and pointed me in the right direction. i was really lucky.

    i always bring up students and their educational experience... i think it's really important:
  • interview with ka yee
  • engineering banquet
  • interview with randy cox
  • ics alumni association
  • making students awesome

Thursday, July 17, 2008

    hackystat-sensorbase-postgres ftw!

    so, i've been hacking on hackystat the last week or so. for the most part, i've been concentrating on sensorbase, more specifically the hackystat-sensorbase-postgres module. when i first started working on this, we saw numbers like this with shellperf (for 100 entries):

  • Postgres trial 1: 78.6 Milliseconds/sensordata instance
  • Postgres trial 2: 111.25 Milliseconds/sensordata instance
  • Postgres trial 3: 78.75 Milliseconds/sensordata instance
  • Postgres trial 4: 62.34 Milliseconds/sensordata instance

    after a few days of hacking and an OS change to linux, we got numbers like this:

  • Postgres trial 5: 16.36 Milliseconds/sensordata instance

    and this number is with 400k sensordata and 1.3 million sensordata_properties entries in our database.
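
for a rough sense of the improvement, here is a small sketch that averages the first four trials and compares them to trial 5. this is only arithmetic on the numbers above; with so few trials (and a different OS and a different amount of data in the later run) it is an indication, not a benchmark.

```python
from statistics import mean

# shellperf timings quoted above, in milliseconds per sensordata instance
before = [78.6, 111.25, 78.75, 62.34]  # trials 1-4
after = 16.36                          # trial 5, after the hacking and the move to linux

avg_before = mean(before)
speedup = avg_before / after

print(f"average of trials 1-4: {avg_before:.2f} ms")
print(f"trial 5:               {after:.2f} ms")
print(f"speedup:               {speedup:.1f}x")
```

so the early trials averaged a bit under 83 ms per instance, and trial 5 is roughly 5 times faster.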

    queries
    one of the great things about using a database is the ability to run queries. here are a couple that i just wrote:

    this one gets all the sensor data from today.

    select * from sensordata, hackyuser
    where hackyuser.email='emailaddress@hackystat.org'
    and sensordata.owner_id = hackyuser.id
    and resource like '%fooProject%'
    and tstamp > current_date

    this one gets the exact snapshots per day, meaning the latest runtime for each tool on each day

    select date_trunc('day', runtime), tool, max(runtime) from (
    select distinct runtime, tool from sensordata, hackyuser
    where hackyuser.email='emailaddress@hackystat.org'
    and sensordata.owner_id = hackyuser.id
    and resource like '%fooProject%'
    ) runtime_tool
    group by date_trunc('day', runtime), tool
    order by date_trunc('day', runtime)

    (formatted output)
    "2008-06-27" "2008-06-27 00:39:33.458" "Checkstyle"
    "2008-06-27" "2008-06-27 00:40:02.802" "JavaNCSS"
    "2008-06-27" "2008-06-27 00:39:41.145" "JUnit"
    "2008-06-27" "2008-06-27 00:39:58.63" "PMD"
    "2008-06-27" "2008-06-27 00:40:09.834" "SCLC"
    "2008-06-29" "2008-06-29 14:15:26.607" "Checkstyle"
    "2008-06-29" "2008-06-29 12:27:54.546" "JavaNCSS"
    "2008-06-29" "2008-06-29 12:27:44.89" "JUnit"
    "2008-06-29" "2008-06-29 12:27:53.062" "PMD"
    "2008-06-29" "2008-06-29 12:27:57.156" "SCLC"
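
if the group by / max combination is hard to picture, here is the same idea sketched in plain python instead of sql. the rows are hypothetical (the first JUnit row is made up so there is something to discard), and `latest_per_day` is just an illustrative helper name, not anything from hackystat.

```python
from datetime import datetime

# hypothetical (runtime, tool) rows; the first JUnit row is invented to show
# that only the latest run of a tool on a given day is kept.
rows = [
    ("2008-06-27 00:10:00.000", "JUnit"),
    ("2008-06-27 00:39:41.145", "JUnit"),
    ("2008-06-27 00:39:58.630", "PMD"),
    ("2008-06-29 12:27:44.890", "JUnit"),
]


def latest_per_day(rows):
    """Map (day, tool) to the latest runtime, like 'group by day, tool' with max(runtime)."""
    latest = {}
    for runtime_text, tool in rows:
        runtime = datetime.strptime(runtime_text, "%Y-%m-%d %H:%M:%S.%f")
        key = (runtime.date(), tool)  # plays the role of date_trunc('day', runtime), tool
        if key not in latest or runtime > latest[key]:
            latest[key] = runtime
    return latest


snapshots = latest_per_day(rows)
for (day, tool), runtime in sorted(snapshots.items()):
    print(day, tool, runtime)
```

the earlier JUnit run on 2008-06-27 drops out, leaving one row per tool per day.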

    the totally awesome thing is that even though there are hundreds of thousands of entries, we can execute these queries in less than a second. it's totally fast.

    issues
    there are always issues; here are a couple:
  • deletes take forever - postgres doesn't seem to be optimized for deletes, so they take much longer than updates or even inserts. i've seen that even deleting 10 records can cause an http timeout on another request. so i'm thinking the approach we should take is to disable deletes. deletes aren't really that important and seem like an administrator-type function, or at the very least something that could run asynchronously. anyway, the problem is that delete is used in the test cases, so i left it in for now.
  • count(*) takes forever - postgres has to scan the whole table to count its rows, which is slow on a huge table. so, instead of count(*), i'm using
    select relname, n_live_tup, last_analyze 
    from pg_stat_user_tables
    where relname like '%'

    this is really fast, but is an estimate because the stats could be out of date.

    that's it for now. things seem to be all good with the sensorbase.

Wednesday, July 9, 2008

    pixar is getting a lot of great press lately

    and they apparently deserve it. check out this article: Pixar's tightknit culture is its edge.

    According to “Pixar Rules — Secrets of a Blockbuster Company,” the company has created an incredible work environment that keeps employees happy and fulfilled. The result: “A tightknit company of long-term collaborators who stick together, learn from one another, and strive to improve with every production.”


    Thanks to Pixar University, employees learn to see the company’s work (and their colleagues) in a new light. “The skills we develop are skills we need everywhere in the organization,” Nelson said. “Why teach drawing to accountants? Because drawing class doesn’t just teach people to draw. It teaches them to be more observant. There’s no company on earth that wouldn’t benefit from having people become more observant.”


    You can try to outspend the competition. Or you can try to outculture them. Create a place that makes employees feel special. A place that makes them feel like they’re part of a bigger whole. A place where they continually get to learn and evolve. A place where everyone actually likes each other.


    wow.. that is cool. here is another one: The human side of Pixar's robot - (37signals)

    Pixar proves it’s one of those great companies that is run by unabashedly human people, and it’s no wonder why their work is so personal and touching. When you engage yourself with your customers and your audience on a level that reminds them you are the same, the experience is far greater than just using a product or just seeing a movie. Humanity is desperately missing in our age of megacorporations and big box stores.

    People love robots, but they’ll love you if you’re human, too.


    that's great press! but more importantly, it's really awesome! going back to the pixar university: i think that's really smart, because it fosters creativity. providing a diverse group of people with the same language and framework equals a creative situation. the pixar university keeps the goal of the company at the forefront of their everyday activities: make awesome movies and push the limits of what animation can do.

    haha. i can say some funny things about work now... but, i'll refrain. :)