here is what i said,
I had a lot of offline data I executed an ant sensor the sensor started to send the offline data i decided that i didn't want to wait for the offline data and did a Ctrl+C to kill the ant task. the client seemed to recover fine.i didn't check the task manager or anything. but, i had a remote desktop connection to the sensorbase and noticed that the server continued to received data for quite a while. i eventually stopped the server too and restarted it.
the discussion continued i got a somewhat strange response from philip that caught me a little off guard.
> Is it the asynchronous nature of a REST post?
No, it's the nature of TCP and socket-based communication:
<http://www.ncsa.uiuc.edu/~vwelch/net_perf/tcp_windows.html>
Google on "socket buffer size" for more related links.
Again, this is only my _hypothesis_ as to why you were receiving data after you
killed the sending process.
austen and i started to talk about this at work. austen initially agreed with philip. but, i still scratched my head... hm.. doing a search on the "socket buffer size" showed me that
Typical network latency from Sunnyvale to Reston is about 40ms, and Windows XP has a default TCP buffer size of 17,520 bytes. Therefore, Bob's maximum possible throughput is:
17520 Bytes / .04 seconds = .44 MBytes/sec = 3.5 Mbits/second
The default TCP buffer size for Mac OS X is 64K, so with Mac OS X he would have done a bit better, but still nowhere near the 100Mbps that should be possible.
65936 Bytes / .04 seconds = 1.6 MBytes/sec = 13 Mbits/second
so... the delay i was seeing couldn't possibly be from 64K buffers could it? i'm not sure. so i designed a little experiment. here is what i did.
here is some interesting results.
here is my Checkstyle.log file (edited to save horizontal space):
22:44:57 Hackystat SensorShell Version: 8.1.530
22:44:57 SensorShell started at: Thu Jun 26 22:44:56 HST 2008
22:44:57 SensorProperties
sensorshell.autosend.maxbuffer : 250
sensorshell.autosend.timeinterval : 1.0
sensorshell.logging.level : INFO
sensorshell.multishell.autosend.timeinterval : 0.05
sensorshell.multishell.batchsize : 499
sensorshell.multishell.enabled : false
sensorshell.multishell.maxbuffer : 500
sensorshell.multishell.numshells : 10
sensorshell.offline.cache.enabled : true
sensorshell.offline.recovery.enabled : true
sensorshell.sensorbase.host : http://blah:9876/sensorbase
sensorshell.sensorbase.user : kagawaa@hahaha.hahaha
sensorshell.statechange.interval : 30
sensorshell.timeout : 10
sensorshell.timeout.ping : 2
sensorshell.properties file location: C:\..\sensorshell.properties
22:44:57 Type 'help' for a list of commands.
22:44:59 Host: http://naraku:9876/sensorbase/ is available.
22:44:59 User akagawa@referentia.com is authorized to login at this host.
22:44:59 Maximum Java heap size (bytes): 66650112
22:44:59 AutoSend time interval set to 60 seconds
22:45:00 Pinged http://blah:9876/sensorbase/ in 188 ms. Result is: true
22:45:00 Checking for offline data to recover.
22:45:00 Invoking offline recovery on 48 files.
22:45:59 Timer-based invocation of send().
22:46:59 Timer-based invocation of send().
22:47:59 Timer-based invocation of send().
22:48:59 Timer-based invocation of send().
22:49:55 #> quit
22:49:55 #> send
22:49:55 Pinged http://blah:9876/sensorbase/ in 0 ms. Result is: false
22:49:55 Server not available. Storing commands offline.
22:49:55 Stored 4 sensor data instances in:
C:\...\sensorshell\offline\2008.06.26.22.49.55.426.xml
22:49:55 Quitting SensorShell started at: Thu Jun 26 22:44:56 HST 2008
22:49:55 Total sensor data instances sent: 0
at the end of the sensor execution it says it can't send 4 sensor data instances! what happened to the server!? hm.. what is going on. for some reason that indicates to me that sending the offline data made the sensorbase unavailable. so, i looked in the Checkstyle-offline-recovery.log file
22:45:00 Maximum Java heap size (bytes): 66650112
22:45:00 AutoSend disabled.
22:45:00 Invoking offline recovery on 48 files.
22:45:00 Recovering offline data from: 2008.06.26.21.12.01.081.xml
22:45:00 Found 251 instances.
22:45:00 Invoking send(); buffer size > 250
22:45:00 #> send
22:45:01 Pinged http://blah:9876/sensorbase/ in 204 ms. Result is: true
22:45:01 Attempting to send 251 sensor data instances.
Available memory (bytes): 63071744
22:45:12 Error sending data: org.hackystat....SensorBaseClientException:
1001: Unable to complete the HTTP call due to a communication error
with the remote server. Read timed out
22:45:12 org.hackystat.sensorbase.client.SensorBaseClientException: 1001:
Unable to complete the HTTP call due to a communication error with the
remote server. Read timed out at org.hackystat.sensorbase.client.
SensorBaseClient.putSensorDataBatch(SensorBaseClient.java:827)
at org.hackystat.sensorshell.command.SensorDataCommand.
send(SensorDataCommand.java:82)
22:45:12 Exception during send(): org.hackystat.sensorshell.
SensorShellException: Could not send data: error in SensorBaseClient
22:45:12 About to send data
22:45:12 Successfully sent: 0 instances.
22:45:12 Did not send all instances.
C:\...\sensorshell\offline\2008.06.26.21.12.01.081.xml not deleted.
in the Checkstyle-offline-recovery.log file, i can't find one successful send. but, i know i sent data over. i have no idea but maybe the "Timer-based invocation of send()." was able to send data, but there is no log of that timber-base send. anyway, so there was a problem with the server. but, the logs on the server show nothing. i just get a whole bunch of these
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:56 k@gm.com Checkstyle CodeIssue
22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
so i have no idea why the server wasn't responding. in the end off all of that i only see 1,742 entries in the database (there might be a problem with the sensordatatype for coverage). so that's 10 minutes of work for a very small amount of data entries. i would guess there were tens of thousands of possible entries in the offline data.
here is another thing. note that in my Checkstyle.log it says that my sensor process ended at
last entry in from Checkstyle.log
06/26 22:49:55 Total sensor data instances sent: 0
last entry on sensorbase
06/26 22:55 Put: 2008-06-26T21:11:55 k@gm.com Checkstyle CodeIssue
thats 6 minutes after the sensor ended. so, getting back to those 64K buffer caches... that seems a little strange to me. if thats really what is happening there must be a lot of caches.
anyway, there are two things happening.
1) i can send data to the hackystat sensorbase server way way faster than hackystat can consume
2) because of that backlog on the server its not letting me send more data
that worries me. after all 11.3 MB of data is really tiny; after all its just one full build of our system with sensors turned on. here are some questions:
1) how fast can hackystat consume data?
2) what is happening such that data is being processed on the server 6 minutes after the client is finished sending?
3) whats the current bottleneck?
4) how many users can send data (and how much data) simultaneously without killing the server?
hm.. there are a lot of moving parts to this so this is just the start of looking into this. but, all i know is that 11.3 MBs is really small. in my other project we move that amount of data in an handful of seconds not even close to minutes.
ps. i had to delete my offline data. :(
1 comment:
Some thoughts at:
http://groups.google.com/group/hackystat-dev/msg/1986b8443b6995ed
Cheers,
Philip
Post a Comment