raswaim
Joined: 06 Jan 2005 Posts: 69 Location: Houston, TX
Posted: Fri May 14, 2010 7:15 am Post subject: Switch Independent Variable Performance
David - I've got a time-series plot with around 4 million points per curve (3 curves) that I was trying to do a Switch Independent Variable on, and it's been running for quite a while (over an hour of CPU time on a relatively modern technical laptop). The data points are not at equal intervals and are probably not all time-synchronous samples, but they should be sorted in time order, barring a glitch here and there (I'll have to kill the process to check for sure).
I've always assumed that this function just selected the X value for each point in the new independent-axis source curve, "seeked" to the nearest X position in each of the other curves, and interpolated a new data point from the nearest neighbor points to pair with the independent source's Y value (BTW - I selected linear in the dialog). I've noticed over the years, however, that the time to execute this function has always seemed to be very non-linear with respect to the number of data points. Is there something else going on here that can influence the execution time? Does the CPU time I'm experiencing seem unexpected to you? If the values aren't sorted in X to start with (or are precision limited/"identical"), could that explain it? Thanks - Richard
Just got to thinking: if the algorithm is as I described above, and you don't "remember" the last offset in each curve you're working on (i.e. you "seek" from the first index value on each iteration), you'd end up with the very non-order(N) behaviour I'm experiencing.... _________________ - RAS
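The difference between restarting the seek and remembering the last offset can be sketched in Python. This is only an illustration of the two strategies described in the post, not DPlot's actual code: the function names are made up, both functions assume the searched curve has at least two points sorted by X, and the second also assumes the source X values are sorted. Points outside the curve's X range are extrapolated from the end segment, which is a simplification.

```python
def lerp(x0, y0, x1, y1, x):
    """Linear interpolation between (x0, y0) and (x1, y1) at x."""
    if x1 == x0:
        return y0
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def resample_restart(src_x, xs, ys):
    """For each source X, seek from index 0 again (the suspected slow way)."""
    out, steps = [], 0
    for x in src_x:
        i = 0  # search restarts at the beginning every time
        while i < len(xs) - 2 and xs[i + 1] < x:
            i += 1
            steps += 1
        out.append(lerp(xs[i], ys[i], xs[i + 1], ys[i + 1], x))
    return out, steps


def resample_remember(src_x, xs, ys):
    """Same result, but the seek resumes from the last offset found."""
    out, steps = [], 0
    i = 0  # kept across iterations; valid only if src_x is sorted
    for x in src_x:
        while i < len(xs) - 2 and xs[i + 1] < x:
            i += 1
            steps += 1
        out.append(lerp(xs[i], ys[i], xs[i + 1], ys[i + 1], x))
    return out, steps


xs = [float(v) for v in range(100)]
ys = [2.0 * v for v in xs]
src_x = [v + 0.5 for v in range(99)]

slow, slow_steps = resample_restart(src_x, xs, ys)
fast, fast_steps = resample_remember(src_x, xs, ys)
print(slow == fast, slow_steps, fast_steps)
```

The step counter only counts index advances, so it shows the growth pattern rather than real CPU time: the restart version grows with the square of the point count, the remembering version linearly.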
raswaim
Joined: 06 Jan 2005 Posts: 69 Location: Houston, TX
Posted: Fri May 14, 2010 11:36 am
David - I did a few checks to see what the actual performance curve was...
FIRST - I note that if the curves all have the same number of points (and your code believes they are consistent, a la the old save->csv criteria?), the Switch Independent Variable is blazingly fast (125000 points in < 2 sec, including creation of the new document). If you delete a few points from one curve or the other, the performance for large plots suffers badly (105 seconds for approx. 125000 points). I suspect from the results (sent separately) that my surmise above was correct and you're doing something like re-scanning a whole curve to find a matching X, making it a 1/2*N^2 problem. You're still going real fast on each cycle (~12 nsec per test on my machine), but with 4 million points (squared) I think I was going to be waiting about 2 months for results....
I THINK I should be able to work around this by doing an equal intervals on all the data points first (but for some reason that didn't work on my test case; I ended up with 1 less point in one of the curves?). Any suggestions?
_________________ - RAS
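A back-of-envelope check of the 1/2*N^2 model, as a small Python sketch. The 12 nsec per test and the point counts are taken from the post; the function names are hypothetical, and the model ignores everything except the comparison loop for a single searched curve (no cache effects, no extra curves).

```python
def pairwise_tests(n):
    """Number of comparisons under the 1/2 * N^2 model."""
    return 0.5 * n * n


def estimated_seconds(n, seconds_per_test=12e-9):
    """Estimated search time for one curve of n points."""
    return pairwise_tests(n) * seconds_per_test


small = estimated_seconds(125_000)
large = estimated_seconds(4_000_000)

print(f"125000 points:  {small:.1f} s")
print(f"4000000 points: {large / 3600:.1f} hours")
print(f"scale factor:   {large / small:.0f}x")
```

For 125000 points this gives a bit under 94 seconds, in line with the 105 seconds measured. For 4 million points the pure model gives on the order of a day per searched curve; the measured results sent separately are not in this thread, so a longer real-world estimate may reflect effects this simple model leaves out.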
DPlotAdmin Site Admin
Joined: 24 Jun 2003 Posts: 2311 Location: Vicksburg, Mississippi
Posted: Fri May 14, 2010 12:18 pm
"Any suggestions?"
Only to keep pestering me about this until I fix it.
All of your assumptions are correct. If all curves have the same number of points and the same X values, the operation should be very fast. (I'm pretty sure that most of your 2 seconds was spent determining that that condition exists - the swap itself should be much faster.) Otherwise, DPlot is not taking advantage of knowing that the curves (in your case) have monotonically increasing X values, and is starting the search over at the beginning for every X in every curve. That's of course much slower and painfully so for large data sets. I'll gen up a few test cases and improve this before the next release. Thanks for pointing out the problem.
(And as to your mail: the [img] tag is only useful if the image is available on some web site. I did very briefly experiment with allowing user uploads. But before I even had a chance to publicize that, the site was hacked. So that feature probably isn't going to happen.) _________________ Visualize Your Data
support@dplot.com |
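The fast case described above (all curves share the same X values, so no searching or interpolation is needed) might look roughly like the following Python sketch. It is an assumption-laden illustration, not DPlot's implementation: curves are modeled as `(xs, ys)` tuples of lists, the names `same_x` and `switch_independent_fast` are invented, and what happens to the source curve itself is left out because the thread does not say.

```python
def same_x(curves):
    """True if every curve has exactly the same X values as the first."""
    first_x = curves[0][0]
    return all(xs == first_x for xs, _ in curves[1:])


def switch_independent_fast(curves, source_index):
    """Pair the source curve's Y values (new X) with each other curve's Y.

    Only valid when all curves share identical X values; otherwise the
    slower search-and-interpolate path would be needed.
    """
    if not same_x(curves):
        raise ValueError("curves do not share the same X values")
    new_x = curves[source_index][1]
    return [
        (list(new_x), list(ys))
        for k, (_, ys) in enumerate(curves)
        if k != source_index
    ]


curves = [
    ([0.0, 1.0, 2.0], [10.0, 20.0, 30.0]),  # source curve
    ([0.0, 1.0, 2.0], [5.0, 6.0, 7.0]),
]
swapped = switch_independent_fast(curves, source_index=0)
print(swapped)
```

The equality check is a single pass over the data, which fits the remark that most of the 2 seconds probably went to establishing that the condition holds rather than to the swap.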
DPlotAdmin Site Admin
Joined: 24 Jun 2003 Posts: 2311 Location: Vicksburg, Mississippi
Posted: Fri May 14, 2010 1:37 pm
There's a dumber mistake than I thought. I think you'll find that if you uncheck "Place results in new document" you'll get much better results. When that option is checked, the flag indicating whether the new X source has monotonically increasing X values gets wiped out. If it is unchecked, DPlot is a bit smarter about finding X. (Your results will still be too slow, as I found another inefficiency in searching for X under these circumstances. But as a workaround until the next release, unchecking that option should work much better for you.)
A few test runs:
A) 100,000 points in 2 curves with the same X values everywhere: 0 msec
B) Same test with 1 less point in either curve (before): 75 sec
C) Same as B, but with the mistake above fixed: 45 msec
The improvement will of course go up (way up) as the number of points is increased. _________________ Visualize Your Data
support@dplot.com |
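Why a lost "monotonically increasing" flag costs so much can be illustrated with a short Python sketch. This is not how DPlot is known to work internally: the binary search via the standard `bisect` module is a swapped-in technique for "being smarter about finding X", and all function and parameter names here are hypothetical. X values outside the curve's range return `None` for simplicity.

```python
from bisect import bisect_left


def is_monotonic_increasing(xs):
    """Compute the 'sorted' flag once per curve (non-decreasing X)."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def interpolate(xs, ys, x, x_is_sorted):
    """Linear interpolation at x; returns None if no segment brackets x."""
    j = None
    if x_is_sorted:
        # Flag available: binary search, O(log N) per lookup.
        if xs[0] <= x <= xs[-1]:
            j = max(bisect_left(xs, x), 1)
    else:
        # Flag lost: every lookup scans segments from the start, O(N).
        for k in range(1, len(xs)):
            lo, hi = sorted((xs[k - 1], xs[k]))
            if lo <= x <= hi:
                j = k
                break
    if j is None:
        return None
    x0, x1 = xs[j - 1], xs[j]
    if x1 == x0:
        return ys[j - 1]
    return ys[j - 1] + (ys[j] - ys[j - 1]) * (x - x0) / (x1 - x0)


xs = [0.0, 1.0, 2.0, 4.0]
ys = [0.0, 10.0, 20.0, 40.0]
flag = is_monotonic_increasing(xs)

with_flag = interpolate(xs, ys, 3.0, x_is_sorted=flag)
without_flag = interpolate(xs, ys, 3.0, x_is_sorted=False)
print(flag, with_flag, without_flag)
```

Both paths return the same value; only the cost per lookup differs, which is why the gap between runs B and C widens as the number of points grows.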