IRC Logs for #cmvt Wednesday, 2013-07-17

DanielBairdit looks like I'm getting every region listed twice, once with a state and once without.  i'll try clearing my data dir and running again in case it's just a problem with appending to the old files00:46
prologicin the summaries?01:00
prologicahh yes01:00
prologicthat could be bad design01:00
prologicit does do that01:00
DanielBairdactually it could be my data handling in js, but it looks like it's in the json01:00
prologicit reads an existing file01:01
prologicand appends to it01:01
prologicif it already exists01:01
prologicit will be01:01
prologicyou were right the first time01:01
DanielBairdi've cleaned and kicked off the new fab data, so i guess it'll fit it01:02
DanielBaird* fix01:02
prologicit will01:02
prologicperhaps I should rethink that a bit01:02
prologicthe idea was to be able to add in new models01:02
prologicand re-run fab data01:02
prologicwithout rebuilding the entire data set01:02
prologici.e: partially updating new data01:02
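A minimal sketch of how a partial update could avoid the duplicate rows described above, assuming each summary row carries a unique region id. The field names (`region_id`, `state`) and the function `merge_summaries` are hypothetical and not taken from gensummaries; rows are replaced by key instead of being appended blindly.

```python
def merge_summaries(existing, new_rows, key="region_id"):
    """Merge new rows into existing ones; a new row replaces an old row with the same key."""
    merged = {row[key]: row for row in existing}
    for row in new_rows:
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical data: the old file has regions without a state, the new run adds states.
old = [{"region_id": 1, "state": None}, {"region_id": 2, "state": None}]
new = [{"region_id": 1, "state": "NSW"}, {"region_id": 3, "state": "QLD"}]
result = merge_summaries(old, new)
print(result)
```

With this approach region 1 appears once (with its state), and regions 2 and 3 are both kept, so a re-run can add new data without rebuilding everything.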
prologicalso quick question this may sound stupid01:03
prologicbut how the hell do you show hidden files when attaching a file in google chrome on a mac?01:03
prologicexactly hmm01:03
prologictell me about it01:04
prologictrying to attach my public key01:04
prologicand well yeah harder than I thought01:04
DanielBairdit might be plain impossible.  can you drag a file from a proper Finder window into the dialogue?01:04
prologicI swear I've done it once before01:04
prologicforget how01:04
prologicI'll try that01:04
DanielBairdShift Cmd .01:05
DanielBairdyeah that's it01:06
DanielBairdbut don't tell anyone, it's supposed to be a secret, apparently.01:06
prologicok I'll try that01:07
prologicsweet that worked01:08
prologichow did you find that? :)01:08
prologicand thanks!01:08
prologichey DanielBaird01:12
prologicI'm going to cancel this build01:12
DanielBairdgoogle found it mentioned in a mozilla bug report :) hopefully I'll remember it when I need to attach a hidden file.  I hope it works in every file window01:12
prologicthe full datasets of 2 models01:12
DanielBairdthe big one?01:12
prologicare going to take way too long01:12
DanielBairdand try again smaller?01:12
prologicI think we have 3 choices01:12
prologicbuild a new subset data set (based on your suggestion)01:13
prologicpush a new version as-is01:13
prologicfinish the new gensummaries so it's faster01:13
DanielBairdwhen's ABs demo, late tomorrow?01:14
DanielBairdor early.01:14
prologicso with 2 full datasets01:15
prologicthere are 1218 tif files01:15
prologicI want to do some basic math here on how long processing will take01:15
prologicand document it01:15
prologicearly tomorrow morning01:15
prologic10.30 am I think01:15
prologic18 tif files with our small subset dataset01:20
DanielBairdwe should probably do the simplest most-likely-to-work thing01:20
prologicso if that takes an hour here on my mac01:20
prologicthen in theory it would take 1000hrs for the full datasets of 2 models01:20
prologicgod fuck'n danmit :)01:20
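A back-of-envelope sketch of the estimate above, assuming processing time scales linearly with the number of tif files and nothing else (an assumption; the log does not state per-file cost or how region count affects it). Under that assumption alone, 1218 files at 18 files per hour comes to roughly 68 hours; the 1000hrs figure quoted in the log may include factors that are not spelled out here.

```python
SUBSET_FILES = 18    # tif files in the small subset dataset (from the log)
SUBSET_HOURS = 1.0   # "if that takes an hour here on my mac"
FULL_FILES = 1218    # tif files in the two full datasets (from the log)


def estimate_hours(n_files, ref_files=SUBSET_FILES, ref_hours=SUBSET_HOURS):
    """Scale a measured run time linearly by file count."""
    return n_files * ref_hours / ref_files


full_hours = estimate_hours(FULL_FILES)
print(f"{full_hours:.1f} hours (~{full_hours / 24:.1f} days)")
```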
DanielBairdfor a demo, a single model would be okay01:20
DanielBairdthree time points, ideally four01:20
DanielBairdand four or more vars01:21
prologicok how about01:21
prologic1 model01:21
prologic4 time points01:21
prologic4 vars01:21
DanielBairdin fact it could even demo nice with just one region type01:21
prologicyeah ok I'll cut out the other regions01:21
prologicthat'll speed things up too01:21
prologic1 region01:22
prologic1 model, 4 time points, 4 vars01:22
prologicof course that includes States still01:22
prologicIBRA + States01:22
prologicRCP45_csiro_mk3.0 with 4 time points and 4 vars01:22
prologicDanielBaird: ?01:26
DanielBairdthat sounds good01:27
DanielBairdmore vars, if we're adding stuff01:28
prologicI'll build that01:28
prologicshould only take a few mins01:28
prologicat least this will reduce stress levels01:28
prologicand get a new version up and testing01:28
DanielBairdi might need to fiddle the states region type display01:28
prologicwith slightly better data01:28
prologicthen I can concentrate on making gensummaries faster01:28
prologicif and only if I can get that working by the end of the day01:29
prologicperhaps there might be time to do a full dataset test01:29
DanielBairdi'll look at that right away.  it's been hard to work on with other state-related weirdness showing up01:29
DanielBairdyes agree01:29
prologicDanielBaird:  ping?01:43
prologichow about 2015, 2035, 2055 and 207501:43
DanielBairdsorry was making my lunch.  i'd like to include current and 2085.. so how about current / 2035 / 2055 / 208501:45
prologiccurrent is there regardless01:46
prologicso I guess 2035, 2055 and 2085 like you say01:46
prologicand just the first 4 bioclims?01:47
DanielBairdyep that's good.. better to have the big difference than to be strictly equal gaps01:47
DanielBairdlemme look what the first four are01:47
DanielBairdmeh how about 1, 7, 12, and 1401:49
DanielBairdotherwise we'd just have all temperature ones01:49
prologicI'm going to document this dataset01:51
prologicand possibly put it up on a repo01:51
prologicjust for the hell of it01:51
prologicnearly done here01:51
DanielBairdput the generated data into a repo, you mean?01:51
DanielBairdi'm off to buy coffee or something.  bbl01:56
prologicok I'm done01:57
prologicuploading it to my server01:57
prologicready to test shortly01:58
prologicI made a mistake with current02:14
prologicOk sweet02:17
prologicthat's more like it02:17
prologic15m processing time (with single-process gensummaries and our new sample dataset)02:17
DanielBairdis that data you're checking in somewhere?  I'd like to try it03:04
DanielBairdmy last data build had states, but still some nulls in the bioclims, regions names etc03:04
prologicback from lunch03:21
prologicI could certainly update it somewhere03:21
DanielBairdah cool, 188mb that's not bad03:26
prologiclet me know if there are any problems with it03:27
prologicI'm going to do a deploy to test03:27
prologicjust to make sure everything works03:27
prologicDanielBaird:  ping?04:18
prologicnew version is up on (NECTAR)04:18
prologicbut I screwed up04:19
prologicI'm going to have to re-deploy04:19
prologicI forgot to wipe the data dir04:19
DanielBairdah okay04:19
prologicdid you look at the data.tar.bz2 I uploaded?04:19
prologicIs it outputting "ok" data?04:19
prologicalso still 404's happening04:19
prologicwill you have that fixed by tomorrow?04:19
DanielBairdi unzipped it, haven't got to the testing yet.  i'll need to set the region type list etc04:19
DanielBairdyeah should be working today04:20
DanielBairdi'll get back to it now04:20
prologicif we can do our last deploy by 1600 that'd be good04:20
prologicredoing the deploy to test04:21
prologichopefully it only takes 15m04:21
prologicI had some old data in there04:21
DanielBairdthe vars list is out of date, should i just write a fixed one and stop loading the json?04:27
DanielBairdthat's the one with the proper names for the bioclims, and their max and mins04:28
DanielBairdthe graph doesn't crash out, but it falls back to naming them "bioclim_7" and using the max/min of whatever the first dataset displayed was.04:28
prologicahh yea04:43
prologicumm hang on04:43
prologicI modified how that was generated04:43
prologicahh yes04:43
DanielBairdhmm also, bad news.. looks like the state being generated is crazy04:43
prologicadd to the bioclims dict in scripts/genvars04:43
DanielBairde.g. the "south eastern qld" IBRA region is in NSW04:43
prologichmm wow04:43
DanielBairdmany of them are good04:43
DanielBairdbut there are some weird ones04:43
prologicit may be entirely possible that the data is rubbish04:44
DanielBairdlol also the states themselves are a bit ersatz04:44
prologiconly way to test and debug that is to start work on our mapping04:44
prologicand map these regions out04:44
prologicI got the state boundaries from a government website04:44
DanielBairdnsw, qld, vic, SA, ACT and "other territories" are all in NSW04:44
prologicthis one04:44
prologiccould be shitty data though04:44
DanielBairdWA is in SA04:45
DanielBairdNT is in Qld04:45
prologicyou mean04:45
DanielBairdmaybe it's finding the first one that intersects, and then accepting that one04:45
prologicit's mismatching states against states?04:45
prologicoh hmm04:45
prologicwe really shouldn't intersect states against states though?04:45
prologicthat is kinda silly imho04:46
DanielBairdno we shouldn't need to, just allocate each one its own id.  but, it's a useful way to see if the intersection is working04:46
prologicyou'd think it would intersect itself04:46
DanielBairdNSW is id 1, so it's getting allocated against all the states that share borders, I think04:46
DanielBairdyeah every state that's wrong has been allocated a state that it shares a border with that has a lower ID04:47
prologic$ intersectvector data/regions/States/STE11aAust.shp data/regions/States/STE11aAust.shp test.json04:47
prologic{'1': '0', '0': '0', '3': '0', '2': '0', '5': '5', '4': '3', '7': '0', '6': '2', '8': '0'}04:47
prologicyeah you're right04:47
prologicit's a little shitty04:47
prologicI wonder whether I have to do something like04:52
prologicperform a bounding box search04:52
prologicand if the list is > 104:52
prologicperform an intersection04:53
prologica bounding box search is going to be somewhat inaccurate I suspect04:53
DanielBairdwhat you really want is to find the area of the intersection.04:53
DanielBairdand go with the state that has the largest intersection.  is it super slow?  i guess it's not so bad given that we only have to do it for each region once.04:54
DanielBairdi guess the early-exit would be if you find a state that fully encloses the region.  in that case you can stop immediately04:55
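A minimal sketch contrasting the two strategies discussed above: accepting the first state that touches a region versus picking the state with the largest intersection area, with an early exit when a state fully encloses the region. To stay self-contained it uses axis-aligned boxes `(minx, miny, maxx, maxy)` as stand-ins for real polygons, which is a loud simplification; the function names and the sample coordinates are hypothetical and are not the code in intersectvector.

```python
def overlap_area(a, b):
    """Area shared by two axis-aligned boxes given as (minx, miny, maxx, maxy)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def box_area(a):
    return (a[2] - a[0]) * (a[3] - a[1])


def first_touching(region, states):
    """The failure mode from the log: accept the first state that merely touches."""
    for state_id, box in states.items():
        if (region[0] <= box[2] and box[0] <= region[2]
                and region[1] <= box[3] and box[1] <= region[3]):
            return state_id
    return None


def largest_overlap(region, states):
    """Pick the state sharing the most area; stop early if one encloses the region."""
    best_id, best_area = None, 0.0
    target = box_area(region)
    for state_id, box in states.items():
        area = overlap_area(region, box)
        if area > best_area:
            best_id, best_area = state_id, area
        if area >= target:  # region lies entirely inside this state
            break
    return best_id


# Two hypothetical neighbouring states sharing the border x = 10.
states = {0: (0, 0, 10, 10), 1: (10, 0, 20, 10)}
region = (10, 2, 14, 6)  # inside state 1, touching the border of state 0

print(first_touching(region, states), largest_overlap(region, states))
```

The first function hands the region to the lower-id neighbour because sharing a border counts as a hit, which matches the pattern seen in the intersectvector output; the second ignores zero-area contact.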
DanielBairdi guess it's more urgent to get the vars list finished for these new vars04:56
prologicam I doing the "what state this region is in" wrong too?04:56
DanielBairdi think so04:56
prologicafaik I'm using the region's bounding box to search the states04:56
prologicwhen really it should be the other way around04:57
DanielBairdmost of the ibra regions look fine, it's not always easy to tell.  but "sth eastern qld" is in nsw04:58
DanielBairdit could be a border-touching issue same as it looks for states, seq is touching the nsw border so that gets accepted.04:58
prologicso you've found one that's weird04:58
prologiclet me try something here04:59
DanielBairdi'm checking states for ibra, they're all correct so far, and where a region crosses states, we've allocated it to the state with the lowest id05:04
DanielBairdwith ibra it's rare, i guess, to have a region ending on a state boundary. it'll be common for NRM and LGA though05:05
DanielBairdthat one is given to qld, which is not very accurate05:06
DanielBairdand this one is allocated to SA:
*** robert_pyke has joined #cmvt05:08
prologicDanielBaird:  can we live with the defect for now?05:11
prologicI'm trying to fix this :)05:11
DanielBairdi'll add issues to bb05:12
DanielBairdso we remember to come back to it05:12
prologicyeah add the other bioclim vars to scripts/genvars05:14
prologicyou'll see it05:14
prologic$ intersectvector data/regions/States/STE11aAust.shp data/regions/States/STE11aAust.shp test.json05:20
prologic{'1': '1', '0': '0', '3': '3', '2': '2', '5': '5', '4': '4', '7': '7', '6': '6', '8': '8'}05:20
prologicthat's heaps better05:20
prologicit takes ages05:21
prologicI'm doing something wrong05:21
prologicor the geometries are too complex05:21
prologicyeah so05:26
prologicit takes a while to get accurate results05:26
prologicbut as I suspected, complex geometries05:26
prologicthe state of Tasmania has 1.2M points05:26
DanielBairdoh yeah thats right.. enormous polys05:28
robert_pykeI'm not sure what you're doing exactly, so feel free to ignore this.. but could you perform a simplify function on the state geoms, and then use those for your calculations?05:28
DanielBairdmaybe a preliminary bounding box is worth trying05:28
prologicyes Rob you're right05:29
prologicI can simplify the complex geoms05:29
prologicthe question for me is always05:29
prologic"by how much?" :)05:30
prologica buffer of 0.05 by experimentation seems to be "okay" to me05:30
DanielBairdi asked your momma that last night prologic.. she answered "a LOT"05:30
prologicit doesn't seem to lose too much of its shape05:31
DanielBairdshe wanted a buffer of a HUNDRED05:31
prologicand significantly smaller than 1.2M points05:31
prologic1.2M -> 20k05:31
prologicI will try an arbitrary simplification of 0.0505:32
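A sketch of what simplifying by a tolerance of 0.05 can look like, using the Douglas-Peucker algorithm. This is an assumption for illustration: the log does not say which simplification method the tool uses, and the function names here are hypothetical. Points closer than the tolerance to the line between a span's endpoints are dropped.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring's endpoints
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: keep only points farther than tolerance from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [first, last]
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


wiggly = [(0, 0), (1, 0.01), (2, -0.02), (3, 0.03), (4, 0)]
spike = [(0, 0), (2, 1.0), (4, 0)]
print(simplify(wiggly, 0.05), simplify(spike, 0.05))
```

Small wiggles collapse to a straight line while a feature larger than the tolerance survives. A recursive version like this is only suitable for small inputs; a ring with over a million points would need an iterative implementation or a geometry library.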
prologicit does fix the inaccuracy though05:33
prologicI should have realized that whilst doing a bounding box search is good for speed05:33
prologicif it returns more than one feature05:33
prologicyou really need to then perform an intersection05:33
prologicand not just take the first feature :)05:33
prologicDanielBaird:  did you get those bugs fixed on your end yet?05:34
DanielBairdyep it's working.. i'll commit05:35
prologicI'll redeploy then?05:36
prologiceven with our shitty data matching of states/regions?05:36
DanielBairdhow fiddly is it to fix the states so they're in their own selves?05:38
DanielBairdthat's where the bad states look crazy :005:38
DanielBairdalso, if it's easy to fix the vars list, do that too before deploying05:38
prologicI thought you were fixing the vars list?05:39
prologicgimme another 10-15mins05:39
prologicI might be able to fix the shitty state matching05:39
prologicI can fix the shitty state matching05:39
prologicwith calculating intersection areas05:40
prologicand simplifying all geometries by buffering 0.0505:40
DanielBairdah so that's actually fixing it for realz05:40
DanielBairdi can write a correct json file for the vars..05:41
DanielBairdoh sorry i read back, i'll look at genvars05:41
prologicjust add to the bioclims dict05:47
prologickinda needs to be done this way05:48
prologicas it iterates over all the models05:48
prologicand grabs their min/max values05:48
prologicyou yourself laid out such requirements for this data :)05:49
prologicsomething about making your charts look good05:49
DanielBairdyeah the default range is worked out on the displayed data.. so when the chart animates to another year's data, it can go off the edges05:50
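A small sketch of why the min/max is taken over every dataset instead of only the one on screen, assuming the values for one variable are grouped by time point. The numbers and the name `global_range` are hypothetical; genvars itself reads these values from the geotifs.

```python
def global_range(values_by_year):
    """Return (min, max) across every time point so one axis range fits all of them."""
    all_values = [v for values in values_by_year.values() for v in values]
    return min(all_values), max(all_values)


# Hypothetical values for a single bioclim variable.
bioclim_1 = {
    "current": [14.2, 18.9, 21.5],
    "2035": [15.0, 19.8, 22.4],
    "2085": [16.7, 21.9, 24.8],
}
low, high = global_range(bioclim_1)
print(low, high)
```

If the axis were sized from "current" alone it would stop at 21.5, and the 2085 values would run off the top when the chart animates to that year.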
DanielBairdfixed, I'll commit it05:50
prologicalmost finished fixing the accuracy here too05:50
prologicjust doing a test run05:50
DanielBairdi didn't test what I changed.. i remembered to fix the commas etc so SURELY it will work first time05:51
prologicargg fuck05:51
DanielBairdi guess simplifying it made it shapely-invalid?05:52
prologicunfortunately simplification can make a polygon become invalid05:53
prologicas the linear ring can become unclosed05:53
prologicat least in my experience :)05:53
prologicI may be simplifying too much05:53
DanielBairdsomething, something, Robert's momma's inner ring.05:54
DanielBaird..and how it's unclosed05:54
robert_pykeyou broke the rule05:54
robert_pykethe one where if you know the mamma, she becomes off limits...05:55
DanielBairdtrue.. but it's been at least a year since i saw your mum.  i'm re-virgining her05:55
DanielBaird.. in a good way.05:55
DanielBairdi ran genvars and it worked okay.  i didn't get the max/min set properly though05:56
prologicthen you did something wrong :)06:01
prologicit tries to find geotifs of the same name as the keys06:01
prologicor something06:01
DanielBairdyeah it looks like it should work.  maybe i've screwed up the min/max handling somehow06:02
prologicshouldn't have06:03
prologicit just reads the geotifs06:03
prologicand gets their values out06:03
prologicoh of course06:03
prologicthere is no bioclim 2 anymore06:03
prologicwe only have 1, 7, 12 and 1406:04
DanielBairdah okay i'll cut that06:04
prologicso maybe comment out the other ones we don't have?06:04
DanielBairdmy models dir is empty06:05
DanielBairdthis is just the data dir you uploaded earlier, is it just summaries?06:05
DanielBairdah no the .sources/models has it06:06
prologicI keep running into null geometry issues06:06
prologicso much for generalized simplification06:06
prologicthe actual data/models06:07
prologicsymlinks a bunch of stuff06:07
prologicI hope I didn't give you a broken tarball06:07
DanielBairdmeh that's okay it only has to work perfectly on the demo server06:09
prologicWed Jul 17 16:19:0606:24
prologic$ ./checkgeometries data/regions/IBRA/IBRA7_regions.shp06:24
prologicGeometry of Feature id 68 is not valid06:24
prologicI wrote a tool to determine if all geometries of our regions are valid06:24
prologicturns out this is not the case06:24
prologicno wonder I was having issues06:24
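A rough sketch of the kind of check such a tool might run, written in plain Python so it stands alone. It is an assumption, not the code behind checkgeometries, which presumably relies on a geometry library: this version only catches the two simplest problems in a single ring (not closed, or edges that cross each other), and it does not handle multipolygons, holes, or edges that merely touch.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _properly_cross(a, b, c, d):
    """True if segments a-b and c-d cross at a point interior to both."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def ring_problems(ring):
    """List the basic validity problems found in one polygon ring."""
    problems = []
    if len(ring) < 4:
        problems.append("fewer than 4 points")
    if ring and ring[0] != ring[-1]:
        problems.append("ring is not closed")
    edges = list(zip(ring, ring[1:]))
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            if _properly_cross(*edges[i], *edges[j]):
                problems.append(f"edges {i} and {j} cross")
    return problems


square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]   # self-intersecting
open_ring = [(0, 0), (2, 0), (2, 2), (0, 2)]         # last point missing

for name, ring in [("square", square), ("bowtie", bowtie), ("open", open_ring)]:
    print(name, ring_problems(ring))
```

The pairwise edge test is quadratic, so it is only practical for small rings; it is meant to show what "not valid" can mean, not to replace a library check.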
prologic{u'SHAPE_AREA': 2.56269464148, u'REG_NAME_7': u'South East Corner', u'REG_NAME_6': u'South East Corner', u'OBJECTID': 68, u'AREA': 25320528414.6, u'FEAT_ID': u'GA_100K_Islands', u'REC_ID': 68, u'REG_NO_61': 10.0, u'SHAPE_LEN': 29.6847300573, u'HECTARES': 2532052.841, u'SQ_KM': 25320.528, u'REG_CODE_7': u'SEC', u'REG_CODE_6': u'SEC'}06:27
prologicThese are the properties of said invalid feature/geometry06:27
prologicI should be able to fix said geometry06:32
prologicby buffering it by 0.006:33
prologicthat is 0 point 006:33
prologicstupid emoticons06:33
DanielBairdah is that a multipoly shape or something?06:39
prologicwell yes it is06:42
prologicbut it's invalid06:42
prologicwhoever constructed this data set06:42
prologicought to be shot06:43
prologicI just wrote another tool to fix invalid geometries06:43
prologicand write out a new (fixed) Shapefile06:43
DanielBairdah so just plain old wrong. jerks06:44
prologicbut perhaps I should do this dynamically in code06:44
DanielBairdthis is what we all get for not having a proper open standard, and using ESRI's shapefile formats instead06:44
prologicI really don't want to have to maintain forked versions of datasets we ingest06:44
prologicwhy can't people develop "good data"06:45
prologicno nothing to do with ESRI or their Shapefile format06:45
prologicbut everything to do with the geometry itself06:45
prologicit's just simply invalid06:45
prologicmost simple invalid geometries can be fixed however by buffering by 0.0 (zero point zero)06:45
DanielBairdi bet it works fine in ArcGIS or whatever though.06:47
prologicI'm sure it does06:47
prologicquite possibly because they likely have a lot of integrity checks on data you throw at it :)06:47
prologicdid I tell you about the whoozie of a bug I found with ESRI's core products?06:48
prologicspecifically their C++ core runtime that powers all of their products06:48
prologictheir Polygon object had/has floating point rounding errors06:48
prologicyou give it a Polygon via some serialization say WKT/WKB06:48
prologicand you don't get the same Polygon back06:49
prologicit was awesome06:49
prologicAlso in ArcGIS Server 10.1 they introduced GeoJSON support into the server and arcpy06:49
prologic-but- get this06:49
prologicit was horribly horribly broken06:49
prologicthe interior vs. exterior rings were all around the wrong way06:49
prologicso I had to fix that too06:49
prologicyou reckon now that I can fix this invalid geometry06:50
prologicthat simplification will work now?06:50
prologicbecause gensummaries without is taking ages06:50
prologicbecause of the more accurate (and more correct) intersection of geometries if a search yields more than one feature06:51
DanielBairdhmm hard to guess06:52
prologicI'll give it another few mins06:52
prologicsee if it finishes soon06:52
prologicI guess I'm catching the later bus today :(06:52
prologicIt did succeed07:10
prologicbut it took 61m07:10
prologicnot sure if you're still around07:10
prologicbut there are irc logs :)07:10
prologicfetched your changes07:11
prologicgoing to test now with simplification07:11
DanielBairdstill here.. hiding the regionid from the graph07:11
DanielBairdjust got it working, i'll commit07:11
prologicat least in the current form everything works07:11
prologicand I think the mismatches are gone now07:11
prologicjust going to try and see if simplification works now that I can fix the invalid geometries07:11
DanielBairdokay i'm off home.  i've pushed the regionId hiding mod, so if you fetch again we won't have to see the regionId graphed out07:13
DanielBairdit still shows in the table, i'll fix that tomorrow probably07:14
DanielBairdyour country salutes you for working late today, sir07:14
prologiccya mate07:18
