IRC Logs for #cmvt Thursday, 2013-07-18

prologicnectar is a piece of shit00:00
prologicI'm trying to delete the old data dir00:00
prologicand it's well umm00:00
prologictaking forever00:00
prologicno wonder the deploy never finished when I got in this morning00:00
prologicDanielBaird:  ping00:09
prologictest is up00:09
prologiclooks good00:09
prologiccan you have a quick squiz?00:09
DanielBairdit loads correctly00:13
DanielBairdlooks good00:13
prologicnow I can get back to work on rewriting gensummaries00:30
prologicand when that's done00:30
prologicwe can start work on the pieces to make the mapping side of things work00:30
robert_pykeMorning all00:56
prologiccan I pick your brain for a few mins rob?01:04
robert_pykepick away ;)01:13
prologicahh yes hi01:29
prologicso yeah could not make the single-process gensummaries faster01:30
prologicwith simplification of geometries01:30
prologicstill ended up with topological errors01:30
prologicwould I be right in guessing that some geometries if simplified end up being invalid?01:30
robert_pykeI'm going to assume that the GDAL simplify is the same as the PostGIS equivalent, ST_Simplify01:34
robert_pykein which case, the docs inform me:01:34
robert_pykeNote topology may not be preserved and may result in invalid geometries.01:34
robert_pykeThe docs then point to a: ST_SimplifyPreserveTopology, which preserves topology01:35
robert_pyke(which requires GEOS 3.0.0+)01:35
robert_pykeThe description of that function is:01:36
robert_pykeReturns a "simplified" version of the given geometry using the Douglas-Peucker algorithm. Will avoid creating derived geometries (polygons in particular) that are invalid. Will actually do something only with (multi)lines and (multi)polygons but you can safely call it with any kind of geometry01:36
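For reference, a minimal pure-Python sketch of the Douglas-Peucker idea quoted above (an illustration, not the GEOS or PostGIS implementation). It measures each vertex's distance to the chord *segment* rather than the infinite line, a common variant, and it does nothing to protect topology, which is why a plain simplify can turn a valid polygon into an invalid one.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to segment ab (endpoint distance if ab is degenerate)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto ab and clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify a polyline given as a list of (x, y) tuples.

    Keeps both endpoints; if the farthest interior vertex is further than
    `tolerance` from the chord, keep it and recurse on both halves,
    otherwise drop every interior vertex.
    """
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [start, end]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    # The split vertex ends `left` and starts `right`; keep it only once.
    return left[:-1] + right
```

Nothing here looks at neighbouring rings or polygons, so two simplified rings are free to cross or end up one inside the other.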
prologicAhh yes of course01:37
prologicI remember reading similar material elsewhere01:37
prologicYou are quite right01:37
prologicHave to look into simplifying whilst preserving topology01:37
prologicwhich I think I can do01:37
prologicAnd yes it does use the same GEOS libraries01:37
robert_pykeExcellent :D01:38
robert_pykeHeading off for lunch, catch ya later.01:39
prologicargg fuck01:42
prologicFrom the shapely documentation:
prologicReturns a simplified representation of the geometric object.01:42
prologicAll points in the simplified object will be within the tolerance distance of the original geometry. By default a slower algorithm is used that preserves topology. If preserve topology is set to False the much quicker Douglas-Peucker algorithm [6] is used.01:42
prologicSo why am I getting topology errors01:43
prologicrobert_pyke: your insight when you get back from lunch?01:43
robert_pykeThe summary of my help is, I don't know :(. If you set preserve topology, and it was valid before you run the simplification, it should be afterwards.02:43
robert_pykeI'm curious to know more about the topology error. If you are getting bad overlaps (or gaps) between your polygons, it might be that you need to simplify them as a geometry collection, multi-polygon or some other group construct. If you are simplifying each polygon individually, then I could imagine you getting this error.02:43
robert_pykeBut as noted above, without knowing more about the specific topology error, I'm afraid I can't help much :(02:43
prologicI think the issue is something else02:57
prologicbut unrelated to simplification02:57
prologicI'm looking into it now02:57
prologicwill let you know02:57
prologicI filed a bug report03:36
prologicaccording to everything I'm reading03:36
prologicI should not get an invalid geometry by using shapely's simplify() function/method on a geometry03:36
prologicwhich uses GEOS underneath03:36
prologicVersion on my system is: 3.3.603:38
prologicSo I should be fine03:38
prologicI'm puzzled03:38
robert_pykeCan I check something...03:41
robert_pykeAre the geoms that are becoming invalid after simplification correlated to the geoms that were originally invalid (prior to buffering)?03:41
prologicthis is purely straight from the IBRA region collection04:05
prologicthe very first feature/geometry04:05
prologicfunnily enough04:05
prologicsimplification with preserve topology just results in an invalid geometry04:05
prologicplain and simple04:05
prologicit's weird04:05
prologicWatch this:04:06
prologicThu Jul 18 13:40:5004:06
prologic$ python -i test.py04:06
prologicgeom.is_valid: True04:06
prologicgeom.is_valid: True04:06
prologicgeom.is_valid: False04:06
prologic>>> geom04:06
prologic<shapely.geometry.multipolygon.MultiPolygon object at 0x10e1eb950>04:06
prologic>>> geom.is_valid04:06
prologic>>> geom.buffer(0.0).is_valid04:06
prologicThis is interesting04:06
prologic-after- the simplification04:06
prologicit becomes invalid04:06
prologicbut then if I buffer it by 0.004:06
prologicit's valid again04:06
prologicWTF :)04:06
prologicThe documentation says one thing but the results are another04:06
DanielBairdwhat does the buffering actually fix? rounding errors, or something?04:07
prologicI wonder if it could possibly be a bug with GEOS 3.3.604:07
prologicthe latest version is 3.3.804:07
prologicDanielBaird:  I believe so04:07
prologicbbs - coffee04:11
robert_pykeIn PostGIS, I can run a command: ST_IsValidReason, which reports why the polygon is invalid. Do you have a similar command, and if you do, can you check what is causing the invalidity? I'm guessing self-intersection.. but you could also have a "line" in your multipolygon after the simplification... or something.04:19
robert_pykeAnyway, might be something worth checking.04:19
robert_pykeexample output: (in postgis)04:21
robert_pykegid  |      validity_info04:21
robert_pyke 5330 | Self-intersection [32 5]04:21
robert_pyke 5340 | Self-intersection [42 5]04:21
robert_pyke 5350 | Self-intersection [52 5]04:21
robert_pykeI've just read it comes from GEOS >= 3.3.0, so you should have all the powers :D04:21
prologic>>> from shapely.validation import explain_validity04:29
prologic>>> explain_validity(geom)04:29
prologic'Nested shells[136.5009 -12.0048599999999]'04:29
prologicwtf is a Nested shell(s) ?04:29
robert_pykeA nested shell occurs when a shell is nested within a shell...04:30
robert_pykei.e. I have no idea what the hell a nested shell is.. but sounds like you've got 'em04:30
robert_pyke          Indicates that a polygon component of a MultiPolygon lies inside another polygonal component04:32
robert_pykesounds like..04:33
robert_pykeyou know how you have polygon "holes"04:34
robert_pykewell, sounds like a "shell" is the opposite of a hole.. and that you have one within another.. Which would explain why buffering resolves the issue (it just merges them)04:34
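A rough pure-Python illustration of what "nested shells" means, using even-odd ray casting. The function names are hypothetical, and the check tests only the first vertex of each shell, so it assumes the shells do not cross each other (crossing shells would be a different kind of invalidity).

```python
from itertools import permutations


def point_in_ring(pt, ring):
    """Even-odd ray casting: True if pt lies inside the closed ring of (x, y) tuples."""
    x, y = pt
    inside = False
    n = len(ring)
    for k in range(n):
        (x1, y1), (x2, y2) = ring[k], ring[(k + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def nested_shells(shells):
    """Return (inner, outer) index pairs where one shell lies inside another.

    A valid MultiPolygon should return an empty list; a non-empty result is
    the situation GEOS reports as "Nested shells".
    """
    return [
        (i, j)
        for i, j in permutations(range(len(shells)), 2)
        if point_in_ring(shells[i][0], shells[j])
    ]
```

This also fits the buffer(0.0) observation: rebuilding the geometry as an area unions the inner shell into the outer one, leaving a single valid shell.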
robert_pykebbl, meeting04:34
prologicahh i see04:35
prologicnested shells hmm04:35
prologicawesome just awesome04:35
prologicWell, the questions that remain, then, are two-fold:04:36
prologica) Should I just buffer(0.0) for any geometry that becomes invalid after simplification?04:36
prologicb) Is buffering by 0.0 okay for Nested Shells?04:36
DanielBairdRob says yes04:37
DanielBairdSorry i should give him his full title.  Amazing Rob says yes04:38
prologicI think so too from my own reading04:40
prologicspatial la la la la :)04:40
prologicSo my strategy going forward is thusly:04:44
prologica) I'm going to run a check and clean on the geometries of Shapefiles that I ingest into our system04:44
prologicb) I'm also going to simplify and then check whether the resulting geometry is valid, buffering it by 0.0 if it isn't, whenever I perform the intersections against other geometries needed for data pre- and post-processing04:45
prologicIf this is not acceptable speak now or forever hold your tongue :)04:45
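A sketch of strategy (b), assuming shapely-style geometry objects that expose `.simplify(tolerance, preserve_topology=True)`, `.is_valid` and `.buffer(distance)`. The helper name `simplify_and_repair` and its default tolerance of 0.05 (the value discussed later in this log) are illustrative, not existing project code.

```python
def simplify_and_repair(geom, tolerance=0.05):
    """Simplify a geometry, falling back to buffer(0) if the result is invalid.

    Works with any object offering shapely's simplify/is_valid/buffer
    interface (hypothetical helper, not part of shapely itself).
    """
    simplified = geom.simplify(tolerance, preserve_topology=True)
    if simplified.is_valid:
        return simplified
    # Rebuild the geometry as an area; in this log it cleared "Nested shells".
    repaired = simplified.buffer(0)
    if not repaired.is_valid:
        raise ValueError("geometry still invalid after buffer(0)")
    return repaired
```

buffer(0) is a well-known repair trick, but it is not lossless for every kind of invalidity: on a self-intersecting "bow-tie" polygon it can drop one of the lobes, so comparing areas before and after the repair is a cheap sanity check.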
robert_pykeback from my meeting06:23
DanielBairdalso back.. but i stand by my original holding of the tongue06:26
prologicturns out it works06:29
DanielBairdso it boiled down to having to buffer several times?06:29
prologicoh I have no idea06:32
prologicI just do it :)06:32
prologicbut it's possible for simplification to end up with an invalid geometry06:32
prologiceven though you tell the function to preserve topology (the default)06:32
DanielBairdyeah that's pretty shitty.06:33
prologicwell perhaps06:33
prologicso it's actually slower to simplify and buffer (as necessary)06:35
prologicokay then06:35
prologicI'll multiprocess it06:35
prologicfuck'n complex geometries06:35
prologiceven if I simplified as a separate step06:36
prologicmaking the simplified versions our new vector data06:36
prologicit would still take the same amount of time no doubt06:36
prologicsame operations involved06:36
DanielBairdwhat about doing the simplification with buffering etc as necessary, saving the results, and chucking away the shapefiles06:39
prologicyeah sure06:40
prologicI -could- do that06:40
prologicbut it would take the same amount of time06:40
prologic-however- on the plus side06:41
prologicit would save time on other parts of the data processing06:41
prologicwhere in theory you would do the same thing06:41
prologicI'm okay doing this06:41
prologicquestion is how much do we simplify the geometries06:41
prologicWhat I could do (seeing as you have better sight and you're a UI guy)06:41
prologicis develop a simple tool to simplify geometries in a given shapefile06:42
prologicyou can play around with the tolerance06:42
prologicand tell me what you think a suitable tolerance value would be06:42
prologicthat means we're not dealing with MILLIONS of points06:42
prologicit's probably worthwhile implementing06:43
prologicand also making it multi-process06:43
prologicas it would cut down on processing time massively06:43
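A minimal sketch of making the per-feature work multi-process with the standard library's `concurrent.futures`. `summarise` and `summarise_all` are hypothetical stand-ins for the real gensummaries work; passing `workers=1` runs serially in-process, which is handy for debugging and profiling.

```python
from concurrent.futures import ProcessPoolExecutor


def summarise(feature):
    """Hypothetical per-feature job (stands in for simplify + intersect).

    `feature` is a (name, coords) pair; here we just count the vertices.
    """
    name, coords = feature
    return name, len(coords)


def summarise_all(features, workers=4):
    """Run `summarise` over all features; results come back in input order."""
    if workers <= 1:
        # Serial path: easier to debug and profile than worker processes.
        return [summarise(f) for f in features]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches features per task to cut inter-process overhead.
        return list(pool.map(summarise, features, chunksize=16))
```

On platforms that spawn rather than fork worker processes, call `summarise_all` from under an `if __name__ == "__main__":` guard, and keep the worker function at module level so it can be pickled.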
prologicrewriting gensummaries is taking longer than I expected06:43
prologicmostly because a lot of other code and tools and optimizations have come out of it that will greatly simplify how gensummaries works -- hopefully by simply walking over all the output files and collecting them06:43
DanielBairdstuff that only runs on deploy can be slow, if slow means that it takes a day or two.06:45
DanielBairdwas the last estimate a thousand days, or something?06:45
DanielBairdoh maybe it was a thousand hours06:45
DanielBairdeither way..06:45
prologictbh I actually have no real idea of how long processing will take06:45
prologicas I haven't really had time or put the effort into collating the numbers06:46
prologicI just know that generating summary data on 18 tif files and 2 regions taking over an hour is likely unacceptable performance06:46
prologicbearing in mind a) complex geometries and b) single-process gensummaries06:46
DanielBairdthere's a lot of LGA regions.06:47
DanielBairdhave you tried testing bounding boxes first?06:47
prologicyeah yeah I do that06:47
prologicso I get a subset of matching features06:47
prologicor potentially matching06:47
prologicthen do an intersection area calculation06:47
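A pure-Python sketch of the bounding-box pre-filter described above. The names are hypothetical and coordinates are plain (x, y) tuples; only the regions that survive this cheap test need the expensive exact intersection-area calculation in a real geometry library.

```python
def bbox(coords):
    """Return (minx, miny, maxx, maxy) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_regions(target_coords, regions):
    """Names of regions (name -> coords) whose box overlaps the target's box.

    A box overlap only means the shapes *might* intersect; it never
    rejects a region that truly intersects the target.
    """
    target_box = bbox(target_coords)
    return [
        name
        for name, coords in regions.items()
        if bboxes_intersect(target_box, bbox(coords))
    ]
```

With many regions (e.g. the LGAs), computing each region's box once up front, or using a spatial index such as shapely's `STRtree`, avoids rescanning every coordinate on each query.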
DanielBairdah so this is still that slow when it's only testing a few of the regions for state overlap06:47
prologicremember I fixed this :)06:47
prologicmostly I think because of the complex geometries06:48
prologic(I haven't tested this)06:48
prologicanyway what do you think?06:48
prologicso shall we simplify the geometries of our regions to some tolerance?06:48
prologicif yes - I'll write that tool in the morning and you can test it on mon06:49
prologici.e: I'll make the tolerance configurable06:49
prologicthen I'll proceed to finishing the rewrite of gensummaries06:49
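A sketch of the configurable-tolerance command line for the proposed simplification tool, using `argparse`. The option names, including `-o/--output`, are hypothetical; only the 0.05 default comes from this discussion.

```python
import argparse

DEFAULT_TOLERANCE = 0.05  # tolerance mentioned in this discussion


def parse_args(argv=None):
    """Parse options for a (hypothetical) shapefile simplification tool."""
    parser = argparse.ArgumentParser(
        description="Simplify the geometries in a shapefile."
    )
    parser.add_argument("shapefile", help="input shapefile path")
    parser.add_argument(
        "-t", "--tolerance", type=float, default=DEFAULT_TOLERANCE,
        help="simplification tolerance in the data's units (default: %(default)s)",
    )
    parser.add_argument("-o", "--output", help="where to write the simplified copy")
    args = parser.parse_args(argv)
    if args.tolerance < 0:
        parser.error("tolerance must be non-negative")  # exits with status 2
    return args
```

The tolerance is in the units of the data's coordinate system. If the shapefiles are in geographic coordinates, 0.05 is degrees, which is roughly 5.5 km of latitude (one degree of latitude is about 111 km).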
DanielBairdwait, didn't you show me some simplifications ages ago?06:49
prologichopefully by Mon things will be a lot faster and smoother06:49
prologicI believe so06:49
prologicand we sort of agreed on a tolerance of 0.0506:49
prologicwhich from my poor sight looks okay06:49
prologicbut nonetheless you'll be able to have a look again at your leisure06:50
prologicand if we're not happy with 0.0506:50
prologicwe'll change it06:50
DanielBairdyep okay06:50
DanielBairdif the good one back then was 0.05, that's probably good for us now.  from memory it was very accurate looking still06:51
prologicalso the other option (which we may seriously consider)06:51
prologicis to simplify the shapefiles06:51
DanielBairdbut significantly smaller file06:51
prologicand store them on my server06:51
prologicat least for the time being06:51
prologicand use those instead of from Gov upstream sources06:51
prologiceither way I don't care06:51
prologicbut it would save re-processing time every time we did a test of new data, etc06:51
DanielBairdyes.. that's what i was talking about before.  simplify, save, and chuck out the originals, never to look at them again06:51
prologicagh yes06:52
prologicand use them as new data sources we ingest06:52
prologicrather than do so in processing06:52
prologic*nods* I agree06:52
prologiceither way we can do it really simply06:52
DanielBairdthe process can even check in the cache spot for simplifications, and use them if they're there, generate them if they're not.06:52
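A sketch of the cache-or-generate behaviour suggested here, using only the standard library. The cache layout (one JSON file per source and tolerance, keyed by a SHA-1 of both) and the `simplify` callback are assumptions, not existing code.

```python
import hashlib
import json
from pathlib import Path


def cached_simplification(source, tolerance, cache_dir, simplify):
    """Return simplified data for `source`, computing it only on a cache miss.

    `simplify(source, tolerance)` is a caller-supplied (hypothetical)
    function whose result must be JSON-serialisable.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha1(f"{source}:{tolerance}".encode()).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())  # cache hit: reuse earlier work
    result = simplify(source, tolerance)
    path.write_text(json.dumps(result))
    return result
```

Because the key covers only the source name and tolerance, an updated upstream shapefile with the same name would still hit the stale entry; folding the file's contents or modification time into the key would avoid that.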
prologicfortunately I have my own server in the US06:52
prologicso doing things if I want to is not so restrictive :)06:53
prologicthat sounds like more code :)06:53
prologicanyway, I'm probably out of here shortly06:53
prologicwill see you Mon hopefully with more good news :)06:53
prologicand exciting new things06:53
prologicwell maybe one or two06:53
DanielBairdgood point, could just write a simplifier and separate processor thingy.06:53
prologicheap I'm outta here06:54
DanielBairdnp.  i'm riddles with meetings on monday, 9am til 12 or so.  will chat after that :)06:54
DanielBaird* tiddles06:54
DanielBaird* riddled OMG06:54
*** DanielBaird has quit IRC08:16
*** DanielBaird has joined #cmvt08:47
*** DanielBaird has quit IRC08:55
*** DanielBaird has joined #cmvt09:52
*** DanielBaird has quit IRC09:56
*** DanielBaird has joined #cmvt10:52
*** DanielBaird has quit IRC10:57
*** DanielBaird has joined #cmvt11:53
*** DanielBaird has quit IRC11:57
*** DanielBaird has joined #cmvt12:54
*** DanielBaird has quit IRC12:58
*** DanielBaird has joined #cmvt13:54
*** DanielBaird1 has joined #cmvt13:58
*** DanielBaird has quit IRC13:58
*** DanielBaird1 has quit IRC14:24
*** DanielBaird has joined #cmvt14:55
*** DanielBaird has quit IRC15:04
*** DanielBaird has joined #cmvt15:30
*** DanielBaird has quit IRC15:34
*** DanielBaird has joined #cmvt16:31
*** DanielBaird has quit IRC16:35
*** DanielBaird has joined #cmvt17:31
*** DanielBaird has quit IRC17:36
*** DanielBaird has joined #cmvt18:32
*** DanielBaird has quit IRC18:36
*** DanielBaird has joined #cmvt20:23
*** DanielBaird has quit IRC20:27
*** DanielBaird has joined #cmvt21:31
*** DanielBaird has quit IRC21:36
*** DanielBaird has joined #cmvt22:32
*** DanielBaird has quit IRC22:36
*** DanielBaird has joined #cmvt23:53
prologicMorning All23:56
*** DanielBaird has quit IRC23:57
