IRC Logs for #circuits Saturday, 2014-05-17

01:19 *** theo_dore has quit IRC
01:31 *** ninkotech has quit IRC
01:31 *** ninkotech has joined #circuits
01:31 <kdb> Welcome back ninkotech :)
01:42 *** ninkotech_ has joined #circuits
01:42 <kdb> Welcome back ninkotech_ :)
01:44 *** ninkotech has quit IRC
01:54 *** ninkotech_ has quit IRC
02:00 *** ninkotech__ has joined #circuits
02:00 <kdb> Welcome back ninkotech__ :)
02:06 *** ninkotech has joined #circuits
02:07 *** ninkotech__ has quit IRC
02:18 *** ninkotech has quit IRC
02:18 *** ninkotech_ has joined #circuits
02:27 *** ninkotech_ has quit IRC
02:27 *** ninkotech__ has joined #circuits
02:31 *** ninkotech__ has quit IRC
02:33 *** ninkotech__ has joined #circuits
02:39 *** ninkotech__ has quit IRC
02:51 *** ninkotech__ has joined #circuits
02:54 *** ninkotech__ has quit IRC
02:56 *** ninkotech__ has joined #circuits
03:00 <Romster> prologic, you busy this weekend?
03:01 *** ninkotech__ has quit IRC
03:30 *** eKKiM has joined #circuits
03:30 <kdb> Yo ekkim
03:31 <prologic> Romster, not particularly
03:31 <prologic> why?
03:32 <prologic> I am _trying_ to attempt to finish my chicken coop door opener/closer this weekend though *hopefully*
03:33 <Romster> nothing major but i would seriously like a proper tree depth level implemented in spyda
03:34 <Romster> it's too painful to use that much whitelist stuff for every single url i need to deal with. also is there some easy python command i can use to escape special chars in regex like / . automatically to \. \/
03:34 <Romster> so i can say restrict to said url and depth limits.
03:34 <prologic> import re
03:34 <prologic> re.escape(...)
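
(A minimal sketch of what re.escape gives Romster here; the URL is just an illustrative example, standard library only.)

    import re

    url = "http://downloads.sourceforge.net/projects/boost/"
    # re.escape backslash-escapes regex metacharacters so the string can be
    # used as a pattern that matches itself literally. Which characters get
    # escaped depends on the Python version: 3.7+ escapes only regex-special
    # characters, older versions escape every non-alphanumeric character.
    pattern = re.escape(url)
    print(re.match(pattern, url) is not None)  # True
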
03:35 <Romster> i'm gonna try and automate it from the source/URL of Pkgfiles.
03:35 <prologic> ok
03:35 <Romster> sourceforge and google code ones are fairly consistent.
03:36 <Romster> it's the rest that are on their own servers that won't be quite so bad if i restrict it to their domain and depth.
03:37 *** eKKiM has quit IRC
03:37 <Romster> also unrelated, i found a project called llvmpy that looked interesting; you might like it for something.
03:43 <Romster> it would be like adding a list of urls for each level in a dictionary or something, then using that to count the depth?
03:44 <Romster> how far are you with the chicken coop anyways? and how many chooks/chickens do you have?
04:04 *** ninkotech has joined #circuits
04:04 <kdb> Welcome back ninkotech :)
04:07 *** ninkotech has quit IRC
06:30 <prologic> have all the parts
06:30 <prologic> just need to put it all together :)
06:30 <prologic> 3 chickens
06:30 <prologic> re depth
06:31 <prologic> what is wrong with the way depth is calculated right now?
06:31 <prologic> I know it's not ideal but
06:31 <prologic> also restricting to a domain already works
06:31 <prologic> by simply blacklisting everything and whitelisting a regex that only matches that domain
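
(One plausible reading of the blacklist-everything / whitelist-the-domain approach described above, as a sketch; this is not necessarily spyda's exact precedence rule, and the patterns are only examples.)

    import re

    blacklist = [re.compile(r".*")]                          # block everything...
    whitelist = [re.compile(r"^http://sourceforge\.net/")]   # ...except this domain

    def allowed(url):
        # follow a URL if any whitelist pattern matches it,
        # otherwise drop it if any blacklist pattern matches
        if any(p.match(url) for p in whitelist):
            return True
        return not any(p.match(url) for p in blacklist)

    print(allowed("http://sourceforge.net/projects/boost/"))  # True
    print(allowed("http://example.com/"))                     # False
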
06:44 *** qwebirc88536 has joined #circuits
06:44 <kdb> Hello qwebirc88536
06:44 <AnonCoward> anyone alive?
07:03 <Romster> it is a while loop between urls.
07:04 <Romster> i need one that does a tree-like structure that you talked about a while ago.
07:05 <Romster> so that each directory on the url is treated as the depth from the starting url.
07:05 <Romster> AnonCoward, yeah at times.
07:11 <AnonCoward> :-) let's say I create a polling loop
07:12 <AnonCoward> where a response calls ready again
07:13 <AnonCoward> what would be the best way of handling keyboard interrupts?
07:15 <AnonCoward> signal. :-)
07:15 <AnonCoward> found it. awesome framework
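
(AnonCoward's actual solution isn't shown in the log; a generic standard-library sketch of catching Ctrl-C in a polling loop looks roughly like this. Names are illustrative, and this is plain stdlib signal handling, not the circuits API.)

    import signal
    import time

    running = True

    def handle_sigint(signum, frame):
        # set a flag instead of raising, so the loop can shut down cleanly
        global running
        running = False

    signal.signal(signal.SIGINT, handle_sigint)

    while running:
        # ... poll / fire the next request here ...
        time.sleep(0.1)

    print("interrupted, exiting cleanly")
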
07:32 <Romster> ah prologic could answer that.
07:39 <prologic> hi AnonCoward
07:39 <prologic> sorry was afk :)
07:39 <prologic> didn't quite understand what you were asking
07:39 <prologic> but glad you found a solution :)
07:39 <prologic> what are you doing btw?
07:40 <prologic> Romster, unfortunately the kind of crawling you're referring to is not really anything to do with depth per se. URL(s) actually have nothing to do with directories per se or path levels. It's just become common convention -- I agree that the depth count is not ideal but not sure how to solve it (or improve it) yet
07:47 <Romster> well you had the solution a while ago: collect the urls for each index page.
07:49 <Romster> parse through them, find the ones that match the whitelist, see what the tree depth is. find ones that start at the given url, then follow that path, index that page, repeat. only collecting urls/following to the given tree depth
07:49 <Romster> say i go --depth=2 from http://foo.com/bar/
07:49 <Romster> i want to cover http://foo.com/bar/*/ only.
07:50 <Romster> i want to cover http://foo.com/bar/.*/ only.
07:51 <Romster> basically depth of url
07:51 <Romster> the fact it can be a directory or dynamically generated is a moot point
07:52 <Romster> basically instead of this
07:52 <Romster> crawl -v --blacklist=".*" --whitelist="^http\:\/\/downloads\.sourceforge\.net\/projects\/boost\/boost\/[0-9.]+\/boost_.*\.tar\..*$" --whitelist="^http\:\/\/sourceforge\.net\/projects\/boost\/files\/boost\/[0-9.]+/$" --pattern ".*\.tar\.gz/download$" --pattern ".*\.tar\.bz2/download$" --pattern ".*\.tar\.xz/download$" http://sourceforge.net/projects/boost/files/boost/
07:52 <Romster> i can do
07:53 <Romster> crawl -v --depth=2 --whitelist="^http\:\/\/downloads\.sourceforge\.net\/" http://sourceforge.net/projects/boost/files/boost/
07:54 <Romster> and it'll give me just the urls from http://sourceforge.net/projects/boost/files/boost/.*/.*$
07:54 <Romster> or make whitelist more powerful.... so i can do http://sourceforge.net/projects/boost/files/boost/.*/.*$
07:55 <Romster> because right now that's fine for the final directory but i also have to list "http://sourceforge.net/projects/boost/files/boost/.*"
07:55 <Romster> so it'll even follow that path to the next level.
07:56 <Romster> else i need to specify multiple whitelist entries for each. and it's beginning to be a pain
07:58 <Romster> http://pastebin.com/xVJ8pKLL
07:58 <Romster> unless i am doing that wrong.
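
(What Romster is asking for could be expressed as counting path segments below the starting URL. A hypothetical sketch of that idea follows; it is not spyda's actual implementation, and the helper name url_depth is made up for illustration.)

    from urllib.parse import urlsplit

    def url_depth(url, base):
        """How many path segments `url` sits below `base`; None if outside it."""
        u, b = urlsplit(url), urlsplit(base)
        if u.netloc != b.netloc or not u.path.startswith(b.path):
            return None
        extra = u.path[len(b.path):].strip("/")
        return len(extra.split("/")) if extra else 0

    base = "http://sourceforge.net/projects/boost/files/boost/"
    print(url_depth(base + "1.55.0/", base))                               # 1
    print(url_depth(base + "1.55.0/boost_1_55_0.tar.bz2/download", base))  # 3
    print(url_depth("http://example.com/other/", base))                    # None
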
09:09 *** Osso has quit IRC
10:31 *** qwebirc18739 has joined #circuits
10:31 <kdb> Heya qwebirc18739
11:55 <prologic> Romster, you can do .*/.*
11:55 <prologic> you just have to make * non-greedy
11:55 <prologic> e.g.: .*?
11:55 <prologic> see: pydoc re
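
(A tiny illustration of greedy vs non-greedy quantifiers, standard re behaviour, nothing spyda-specific.)

    import re

    text = "<a><b>"
    print(re.match(r"<.*>", text).group())   # '<a><b>'  greedy .* takes as much as it can
    print(re.match(r"<.*?>", text).group())  # '<a>'     non-greedy .*? stops at the first '>'
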
12:12 <Romster> k
12:19 *** Osso has joined #circuits
12:19 <kdb> Heya osso
14:42 *** irclogger_ has joined #circuits
14:42 <kdb> Hello irclogger_
14:42 *** prologic has joined #circuits
14:42 <kdb> Hello prologic
16:02 *** qwebirc18739 has quit IRC
23:09 *** Osso has quit IRC

Generated by irclog2html.py 2.11.0 by Marius Gedminas - find it at mg.pov.lt!