IRC Logs for #circuits Friday, 2014-04-18

Romsterprologic, damn i really do need that bug for using files for white lists, i'm gonna be making a massive list of white lists for sourceforge to crawl03:03
Romsterand hit up every project page that is listed in Pkgfile's03:03
Romsterthis is gonna be more effort than i first thought.03:04
prologicIsn't there an easier way to do this? :)03:07
prologicI will fix that bug soon03:07
Romsteri'm trying to find it.03:09
Romsteri need to restrict my searches more, i don't want to spider the entire sourceforge url.03:10
Romsterwhitelisting only lets it follow those paths correct?03:10
Romsterand restricting mime types is fine, not sure what mime types php and other such sites spit out for dynamic content, is it still text/html ?03:14
prologicit's blacklist then whitelist03:14
prologicso if something is blacklisted03:14
prologicit'll check if it's whitelisted03:15
prologicwhitelisting (with no blacklist) has no effect03:15
prologicmy plan was to do a head on the resource03:15
Romsterright it's like deny or allow in iptables for blacklist and whitelist03:15
prologiccheck its content-type03:15
prologicif it's not something that may contain uris03:15
prologictext/html, text/xml03:15
prologicthen we skip it03:15
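
The blacklist-then-whitelist rule described above can be sketched roughly as follows. This is only an illustration of the rule as stated in the conversation, not the crawl tool's actual code; the function name `is_allowed` and the example URLs are hypothetical.

```python
import re


def is_allowed(url, blacklist, whitelist):
    """Return True if the URL should be followed (hypothetical helper)."""
    # The whitelist is consulted only for URLs that hit the blacklist.
    if any(re.match(pattern, url) for pattern in blacklist):
        return any(re.match(pattern, url) for pattern in whitelist)
    # Not blacklisted: followed regardless of the whitelist, which is
    # why a whitelist with no blacklist has no effect.
    return True


if __name__ == "__main__":
    whitelist = [r"^http://a\.example/files/.*$"]
    print(is_allowed("http://a.example/files/x", [r".*"], whitelist))
    print(is_allowed("http://b.example/", [r".*"], whitelist))
    print(is_allowed("http://b.example/", [], whitelist))
```

With a blacklist of ".*", as in the crawl command used later in the log, everything is denied by default and only whitelisted paths are followed, much like a default-deny policy in iptables.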
Romsterall i am interested in is finding the download file listings off each project site.03:16
Romsterof interested directories/project names.03:16
Romsterand not crawling needlessly03:16
Romsterand spit out all the files found.03:16
Romsterthen i can use my tools to parse it.03:17
prologicWhat do you think about my offering hosted jenkins03:17
prologicat $1/month03:17
Romsteri might have to use the crawl library directly to get exactly what i want but the crawl command will do for now.03:17
prologicthey are identical in terms of behavior03:18
Romsterjenkins is a build bot right for projects?03:18
prologica continuous integration server03:23
prologic@spell continuous03:23
kdbcontinuous is spelled correctly.03:23
Romsteri could use that for crux builds but then i got a local mini distcc cluster already03:23
prologicyou can still use a distcc cluster with jenkins03:23
prologicI'm going to do this myself soon03:23
prologictime permitting03:23
prologici.e: I'm planning on having a:03:24
Romsteri might try out icecream i think it has advantages over distcc03:24
prologicwhere I continuously publish crux packages to03:24
prologicso if you'd like to help with said efforts03:24
Romsternot much point in me doing the same as you are doing. even though that is one of my many goals.03:24
prologicthen just wait for me to complete it :)03:24
prologicI basically want full crux binary package support03:25
prologicwith continuous builds03:25
Romsterwhat i'm doing is continuously checking file versions.03:25
prologicso that my docker containers can easily just pkgadd anything they might want/need03:25
Romsterso ports can be updated more regularly with very little effort.03:25
Romsterall that needs to be done then is fix build failures and test new versions03:26
Romsteras well as me generating meta4 files for each new release.03:26
Romsterso that's my first goal, sort version numbers, and you've seen me do this already. second goal, fetch all project related files on a regular basis03:28
Romsterand your spider is ideal for this.03:29
Romsteri'd like to set it up as a cron job when it's ready03:29
Romsteror even better have circuits fire off events to spider said urls.03:29
Romsteri don't want to get too far ahead and i know mirrorbrains does some of what i want now.03:30
Romsterthere is no decent version sorting library out there, yet many ask for how to do it. so i'm making one.03:31
Romsterfor my needs and others.03:31
Romsterhopefully it's not too difficult for me to do this.03:32
Romsteri'm no programmer.03:32
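
As a rough illustration of the version sorting problem mentioned above, here is a minimal sketch of a sort key that compares numeric parts as numbers rather than as text. It is not Romster's library; the name `version_key` is hypothetical, and it deliberately ignores harder cases such as pre-release suffixes ("1.0rc1" would sort after "1.0" here).

```python
import re


def version_key(version):
    """Split a version string into comparable chunks (hypothetical helper)."""
    parts = re.findall(r"\d+|[A-Za-z]+", version)
    # Tag each chunk so integers and strings are never compared directly:
    # numeric chunks sort before alphabetic ones at the same position.
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]


if __name__ == "__main__":
    versions = ["1.10", "1.2", "1.9.1"]
    print(sorted(versions, key=version_key))  # ['1.2', '1.9.1', '1.10']
```

A plain string sort would put "1.10" before "1.2", which is the usual reason a dedicated key is needed.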
Romsteryou have a few things to consider when offering binary ports for crux.03:36
Romsterconsidering we don't have strict dependencies for versions.03:36
RomsterABI changes and such03:36
Romstermy idea was something like how git tells the server what files i have and what it has and it makes a delta of that and sends it to me.03:37
Romstermy idea is to have a crux script send the installed package list and versions, set up an environment for that, add in programs they requested, send built packages to them, and make a hash of each built package and a table so it'll eventually know what combinations of versions of packages allow an already built package to run.03:38
Romsterprologic, like i can do this in a few seconds04:38
Romstercrawl -v -b ".*" -w "^http\:\/\/clementine-player\.googlecode\.com\/files\/.*$" -w "^http\:\/\/alleg\.sourceforge\.net.*$" -p "^.*\.tar\.*$" > clementine-urls.log 2> log04:39
Romsterctrl+c it after a few seconds and04:39
Romsteregrep "\.tar\." log |cut -d' ' -f404:39
Romsterwhat i'm after.04:39
Romsteri was hoping crawl depth would be like 2 or 3 levels deep to find the files i'm interested in04:40
Romsterit definitely needs a mime restriction it was downloading exe files -_-04:41
prologicso add one :)05:08
prologicyou're welcome to contribute05:08
prologicso here's what I was thinking05:08
prologicbefore the get() call05:08
prologicdo a head() on the url05:08
prologicresponse = head(...)05:09
prologicif response.headers["Content-Type"] in allowed_content_types:05:09
prologic    ...05:09
prologicsomething like this05:09
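
The HEAD-before-GET idea sketched above might look something like this using only the Python standard library. This is an assumption-laden sketch, not the crawl tool's code: `may_contain_links`, `should_fetch` and `ALLOWED_CONTENT_TYPES` are hypothetical names. Note that a Content-Type header often carries parameters (for example "text/html; charset=utf-8"), so the sketch strips them before comparing instead of testing the raw header value.

```python
from urllib.request import Request, urlopen

# Media types that may contain further URIs (taken from the discussion).
ALLOWED_CONTENT_TYPES = {"text/html", "text/xml"}


def may_contain_links(content_type):
    """Return True if a Content-Type header value names an allowed type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_CONTENT_TYPES


def should_fetch(url, timeout=10):
    """Issue a HEAD request and decide whether a full GET is worthwhile."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return may_contain_links(response.headers.get("Content-Type"))
```

Only `should_fetch` touches the network; keeping the header check in its own function makes it easy to try against sample header values. Some servers answer HEAD differently from GET, so a real crawler would probably want a fallback.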
Romsterhmm i'll have to read through the source and figure it out.05:16
Romsteralso a depth of 1 gives me no output but in the verbose mode i do see what i want05:19
Romsterfrom starting at
Romsterfiles/ is one level deep no?05:19
prologicwhat's your -p/--pattern?07:09
Romster-p "^.*\.tar\.*$"07:41
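
One thing that can be checked independently of the crawler is how this pattern behaves as a regular expression. In `^.*\.tar\.*$` the trailing `\.*` means "zero or more literal dots", so a name like foo.tar.gz does not match it. The log does not establish whether this is related to the depth behaviour being discussed; the snippet below only shows the regex semantics, with a made-up URL and a hypothetical alternative pattern.

```python
import re

# The pattern as typed in the log, and a hypothetical alternative where
# "\..*" means "a literal dot followed by anything".
as_typed = re.compile(r"^.*\.tar\.*$")
alternative = re.compile(r"^.*\.tar\..*$")

# Made-up URL for illustration only.
url = "http://example.org/files/foo-1.0.tar.gz"

print(bool(as_typed.match(url)))     # False
print(bool(alternative.match(url)))  # True
```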
prologicRomster, hmmm08:37
prologicso what you're saying is it found the urls you're after in the verbose log08:37
prologicbut the pattern didn't match it?08:37
prologicand didn't spit it out as output?08:37
Romsterprologic, it spits it out when i don't restrict the max depth and i let it download every darn file it finds that's not a tar. it'll download .md5sums .sha256sum and now i see it'll download all the .exe files and then many minutes later spit out the tarball names i am after.08:48
Romsterif i set it to depth of 1 then i don't get the results no matter how long i wait for it to finish.08:49
Romsterwhile it still downloads exe files and then spits out an exception for that file.08:49
