[12:44:34] (PS2) QChris: Swap check_datasource parameters [analytics/geowiki] - https://gerrit.wikimedia.org/r/85612
[12:45:21] (PS2) QChris: When checking datasources, make clean which formats are supported [analytics/geowiki] - https://gerrit.wikimedia.org/r/85613
[13:16:04] (PS1) QChris: Revert "Add test data for unit-tests" [analytics/kraken] - https://gerrit.wikimedia.org/r/87713
[15:34:56] (PS1) QChris: Remove kraken-etl [analytics/kraken] - https://gerrit.wikimedia.org/r/87717
[15:35:06] (PS1) QChris: Remove kraken-funnel [analytics/kraken] - https://gerrit.wikimedia.org/r/87718
[15:35:15] (PS1) QChris: Remove kraken-eventlogging [analytics/kraken] - https://gerrit.wikimedia.org/r/87719
[15:36:32] qchris.....
[15:36:38] Hi drdee :-)
[15:36:39] it's weekend
[15:36:52] Same for you ;-)
[15:37:04] I promised to do some work for otto-mata
[15:37:15] why are you deleting those files :(
[15:37:16] The third party documentation is wrong
[15:37:24] They are unused.
[15:37:37] And they are still in the repo, so we can bring them back
[15:37:45] but that's not a reason to delete them (:
[15:37:50] But otherwise we have to maintain them the whole time.
[15:38:03] Cruft is a problem.
[15:38:12] they won't change so there is nothing to maintain :)
[15:38:12] The kraken repo is suffering a lot from it.
[15:38:22] Every grep that I run has hits in them.
[15:38:28] They are in the way :-/
[15:38:32] i mean the funnel is actually useful
[15:38:39] event logging probably not
[15:38:48] etl will see a major overhaul
[15:38:51] okay laterz
[15:39:03] for every change I make and made, I'll have to check if it breaks anything in there. That's a problem.
[15:39:16] Especially if we do not use them.
[15:39:25] Okay. Bye.
[16:02:23] qchris, maybe etl should be the target that we use for camus stuff?
[16:02:45] I started with kraken-camus
[16:02:56] If you prefer kraken-etl, that's fine as well.
[16:03:18] * qchris changes that
[16:03:42] not sure, you might know better than me
[16:03:49] but there might be other etl stuff that would go in there? hmm
[16:03:53] like, hm
[16:04:04] Honestly ... I do not care too much. Yes, it's definitely etl...
[16:04:11] where should we put other etl things, like for example after the camus import is done?
[16:04:25] Well ... kraken-etl was taken ...
[16:04:44] dan was suggesting that we might want to keep a separate dataset of just seqs and hostnames indexed/bucketed on hostname so that we can do import verification
[16:04:55] that could be more camus stuff, or it might be hive stuff
[16:04:56] dunno
[16:05:08] you can rip out etl, i think etl stuff was storm before, right?
[16:05:19] https://gerrit.wikimedia.org/r/87717
[16:05:26] Is awaiting your review ;-)
[16:05:38] aye cool :)
[16:05:53] But that can wait until the next week i guess.
[16:05:55] sigh it does feel bad deleting stuff to me, even though I know it exists still in git
[16:06:02] sometimes that stuff is hard to find again, you know?
[16:06:06] or people forget that it exists
[16:06:17] But it's a maintenance problem.
[16:06:22] yeah
[16:06:25] The kraken repository is mostly cruft ...
[16:06:28] maybe we can make a branch before it is deleted?
[16:06:31] yeah i agree
[16:06:49] archive/etl branch?
[16:06:50] i dunno
[16:06:56] Finding it should be easy ... 'git log | grep etl'
[16:07:03] yeah but people won't see it
[16:07:15] Ok, I'll add that.
[16:07:19] like, say 3 years from now none of us work for wmf anymore, no one will see that there was storm work done
[16:07:20] i dunno
[16:07:24] you are probably right
[16:10:26] Now there is a branch "archive/etl-storm" pointing to 05685d1ed816f63b7b82a4ae5d5f7842f6ab0540
[16:11:06] thank you :)
[16:11:48] oo, qchris, in the test data one
[16:11:55] where are those IPs from?
[16:12:01] you should probably anonymize those
[16:12:16] hmm, they are all internal maybe?
[16:12:16] hm
[16:12:16] Mhmm ... drdee committed that.
[16:12:22] I am reverting it
[16:12:22] oh no, xff
[16:12:36] His commit breaks tests.
[16:12:45] oh you are deleting
[16:12:45] ok
[16:12:45] ok
[16:12:46] see sorry
[16:12:50] (CR) Ottomata: [C: 2 V: 2] Revert "Add test data for unit-tests" [analytics/kraken] - https://gerrit.wikimedia.org/r/87713 (owner: QChris)
[16:13:00] (CR) Ottomata: [C: 2 V: 2] Remove kraken-etl [analytics/kraken] - https://gerrit.wikimedia.org/r/87717 (owner: QChris)
[16:13:08] Yippie \o/
[16:13:11] Thanks.
[16:13:22] (PS2) Ottomata: Remove kraken-eventlogging [analytics/kraken] - https://gerrit.wikimedia.org/r/87719 (owner: QChris)
[16:13:26] (CR) Ottomata: [C: 2 V: 2] Remove kraken-eventlogging [analytics/kraken] - https://gerrit.wikimedia.org/r/87719 (owner: QChris)
[16:13:45] aww poor funnel
[16:13:56] maybe a branch for that too?
[16:14:06] Ok. I'll create one.
[16:14:13] thanks
[16:14:57] Created archive/funnel
[16:15:11] dankkkkeeee
[16:15:25] uh oh
[16:15:26] Krrb
[16:15:29] Krrb
[16:15:31] ack
[16:15:40] the change could not be rebased due to a path conflict during merge.
[16:15:43] for funnel delete
[16:17:12] I'll rebase by hand.
[16:18:50] (PS2) QChris: Remove kraken-funnel [analytics/kraken] - https://gerrit.wikimedia.org/r/87718
[16:21:37] (CR) Ottomata: [C: 2 V: 2] Remove kraken-funnel [analytics/kraken] - https://gerrit.wikimedia.org/r/87718 (owner: QChris)
[16:21:42] great, danke
[16:21:47] Danke!
[16:22:07] I should submit more commits on the weekend. ... It seems they merge easier
[16:34:34] qchris: i've found the opposite to be true, especially for operations/puppet.git
[16:34:42] Hehe
[16:34:50] qchris: even if they're for labs projects. sigh
[16:35:19] If I had +2 I'd merge for you ;-)
[16:35:33] awww :)
[22:18:09] ottoman, qchris: those ip's are random ip addresses
[22:18:30] drdee:Mhmm?
[22:19:07] yes they are, i would not commit real ip addresses and url's to a git repo :)
[22:19:47] You did not anonymize the X-Forwarded-For ...
[22:20:18] And the urls are real ... zcat /a/squid/archive/mobile/mobile*2013*03*28* | head -n 25
[22:21:31] But I thought the problem is resolved as the commit was reverted?
[22:22:40] in a way yes
[22:22:50] but anybody clones the repo will have failed unit-tests
[22:22:58] because the test data is missing
[22:23:08] see the README file ...
[22:23:17] so the test data, in anonymized form, should be part of repo i guess
[22:23:23] And the testdata you provided does not allow the unit tests to pass either :-)
[22:23:48] right, so the unit-tests need to be updated
[22:23:54] I agree that the testdata in anonymized form, and in a form that lets the tests pass should be included.