[00:08:01] nite@! [00:11:21] good night milimetric :) [00:19:45] madhuvishy: sorry, missed your question! You did it right in the patch, the module would have to be configurable if this went in there. But the role is just how our EL is configured, so your patch is right. [00:20:11] milimetric: no problem :) okay then, i'll ask andrew to look at it tomorrow too [00:21:27] I'm working on the tests for the endpoints, and I'm running into the case where if I insert every time into the test tables, the count keeps going up. Is clearing the test db every time before the test runs a good idea? [00:35:07] getting errors on Hue - first this: "Your query has the following error(s): java.lang.OutOfMemoryError: Java heap space" [00:35:24] now a wikimedia error [00:36:58] Analytics-Backlog: Add referrer to pageviews_hourly - https://phabricator.wikimedia.org/T108886#1533643 (kevinator) [00:40:06] HaeB: could you share your query? [00:40:14] Hue can be a bit flaky [00:40:47] if you are looking through, say a month of data - that error can happen [00:41:17] now getting "Server Error (500)" while trying to access the saved query ;) [00:41:20] Do you know how to run the query directly using the hive cli on stat1002? [00:41:29] HaeB: oh no [00:41:32] yes, i can try that [00:42:00] yeah, if you do that, do export HADOOP_HEAPSIZE=1024 [00:42:07] before you run your query [00:42:20] should not OOM [00:42:27] HaeB: ^ [00:43:42] cool thanks! so you think the errors might be related? (it came up again and i could seemingly start another query) [04:39:24] Analytics-Backlog, Research-and-Data: Add referrer to pageviews_hourly - https://phabricator.wikimedia.org/T108886#1533886 (DarTar) [09:34:56] Analytics-Backlog: Investigate sample cube pageview_count vs unsampled log pageview count - https://phabricator.wikimedia.org/T108925#1534298 (JAllemandou) NEW [09:50:11] (PS1) Joal: Bump refinery-core and refinery-hive to 0.0.15 [analytics/refinery] - https://gerrit.wikimedia.org/r/231241 [09:50:53] (CR) Joal: [C: 2 V: 2] "Self merging after ottomata review." [analytics/refinery] - https://gerrit.wikimedia.org/r/231010 (owner: Joal) [09:52:07] (CR) Joal: [C: 2 V: 2] "Self merging as planned with ottomata." [analytics/refinery] - https://gerrit.wikimedia.org/r/231241 (owner: Joal) [09:54:03] Hey team: Refinery deployment started [09:54:23] Includes the pageview bug fix and the new maps webrequest source. [10:49:24] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0] [10:51:24] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0] [11:21:27] Deployment went well [11:22:02] Some backfilling to be done for maps data and legacy_tsv_5xx, will do it later today (cluster has some jobs to catch up on now) [11:35:03] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0] [11:37:04] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0] [14:16:03] hello i am at the bank! [14:16:10] milimetric: shall I merge that statsd_host change? [14:16:30] ottomata: it looked good to me, [14:16:37] madhu was gonna talk to you about it [14:16:42] oh?
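The workaround milimetric gives HaeB above amounts to raising the Hive client's heap before launching the query from the CLI on stat1002 instead of Hue. A minimal sketch; the query filename is illustrative, the export is exactly what the log recommends:

    # on stat1002: give the Hive CLI client a bigger heap before running the query;
    # large scans (e.g. a month of data) can otherwise OOM the client as they did in Hue
    export HADOOP_HEAPSIZE=1024
    hive -f my_query.hql        # or: hive -e "SELECT ..."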
[14:16:55] hullo [14:16:56] oh, nothing to block merging, just she wanted your opinion [14:17:08] ok [14:17:10] so i should merge ? :) [14:17:20] you are the puppet master, sir [14:17:33] that is entirely up to you. From all other perspectives I give the ok [14:18:08] speaking of puppet, I wanted to bounce the restbase stuff off you, but maybe we can chat when you're back, it's not urgent [14:18:40] if you want you can check the current role and modules: https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/restbase.pp https://github.com/wikimedia/operations-puppet/blob/production/modules/restbase/manifests/init.pp [14:18:49] yeshhh? [14:57:33] Analytics-Kanban: Update refinery dataset-check dump script - https://phabricator.wikimedia.org/T108950#1535196 (JAllemandou) NEW a:JAllemandou [14:58:34] (PS1) Joal: Update dataset check dump script [analytics/refinery] - https://gerrit.wikimedia.org/r/231285 [15:10:18] milimetric, yt? [15:10:41] hey mforns [15:10:51] hey, good morning :] [15:11:11] I was wondering if you know something about the flow dashboard semantics? [15:11:14] specifically [15:11:58] what does active-topics mean, and can the older topics get bumped? or can the older values of the metric change? [15:12:27] hm... no i'm not familiar, looking [15:12:56] milimetric, it's ok, is Roan the man? [15:13:23] matt flaschen or roan should be able to answer, yes [15:13:30] ok cool :] [15:13:32] thx [15:21:07] ottomata: lol you merged my patch with the [WIP] tag [15:21:23] * madhuvishy has bad luck with [WIP] tags this week [15:27:09] madhuvishy, :] I also had problems with them and I changed to -1 myself with a comment "Still WIP" [15:27:23] mforns: ha ha [15:27:42] i've had two patches merged so far with [wip] on them [15:27:47] hehehe [15:28:18] and using self -1 also avoids needing to push an empty patch if you decide that the patch is already ok [15:28:38] i think you guys are right :) self-1 seems better than WIP [15:28:47] i forgot madhu's patch had that [15:29:08] mforns: Thanks for the tip, it's a good one :) [15:29:08] I'm putting this in retro notes to talk about it :) [15:29:28] hehe, in fact milimetric taught me this [15:30:03] Ah, copyright translation then: Thanks milimetric ! [15:30:08] :] [15:31:02] joal: standup? :) [15:31:19] Thx madhu [15:31:33] I joined an alternate one :) [15:48:23] Analytics-Engineering, operations, Privacy: Honor DNT header for access logs & varnish logs - https://phabricator.wikimedia.org/T98831#1535432 (Krenair) [15:49:17] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0] [15:51:17] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0] [16:03:55] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: EventLogging Icinga Alerts should look at a longer period of time to prevent false positives {stag} [5 pts] - https://phabricator.wikimedia.org/T108339#1535461 (Milimetric) There's something else weird going on here. It seems to me some of th... 
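The raw-vs-validated EventLogging alert above keeps flapping, and the open task is to make the check look at a longer window. A minimal sketch for eyeballing what the check sees, pulling both series from graphite's render API and computing the relative difference; the graphite host and the two metric names are placeholders (the real target expressions live in the check_graphite/puppet config), assuming Python with requests:

    # Placeholder metric names and host -- substitute whatever the icinga
    # check_graphite alert is actually configured with.
    import requests

    GRAPHITE = 'https://graphite.wikimedia.org/render'
    RAW = 'eventlogging.overall.raw.rate'      # placeholder
    VALID = 'eventlogging.overall.valid.rate'  # placeholder

    def series(target, window='-3hours'):
        resp = requests.get(GRAPHITE, params={'target': target, 'from': window, 'format': 'json'})
        resp.raise_for_status()
        # graphite returns datapoints as [value, timestamp]; value is None for missing points
        return {ts: value for value, ts in resp.json()[0]['datapoints'] if value is not None}

    raw, valid = series(RAW), series(VALID)
    for ts in sorted(set(raw) & set(valid)):
        diff_pct = 100.0 * abs(raw[ts] - valid[ts]) / raw[ts] if raw[ts] else 0.0
        print(ts, round(diff_pct, 2))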
[16:09:07] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0] [16:13:01] ok, excuse my acronyms, but WTF icinga [16:13:07] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0] [16:13:14] there's literally NO difference in those rates [16:13:20] oh [16:13:26] maybe you have to yell at it? [16:14:58] Ironholds: is this worthy of the funny page? ^ [16:15:37] milimetric, probably! [16:32:57] Good morning Grace :) [16:33:09] wrong window ... [16:51:28] Analytics, Analytics-Kanban: Foundation-only Geowiki stopped updating - https://phabricator.wikimedia.org/T106229#1535823 (kevinator) [17:08:37] Analytics-Backlog: Analyze webrequest data issue on August 3/4 [?pts] {hawk} - https://phabricator.wikimedia.org/T107893#1535926 (JAllemandou) [17:08:39] Analytics, Analytics-Backlog: Foundation-only Geowiki stopped updating - https://phabricator.wikimedia.org/T106229#1535927 (kevinator) [17:10:08] Analytics-General-or-Unknown, Wikidata: Statistics for Wikidata API usage - https://phabricator.wikimedia.org/T64873#1535954 (thiemowmde) [17:11:11] Analytics-General-or-Unknown, Wikidata: [Story] Statistics for Wikidata API usage - https://phabricator.wikimedia.org/T64873#1535957 (thiemowmde) [17:11:53] Analytics-General-or-Unknown, Wikidata: Stats for Wikidata exports - https://phabricator.wikimedia.org/T64874#1535966 (thiemowmde) [17:12:13] Analytics-General-or-Unknown, Wikidata: [Story] Statistics for Wikidata exports - https://phabricator.wikimedia.org/T64874#1535967 (thiemowmde) [17:15:27] hey madhuvishy, seen toby around? i'm suppose to have a 1:1 with him now [17:15:35] ottomata: I'm still at home [17:15:39] ah ok [17:15:41] let's see [17:16:52] Analytics, Analytics-Backlog: Foundation-only Geowiki stopped updating - https://phabricator.wikimedia.org/T106229#1535991 (HaithamS) Related task https://phabricator.wikimedia.org/T89447 [17:17:36] Analytics-Kanban: Foundation-only Geowiki stopped updating - https://phabricator.wikimedia.org/T106229#1535995 (Milimetric) a:Milimetric [17:20:56] Analytics-Backlog: Analyze webrequest data issue on August 3/4 and 10/11 [?pts] {hawk} - https://phabricator.wikimedia.org/T107893#1536005 (Milimetric) p:Triage>Normal [17:21:45] Analytics-Kanban: Foundation-only Geowiki stopped updating - https://phabricator.wikimedia.org/T106229#1536012 (Milimetric) [17:26:23] mforns: joal, should I merge the EL alert change? [17:26:34] I think you can [17:26:36] milimetric: i got 30 mnis til meeting [17:26:38] ottomata, if you find it correct, yes! [17:26:39] wanna chat? [17:26:49] ottomata: we are in grooming :S [17:26:57] joal: i think you guys know as much about how graphite stuff works as me. i still get confused by those things [17:26:58] oh ok [17:27:05] i'm going to merge, and we can always adjust [17:27:14] Great [17:27:22] As we said, best way is to test it [17:27:27] Thx ottomata --^ [17:27:45] ottomata, cool! thx [17:30:31] Analytics-Backlog: Double check Article Title normalization - https://phabricator.wikimedia.org/T108867#1536062 (Milimetric) p:Triage>Normal [17:31:07] Analytics-Backlog: Investigate sample cube pageview_count vs unsampled log pageview count - https://phabricator.wikimedia.org/T108925#1536068 (Milimetric) a:JAllemandou [17:33:11] (CR) Ottomata: "Great! Thanks for being so thorough! One comment inline." 
(1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/231285 (owner: Joal) [17:35:44] ottomata, after this merge, do you know if anything needs restart? [17:36:17] mforns: no, its just puppet runs, it'll take a while to manifest, if you want me to push the runs I can [17:36:24] it has to run on eventlog1001 and then also on icinga host [17:36:37] ottomata, no, no rush [17:37:03] ottomata, wanted to know if I needed to do something else for this to take effect [17:38:02] let's see if the alerts cease, and then I'll also test that the alert continues working for positive cases, doing a small load test with invalid events [17:52:54] (PS2) Joal: Update dataset check dump script [analytics/refinery] - https://gerrit.wikimedia.org/r/231285 [17:53:54] (CR) Ottomata: [C: 2 V: 2] Update dataset check dump script [analytics/refinery] - https://gerrit.wikimedia.org/r/231285 (owner: Joal) [17:54:04] hey joal, i just checked webrequest_sequence_stats for text [17:54:04] Thanks ottomata :) [17:54:11] seeing ~8 % loss regularly? [17:54:13] is that right? [17:54:25] hm, didn't notice that, no [17:54:32] But didn't check, so ... [17:54:48] i just saw that all the raw in the last day had an X [17:54:51] in the report [17:54:56] Yes [17:55:19] I didn't look into it, supposing it was due to misc addtitions [17:55:23] to bits addition sorry [17:55:26] hm [17:55:34] shoudln't be. [17:55:34] hm [17:55:43] bits has been in text for a while [17:55:56] its just that the old bits hosts no longer serve any traffic [17:56:00] so there is no more webrequest bits source [17:56:06] so, the traffic in text hasn't changed in a while [17:56:08] months maybe [17:56:19] this seems related to kafka changes maybe [17:56:33] but weird, even small violumne hosts have this much loss [17:56:35] that is strange [17:57:08] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: EventLogging Icinga Alerts should look at a longer period of time to prevent false positives {stag} [5 pts] - https://phabricator.wikimedia.org/T108339#1536162 (mforns) @Milimetric Oh, that's interesting. It also seems to me that there's an e... [17:57:27] hm [17:57:34] will dive into this after the researcher meeting [17:58:05] k [18:03:07] hello milimetric. will you join us in DevOps meeting? [18:55:12] milimetric: around? [18:55:20] hey madhuvishy [18:55:22] i pushed changes to https://github.com/madhuvishy/restbase/tree/test_projectview [18:55:29] yes, in batcave looking at another problem with the cluster [18:55:33] cool, lemme see [18:55:41] no problem, take a look when you get a chance [18:56:55] I'm assuming the test db is empty before running tests. I've been running test/utils/cleandb.sh before running them everytime, but wouldn't mind writing a function to drop just this keyspace or sth before these tests run [18:57:05] if that would help [18:59:04] mforns: we should talk about next steps on tick! I assume you are waiting on Sean? [18:59:37] madhuvishy, yes you're right [18:59:46] not waiting for Sean yet [19:00:10] I wanted to know your thought about the tasks I created [19:00:22] do you think they are enough? 
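Going back to the ~8% loss ottomata flags above: it comes out of the per-host sequence-number bookkeeping the load job writes for each webrequest source. A sketch of the kind of Hive query used to spot it, assuming column names along the lines of the refinery's raw sequence-stats table (hostname, sequence_min/sequence_max, count_actual); the exact table, column names and partition values here are assumptions:

    -- Per-host loss for the text source in one hour: sequence numbers are
    -- monotonically increasing per host, so expected = max - min + 1 and anything
    -- short of that was lost somewhere between varnishkafka and camus.
    SELECT
      hostname,
      sequence_max - sequence_min + 1              AS count_expected,
      count_actual,
      100.0 * (sequence_max - sequence_min + 1 - count_actual)
            / (sequence_max - sequence_min + 1)    AS percent_lost
    FROM wmf_raw.webrequest_sequence_stats
    WHERE webrequest_source = 'text'
      AND year = 2015 AND month = 8 AND day = 12 AND hour = 14
    ORDER BY percent_lost DESC
    LIMIT 50;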
[19:00:26] okay let me look at them now, i haven't [19:00:38] ok, ping me [19:00:40] :] [19:01:13] ummm mforns where are they, sorry i dont seem to know [19:01:35] madhuvishy, they are in the backlog, incoming [19:01:42] mforns: okay looking [19:01:43] marked with {tick} [19:01:55] also I wrote an email yesterday to analytics-internal [19:02:20] right, thanks found both :) [19:15:05] madhuvishy: I think the test runner runs that script, or at least that's what the team told me [19:15:12] so as long as cqlsh is on the path, it'll run it [19:17:06] *the team == the services team [19:27:14] oh cool then [19:31:21] mforns: just read all the descriptions. Sounds good to me [19:31:38] and the schema page deletion we can do ourselves, yeah [19:31:46] madhuvishy, ok, I just have 2 concerns [19:31:53] mm hmm? [19:32:20] 1) will it be possible (or OK with the DBAs) to add a new column to all tables that need bucketizing? [19:32:48] 2) how to implement the white-listing, and if this is a good idea? [19:33:22] 1. right, ya i was wondering if that would be on our side too - we have to ask I guess [19:33:48] ya have to ask both to Sean or someone? [19:34:04] yes [19:34:28] I guess he will point out all details that are not OK [19:34:58] yup! [19:35:08] BTW, in the column "action to take" I added "full aut-purge" for all schemas marked with the option #2 (delete data) also. [19:35:23] I thought it would be less work for us, not to worry to delete the data [19:35:27] yeah alright [19:35:32] it will be deleted after 90 days anyway [19:35:34] it will be deleted anyway [19:35:36] right [19:35:37] ok [19:36:01] OK cool! so I'll write to Sean [19:36:06] cool :) [19:36:20] have you scheduled the meeting with chris yet? [19:36:27] add me to it when you do :) [19:37:04] oh! I already scheduled it for tomorrow 11am [19:37:27] works for me :) [19:37:33] sent you the invite [19:37:37] great, thanks! [19:37:46] and in the meantime I'm finishing the reportcard refactorings [19:38:19] I hope we can finish tick in the next week or two...? [19:38:30] Yeah, fingers crossed! [19:39:49] :] [19:50:20] ottomata: still in cave ? [19:50:26] milimetric: --^ [19:50:28] ? [19:50:42] joal: yes [20:12:00] ottomata: I made silly mistake in the balanced consumer patch - https://gerrit.wikimedia.org/r/#/c/231407/ fixes it [20:32:33] Analytics-Backlog: Bug in pageview title extraction: change spaces to underscores after percent_decode (not only plus signs) - https://phabricator.wikimedia.org/T108866#1537076 (kevinator) [20:33:28] Analytics-Backlog, Research-and-Data: Bug in pageview title extraction: change spaces to underscores after percent_decode (not only plus signs) - https://phabricator.wikimedia.org/T108866#1537079 (DarTar) [20:34:20] Analytics-Backlog, Research-and-Data: Double check Article Title normalization - https://phabricator.wikimedia.org/T108867#1537091 (DarTar) [21:02:55] hellooo milimetric. [21:03:31] hi leila [21:03:51] :) thanks for keeping me honest, I've gotta finish up somethign quickly [21:03:51] So, two things I looked at since yesterday. 1) instead of the Dropdown, we may want to use Flag icon. Since both statements the user will make after hitting on flag are negative, flag can work well there. What do you think? [21:04:02] I'll be on this in 5 minutes or so [21:04:08] haha! okay, milimetric. np. just ping when you're done. sure. 
no rush [21:04:09] Flag - good [21:07:04] (PS1) Milimetric: [WIP] Add filters above timeseries graphs in the compare layout [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) [21:08:47] (CR) Milimetric: [WIP] Add filters above timeseries graphs in the compare layout (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) (owner: Milimetric) [21:08:59] bmansurov: that's the patch ^ [21:09:00] have fun! [21:09:11] milimetric, thanks! [21:09:36] bmansurov: I added a comment on the difference between what I showed you and what else I needed to do [21:09:39] just a simple path problem [21:09:52] got it [21:09:54] ok leila, so flag icon sounds good, anything else? [21:10:05] question milimetric: Flag or Trash? [21:10:05] (CR) Bmansurov: [WIP] Add filters above timeseries graphs in the compare layout (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) (owner: Milimetric) [21:10:16] trash/delete [21:10:19] hm... :) [21:10:37] probably Flag but I can see how Delete can also make sense [21:10:48] I can maybe make it trash for the dropdown, and put flag next to not notable and delete next to not interested? [21:11:37] let's go with Flag only for now? [21:11:41] k [21:12:04] re campaign name, milimetric: where will you use it? do you expect it to be passed to the URL? [21:12:32] that's where I was passing it right now [21:12:40] i'll get you the format I'm using so we can validate [21:12:45] the reason I'm asking is that if we want to run tests or collect data from a specific editathon, we want to be able to separate that data from the rest of the data easily, and one way is by providing a different url for each campaign, milimetric. [21:13:00] leila: it's this: [21:13:03] https://www.irccloud.com/pastebin/Vvy7tfg3/ [21:13:29] i'm ok with making it configurable from a json file [21:13:35] but not configurable for each pair of languages [21:13:48] like maybe it could be --test-campaign [21:13:58] and --real-campaign [21:14:01] it doesn't need to be configurable for each language pair. [21:14:06] oh, ok [21:14:15] then, I can load it from the json config [21:14:27] 2 things: 1) do we want it to have some version number? in case we add features or change the tool in the future? [21:15:06] 2) if there is a specific test/campaign, we should be able to label it somehow while letting the normal traffic to go through the fixed URL (that changes only when the version changes) [21:15:48] leila: wouldn't the timing work instead of the version number? [21:16:04] no, since at the same time, other people can also use it, right? [21:16:35] other people could use your campaign? [21:16:39] so suppose on top of the regular usage, we want to see the effect of sharing the tool in an editathon, then we need to be able to say which part of the traffic came from that editathon [21:16:54] if they really wanted to, but that would be the link that we only share with the people in the campaign [21:17:15] uh... that seems hard. Like, you'd have to ask people to choose a campaign in the UI almost [21:17:25] how else would the tool know what kind of use it's under? [21:17:52] by the info that is passed via the URL, milimetric? [21:18:16] (I'm probably not understanding some of the technical complexity, so bear with me, please milimetric) [21:18:19] which URL, to the recommender site? [21:18:27] or to CX? 
[21:18:35] to the recommender site [21:18:59] well, ok, I could look for parameters when people first load the site, but someone could also just go directly to recommend.wmflabs.org [21:19:26] and it's not very clear that people would follow a link with a campaign on it. Some people (like me) just delete those campaigns off URLs anyway [21:19:37] :D [21:19:40] I understand [21:20:06] okay. so let's go with article-recommender-versionNumber? [21:20:23] leila: here's an easier solution [21:20:35] we could change the URL and deploy a different instance for each different type of use case [21:20:44] recommender-editathon.wmflabs.org [21:20:47] or something [21:20:52] if you guys come up with that use case [21:21:06] I see. that's great. and we can do this later, right? [21:21:10] ok, and in the meantime we'll go live with article-recommender-1 [21:21:15] we can do that later, yes [21:21:16] yes, perfect [21:21:21] k, good [21:21:28] nothing else on my end, I semi-promise. :D [21:21:40] lol [21:21:45] k, #working [21:21:52] ottomata: around? [21:22:34] madhuvishy: while ottomata is not around, can you send me the path for where the data you collected yesterday is? :D [21:22:46] leila: :) [21:23:21] it's here - /home/madhuvishy/uniques-report/bot-detection [21:23:31] requests-per-user.hql and requests-per-user.tsv [21:23:46] thanks madhuvishy. [21:24:03] where user is defined as combination of client_ip, ip and user_agent, for the month of july [21:25:06] milimetric: did you get a chance to look at the tests :) if they're fine, what'd be next - finishing up the analytics.yaml part? [21:26:34] madhuvishy: _c3 is the count? [21:26:42] leila: ah yes [21:26:47] thanks madhuvishy [21:28:44] madhuvishy: yes, analytics.yaml is next [21:28:50] I looked at them and they looked good [21:28:57] I haven't had a chance to run them [21:29:06] okay cool, i'll work on that then, thanks :) [21:30:17] madhuvishy: hm, one suggestion for the tests maybe [21:30:21] ya? [21:30:27] instead of fully specifying the insertUrl for each test [21:30:38] leave the last part, the "end" unspecified [21:30:42] and add it in the test [21:30:46] then you can check what you get back has it [21:30:54] and you can make each URL unique for each test [21:31:11] that way, if someone accidentally maps one URL to the wrong function, the tests would find it [21:31:17] madhuvishy: can you make a file that contains only _c3? I have a hard time reading the data into R/python (it seems it's not tab-separated?) [21:31:45] madhuvishy: know what I mean? I can comment on github if you prefer [21:32:19] leila: sure [21:32:24] thanks madhuvishy [21:32:42] milimetric: hmmm, yeah I think so [21:32:57] I mean, I understand [21:33:06] you don't have to agree :) [21:33:10] it's just a thought [21:34:44] milimetric: hmmm. you are saying, to build this part - /en.wikipedia/spider/one/daily/2015070200/100 from inside the test, so people dont go change the wrong url in the global declaration right? [21:35:09] madhuvishy: not the whole part, maybe you can leave /en.wikipedia/spider/one/daily/2015070200/ outside the test [21:35:34] and then inside the test do like partialURL + '2015070301' [21:35:47] and then you can do partialURL + '2015070302' in another test [21:35:53] and partialURL + '2015070303' in another test, etc. [21:36:10] Hmmm, just the number of views, and then check if the returned response has that number? [21:36:11] because I think the db is wiped before each run, not before each test [21:36:30] madhuvishy: yeah, that works! 
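milimetric's suggestion is easier to see in code: keep the shared URL prefix at module level and make the value appended inside each test unique, so a route accidentally wired to the wrong handler cannot pass by returning someone else's row. A rough sketch in the mocha/preq style of the RESTBase test branch madhuvishy linked; the endpoint paths, insert URL and field names are made up for illustration:

    // Rough sketch: stable prefix outside the test, unique suffix inside it.
    var assert = require('assert');
    var preq   = require('preq');

    var base         = 'http://localhost:7231/analytics.wikimedia.org/v1/pageviews/';
    var insertPrefix = base + 'insert-per-project/en.wikipedia/spider/one/daily/2015070200/';
    var readUrl      = base + 'per-project/en.wikipedia/spider/one/daily/2015070200';

    it('inserts and reads back a value unique to this test', function () {
        var views = 101;                               // unique per test
        return preq.post({ uri: insertPrefix + views })
        .then(function () { return preq.get({ uri: readUrl }); })
        .then(function (res) {
            assert.strictEqual(res.body.items.length, 1);
            // echoing the unique value back proves the right handler answered
            assert.strictEqual(res.body.items[0].views, views);
        });
    });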
[21:36:32] yeah, but how does this affect that? [21:36:33] something unique [21:37:01] ok, so say someone mapped getProjectData to getArticleData [21:37:13] and your test is just testing that getProjectData returns items.length === 1 [21:37:19] then it would pass [21:37:21] but it would be wrong [21:37:33] right, ya i thought of that before too. makes sense [21:37:38] cool [21:37:49] it's a bit paranoid but just in case :) [21:38:17] :) [21:49:04] (PS1) Mforns: Permit custom db param in report config [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/231438 (https://phabricator.wikimedia.org/T107502) [22:03:15] milimetric: fyi, joseph and i realized what happened was not what we thought. [22:03:23] oh :( [22:03:25] when camus's offset got borked due to that broker restart we did [22:03:28] leila: btw, pull request submitted [22:03:32] it just started off from the beginning of the kafka log [22:03:34] all the work I can think of to be done is done [22:03:43] so it is reimporting everything already [22:03:52] that's why camus runs are taking so long [22:03:55] not because it is stuck and confused [22:04:02] but because it is doing what we were going to try to make it do [22:04:06] then why is 8.33% missing over time? [22:04:11] bad sequence stats script? [22:04:19] because camus is lagging [22:04:21] on that one partition [22:04:39] and our load job gets fired based on that 2 future hour thing [22:04:52] so load just runs before all of the data has been imported [22:05:04] it will catch up though. [22:05:26] but it means we have to rerun load and refine and other dependent jobs [22:05:31] from about the 11th on. [22:06:45] k [22:06:49] col [22:06:51] *cool [22:07:04] so everything's gonna catch up eventually [22:08:20] milimetric: https://github.com/madhuvishy/restbase/blob/test_projectview/test/features/pageviews/basic.js I made changes [22:09:03] milimetric: yes think so. [22:09:15] ooh and leila I ran the script to make a file of just the counts - it's in counts.tsv in the same folder [22:09:54] (just a unix command, dint rerun query) [22:11:16] ottomata2: hiieee, I want to chat with youuu [22:11:51] hiayaa [22:11:53] wassup? [22:12:01] works for me madhuvishy [22:12:25] ottomata2: can i load test on analytics1004 now for the parallel processor stuff? [22:12:40] no, because we still have auto-create topics off on kafka brokers [22:12:44] right [22:12:51] and, since we have to move partitions around next week [22:12:57] i'd rather have fewer of them to move around :) [22:13:02] i tried running it, and it was pulling in no data, and thought may be its all empty? [22:13:14] ya, no data in kafka [22:13:17] no eventloggin gdat [22:13:18] data [22:13:20] ya alright [22:13:21] yes yes [22:13:31] okay that can wait then [22:13:35] ottomata2: oh also [22:13:43] i looked into moving the producer to pykafka [22:13:59] oh! yes? [22:13:59] and it looks like it doesn't implement Async or Keyed Producers [22:14:06] oh hm. [22:14:12] hmm [22:14:12] which is what we're using [22:14:24] so i'm not sure we should move it [22:14:27] hm [22:14:32] we probably should use sync. [22:14:37] oh? [22:15:02] https://github.com/Parsely/pykafka/blob/master/pykafka/producer.py [22:15:53] and we use the simpel producer [22:15:57] by default [22:16:06] keyed might be nice to ahve eventually though [22:16:23] milimetric: are you done from your point of view? should I ask Ellery to check it out? [22:16:27] thanks madhuvishy [22:16:35] ottomata2: hmmm, right [22:17:10] madhuvishy: i thin it is keyed though. 
[22:18:11] trying to find what messages are [22:18:19] leila: yes, done, I've sent ellery the pull request [22:18:26] we can talk tomorrow, i've gotta run [22:18:28] ah round it [22:18:30] found* [22:18:48] (PS1) Mforns: Timebox reports and move to reportupdater [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/231446 (https://phabricator.wikimedia.org/T107502) [22:18:52] class Message in common? [22:18:54] uh madhuvishy https://github.com/Parsely/pykafka/blob/e708298e2e16f2dda51e9421045a30b5edbdb33d/pykafka/common.py#L27 [22:18:55] ? [22:18:55] yeah [22:18:59] although, no idea how that works [22:19:02] :ivar? [22:19:27] he he me neither [22:19:30] hmm [22:19:31] producer.produce(['test message ' + str(i ** 2) for i in range(4)]) [22:19:44] :type messages: Iterable of str or (str, str) tuples [22:19:45] ah [22:19:54] ok [22:19:59] so it works as just array of messages [22:20:04] or array of (key, msg) tuples [22:20:13] and, by default the partitioner is random [22:20:21] so it does exactly what we want! [22:20:45] ottomata2: https://github.com/Parsely/pykafka/blob/master/pykafka/protocol.py [22:21:12] (CR) Mforns: [C: -1] "Before deploying, the report files in stat1003 must be transformed from csv to tsv." [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/231446 (https://phabricator.wikimedia.org/T107502) (owner: Mforns) [22:21:30] hm i think that is what is returned by consume [22:21:43] hm [22:21:43] :class:`pykafka.protocol.Message` is used by the producer. [22:22:18] ah yes [22:22:27] madhuvishy: Producer._produce creates instances of that class [22:22:34] yeah [22:22:37] from the iterable of (key,msg) that gets passed to produce() [22:22:58] cool, anyway, yeah, Producer from pykafka looks good [22:23:22] you can do key the same way we currently do [22:23:29] okay then [22:23:33] and just remove the 'producer' param for now [22:23:38] yup [22:23:40] if we want to use a custom partitioner in the future we can implement it [22:23:44] we really just want random partitioning for now [22:23:51] alright :) i'll make a task for it [22:25:05] ok, madhuvishy, am heading out for the day [22:25:15] ottomata2: okay :) good night! [22:25:43] laterrrss [22:29:33] Quarry: Every second attempt to use Quarry to do an SQL query fails - https://phabricator.wikimedia.org/T109014#1537736 (Iislucas) NEW [22:30:54] Quarry: Some long queries give no results - https://phabricator.wikimedia.org/T109016#1537760 (Iislucas) NEW [23:04:10] Analytics-Kanban, Patch-For-Review: Check and potentially timebox limn-flow-data reports {tick} [5 pts] - https://phabricator.wikimedia.org/T107502#1537845 (mforns) The refactor seems to work, however there are 2 things to mention: 1. I had to make a small change in reportupdater, and this must be deployed... [23:13:44] good night team! see ya [23:16:12] Analytics-Backlog: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1537892 (Tgr) Are tables dropped automatically now when the schema is set to inactive? Some kind of warning would have been nice :(
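For reference, the switch ottomata and madhuvishy settle on above — pykafka's Producer with the default random partitioner, passing (key, message) tuples the way the current kafka-python code does — would look roughly like this. It is a sketch against the 2015-era pykafka API quoted in the log (produce() taking an iterable); the broker host, topic name and payload are illustrative, not the actual eventlogging writer:

    # Rough sketch: produce (key, message) tuples with pykafka's default
    # (random) partitioner, as discussed above.
    import json
    from pykafka import KafkaClient

    client = KafkaClient(hosts='kafka1012.eqiad.wmnet:9092')  # illustrative broker
    topic = client.topics['eventlogging-valid-mixed']         # illustrative topic

    # no custom partitioner passed, so partitions are picked at random
    producer = topic.get_producer()

    events = [{'schema': 'TestSchema', 'revision': 1, 'event': {'foo': 'bar'}}]
    # produce() here takes an iterable of str or (key, message) tuples,
    # per the docstring quoted in the log above
    producer.produce([(e['schema'], json.dumps(e)) for e in events])

A custom partitioner can be slotted in later, which matches the "implement it in the future if we want it" plan above.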