[00:09:13] (PS3) Neil P. Quinn-WMF: Update SQL scripts to reflect Edit schema change [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/236237 (https://phabricator.wikimedia.org/T111557) [00:11:29] (CR) Neil P. Quinn-WMF: "Okay, I think I've taken care of this. Look at patch set 3." (4 comments) [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/236237 (https://phabricator.wikimedia.org/T111557) (owner: Neil P. Quinn-WMF) [00:13:38] Analytics-EventLogging, MediaWiki-API, Patch-For-Review: Mediawiki API is returning empty strings for 'required' boolean fields - https://phabricator.wikimedia.org/T97487#1618828 (bd808) Open>Resolved >>! In T97487#1594989, @Milimetric wrote: > The solution in https://phabricator.wikimedia.org/T... [00:32:18] Analytics-Kanban: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1618861 (mforns) Here is the discussed white-list. {F2559632} **This white-list specifies which data must be kept indefinitely, the rest of the data must be auto-purged after 90 days.** It is a TSV file wi... [00:37:11] Analytics-Kanban: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1618887 (mforns) @jcrespo This is the list we talked about. Just to make clear that we can not start auto-purging before T108856 is set up, because we'd loose the history of the column editCount, which is n... [00:37:29] Analytics-Kanban: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1532299 (mforns) [00:37:30] Analytics-Kanban: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1618891 (mforns) [00:42:32] Analytics-Kanban: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1618897 (mforns) a:mforns>jcrespo [00:43:32] Analytics-Kanban: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1532299 (mforns) @jcrespo I assigned the task to you, as we spoke. Please, let me know if I can help you in any way. Thanks! [00:44:24] Analytics-Kanban: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1618903 (mforns) a:mforns>jcrespo [00:45:23] Analytics-Kanban: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1532313 (mforns) @jcrespo Hey, I also assigned this task to you, as we combined. Thanks! [01:17:14] madhuvishy: that google runs js doesn't mean googlebot accepts cookies, those are two different things [01:17:32] madhuvishy: sorry i was not here earlier [01:18:57] madhuvishy: did you find the answer to the EL udp server side issue? [01:31:57] Analytics, Engineering-Community, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1618989 (Qgil) Having metrics about the use of our web APIs is still a good goal per se. The methods described by Dario ha... [02:01:58] (PS2) Nuria: [WIP] Make pageview definition aware of preview parameter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/236800 [10:31:02] (CR) Milimetric: [C: 2 V: 2] Update SQL scripts to reflect Edit schema change [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/236237 (https://phabricator.wikimedia.org/T111557) (owner: Neil P. 
Quinn-WMF) [10:33:50] marktraceur: looks like http://datasets.wikimedia.org/limn-public-data/metrics/multimedia-health/uploads/ is not rsync-ed yet, so ping me when you're around and we can troubleshoot [10:37:26] (CR) Milimetric: "That might be my fault, did you try "npm install -g karma-cli" ? Also, "npm install" for the other dependencies. I had previously said "" [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) (owner: Milimetric) [10:45:10] Analytics-Kanban: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1620169 (jcrespo) p:High>Triage [10:46:27] Analytics-Kanban: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1620180 (jcrespo) p:High>Triage [10:47:52] Analytics-Kanban: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1620196 (jcrespo) p:High>Triage [11:18:27] joal: I've added changes based on Alex's guidance: https://gerrit.wikimedia.org/r/#/c/231574/3..4/hieradata/role/common/analytics/cassandra.yaml [11:18:36] mind taking a look? [11:18:47] I said I'd get your blessing before we bothered him to look at it [11:50:35] hey milimetric [11:51:06] I'm gonna look at that, but maybe I'll need some help ! [11:52:28] joal: batcave? [11:52:33] sure ! [11:52:47] omw [12:13:30] (PS6) Bmansurov: Add filters above timeseries graphs in the compare layout [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) (owner: Milimetric) [12:14:13] (CR) Bmansurov: "Yes, I followed the readme and everything worked out fine." [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) (owner: Milimetric) [12:14:56] milimetric: I think roughly I put it in the wrong place [12:15:11] milimetric: Looks like I don't have the permissions to put stuff in limn-public-data [12:15:40] (PS7) Bmansurov: Add filters above timeseries graphs in the compare layout [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) (owner: Milimetric) [12:29:26] marktraceur: sux, do you have sudo -su stats access? [12:58:02] ottomata: Morniiing :) [12:58:20] morning! [12:58:37] Quick question: an1015-1016-1017 are already available, or do you need to decommision them from hadoop ? [12:58:47] an15 is avail, i need to decom the others [12:58:51] when do you need them? [12:59:08] Not known yet, but wanted to be sure for us not break anything :) [12:59:36] k [12:59:49] milimetric has patched the puppet code base on alex comments, so hoppefully it could be reasonnably fast [13:09:39] ok, it might be good to start decoming them soon [13:09:42] i will try to start that today [13:09:47] best to do one at a time i thikn [13:09:50] but we can do more if we need to [13:14:41] no need to rush ottomata, we were mostly wondering if those specific servers will be available for this [13:14:46] and if you wanted to rename them [13:17:13] milimetric: they will need to be reinstalled, so ja we can rename [13:17:16] and probably should [13:17:22] what are the restbase casses named? [13:17:39] milimetric: it takes a day or two for a hadoop node to be properly decommed [13:17:42] so it might be good to start now [13:21:48] milimetric: Not sure, I don't think so [13:24:49] marktraceur: what server are you doing this on? stat1002 or 1003? [13:25:15] ottomata: restbase casses? 
[13:25:44] cassandra servers [13:33:09] milimetric: 1003 [13:35:29] ottomata: restbase100[1-9] https://github.com/wikimedia/operations-puppet/blob/c89185d3f713a262906be5a60c6be091d318db10/hieradata/role/common/restbase.yaml [13:35:42] oh right its colocated [13:36:02] marktraceur: try ssh-ing into there and do "sudo -su stats" [13:36:13] if it asks you for a password, don't try to type it in or someone will yell at you :) [13:36:17] that just means you don't have access [13:36:21] hm, should we rename them using "api" term as suggested by alex ? [13:36:28] brb [13:36:41] something like analytics-api[1-3] ? [13:36:50] milimetric: i doubt he has sudo to stats access [13:37:21] ottomata, milimetric ---^ [13:37:22] ? [13:38:55] milimetric: Oh, oops, I tried to type it in. [13:39:00] * marktraceur braces himself [13:39:14] I guess it's ottomata who will yell. [13:39:25] Sorry ottomata [13:39:30] haha i aint a yeller [13:39:52] marktraceur: https://xkcd.com/838/ [13:46:06] Analytics-Kanban: Change the agent_type UDF to have three possible outputs: spider, bot, user {hawk} [13 pts] - https://phabricator.wikimedia.org/T108598#1620628 (JAllemandou) Thanks to @milimetric I corrected a typo in my usage of Bob's regexp --> It covers more than in the previous report. The numbers in th... [13:46:37] marktraceur: ah, so you have two choices, you either ask for access from Ops-Access-Requests, you need the group statistics-users [13:46:41] (https://wikitech.wikimedia.org/wiki/Analytics/Data_access) [13:46:41] milimetric: --^ with new numbers [13:47:09] thx joal, remind me if I haven't looked until after standup, I'm getting piled on :) [13:47:22] np milimetric, good luck depiling :) [13:47:40] ottomata: have you see my comment on names? [13:48:13] joal / ottomata: on general naming conventions, alex was saying we shouldn't have anything related to our team or software that's used currently, but I think analytics-api works (analytics as in purpose not team) [13:48:18] yes, not a fan of hyphens if we can avoid them but maybe! [13:49:30] hm, ottomata, can't think of no hyphen case here: cassandra is too braos name, restbase is already used an too broad as well ;;; [13:49:56] But analytics-restbase works as well [13:49:56] yeah, but node names are one case where i'm ok with abbreviating and concatenating :) [13:50:04] Right :) [13:50:12] anapi ? [13:51:02] apilytics ? [13:51:28] Trying to avoid the sooooo many puns you can do playing that game [13:52:21] (CR) Mforns: [C: 2 V: 2] "LGTM!" [analytics/wikihadoop] - https://gerrit.wikimedia.org/r/233937 (https://phabricator.wikimedia.org/T108684) (owner: Joal) [13:52:55] Thanks mforns ! [13:53:13] hey, np! [13:53:57] (CR) OliverKeyes: [C: 2 V: 2] [WIP] Make pageview definition aware of preview parameter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/236800 (owner: Nuria) [13:59:02] (CR) Nuria: "ay ay , note that change was still WIP, it was not tested on the cluster yet" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/236800 (owner: Nuria) [13:59:39] Ironholds: ay....the changes for pageviews had not been tested on cluster yet, that is why they said WIP [13:59:47] ..whoops [13:59:52] how does one un-do a merge? ;p [13:59:56] Ironholds: so they were not ready to merge [14:00:19] well, this is awkward. [14:00:19] Ironholds: let me test, cause they are not deployed yet [14:00:24] *thumbs up* [14:01:41] milimetric: ...or? 
[14:02:34] marktraceur: lol, omg, sorry, or you can use the report-updater script runner I was telling you about [14:02:47] for that you have to convert your SQL to timeboxed, but it shouldn't be too bad [14:02:59] * marktraceur isn't so sure but is willing to try [14:03:00] usually it just means adding a date range and taking in the parameters that it fills in [14:03:06] Besides it should be more better in general [14:03:12] marktraceur: which way? access request or sql? [14:03:22] SQL sounds better [14:03:35] As much as I like getting access to random things [14:03:35] * Ironholds sighs [14:03:40] why did we ever rewrite this project [14:04:58] marktraceur: here's a simple example then, of a script that runs with reportupdater: https://gerrit.wikimedia.org/r/#/c/227911/3/mobile/mobile-options.sql [14:05:32] marktraceur: so the easiest way is for me to make the repo and all the stuff you need [14:05:40] wait, do you already have a "limn-multimedia-data" repo? [14:05:58] (the limn-*-data is load bearing - it's a convention we've hard-coded into puppet and makes everything way easier) [14:06:50] milimetric: We have some limn things happening, but I don't know if it has anything in it [14:07:01] I mean if there's a limn-multimedia-data thing [14:07:17] milimetric: I think we have a lot of scripts to pass data around on stat1003 and not much else [14:07:25] We kinda wrote our own scripts for it, I think. [14:07:47] yep, i remember that. [14:07:52] ok, lemme look around for a sec [14:09:06] ok marktraceur looks like there's no repo specifically named "limn-multimedia-data". So I'll make it, and add basic structure, you put SQL into the multimedia/ folder, I'll point you to some examples, and we can go from there [14:10:08] marktraceur: I see, there's a limn-multimedia-data on github. That should be in gerrit, so I've made it in gerrit and we can migrate as you want [14:15:03] Cool, thanks. [14:21:23] (PS1) Milimetric: Add basic structure [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/237098 [14:21:52] marktraceur: https://gerrit.wikimedia.org/r/#/c/237098/ congratulations, it's a baby patch! [14:22:03] I'll comment with some examples there [14:22:18] AOK. [14:22:48] (CR) Milimetric: "Some other repos that use reportupdater for reference:" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/237098 (owner: Milimetric) [14:25:41] (PS8) Milimetric: Add filters above timeseries graphs in the compare layout [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) [14:26:05] (PS9) Milimetric: Add filters above timeseries graphs in the compare layout [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) [14:26:31] (CR) Milimetric: [C: 2 V: 2] "Baha, this was awesome, thanks very much. I just updated the style of the checkboxes a little from what I originally put in there." [analytics/dashiki] - https://gerrit.wikimedia.org/r/231424 (https://phabricator.wikimedia.org/T104261) (owner: Milimetric) [14:26:37] Ironholds: http://spark.apache.org/releases/spark-release-1-5-0.html [14:28:41] joal, yay, the big day is finally here! 
[14:30:46] (PS1) Milimetric: Fix optimizer config error for compare [analytics/dashiki] - https://gerrit.wikimedia.org/r/237102 [14:31:04] (CR) Milimetric: [C: 2 V: 2] Fix optimizer config error for compare [analytics/dashiki] - https://gerrit.wikimedia.org/r/237102 (owner: Milimetric) [14:32:56] bmansurov: https://edit-analysis.wmflabs.org/compare/ [14:33:02] (I deployed your changes) [14:33:07] let Neil know [15:03:38] ottomata, yt? [15:03:56] Analytics-Backlog: Give /aggregate-datasets/ on stat1002 open permissions - https://phabricator.wikimedia.org/T111956#1620813 (Ironholds) NEW [15:04:37] milimetric, looks great! [15:30:46] Analytics-Kanban: enforce group-writeable in stat1002:/a/aggregate-datasets/ and stat1003:/a/public-datasets/ - https://phabricator.wikimedia.org/T111956#1620927 (Ironholds) a:Ottomata [15:33:40] (PS1) Milimetric: Version the bundles same as the main scripts [analytics/dashiki] - https://gerrit.wikimedia.org/r/237108 [15:34:04] nuria: I added you to that ^ [15:34:23] k, will look after standup [15:34:45] Analytics-Backlog: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1620957 (ggellerman) [15:35:10] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Make EventLogging monitoring and alerts based on Kafka metrics {stag} [8 pts] - https://phabricator.wikimedia.org/T106254#1620958 (Ottomata) [15:35:41] Analytics-Backlog: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1620963 (ggellerman) [15:36:27] Analytics-Backlog: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1620968 (ggellerman) [15:49:34] halfak: BTW, re. https://meta.wikimedia.org/w/index.php?title=Meta:Requests_for_comment/Enable_flow_in_the_Research_talk_(203)_namespace&diff=13537632&oldid=13535483 note that Flow isn't on TWN – they're probably talking about LQT: https://translatewiki.net/wiki/Special:Version [15:50:09] mforns: https://plus.google.com/hangouts/_/wikimedia.org/am [15:57:22] Interesting. Thanks James_F [15:57:47] halfak: Getting TWN to use Flow and convert their legacy LQT installation is indeed a longer-term objective. :-) [15:57:53] James_F, would you like to note that in the discussion or should I? [15:58:17] halfak: I'm intentionally staying out of the conversation so far. [15:58:26] (Sorry.) [15:58:28] kk. Will post then. Thanks for the info. [15:58:43] No not at all. Your best judgment and all that :) [16:01:41] Analytics-Kanban: Change the agent_type UDF to have three possible outputs: spider, bot, user {hawk} [13 pts] - https://phabricator.wikimedia.org/T108598#1621095 (Milimetric) As for the WikiBot convention. Joseph was just saying that MediawikiBot is used by Bing in the user agent. So we have to be more care... [16:09:47] (CR) MarkTraceur: [C: 2] "Merging so I have something to build on" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/237098 (owner: Milimetric) [16:11:20] (CR) Nuria: [C: 2 V: 2] "Tested locally, looks great." [analytics/dashiki] - https://gerrit.wikimedia.org/r/237108 (owner: Milimetric) [16:12:35] milimetric: I guess Jenkins won't merge for me. :( [16:12:42] (CR) MarkTraceur: [V: 2] "RIGHT." 
[analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/237098 (owner: Milimetric) [16:13:21] Analytics-Backlog, Analytics-Dashiki, Editing-Analysis, VisualEditor, Patch-For-Review: Improve the edit analysis dashboard {lion} - https://phabricator.wikimedia.org/T104261#1621141 (bmansurov) [16:14:36] milimetric: Does reportupdater do different time ranges? I want per-month, not per-day [16:16:28] marktraceur: yes, monthly is fine, I just saw daily in some of your files yesterday [16:17:38] Some of the old ones maybe [16:17:46] But those are event tracking, so it makes sense [16:17:55] Uploads and unique uploaders per day is less useful IMO [16:20:45] Analytics-Kanban: Change the agent_type UDF to have three possible outputs: spider, bot, user {hawk} [13 pts] - https://phabricator.wikimedia.org/T108598#1621169 (JAllemandou) No bot match the '.*WikimediaBot.*' for the hour of analysis while the list of bots containing WikiBot is not: DotNetWikiBot/2.101 (M... [16:24:34] ottomata: wanna continue the deployment? [16:24:51] Analytics, Engineering-Community, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1621206 (SVentura) @Qgil there is no disagreement here. "I'd rather focus on obtaining useful metrics of our web APIs" Co... [16:26:58] madhuvishy: ya first i gotta merge that metrics chnage, just made some lucnh [16:27:33] ottomata: okay cool, ping me. [16:28:38] (PS1) MarkTraceur: Flesh out SQL, tweak configuration [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/237134 [16:28:42] milimetric: ^^ :) [16:29:49] marktraceur: I'll check it out after lunch [16:29:55] Excellent plan. [16:30:02] Both lunch, and looking at the patch. [16:32:35] joal: if (at your convenience) you could look at the patch that ironholds merged just in case something might jump to your eye... i need a few minutes to troubleshoot my internet connection and will be testing on cluster shortly [16:35:33] joal: https://gerrit.wikimedia.org/r/#/c/236800/ [16:40:48] milimetric: If I wanted to add a metric to that collection that I didn't think I could use sql for, would I need to bend over backwards to do it now? [16:41:07] nuria: after interview :) [16:41:16] joal: whenever, of course. [16:50:44] Analytics, Engineering-Community, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1621294 (Qgil) [16:54:00] Analytics, Engineering-Community, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1621297 (Qgil) I have added a "Metrics requested" section in the description and I have added the metric that the upcoming... [17:01:43] Analytics, Engineering-Community, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1621334 (SVentura) @qgil, you're right, different groups will have different KPI lenses - we're meeting this afternoon to... [17:03:04] joal, do you want to debrief? [17:03:10] hehe mforns [17:03:15] was writing the same :) [17:03:18] xD [17:03:20] batcave [17:03:24] omw [17:04:59] marktraceur: we were going to add arbitrary script running for another project, but we don't have that yet. You can talk to mforns, he said he was going to do it in his volunteer time. 
I'd be obliged to you if you gave him a reason to do it during working hours :) [17:07:49] Got it. [17:08:08] milimetric: I ask because I have a metric, "illustrated pages", which I'm probably going to need to use bloody mwgrep for [17:08:19] Or something. [17:08:22] Maybe I could use imagelinks [17:11:33] ottomata: heya, can you come to batcave for a minute ? [17:11:44] we're with mforns discussing Julio's interview [17:11:51] (CR) Milimetric: Flesh out SQL, tweak configuration (5 comments) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/237134 (owner: MarkTraceur) [17:12:21] milimetric, marktraceur, I've already started this, will take one week or two I thjink [17:12:39] joal in meeting with d'ana [17:12:40] and kevin [17:12:52] ok ottomata, let's discuss later [17:12:55] thx :) [17:14:04] I'm actually starting to think I could use imagelinks. [17:14:12] But I also approve of your efforts, mforns [17:14:21] Analytics-EventLogging, Patch-For-Review: Kafka Client for MediaWiki - https://phabricator.wikimedia.org/T106256#1621406 (csteipp) [17:15:26] marktraceur, cool, let me know if you decide to go with scripts, thanks! [17:15:45] Analytics-Tech-community-metrics, ECT-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1621412 (Qgil) This is one case where, for once, we are happy with a simple report and we don't need changes in the dashboard... [17:22:46] Analytics-Tech-community-metrics, ECT-September-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1621439 (Qgil) As I see it, our interest in Gerrit data is mostly about the present: how is the queue doing? are we progressing? The interest in Git data... [17:56:41] Analytics-Kanban, Patch-For-Review: Write scripts to track cycle time of tasked tickets and velocity [8 pts] - https://phabricator.wikimedia.org/T108209#1621599 (kevinator) Open>Resolved [18:06:31] Analytics-Kanban, Patch-For-Review: Write scripts to track cycle time of tasked tickets and velocity [8 pts] - https://phabricator.wikimedia.org/T108209#1621639 (ksmith) This could be really useful to other teams, so please share the results publicly when you are ready. [18:27:29] Question for the group...if you wanted historical data on whether a page had an image (or rather, how many pages on a wiki had images), how would you do that? [18:27:50] If I wanted to start tracking that now, I can use page_image in page_props, but historical data isn't a thing in page_props [18:28:09] I could run through every revision, parse it, parse the old revision, and count image transclusions, but ew. [18:29:13] marktraceur: I don't have a better idea :( [18:32:15] I'm not sure there is one...I might be stuck choosing between a long-running script and not having historical data [18:34:09] marktraceur: Ask halfak, he might have a data set that would better to work with (diffs instead of text) [18:35:11] marktraceur, I can show you how to write a 20 line python script that will generate your answer overnight on stat3. [18:35:36] halfak: how awesome :) [18:38:35] marktraceur, how OK is it to only consider [[File|Image:...]]? [18:39:10] halfak: Roughly OK, but there's a good number of pages that only have an image in the infobox, and that's OK [18:39:16] Like, that should count [18:39:30] OK. We might need to be clever for that, but not that clever. [18:39:44] halfak: Overnight for all of the numbers historically? 
[18:39:50] Yeah [18:39:56] * halfak flexes muscles [18:40:13] Damn :) [18:41:01] I guess I should write up a sql query for daily stats on page_props, then. [18:41:38] * halfak works on gist [18:41:57] mforns_gym: madhuvishy, deployed eventlogging alert change, looks good [18:43:52] Actually, I guess just running the script on revisions starting from the last one would be fine... [18:44:02] halfak: Thanks for the help :) [18:44:09] Sure. No problem. :) [18:44:22] halfak: Will the data get put in a database on stat3, then? [18:44:32] Or just output as TSV? [18:44:41] You'll be running the script. You can output however you like. [18:44:45] Oh, rockin' [18:44:49] I suggest TSV and load that into MySQL if necessary [18:44:55] Here's where I try to confirm that I have access to stat3 [18:45:38] Weren't you just asking about the MySQL password? [18:45:48] For stat1003 [18:46:01] Sorry, stat3 == stat1003 [18:46:04] Ah. [18:46:07] I'm just careless with my typing :) [18:46:24] I type the damn name so often! [18:46:57] What chars are valid in a filename? [18:47:00] Well, in that case, I might like to have write access to mysql so I can dump changes (i.e. "new image" or "removed image" on any revision would generate a row) [18:47:14] halfak: Hmm, should be anything but slash, colon, and # [18:47:24] Maybe a few others are illegal [18:47:32] Bar, I suppose [18:47:34] Maybe {{}} [18:47:36] Dunno if [] and {} are illegal [18:47:41] I guess they'd have to be [18:47:44] That'd be weird [18:48:02] Dunno if you've noticed, mediawiki is pretty weird [18:49:07] :D [18:49:56] mark, do you want a count of images? [18:50:04] Do you want to know when new images are added? [18:50:22] I'm thinking that changes per-revision is the best way to do it [18:50:40] So, a count per revision or only output those revisions that have a change in the set of images? [18:51:24] Like, revision 129 | +2 | 20040918101010 || revision 212 | -3 | 20040921123821 [18:51:56] At least...hm [18:51:57] Sure. [18:52:09] Do you care if an image was replaced? [18:52:16] Now I have to think about this, actually. I ultimately need a count, so changes could be aggregated but maybe slowly [18:52:24] That would be a nil change, I think [18:52:31] So no change for my purposes [18:52:44] Sure. We can always take multiple passes too if you want something else. [18:53:03] I guess I can do sum(img_delta) where timestamp < x and timestamp > y [18:54:31] A few more than 20 lines. You're going to need to debug my regexes. [18:54:56] I'm cool with that. [18:59:13] 58 lines https://gist.github.com/halfak/c7a6bb267fcefb3aa14c [18:59:34] But it comes complete with docopt and good script structure. [19:00:30] See also http://pythonhosted.org/mwxml/ [19:00:34] marktraceur, ^ [19:00:46] (Python 3 is required) [19:01:01] If you haven't done the whole virtualenv dance, I can show you how to do that too. [19:02:06] I already see a couple of typos. [19:02:22] I hope they're as obvious to you :) [19:05:00] Thanks halfak [19:06:23] 2 spaces!?!?!? [19:06:25] :P [19:06:50] marktraceur, it was the default in the gist editor :( [19:06:58] I usually go for 4 spaces [19:07:13] Yes, yes, no problem [19:15:57] (CR) Joal: "Didn't reviewed the tests, quite some comments to discuss." (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/236800 (owner: Nuria) [19:16:08] halfak: Looks mostly fine to me... [19:16:19] One issue in the regexes, and I added some extensions [19:16:32] Hey nuria, reviewed the commited code --^ [19:16:34] Yeah. I kinda rushed those. 
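(Editor's note: halfak's 58-line gist is linked but not reproduced in this log. For orientation, here is a rough, untested sketch of the kind of script being described: walk the XML dump with mwxml, regex-match [[File:...]] and [[Image:...]] links in each revision, and print a TSV row whenever a page's image count changes. The names extract_image_links and image_link_deltas.py come from the conversation; the regex, the bz2 handling, and the sequential file loop are this note's assumptions, not the gist's code.)

    """Rough sketch only, not halfak's gist. Walks XML dump files with mwxml,
    regex-matches [[File:...]] / [[Image:...]] links per revision, and prints a
    TSV row whenever a page's image count changes."""
    import bz2
    import re
    import sys

    import mwxml  # pip install mwxml (Python 3), as linked above


    # Deliberately naive, as discussed: misses images added via templates/infoboxes.
    IMAGE_LINK_RE = re.compile(r"\[\[\s*(?:File|Image)\s*:\s*([^|\]]+)", re.IGNORECASE)


    def extract_image_links(text):
        """Return the set of image names linked from a chunk of wikitext."""
        if not text:  # revision text can be None (e.g. deleted/suppressed)
            return set()
        return {name.strip() for name in IMAGE_LINK_RE.findall(text)}


    def process_dump(dump):
        """Yield (page_id, rev_id, timestamp, delta) whenever the image count changes."""
        for page in dump:
            previous = set()
            for revision in page:
                current = extract_image_links(revision.text)
                delta = len(current) - len(previous)
                if delta != 0:
                    yield page.id, revision.id, revision.timestamp, delta
                previous = current


    def main():
        # Sequential for simplicity; the real script parallelises across dump files,
        # which is why it can use every CPU on stat1003.
        for path in sys.argv[1:]:
            opener = bz2.open if path.endswith(".bz2") else open
            with opener(path, "rb") as f:
                dump = mwxml.Dump.from_file(f)
                for page_id, rev_id, timestamp, delta in process_dump(dump):
                    print("{0}\t{1}\t{2}\t{3}".format(page_id, rev_id, timestamp, delta))


    if __name__ == "__main__":
        main()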
[19:16:40] Glad it seems reasonable to you. [19:16:57] Let's discuss the comments tomorrow (end of day for me) [19:17:09] BTW, you might also consider using mwparserfromhell to parse links and template params *BUT* that's substantially slower than the regex scan. [19:17:17] halfak: Does stat1003 have the python libraries or do I need to do "the whole virtualenv dance" as well? [19:17:29] You need to do the virtualenv dance. [19:17:39] But this is happy. It will give you flexibility. [19:17:42] (I'm going to assume that false positives are relatively rare and risk it, not doing the parser) [19:17:43] And it's easy with a little help. [19:17:51] * marktraceur is prepared. [19:18:00] marktraceur, I'd suggest writing up some test cases for those regexes. [19:18:38] Fair enough [19:18:43] https://gist.github.com/halfak/9f4830895496af9e9731 [19:18:46] ^ How to virtualenv [19:19:00] Those commands will set you up like I manage my virtualenvs [19:19:15] commands should all work as expected on stat1003 [19:22:05] Coolio. [19:25:56] halfak: I believe "pip install mwxml" is not a thing I should/can do on the cluster [19:26:16] Oh! You have to set up a proxy for http [19:26:19] * halfak gets the docs [19:26:37] https://wikitech.wikimedia.org/wiki/Http_proxy [19:26:56] Note the https proxy points to an http URL and that is OK [19:27:00] (apparently) [19:27:14] Geez. [19:27:50] Hooray. [19:27:56] halfak: didn't know about python virtual envs ! [19:28:03] halfak: thx for teaching :) [19:28:09] :D! [19:28:52] halfak: How would you suggest testing the script? I can come up with examples, just need to know the format I guess [19:29:01] Maybe just passing straight text in... [19:29:51] I'd write a separate script that imports extract_image_links and runs a few chunks of text against it and compares the output. [19:30:01] a-team, I'm off for today ! [19:30:09] o/ joal [19:30:19] nuria: ping me tomorrow whenever you want to dicuss the CR [19:30:34] milimetric: I'll try to contact alex tomorrow about machine choice and puppet [19:31:09] See you tomorrow [19:31:30] ok joal, I'll try to address marco's comments by then [19:31:38] have a nice night [19:31:53] oh, I have seen them :( [19:31:59] I'll have a look tomorrow [19:32:18] halfak: OK, it looks like it's working fine to me...final piece of the puzzle, where is the dump? [19:35:56] marktraceur, lates enwiki is stat1003:/mnt/data/xmldatadumps/public/enwiki/20150901/ [19:36:10] Fun times [19:36:29] BTW, when you kick that script off, it's going to use all of the CPUs on stat1003 [19:36:33] So you should NICE it. [19:36:53] NICE? [19:37:57] "man nice" [19:38:12] TL;DR: it lowers the priority of your processes so that other processes can take priority. [19:38:26] This allows you to use all of the computation resources without degrading performance for others. [19:38:31] Neat. [19:38:33] Er, nice. [19:38:43] Basically "nice " [19:38:51] And it all just works. [19:38:53] I'll probably do something as generous as nice -n 19 because I don't have any particular timeframe [19:39:10] So "nice python image_link_deltas.py /mnt/data/..." [19:39:14] No! [19:39:22] Lower numbers are higher priority [19:39:37] Right, so 19 is lowest priority [19:39:43] Wait... hmmm [19:40:02] Oh! It adds that onto the default priority. [19:40:15] Default is 20, so -20 would make the priority 0 [19:40:24] I dunno why they don't just have you set the priority directly. [19:40:50] Anyway, -n19 would be appreciated. [19:40:57] Awesome. 
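(Editor's note: a sketch of the separate test script halfak suggests above, which imports extract_image_links and runs a few hand-written wikitext snippets through it. The module name and the expected sets below match the naive regex in the sketch earlier in this note, not necessarily the real gist's behaviour.)

    # Hypothetical module/function names, taken from the sketch above rather than the gist.
    from image_link_deltas import extract_image_links

    CASES = [
        ("[[File:Example.jpg|thumb|caption]]", {"Example.jpg"}),
        ("[[Image:Old style.png]] and [[File:New style.svg]]",
         {"Old style.png", "New style.svg"}),
        ("No images here, just [[a wikilink]].", set()),
        ("[[:File:Linked, not transcluded.jpg]]", set()),  # arguable edge case
    ]


    def main():
        failures = 0
        for text, expected in CASES:
            actual = extract_image_links(text)
            if actual != expected:
                failures += 1
                print("FAIL {0!r}: expected {1}, got {2}".format(text, expected, actual))
        print("{0} of {1} cases failed".format(failures, len(CASES)))


    if __name__ == "__main__":
        main()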
[19:41:12] And these dump files, the latest non-tmp one has all the revisions? [19:41:39] I'll give you a string. One sec. [19:42:34] You need to process 177 files. This GLOB will get them all /mnt/data/xmldatadumps/public/enwiki/20150901/enwiki-20150901-pages-meta-history*.xml*.bz2 [19:42:54] You can pass that GLOB to the script as the ... arg and it will work as expected. [19:43:04] Ah, K. [19:43:14] so "nice python extract_image_deltas.py /mnt/data/xmldatadumps/public/enwiki/20150901/enwiki-20150901-pages-meta-history*.xml*.bz2 > my_output.tsv" [19:44:41] I guess you forgot to put in a call to main() [19:44:44] * marktraceur does [19:45:02] Yea. [19:45:16] if __name__ == "__main__": main() [19:45:25] That way, you can import it and it won't try to call the main() function [19:46:05] Yup [19:46:18] And...trying to get the len() of a generator [19:47:07] OK, running now [19:47:10] ha [19:47:27] I can see it churning :) [19:47:54] using 0.1% of CPU and 100% of CPU at the same time :) [19:47:59] Fun. [19:48:15] > Avail 48G [19:48:17] Should be fine. [19:48:22] :P [19:58:48] ottomata: around? I am around if you want to continue deploying [19:58:56] hey! [19:59:13] yes, have been sidetracked all day by meetings, and have been slowly finishing up the eventlogging monitoring ticket [19:59:14] just merged this [19:59:17] https://gerrit.wikimedia.org/r/#/c/237188/ [19:59:48] ottomata: ah cool! [20:00:19] so, yes!, atlhough, i don't have much time to work more tonight [20:00:22] got about an hour left. [20:00:22] hm. [20:01:16] ottomata: we can switch a few more pieces may be? [20:01:54] also, we should write down the rest of the deployment plan? [20:01:59] yeah, lets try to do at least one. i think the next peace is the client side processor [20:02:04] ok [20:02:10] cool, batcave? [20:02:14] https://etherpad.wikimedia.org/p/eventlogging_stag [20:02:17] madhuvishy: lets IRC [20:02:22] alright [20:03:47] madhuvishy: do you think it is better to run 2 client side process [20:03:53] one still consuming from zmq, and the other from kafka [20:03:55] or 1 process [20:03:58] with an extra output [20:03:59] so [20:04:04] hm, will write in ether pad [20:05:09] hmmm, ottomata where do the balanced processors come in? [20:05:46] madhuvishy: anything consuming from kafka will use balanced processor [20:05:51] i'm not going to start more than one of them yet [20:05:55] we'll do that after we turn of zmq [20:06:00] ottomata: right, okay makes sense [20:06:12] so, see etherpad, i'm not sure if it really matters which of those we do. [20:06:29] 2 might be simpler, because the existing one then doesn't change at all [20:06:43] and then to turn zmq off we just remove it [20:07:06] also, that is how things are right now, except the kafka client side processor is running on an10 [20:07:13] this woudl just be moving it to eventlog1001 [20:07:16] hmmm [20:07:22] yeah i guess that is better [20:07:23] ja? [20:07:42] hmmm, thinking [20:09:04] ottomata: will we do the consumers right away? [20:09:34] if not we'd be writing to kafka for a while and not consume to mysql no? if we go with the second option and not process from the zmq stream [20:10:04] the second option has a zmq stream [20:10:07] its the 3rd output [20:10:50] ottomata: aah right, okay that makes sense, then it would be easy to turn of the zmq forwarder [20:10:55] lets go with that [20:11:05] the 2nd option? [20:11:29] yup [20:11:31] ok [20:14:32] ottomata: what does the multiplexer do now? 
[20:15:29] it consumes from the processed client and server side zmq streams and joins them into the all-events stream on :8600 [20:16:15] ottomata: ah okay [20:20:56] Analytics-EventLogging, Analytics-Kanban: Send raw client side events to Kafka using varnishkafka instead of varnishncsa {stag} - https://phabricator.wikimedia.org/T106255#1622331 (Ottomata) a:Ottomata [20:21:42] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Deploy EventLogging on Kafka to eventlog1001 (aka production!) {stag} [3 pts] - https://phabricator.wikimedia.org/T106260#1622336 (Ottomata) [20:22:32] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Deploy EventLogging on Kafka to eventlog1001 (aka production!) {stag} [3 pts] - https://phabricator.wikimedia.org/T106260#1622345 (Ottomata) a:Ottomata [20:22:38] madhuvishy: https://gerrit.wikimedia.org/r/#/c/237249/ [20:25:17] ottomata: small typo in a comment - These will auto balance amongts themselves. s/amongts/amongst [20:25:22] otherwise looks good [20:26:27] fixed. [20:28:03] madhuvishy: ok, i'm going to stop puppet on prod hosts, and test in labs. [20:28:25] ottomata: alright [20:29:01] Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team, operations: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622378 (Krinkle) NEW [20:29:41] Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team, operations: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622394 (Krinkle) p:Triage>High [20:30:29] joal|night: sounds good, let's discusstomorrow. [20:30:29] Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team, operations: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622378 (Krinkle) [20:32:03] oh madhuvishy i jsut realized i broke something sorta earlier [20:32:11] the raw server side events log is no longer being output [20:32:20] because I disable the raw server side zmq stream [20:32:32] do we need to output that? [20:32:41] now that events are buffered in kafka? [20:34:04] hmmm, so the server side zmq forwarder is off, but it's forwarding to kafka [20:34:19] yeah, i think it's fine [20:34:46] yes [20:34:53] (CR) Nuria: "Joseph: can you "unmerge" these changes or is that something we have to do via gerrit." (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/236800 (owner: Nuria) [20:35:05] the only reason we output raw files was to deal with backfilling, right? [20:35:06] but the server side processor is still writing to zmq [20:35:09] yes [20:35:46] ottomata: aah i don't know that, but because they are in kafka it feels pretty safe [20:36:28] aye [20:36:37] ok, looks good in labs [20:36:40] letting puppet run in rpod [20:38:37] hmm, madhuvishy, i lied, the config chagnes that i just said I made, i made [20:38:42] but i hadn't restarted eventlogging on eventlog1001 [20:38:46] so, i can safely turn the zmq one back on [20:38:51] before i restart [20:38:54] i think i will, just to be safe [20:38:59] ottomata: okay fine [20:42:22] PROBLEM - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: processor/client-side-0 [20:42:54] that's fine! gah, icinga i haven't started it yet, geeez! [20:43:00] sorry for the pings [20:43:27] :D [20:43:52] ok, restarting el! [20:44:32] RECOVERY - Check status of defined EventLogging jobs on eventlog1001 is OK: OK: All defined EventLogging jobs are runnning. 
[20:44:42] :) [20:45:36] madhuvishy: looking good! [20:45:43] awesome [20:46:20] want to do the consumers now? [20:46:45] hmm, Uhh [20:47:13] if it's late, we can do tomorrow [20:47:18] yeah, kinda late. hm. [20:47:32] madhuvishy: maybe files? [20:47:34] ottomata, sorry I wasn't there to +1 your patch, but you alredy knew right? [20:47:39] yup :) [20:47:40] thank you! [20:47:42] ottomata: okay [20:47:44] madhuvishy: lets do files. [20:47:46] that is easy [20:47:48] and hurts nobody :) [20:47:51] :) [20:48:35] ottomata: because we have both streams, won't we be writing everything twice to files? [20:48:51] no [20:48:56] we choose which one we want to consume from [20:49:13] oh okay, so effectively, we stop consuming from zmq [20:49:17] cool [20:50:39] yup [20:53:51] Analytics-Kanban, Patch-For-Review: Write scripts to track cycle time of tasked tickets and velocity [8 pts] - https://phabricator.wikimedia.org/T108209#1622474 (mforns) @ksmith Sure, maybe I can give a lightning talk about this. Thanks for the heads up! [20:53:52] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Deploy EventLogging on Kafka to eventlog1001 (aka production!) {stag} [8 pts] - https://phabricator.wikimedia.org/T106260#1622475 (Ottomata) [20:54:33] madhuvishy: ^ [20:54:43] https://gerrit.wikimedia.org/r/#/c/237261/ [20:57:22] ottomata: cool [20:58:25] Analytics-Kanban, Patch-For-Review: Write scripts to track cycle time of tasked tickets and velocity [8 pts] - https://phabricator.wikimedia.org/T108209#1622487 (ggellerman) @ksmith: I've also let @Awjrichards know about the script as well [20:58:50] I wonder if the auto commit interval is defined repeatedly in many places, and if we want to change it, we have to change each consumer. but it also makes sense that we might want to only change it for the processor etc [21:00:11] madhuvishy: it is defined as a default [21:00:15] it is changeable via hiera [21:00:30] but ja, i used the same variable for both processors, and for all file consumers [21:00:41] we can refactor to be more flexible if we need to [21:00:47] ottomata: yeah okay [21:04:52] Analytics-Engineering, Community-Tech: [AOI] Add page view statistics to page information pages (action=info) - https://phabricator.wikimedia.org/T110147#1622527 (kaldari) @Milimetric: Is there a Phabricator task for that API? Just wanting to add the proper blocker. [21:08:54] Analytics-Engineering, Community-Tech: [AOI] Add page view statistics to page information pages (action=info) - https://phabricator.wikimedia.org/T110147#1622567 (Milimetric) You can use the one you have in the description, @kaldari: T44259 That's the main task that I promised I'd keep updated with progre... [21:10:37] Analytics-Engineering, Community-Tech: [AOI] Add page view statistics to page information pages (action=info) - https://phabricator.wikimedia.org/T110147#1622585 (kaldari) [21:10:48] madhuvishy: looking good! [21:10:50] Analytics-Engineering, Community-Tech: [AOI] Add page view statistics to page information pages (action=info) - https://phabricator.wikimedia.org/T110147#1569847 (kaldari) [21:10:55] ottomata: yay [21:11:09] all events at leastz [21:11:11] mh [21:11:19] serer side and client side... [21:11:19] hm [21:11:21] okay so almost there [21:12:18] hmmm [21:12:27] OH [21:12:31] i took off raw=True [21:12:32] fixing. [21:15:14] thaats better [21:15:15] cool [21:15:33] ok! [21:15:36] they are going [21:15:38] zuper cool. 
[21:15:52] madhuvishy: we shoudl let this run, and then tomorrow, we can compare archived files on stat1002 [21:16:00] and make sure things look right [21:16:06] # events, file sizes, etc. [21:16:16] ottomata: yeah sounds good [21:17:30] ok, cool, thanks madhuvishy time for me to run! [21:17:32] tty tomorrow [21:17:44] i'll be sorta around for another 30 mintes, gotta pack some stuff [21:17:50] so i'll check back and glance to make sure its ok [21:17:55] ottomata: byee! okay :) [21:18:05] bye ottomata see ya [21:31:44] (PS1) Nuria: Revert "[WIP] Make pageview definition aware of preview parameter" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237272 [21:32:44] milimetric, madhu or mforns could you check that this is right to revert the changes that Ironholds merged this morning, that way we do not have dirty code in the refinery master branch [21:32:48] https://gerrit.wikimedia.org/r/#/c/237272/ [21:32:56] sorry again :/ [21:33:08] yes nuria I chan check [21:33:10] Ironholds: np at all [21:33:12] *can [21:33:17] Ironholds: that is what git revert is for [21:33:55] nuria: not familiar with the patch, did you just use git revert? [21:34:19] milimetric: yes, patch was not done and labelled as "WIP" so it is not finished [21:34:31] milimetric: i just git revert [21:34:47] milimetric: and push to gerrit, which i think *sounds* right [21:35:01] nuria: sounds good to me [21:35:48] Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team, operations: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622642 (BBlack) Varnish is the primary issue here. Raising shm_reclen is non-trivial, especially to much-larger values. W... [21:37:12] milimetric: i thought there might be a "revert" button on gerrit but ... i do not see [21:37:35] nuria: yeah, there's a "revert change" button on a merged change [21:37:39] nuria, milimetric, looks good to me, too. May I merge it? [21:37:54] milimetric: where is revert button cc mforns [21:37:56] ? [21:38:45] nuria, next to review button [21:39:27] yea, maybe better to revert via gerrit [21:39:29] mforns: ahahahahah [21:39:36] ok, that makes sense [21:39:37] :] [21:39:41] let's use the force then [21:40:19] Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team, operations: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622651 (ori) >>! In T112002#1622642, @BBlack wrote: > I think we can work out how to raise it to 2048 safely pretty easily.... [21:40:20] (Abandoned) Nuria: Revert "[WIP] Make pageview definition aware of preview parameter" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237272 (owner: Nuria) [21:40:32] (PS1) Nuria: Revert "[WIP] Make pageview definition aware of preview parameter" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237273 [21:41:17] mforns, milimetric made a new change via gerrit: https://gerrit.wikimedia.org/r/#/c/237273/ [21:41:21] will push then [21:41:24] rightt? [21:41:36] nuria: sounds good [21:41:44] that way the commit is linked and all [21:41:53] (CR) Nuria: [C: 2 V: 2] Revert "[WIP] Make pageview definition aware of preview parameter" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237273 (owner: Nuria) [21:42:43] nuria, oh! so when you push that button, gerrit creates a new change for you? 
[21:42:54] mforns: ya, executes git revert i guess [21:43:05] cool :] [21:46:36] (PS1) Nuria: [WIP] Make pageview definition aware of preview parameter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/237274 [21:47:17] Analytics-Kanban: Document work so far on Last access uniques and validating the numbers {bear} - https://phabricator.wikimedia.org/T112010#1622658 (madhuvishy) NEW a:madhuvishy [21:51:21] Analytics, Analytics-Kanban: Transform to XML-->JSON in sorted file format [8 pts] - https://phabricator.wikimedia.org/T108684#1622669 (kevinator) Open>Resolved [21:58:23] (CR) Nuria: "Please see: https://gerrit.wikimedia.org/r/#/c/237274/ for continuation of patch" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/236800 (owner: Nuria) [22:17:24] Analytics, Engineering-Community, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1622754 (Tgr) > Number of active users of Wikimedia web APIs hosted in Wikimedia Labs and third party servers (requested b... [22:29:20] Analytics, Engineering-Community, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1622808 (Tgr) We discussed this recently in Reading Infrastructure; there are three approaches to get the data: # add user... [22:35:16] Hm.. is there any measuring of eventlogging errors? E.g. parse errors or other validation errors [22:36:06] Krinkle, there's the validation logs in the eventlogging machine [22:36:41] I'd like to add client-side logging for urls that are too long, so that I can e.g. graph 'eventlogging.errors.*.*' in Grafana to see if something goes wrong. Right now it's very risky for developers to break schemas that are too long and you'll never find out because the client swallows it (e.g. "it works on my machine", but breaks in prod for some users where there are different values in the sa [22:36:41] me properites) [22:37:06] mforns: Where does this logic live and which schema does it use (if any) or does it write to statsd direcrtlyt? [22:37:07] we're considering sending the eventlogging validation logs to logstash, would that be useful to you? [22:37:38] Logtash is useful to investigate problems, but not so to discover problems and trends. [22:37:52] Having a simple incrementer for invalidations would be useful [22:38:08] e.g. statsd/graphite eventlogging.error.. or something liek that [22:38:12] Krinkle, but if the logs are in logstash, they can be browsed in kibana, no? [22:38:59] yeah, but I don't think it's the kind of place one would look when youre not in charge of that area. E.g. I look in kibana frequently for resourceloader and memcached [22:39:03] but I don't maintain eventlogging itself [22:39:06] I wouldn't look there [22:39:13] things get deployed and they seem to work [22:39:35] Krinkle, graphite holds metrics by schema, but they are only a valid event count [22:39:37] but I can add a small graph to the performance grafana dashboard to become red if there are errors in any of our schemas [22:39:43] we can have both. we have the technology to count logstash events in graphite via statsd integration [22:39:47] not invalid, not broken down by error [22:39:58] Yeah, that's why I'm asking [22:40:11] aha [22:40:33] Yep, bd808, that's exactly what Im recommending. statsd for trends and historical measures. 
logstash for recent data in detail (e.g complete packets, not aggregated) [22:41:45] mforns: I could add code to WikimedaiEvents maybe that listens for 'eventlogging.error' and pushes a metric to statsv perhaps? [22:41:53] And update the server to do that for any errors as well [22:41:58] Krinkle: all invalid events will now be written to Eventlogging_eventerror schema [22:42:11] when the kafka switch is complete (hopefully tomorrow) [22:42:12] Where does the server code live? [22:42:22] then it should be fairly easy to monitor [22:42:31] Krinkle, bd808, I don't have any experience with logstash, but I believe that if the logs are pushed there, all teams can: 1) Debug there for new schemas 2) create patterns to count any event and redirect that into statsd and graphite no? [22:43:29] Krinkle: https://github.com/wikimedia/mediawiki-extensions-EventLogging/tree/master/server [22:44:45] I don't think we send to statsd after logstash, that would happen beforehand from the application that sends the log [22:44:52] mforns: sounds right. The trick for logstash is getting structured log events in so that useful analysis can be done. [22:45:19] madhuvishy: Ah, we use that in production? [22:45:23] we do now count mediawiki log events by type + channel into graphite from logstash [22:45:54] Krinkle: we are currently deploying all the kafka parts and tomorrow the existing zeromq system will be disabled [22:46:14] after that all the invalid events will go into EventError topic in Kafka [22:46:39] i'm sure we'll add a graph measuring counts of events that go into that topic on graphite [22:47:05] ottomata has been working on this majorly, i've been helping out [22:47:18] Krinkle: here's an example of how logstash can push to statsd -- https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/logstash.pp#L137-L143 [22:47:27] bd808, cool thanks [22:47:32] I'd like something slightly more details than just a global count of EventError (which we could already get through eventlogging.schema.* count) but e.g. broken down by high-level error types and most importantly which schema is erroring. [22:47:49] that way the metric becomes useful for individual teams to subscribe to their schemas and their errors [22:48:07] Krinkle: yeah, i'm sure we can do that - need to have a consumer for the eventerror topic that will log metrics via statsd [22:48:19] Yeah [22:48:20] that's one way [22:48:32] So zmq going away feels scary [22:48:41] But I'm sure it's an easy transition? [22:48:50] at the moment both systems are up [22:49:23] statsv has been switched to use kafka I see. [22:49:25] https://github.com/wikimedia/operations-puppet/tree/production/modules/webperf/files [22:49:36] but navtiming, deprecate and ve still use zmw [22:49:37] zmq [22:50:22] so all the events are being written to kafka. there is a files consumer that is not consuming from the zmq stream anymore, we are switching out the mysql one tomorrow. after that we can safely disable the zmq forwarder and processors. [22:50:36] your stuff on hafnium will die though! [22:50:52] They're quite actively used so I'd like to minimise any disruption as we actively make changes and react to the data every day. [22:51:09] aah [22:51:11] okay [22:51:23] yes, what Krinkle said [22:52:01] As I understand ori and I have both kind of not kept track of all the infrastructure improvements. I know I like them, a lot. But not sure where to begin with the migration at the moment. 
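(Editor's note: a hedged sketch of the statsd-feeding EventError consumer discussed above; nothing like this existed at the time of this log. It assumes eventlogging.connect() can read the EventError stream from a URI passed on the command line, that each error event exposes the failing schema name roughly as event['event']['schema'], and that statsd listens on statsd.eqiad.wmnet:8125; all three are assumptions to verify against the real EventError schema and the puppet config.)

    import argparse
    import socket

    import eventlogging  # the EventLogging server library, as used on hafnium


    def main():
        parser = argparse.ArgumentParser(
            description="Count EventLogging validation errors per schema in statsd")
        parser.add_argument("input_uri",
                            help="URI of the EventError stream (e.g. a kafka:// URI)")
        parser.add_argument("--statsd", default="statsd.eqiad.wmnet:8125",
                            help="statsd host:port (assumed default)")
        args = parser.parse_args()

        host, port = args.statsd.split(":")
        addr = (host, int(port))
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

        for event in eventlogging.connect(args.input_uri):
            # Hypothetical field access: adjust to the real EventError capsule layout.
            schema = event.get("event", {}).get("schema", "unknown")
            metric = "eventlogging.errors.{0}:1|c".format(schema)  # statsd counter
            sock.sendto(metric.encode("utf-8"), addr)


    if __name__ == "__main__":
        main()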
[22:52:32] Krinkle: i don't understand fully, could you explain what your dependency on zmq is? [22:52:41] https://github.com/wikimedia/operations-puppet/blob/production/modules/webperf/files/navtiming.py [22:52:55] The other files in that directory run on hafnium, streaming events to statsd live [22:53:19] where does it consume from? [22:53:26] ZMQ [22:53:33] we made a task for this on phab no? /me looks [22:53:46] Some of them use the eventlogging python library which is more abstracted [22:53:50] like this one https://github.com/wikimedia/operations-puppet/blob/production/modules/webperf/files/ve.py [22:54:04] Could that one be updated to use kafka so that the migration is transparent? [22:54:17] the API is quite easy to support I think [22:56:02] Krinkle: https://phabricator.wikimedia.org/T110903 [22:57:57] I don't see an actionable for me in that task at the moment. Is there a working example we can draw from? [22:58:06] Or some documentation? [23:00:59] Krinkle: from what I understand there - it seems like you can change the input url from tcp://..:8600 to the kafka:/// one [23:01:43] also the kafka urls can take a topic argument, so you can just consume the events for the topic you're looking for [23:03:57] Hm... not sure I follow [23:04:12] Which program takes the kafka url as input? [23:05:21] Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1623002 (Krinkle) [23:07:34] I assume that url is not compatible with zmq [23:07:58] I also don't know what "schema based topics" are, or how I would subscribe to one with kafka. [23:10:13] Krinkle: okay, I don't know everything, but will try to explain what I know. Eventlogging consumers usually take an input url and an output one - the handlers.py code handles this. For example this is the one that reads from kafka - [23:10:13] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/91e7b0877d247f429e70928c8fd342ffd72bf350/server/eventlogging/handlers.py#L304 [23:11:48] OK. From what I know about EventLogging, none of our subscribers are EventLogging consumers. [23:12:19] They don't tie into EventLogging directly, rather they subscribe to an open topic-based broadcast (or in the case of the current zmq one, to everything). [23:12:34] zmq, redis, some kind of subscribable protocol. [23:15:29] alright - you would now have to consume from kafka instead. thinking about how that's possible [23:15:35] rather, how to do that [23:16:09] Analytics-Backlog: Doc cleanup day 2.0 - https://phabricator.wikimedia.org/T112024#1623045 (ggellerman) NEW [23:16:33] so varnishkafka consumers all requests I guess, and some are consumed by eventlogging for beacon/event. Then it presumably validates the schemas and then sends it on towards any eventlogging consumers. [23:16:43] as well as storage for querying [23:16:57] Krinkle: we use https://github.com/Parsely/pykafka to consume from kafka. would it make sense for you to use something like that? I am not sure if you can plug into the eventlogging handler somehow [23:17:05] the only piece of the puzzle I find is https://github.com/wikimedia/operations-puppet/blob/HEAD/modules/role/manifests/cache/kafka/eventlogging.pp [23:17:05] ori will know better maybe? [23:17:24] I have no idea where it goes from there or how I can subscribe to it. [23:18:10] Seeing how an existing zmq subscriber was converted in the past would help a lot [23:18:28] Or anything that's using eventlogging via kafka. 
[23:20:10] Krinkle: okay, I dont think we have anything right away, because the EL consumers dint have to be changed. We only needed a handler. I can do a test one today and give it to you [23:20:59] seeing a simple python or bash script that prints out a stream of eventlogging json data (for all or one topic) would be awesome. I can take it from there. [23:21:11] cool, let me write up something [23:21:14] Most of the wikitech documentation seems to be about the old way of doing things. Or maybe it hasnt' changed as much as I think. [23:21:36] yeah. we do have some diagram on the new architecture [23:21:38] * madhuvishy looks [23:21:45] * Krinkle likes diagrams :) [23:22:24] Krinkle: :) https://phabricator.wikimedia.org/T102225 [23:23:46] also, on the question about schema based topics - in this system - after an event is validated, it will go into a kafka topic specific to the schema - like all of NavTiming would be in Eventlogging_NavTiming [23:23:58] Hm.. I see. So EventLogging itself hasn't radically changed. This mostly revolves around the internals of using varnishkafka instead of varnishncsa+zmq? [23:24:15] That makes a lot more sense now [23:24:18] and all the pipes now kafka [23:24:24] instead of zmq streams [23:24:41] Right. Both for inputs (from varnish requests) and for output (kafka EL_ topics) [23:24:57] yeah [23:25:16] from the schema based topics, it gets written into hadoop, partitioned by topic [23:25:55] all of the events are also written into the "mixed" topic, from where it goes to mysql and files [23:26:18] Yeah [23:26:21] in your case, you can take advantage of the schema based ones to not have to filter the full stream [23:26:37] So those consumers of kafka's eventlogging* topics, we'd have our scripts become one of those [23:26:43] Yup [23:27:02] Im curious whether each of those scripts running somewhere in the data centre should subscribe directly to all kafka servers, or to have a pubsub system in between [23:27:26] Mostly with regards to duplication of information (e.g. the names of all kafka servers, which is more than one) and when those change. [23:27:34] Having some kind of abstraction for this in puppet would help. [23:28:05] but aside from the mysql and filesystem consumers, are there any other consumers right now that (in)directly end up sending data to statsd? [23:28:07] yeah, right now, your scripts are the only ones consuming apart from the regular EL ones. [23:28:28] and we din't even know those existed until you mentioned it that meeting :) [23:29:06] Krinkle: there's also hadoop now, but no, nothing that feeds to statsd [23:31:18] Hm.. okay. so yeah, it comes down to seeing an example of an eventlogging consumer (any). Depending on how tightly integrated the mysql or file consumer is, those could be good examples. [23:32:03] those are all in here https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/91e7b0877d247f429e70928c8fd342ffd72bf350/server/eventlogging/handlers.py#L162 [23:33:57] Interesting [23:36:10] I'm not sure I can figure out how to use this from an independent script on a separate server (e.g. hafnium or terbium) when subscribing to a topic to catch some events. [23:36:50] The list of kafka servers is in a puppet variable I imagine. [23:37:41] Krinkle: yeah, the pieces are defined here - https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/eventlogging.pp. 
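(Editor's note: a minimal sketch of the "drop-in" conversion described here: the same shape as the webperf ve.py consumer, but handed a kafka:// input URI (for example one pointing at a per-schema topic like the NavTiming one mentioned above) instead of the tcp://...:8600 ZMQ endpoint. The exact URI lives in madhuvishy's pastebin and the puppet config and is not reproduced here, so the sketch takes it from the command line rather than hardcoding it.)

    import argparse

    import eventlogging  # same library the existing ve.py consumer imports


    def handle(event):
        # Placeholder: ve.py pushes counts/timings to statsd here; print while testing.
        print(event)


    def main():
        parser = argparse.ArgumentParser(description="Consume one EventLogging schema")
        parser.add_argument("input_uri",
                            help="assumed form: kafka:///<brokers>?topic=<schema topic>; "
                                 "take the real URI from puppet / the pastebin above")
        args = parser.parse_args()

        # eventlogging.connect() picks the reader (zmq, kafka, ...) from the URI scheme,
        # so the consuming loop itself does not change across the migration.
        for event in eventlogging.connect(args.input_uri):
            handle(event)


    if __name__ == "__main__":
        main()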
I'll write an example for you [23:51:46] Krinkle: i din't know this before but [23:51:47] http://pastebin.com/VMzqfneP [23:51:56] https://github.com/wikimedia/operations-puppet/blob/production/modules/webperf/files/ve.py [23:52:03] it would be exactly like this [23:52:16] but the input url would be the one i put in there [23:52:21] aha, eventlogging.connect "speaks" kafka [23:52:25] yess [23:52:27] yay [23:52:41] sorry I wasn't sure that would happen before [23:52:46] I thought the eventlogging library available on the cluster in general only supported zmq [23:53:03] nope, i'm sure this is some ori magic. [23:53:17] since I generally use it as a drop-in replacement for code like this https://github.com/wikimedia/operations-puppet/blob/production/modules/webperf/files/navtiming.py#L12-L31 [23:53:19] which is almost the same [23:53:22] but used zmq directly [23:53:50] yeah, but you can use connect instead? [23:54:03] Indeed [23:54:05] * Krinkle tries [23:54:26] that huge url is somewhat uncanny though, but we can abstract that [23:54:32] let me also find the topic names for you, you would just have to change the topic param ther [23:54:42] you pass that as an arg anyway right? [23:54:48] Yeah [23:54:51] it wont be in this code [23:55:01] But I don't want to hardcode it. I'll have puppet substitute it or pass as cli param [23:56:15] hmmm, I thought that's how it was already, but ya that would be the way to go [23:56:54] Aye eventlogging is not available on terbium. [23:56:57] * Krinkle tries on tin instead [23:58:17] At the moment this is my standard template for new one-off subscribers when debugging things [23:58:18] https://gist.github.com/Krinkle/22e2101bff0b156276db