[07:25:57] (CR) Filippo Giunchedi: Reset signal disposition and unblock signals for children (1 comment) [analytics/kafkatee] - https://gerrit.wikimedia.org/r/352591 (owner: Filippo Giunchedi)
[07:26:03] (PS2) Filippo Giunchedi: Reset signal disposition and unblock signals for children [analytics/kafkatee] - https://gerrit.wikimedia.org/r/352591
[07:46:53] morning!
[07:47:24] What happened to varnishkafka on cp3035 was a side effect of the whole host having issues
[08:41:23] (CR) Joal: [V: 2] Update per host last access uniques oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/352182 (https://phabricator.wikimedia.org/T164597) (owner: Joal)
[08:43:49] Hi elukey - anything special about varnishkafka alarms, or BAU ?
[08:46:25] nono host having general problems :)
[08:46:29] vk caught in the middle
[08:46:45] completely forgot to mention that I need to run an errand for ~ 1 hour
[08:46:56] brb! Will be reachable via hangouts or phone of course
[08:47:00] Ah, k - Both 3035 and 4006?
[08:47:08] np, later
[08:47:22] didn't check 4006, will do later on!
[08:59:23] (CR) Joal: "Corrected :) Thanks @mforns" (11 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928) (owner: Joal)
[09:00:40] (PS2) Joal: Add last access uniques global oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928)
[09:03:54] (PS1) Joal: Correct typos in last access uniques oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/352765 (https://phabricator.wikimedia.org/T164597)
[09:06:00] (PS3) Joal: Add last access uniques global oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928)
[09:07:04] joal: repeated your results with March data for dailies, all looks good
[09:07:20] https://usercontent.irccloud-cdn.com/file/IAoxV8pi/daily_unique_devices_variation.png
[09:08:08] joal: let's talk about naming today and I will be working on docs
[09:21:40] sure nuria_
[09:22:14] nuria_: Marcel reviewed the patches and I updated them - Let's confirm naming, we should be able to merge and deploy tomorrow (maybe even this evening)
[09:58:53] joal: confirmed that cp4006 is due to an OOM issue with Varnish
[09:59:08] so both of them were BAU
[09:59:09] k thanks elukey
[09:59:19] (I love the nomenclature, will use it)
[09:59:27] :D
[09:59:42] nuria_: o/
[09:59:46] how's Spain??
[10:01:13] team: just merged a code review to automatically restart mirror maker on kafka hosts if it fails (atm it needs a manual restart). It is a horrible way to patch T157705 so I hope that we'll get a fix in the next Mirror Maker versions
[10:01:14] T157705: Kafka mirror maker failures when kafka brokers are restarted - https://phabricator.wikimedia.org/T157705
[10:01:23] (or maybe there is a setting to tune)
[10:03:23] nuria_: https://phabricator.wikimedia.org/T157088 is also very relevant to us (Replacing the current Job Queue implementation with ChangeProp+Eventbus)
[10:10:26] mmm issues with 2017-05-08T18 - upload
[10:11:45] ah no misc
[10:12:19] the job that failed is 0020276-170424154741156-oozie-oozi-W
[10:13:00] !log re-run manually 2017-05-08T18 for misc due to job errors (failed oozie id 0020276-170424154741156-oozie-oozi-W)
[10:13:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:15:26] from the logs I can see that it failed for "wait_timeout" expired
[10:15:32] maybe transient?
[10:23:13] Analytics-Kanban: Load pivot pageview-hourly dataset every hour - https://phabricator.wikimedia.org/T164730#3243337 (Qse24h)
[10:28:33] elukey: seems that needs a kafka tier 1 tier 1
[10:30:56] tier1+1
[10:30:57] :D
[10:31:29] elukey: now is tier-1ish
[10:32:14] we have good documentation now but it might be good to send it over to the ops list
[10:32:30] maybe I'll ask the Kafka master ottomata to do an ops session about it
[11:15:04] Quarry: Add some type of folders/categories/sorting to user page - https://phabricator.wikimedia.org/T164825#3247657 (Dvorapa)
[11:16:55] Quarry: Add some type of folders/categories/sorting to user page - https://phabricator.wikimedia.org/T164825#3247646 (Dvorapa)
[11:18:34] Quarry: Add some type of folders/categories/sorting to user page - https://phabricator.wikimedia.org/T164825#3247661 (Dvorapa)
[11:18:49] Quarry: Add some type of folders/categories/sorting to user page - https://phabricator.wikimedia.org/T164825#3247646 (Dvorapa)
[11:44:38] Analytics-Tech-community-metrics, Gerrit: Numerous Gerrit (draft) patchsets cannot be accessed: "Cannot display change because it has no revisions." - https://phabricator.wikimedia.org/T161207#3247722 (Aklapper) Open>Invalid Alright, thanks everybody. Current behavior looks intentional (as draft...
[11:59:40] Analytics: Preserve userAgent field in apps schemas - https://phabricator.wikimedia.org/T164125#3247752 (elukey) Trying to summarize what needs to be taken into account for the script (be patient :) We'd need to find a way to have a more granular deletion policy for fields like the User Agent since it is a...
[12:31:10] milimetric: hola!, yt?
[12:35:51] (CR) Nuria: "Don't we need to include in this changeset also the deletion of the daily job as we no longer need it?" [analytics/refinery] - https://gerrit.wikimedia.org/r/352784 (https://phabricator.wikimedia.org/T164730) (owner: Joal)
[12:42:27] * elukey (late) lunch!
[12:42:49] * elukey blames fdans, I am now following his lunch timings :P
[12:43:02] * elukey sends wikilove to fdans
[12:52:22] (CR) Nuria: [V: 2 C: 2] "We can merge this one correct?" [analytics/refinery] - https://gerrit.wikimedia.org/r/352765 (https://phabricator.wikimedia.org/T164597) (owner: Joal)
[13:17:57] Analytics, Analytics-Kanban: Create tagging udf - https://phabricator.wikimedia.org/T164021#3247905 (Nuria) a:Nuria
[13:36:27] nuria_: hey
[13:36:33] milimetric: hola!
[13:36:39] (CR) Mforns: [C: 2] "LGTM! Thanks for the nit-picky changes, please merge if tested!" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928) (owner: Joal)
[13:36:57] milimetric: did we deploy dashiki extension?
[13:37:08] i was wondering ...
[13:37:22] no, not yet
[13:37:40] I can do that now actually
[13:37:46] milimetric: okeis, sure ....
[13:41:31] milimetric: thanks for taking care of this
[13:41:49] don't thank me yet :) let's see how it goes this time
[13:42:02] for taking the heat, juas!
[13:42:16] (CR) Joal: "I think we still need the daily job for data compaction - But it definitely can be discussed." [analytics/refinery] - https://gerrit.wikimedia.org/r/352784 (https://phabricator.wikimedia.org/T164730) (owner: Joal)
[13:49:50] joal: ah, sorry, i think i am missing something. is the compaction another step on the daily job?
[13:50:24] A-Team, I'm still on my way back home from my last driving lesson before my exam, so I probs won't make it to standup
[13:50:52] elukey: you're following a path of wisdom <3
[13:51:08] nuria_: in druid segment compaction is not an explicit step
[13:51:13] (CR) Milimetric: [V: 2 C: 2] Update mediawiki history oozie job SLA [analytics/refinery] - https://gerrit.wikimedia.org/r/352548 (https://phabricator.wikimedia.org/T164713) (owner: Joal)
[13:51:26] nuria_: It depends on how much data you bundle together
[13:51:42] fdans: ok, let us know if you can join for staff
[13:52:07] nuria_: will give more details post-standup
[13:52:15] joal: ok, sounds great
[13:52:20] nuria_: yep! I'll probably arrive mid-standup
[13:54:28] (CR) Milimetric: [C: 1] "lgtm pending discussion going on" [analytics/refinery] - https://gerrit.wikimedia.org/r/352784 (https://phabricator.wikimedia.org/T164730) (owner: Joal)
[13:58:46] (PS7) Mforns: Update banner monthly job to reuse index [analytics/refinery] - https://gerrit.wikimedia.org/r/347653 (https://phabricator.wikimedia.org/T159727) (owner: Joal)
[14:01:32] joal: stadduppp
[14:06:54] Analytics, Labs, Pageviews-API, wikitech.wikimedia.org: wikitech.wikimedia.org missing from pageviews API - https://phabricator.wikimedia.org/T153821#3248122 (Milimetric)
[14:08:48] Analytics-Kanban, Patch-For-Review: Refactor monthly banner oozie job to use already indexed daily data - https://phabricator.wikimedia.org/T159727#3248128 (mforns)
[14:08:50] Analytics-Kanban, Patch-For-Review: Agreggate banner dataset for long term retention - https://phabricator.wikimedia.org/T157582#3248130 (mforns)
[14:08:57] You didn't miss much Luca ;)
[14:09:32] Analytics-Kanban, Patch-For-Review: Refactor monthly banner oozie job to use already indexed daily data - https://phabricator.wikimedia.org/T159727#3076363 (mforns) I merged both tasks, because we decided to merge both jobs. This is the task that will prevail. Cheers!
[14:24:24] Analytics, ChangeProp, EventBus, Services (later), User-mobrovac: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088#2995179 (mobrovac) Correct. We have a yearly goal for next FY to move the JobQueue onto EventBus/ChangeProp.
[14:50:12] joal: https://phabricator.wikimedia.org/T164608 - cache maps is going to upload
[14:56:25] elukey: This is awesome news :)
[14:57:45] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1003 replacement - https://phabricator.wikimedia.org/T159839#3248307 (Cmjohnson)
[14:58:39] joal: what do we need to do as prep step?
[14:58:51] except upgrading the refinery
[14:59:04] (wondering if we need to)
[14:59:13] (camus probably will need an update)
[15:00:25] elukey: something's broken with salt on analytics1064 and 1068, they don't react to test.ping even, e.g. "salt analytics106* test.ping"
[15:01:31] those are new nodes, maybe they don't have the salt-key accepted, will check!
[15:03:31] fdans: coming to meeting?
[15:03:37] fdans: we can see you in etherpad
[15:10:07] moritzm: sudo -i salt 'analytics106*' test.ping works for me now (restarted the minions and accepted some keys)
[15:11:51] 1064/1068 confirmed fixed
[15:11:59] 1003 seems to have the same problem, was it recently reimaged?
[15:15:35] yep!
[15:16:06] done!
[15:34:19] elukey: frozen ?
[15:34:29] joal: I was about to say the same :O
[15:34:33] huhu
[15:35:00] so many good bike rental places! http://www.bikoadventures.com/
[15:35:15] elukey: Back in, can you hear?
[15:35:27] looks in on your side
[15:43:01] Analytics, DC-Ops, Operations, ops-eqiad, Patch-For-Review: Decom/Reclaim analytics1027 - https://phabricator.wikimedia.org/T161597#3248535 (Cmjohnson)
[15:49:35] mforns: say you have something like (in EL whitelist)
[15:49:49] TableName1 field[blabla]
[15:49:53] TableName1 field
[15:49:54] yep
[15:50:18] I'd be super conservative and just throw a parser error
[15:50:26] there is almost surely a problem in there
[15:50:50] mmmm
[15:51:15] yes, I agree it would be better
[15:51:35] ok I'll come up with a list of tests for the parser
[15:51:41] to avoid PEBKAC
[15:51:42] :D
[15:52:06] nuria_: https://meta.wikimedia.org/wiki/Config:Dashiki:ReportCard
[15:52:07] because the time that it takes from +2 in puppet to a complete wipe is sooooo tiny :D
[15:52:29] milimetric: TEARSSSS
[15:53:01] elukey, you're right :]
[15:53:18] :)
[15:53:29] milimetric, \\\o///
[15:54:25] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3248609 (faidon)
[15:55:44] joal: I am going to shut down some hadoop workers to let Chris apply the thermal paste
[15:55:54] sure
[15:57:58] mforns / nuria_: in case you ever want to test Dashiki changes without messing with the working pages: https://meta.wikimedia.org/wiki/Config:Dashiki:Test
[15:58:57] milimetric, nuria_, vital signs has annotation on pageviews: https://analytics.wikimedia.org/dashboards/reportcard/#pageviews-july-2015-now
[15:59:08] awesome
[15:59:25] milimetric, the testing config helps thx!
[15:59:50] is it just me or does everyone else love how those look? :)
[15:59:58] * milimetric pats himself really hard on the back
[16:01:54] stopped yarn on 1032,1033,1040
[16:32:34] hello Analytics. Are we going to do the geekout chillout?
[16:33:15] I know Aaron is in Tech Mgmt meeting but I'm around if we want to jump in the Hangout. your call. (And I see that ottomata is not around, milimetric.)
[16:33:35] lzia: I'm double booked too
[16:33:55] Heya
[16:33:59] Can be there lzia
[16:34:13] helloo joal. i'd say with the boss not being around (otto) we should cancel it?
[16:34:14] :D
[16:34:23] feasible :)
[16:34:28] No boss, no meeting ;)
[16:34:33] ok, deleting the event now.
[16:34:54] ow, it didn't let me delete it. It just let me remove it from my calendar. ow well.
[16:36:22] joal: while I have you. WWW conference (now rebranded as the web conference) is in Lyon next year. You should attend. https://www2018.thewebconf.org/
[16:36:43] We can organize the Wiki Workshop together there with Bob, but we can't get closer than this to you. ;)
[16:36:55] huhuh :)
[16:37:07] Seems very feasible !
[16:37:26] yeah, if Nuria is not listening, I'd say even schedule your offsite around that time. :P
[16:37:34] hehe
[16:37:56] Lyon is awesome for various reasons, among which food and restaurants - let's plan on doing that :D
[16:38:38] yeah. people are so excited about the food. They had the conference once in Lyon in 2012 and people really liked the food.
[16:39:11] indeed lzia, I'm not surprised much
[16:39:16] also, if you want to be more involved with organizing the conference in 2018, let me know. I know the Lyon team and they're pretty awesome.
[16:39:29] (that is, in case you have a lot of extra time. :D)
[16:41:27] lzia: Our second child is planned to arrive in the next 2 months - I won't have anything close to free time for at least a year :)
[16:42:01] riiiiight. I had forgotten that joal. ok. then let's just plan the Wiki Workshop there, this time with Analytics presence. \o/
[16:42:09] :)
[17:03:20] nuria_: interestingly enough, my findings would be that the errors could be in per-domain uniques, not global one
[17:09:24] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): On the "Git" dashboard, filtering on one organization still lists authors who are with another organization - https://phabricator.wikimedia.org/T157709#3248777 (Albertinisg) Hey @Aklapper, I've been digging into this and I found what's...
[17:26:44] nuria_: I have funny results :)
[17:30:33] need to care for Lino for a minute - will be back
[17:45:20] * elukey off!
[17:45:29] 1032,1033,1040 are still down
[18:13:47] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3249024 (Cmjohnson) analytics[1032-1033,1040].eqiad.wmnet have had the thermal paste replaced. One observation on all 3 is that...
[19:05:13] Analytics-Kanban, Patch-For-Review: Add "Damn Small XSS Scanner" (DSXS) to list of known bots - https://phabricator.wikimedia.org/T157528#3249194 (Tbayer) PS, to record this here as a small footnote: I also double-checked and confirmed that these undetected bot requests happened on desktop only. ```la...
[19:43:00] (PS8) Mforns: Update banner monthly job to reuse index [analytics/refinery] - https://gerrit.wikimedia.org/r/347653 (https://phabricator.wikimedia.org/T159727) (owner: Joal)
[19:43:12] joal, y still t?
[19:51:53] Hi mforns
[19:51:57] here for a minute
[19:52:02] hello joal
[19:52:04] what's up?
[19:52:17] oh, ok, I tested the banner job
[19:52:24] restoring 1032,33,40 worker nodes!
[19:52:34] Yay elukey, thanks :)
[19:52:42] and the data looks good, but there's one thing that worried me
[19:52:59] the metrics are displayed in pivot in different order
[19:53:31] original data set is: 1) Request Count 2) Normalized Request Count
[19:53:51] monthly test data set is: 1) Normalized Request Count 2) Request Count
[19:54:16] although the json template specifies the first order
[19:54:38] "Error: Memory initialization warning detected."
[19:54:53] we are definitely not lucky recently with hardware
[19:54:57] My guess is that this will be fine, because metrics have a name and pivot is wise
[19:54:58] (this is 1040)
[19:55:05] but just confirming with you joal
[19:55:15] O.o
[19:55:28] mforns: We should be able to select the default metric in pivot config
[19:56:33] joal, yes, my concern was more with the pivot potentially getting confused with monthly job inputting metrics in a different order
[19:57:18] mforns: if data looks good, I can't imagine how it could be an issue
[19:57:33] joal, OK
[19:58:19] joal, the other thing is... HUE delay missing files, see: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0021572-170424154741156-oozie-oozi-C/
[19:59:01] The job for April says: Missing hdfs://analytics-hadoop/user/mforns/banner_activity_directory/daily/year=2017/month=6/day=30/_SUCCESS
[19:59:25] which seems correct, because that day should be the last one to have the _SUCCESS file
[19:59:28] but
[19:59:44] the job for March says: Missing hdfs://analytics-hadoop/user/mforns/banner_activity_directory/daily/year=2017/month=6/day=2/_SUCCESS
[20:00:03] which is looking 2 days too far in the future I think
[20:00:15] mforns: looks incorrect to me: month=6 for both?
[20:00:42] joal, yea, the second one looks at day 2, but still
[20:00:57] it should be month=5 day=31
[20:01:23] now, I did some maths looking at the coordinator code and I cannot see how this is possible...
[20:02:54] mforns: coord:current(coord:daysInMonth(0) + coord:daysInMonth(1) + coord:daysInMonth(2))
[20:03:06] yep
[20:03:16] I added a -1 in the last change
[20:03:35] mforns: the conf running is not having it
[20:04:30] joal, yes, I didn't rerun the job, because, even if I think that the correct delay has a -1, the -1 wouldn't fix the problem
[20:06:12] 1040 stays down for today :D
[20:07:21] mforns: it makes sense then :)
[20:07:52] mwarf elukey
[20:08:42] mforns: right?
[20:09:06] all right 1032/33 are working now
[20:09:17] Chris found out that on one CPU there was basically no thermal paste
[20:09:23] joal, mmmmm
[20:09:44] I can rerun the job with the -1, but theoretically, it won't do what I expect
[20:10:55] mforns: with -1 it'll wait until month=6/day=1 :)
[20:11:23] joal, yes, but it should wait until month=5/day=31 no?
[20:14:21] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3249434 (elukey) @Cmjohnson thanks! analytics1040 shows up memory errors on boot, I wasn't able to power it on.. Do you mind to c...
[20:14:34] * elukey off! (again)
[20:15:24] byeeeee elukey
[20:15:26] It's indeed weird mforns :/
[20:15:31] bye elukey
[20:15:52] ok joal thanks! I won't keep you here any more :]
[20:19:18] sorry mforns, my brain is not functioning anymore
[20:19:34] np joal :]
[20:19:45] will continue trying
[20:21:04] mforns: you could use the coord:dateOffset function?
[20:21:12] But really it's weird
[20:21:25] joal, sure, let me look it up
[20:22:20] oh, yes, this looks like it will help :]
[20:28:57] Analytics, DC-Ops, Operations, ops-eqiad, Patch-For-Review: Decom/Reclaim analytics1027 - https://phabricator.wikimedia.org/T161597#3136552 (Dzahn) @elukey system is still in Icinga, causing alerts, has the puppet/salt part been done ? decom tasks should ideally have the full check list fr...
[20:29:52] (PS9) Mforns: Update banner monthly job to reuse index [analytics/refinery] - https://gerrit.wikimedia.org/r/347653 (https://phabricator.wikimedia.org/T159727) (owner: Joal)
[20:39:32] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): On the "Git" dashboard, filtering on one organization still lists authors who are with another organization - https://phabricator.wikimedia.org/T157709#3249599 (Albertinisg) User is now merged and index up to date: https://wikimedia.bi...
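(Editor's note on the 20:02-20:11 delay discussion.) The arithmetic behind `coord:daysInMonth(0) + coord:daysInMonth(1) + coord:daysInMonth(2)` can be checked by hand with plain calendar maths. The sketch below is only that: it assumes each monthly job starts on the first day of its month, the helper names `days_in_month` and `last_expected_day` are hypothetical, and it does not model how Oozie itself resolves `coord:current`. That may be exactly why the June 2 seen for the March job is not reproduced here.

```python
from calendar import monthrange
from datetime import date, timedelta


def days_in_month(start: date, offset: int) -> int:
    """Number of days in the month `offset` months after start's month."""
    idx = start.month - 1 + offset
    year = start.year + idx // 12
    month = idx % 12 + 1
    return monthrange(year, month)[1]


def last_expected_day(start: date, months: int = 3, adjust: int = 0) -> date:
    """Start date plus the summed lengths of `months` months, plus `adjust` days."""
    span = sum(days_in_month(start, i) for i in range(months))
    return start + timedelta(days=span + adjust)


# Assumed nominal start of the March job (not confirmed in the log).
march = date(2017, 3, 1)
print(last_expected_day(march))             # 2017-06-01
print(last_expected_day(march, adjust=-1))  # 2017-05-31
```

Under these assumptions the -1 variant gives May 31 for March and June 30 for April, which is what mforns expected in both cases.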
[20:40:02] (PS2) Nuria: Release of analytics.wikimedia.org [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/352075
[20:55:20] (PS3) Nuria: Release of analytics.wikimedia.org [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/352075
[20:55:39] (PS4) Nuria: Release of analytics.wikimedia.org [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/352075
[20:59:47] (CR) Nuria: [V: 2 C: 2] "Self merging release" [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/352075 (owner: Nuria)
[21:04:58] Analytics-Kanban: Finalize list of metrics, breakdowns, and filters for Wikistats 2.0 backend - https://phabricator.wikimedia.org/T163356#3249702 (Nuria) Open>Resolved
[21:05:00] Analytics-Kanban, Analytics-Wikistats: Backend for wikistats 2.0 - https://phabricator.wikimedia.org/T156384#3249703 (Nuria)
[21:05:19] Analytics-Kanban: cannot edit dashiki's production configuration - https://phabricator.wikimedia.org/T162465#3249705 (Nuria) Open>Resolved
[21:13:38] Analytics-Kanban: Wikistats 2.0. - https://phabricator.wikimedia.org/T130256#3249750 (Nuria) @Neil_P._Quinn_WMF : We reworked our annotations to be friendlier, they are visible now on reportcard, please take a look: https://analytics.wikimedia.org/dashboards/reportcard/#pageviews-july-2015-now Annotations...
[22:41:45] Analytics, DC-Ops, Operations, ops-eqiad, Patch-For-Review: Decom/Reclaim analytics1027 - https://phabricator.wikimedia.org/T161597#3250062 (Cmjohnson) @dzahn, I jumped on this too early. None of those things were done prior to me doing my pet. It's now wiped and all switch ports are disabled.
[23:27:49] Analytics, DC-Ops, Operations, ops-eqiad, Patch-For-Review: Decom/Reclaim analytics1027 - https://phabricator.wikimedia.org/T161597#3250139 (Dzahn) Open>Resolved removed from site.pp / puppet / salt / icinga ^. that should be all now. i don't see it anywhere else.