[00:02:30] CindyCicaleseWMF: ah no, sorry, no need to e-mail, if question comes up in any pm forum which you might attend, e-mail or otherwise
[00:02:45] CindyCicaleseWMF: i try to sync up with all pms a couple of times a quarter
[00:03:03] Gotcha! Thanks again for all of the help! Much appreciated!
[00:03:33] And, I will definitely recommend this to all :-D
[00:04:24] CindyCicaleseWMF: jaja, good, it is used quite a bit
[00:04:51] CindyCicaleseWMF: here is browser data (same layout) https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os
[00:05:07] CindyCicaleseWMF: this one is used quite a bit by world at large
[00:11:09] Yes, I used that one as a guide when setting up the pingback config. Very nice.
[00:11:53] Analytics, Collaboration-Team-Triage, EventBus, StructuredDiscussions: Flow does not emit recent change event when regular user creates topic - https://phabricator.wikimedia.org/T187861#4029848 (Krinkle)
[01:19:29] aw
[01:35:49] Analytics, Analytics-Dashiki: Make it possible to suppress the box in the bottom left of dygraphs-timeseries graphs - https://phabricator.wikimedia.org/T189069#4030449 (CCicalese_WMF)
[01:37:38] Analytics, Analytics-Dashiki: Make the header of table-timeseries tables fixed when vertically scrolling the table - https://phabricator.wikimedia.org/T189070#4030475 (CCicalese_WMF)
[02:31:38] I would like to annotate the pingback graphs with MediaWiki release dates. I tried to follow the instructions at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Dashiki#Configuring_annotations to create page Config:Annotations:MediaWikiReleases, but I got an error saying I was not authorized to create the page. I tried to use Config:Dashiki:... instead, since there don't seem to be any
[02:31:38] Config:Annotations:... pages on meta, but the annotations don't appear.
[02:34:27] I'd like something like https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-browser.
[05:51:34] Analytics, Operations, Traffic: Investigate and fix odd uri_host values - https://phabricator.wikimedia.org/T188804#4020291 (Nuria) Ok, was about to say same, this is basically reproducible with a curl in which you override host header.
[07:15:56] !log manually re-run wikidata-articleplaceholder_metrics-wf-2018-3-6
[07:16:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:24:25] Analytics, EventBus, Services (doing), User-Elukey: Kafka sometimes misses to rebalance topics properly - https://phabricator.wikimedia.org/T179684#4031298 (elukey) From kafkfa1002 (that acted as Kafka group coordinator broker afaics) I can see the following logs for `change-prop-RecordLintJob`:...
[10:13:33] rebooting analytics1035 (one of the hdfs journal nodes)
[10:40:09] PROBLEM - Hadoop NodeManager on analytics1052 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[10:40:45] forgot to downtime --^
[10:52:56] rebooting also stat100[5,6] too
[10:53:04] as announced on the mailing lists
[10:54:32] Analytics-Kanban, User-Elukey: Reboot all Analytics hosts for Kernel upgrade - https://phabricator.wikimedia.org/T188594#4031487 (elukey)
[10:59:19] RECOVERY - Hadoop NodeManager on analytics1052 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[11:02:40] all right all hadoop worker nodes completed
[11:02:43] + stat boxes
[11:02:50] the journal edit log seems good
[11:05:24] Analytics, User-Elukey: Expand the Hadoop Journal nodes from 3 to 5 to improve resiliency - https://phabricator.wikimedia.org/T189105#4031525 (elukey)
[11:05:39] this is something that I'd really love to do next quarter --^
[11:34:02] * elukey lunch + errand!
(back in ~2h)
[13:14:23] there's a disk space warning for analytics1036, BTW: "DISK WARNING - free space: /var/lib/hadoop/data/i 21 GB (0% inode=99%):"
[13:40:50] moritzm: ack, thanks a lot!
[13:49:59] (PS1) Cicalese: Add unique wiki count and fix memory limit labels. [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/416940
[14:32:57] milimetric: nuria_: I got annotations to work. It turns out that there was an error in the documentation at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Dashiki#Configuring_annotations that I fixed. The annotations pages need to be in the main namespace not the Config: namespace. But I had already created an annotation file in the Config: namespace on meta that I cannot delete.
[14:33:31] CindyCicaleseWMF: I was just looking at that!
[14:34:11] ok, cool, I thought there was an error in the docs, but when I looked at it I saw your fix and I was just examining history to make sure I wasn't crazy
[14:34:12] milimetric: beat you to it ;-)
[14:34:21] k, good, I'll look at your patch
[14:34:28] thanks!
[14:35:53] btw CindyCicaleseWMF: for this dashboard I used to parse git log and push out annotations automatically when there was a VisualEditor-related change: https://edit-analysis.wmflabs.org/compare/
[14:35:59] but that stopped working sometime in 2016
[14:36:28] I think the git branching might have changed around then, I never looked at it, but the point being, it's possible to publish that type of data as a metric and then consume it as annotations
[14:36:45] (in case you get tired of manually maintaining that kind of thing)
[14:37:33] very cool!
[14:38:00] elukey: hiii
[14:38:33] i see some data node partitions on an70 :)
[14:39:47] milimetric: It would be great to have that done automatically. I will look at that to see if that will work.
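The idea mentioned above, publishing dated events as a metric that a dashboard can then consume as annotations, can be sketched roughly as follows. This is only an illustration: the `date` and `note` column names are assumptions rather than a confirmed Dashiki format, and the events are placeholders, not real MediaWiki release dates.

```python
import csv
import io
from datetime import date


def events_to_tsv(events):
    """Render (date, note) pairs as a TSV string, oldest first.

    The header names are hypothetical; adjust them to whatever the
    consuming dashboard actually expects.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "note"])
    for day, note in sorted(events):
        writer.writerow([day.isoformat(), note])
    return buf.getvalue()


# Placeholder events only; real data would come from e.g. a release list or git log.
placeholder_events = [
    (date(2018, 2, 1), "placeholder release B"),
    (date(2018, 1, 1), "placeholder release A"),
]
tsv = events_to_tsv(placeholder_events)
```

Sorting before writing keeps the output stable regardless of the order in which events were collected.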
[14:41:22] Analytics-Kanban, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review: Record and aggregate page previews - https://phabricator.wikimedia.org/T186728#4032052 (Ottomata) Ah, I just realized that `agent_type` can be obtained from the EventLogging `userAgent['is_bot']` field....
[14:41:36] ottomata: helloooo
[14:41:42] I am doing 71 now :)
[14:41:55] Analytics, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Bring back raw user_agent in EventLogging data so we can do further processing in Hadoop - https://phabricator.wikimedia.org/T188673#4032060 (Ottomata) Open>declined
[14:41:56] COOOl
[14:41:58] milimetric: we don't make releases that often, but updating the annotations would be easy to forget to do
[14:42:02] we need to make sure the memory settings make sense
[14:42:28] ottomata: if you want to triple check hadoop things I'll keep going and prep partitions
[14:42:57] ok great, on it
[14:42:59] thanks
[14:45:46] Analytics-Kanban, Patch-For-Review: Remove sensitive fields from whitelist for QuickSurvey schemas (end of Q2) - https://phabricator.wikimedia.org/T174386#4032080 (fdans) @leila the fields to be nullified are the ones detailed in the task: QuickSurveyInitiation: userAgent QuickSurveysResponses: userAgen...
[14:52:10] (CR) Milimetric: Add unique wiki count and fix memory limit labels.
(2 comments) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/416940 (owner: Cicalese)
[14:52:51] (CR) Milimetric: [V: 2 C: 2] Ground sqoop output to DEVNULL [analytics/refinery] - https://gerrit.wikimedia.org/r/415909 (owner: Milimetric)
[14:55:30] yeah hm elukey need to adjust some hiera, trying to find a smart way to do it other than just overriding in hostname regex hiera
[14:57:06] ack
[14:57:12] let me know if you want to brain bounce
[14:58:01] i guess we could make a new role
[14:58:19] worker_large
[14:58:20] dunno
[15:02:18] * elukey is now in the position of rejecting names that Andrew comes up with, muhahahahaha
[15:02:47] :)
[15:04:12] hahah
[15:04:19] yar, this would actually be much easier on newer version of facter
[15:04:30] right now i think there's no way to get total memory
[15:04:34] in puppet
[15:04:38] only memory free, etc.
[15:04:55] ah so we can't come up with a formula to use
[15:05:00] we need to hardcode values
[15:07:00] (CR) Cicalese: Add unique wiki count and fix memory limit labels. (2 comments) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/416940 (owner: Cicalese)
[15:07:40] I am reading the task and Rob mentioned to not touch an1076, but an1077 is down
[15:07:48] so I am wondering if he meant 77 not 76
[15:08:11] aye ya
[15:08:18] hmm, i think i am finding a way in puppet elukey...
[15:09:15] but ya we'll need to hardcode the memory for a node somewhere. i think i don't mind doing that in a hiera hostname regex
[15:09:23] and then will put some calculations into puppet profile based on that
[15:09:44] makes sense
[15:10:03] https://gerrit.wikimedia.org/r/#/c/416736/1/templates/10.in-addr.arpa - analytics.eqiad.wmnet?
[15:11:28] mmm but an1076 dns works now
[15:11:30] (A and PTR)
[15:11:41] Analytics-Kanban, Analytics-Wikistats: Wikistats 2.0: Add statistics for the geographical origin of the contributors - https://phabricator.wikimedia.org/T188859#4032165 (Milimetric) > @Milimetric, can you confirm that if aggregate editor location data passes our privacy standards, it would go in Wikistat...
[15:11:58] https://gerrit.wikimedia.org/r/#/c/416854/1/templates/10.in-addr.arpa
[15:12:02] Analytics-Kanban: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280#4032168 (Milimetric)
[15:12:04] Analytics, Analytics-Dashiki: Optionally do not sort columns in table-timeseries alphabetically - https://phabricator.wikimedia.org/T189125#4032172 (CCicalese_WMF)
[15:12:06] ok it is definitely 1076 the one to not touch
[15:12:09] Analytics-Kanban, Analytics-Wikistats: Wikistats 2.0: Add statistics for the geographical origin of the contributors - https://phabricator.wikimedia.org/T188859#4032170 (Milimetric)
[15:12:12] checking what's up with 1077 then
[15:16:30] Analytics-Kanban, Analytics-Wikistats: Wikistats 2.0.: display the mean value for Total Page Views - https://phabricator.wikimedia.org/T188552#4032197 (Milimetric) >>! In T188552#4021929, @Pamputt wrote: > Hmm, actually I misunderstood. I was thinking it displays the total of the whole displayed value. F...
[15:18:48] weird, I had to use install-console and force a puppet run
[15:21:22] ah it seems that it doesn't take the spare::role
[15:23:28] node /analytics107[0-7]\.eqiad.wmnet/ { role(spare::system)
[15:23:29] }
[15:23:42] is there something weird that I don't see?
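One way to sanity-check the node pattern quoted above is to run it against the hostnames in question. The sketch below uses Python's `re` module as a stand-in for Puppet's Ruby regular expressions (an assumption; the two engines should agree on a pattern this simple), and treats the match as unanchored.

```python
import re

# Pattern copied verbatim from the node definition quoted in the log.
NODE_PATTERN = re.compile(r"analytics107[0-7]\.eqiad.wmnet")


def matches_node(fqdn: str) -> bool:
    """Return True if the pattern matches anywhere in the given hostname."""
    return NODE_PATTERN.search(fqdn) is not None


# analytics1069 through analytics1078, to probe both edges of the range.
hosts = [f"analytics10{n}.eqiad.wmnet" for n in range(69, 79)]
matched = [h for h in hosts if matches_node(h)]
```

Under these assumptions the pattern does cover analytics1077, so the regex itself is unlikely to be the culprit. The only oddity is the unescaped dot before `wmnet`, which matches any character but would not stop a real hostname from matching.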
[15:24:54] oh huh,
[15:25:00] looks like it should match
[15:25:01] hm
[15:25:28] Analytics, Analytics-EventLogging, Performance-Team (Radar), Readers-Web-Backlog (Tracking): Make it easier to enable EventLogging's debug mode - https://phabricator.wikimedia.org/T188640#4032222 (Milimetric) Thanks so much @Krinkle and @phuedx, very clear now. We'll triage accordingly, but as p...
[15:30:24] now it is ok
[15:30:24] analytics1077 is a Unused spare system (spare::system)
[15:30:26] mmmm
[15:31:47] mystery
[15:31:53] ok going to add partitions
[15:32:56] all right all hosts but 1076 are ready ottomata
[15:34:54] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4032261 (elukey) Added the hadoop partitions to all the nodes but 1076, waiting for Rob's green light before proceeding. I had to run puppe...
[15:34:54] ok cool
[15:34:55] great
[15:35:15] Analytics, Analytics-Dashiki: Optionally do not sort columns in table-timeseries alphabetically - https://phabricator.wikimedia.org/T189125#4032262 (CCicalese_WMF)
[15:39:00] (CR) Cicalese: Add unique wiki count and fix memory limit labels. (1 comment) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/416940 (owner: Cicalese)
[15:59:24] elukey: https://gerrit.wikimedia.org/r/#/c/416965/
[16:04:16] oh weird, I can't use puppet compiler on a spare::system?
[16:04:19] ?
[16:04:28] https://puppet-compiler.wmflabs.org/compiler03/10321/analytics1070.eqiad.wmnet/
[16:09:45] sorry just seen it :(
[16:09:53] I am updating pcc, should be ready in 2 mins
[16:15:52] done!
[16:16:55] running pcc now
[16:17:41] https://puppet-compiler.wmflabs.org/compiler02/10324/analytics1070.eqiad.wmnet/
[16:22:09] (PS1) Milimetric: Add interlanguage script [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/416976
[16:46:05] milimetric: I need to brainbounce again :( sorry
[16:47:05] fdans: what? I love brainbouncing, omw
[16:47:18] fdans: bc-2
[16:57:00] (PS23) Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064)
[16:59:30] (CR) jerkins-bot: [V: -1] Add EL and whitelist sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: Mforns)
[17:04:57] Analytics-Kanban, Patch-For-Review, User-Elukey: Alarm when /mnt/hdfs is mounted but showing no files/dirs in there - https://phabricator.wikimedia.org/T187073#4032511 (elukey) a:elukey
[17:11:06] (PS24) Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064)
[17:27:54] Analytics: More granular permits for HDFS user when it comes to data access - https://phabricator.wikimedia.org/T189135#4032560 (Nuria)
[17:36:31] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 4 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4032605 (Pchelolo) p:Triage>High
[17:37:00] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4032619 (RobH)
[17:37:37] Analytics-Cluster, Analytics-Kanban, Operations, User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4002756 (RobH) a:RobH>elukey Analytics1076 is ready to go as well now!
You can resolve or close this task as you need/want to track implementation.
[17:40:29] Analytics: More granular permits for HDFS user when it comes to data access - https://phabricator.wikimedia.org/T189135#4032631 (Nuria) sudo -u hdfs ask when deletions are about to happen to include a confirmation step Maybe a hdfs-litle user that just allows you to read? Maybe aliasing hdfs rm to hdfs rm...
[17:46:08] Yay elukey !! We're soon to have something, right?
[17:46:24] oops - sorry - was reading other things :)
[17:47:20] CindyCicaleseWMF: thanks for correcting docs!
[17:47:32] Analytics-Kanban, Analytics-Wikistats: Wikistats 2.0.: display the mean value for Total Page Views - https://phabricator.wikimedia.org/T188552#4032656 (Pamputt) @Milimetric oh! Thanks for the explanation. I did not pay attention to this number underneath the graph. Indeed, this is the number I was lookin...
[17:48:10] nuria_: you're welcome! thanks for the awesome tools!
[17:48:11] * milimetric is proud of figuring out misunderstandings ^
[17:48:12] :)
[17:48:40] (I was pointing to that phab comment, not cindy's)
[17:49:42] joal: all the new worker nodes are ready to get stuff deployed, but we'll take it slowly to test config + stretch
[17:51:42] elukey: ok, please let us know
[17:51:47] Analytics-Cluster, Analytics-Kanban, Operations, User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4032665 (elukey) Thanks Rob! Just created the hadoop partitions on 76 as well. Since those are Stretch nodes we are going to slowly put them in production...
[17:59:47] joal, nuria: new paper by Taha et al using clickstream data, presented yesterday https://arxiv.org/abs/1710.03326
[18:06:23] Hi DarTar - This is super great !! --^
[18:07:16] ciao DarTar!
[18:08:32] Gone for dinner team, will be back soon
[18:15:45] (PS1) Fdans: [wip]Responsive site [analytics/wikistats2] - https://gerrit.wikimedia.org/r/416999
[18:15:52] (CR) jerkins-bot: [V: -1] [wip]Responsive site [analytics/wikistats2] - https://gerrit.wikimedia.org/r/416999 (owner: Fdans)
[18:21:26] (PS2) Fdans: [wip]Responsive site [analytics/wikistats2] - https://gerrit.wikimedia.org/r/416999
[18:21:34] (CR) jerkins-bot: [V: -1] [wip]Responsive site [analytics/wikistats2] - https://gerrit.wikimedia.org/r/416999 (owner: Fdans)
[18:23:44] elukey: am merging the hadoop node patch
[18:24:18] ottomata: ack
[18:34:51] ottomata: need any help? Otherwise I am going off :)
[18:36:20] naw its cool, got it!
[18:36:21] just weird package issues
[18:36:34] well not so weird, we had some backport overrides for jessie
[18:36:35] etc.
[18:37:16] super!
[18:37:21] * elukey off!
[18:39:02] PROBLEM - Disk space on Hadoop worker on analytics1070 is CRITICAL: NRPE: Command check_disk_space_hadoop_worker not defined
[18:52:02] RECOVERY - Disk space on Hadoop worker on analytics1070 is OK: DISK OK
[19:03:13] Analytics-Cluster, Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4033006 (Ottomata) analytics1070 is in Hadoop ready for biznaass
[19:04:15] ottomata: --^ already loaded man !!!!
[19:04:41] ottomata: 47 containers, 104Gb RAM used
[19:04:47] woow nice
[19:19:48] ottomata: those new machines are monsters :)
[19:20:57] 2X!
[19:21:03] Indeed !
[19:21:07] well, the compute
[19:21:09] same disk space
[19:21:12] right
[19:21:28] ottomata: I think I'm gonna be wishing to try presto :)
[19:22:43] :)
[19:42:39] (PS3) Fdans: [wip]Responsive site [analytics/wikistats2] - https://gerrit.wikimedia.org/r/416999
[20:01:04] Analytics-Kanban: Refresh SWAP notebook hardware - https://phabricator.wikimedia.org/T183145#3844739 (Ottomata) a:Ottomata
[20:01:20] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install notebook100[34] - https://phabricator.wikimedia.org/T183935#4033211 (Ottomata)
[20:14:34] (PS4) Fdans: [wip]Responsive site [analytics/wikistats2] - https://gerrit.wikimedia.org/r/416999
[20:27:44] Analytics, Analytics-Dashiki: publish mediawiki deployments as a metric tsv - https://phabricator.wikimedia.org/T189156#4033271 (Milimetric)
[20:29:05] (CR) Milimetric: [V: 2 C: 2] Add unique wiki count and fix memory limit labels. [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/416940 (owner: Cicalese)
[20:44:21] Analytics-Kanban, Patch-For-Review: Remove sensitive fields from whitelist for QuickSurvey schemas (end of Q2) - https://phabricator.wikimedia.org/T174386#4033321 (leila) @fdans got it. If I remember correctly this was the initial list and when @mforns and I talked a few weeks ago we arrived at a differe...
[20:51:40] fdans: yt?
[20:51:48] Analytics, EventBus, Wikidata, Services (watching), TechCom-RFC (TechCom-Approved): RFC: Requirements for change propagation - https://phabricator.wikimedia.org/T102476#4033335 (daniel) Open>Resolved a:daniel Closing this as Resolved: the RFC provided the guidance needed to implem...
[21:05:44] nuria_: hellooo
[21:06:02] fdans: let's talk about perf in a bit
[21:06:36] sounds good
[21:12:32] ottomata: ullo
[21:12:56] ottomata: I have found a super weird bug in spark
[21:13:36] A bug that involves the spark-shuffle-service
[21:14:06] And I wonder if using spark2 with a shuffle-service of spark1.6 could not be the thing
[21:17:53] joal: why would spark2 be using spark 1.6 shuffle service?
[21:18:28] ottomata: am I wrong in assuming we only have a single shuffle service?
[21:18:40] ottomata: embedded in yarn and all?
[21:19:02] joal: i don't think there are any special spark daemons
[21:19:31] ottomata: for spark dynamic allocation to work, nodemanagers have an auxiliary spark service called ExternalShuffleService
[21:20:03] ottomata: Those are started as children processes of nodemanagers
[21:20:16] aux spark service? started by the spark job then?
[21:21:21] ottomata: grep -C 3 spark /etc/hadoop/conf/yarn-site.xml
[21:24:27] huh! didn't realize or totally forgot about that
[21:24:38] interesting, so that is running in the nodemanager
[21:24:47] correct
[21:24:54] to be precise, as a child of the nodemanager
[21:25:13] I looked at the code, it seems very similar - But still there might be small differences
[21:25:14] Analytics, Analytics-Dashiki: Add annotationsMetric option to tabs layout - https://phabricator.wikimedia.org/T189159#4033434 (Milimetric)
[21:30:07] hm, is it necessary to work in yarn then?
[21:30:14] hm
[21:30:27] ottomata: It is indeed
[21:30:54] ottomata: this service is the one handling the shuffled-files between stages when executors can die
[21:31:27] ahh for dynamic
[21:31:27] hm
[21:34:15] fdans: there there?
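The `grep` suggested above looks for the shuffle service wiring in `yarn-site.xml`. A minimal sketch of the same check in Python follows. The embedded XML is an illustrative fragment, not the cluster's actual file: the two `aux-services` property names and the class name follow the upstream Spark-on-YARN documentation, and the memory value is a placeholder.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; NOT copied from the cluster discussed in the log.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>
"""


def spark_properties(xml_text: str) -> dict:
    """Return the properties whose name or value mentions 'spark'."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if "spark" in name.lower() or "spark" in value.lower():
            found[name] = value
    return found


spark_props = spark_properties(SAMPLE_YARN_SITE)
```

Because the service is registered under a single key with a single class name, a NodeManager configured this way loads exactly one implementation of that class, which is the constraint the conversation turns to next.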
[21:35:10] joal: i wonder if it is possible to run both shuffle services
[21:35:34] I was looking into that - I didn't find examples
[21:36:05] ottomata: Given that the service is described by class in conf and that class names are the same, I assume not
[21:39:38] hmm i guess we need to look into it...haha, maybe we can totally stop using spark 1 :p
[21:40:40] ottomata: I'd love that - maybe Erik will not (discovery has jobs running from oozie IIRC)
[21:41:42] aye
[21:42:13] ottomata: I'm going to test my hypothesis by running a job without dynamic allocation
[21:42:44] aye k
[21:47:24] nuria_: holaaa I went out a moment, cave?
[21:47:41] fdans: ok, but only if it is not too late
[21:47:59] nuria_: no, I'm working, let's go
[21:53:17] Analytics, Analytics-Dashiki: Add a legend for annotations - https://phabricator.wikimedia.org/T189164#4033525 (CCicalese_WMF)
[21:55:56] (PS7) Milimetric: [WIP] Compute geowiki statistics for Druid from cu_changes data [analytics/refinery] - https://gerrit.wikimedia.org/r/413265
[21:59:12] (CR) Nuria: [wip]Responsive site (2 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/416999 (owner: Fdans)
[22:10:37] Analytics-Tech-community-metrics, Differential, DevRel-April-2016, DevRel-March-2016, Developer-Relations (Jan-Mar-2018): Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#4033598 (Aklapper)
[22:10:53] Analytics-Tech-community-metrics, Differential, DevRel-April-2016, DevRel-March-2016, Developer-Relations (Jan-Mar-2018): Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#1808364 (Aklapper) stall...
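The test mentioned at 21:42, running a job without dynamic allocation, might look roughly like the sketch below, which only assembles a command line. `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled` are standard Spark settings and `--num-executors` is a standard `spark-submit` option on YARN. The application name `job.py` and the executor count are hypothetical, the cluster may use a differently named submit wrapper, and switching off the shuffle service together with dynamic allocation is an assumption that goes a step beyond what the log states.

```python
import shlex


def spark_submit_command(app, dynamic_allocation, num_executors=4):
    """Build a spark-submit argument list with dynamic allocation on or off."""
    flag = "true" if dynamic_allocation else "false"
    conf = {
        "spark.dynamicAllocation.enabled": flag,
        # Assumption: toggle the external shuffle service together with
        # dynamic allocation, so the suspected component is bypassed entirely.
        "spark.shuffle.service.enabled": flag,
    }
    cmd = ["spark-submit", "--master", "yarn"]
    if not dynamic_allocation:
        # Without dynamic allocation the executor count has to be fixed up front.
        cmd += ["--num-executors", str(num_executors)]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


# Hypothetical application file; printable form for copy and paste.
static_cmd = spark_submit_command("job.py", dynamic_allocation=False)
static_cmd_line = " ".join(shlex.quote(part) for part in static_cmd)
```

If the bug disappears with this configuration and returns with dynamic allocation enabled, that would point toward the shuffle service rather than the job itself.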