[00:06:09] Analytics, Analytics-Data-Quality, Contributors-Analysis, Product-Analytics, Growth-Team (Current Sprint): Resume refinement of edit events in Data Lake - https://phabricator.wikimedia.org/T202348 (nettrom_WMF) Based on conversations with @Neil_P._Quinn_WMF and @Catrope, and going through the...
[00:17:45] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (ayounsi) > If this does go into the 'public' VLAN, could we restrict access to these nodes using some simple ferm rules?...
[00:23:12] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (Krenair) >>! In T207321#4687651, @ayounsi wrote: >> Where are the labsdb hosts going to live if they are being moved out...
[02:43:58] (CR) Milimetric: Handle null name values in top metrics from UI (6 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/468964 (https://phabricator.wikimedia.org/T206968) (owner: Fdans)
[03:56:59] bearloga: that causaleffect video was real nice, i think joal will like it: https://www.youtube.com/watch?v=GTgZfCltMm8
[05:48:47] morning!
[05:48:54] seems that the new camus version worked fine
[06:11:22] Analytics, Analytics-Kanban, Patch-For-Review: eventlogging logs taking a huge amount of space on eventlog1002 and stat1005 - https://phabricator.wikimedia.org/T206542 (elukey) We finally deployed the new camus eventlogging-client-side job that dumps raw eventlogging data to HDFS periodically. The re...
[06:21:28] (PS1) Elukey: Upgrade to 1.8.1 [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/469165 (https://phabricator.wikimedia.org/T197276)
[06:34:10] from my deep ignorance, I have no idea what version of node I should use to upgrade --^
[06:34:39] does it change anything if I run npm install from 6.4 rather than say 6.11 (that is what we have in prod for turnilo?)
[06:38:16] PROBLEM - YARN NodeManager Node-State on analytics1068 is CRITICAL: CRITICAL: YARN NodeManager analytics1068.eqiad.wmnet:8041 Node-State: Could not find the node report for node id : analytics1068.eqiad.wmnet:8041
[06:38:49] this is me, working on it --^
[06:39:56] Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=analytics1001.eqiad.wmnet/10.64.36.118:8031]
[06:40:00] lol
[06:40:05] this is the host that needed the motherboard replacement
[06:40:36] RECOVERY - YARN NodeManager Node-State on analytics1068 is OK: OK: YARN NodeManager analytics1068.eqiad.wmnet:8041 Node-State: RUNNING
[06:41:25] the datanode is different, it kinda tries over and over to contact the old masters
[06:41:35] RECOVERY - Hadoop NodeManager on analytics1068 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[06:42:05] !log restart yarn and hdfs daemon on analytics1068 to pick up correct config (the host was down since before we swapped the Hadoop masters due to hw failures)
[06:42:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:21:08] I have cherry picked the new turnilo to labs
[07:21:15] and it seems working fine with Druid 0.12.3
[07:28:53] so joal I think that Druid 0.12.3 is ready for prime time, but let's check together just to be sure :)
[08:11:39] Analytics, Analytics-Kanban, Patch-For-Review, Performance-Team (Radar): Explore NavigationTiming by faceted properties - EventLogging refine - https://phabricator.wikimedia.org/T166414 (Gilles) Looks great! Already I'm finding interesting facts about Chrome 69 vs Chrome 70
[08:52:43] fdans: o/
[10:00:16] helloooteamm
[10:02:06] Hi folks :)
[10:02:54] helloooo
[10:06:18] joal: Bonjour! Whenever you have time let's chat about our dear friend druid
[10:06:54] Let's do that when you want elukey :)
[10:07:14] if you have time even now!
[10:07:23] now it is :)
[10:07:41] batcave?
[10:08:44] even in here it is fine
[10:08:50] as you wish :)
[10:09:09] sooo the indexation for webrequest seems running fine now, no more weird errors
[10:09:20] \o/
[10:09:44] and I prepared https://gerrit.wikimedia.org/r/469165 this morning to upgrade turnilo to 1.8.1, but not sure if everything is correct
[10:10:00] I cherry picked it on the turnilo host in analytics-labs and it seems working fine
[10:10:10] ssh -L 9091:turnilo.eqiad.wmflabs:9091 turnilo.eqiad.wmflabs
[10:11:31] metrics seems working fine
[10:12:20] the remaining thing to do in my opinion is to review the list of changes between 0.11 -> 0.12.3 just to be sure that we are not missing anything critical
[10:12:27] config upgrade, etc..
[10:12:44] also I am not sure if the caffeine cache needs some tuning
[10:12:50] or if we can rely on its defaults
[10:13:50] I think that's your subconsciousness asking for another coffee elukey
[10:14:45] elukey: My tests of turnilo in labs looks good so far
[10:17:38] joal, elukey, qq (for after your discussion): do you know where does the Turnilo metric "Count" come from? is it added by default by Turnilo itself?
[10:19:16] mforns: I think it is
[10:20:39] list of changes
[10:20:41] https://github.com/apache/incubator-druid/releases/tag/druid-0.12.3
[10:20:42] https://github.com/apache/incubator-druid/releases/tag/druid-0.12.2
[10:20:44] https://github.com/apache/incubator-druid/releases/tag/druid-0.12.1
[10:21:01] https://github.com/apache/incubator-druid/releases/tag/druid-0.12.0
[10:21:05] https://github.com/apache/incubator-druid/releases/tag/druid-0.11.0
[10:24:20] the last one is not really needed :)
[10:25:11] :)
[10:33:47] joal: do you think that there are other things that we'd need to check?
[10:34:00] the upgrade procedure seems very easy and standard
[10:41:31] elukey: from what I have read, it seems most of the impacting changes are on the kafka-ingestion side, which we need to test
[10:41:41] so I think we're good :)
[10:53:25] joal: also another n00b question - is https://gerrit.wikimedia.org/r/469165 ok? npm install on 6.4 is ok for 6.11?
[10:55:01] elukey: I don't know :(
[10:55:19] elukey: I think we'd rather confirm with our JSers - fdans? any opinion?
[11:00:17] all right :)
[11:00:45] joal: are we good to schedule a druid upgrade for private? Maybe Thursday morning?
[11:01:10] +1 elukey !
[11:01:34] (I would even do it later on in the afternoon if somebody confirms the turnilo question)
[11:01:50] (since it is a rolling upgrade)
[11:02:01] (and thursday public :)
[11:03:22] Works for me elukey
[11:04:27] elukey: I use npm and node but really I have no clue if a version-number diff makes a big difference in downloaded packages
[11:05:38] IN THEORY it shouldn't, especially since 6.4 vs 6.11 is not really a major version change
[11:05:41] buuuut not sure :)
[11:05:49] I didn't find any specific warning in the readme
[11:06:01] will add it one once a js expert confirms :)
[11:06:14] elukey: I have the same feeling, but sometimes computers don't care about my feelings (how dare they?)
[11:10:07] ahahaha
[12:13:56] elukey: I have managed to solve one issue with parquet-logs in hive
[12:16:19] joal: ah I wasn't aware that there was an issue
[12:16:58] elukey: some queries have parquet logs in the results
[12:17:09] elukey: I have a way to prevent those logs to be present
[12:17:15] nice!
[12:17:48] elukey: not sure you're gonna like it though: We'd need to update the hive log4j file I think
[12:18:13] joal: that doesn't seem bad :D
[12:18:19] no?
[12:18:39] elukey: not sure if updating it will work, or if we'll need another file :(
[12:18:47] elukey: Can we try?
[12:19:15] elukey: try = stop puppet on stat1004, having me modify the hive-logging file and test
[12:21:38] Arf actually elukey - I'm sure we need to add a new logging conf file, and also update the hive launching script (see https://issues.apache.org/jira/browse/HIVE-13954)
[12:23:14] can we test in labs?
[12:23:50] hm, I have not tried to replicate in labs
[12:24:08] trying !
[12:24:32] if it is possible we'll be able to test in there without affecting anybody
[12:24:38] +1!
[12:25:04] otherwise let's do it on a stat host
[12:25:16] if you don't manage to repro in say 10/15 mins
[12:25:53] don't want that you waste a ton of time on this, it should not be a big deal to test in prod
[12:25:56] :)
[12:27:58] elukey: seems reproducible in labs, we can test :)
[12:29:36] elukey: on hadoop-worker-1 in labs: hive -S -e "select uri_host, uri_path from wmf.webrequest where webrequest_source = 'text' and year = 2018 and month = 5;" | grep parquet
[12:30:00] Those lines showing up in query response instead of going to STDERR
[12:30:33] nice!
[12:31:34] elukey: I'm assuming the changes should be done on cdh puppet submodule?
[12:32:17] Analytics, Operations, ops-eqiad: analytics1068 doesn't boot - https://phabricator.wikimedia.org/T203244 (Cmjohnson) @elukey analytics1068 is back up and running. Please resolve this task if everything looks good to you.
[12:32:37] Welcome back an1068!
[12:34:48] joal: it was back this morning :)
[12:35:01] joal: what file did you change?
[12:35:12] elukey: it's actually trickier :(
[12:36:10] elukey: We'd need to add another logging file (parquet-logging.properties for instance), and add "-Djava.util.logging.config.file=$bin/../conf/parquet-logging.properties" to the hive bin script (or have a wrapper)
[12:36:36] elukey: Hive uses log4j by default, and parquet uses java.util.logging :(
[12:36:56] Analytics, Operations, ops-eqiad: analytics1068 doesn't boot - https://phabricator.wikimedia.org/T203244 (elukey) @Cmjohnson thanks a lot! I noticed it this morning and fixed it, it was running with a old/stale config (and failing, so no big deal). Can we sync (either with me or Andrew) next time bef...
[12:37:19] elukey: the problem we're solving here is solved in newer versions of hive (see issue above)
[12:37:43] * joal is angry at CDH for not releasing updates
[12:38:28] not sure if hive is managed by puppet though, I think it is part of the deb package (so we'll need a wrapper)
[12:38:37] :(
[12:38:52] we do that for beeline, why not hive you'll tell me
[12:51:41] sorry I was reviewing another change, going to see in puppet
[12:52:08] elukey@stat1004:~$ dpkg -S /usr/bin/hive
[12:52:09] hive: /usr/bin/hive
[12:52:12] joal: --^
[12:52:41] elukey: I'm assuming this means it's taken care of by deb package, right?
[12:53:07] yes yes it is a dpkg feature, it searches for a specific file path into what is provided by debs
[12:53:36] right
[12:53:40] mwarf
[12:54:16] dumb question elukey: If you have /usr/bin/hive and /usr/local/bin/hive, which one do you prefer?
[12:54:26] Actually not you, but the shell :)
[12:55:24] it depends on PATH IIRC
[12:55:31] elukey: I assume so
[12:55:47] in labs, local first
[12:55:49] PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
[12:55:51] yeahh
[12:55:59] makes sense
[12:55:59] and it makes sense
[12:56:03] indeed :)
[12:56:10] we could override hive in there
[12:56:22] do you want to try with a basic script in /usr/local/bin ?
[12:56:34] elukey: could we override HADOOP_CLIENT_OPTS instead?
[12:56:52] elukey: just thinking aloud
[12:56:58] nono please go ahead
[12:57:01] in hive-env?
[12:57:06] elukey: cause that's what hive-patch does
[12:57:10] yes
[12:57:40] ahhh okok!
[12:57:49] hive-env is managed by us
[12:57:50] elukey: seems easy enough to test
[12:57:51] so good
[12:58:35] elukey: if you agree, I'll create a logging-config file in labs, and modify hive-env.sh to reference it in HADOOP_CLIENT_OPTS var
[12:58:39] And test hive
[13:05:40] oh yes please do
[13:05:56] elukey: do I need to stop puppet?
[13:08:59] yep
[13:09:39] elukey: puppet-agent --disable ?
[13:09:51] * joal is so ashamed of not even knowing that ....
[13:11:30] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (Ottomata) Ok, great, then it sounds like this should go in the public VLAN, with ACLs in the Analytics VLAN to allow us t...
[13:19:13] elukey, hdfs user on an-coord1001 has some cron jobs in its crontab that should not be there, they were added by a previous version of EL2Druid profile, but not needed any more, how is the process to remove them?
[13:19:13] joal: you can always test which one is preferred for your current setup by running "which hive"
[13:19:16] Analytics, Analytics-Data-Quality, Contributors-Analysis, Product-Analytics, Growth-Team (Current Sprint): Resume refinement of edit events in Data Lake - https://phabricator.wikimedia.org/T202348 (Ottomata) Since SQL is case insensitive, we recommend using snake_case rather than camelCase....
[13:19:20] just crontab -e and delete?
[13:20:32] mforns: yes exactly, puppet does not clean up resources unless you explicitly tell it (with ensure => absent)
[13:20:46] mforns: puppet will install crons with a # comment with the cron resource name
[13:20:53] so you should delete the comment with it too
[13:21:15] elukey, ottomata, cool, I will sudo -u hdfs crontab -e and delete them
[13:21:16] thanks
[13:23:30] (CR) Ottomata: [C: +1] "COOL thanks luca!" [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/469165 (https://phabricator.wikimedia.org/T197276) (owner: Elukey)
[13:24:07] ottomata: \o/
[13:24:16] do you think that using npm install with 6.4 is ok ?
[13:24:20] (node I mena)
[13:24:24] *mean
[13:24:41] it is cherry picked in turnilo.eqiad.wmflabs if you want to test it!
[13:25:09] ottomata, there are 3 files associated with each one of the deleted cron jobs: 1) command script 2) properties file 3) log file. Will delete all of them for the monthly jobs, is that right?
[13:27:32] hm, the script files belong to root, I can not remove
[13:27:49] elukey: recurse => true, is not a feature on directories i was aware of!!!!
[13:27:54] Analytics, Patch-For-Review: Many client side errors on citation data, significant percentages of data lost - https://phabricator.wikimedia.org/T206083 (bmansurov) @Nuria I'd appreciate your review of https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaEvents/+/468490/ before the branch cut...
[13:27:58] i might be too old how long has that been possible?!?!
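As explained above, Puppet does not remove a cron entry it no longer manages unless the resource is declared with `ensure => absent`, and it marks every entry it installs with a `# Puppet Name: <resource name>` comment. As a non-interactive alternative to `crontab -e`, here is a minimal POSIX sh/awk sketch that drops matching entries together with their marker comment. It assumes each entry is laid out as the marker line, then optional `VAR=value` environment lines, then the cron line itself. The helper name `drop_puppet_cron` and any name pattern you pass to it are hypothetical.

```shell
# drop_puppet_cron NAME_REGEX  (hypothetical helper)
# Reads a crontab on stdin and writes it to stdout without the entries whose
# "# Puppet Name: ..." marker matches NAME_REGEX.
drop_puppet_cron() {
  awk -v re="$1" '
    # Environment lines that follow a dropped marker belong to that entry.
    skip && /^[A-Za-z_][A-Za-z0-9_]*=/ { next }
    # The first other line after the marker is the cron entry itself.
    skip { skip = 0; next }
    /^# Puppet Name: / {
      name = substr($0, length("# Puppet Name: ") + 1)
      if (name ~ re) { skip = 1; next }
    }
    { print }
  '
}
```

Review the filtered output first, for example with `sudo -u hdfs crontab -l | drop_puppet_cron '_monthly$'`. Only then pipe the same output into `sudo -u hdfs crontab -`. The `_monthly$` pattern is just an example, because the actual resource names do not appear in the log.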
[13:28:12] ottomata: I have no idea, I found it somewhere in our puppet stuff :D
[13:28:34] properties files also belong to root
[13:28:43] Analytics, Discovery-Search (Current work), Patch-For-Review: Many client side errors on citation data, significant percentages of data lost - https://phabricator.wikimedia.org/T206083 (bmansurov)
[13:28:55] mforns: k looking
[13:29:13] elukey: hm, i dunno if it will be fine or not, i guess so? why not 6.11?
[13:29:28] ottomata: because I have 6.4 on my mac :D
[13:29:34] i usually do the npm stuff on a labs stretch instance that has the same version
[13:29:38] yeahhh that won't work
[13:29:41] you should do it on the same OS
[13:29:46] i mean, it might work
[13:29:52] ack let's not risk it then
[13:29:53] any compiled deps also get frozen
[13:29:55] lemme re do it
[13:29:56] we should add that to the readme
[13:30:03] yep yep will take care of it
[13:30:09] k thanks
[13:30:19] (even if it is JS!)
[13:30:20] :P
[13:30:31] it would def not work for e.g. eventstreams
[13:30:34] since it uses node-rdkafka
[13:30:40] which uses librdkafka
[13:30:47] there might be other JS things like that too
[13:30:52] that are C extensions
[13:31:18] (Abandoned) Elukey: Upgrade to 1.8.1 [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/469165 (https://phabricator.wikimedia.org/T197276) (owner: Elukey)
[13:31:21] mforns: i think we could chown those files hdfs in puppet if we wanna
[13:31:24] but i'll delete for now
[13:31:25] which ones?
[13:31:39] ottomata, listing
[13:31:41] all the monthly ones?
[13:32:11] /usr/local/bin/eventlogging_to_druid_navigationtiming_monthly
[13:32:20] /usr/local/bin/eventlogging_to_druid_readingdepth_monthly
[13:32:26] /usr/local/bin/eventlogging_to_druid_pageissues_monthly
[13:32:29] and
[13:33:02] /etc/refinery/eventlogging_to_druid/eventlogging_to_druid_navigationtiming_monthly.properties
[13:33:09] /etc/refinery/eventlogging_to_druid/eventlogging_to_druid_readingdepth_monthly.properties
[13:33:15] /etc/refinery/eventlogging_to_druid/eventlogging_to_druid_pageissues_monthly.properties
[13:37:28] elukey: actually, i just made jessietest-1 in analytics labs project
[13:37:32] yesterday for testing some stuff I have
[13:37:32] OHHHH
[13:37:34] sorry it is jessie
[13:37:35] NM!
[13:37:41] ok
[13:38:17] mforns: done
[13:38:24] :D thank you!
[13:39:01] ottomata: I am trying to find npm on turnilo's stretch node in labs
[13:39:07] but I am a n00b and can't find it
[13:39:23] so it is not packaged with nodejs afacis
[13:39:28] *afaics
[13:39:35] * elukey hates nodejs
[13:41:15] yeah.............>>>>>>
[13:41:18] how did I do this
[13:41:24] i might have done it on my vagrant
[13:41:28] but how did I get npm there...looking
[13:41:34] elukey: i know you can add the nodejs apt source
[13:41:38] buuut might be a better way
[13:42:00] ah yup
[13:42:04] https://packages.debian.org/jessie/npm is there
[13:42:07] but not on stretch!!!
[13:42:12] elukey: i did it in vagrant,
[13:42:17] apt::repository { 'nodesource':
[13:42:17] uri => 'https://deb.nodesource.com/node_6.x',
[13:42:26] yeah that is what I found as well
[13:42:29] uff
[13:42:30] https://nodejs.org/en/download/package-manager/#debian-and-ubuntu-based-linux-distributions
[13:42:34] i just do this manually then
[13:42:47] wget -qO- https://deb.nodesource.com/setup_6.x | sudo -E bash -
[13:42:47] sudo apt-get install -y nodejs npm
[13:43:33] yep yep I found that bit as well
[13:44:38] elukey: SUCCESS!
[13:45:22] elukey: I have added a parquet-logging file (copy from the hive patch), updated hive-env.sh --> no more polluting logs :)
[13:45:56] Thanks moritzm for the trick :)
[13:46:28] wow eh?
[13:48:35] Analytics, Project-Admins: Create project for SWAP - https://phabricator.wikimedia.org/T207425 (Aklapper) Open>Resolved a:Aklapper Requested public project #Analytics-SWAP has been created: https://phabricator.wikimedia.org/project/view/3711/ I've also adjusted H126 to add #Analytics if not...
[13:54:16] (PS1) Elukey: Upgrade to upstream 1.8.1 version [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/469198 (https://phabricator.wikimedia.org/T197276)
[13:54:43] ottomata: --^
[13:55:47] Analytics, Analytics-Kanban, Analytics-SWAP: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (Aklapper)
[13:55:49] Analytics, Analytics-SWAP: Jupyter Notebooks TLC 2018-2019 - https://phabricator.wikimedia.org/T188275 (Aklapper)
[13:55:51] Analytics, Analytics-SWAP: Functionality to share & view SWAP notebooks - https://phabricator.wikimedia.org/T156934 (Aklapper)
[13:55:53] Analytics, Analytics-SWAP: RStudio web version on SWAP - https://phabricator.wikimedia.org/T180270 (Aklapper)
[13:56:19] ottomata: just cherry picked on turnilo.eqiad.wmflabs, seems working
[13:56:58] joal: can you re-test turnilo plz? :)
[13:58:07] I surely can elukey
[13:58:39] this time generated via nodejs 6.14 on stretch
[13:58:43] that is better probably
[13:59:16] as Andrew was saying the issue that can arise is if a module needs to compile stuff
[13:59:18] elukey: faster (could be druid related)
[13:59:37] elukey: functionally everything I have tried works
[13:59:51] super!
[14:00:08] elukey: And the timegranularity manual setting will be super useful :)
[14:00:23] elukey: How shall I proceed with the hive logging stuff?
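The approach described above keeps the packaged `/usr/bin/hive` untouched. It adds a separate java.util.logging config for Parquet and references it from the puppet-managed `hive-env.sh` through `HADOOP_CLIENT_OPTS`. A rough sketch of that fragment follows; it is not the change that was actually deployed. The config path assumes CDH's usual `/etc/hive/conf` client directory, and `PARQUET_LOGGING_CONF` is a hypothetical override variable.

```shell
# hive-env.sh fragment (sketch). Parquet logs through java.util.logging rather
# than log4j, so the client JVM must be pointed at a separate JUL config file.
# Assumed location; adjust to wherever puppet installs the file.
PARQUET_LOGGING_CONF="${PARQUET_LOGGING_CONF:-/etc/hive/conf/parquet-logging.properties}"

# Append to, rather than replace, any client options that are already set;
# the hadoop launcher adds HADOOP_CLIENT_OPTS to the client JVM's options.
export HADOOP_CLIENT_OPTS="${HADOOP_CLIENT_OPTS:+$HADOOP_CLIENT_OPTS }-Djava.util.logging.config.file=${PARQUET_LOGGING_CONF}"
```

Because `dpkg -S /usr/bin/hive` shows that the launcher ships in the CDH deb, setting the flag in `hive-env.sh` avoids maintaining a `/usr/local/bin/hive` wrapper that would shadow it via `PATH`.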
[14:00:25] Analytics, Analytics-Kanban, Patch-For-Review: Finalize eventlogging to druid ingestion with a whitelist instead of a blacklist - https://phabricator.wikimedia.org/T206342 (mforns)
[14:01:05] joal: two ways 1) you tell me what I need to change and I'll file the puppet change or you do it and I merge :)
[14:01:30] elukey: I'll try to do it myself, but it means I'll have plenty questions :)
[14:01:35] Analytics, Analytics-Kanban, Patch-For-Review, Performance-Team (Radar): Explore NavigationTiming by faceted properties - EventLogging refine - https://phabricator.wikimedia.org/T166414 (mforns) I backfilled the last 3 months of data. This is now productionized! Data will continue to be imported...
[14:01:50] joal: it is fine if you want me to do it!
[14:02:15] Analytics, Analytics-Kanban, Page-Issue-Warnings, Product-Analytics, and 3 others: Ingest data from PageIssues EventLogging schema into Druid - https://phabricator.wikimedia.org/T202751 (mforns) I backfilled the last 3 months of data. This is now productionized! Data will continue to be imported...
[14:02:23] Analytics, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Ingest data into druid for readingDepth schema - https://phabricator.wikimedia.org/T205562 (mforns) I backfilled the last 3 months of data. This is now productionized! Data will continue to be imported automatically ev...
[14:03:04] Analytics, Analytics-Kanban, Patch-For-Review: Finalize eventlogging to druid ingestion - https://phabricator.wikimedia.org/T206342 (mforns)
[14:03:16] elukey: I'll try, and if it's taking me too long, I'll get back to you :)
[14:03:34] also elukey, thanks a lot for oncall doc for timers :)
[14:03:58] still need more things! but I hope that it will be enough for the moment
[14:12:05] (CR) Ottomata: [C: +1] Upgrade to upstream 1.8.1 version [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/469198 (https://phabricator.wikimedia.org/T197276) (owner: Elukey)
[14:23:38] Analytics, Analytics-Dashiki, Google-Code-in-2018, goodfirstbug: Add external link to tabs layout - https://phabricator.wikimedia.org/T146774 (Aklapper) (Would some mentor this in #GCI-2018, if that's in scope? If so, providing some more pointers in the task description for an absolute newcomer w...
[14:24:42] (CR) Mforns: Add change_tag to mediawiki_history sqoop (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/465416 (owner: Fdans)
[14:41:00] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (bd808) >>! In T207321#4687656, @Krenair wrote: >>>! In T207321#4687651, @ayounsi wrote: >>> Where are the labsdb hosts go...
[14:42:28] ottomata: applied the /srv/log/eventlogging/systemd fix to both prod and beta, looks good
[14:43:17] great!
[14:46:38] Analytics, Operations, ops-eqiad: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T206915 (jijiki)
[14:47:09] a-team: I have workers at home and I'd need to follow up with them, is it a problem if I skip standup and send e-scrum instead?
[14:47:32] not on my side!
[14:53:49] all right sent e-scrum
[14:57:01] (CR) Elukey: "https://gerrit.wikimedia.org/r/#/c/analytics/turnilo/deploy/+/469198/" [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/469165 (https://phabricator.wikimedia.org/T197276) (owner: Elukey)
[14:57:42] (CR) Mforns: [C: +1] "LGTM! Left a comment, but it is probably not an issue." (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[15:00:22] ping ottomata fdans
[15:00:37] ping elukey
[15:01:04] nuria: I’m off today! :)
[15:01:13] fdans: ah ok
[15:01:18] nuria: sent e-scrum, I have some workers at home :(
[15:03:15] Analytics, Analytics-Dashiki, Google-Code-in-2018, goodfirstbug: Add external link to tabs layout - https://phabricator.wikimedia.org/T146774 (Milimetric) @Aklapper I'm happy to mentor this, I'll make the description amazing :) (but after my meetings today)
[15:05:40] Analytics, Android-app-Bugs, Product-Analytics, Reading-analysis, and 5 others: Many errors on ReadingDepth.enable (?) schema - https://phabricator.wikimedia.org/T207423 (Nuria) Is this getting deployed today? errors continue to be quite high: https://grafana.wikimedia.org/dashboard/db/eventloggi...
[15:07:52] (CR) Milimetric: [C: +2] Replace literal "anonymous editor" with null [analytics/aqs] - https://gerrit.wikimedia.org/r/468927 (https://phabricator.wikimedia.org/T206968) (owner: Fdans)
[15:10:40] Analytics, Analytics-Kanban, Performance-Team (Radar): Possible statsv corruption? - https://phabricator.wikimedia.org/T189530 (Ottomata) a:Ottomata
[15:11:55] (CR) Joal: [C: +1] "LGTM :) Let's also make a PR for restbase doc please" [analytics/aqs] - https://gerrit.wikimedia.org/r/468927 (https://phabricator.wikimedia.org/T206968) (owner: Fdans)
[15:15:24] (CR) Joal: [V: +1] Add oozie job partitioning webrequest subset (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[15:16:22] mforns: I answered to your comment there --^ Basically i don't know why we do not use the "+0" pattern in the year field :(
[15:47:35] Analytics, Operations: setup/install barium/WMF4750 as oxygen replacement - https://phabricator.wikimedia.org/T207760 (RobH) p:Triage>Normal
[15:48:25] Analytics, Operations, hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264 (RobH) Open>Resolved Created sub-task T207760 for setup.
[16:02:37] Analytics, Android-app-Bugs, Product-Analytics, Reading-analysis, and 5 others: Many errors on ReadingDepth.enable (?) schema - https://phabricator.wikimedia.org/T207423 (Jdlrobson) @nuria @Milimetric this is deployed but alas if clients have cached old javascript they will continue to trigger th...
[16:22:01] Analytics, Android-app-Bugs, Product-Analytics, Reading-analysis, and 5 others: Many errors on ReadingDepth.enable (?) schema - https://phabricator.wikimedia.org/T207423 (Nuria) @Jdlrobson When was this deployed to all wikis? there is no apparent reduction of errors so either all clients have ca...
[16:25:54] !log altering topic eventlogging_ReadingDepth to increase partitions from 1 to 12
[16:25:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:28:49] Analytics, Operations: setup/install weblog1001/WMF4750 as oxygen replacement - https://phabricator.wikimedia.org/T207760 (RobH)
[16:29:36] * joal loves kafka and grafana
[16:29:38] https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&var-schema=ReadingDepth&from=now-3h&to=now
[16:33:35] (PS1) GoranSMilovanovic: Init [analytics/wmde/Wiktionary/WD_percentUsageDashboard] - https://gerrit.wikimedia.org/r/469220
[16:34:09] (CR) GoranSMilovanovic: [V: +2 C: +2] Init [analytics/wmde/Wiktionary/WD_percentUsageDashboard] - https://gerrit.wikimedia.org/r/469220 (owner: GoranSMilovanovic)
[16:36:11] joal: very nice :)
[16:37:15] * elukey off!
[16:39:20] Analytics: Make sure webrequest_text preferred partition leadership is balanced - https://phabricator.wikimedia.org/T207768 (Ottomata)
[16:54:25] Analytics: Make sure webrequest_text preferred partition leadership is balanced - https://phabricator.wikimedia.org/T207768 (Nuria)
[16:56:01] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Implement EventLogging Hive refinement - https://phabricator.wikimedia.org/T162610 (Nuria)
[16:56:05] Analytics, Analytics-Kanban, Patch-For-Review, Performance-Team (Radar): Explore NavigationTiming by faceted properties - EventLogging refine - https://phabricator.wikimedia.org/T166414 (Nuria) Open>Resolved
[16:58:38] Quarry, Cloud-Services, cloud-services-team (Kanban): Migrate 'Quarry' project to eqiad1 - https://phabricator.wikimedia.org/T207677 (Framawiki) I see: - setup a maintenance message few days before (well-named var in /srv/quarry/config.yaml then restart web service) - when no queries are running nor...
[17:08:13] Quarry, Documentation: admin docs: quarry - https://phabricator.wikimedia.org/T206710 (Framawiki) Hello @aborrero and thanks for trying to take care of it. It was in my to-do list. Is it a doc from/for the team ? or maintainers (me and @zhuyifei1999) can/are invited to edit it, to avoid duplicating this...
[17:08:26] Quarry, Cloud-Services, cloud-services-team (Kanban): Migrate 'Quarry' project to eqiad1 - https://phabricator.wikimedia.org/T207677 (zhuyifei1999) >>! In T207677#4689292, @Framawiki wrote: > @zhuyifei1999 can you add the sql query that you've used in last deployment window to the doc ? ``` FLUSH TAB...
[17:17:26] Analytics, Operations, Patch-For-Review: setup/install weblog1001/WMF4750 as oxygen replacement - https://phabricator.wikimedia.org/T207760 (RobH) a:RobH>Cmjohnson So, I'm not sure what port this is on, we'll need @cmjohnson to trace the cable and update the network switch (or atleast this ta...
[17:17:36] Analytics, Operations, ops-eqiad: setup/install weblog1001/WMF4750 as oxygen replacement - https://phabricator.wikimedia.org/T207760 (RobH)
[17:19:19] Analytics, Android-app-Bugs, Product-Analytics, Reading-analysis, and 5 others: Many errors on ReadingDepth.enable (?) schema - https://phabricator.wikimedia.org/T207423 (Tbayer) >>! In T207423#4687298, @Tbayer wrote: > Once the firefighting is done, could someone please spell out the implication...
[17:24:25] Quarry, Documentation: admin docs: quarry - https://phabricator.wikimedia.org/T206710 (aborrero) >>! In T206710#4689351, @Framawiki wrote: > Hello @aborrero and thanks for trying to take care of it. It was in my to-do list. > Is it a doc from/for the team ? or maintainers (me and @zhuyifei1999) can/are i...
[17:24:42] Quarry, Cloud-Services, cloud-services-team (Kanban): Migrate 'Quarry' project to eqiad1 - https://phabricator.wikimedia.org/T207677 (Andrew) Can I ask one of you to put up the maintenance message and suggest a window for this move? Anytime during US work hours (let's say after 14:00 UTC) will suit...
[17:29:58] Analytics: Many client side errors on citation data, significant percentages of data lost - https://phabricator.wikimedia.org/T206083 (bmansurov) Thanks, @Nuria!
[17:52:37] Quarry, Documentation: admin docs: quarry - https://phabricator.wikimedia.org/T206710 (Framawiki) Oh, I didn't know "Data Services" was "offering" service to the movement. It can be great to specify in the [[ https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry | project page ]] and its footer that i...
[18:15:51] Analytics, Android-app-Bugs, Product-Analytics, Reading-analysis, and 5 others: Many errors on ReadingDepth.enable (?) schema - https://phabricator.wikimedia.org/T207423 (Jdlrobson)
[18:16:16] nuria: i messed up the patch yesterday. On it ^
[18:19:47] Analytics, Android-app-Bugs, Product-Analytics, Reading-analysis, and 5 others: Many errors on ReadingDepth.enable (?) schema - https://phabricator.wikimedia.org/T207423 (Jdlrobson) There was a problem with the original SWAT. I've marked this as a deployment blocker @twentyafterfour We should ge...
[18:24:45] Analytics, Analytics-Kanban, Performance-Team (Radar): Possible statsv corruption? - https://phabricator.wikimedia.org/T189530 (Ottomata) Wow huh, I don't know what could cause this, but as far as I can tell, it isn't happening anymore. Any objections to just closing?
[18:34:49] Analytics, Operations, hardware-requests, User-Elukey: eqiad | (3) Labs Data Lake hardware - https://phabricator.wikimedia.org/T199674 (RobH) Please note this has been ordered as part of T204177.
[18:35:00] Analytics, Operations, hardware-requests, User-Elukey: eqiad | (14 + 6) hadoop hardware refresh and expansion - https://phabricator.wikimedia.org/T199673 (RobH) Please note this has been ordered as part of T204177.
[18:40:18] ottomata: I have a question regarding hive logging
[18:40:22] ottomata: do you have a minute?
[18:40:26] ya
[18:40:55] Thanks ottomata - The current logging conf uses log4j, and logs hive file in /tmp/{user}
[18:41:14] the logging we're trying to get rid of uses java.util.logging
[18:41:17] The solution I
[18:41:33] have found is to provide a config for this logging system
[18:42:02] Now the concern: java.util.logging doesn't have a way to access username through config :(
[18:43:03] I can put the file in /tmp/ directly, or in user.home, but in /tmp/{user}
[18:43:14] but NOT in /tmp/{user}
[18:44:36] Should I use /tmp and add a unique identifier to the log file to prevent conflicts, and possibly I could also raise the logging level for parquet to SEVERE
[18:44:37] what is the file?
[18:44:45] parquet-related info
[18:44:58] raising the log level seems not a bad idea
[18:45:07] will the user running hive own the file in /tmp?
[18:45:28] essir
[18:45:32] Yes
[18:45:32] can you name it something identifiable, like /tmp/hive-user-logs-XXXXX
[18:45:33] ?
[18:45:37] Analytics, Analytics-Data-Quality, Contributors-Analysis, Product-Analytics, Growth-Team (Current Sprint): Resume refinement of edit events in Data Lake - https://phabricator.wikimedia.org/T202348 (Jdlrobson) @nettrom_WMF to support joining data with https://meta.wikimedia.org/wiki/Schema:Rea...
[18:45:41] or something?
[18:45:41] Not access to username
[18:45:44] that's fine
[18:45:45] :(
[18:45:50] but a unique XXXX like you say?
[18:45:54] Yes
[18:45:57] i think that's fine
[18:46:00] It'll be a hell of files
[18:46:04] isn't it already?
[18:46:05] ohhh
[18:46:12] can you do /tmp/hive-user/logs/XXXX ?
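About the naming question: the `FileHandler` pattern in java.util.logging has no token for the user name. It does expand `%t` (the system temp directory), `%h` (the `user.home` property) and `%u` (a unique number chosen to resolve conflicts between processes). `%u` fits the "/tmp plus a unique identifier" idea. Below is a minimal sketch that writes such a config. The logger names are assumptions: older Parquet releases log under `parquet`, Apache releases under `org.apache.parquet`. The directory `/tmp/hive-parquet-logs`, the output file name and `PARQUET_LOGGING_CONF` are placeholders.

```shell
# Sketch: generate a java.util.logging config that sends Parquet's records to a
# file under /tmp instead of the console. Paths and logger names are assumptions.
conf="${PARQUET_LOGGING_CONF:-./parquet-logging.properties}"

# FileHandler does not create missing directories; make the shared directory
# world-writable with the sticky bit, like /tmp itself.
mkdir -p -m 1777 /tmp/hive-parquet-logs

cat > "$conf" <<'EOF'
# Attach a FileHandler to the Parquet loggers (old and new package names).
parquet.handlers=java.util.logging.FileHandler
org.apache.parquet.handlers=java.util.logging.FileHandler
# %u: unique number used when another process already holds the lock on a
# file with the same name; there is no token for the user name.
java.util.logging.FileHandler.pattern=/tmp/hive-parquet-logs/parquet-%u.log
# Only keep SEVERE records, as suggested in the discussion.
java.util.logging.FileHandler.level=SEVERE
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
EOF
```

Check one thing in labs before rolling this out. With the default `append=false`, a new process takes the lowest free `%u` and overwrites that file, which keeps the directory from growing. However, a file created by one user may then not be writable by the next user. Pointing the pattern at `%h` instead (for example `%h/.hive-parquet-%u.log`) avoids that, at the cost of writing into home directories.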
[18:46:20] probably
[18:46:23] I'll try
[18:46:26] /tmp/hive-user-logs/XXXX
[18:46:26] ?
[18:46:28] Thanks
[18:46:32] dunno if that is the best name
[18:46:35] but something like that
[18:46:47] Maybe even: /tmp/hive-parquet-logs/XXX
[18:47:00] Testing that
[18:48:45] aye
[19:16:53] milimetric: yt?
[19:17:08] hey, yeah
[19:20:35] ottomata: ^
[19:20:50] hey
[19:21:05] i'm putting some brain cycles into naming stream intake service
[19:21:10] wanna have some fun or do it another time?
[19:22:56] I’m pretty braindead at the moment, but I trust you! And maybe I’ll chime in later
[19:24:34] haha
[19:24:35] k
[19:24:37] https://etherpad.wikimedia.org/p/event-platform
[19:24:48] kinda like Turnstile, but Siphon and Bivalve are not bad...:p
[19:44:21] errors should be back to normal
[19:44:26] https://grafana.wikimedia.org/dashboard/db/reading-web-dashboard?orgId=1&panelId=16&fullscreen&from=now%2Fd&to=now
[19:45:04] RECOVERY - Throughput of EventLogging EventError events on einsteinium is OK: (C)30 ge (W)20 ge 19.76 https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1
[19:45:38] thanks jdlrobson!
[19:58:57] to stop this happening again it's on our dashboard which we check daily
[19:59:04] so we'll notice any spikes relating to any schema changes we make
[21:42:44] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (ayounsi) Not impacting that task, but for labsdb10[08|09|10], the presence of sensitive data + need to be reached from Cl...
[21:50:13] jdlrobson: great, super thanks
[21:53:43] Analytics, Android-app-Bugs, Product-Analytics, Reading-analysis, and 5 others: Many errors on ReadingDepth.enable (?) schema - https://phabricator.wikimedia.org/T207423 (Nuria) Open>Resolved
[21:55:35] ottomata: back from salesforce presentation
[22:02:15] ottomata: nice, so this is the effect of changing partitions? https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&var-schema=ReadingDepth
[22:05:48] ottomata: it seems that all high volume are processed by 1002, right? https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&var-schema=VirtualPageView
[22:06:02] ottomata: https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&var-schema=Print
[22:11:18] (CR) Nuria: [C: 1] Upgrade to upstream 1.8.1 version [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/469198 (https://phabricator.wikimedia.org/T197276) (owner: Elukey)
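For reference, here is a minimal sketch of the java.util.logging (JUL) configuration discussed between 18:40 and 18:48. It assumes the parquet output goes through the standard JUL LogManager and FileHandler. The file name, size limit, and logger names are illustrative assumptions, not the config that was actually tested or deployed. JUL's FileHandler pattern has no username placeholder. It only offers %t (temp dir), %h (user.home), %g (rotation generation) and %u (a unique number), which matches the limitation raised in the log. Of these, %u is the closest built-in fit for the "unique XXXX" idea.

```properties
# Hypothetical logging.properties for the JUL output parquet emits inside Hive.
# Loaded via the standard JVM flag:
#   -Djava.util.logging.config.file=/path/to/this/logging.properties
# Root logger handlers: all JUL output goes to the file below, not the console.
handlers = java.util.logging.FileHandler

# %t = java.io.tmpdir (usually /tmp); %u = unique number chosen via .lck lock
# files so that concurrently running processes do not share a file.
# FileHandler does not create missing directories, so /tmp/hive-parquet-logs
# would have to exist already and be writable by every Hive user.
java.util.logging.FileHandler.pattern = %t/hive-parquet-logs/parquet-%u.log
# Cap at 10 MiB (value in bytes); with count = 1 the single file is restarted.
java.util.logging.FileHandler.limit = 10485760
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter

# Raise parquet's loggers to SEVERE, as suggested in the log. Which name applies
# depends on the bundled Parquet version (assumption): "parquet" for the older
# package layout, "org.apache.parquet" for newer releases.
parquet.level = SEVERE
org.apache.parquet.level = SEVERE
```

One caveat: %u is only unique among processes running at the same time. Once a run ends, the next run, possibly by a different user, will pick parquet-0.log again. That file keeps its first owner's permissions, so it may not be writable by the next user. %h (user.home) sidesteps this, at the cost of leaving /tmp, which was the other option mentioned in the log.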