[00:00:07] PROBLEM - yarn.wikimedia.org HTTPS on analytics-tool1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster
[00:11:09] RECOVERY - yarn.wikimedia.org HTTPS on analytics-tool1001 is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.011 second response time https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster
[00:34:45] RECOVERY - Hue CherryPy python server on analytics-tool1001 is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue runcherrypyserver https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[01:22:33] PROBLEM - Hue CherryPy python server on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[01:22:51] PROBLEM - yarn.wikimedia.org HTTPS on analytics-tool1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster
[01:46:41] RECOVERY - yarn.wikimedia.org HTTPS on analytics-tool1001 is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.012 second response time https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster
[02:03:53] RECOVERY - Hue CherryPy python server on analytics-tool1001 is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue runcherrypyserver https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[03:28:01] PROBLEM - Hue CherryPy python server on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[03:34:47] PROBLEM - yarn.wikimedia.org HTTPS on analytics-tool1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster
[03:56:31] RECOVERY - Hue CherryPy python server on analytics-tool1001 is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue runcherrypyserver https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[03:56:57] RECOVERY - yarn.wikimedia.org HTTPS on analytics-tool1001 is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.008 second response time https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster
[05:53:03] good morning! :)
[06:23:39] (CR) Elukey: "Thanks a lot Joal, didn't think about it sorry :(" [analytics/refinery] - https://gerrit.wikimedia.org/r/533861 (https://phabricator.wikimedia.org/T231787) (owner: Joal)
[06:33:22] Analytics, Analytics-EventLogging, Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (elukey) @Nuria can you add more info to the description? What is the idea?
[06:35:14] Analytics, Operations, Traffic: varnishkafka statsv and webrequest crashed on cp1081 - https://phabricator.wikimedia.org/T231331 (elukey) I agree with Andrew, the issue seems to be a violation of an assert or similar in the Varnish libs, so unlikely related to a Varnishkafka bug (famous last words)....
[06:38:54] Analytics, Analytics-SWAP, Product-Analytics: Provide Python 3.6 on SWAP - https://phabricator.wikimedia.org/T212591 (elukey) >>! In T212591#5445917, @Neil_P._Quinn_WMF wrote: >>>! In T212591#5432055, @elukey wrote: >> As FYI we have now Python3.7 + libpython3.7 on notebooks: >> >> Caveat: since thos...
[06:40:54] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Yair_rand) Ideas to lower the number of potential problems: * Make the numbers less precise, somehow? Something like making each u...
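The yarn.wikimedia.org alerts above go CRITICAL when an HTTPS request gets no answer within 10 seconds, and back to OK once a response arrives (247 bytes here). A minimal sketch of that kind of probe, assuming only the Python standard library; the `probe` helper and its output strings are hypothetical and merely imitate the alert wording, they are not the actual Icinga check:

```python
import socket
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> tuple[str, str]:
    """Fetch `url` once and return (state, detail), imitating the alert text."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            elapsed = time.monotonic() - start
            return "OK", f"HTTP {resp.status} - {len(body)} bytes in {elapsed:.3f} second response time"
    except (socket.timeout, TimeoutError):
        # A server that accepts the connection but never answers ends up here.
        return "CRITICAL", f"Socket timeout after {timeout:g} seconds"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return "CRITICAL", f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Connect-phase failures (including connect timeouts) arrive wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "CRITICAL", f"Socket timeout after {timeout:g} seconds"
        return "CRITICAL", f"connection failed: {exc.reason}"
```

For example, `probe("https://yarn.wikimedia.org")` returns a state/detail pair that can be logged or compared over time.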
[06:48:02] hellooo luca welcome back
[06:54:13] o/
[07:00:18] Analytics, Analytics-Kanban: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (elukey) I think this work is great, but I am a bit on the fence on the root keytab. I would prefer to avoid any super user to log in via Kerberos from any of our hosts. Currently we can...
[07:10:13] Good morning team :)
[07:13:01] bonjour!
[07:13:32] Were holidays good elukey?
[07:14:29] yes! A lot of rain but I liked Scotland :)
[07:20:31] ita-irish knows rain :)
[07:22:04] :)
[07:22:22] I am checking the past alerts, did we figure out what was the problem with the hadoop worker node disks filling up?
[07:23:22] elukey: not with precision, but the consensus was that it was due to trash filling up because of non-skip of deletion scripts
[07:24:06] elukey: I added a task about parameterization of the hdfs-balancer threshold (T231828)
[07:24:07] T231828: Should we change HDFS balancer threshold? - https://phabricator.wikimedia.org/T231828
[07:28:20] joal: ack
[07:28:54] elukey: I have also continued the page on Kerb testing, about oozie - Please let me know if you thin I missed something
[07:30:51] still haven't reviewed, but thanks a lot <3
[07:35:15] also elukey, do we have an idea of the reason of hue and yarn alarms this morning?
[07:35:52] Ah just saw your email - sorry
[07:36:38] didn't check what happened on the host yet but I think that the vm froze :(
[07:38:03] need to step afk for 10/15 mins, sorry!
[08:10:20] back :)
[08:17:55] I am almost done with emails, a surprisingly small backlog :)
[08:33:59] joal: very interesting - https://turnilo.wikimedia.org/#wmf_netflow
[08:34:05] shows only the count measure :(
[08:34:36] elukey: config issue?
[08:35:07] I don't know, IIRC turnilo was happily showing all the measures when I have upgraded
[08:36:10] (test_wmf_netflow shows all)
[08:36:12] I recall that as well elukey
[08:36:20] hm
[08:36:41] will open a task
[08:43:53] Analytics, Analytics-Kanban: Turnilo doesn't show all the measures for wmf_netflow - https://phabricator.wikimedia.org/T232307 (elukey)
[08:44:41] elukey: looks like archiva is down
[08:47:20] joal: lovely
[08:47:27] elukey: sorry :(
[08:47:53] elukey: ganeti must have been shaky this morning
[08:48:58] it is strange that I don't see any alarm about it
[08:49:19] the vm is up
[08:51:02] web version doesn't show up for me
[08:52:46] yep yep for me too, checking
[09:01:48] joal: a restart made archiva working, but not sure why it was stuck
[09:02:15] elukey: Thank you :)
[09:02:56] I am a bit confused why the alarm didn't fire
[09:03:02] since everything was blocked
[09:05:56] elukey: if the alarm is built the same as for hadoop for instance with a prometheus exporter, maybe the exporter was stuck as well?
[09:06:52] joal: the alarm should be the standard one for reachability of the https endpoint, in theory it should be a plain http+tls call
[09:07:01] k
[09:09:01] joal: if we had an https alarm of course :P
[09:09:05] apparently we don't
[09:09:08] sigh
[09:09:13] :(
[09:43:52] Analytics, Analytics-Cluster, Operations: notebook1004 - /srv is full - https://phabricator.wikimedia.org/T232068 (jbond) p:Triage→Normal
[10:01:00] mmm no for archiva in theory we should have a https alert
[10:01:08] apparently it was not alerting
[10:20:30] * elukey lunch!
[10:21:13] Analytics, Analytics-Cluster, Operations: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted - https://phabricator.wikimedia.org/T232069 (jbond) I tried running the followin command on the server however the Current Cache policy remains as `WriteThrough` ` analytics1045 ~ % sud...
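Regarding T231828 (HDFS balancer threshold), raised earlier in the morning: per the HDFS balancer documentation, a DataNode counts as balanced when its utilization (used/capacity) differs from the whole cluster's utilization by no more than the threshold, in percentage points (`hdfs balancer -threshold`, default 10). A small Python sketch of that criterion; the node names and byte counts are invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    used_bytes: int
    capacity_bytes: int

    @property
    def utilization(self) -> float:
        """Percentage of this node's capacity in use."""
        return 100.0 * self.used_bytes / self.capacity_bytes


def classify(nodes: list[DataNode], threshold: float = 10.0) -> dict[str, str]:
    """Label each node relative to cluster utilization (total used / total capacity)."""
    cluster = 100.0 * sum(n.used_bytes for n in nodes) / sum(n.capacity_bytes for n in nodes)
    labels = {}
    for node in nodes:
        delta = node.utilization - cluster
        if delta > threshold:
            labels[node.name] = "over-utilized"
        elif delta < -threshold:
            labels[node.name] = "under-utilized"
        else:
            labels[node.name] = "balanced"
    return labels


nodes = [DataNode("worker-a", 90, 100), DataNode("worker-b", 50, 100), DataNode("worker-c", 70, 100)]
print(classify(nodes))
# {'worker-a': 'over-utilized', 'worker-b': 'under-utilized', 'worker-c': 'balanced'}
```

With the same nodes and `threshold=25` all three count as balanced: a larger threshold lets the balancer stop earlier and move fewer blocks, while a smaller one evens out disks more at the cost of more data movement.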
[10:21:42] Analytics, Analytics-Cluster, DC-Ops, Operations, ops-eqiad: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted - https://phabricator.wikimedia.org/T232069 (jbond) p:Triage→Normal
[11:35:34] Analytics, Analytics-Kanban: Turnilo doesn't show all the measures for wmf_netflow - https://phabricator.wikimedia.org/T232307 (elukey) Sanity checks: * I can see the `bytes` and `packets` fields in the json payload of the `netflow` kafka topic * puppet hive to druid config mentions `bytes` and `packets...
[11:39:31] (PS10) Fdans: Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[11:41:00] (CR) Fdans: Add cassandra loading job for requests per file metric (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149) (owner: Fdans)
[11:42:44] mmm no for archiva in theory we should have a https alert
[11:42:44] 12:01:08 <@elukey> apparently it was not alerting
[11:42:48] oops
[11:45:43] (PS11) Fdans: Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[11:55:34] Analytics, Analytics-Kanban: Turnilo doesn't show all the measures for wmf_netflow - https://phabricator.wikimedia.org/T232307 (elukey) From the Druid console, segments up to 2019-07-21 seems to weight a few bytes, compared to the other daily ones (~300/400MB). I think those are related to previous tests...
[12:00:13] joal: ---^ - is it possible to delete only some segments in a druid datasource? I can only think of https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Actual_deletion but not sure if it is good or not
[12:00:27] elukey: very possible yes
[12:01:36] elukey: looks like I still have an issue with archiva - I don't manage to download artifacts from stat1004
[12:03:38] what error do you get?
[12:04:03] I don't get errors, but it gets stuck at download stage on forst file
[12:04:43] ah okok
[12:05:43] elukey: I have setup the proxy (maybe I shouldn't have?)
[12:08:03] joal: do you mean the export http blabla?
[12:08:17] if so I'd try to remove it and see if it works
[12:08:26] elukey: yes, this, but setup in xml in ~/.m2/settings.xml
[12:09:12] Removed, trying again
[12:09:38] elukey: its using https://archiva.wikimedia.org/repository/mirrored/
[12:09:59] in previous trial it was saying through the proxy, this time no proxy
[12:10:07] But no download either
[12:10:10] I can curl https://archiva.wikimedia.org/repository/mirrored/ from stat1004
[12:10:55] elukey: sorry hte precise url is https://archiva.wikimedia.org/repository/mirrored/org/apache/maven/plugins/maven-enforcer-plugin/1.0/maven-enforcer-plugin-1.0.pom
[12:13:31] elukey: browsing through web works
[12:13:41] elukey: https://archiva.wikimedia.org/#artifact~mirrored/org.apache.maven.plugins/maven-enforcer-plugin/1.0
[12:15:17] but not if you hit the file directly
[12:15:19] mmmm
[12:15:36] so I don't see the request in the archiva access log, but I can see it in the nginx one as "user terminated request"
[12:16:37] is it the only one that gets stuck right?
[12:17:10] elukey: it's the first, so only, yes :)
[12:22:02] I am trying others but it doesn't seem to work
[12:22:21] * joal is sad to welcome elukey with such issues
[12:24:24] not even sure if we are supposed to download stuff like that without authentication
[12:24:34] (I mean from the browser
[12:29:28] restarting archiva as test
[12:29:53] ack
[12:30:50] so there is a bot that is currently downloading stuff
[12:31:04] but it doesn't seem aggressive
[12:31:12] testing anew
[12:31:25] still waiting :(
[12:32:38] stuff like repository/mirrored/org/apache/spark/spark-core_2.11/2.1.1/spark-core_2.11-2.1.1.jar.md5 works
[12:33:29] meh
[12:38:43] joal: are you doing something different this time or is it a regular build?
[12:38:58] elukey: regular build
[12:39:35] so nginx proxies correctly the request to archiva, that hangs from that moment onward
[12:39:46] and it doesn't even log the request (archiva, nginx does)
[12:39:49] elukey: Since I'm hitting an issue with spark and some code built with possible hadoop-version stuff, I deleted my .m2 folder, so a lot of downloads to do, but nothing unusual
[12:40:10] pff
[12:40:35] elukey: could be a change made last Friday on front-end for the doc?
[12:40:43] s/doc/dos
[12:40:52] ?
[12:41:33] what change?
[12:41:58] oh not friday, saturday - I have no clue if a change has been made, but maybe (to mitigate the attack)
[12:43:47] ah no I'd say it shouldn't be the issue, I get the same result doing curl on localhost
[12:44:02] k
[12:48:52] (PS3) Fdans: Add per file mediarequests endpoint to AQS [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589)
[12:49:34] (CR) Fdans: Add per file mediarequests endpoint to AQS (10 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589) (owner: Fdans)
[12:58:21] Analytics: wmf_netflow cube in Turnilo missing bytes and packets measures - https://phabricator.wikimedia.org/T232226 (mforns) > I'm not exactly sure how to fix, but perhaps the Turnilo config needs an explicit wmf_netflow dataCube declared? I thought there was already some config for netflow, but checked a...
[13:00:23] Analytics: wmf_netflow cube in Turnilo missing bytes and packets measures - https://phabricator.wikimedia.org/T232226 (elukey)
[13:00:25] Analytics, Analytics-Kanban: Turnilo doesn't show all the measures for wmf_netflow - https://phabricator.wikimedia.org/T232307 (elukey)
[13:02:27] Analytics: wmf_netflow cube in Turnilo missing bytes and packets measures - https://phabricator.wikimedia.org/T232226 (elukey) @mforns: In T232307 (I didn't know that this task existed) I noticed from the coordinators UI that some segments should be probably deleted, they all have dimensions/metrics that we...
[13:05:58] hey team :]
[13:06:02] o/
[13:07:16] hola elukey :D
[13:07:24] welcome back!
[13:07:52] thanks!
[13:09:35] elukey: hola!
[13:11:28] o/
[13:12:04] joal: I am a bit out of ideas for archiva, I also admit my ignorance about its usage
[13:12:07] :(
[13:13:09] elukey: I must say it's similar for me - Maybe it needs login in order to give access to artifacts, but it feels it wasn't
[13:14:52] Analytics: wmf_netflow cube in Turnilo missing bytes and packets measures - https://phabricator.wikimedia.org/T232226 (Nuria) Assigning to myself per ops week. Last change to config: https://github.com/wikimedia/puppet/commit/711b8218a30f88896b22533b9841d8e234a95ff0#diff-720f242f8649d2e3c81b2abc89a78821
[13:15:05] Analytics: wmf_netflow cube in Turnilo missing bytes and packets measures - https://phabricator.wikimedia.org/T232226 (Nuria) a:Nuria
[13:15:08] but in theory the fact that the build on stat1004 doesn't work is not great
[13:17:15] Analytics, Analytics-EventLogging, Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (Nuria)
[13:17:44] Analytics: wmf_netflow cube in Turnilo missing bytes and packets measures - https://phabricator.wikimedia.org/T232226 (elukey) As reference, this is what I meant above: {F30282429} (wmf_netflow in the Druid's coord UI)
[13:19:45] nuria: qq about the log database - with "analytics replica cluster" you mean the dbstore hosts?
[13:20:26] (trying to understand what is needed)
[13:28:13] Analytics, Analytics-SWAP, Product-Analytics: Provide Python 3.6 on SWAP - https://phabricator.wikimedia.org/T212591 (Ottomata) > You can use python3.7 in your venv when you create the notebook All of the default python notebook kernels we make available in JupyterHub use the same auto created venv....
[13:30:27] Analytics, Analytics-SWAP, Product-Analytics: Provide Python 3.6 on SWAP - https://phabricator.wikimedia.org/T212591 (elukey) Ok I thought it was possible to create your own venv and use that, but I was wrong.
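The Archiva URLs tested earlier (the maven-enforcer-plugin `.pom` and the spark-core `.jar.md5`) follow the standard Maven repository layout: the groupId with dots turned into slashes, then artifactId, version, and `artifactId-version.extension`. A hypothetical helper that rebuilds them, which makes it easy to request the same file from Archiva's `mirrored` repository and from Maven Central when comparing the two (classifiers and SNAPSHOT versions are not handled):

```python
def artifact_url(base: str, group_id: str, artifact_id: str, version: str, ext: str = "jar") -> str:
    """Return base/<group path>/<artifactId>/<version>/<artifactId>-<version>.<ext>."""
    group_path = group_id.replace(".", "/")
    return f"{base.rstrip('/')}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.{ext}"


ARCHIVA = "https://archiva.wikimedia.org/repository/mirrored/"
CENTRAL = "https://repo.maven.apache.org/maven2/"

# Same artifact, two repositories: handy for curling both and comparing behaviour.
for base in (ARCHIVA, CENTRAL):
    print(artifact_url(base, "org.apache.maven.plugins", "maven-enforcer-plugin", "1.0", "pom"))
```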
[13:34:27] joal: I am re-thinking about what you said earlier about the changes done to the traffic layer.. If mirrored grabs stuff on the fly from maven central, then even if I curl from localhost it might end up in the same trouble
[13:34:36] I didn't see the issue from the right angle
[13:34:44] (namely archiva contacting central or similar)
[13:35:04] I think mirror does indeed do that, with some caching IIRC
[13:36:02] Analytics, Analytics-Kanban: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (Ottomata) Doing what SRE already does is a good thing. Does SRE use bacula for main wiki backups? I think this DB is relatively large (compared to Matamo?). How long will MySQLDump lo...
[13:38:17] Analytics, Analytics-Cluster, Operations: notebook1004 - /srv is full - https://phabricator.wikimedia.org/T232068 (Ottomata) Open→Resolved a:Ottomata
[13:38:57] Analytics, Analytics-Kanban: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (elukey) Yep as far as I know SRE does use Bacula extensively: https://wikitech.wikimedia.org/wiki/MariaDB/Backups The other questions I don't know, it needs a bit of testing, but in the...
[13:39:36] joal: also another caveat is that Archiva is not behind Varnish
[13:39:45] hm
[13:40:02] but in this case it shouldn't matter, there might be something blocking archiva from contacting central
[13:41:25] Analytics, Analytics-Kanban: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (Ottomata) Yeah one more question: do they only take backups from readonly slaves? Perhaps the write locks don't interfere with production backups because they just end up pausing the re...
[13:41:36] elukey, I'm trying to compile refinery-source in stat1007 and it gets stuck when maven starts downloading stuff...
[13:42:00] mforns: yeah we are investigating it with Joseph :)
[13:42:04] known issue mforns - I have the same problem from stat1004 and home
[13:42:05] Downloading: https://archiva.wikimedia.org/repository/mirrored/commons-codec/commons-codec/maven-metadata.xml
[13:42:15] oh ok, sorry
[13:42:33] should read scrollback..
[13:45:13] (CR) Nuria: [C: -1] Add per file mediarequests endpoint to AQS (4 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589) (owner: Fdans)
[13:45:18] elukey: not being behind varnish is I guess in our issue - At lesat it comes from archiva itself
[13:45:50] I guess GOOD sorry
[13:46:04] so in theory archiva should contact central for
[13:46:05] https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-enforcer-plugin/1.0/maven-enforcer-plugin-1.0.pom
[13:46:20] do you guys remember the last time that you did a build?
[13:46:46] elukey: not so long ago I think, but not sure at all it needed downloads
[14:01:09] elukey o/ <3 :)
[14:02:18] ottomata: hola, can you merge the skip trash changes to fix the disk space issues: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/533955/
[14:03:17] ottomata: o/ o/ o/
[14:04:13] merged nuria !
[14:08:22] gotta head home!...afk for a bit
[14:08:25] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is CRITICAL: CRITICAL https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[14:09:04] forced the remount --^
[14:09:11] should be fixed now
[14:16:24] joal: I am trying to do some curl to https://repo.maven.apache.org/maven2 from archiva1001
[14:16:39] and sometimes they fail
[14:21:11] ok gathering some info, it definitely seems an issue with eqiad
[14:35:37] nuria: do you have a second in batcave?
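Context for the skip-trash change merged above: when HDFS trash is enabled, `hdfs dfs -rm` moves files into the user's `.Trash` directory, where they keep using space until the trash interval expires, whereas `-skipTrash` deletes them immediately. A sketch of how a cleanup script could build that command; the helper names and the example path are hypothetical, and this is not the content of the puppet change:

```python
import shlex
import subprocess


def hdfs_rm_cmd(paths: list[str], skip_trash: bool = True, recursive: bool = True) -> list[str]:
    """Build an `hdfs dfs -rm` argument list; -skipTrash frees the space right away."""
    if not paths:
        raise ValueError("refusing to build an rm command without paths")
    cmd = ["hdfs", "dfs", "-rm"]
    if recursive:
        cmd.append("-r")
    if skip_trash:
        cmd.append("-skipTrash")
    return cmd + list(paths)


def hdfs_rm(paths: list[str], dry_run: bool = True) -> list[str]:
    """Print the command in dry-run mode, otherwise run it and raise on failure."""
    cmd = hdfs_rm_cmd(paths)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


hdfs_rm(["/tmp/example/old-output"])  # hypothetical path, dry run only
```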
[14:41:38] joal,mforns - after a chat with SRE, it is probably an issue on the maven side, that is on fastly
[14:42:11] eqiad, where archiva runs, seems to be the only one getting trouble in resolving https://repo.maven.apache.org/maven2/
[14:42:37] so I'd wait a bit and see if things improve
[14:43:37] elukey, thanks!
[14:46:34] (back)
[14:54:02] Analytics, Analytics-EventLogging: Disable production EventLogging analytics MySQL consumers - https://phabricator.wikimedia.org/T232349 (Ottomata)
[14:56:51] Analytics, Analytics-EventLogging, Patch-For-Review: Disable production EventLogging analytics MySQL consumers - https://phabricator.wikimedia.org/T232349 (Ottomata) Since the page-create MySQL based dashboards are no longer needed, can we go ahead and just turn off the mysql-eventbus EventLogging co...
[14:58:17] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Nuria) >(Probably stupid question: Aren't there people who actually specialize in this kind of thing, with established methods for...
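Intermittent failures like the ones seen when curling https://repo.maven.apache.org/maven2 from archiva1001 are easier to quantify by repeating the request and counting failures than with a single curl. A sketch under that assumption; `failure_rate` and the default `fetch` are hypothetical helpers, and the fetcher is injectable so the counting logic can be checked without network access:

```python
import time
import urllib.request
from typing import Callable


def fetch(url: str, timeout: float = 10.0) -> int:
    """Default fetcher: GET `url` and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def failure_rate(url: str, attempts: int = 20, fetcher: Callable[[str], int] = fetch,
                 pause: float = 0.0) -> float:
    """Fraction of attempts that raised (timeout, DNS, TLS, HTTP error) or returned non-200."""
    failures = 0
    for _ in range(attempts):
        try:
            if fetcher(url) != 200:
                failures += 1
        except Exception:  # any network-level error counts as a failure
            failures += 1
        time.sleep(pause)
    return failures / attempts
```

For example, `failure_rate("https://repo.maven.apache.org/maven2/", attempts=20, pause=1.0)`, run from hosts in different data centers, would show whether only one site is affected, as suspected for eqiad.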
[14:58:19] Analytics, Analytics-EventLogging, Analytics-Kanban: Move reportupdater reports that pull data from eventlogging mysql to pull data from hadoop - https://phabricator.wikimedia.org/T223414 (Ottomata)
[14:58:21] Analytics, Analytics-EventLogging, Analytics-Kanban: Port reportupdater queries that use MySQL log eventlogging database to Hive event database - https://phabricator.wikimedia.org/T229862 (Ottomata)
[14:58:36] Analytics, Analytics-EventLogging, Patch-For-Review: Disable production EventLogging analytics MySQL consumers - https://phabricator.wikimedia.org/T232349 (Ottomata)
[15:09:35] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is OK: OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[15:44:44] (PS4) Fdans: Add per file mediarequests endpoint to AQS [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589)
[15:45:43] (CR) Fdans: Add per file mediarequests endpoint to AQS (3 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589) (owner: Fdans)
[15:47:03] PROBLEM - Hue CherryPy python server on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[15:48:39] RECOVERY - Hue CherryPy python server on analytics-tool1001 is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue runcherrypyserver https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[15:49:57] this was due to the oom killer
[15:50:03] not sure what happened
[15:50:57] my suspicion is that somebody did/browsed something huge in hue that caused the oom
[15:55:25] Analytics: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123 (fdans) a:leila
[15:56:51] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Decomission eventlogging-service-eventbus and clean up related configs and code - https://phabricator.wikimedia.org/T232122 (fdans)
[15:56:59] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Decomission eventlogging-service-eventbus and clean up related configs and code - https://phabricator.wikimedia.org/T232122 (fdans) p:Triage→High
[15:58:05] Analytics, New-Readers: Add KaiOS to the list of OS query options for pageviews in Turnilo - https://phabricator.wikimedia.org/T231998 (fdans) In order for this to be reported, a PR has to be sent here: https://github.com/ua-parser/uap-core Nothing really for analytics to do here.
[15:58:25] Analytics, Analytics-Kanban, Operations, netops, and 2 others: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (Ottomata) @Cmjohnson / @Jclark-ctr https://gerrit.wikimedia.org/r/535221 adds DNS for non mgmt entries. Sho...
[15:59:21] Analytics, Product-Analytics: Get "edits hourly" on a daily basis - https://phabricator.wikimedia.org/T231938 (fdans) In the future we would like to do this but right now the edits data is generated on a monthly basis.
[16:01:30] Analytics: Rename oozie edit_hourly job - https://phabricator.wikimedia.org/T231874 (fdans) Open→Declined
[16:04:43] Analytics: Check home leftovers of smalyshev - https://phabricator.wikimedia.org/T231861 (EBernhardson) I only see one directory in smalyshev hdfs home, looks safe to delete.
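The notebook1004 alerts above check that the HDFS FUSE mount is readable. A stale FUSE mount often hangs instead of returning an error, so a check along these lines needs its own timeout. A sketch assuming a directory listing is a good enough readability test; `mount_readable` is hypothetical and not the actual Icinga check for /mnt/hdfs:

```python
import os
import threading


def mount_readable(path: str, timeout: float = 10.0) -> bool:
    """True if listing `path` succeeds within `timeout` seconds.

    The listing runs in a daemon thread so a hung mount cannot block the caller;
    on timeout the stuck thread is simply abandoned.
    """
    result: dict[str, bool] = {}

    def worker() -> None:
        try:
            os.listdir(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return result.get("ok", False)
```

For example, `mount_readable("/mnt/hdfs")` returns False both when the listing errors out and when it does not finish in time, which is the case where a forced remount like the one above would be needed.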
[16:11:15] Analytics, Analytics-SWAP, Product-Analytics: Provide Python 3.6+ on SWAP - https://phabricator.wikimedia.org/T212591 (Neil_P._Quinn_WMF)
[16:12:02] Analytics: Check home leftovers of smalyshev - https://phabricator.wikimedia.org/T231861 (elukey) ` ====== stat1004 ====== ls: cannot access '/srv/home/smalyshev': No such file or directory -rw------- 1 root root 1497 Sep 3 09:41 /var/userarchive/smalyshev.tar.bz2 ====== stat1006 ====== ls: cannot access '...
[16:12:03] ottomata: Can you remind me if you had documented your pyspark experiences with versions and packages and all?
[16:19:43] Analytics: Check home leftovers of smalyshev - https://phabricator.wikimedia.org/T231861 (elukey) ` elukey@stat1004:~$ sudo tar -tvf /var/userarchive/smalyshev.tar.bz2 -rw------- smalyshev/wikidev 3686 2019-07-25 22:32 home/smalyshev/.viminfo -rw------- smalyshev/wikidev 72 2019-07-17 05:41 home/smalyshev/...
[16:20:10] Analytics: Check home leftovers of smalyshev - https://phabricator.wikimedia.org/T231861 (elukey) @EBernhardson can you recheck whenever you have time?
[16:31:17] Analytics, Analytics-EventLogging, Patch-For-Review: Disable production EventLogging analytics MySQL consumers - https://phabricator.wikimedia.org/T232349 (Ottomata)
[16:42:06] Analytics, New-Readers: Add KaiOS to the list of OS query options for pageviews in Turnilo - https://phabricator.wikimedia.org/T231998 (Nuria) Please ping us once your PR has been accepted.
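The `sudo tar -tvf` listing above is how the archived home directory gets inspected before deleting leftovers. The same check with Python's standard `tarfile` module, as a sketch; `list_archive` is a hypothetical helper, and reading /var/userarchive/smalyshev.tar.bz2 needs root since the listing shows it root-owned with mode 600:

```python
import tarfile


def list_archive(path: str) -> list[tuple[str, int]]:
    """Return (member name, size in bytes) for every entry in a bzip2-compressed tarball."""
    with tarfile.open(path, mode="r:bz2") as tar:
        return [(member.name, member.size) for member in tar.getmembers()]
```

For example, `list_archive("/var/userarchive/smalyshev.tar.bz2")` would return pairs such as the `home/smalyshev/.viminfo` entry shown in the task comment.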
[16:43:04] going afk for ~20mins
[16:48:08] Analytics, New-Readers: Add KaiOS to the list of OS query options for pageviews in Turnilo - https://phabricator.wikimedia.org/T231998 (Nuria) a:SBisson
[16:48:40] Analytics, New-Readers: Add KaiOS to the list of OS query options for pageviews in Turnilo - https://phabricator.wikimedia.org/T231998 (Nuria) Assigning to @SBisson to work on pull requests, #analytics to update library when accepted
[16:54:24] Analytics, Analytics-Cluster, Operations: notebook1004 - /srv is full - https://phabricator.wikimedia.org/T232068 (Groceryheist) I deleted a couple Gb that I don't need. Unfortunately most of the space I'm using is from ORES assets so I can't really store it in Hadoop. Maybe I should move this work...
[16:57:47] Analytics, Analytics-Cluster, Operations: notebook1004 - /srv is full - https://phabricator.wikimedia.org/T232068 (Nuria) @Groceryheist Can you explain a bit why ORES assets cannot be stored in hadoop?
[17:11:16] Analytics, Analytics-Cluster, Operations: notebook1004 - /srv is full - https://phabricator.wikimedia.org/T232068 (Ottomata) It looks like we have about 46G available for now, so hopefully that can hold us over. If you don't mind, just keep an eye on usage, and if it gets close to full, delete thing...
[17:20:43] Analytics: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (elukey) In my mind, there are three major things that we'd need to do for Hadoop: * Complete the work on Kerberos, roll out the new config and handle the fallout of problems that we didn't test/take into account. Even if...
[17:21:18] * elukey off!
[17:35:27] Analytics: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (Nuria) Agreed with @elukey and priority wise I think we cannot test any hadoop upgrades until we have rolled out kerberos
[17:38:45] milimetric, fdans, mforns, elukey, ottomata: please meet mgerlach. He started today with us at Research and he successfully completed his first meeting with nuria and joal now. ;) He will work from Berlin, so one more addition to CEST. :)
[17:39:00] hi mgerlach ! :) welcome!
[17:39:23] Hi mgerlach (I'm Joseph, nick are not always obvious)
[17:39:44] Hi all!
[18:23:16] hey mgerlach! welcome :] I'm Marcel
[18:30:39] Analytics, Analytics-EventLogging, EventBus, CPT Initiatives (Modern Event Platform (TEC2)), Services (watching): Migrate all event-schemas schemas to current.yaml and materialize with jsonschema-tools and remove old schemas - https://phabricator.wikimedia.org/T232144 (Ottomata)
[18:46:47] Analytics, Analytics-EventLogging, EventBus, CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: Migrate all event-schemas schemas to current.yaml and materialize with jsonschema-tools and remove old schemas - https://phabricator.wikimedia.org/T232144 (Ottomata)
[18:46:50] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Decomission eventlogging-service-eventbus and clean up related configs and code - https://phabricator.wikimedia.org/T232122 (Ottomata)
[19:33:01] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Decomission eventlogging-service-eventbus and clean up related configs and code - https://phabricator.wikimedia.org/T232122 (Ottomata) eventlogging-service-eventubs decommed in beta!
[19:34:32] Analytics: Discrepancies in Superset Pageview Data - https://phabricator.wikimedia.org/T232382 (kzimmerman)
[19:46:14] a-team , elukey fyi: https://phabricator.wikimedia.org/T227541
[19:46:20] pdu upgrade
[19:46:26] aqs1008 is on one
[19:46:30] should be no problem, but just in case!
[19:46:56] ok
[19:53:26] Analytics, Analytics-Kanban, Analytics-Wikistats, Operations, and 2 others: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (Nuria) ping @ema now that ahem, things are a bit more quiet
[19:59:46] Analytics, PageViewInfo, Pageviews-API, Tool-Pageviews: Pageview graphs doesn't work on ruwiki - https://phabricator.wikimedia.org/T232388 (MBH)
[21:01:25] Analytics, EventBus, WMF-JobQueue, CPT Initiatives (Modern Event Platform (TEC2)), good first bug: EventBus extension must not send batches that are too large - https://phabricator.wikimedia.org/T232392 (Pchelolo)
[21:03:56] Analytics, Growth-Team, Notifications, Wikimedia-production-error: Database error "Duplicate entry" for PRIMARY key (from EchoNotificationMapper::insert) - https://phabricator.wikimedia.org/T217079 (Krinkle)
[22:37:14] Analytics: wmf_netflow cube in Turnilo missing bytes and packets measures - https://phabricator.wikimedia.org/T232226 (Nuria) the ingestion has bytes and packets: https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/job/druid_load.pp#L44
[22:55:30] Analytics: wmf_netflow cube in Turnilo missing bytes and packets measures - https://phabricator.wikimedia.org/T232226 (Nuria) mmm... the autogenerated config > /usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --druid http://druid1001.eqiad.wmnet:8082 --print-config --with-co...
[23:07:52] Analytics, Product-Analytics: Discrepancies in Superset Pageview Data - https://phabricator.wikimedia.org/T232382 (kzimmerman)
[23:09:14] Analytics, Product-Analytics: Discrepancies in Superset Pageview Data - https://phabricator.wikimedia.org/T232382 (kzimmerman)
[23:24:04] Analytics: wmf_netflow cube in Turnilo missing bytes and packets measures - https://phabricator.wikimedia.org/T232226 (Nuria) Deleted segments with kb rather than mb (mostly month of july 2019)
[23:45:40] Analytics, Analytics-EventLogging: Disable production EventLogging analytics MySQL consumers - https://phabricator.wikimedia.org/T232349 (ayounsi) This look related, https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=eventlog1002&service=Check+systemd+state `CRITICAL - degraded: The syst...
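On deleting only some segments of a Druid datasource (asked earlier for wmf_netflow, and done above for the segments weighing KB instead of ~300/400MB per day): a Druid kill task permanently removes the unused segments of one datasource within a given interval, so the small segments first have to be marked unused on the coordinator (for example from the Druid console). Kill tasks only touch unused segments, so used segments sharing an interval stay. A sketch that picks segments below a size cut-off and builds one kill-task spec per interval; it assumes the segment list (interval and size) was already fetched, and the 1 MB cut-off, the sample intervals, and the `kill_specs` name are hypothetical:

```python
import json


def kill_specs(datasource: str, segments: list[dict], min_bytes: int = 1_000_000) -> list[dict]:
    """Return Druid kill-task specs covering every segment smaller than min_bytes.

    Each segment is a dict like {"interval": "<start>/<end>", "size": <bytes>}.
    A kill task only deletes segments that are already marked unused.
    """
    intervals = sorted({seg["interval"] for seg in segments if seg["size"] < min_bytes})
    return [{"type": "kill", "dataSource": datasource, "interval": iv} for iv in intervals]


segments = [
    {"interval": "2019-07-20T00:00:00.000Z/2019-07-21T00:00:00.000Z", "size": 2_048},
    {"interval": "2019-08-01T00:00:00.000Z/2019-08-02T00:00:00.000Z", "size": 350_000_000},
]
for spec in kill_specs("wmf_netflow", segments):
    print(json.dumps(spec))
```

Each spec would then be submitted to the Overlord with a POST to `/druid/indexer/v1/task`.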