[05:56:13] morning!
[05:56:19] so analytics1028 is behaving fine afaics
[05:56:47] still really confused/annoyed by the fact that a (wiped) journal node cannot resync by itself with the other hosts in the ensemble
[08:12:57] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Reimage the Debian Jessie Analytics worker nodes to Stretch. - https://phabricator.wikimedia.org/T192557#4199417 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analytics1035....
[08:16:50] reimaging analytics1035
[08:48:54] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Reimage the Debian Jessie Analytics worker nodes to Stretch. - https://phabricator.wikimedia.org/T192557#4199452 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1035.eqiad.wmnet'] ``` and were **ALL** successful.
[09:07:54] ok this time the reimage of the journal node was less painful
[09:07:58] will add a note in the docs
[10:00:36] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Reimage the Debian Jessie Analytics worker nodes to Stretch. - https://phabricator.wikimedia.org/T192557#4199700 (elukey) Added documentation about the journal nodes: https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FSystems%...
[10:09:44] aaand now reimaging the last hadoop worker/journal node to stretch!
[10:09:45] 1052
[10:28:20] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Reimage the Debian Jessie Analytics worker nodes to Stretch. - https://phabricator.wikimedia.org/T192557#4199844 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analytics1052....
[10:57:51] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Reimage the Debian Jessie Analytics worker nodes to Stretch. - https://phabricator.wikimedia.org/T192557#4199937 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1052.eqiad.wmnet'] ``` and were **ALL** successful.
[10:58:31] all hadoop worker nodes migrated to stretch!
[10:59:47] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Reimage the Debian Jessie Analytics worker nodes to Stretch. - https://phabricator.wikimedia.org/T192557#4199938 (elukey) ``` elukey@neodymium:~$ sudo cumin 'A:hadoop-worker' 'cat /etc/debian_version' 50 hosts will be targeted: analytics...
[11:10:31] Analytics, Analytics-Wikistats: Consider adding breadcrumbs to Wikistats 2 - https://phabricator.wikimedia.org/T178018#4199996 (sahil505)
[11:10:35] Analytics-Kanban, Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210#4199995 (sahil505)
[11:13:35] Analytics-Kanban, Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210#4200029 (sahil505) Removed T178018 based on @Milimetric comments.
[11:34:44] elukey: hellooo should we copy the geoip archive directories to /usr/share/geoip/archive or rerun the script?
[11:35:13] copy with hardlink from dan's home folder might not work because of hardlinking across devices
[11:40:30] fdans: I didn't follow the changes in the script, is rerun going to repopulate all the history?
[11:42:04] elukey: I did a test run of this script to turn the repo into a dir structure by dates, so I have all those populated dirs in my home directory
[11:46:25] fdans: sorry but I am not following (I don't know how the script works now) - what is the best solution that you want to be applied?
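The concern at 11:35:13 is that hardlinks cannot span filesystems: if dan's home and /usr/share/GeoIP sit on different devices, `link(2)` fails with `EXDEV`. Below is a minimal Python sketch of the usual workaround (try a hardlink, fall back to a real copy). The helpers `link_or_copy` and `link_or_copy_tree` are hypothetical and are not part of the actual archiving script.

```python
import errno
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Hardlink src to dst, falling back to a full copy across filesystems.

    Hypothetical helper. Returns "link" or "copy" depending on what was done.
    """
    try:
        os.link(src, dst)
        return "link"
    except OSError as exc:
        # EXDEV: hardlinks cannot cross devices (e.g. /home vs the root partition).
        # Any other error (permissions, missing file, ...) is re-raised.
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)  # copies the data plus timestamps/permission bits
        return "copy"


def link_or_copy_tree(src_dir: str, dst_dir: str) -> None:
    """Recreate src_dir under dst_dir, hardlinking each file where possible."""
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            link_or_copy(os.path.join(root, name), os.path.join(target, name))
```

Unlike a hardlink, the fallback copy consumes the full size on the destination disk, which is what the root-partition discussion later in the log ends up being about.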
[11:47:28] you mentioned: your home, dan's and re-running the script
[11:48:25] (I'll help but I'd need some info :)
[11:48:44] elukey: rerunning the script would attempt to copy with hardlink the repository in dan's home in the form of Y-m-d
[11:50:13] but there's a different option, which is just copying the files that I already have on my home after the test run, to the archive's dir
[11:50:42] fdans: let's batcave so it will be quicker
[11:51:27] yep
[12:03:11] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Druid clusters to 0.11 - https://phabricator.wikimedia.org/T193712#4200157 (elukey)
[12:13:40] Analytics, EventBus, JobRunner-Service, MediaWiki-Database, and 5 others: Wikimedia\Rdbms\LoadBalancer::{closure}: found writes pending - https://phabricator.wikimedia.org/T191282#4200165 (jcrespo) Most if not all of these seem to have gone since yesterday's train (need more time to check if abso...
[13:50:38] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Druid clusters to 0.11 - https://phabricator.wikimedia.org/T193712#4200317 (elukey)
[13:56:43] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Druid clusters to 0.11 - https://phabricator.wikimedia.org/T193712#4200339 (elukey) @JAllemandou I've pushed all the changes for the druid package to the Druid git repo, rebuilt and deployed those packages in labs. Not yet upload...
[14:03:07] hey teaaam
[14:13:49] mforns: o/
[14:13:54] hey elukey :]
[14:14:15] FYI I am going to do a hdfs/yarn failover to apply openjdk security upgrades
[14:15:02] elukey, I was about to launch an oozie job, should I wait?
[14:15:56] mforns: yeah gimme 10 mins
[14:16:15] elukey, no problem just checking
[14:24:31] mforns: you are free to go
[14:24:44] elukey, thanks!!! :]
[14:25:00] !log restarted hadoop namenodes/resourcemanagers to apply openjdk security upgrades
[14:25:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:25:43] Analytics, Analytics-Kanban, User-Elukey: Restart Analytics hosts for Java 8 Security upgrades - https://phabricator.wikimedia.org/T194268#4200397 (elukey)
[14:33:25] Analytics, Analytics-Kanban, User-Elukey: Restart Analytics hosts for Java 8 Security upgrades - https://phabricator.wikimedia.org/T194268#4200428 (elukey)
[14:47:19] Analytics, Analytics-Kanban, User-Elukey: Restart Analytics hosts for Java 8 Security upgrades - https://phabricator.wikimedia.org/T194268#4200461 (elukey)
[14:47:29] elukey: the directories are ready to be copied from /home/fdans/geoip/MaxMind-database/archive to /usr/share/GeoIP/archive
[14:49:06] fdans: ack, lemme check
[14:49:14] thank youuuu
[14:50:52] fdans: qq - why did you guys decide to use /usr/share/GeoIP rather than /srv?
[14:51:33] there are ~65G on the root partition, and we'd occupy 31 of them with this copy
[14:51:51] elukey: andrew suggested it as part of the last CR
[14:52:07] and to me it made sense, but I'm reconsidering it now
[14:52:16] does the space consumption grow a lot?
[14:52:35] no idea
[14:53:21] from what I can see 100/200MB every day?
[14:53:27] elukey: my main concern is that dan's job is currently committing the whole content of /usr/share/GeoIP, so we'd have to stop the old cron job the moment we copy those files
[14:53:39] elukey: ah sorry, yeah, 300mb per week
[14:57:12] so for the short/medium term it should be fine, let's see how it goes
[14:57:33] fdans: so you would like to stop dan's cron forever and then copy?
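A back-of-the-envelope headroom estimate for the numbers quoted at 14:51:33 and 14:53:39 (~65G, a 31G copy, ~300MB of growth per week). It assumes the 65G figure is free space rather than total partition size, uses decimal units, and treats growth as constant; `weeks_until_full` is a hypothetical name.

```python
def weeks_until_full(free_gb: float, copy_gb: float, growth_mb_per_week: float) -> float:
    """Weeks of headroom left after an initial copy, at constant weekly growth.

    Assumptions: free_gb is free space (not partition size), 1 GB = 1000 MB,
    and nothing else on the partition grows.
    """
    remaining_gb = free_gb - copy_gb
    if remaining_gb <= 0:
        return 0.0
    return remaining_gb * 1000 / growth_mb_per_week


weeks = weeks_until_full(65, 31, 300)
print(f"{weeks:.0f} weeks (~{weeks / 52:.1f} years)")
```

With those inputs it comes out at roughly 113 weeks, a bit over two years, which is in line with "fine for the short/medium term" as long as nothing else on the root partition grows.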
[14:58:18] I can comment it in his crontab on stat1005
[14:58:20] then do the copy
[14:58:53] 4.0K /home/fdans/geoip/MaxMind-database/archive/2018-04-15
[14:58:53] 4.0K /home/fdans/geoip/MaxMind-database/archive/2018-04-22
[14:58:54] 4.0K /home/fdans/geoip/MaxMind-database/archive/2018-04-29
[14:58:58] are those ok?
[14:59:04] 4K seems not right
[14:59:20] (they are empty)
[15:00:00] yeah elukey those I need to extract from dan's repo, but I can't git checkout, so I was going to ask him to get them out
[15:00:35] elukey: are you able to git checkout those dates in dan's repo?
[15:01:13] fdans: ah so git checkout $day; copy data to corresponding dir; next
[15:01:32] ok so let's stop dan's cron first
[15:01:59] elukey: yeah they're HEAD~1, HEAD~2 and HEAD~3
[15:02:16] HEAD is already there as you can see, because i didn't have to git checkout
[15:02:39] #30 5 * * * /home/milimetric/GeoIP-toolbox/update_data_files.sh >/dev/null 2>/dev/null
[15:02:42] commented
[15:02:47] awesome
[15:07:22] elukey: once all the tree is there, the last step is to copy all of them to HDFS (/wmf/data/archive/geoip/)
[15:08:50] 313M /home/fdans/geoip/MaxMind-database/archive/2018-04-15
[15:08:50] 313M /home/fdans/geoip/MaxMind-database/archive/2018-04-22
[15:08:51] 314M /home/fdans/geoip/MaxMind-database/archive/2018-04-29
[15:09:32] lookin legit!
[15:09:32] so now IIUC we have to copy it to usr/share/etc.. and then to hdfs?
[15:09:46] yes
[15:10:32] sudo cp -v /home/fdans/geoip/MaxMind-database/archive/* /usr/share/GeoIP/archive
[15:10:36] good?
[15:10:46] in this way we'll get root:root perms
[15:10:59] rather than fdans:wikidev
[15:11:13] elukey: looks good to me
[15:11:17] and then we can chown as we want
[15:11:23] yep
[15:16:36] (copying)
[15:19:09] fdans: done!
[15:19:29] so now the root partition is ~80% filled up
[15:19:37] let's have a chat with Andrew about it on Modnay
[15:19:40] *Monday
[15:19:49] elukey: niiiice! is that on hdfs too?
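At 15:00 the three empty snapshot dirs had to be filled from older commits (HEAD~1 to HEAD~3) of dan's repo, and the plan was "git checkout $day; copy; next". Below is a sketch of an alternative that never moves the repo's checkout: stream `git archive <rev>` into `tar` once per date. This swaps in `git archive` for the checkout-and-copy loop from the chat; it still needs read access to the repo, and the date-to-revision mapping is supplied by the caller. `extract_revision` and `extract_snapshots` are hypothetical helpers.

```python
import os
import subprocess
from typing import Dict


def extract_revision(repo: str, rev: str, dest: str) -> None:
    """Unpack the tree at `rev` into `dest`, leaving the repo's working tree alone."""
    os.makedirs(dest, exist_ok=True)
    # Stream `git archive` straight into `tar -x` so a ~300MB snapshot
    # never has to sit in memory.
    with subprocess.Popen(
        ["git", "-C", repo, "archive", "--format=tar", rev],
        stdout=subprocess.PIPE,
    ) as git:
        subprocess.run(["tar", "-x", "-f", "-", "-C", dest], stdin=git.stdout, check=True)
    # Popen's context manager has waited for git here, so returncode is set.
    if git.returncode != 0:
        raise subprocess.CalledProcessError(git.returncode, git.args)


def extract_snapshots(repo: str, revs_by_date: Dict[str, str], archive_root: str) -> None:
    """Populate archive_root/<date>/ for each (date -> git revision) pair."""
    for date, rev in sorted(revs_by_date.items()):
        extract_revision(repo, rev, os.path.join(archive_root, date))
```

A call would look like `extract_snapshots(repo, {date: "HEAD~1", ...}, archive_root)`. Which date belongs to which `HEAD~N` should be read from `git log`, not assumed.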
[15:20:27] nope still to do
[15:20:37] thanks for doing this elukey
[15:20:45] sei un mostro di bravura!! [you're a monster of skill!!]
[15:21:25] ahhahahahaah
[15:21:41] (I don't know if Italian people actually say this ironically but I mean it as you're the best <3 )
[15:21:43] that sounds like something a grandma would say to her nephew
[15:21:52] :D
[15:22:12] anyhow, do you want to try to upload to hdfs?
[15:22:32] it should be a one liner (I think copyFromLocal)
[15:22:48] (hdfs dfs -copyFromLocal etc..)
[15:22:53] elukey: am I able to upload to that dir with my creds?
[15:24:34] fdans: I think only sudoing as hdfs
[15:24:59] so there seems to be no "geoip" dir under /wmf/data/archive/
[15:25:12] do we want to start with that?
[15:25:31] yep!
[15:27:05] elukey: but can I do things as hdfs?
[15:29:20] because hdfs dfs -mkdir on that location is giving me permission denied
[15:30:00] did you sudo -u hdfs before?
[15:30:12] you should be able to
[15:35:28] ah yes you did it :) I see /wmf/data/archive/geoip
[15:36:12] elukey: will copy all the things now
[15:36:37] * fdans didn't know he could do sudo -u hdfs 🤦🏻‍♂️
[15:36:40] fdans: before starting let's sync because the things in there are a bit sensitive :)
[15:36:51] elukey: cave?
[15:36:53] sure
[15:38:12] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4178076 (Rspeer) I agree that Wikidata has been making a big mistake here. Many Wikipedia editors put incredible amounts of effort into maintaining things such as its infoboxes and c...
[16:03:50] a-team? standup?
[16:20:58] elukey, do you have time for teaching me the log side of the force?
[16:23:45] mforns: I was checking logs, sure!
[16:23:48] bcave?
[16:23:50] ok :]
[16:27:47] I'm in elukey :]
[16:35:38] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4200647 (Tgr) Do editors have rights over infobox data? Individual facts are not copyrightable; collections of infobox data could maybe be copyrighted if they were curated by an organ...
[16:40:18] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4200653 (Aschmidt) Please note that there is not one copyright law. There are as many copyright laws as there are legal systems Wikipedia and Wikidata content can be retrieved from....
[17:04:11] PROBLEM - Number of segments reported as unavailable by the Druid Coordinators -Analytics cluster- on einsteinium is CRITICAL: 12 gt 10 https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&panelId=46&fullscreen&orgId=1&var-cluster=druid_analytics&var-druid_datasource=All
[17:28:33] fixing the druid alarm to something that hopefully doesn't spam us so often
[17:36:51] RECOVERY - Number of segments reported as unavailable by the Druid Coordinators -Analytics cluster- on einsteinium is OK: (C)10 gt (W)5 gt 0 https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&panelId=46&fullscreen&orgId=1&var-cluster=druid_analytics&var-druid_datasource=All
[17:38:43] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4200874 (Rspeer) Tgr: The situation of having the copyright on a project held by a large number of different individuals is not unique, and it does not at all make the copyright inval...
[17:41:55] (merged the new alarm!)
[17:42:33] * elukey off!
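The Druid alert lines at 17:04:11 and 17:36:51 read like a "greater than" threshold check: a value of 12 against a critical threshold of 10 is CRITICAL, and the recovery line shows critical 10, warning 5, current value 0. Below is a minimal sketch of those semantics as read off the alert text. The real check plugin and its exact boundary handling are not shown in the log, so `classify` (hypothetical) is an assumption, not the production check.

```python
def classify(value: float, warning: float, critical: float) -> str:
    """Map a metric value to an alert state using strict "greater than" thresholds.

    Assumption: mirrors the "(C)10 gt (W)5 gt 0" reading of the alert text,
    i.e. CRITICAL if value > critical, WARNING if value > warning, else OK.
    """
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


print(classify(12, warning=5, critical=10))  # the 17:04:11 PROBLEM
print(classify(0, warning=5, critical=10))   # the 17:36:51 RECOVERY
```

With strict comparisons, a value sitting exactly on a threshold stays in the lower state, so 10 unavailable segments would be a WARNING, not CRITICAL.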
[17:42:37] have a good weekend people :)
[19:05:15] Analytics, Analytics-Kanban, EventBus, Wikimedia-Logstash, and 2 others: EventBus HTTP Proxy service does not report errors to logstash - https://phabricator.wikimedia.org/T193230#4201184 (Pchelolo) We've got the logs in logstash, thank you @Ottomata
[19:05:33] Analytics, Analytics-Kanban, EventBus, Wikimedia-Logstash, Services (done): EventBus HTTP Proxy service does not report errors to logstash - https://phabricator.wikimedia.org/T193230#4201185 (Pchelolo) Open>Resolved
[19:22:19] Analytics, EventBus, Patch-For-Review, Services (doing), User-Elukey: Investigate group.initial.rebalance.delay.ms Kafka setting - https://phabricator.wikimedia.org/T189618#4201263 (Pchelolo)
[19:58:50] (PS1) Mforns: Fix bug in virtualpageview druid monthly indexation [analytics/refinery] - https://gerrit.wikimedia.org/r/432643 (https://phabricator.wikimedia.org/T192305)
[20:01:07] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4178076 (Jarekt) The type of data we are copying from Wikimedia projects to to Wikidata is not copyrightable. Those are just facts, like coordinates, dates, names, filenames, identifi...
[20:05:10] Analytics-Kanban, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review: Record and aggregate page previews - https://phabricator.wikimedia.org/T186728#4201469 (mforns) The data set with the source fields is ready, accessible in hive: wmf.virtualpageview_hourly. It is update...
[20:06:46] Analytics, Analytics-Kanban, Patch-For-Review: Index and store page preview agreggates on Druid so they are visible in pivot/superset - https://phabricator.wikimedia.org/T192305#4201473 (mforns)
[20:34:27] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#4201532 (mforns) @Tbayer Thanks for posting this here. It helped me understand how you use that field. I can totally see the benefits of k...
[20:40:52] Analytics-Kanban, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review: Record and aggregate page previews - https://phabricator.wikimedia.org/T186728#4201548 (Tbayer) Thanks @mforns, also for keeping the existing data up earlier while the fix was implemented (I was able to...
[20:55:00] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4201558 (Rspeer) Wikidata has copied the entire ontology of Wikipedia categories. The claim that ontologies are not copyrightable would be controversial at best, actively untrue if o...
[22:25:05] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4201740 (Rspeer) I must amend my previous statement; I thought Wikipedia categories were all represented on Wikidata, but it appears they may not be. Maybe I don't know how to use Wik...
[22:53:07] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#4201808 (Nuria) >I think if we store the install date (YYYY-MM-DD) in all events, we could calculate things like this using that field inst...
[23:01:47] Analytics, Operations, SRE-Access-Requests: Access to usergroups for Marshall Miller - https://phabricator.wikimedia.org/T194550#4201768 (Dzahn)
[23:09:06] (CR) Nuria: [V: 2 C: 2] Fix bug in virtualpageview druid monthly indexation [analytics/refinery] - https://gerrit.wikimedia.org/r/432643 (https://phabricator.wikimedia.org/T192305) (owner: Mforns)
[23:49:19] Analytics, Operations, SRE-Access-Requests: Access to usergroups for Marshall Miller - https://phabricator.wikimedia.org/T194550#4201903 (DannyH) I approve Marshall's access.
[23:57:34] Analytics, Discovery-Analysis, Product-Analytics, Reading-analysis: Productionize per-country daily & monthly active app user stats - https://phabricator.wikimedia.org/T186828#4201906 (chelsyx) a:chelsyx
[23:57:46] Analytics, Operations, SRE-Access-Requests: Access to usergroups for Marshall Miller - https://phabricator.wikimedia.org/T194550#4201908 (Nuria) Approved on my end too.