[00:52:44] PROBLEM - Check the last execution of reportupdater-browser on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [00:52:50] PROBLEM - Check the last execution of reportupdater-published_cx2_translations on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [00:53:20] PROBLEM - Check the last execution of archive-maxmind-geoip-database on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [00:53:24] PROBLEM - Check the last execution of refinery-import-wikidata-all-json-dumps on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [00:55:50] PROBLEM - Check the last execution of reportupdater-pingback on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [00:56:02] PROBLEM - Check the last execution of reportupdater-wmcs on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [00:56:38] PROBLEM - Check the last execution of reportupdater-structured-data on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [00:57:28] PROBLEM - Check the last execution of reportupdater-interlanguage on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:00:44] PROBLEM - Check the last execution of reportupdater-reference-previews on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:01:06] PROBLEM - Check the last execution of refinery-import-siteinfo-dumps on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:01:24] PROBLEM - Check the last execution of wikimedia-discovery-golden on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:01:30] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs [01:01:38] PROBLEM - Check the last execution of refinery-import-page-current-dumps on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:01:56] PROBLEM - Check the last execution of refinery-import-wikidata-all-ttl-dumps on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:02:00] PROBLEM - Check the last execution of refinery-import-page-history-dumps on stat1007 
is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:18:06] RECOVERY - Check the last execution of reportupdater-pingback on stat1007 is OK: OK: Status of the systemd unit reportupdater-pingback https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:18:18] RECOVERY - Check the last execution of reportupdater-wmcs on stat1007 is OK: OK: Status of the systemd unit reportupdater-wmcs https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:18:54] RECOVERY - Check the last execution of reportupdater-structured-data on stat1007 is OK: OK: Status of the systemd unit reportupdater-structured-data https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:19:42] RECOVERY - Check the last execution of reportupdater-interlanguage on stat1007 is OK: OK: Status of the systemd unit reportupdater-interlanguage https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:23:00] RECOVERY - Check the last execution of reportupdater-reference-previews on stat1007 is OK: OK: Status of the systemd unit reportupdater-reference-previews https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:23:20] RECOVERY - Check the last execution of refinery-import-siteinfo-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-siteinfo-dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:23:38] RECOVERY - Check the last execution of wikimedia-discovery-golden on stat1007 is OK: OK: Status of the systemd unit wikimedia-discovery-golden https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:23:54] RECOVERY - Check the last execution of refinery-import-page-current-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-page-current-dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:24:10] RECOVERY - Check the last execution of refinery-import-wikidata-all-ttl-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-wikidata-all-ttl-dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:24:14] RECOVERY - Check the last execution of refinery-import-page-history-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-page-history-dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:26:06] RECOVERY - Check the last execution of reportupdater-browser on stat1007 is OK: OK: Status of the systemd unit reportupdater-browser https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:26:12] RECOVERY - Check the last execution of reportupdater-published_cx2_translations on stat1007 is OK: OK: Status of the systemd unit reportupdater-published_cx2_translations https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:26:42] RECOVERY - Check the last execution of archive-maxmind-geoip-database on stat1007 is OK: OK: Status of the systemd unit archive-maxmind-geoip-database https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:26:46] RECOVERY - Check the last execution of refinery-import-wikidata-all-json-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-wikidata-all-json-dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [01:32:38] RECOVERY - Check if the Hadoop 
HDFS Fuse mountpoint is readable on stat1007 is OK: OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs [04:02:29] 10Analytics, 10Product-Analytics: wmfdata cannot recover from a crashed Spark session - https://phabricator.wikimedia.org/T245713 (10nshahquinn-wmf) @Ottomata, thanks for jumping in! See my responses in T245896. [04:35:41] 10Analytics, 10Product-Analytics: wmfdata cannot recover from a crashed Spark session - https://phabricator.wikimedia.org/T245713 (10nshahquinn-wmf) @Ottomata, on the newly refined subject of this task (being unable to recover from a Spark error), I just compiled a complete log of the commands and output for o... [06:25:54] 10Analytics, 10Product-Analytics: Geoeditors dataset should use country ISO codes instead of country names - https://phabricator.wikimedia.org/T245967 (10Yair_rand) [06:49:51] finally the build of hadoop completed with the patch for openssl [06:50:05] aaand just using the libcrypto from 1.1.0 I get [06:50:05] openssl: true /usr/lib/x86_64-linux-gnu/libcrypto.so [06:50:09] yessssssssssssssss [06:50:11] \o/ [06:50:51] will try to copy/install the new package on all hadoop worker nodes in test and see if spark now works [07:13:04] 10Analytics, 10Pageviews-API: Pageviews API should allow specifying a country - https://phabricator.wikimedia.org/T245968 (10Yair_rand) [07:25:45] also, today I'd move all report updater jobs to an-launcher1001 [07:25:56] from stat1006 and stat1007 [09:02:12] Good morning elukey [09:02:19] Super great news for hadoop and SSL :) [09:03:22] joal: bonjour! Spark still doesn't work though grr [09:03:59] :s mwarf [09:04:09] elukey: mapreduce works? [09:05:52] (03CR) 10Joal: Add wikidata item_page_link oozie job (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/572834 (https://phabricator.wikimedia.org/T244707) (owner: 10Joal) [09:08:27] joal: yes yes all works, usual spark crypto rpc non sense [09:08:33] meh [09:08:41] 10Analytics, 10User-Elukey: Standard partman recipe for druid hosts - https://phabricator.wikimedia.org/T245810 (10fgiunchedi) [09:08:47] elukey: possibly same kind of issue? [09:09:28] yes yes I think there is a problem with openssl, but the logs are not super clear [09:09:44] this time I am using openssl 1.1.0 and not 1.0.2 [09:10:06] elukey: ok - let me know if I can help (when I say that I feel I'd be more of a burden, but still wish to offer :) [09:10:12] ok [09:10:32] I'll surely ask at some point, I kinda hoped that the hadoop checknative thing was the culprit [09:10:46] I assume it probably was on the way [09:10:48] but I am also learning a bit how bigtop works etc.. the devs are really helpful [09:11:22] elukey: idea - Spark is built with a hadoop version linked into it - Would rebuilding it with your fixed hadoop be the thing? [09:11:55] elukey: You're the OSS angel in my eyes :) [09:12:40] elukey: systemd failure earlier today are to due to misutilization of stat1007 right? [09:12:44] joal: good point, I'll check how Andrew builds spark, if hadoop jars are shipped with it then we might need some refinement [09:12:53] joal: yeah :( (stat1007) [09:12:56] elukey: :( [09:13:08] but I am prepping a change now to move RU jobs to stat1007 [09:13:11] so that's a start [09:13:22] elukey: about hadoop- Maybe an easy-dirty try can be to change jars in spark folders? [09:13:26] then I'll need the xmldumps mountpoints and also the other jobs will be gone [09:13:37] STRONG ENFORCEMENT! 
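For context, the `openssl: true /usr/lib/x86_64-linux-gnu/libcrypto.so` line pasted above is the output format of `hadoop checknative`, the check mentioned earlier in this exchange, when the native OpenSSL binding loads correctly. A minimal sketch of scripting that same verification — the helper below is hypothetical and assumes only that the `hadoop` CLI is on the PATH:

```python
import subprocess

def native_openssl_ok() -> bool:
    """Run `hadoop checknative -a` and report whether the openssl binding loaded.

    With -a the command should exit non-zero when any native library is
    missing, so output is captured regardless of the return code.
    """
    result = subprocess.run(
        ["hadoop", "checknative", "-a"],
        capture_output=True, text=True,
    )
    combined = result.stdout + result.stderr
    for line in combined.splitlines():
        if line.strip().startswith("openssl:"):
            # Expected form: "openssl: true /usr/lib/x86_64-linux-gnu/libcrypto.so"
            return "true" in line.split(":", 1)[1]
    return False

if __name__ == "__main__":
    print("native openssl support:", native_openssl_ok())
```

Because `-a` makes the command fail when something is missing, it also works as a quick smoke test after rebuilding the packages.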
[09:15:10] ahh interesting, in the spark2 package there are indeed hadoop jars with 2.6.x version [09:15:16] \o/ [09:16:06] maybe Andrew targets a specific version of spark in the deb [09:16:09] err hadoop [09:16:35] elukey: I can't remember Andrew was building spark [09:16:46] elukey: packaging maybe, but can't recall rebuilding [09:22:10] 10Analytics, 10User-Elukey: Standard partman recipe for druid hosts - https://phabricator.wikimedia.org/T245810 (10elukey) So I checked and the /var/lib/druid dir seems also created by the deb: ` elukey@druid1001:~$ dpkg -S /var/lib/druid druid-common, druid-middlemanager, druid-historical: /var/lib/druid `... [09:24:23] elukey: if we need to rebuild hadoop, I'll actually ask for a bump in apache-compress ;) [09:24:35] joal: I am not sure if I'll allow it [09:24:46] :P [09:24:49] huhuhu [09:25:08] ok - I'll rerun those jobs manually FOREVAAAAAAR [09:25:15] ahahaha [09:25:43] in any case, the patch for openssl seems good, so I'll report to bigtop the result and hopefully we'll have that merged [09:25:51] \o/ [10:37:27] change to move RU from stat1007 to an-launcher ready https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/574385/ [10:37:35] it was a little bit more difficult than expected [10:37:38] but should be good [10:37:55] I'll wait for Dan or Marcel just to be on the safe side [10:38:12] but so far it looks good [10:38:41] moving stat1006's jobs should be easy too [10:38:53] so hopefully by EOD all RU jobs will be moved [10:38:57] out of stat boxes [10:46:25] awesome elukey [11:19:12] 10Analytics, 10Product-Analytics: wmfdata cannot recover from a crashed Spark session - https://phabricator.wikimedia.org/T245713 (10JAllemandou) @nshahquinn-wmf sorry for not shimming in earlier. I will try to provide explanations and ideas and suggest ways to deal with the problem. Similarly to a python pro... [11:25:50] 10Analytics, 10Analytics-Wikistats, 10translatewiki.net, 10Patch-For-Review: Add stats.wikimedia.org to translatewiki.net - https://phabricator.wikimedia.org/T240621 (10abi_) I've submitted a patch to add the project to translatewiki.net. Following things need to be checked, 1. @fdans - Please let me know... [11:33:27] 10Analytics, 10Analytics-Wikistats, 10translatewiki.net, 10Patch-For-Review: Add stats.wikimedia.org to translatewiki.net - https://phabricator.wikimedia.org/T240621 (10MarcoAurelio) >>! In T240621#5911666, @abi_ wrote: > 2. Please ensure that l10n bot [[ https://www.mediawiki.org/wiki/Gerrit/L10n-bot | ha... [11:38:19] * elukey lunch! [13:12:44] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review: Degraded RAID on analytics1044 - https://phabricator.wikimedia.org/T245910 (10jbond) p:05Triage→03Medium [14:16:04] ottomata: gooood morning [14:19:57] when you are caffeinated and ready, I'd need to talk with you about spark [14:20:20] elukey: we're in meeting, I'm sure he'll be there at some time ;) [14:21:27] ah okok! [14:34:07] elukey: be with you in 10-15 mins ya? [14:34:26] sorry internet issues [14:36:02] ottomata: even 1h, no rush :) [14:42:17] 10Analytics: Unable to access SWAP notebooks using LDAP - https://phabricator.wikimedia.org/T245997 (10Fsalutari) [14:47:08] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Kerberize Superset to allow Presto queries - https://phabricator.wikimedia.org/T239903 (10elukey) Things are moving, my PR to fix the Travis CI has been merged. Adding in here what it is needed in my opinion: === PyHive === - support for User impersonation -... 
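On the observation above that the spark2 package ships Hadoop 2.6.x jars: the bundled Hadoop version can be confirmed just by listing the jar names under the install directory, matching the `dpkg -S` output quoted a bit further down. A small illustrative sketch (the `/usr/lib/spark2/jars` path is the one from the conversation; the version parsing is an assumption about the jar naming):

```python
import glob
import os
import re

SPARK_JARS_DIR = "/usr/lib/spark2/jars"  # path quoted in the conversation

# Jars like hadoop-client-2.6.5.jar encode the Hadoop version they were built against.
versions = set()
for jar in glob.glob(os.path.join(SPARK_JARS_DIR, "hadoop-*.jar")):
    match = re.search(r"-(\d+\.\d+\.\d+)\.jar$", os.path.basename(jar))
    if match:
        versions.add(match.group(1))

print("Hadoop versions bundled with Spark:", sorted(versions) or "none (hadoop-free build)")
```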
[14:47:25] heya a-team, there is a question about pageviews and redirects on wikitech-l that I don't fullly understand [14:47:38] (i didn't know there was MW action API for pageviews? https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query&format=json&prop=pageviews&titles=MediaWiki&pvipmetric=pageviews&pvipdays=60&pvipcontinue=) [14:47:47] subject [14:47:47] [Wikitech-l] MediaWiki API pageview issue [14:47:52] if someone could check it out and answer! [14:48:08] I’ll read up [14:48:26] yeah, the mw api just calls aqs [14:48:34] ah [14:48:36] ty [14:54:36] oook elukey hiyaa [14:54:37] spark? [14:55:10] ottomata: sureeeee [14:56:03] so I think I have fixed the hadoop issue with openssl, but spark 2.4 doesn't work yet on hadoop test. While talking with Jo this morning he suggested to check hadoop libs, and in fact we do ship 2.6 jars with the spark2's deb IIUC [14:56:14] hmm [14:56:31] I am reading the README in the debian dir, and HADOOP_VERSION is a parameter [14:57:04] ah ya spark is packaged for hadoop versions [14:57:05] for example, from dpkg -S [14:57:06] /usr/lib/spark2/jars/hadoop-client-2.6.5.jar [14:57:07] etc.. [14:57:27] in theory it should work with 2.8.x, in practice I'd love to rule out the possibily that the libs are the culprit [14:57:30] to avoid insanity [14:57:42] (maybe too late for that but it is worth a try :D) [14:57:52] they also give a 'spark-2.4.5-bin-without-hadoop.tgz [14:58:05] which is probably the right thin to do for us [14:58:12] and just make sure the classpath is right [14:58:27] in fact I wanted to ask you what it is best at this point [14:58:59] I was thinking to just re-build spark2 as one off on boron with hadoop 2.8.5 deps [14:59:11] but if there are better solutions I am all ears [15:00:12] elukey: btw am reading about spark 3: [15:00:19] "Spark 3.0 handles the above challenges much better. In addition it adds support for different GPUs like Nvidia, AMD, Intel and can use multiple types at the same time" [15:00:19] heh [15:00:45] oh cool [15:00:47] and incorpoorates https://delta.io/ [15:00:55] milimetric: ^ looks kinda like hudi [15:00:58] butwill lbe built into spark 3 [15:01:30] anyway [15:01:37] elukey: there isn't a spark dist for hadoop 2.8 [15:01:38] I was just reading about a debezium / delta lake thing, hang on [15:01:44] just 2.6 and 2.7 [15:01:57] https://github.com/tikal-fuseday/delta-architecture/blob/master/README.md [15:02:15] cool [15:03:01] elukey: i bet it isn't too hard to package the hadoopless spark [15:03:03] and use it [15:05:27] ah no 2.8.x ? sigh [15:05:30] how is that possible? [15:05:36] is 2.6 supposed to work? [15:06:22] elukey: all those dists do is provide thee hadoop jars for that version [15:06:31] afaik spark isn't compiled or anything for a particular version? [15:06:35] hmm [15:06:38] i guess it uses the hdfs api... [15:06:41] and yarn api [15:06:46] sooo it must have some dep... [15:06:48] dunno? [15:07:10] i see that spark 3 has hadoop 2.7 and 3.2 dists [15:07:18] (spark 3 pre release) [15:07:43] https://archive.apache.org/dist/spark/spark-2.4.5/ [15:08:59] 10Analytics, 10Inuka-Team, 10Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (10Nuria) >Regarding the base url en.wikipedia.org/beacon/event, do we Those beacon pings are not counted pageviews so for this ticket it does not matter [15:09:50] ottomata: maybe 2.7 could work with 2.8/2.10 ? 
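About the `-bin-without-hadoop` tarball discussed above: upstream Spark's "Hadoop free" builds expect the cluster's own Hadoop jars to be put on Spark's classpath via `SPARK_DIST_CLASSPATH`, normally set in `conf/spark-env.sh` from `hadoop classpath`. A rough sketch of doing the equivalent from Python before starting a session — the approach follows the upstream docs, but the smoke test itself is illustrative, not the actual cluster setup:

```python
import os
import subprocess

# Point the hadoop-free Spark build at the locally installed (e.g. Bigtop 2.8.x)
# Hadoop jars instead of the 2.6/2.7 ones bundled in the -bin-hadoopX.Y dists.
hadoop_classpath = subprocess.run(
    ["hadoop", "classpath"], capture_output=True, text=True, check=True
).stdout.strip()
os.environ["SPARK_DIST_CLASSPATH"] = hadoop_classpath

from pyspark.sql import SparkSession  # pyspark taken from the "without-hadoop" tarball

spark = SparkSession.builder.appName("hadoop-free-smoke-test").getOrCreate()
print(spark.version)
spark.stop()
```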
[15:11:42] anyway, now that I am using openssl 1.1.0 the errors are different
[15:11:54] I'll keep debugging and see if I can resolve in another way
[15:12:24] maybe!
[15:13:14] always a joy
[15:13:31] milimetric: goood morning! I have a change ready to move RU jobs from 1007 to an-launcher
[15:13:52] oh that was fast
[15:14:02] if we do it today could you help me triple checking that nothing explodes?
[15:14:10] of course
[15:14:15] thanks :)
[15:14:52] just ping when it’s done and I’ll take a look at each job. It’s even my ops week
[15:15:41] ah lovely
[15:52:12] elukey: i'm trying to get the most recent pyhive stuff to work with presto in jupyter
[15:52:21] having an issue with the ca cert for python requests
[15:52:26] you've got this to work, right?
[15:53:17] doing this atm
[15:53:18] https://gist.github.com/ottomata/a0d2ba4ad104f4d87b3b00ff9292b840
[15:53:59] ottomata: I was checking the code today about the same thing (since I'll need it for Superset) and it needs a code change to work
[15:54:00] am getting
[15:54:00] * ottomata Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",
[15:54:05] oh ya?
[15:54:24] ok so not ready yet eh?
[15:56:03] mmm weird, I was about to suggest to try a custom request object with verify, but the code already does it
[15:56:14] why are you using pyhive out of curiosity?
[15:56:28] is there anybody asking for it?
[15:56:34] (just to understand how urgent this is)
[15:56:43] well, neil is having all those problems with spark in jupyter
[15:56:48] because of spark session state stuff
[15:56:53] and presto doesn't have a persistent session
[15:56:54] ah okok
[15:57:04] so was going to see if i could give him another option to try
[15:57:06] one thing - what version of pyhive are you using?
[15:57:13] i pulled your latest commit
[15:57:17] that got merged
[15:57:18] so master
[15:57:23] !pip install git+https://github.com/dropbox/PyHive.git@437eefa7bceda1fd27051e5146e66cb8e4bdfea1
[15:57:37] super
[15:57:55] OH actually i am one commit behind, but i think yours is just tests
[15:57:58] going to install that one though
[15:58:02] 9265b580963edd7303d4f21bf06af05dc3f8488b
[15:58:11] ottomata: one thing - the Cursor class offers requests_session
[15:58:24] basically you can instantiate your own request object
[15:58:52] but requests_kwargs should work
[15:58:53] mmmmm
[15:59:19] OHHH maybe i got it
[16:01:30] hmmm well no more error but now result is just None
[16:01:31] hm
[16:01:31] standuuup
[16:01:36] OH oh
[16:02:14] nuria: standup?
[16:02:17] milimetric:
[16:02:25] ottomata: yes, my clock again!
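On the certificate-verify failure above: `pyhive.presto` forwards `requests_kwargs` to the underlying `requests` calls, so pointing `verify` at the CA bundle that signed the coordinator's certificate is one way to get past that handshake error. A sketch — the host, port, catalog and CA path below are placeholders, not verified values from this conversation:

```python
from pyhive import presto

# Placeholder connection details -- substitute the real coordinator and CA bundle.
PRESTO_HOST = "presto-coordinator.example.wmnet"
PRESTO_PORT = 8281
CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"

conn = presto.connect(
    host=PRESTO_HOST,
    port=PRESTO_PORT,
    username="someuser",
    catalog="analytics_hive",
    schema="default",
    protocol="https",
    # Forwarded to every requests call the client makes; this is where the
    # "certificate verify failed" handshake error can be addressed.
    requests_kwargs={"verify": CA_BUNDLE},
)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())
```

Alternatively, `requests_session` (mentioned just above) accepts a pre-configured `requests.Session`, which is another route for injecting custom TLS or auth behaviour.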
[16:37:23] 10Analytics, 10Analytics-Kanban, 10LDAP-Access-Requests, 10Operations: Unable to access SWAP notebooks using LDAP - https://phabricator.wikimedia.org/T245997 (10Ottomata) [16:38:02] 10Analytics, 10Product-Analytics: Give clear recommendations for Spark settings - https://phabricator.wikimedia.org/T245897 (10Milimetric) p:05Triage→03Medium [16:47:36] 10Analytics, 10Product-Analytics: Spark application UI shows data for different application - https://phabricator.wikimedia.org/T245892 (10Milimetric) Clearly a bug in some way - but we don't know how yet. Investigating. [16:48:39] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Services (watching): jsonschema-tools should add a 'latest' symlink - https://phabricator.wikimedia.org/T245859 (10Milimetric) p:05Triage→03High [16:49:28] 10Analytics, 10User-Elukey: Standard partman recipe for druid hosts - https://phabricator.wikimedia.org/T245810 (10Milimetric) p:05Triage→03High [16:50:36] 10Analytics, 10Epic, 10Product-Analytics (Kanban): Analysts cannot reliably use Sparker in Jupyter to run SQL queries against Hive databases - https://phabricator.wikimedia.org/T245891 (10Ottomata) [16:50:48] 10Analytics, 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10Patch-For-Review: Degraded RAID on analytics1044 - https://phabricator.wikimedia.org/T245910 (10Milimetric) a:03elukey [16:52:13] 10Analytics, 10Gerrit, 10Gerrit-Privilege-Requests, 10Patch-For-Review, 10User-MarcoAurelio: Give access to Wikistats 2 to l10n-bot - https://phabricator.wikimedia.org/T245805 (10Milimetric) Is this done or are there additional steps? [16:56:10] 10Analytics, 10Analytics-Kanban, 10ArticlePlaceholder, 10Wikidata, and 4 others: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (10Milimetric) Added @Nuria as a reviewer where she was pinged, she'll comment there, waiting on WMDE on the other patch (rev... [16:56:29] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Patch-For-Review, 10Services (watching): Switch all eventgate clients to use new TLS port - https://phabricator.wikimedia.org/T242224 (10Milimetric) [17:19:56] Any preference...should I try and figure out how to allow analytics-search to read hdfs://analytics-hadoop/user/hdfs/analytics-research-client.txt, or deploy /etc/mysql/conf.d/analytics-research-client.cnf to an-airflow1001? [17:21:17] or i guess it could separately be deployed to hdfs in /user/analytics-search/ [17:33:54] 10Analytics, 10Analytics-Kanban, 10ArticlePlaceholder, 10Wikidata, and 4 others: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (10Ladsgroup) >>! In T236895#5912660, @Milimetric wrote: > Added @Nuria as a reviewer where she was pinged, she'll comment th... [17:36:08] 10Analytics, 10Analytics-Kanban, 10ArticlePlaceholder, 10Wikidata, and 4 others: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (10Milimetric) my apologies, was running through triage, we'll take a look and ping here [17:37:59] 10Analytics, 10Event-Platform, 10Wikimedia-Extension-setup, 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Wikimedia-extension-review-queue: Deploy EventStreamConfig extension - https://phabricator.wikimedia.org/T242122 (10Jdforrester-WMF) a:05Jdforrester-WMF→03Ottomata Ottomata is doin... [17:46:39] Hallo [17:46:51] I haven't done it in a few weeks, so maybe I missed something [17:47:04] To get EventLogging data from hive, am I suppoed to log in to stat1007? 
[17:47:13] ssh stat1007 doesn't work for me [17:47:57] aharoni: nothing has changed in the last few weeks [17:47:59] ssh stat1007.eqiad.wmnet [17:48:00] ? [17:50:00] ottomata : thanks, this works. It used to work with just `ssh stat1007`. Maybe it has something to do with updating my laptop to macOS Catalina. [17:50:32] hm, just ssh stat1007 woudln't work unless you had some specific ssh/config to make it work :) [17:50:45] or maybe some somethiing special in your /etc/hosts file [17:50:57] but, glad it works [17:54:43] elukey: btw, do I need separate approval for https://phabricator.wikimedia.org/T245997 or can I just do it? [17:55:05] 10Analytics, 10Analytics-Kanban, 10LDAP-Access-Requests, 10Operations: Add Fsalutari to nda LDAP group - https://phabricator.wikimedia.org/T245997 (10Ottomata) [17:56:10] ottomata: if the username is 'fsalutari', it is not in the nda LDAP group [17:56:15] yes i know [17:56:19] can I just add? [17:56:40] ah ok sorry without context I didn't get the request :) [17:57:07] for nda I'd follow up with the ldap request phab tag [17:57:18] did :) [17:57:18] that is already there [17:57:19] ok [17:57:24] who checks those? [17:57:31] whoever is in the group [17:57:37] the group? [17:57:54] https://phabricator.wikimedia.org/tag/ldap-access-requests/ => members [17:58:45] Flavia has a valid NDA so I'd say that we could triple check with moritzm [17:58:48] and then add [17:59:17] milimetric: do you have time now to move RU? [18:00:12] or mforns [18:00:30] ok thanks elukey [18:00:33] elukey, yes [18:00:53] elukey, do you want me to do it, and you can leave? [18:01:04] mforns: thankssss - so I stopped the timers on stat1007, and copied /srv/reportupdater to an-launcher1001 [18:01:26] 10Analytics, 10Analytics-Kanban, 10LDAP-Access-Requests, 10Operations: Add Fsalutari to nda LDAP group - https://phabricator.wikimedia.org/T245997 (10Ottomata) @Muehlenhoff just double checking: Fsalutari has an NDA, can I just add to `nda` LDAP group? [18:01:27] I have a puppet change to merge to absent RU jobs on stat1007 and create them on an-launcher1001 [18:01:30] is it enough? [18:01:40] 10Analytics, 10Fundraising-Backlog, 10fundraising-tech-ops: Install superset on front end server for analytics - https://phabricator.wikimedia.org/T245755 (10Nuria) [18:01:47] elukey, I believe so [18:02:01] 10Analytics, 10Fundraising-Backlog, 10fundraising-tech-ops: Install superset on front end server for analytics - https://phabricator.wikimedia.org/T245755 (10Nuria) Putting on radar for #analytics @Jgreen can ping us if he needs help. [18:02:31] mforns: ack then I'll proceed, can I ask you to triple check that nothing is exploding afterwards? [18:02:36] should be one in 10/15 mins max [18:02:39] elukey, so the only thing to be done is to merge the thing right? because you already copied over the reeports, right? [18:02:46] correct [18:02:57] ok, sure, sounds good [18:03:03] if "the reports" == /srv/reportupdater [18:03:18] /srv/reportupdater/output [18:03:21] yes [18:03:50] is the rsync configured to copy report files to public endpoint? [18:03:53] elukey, ^ [18:04:06] I imagine yes, as part of puppet role [18:04:15] 10Analytics, 10Fundraising-Backlog, 10fundraising-tech-ops: Install superset on front end server for analytics - https://phabricator.wikimedia.org/T245755 (10Milimetric) Excited to collaborate on this. My first thought is to caution against seeing superset as a solution. In our data pipelines, Superset com... 
[18:08:37] mforns: ah good point, not sure
[18:09:51] in theory yes
[18:17:03] ottomata: btw, did you manage to make pyhive to work?
[18:17:22] there is also the presto client https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto#Usage_on_analytics_cluster
[18:18:03] OH prestodb......
[18:18:06] i did not try that
[18:18:21] elukey: no, i got a 401 unauthroized
[18:18:23] not sure why
[18:18:35] will try prestodb
[18:19:09] elukey: ....if you have time, would you be able to occasionally help screen a few of these SRE resumes?
[18:19:18] Applications
[18:19:18] 12 of 114
[18:19:19] Est. time left
[18:19:19] 681 mins
[18:19:35] i'm going to try to do at least 20 a day this week
[18:19:43] you could just do a few here and there
[18:19:45] :D
[18:20:05] ottomata: sure I can :)
[18:20:13] ok, leemm ask recruiting to give you perms
[18:20:34] !log move report updater jobs from stat1007 to an-launcher1001
[18:20:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:21:31] elukey, checking
[18:21:42] mforns: still running
[18:21:49] gimme 5 mins :)
[18:22:16] ottomata: 5 euros/resume to help btw
[18:22:24] friendly price
[18:22:25] :D
[18:22:44] jokes aside, are all for us?
[18:23:17] ok
[18:24:00] heheh elukey us and search team right?
[18:24:27] sure sure
[18:24:43] ottomata: can I bother you a second about spark?
[18:24:49] on IRC I mean
[18:25:20] elukey: yues yes
[18:25:33] thanks :)
[18:26:43] I am currently wondering if the hadoop libs that I have patched for bigtop are picked up by spark
[18:27:01] hmmmm which ones?
[18:27:12] because what I am seeing is failure to use openssl, same problems that I fought when I enabled by default RPC encryption
[18:27:16] like maybe the ones shipped with spark are taking precedence?
[18:27:33] because in 2.6.x libs there is no support for openssl 1.1.0
[18:27:41] meanwhile now there is, but for bigtop libs
[18:28:12] we have the native libs stuff in /usr/lib/hadoop/lib/native
[18:28:21] that should contain libhadoop.so
[18:28:56] so in theory they should work mmmm
[18:29:40] -cp /usr/lib/spark2/conf/:/usr/lib/spark2/jars/*:/etc/hadoop/conf/ -Dscala.usejavacp=true
[18:29:48] mforns: all ready to check!
[18:29:53] /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/lib/spark2/conf/:/usr/lib/spark2/jars/*:/etc/hadoop/conf/ -Dscala.usejavacp=true -Xmx1g org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name Spark shell spark-shell
[18:29:53] elukey, ok!
[18:30:03] is full command spark2-shell launches on analytics1030
[18:30:14] so, elukey you could try that command but editing the -cp bit
[18:31:04] hm, but the spark jars are in /usr/lib/spark2/jars/
[18:31:21] elukey: you can just dl the tarball release without the hadoop jars included
[18:31:31] and try to run spark directly from there with cp pointing at /usr/lib/hadoop stuff
[18:31:39] okok will do
[18:32:12] you might have to do things like set HADOOP_HOME or JAVA_HOME , not sure
[18:32:14] but it works the same
[18:32:27] our deb is essentially the tarball with some wrappers
[18:32:40] https://archive.apache.org/dist/spark/spark-2.4.4/
[18:32:58] i guess this one https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-without-hadoop.tgz
[18:38:40] so in the container's logs I can see
[18:38:41] 20/02/24 18:34:43 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
[18:38:44] 20/02/24 18:34:43 DEBUG NativeCodeLoader: java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
[18:38:56] but the driver seems to load them correctly judging from the logs
[18:39:50] hmm ok java.library.path is different than class path, hm, and i don't think spark is bundled with native stuff, just jars
[18:39:52] 10Analytics, 10Better Use Of Data, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, and 5 others: Enable client side error logging in prod for small wiki - https://phabricator.wikimedia.org/T246030 (10Nuria)
[18:40:03] elukey, RU hasn't run yet from an-launcher
[18:40:15] will wait a bit and recheck.
[18:40:18] elukey: is hadoop native stuff needed for ssl?
[18:40:33] I found that pingback reports are stuck since 2019-12-08
[18:40:50] mforns: do you want me to force the execution of one?
[18:41:04] elukey, nah... it should run in 20 mins
[18:41:07] thanks
[18:41:12] ottomata: I am not entirely sure, but might be, will investigate
[18:41:34] 10Analytics, 10Better Use Of Data, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, and 5 others: Enable client side error logging in prod for small wiki - https://phabricator.wikimedia.org/T246030 (10Nuria)
[18:42:00] and we also use spark.executorEnv.LD_LIBRARY_PATH /usr/lib/hadoop/lib/native
[18:42:03] in spark defaults
[18:42:40] mmm but the default file is not picked up by Yarn probably
[18:42:44] ah interesting
[18:42:55] maybe I need to add it to the yarn-site.xml
[18:43:33] hmm
[18:43:42] mforns: forced reportupdater-browser.service
[18:43:49] completed right away
[18:43:55] aha
[18:43:58] checking
[18:44:36] elukey: i think that is in yarn-site
[18:44:37] elukey, logs look good! but we'll have to wait until tomorrow for there are no hourly jobs
[18:45:00] will check tomorrow morning
[18:45:33] mforns: super :)
[18:45:45] elukey: what happens if you add -D java.library.path=...:/usr/lib/hadoop/lib/native
[18:45:45] ?
[18:46:13] ottomata: on yarn-site we have
[18:46:13] yarn.app.mapreduce.am.env
[18:46:13] LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native
[18:46:19] but not the spark one
[18:46:24] interesting
[18:46:39] ottomata: can I pass the -D etc.. directly to the spark2-shell?
[18:47:08] elukey: i think you need
[18:47:21] --driver-java-options
[18:47:27] --driver-java-options='-D...'
[18:47:35] oh
[18:47:36] elukey:
[18:47:37] there is
[18:47:38] --driver-library-path Extra library path entries to pass to the driver.
[18:47:38] ?
[18:48:01] but I'd need something for the executors not the driver no?
[18:48:35] and also
[18:48:36] spark.executor.extraLibraryPath
[18:48:41] not a CLI for that
[18:48:41] but
[18:48:55] --conf spark.executor.extraLibraryPath=...
[18:49:34] * ottomata https://spark.apache.org/docs/2.4.4/configuration.html#runtime-environment
[18:54:06] the --^ seems working
[18:55:16] in the sense that native libs are picked up but spark still fails
[18:55:36] as in, you don't see the debug warnings?
[18:55:39] about native loader?
[18:55:51] correct
[18:55:58] aye ok so makes sense
[18:56:02] the hadoop native is not needed for SSL
[18:56:05] so there is another problem :)
[18:59:41] ldd doesn't show any libcrypto or similar for the native libs
[18:59:53] anyway, the option might be good to be added to spark-defaults
[19:00:04] looks a reasonable one
[19:00:11] (for performance)
[19:00:13] ya would be good just to make hadoop native work for spark
[19:00:14] ya
[19:00:29] ack then, I'll file a patch tomorrow
[19:00:58] another thing about RU ottomata - now the current code allows stat boxes to rsync /srv and /home from an-launcher1001
[19:01:10] do you think that we should change it?
[19:01:19] probably yes
[19:05:44] anyway, will check later the RU jobs, for the moment it looks good :)
[19:05:55] ttl! thanks a lot for the brainbounce :)
[19:05:57] * elukey dinner
[19:11:07] 10Analytics, 10Operations, 10Research, 10Traffic, 10WMF-Legal: Enable layered data-access and sharing for a new form of collaboration - https://phabricator.wikimedia.org/T245833 (10Miriam)
[19:11:25] 10Analytics, 10Operations, 10Research, 10Traffic, 10WMF-Legal: Enable layered data-access and sharing for a new form of collaboration - https://phabricator.wikimedia.org/T245833 (10Miriam)
[19:16:26] 10Analytics, 10Fundraising-Backlog, 10fundraising-tech-ops: Install superset on front end server for analytics - https://phabricator.wikimedia.org/T245755 (10EYener) @Milimetric Likewise excited for collaboration! I agree that visualization is the final piece of this puzzle. In parallel to discussing a front...
[19:21:19] elukey hmmm, i don't mind if that rsync is allowed, but ya we probably don't need it
[19:21:59] 10Analytics, 10Epic, 10Product-Analytics (Kanban): Analysts cannot reliably use Sparker in Jupyter to run SQL queries against Hive databases - https://phabricator.wikimedia.org/T245891 (10nshahquinn-wmf) @Ottomata, is there a reason you think this is specific to Jupyter? Based on what I've seen, it probably...
[19:22:44] I'm looking to deploy access to mw sql replicas (or really, the credentials and information to figure out what lives where) to an-airflow1001. Currently this is split between operations/mediawiki-config deployed in profile::analytics::cluster::repositories::statistics, along with mysql credentials deployed in profile::statistics::private. Adding the repositories one to an-airflow makes
[19:22:50] sense, but the private stats profile not so much. I can simply copy the right bit out of there into airflow profile, but it doesn't seem quite right. I was thinking make some new profile for holding these things and name it something related to accessing mw db replicas?
[19:23:29] basically, does that seem reasonable? Or i can drop an appropriate statistics::mysql_credentials { ...
} into airflow profile [19:23:41] 10Analytics, 10Epic, 10Product-Analytics (Kanban): Analysts cannot reliably use Sparker in Jupyter to run SQL queries against Hive databases - https://phabricator.wikimedia.org/T245891 (10Ottomata) I suspect {T245892} and {T245713} are related to Jupyter, you are right though, the others are more general. [19:24:48] 10Analytics, 10Inuka-Team, 10Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (10nshahquinn-wmf) >>! In T244547#5912212, @Nuria wrote: >>Regarding the base url en.wikipedia.org/beacon/event, do we > > Those beacon pings are not counted pageviews so for t... [19:24:48] hmm, ebernhardson luca is working on standarizing and unifying the stat box puppet stuff a lot right now [19:24:51] so this might be more clear soon [19:25:10] for now i think dropping a statistics::mysql_credentials makes sense [19:25:16] into airflow profile [19:25:19] ok, will go the easy route :) [19:52:00] 10Analytics, 10Fundraising-Backlog, 10fundraising-tech-ops: Install superset on front end server for analytics - https://phabricator.wikimedia.org/T245755 (10Milimetric) For OLAP-style dimensional data we like Druid as a data store. So the flow there for us is: * Kafka -> Camus (bucket hourly) -> HDFS * Me... [20:05:13] 10Analytics, 10Gerrit, 10Gerrit-Privilege-Requests, 10Patch-For-Review, 10User-MarcoAurelio: Give access to Wikistats 2 to l10n-bot - https://phabricator.wikimedia.org/T245805 (10MarcoAurelio) >>! In T245805#5912642, @Milimetric wrote: > Is this done or are there additional steps? I'd like https://gerri... [20:07:52] 10Analytics, 10Product-Analytics: Presto: missing partitions causes queries to fail - https://phabricator.wikimedia.org/T246034 (10nettrom_WMF) [20:37:37] 10Analytics, 10Product-Analytics: Spark application UI shows data for different application - https://phabricator.wikimedia.org/T245892 (10JAllemandou) I have tried to replicate but couldn't: I launched 2 notebooks using Spark from wmfdata and got 2 different spark UIs. Keeping the task open for now in case it... [20:46:08] 10Analytics, 10Product-Analytics: Give clear recommendations for Spark settings - https://phabricator.wikimedia.org/T245897 (10JAllemandou) There is some misunderstanding here between recommendations and examples IMO. the links pasted in the task definition show examples, not recommendations, and I don't think... [20:46:43] 10Analytics, 10Product-Analytics: wmfdata cannot recover from a crashed Spark session - https://phabricator.wikimedia.org/T245713 (10LGoto) p:05Triage→03High a:03kzimmerman [20:49:51] 10Analytics, 10Product-Analytics: Presto: missing partitions causes queries to fail - https://phabricator.wikimedia.org/T246034 (10JAllemandou) Interesting ! There are partitions in hive for that table since `2019-05`, but data folders are only present `2019-11-05` onward. @mforns : Could this be a bug on data... [20:52:11] <3 joal thank you for your answers on those spark tickets [20:53:22] 10Analytics, 10Product-Analytics: Presto: missing partitions causes queries to fail - https://phabricator.wikimedia.org/T246034 (10mforns) Hmmmm... yes it could be. Not sure if this is due to manual deletion or to the deletion jobs that were set up. I think it's more likely to be because of manual deletions. W... 
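Looping back to the native-library thread above: the setting that was tested there maps onto a Spark session roughly as below. The `/usr/lib/hadoop/lib/native` path is the one quoted in the conversation; everything else is illustrative, and per the log this only got the native libraries loaded on the executors — it did not by itself fix the TLS failure.

```python
from pyspark.sql import SparkSession

NATIVE_LIBS = "/usr/lib/hadoop/lib/native"  # path quoted above

spark = (
    SparkSession.builder
    .appName("native-lib-check")
    .master("yarn")
    # Equivalent of the --driver-library-path CLI flag.
    .config("spark.driver.extraLibraryPath", NATIVE_LIBS)
    # Library path for the executor JVMs -- the piece that made the
    # "Failed to load native-hadoop" NativeCodeLoader debug message
    # go away in the container logs.
    .config("spark.executor.extraLibraryPath", NATIVE_LIBS)
    .getOrCreate()
)
```

Putting the same keys in `spark-defaults.conf`, as suggested at the end of that exchange, would apply them to every job rather than per-session.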
[20:53:31] thanks ottomata :) Please chime-in for precisions and all ) [20:53:38] 10Analytics, 10Epic, 10Product-Analytics (Kanban): Analysts cannot reliably use Sparker in Jupyter to run SQL queries against Hive databases - https://phabricator.wikimedia.org/T245891 (10LGoto) a:05nshahquinn-wmf→03kzimmerman [20:53:56] 10Analytics, 10Epic, 10Product-Analytics (Kanban): Spark applications crash when running large queries - https://phabricator.wikimedia.org/T245896 (10LGoto) p:05Triage→03High [21:09:33] 10Analytics, 10DC-Ops, 10Operations, 10ops-eqiad: (Need by: TBD) rack/setup/install an-druid1001 and druid1007 - https://phabricator.wikimedia.org/T245569 (10RobH) [21:09:45] ottomata, I pushed latest changes: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventLogging/+/573677/ [21:10:34] 10Analytics, 10Operations, 10ops-eqiad: (Need by: TBD) rack/setup/install kafka-jumbo100[789].eqiad.wmnet - https://phabricator.wikimedia.org/T244506 (10RobH) [21:10:42] looking [21:12:50] ottomata: required followup, basically the credentials are gated on defined(Group['analytics-privatedata-users']) which was false: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/574565 [21:13:07] HMMMMM [21:13:09] right [21:13:15] oh no ssh cool. [21:13:16] hm [21:13:32] i think that will work? Running it into PCC [21:15:59] ebernhardson: maybe it'd be better to just change the group ownership of the mysql creds file to analytlics-search [21:15:59] ? [21:16:36] ottomata: statistics::mysql_credentials won't even deploy the file currently. Would have do define new credentials (with same user/pass) i imagine? [21:17:54] hm, shoudlnt' be needed, the creds are the same and defiend in puppet private [21:18:05] this just renders a new mysql conf cred file with the u/p [21:18:11] only readable by the group [21:18:18] oh, i was totally not reading this code...yea it doesn't use the group for anything except setting owner [21:18:25] yes, it would be way easier to change ownership [21:18:33] try just group => analytics-search [21:21:17] ottomata: updated, this makes much more sense [21:21:51] doh, renaming still.. [21:46:16] 10Analytics: Problem with Matomo page overlay - https://phabricator.wikimedia.org/T246046 (10Varnent) [21:46:50] nuria: i +1 mforns patch and added you and timo as reviewers [22:25:11] 10Analytics: Problem with Matomo page overlay - https://phabricator.wikimedia.org/T246046 (10Nuria) Sorry, i cannot repro, maybe you want to talk to your team mates and see if they face similar problem?