[00:10:33] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Privacy Engineering, and 4 others: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (10Ottomata) Ok, next week I'll work on merging https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/635304, ma...
[05:53:32] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to Production Shell Access (analytics-privatedata-users) for Rmaung - https://phabricator.wikimedia.org/T266250 (10Marostegui)
[05:53:46] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to Production Shell Access (analytics-privatedata-users) for Rmaung - https://phabricator.wikimedia.org/T266250 (10Marostegui) @KFrancis can you confirm if @Rmaung has a valid NDA signed? I cannot see it on the NDA tracking sheet.
[05:54:50] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (10Marostegui)
[05:55:06] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (10Marostegui)
[05:59:26] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to Production Shell Access (analytics-privatedata-users) for Rmaung - https://phabricator.wikimedia.org/T266250 (10Marostegui) Confirmed that @rmaung is staff by checking via ldap-corp. @Rmaung we'd also need your manager to sign off this re...
[05:59:38] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to Production Shell Access (analytics-privatedata-users) for Rmaung - https://phabricator.wikimedia.org/T266250 (10Marostegui)
[06:00:51] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (10Marostegui)
[06:01:02] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (10Marostegui) Confirmed janstee@wikimedia.org via ldap corp as staff. @JAnstee_WMF we'd need your manager to sign this off. Thanks!
[06:11:07] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Refresh 16 nodes in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T255140 (10elukey) All old nodes were removed from Hadoop!
[06:11:51] 10Analytics, 10Analytics-Kanban: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (10elukey) 05Open→03Resolved
[06:11:53] 10Analytics: Analytics Hardware for Fiscal Year 2019/2020 - https://phabricator.wikimedia.org/T244211 (10elukey)
[06:25:50] good morning
[06:26:08] I am running a test on an-presto1001, namely setting the Xmx of the jvm to 60G
[06:26:15] (the others run at 110G)
[06:38:01] 10Analytics: Check home/HDFS leftovers of jkumarah - https://phabricator.wikimedia.org/T263715 (10elukey) 05Open→03Resolved a:03elukey homes deleted.
[06:38:25] good morning
[06:38:34] morning!
[06:39:29] Could it be that /srv/published/datasets is not updating https://analytics.wikimedia.org/published/datasets/ from stat1005? I have updated some files hours ago and I still can't see any change via https?
[06:41:29] 10Analytics: Check home/HDFS leftovers of shiladsen - https://phabricator.wikimedia.org/T264269 (10elukey) Deleted all the home dirs on stat100x, only hdfs files are left :)
[06:42:36] GoranSM: what is the link of the dir not updating?
[06:43:42] 10Analytics: Increase in usage of /var/lib/mysql on an-coord1001 after Sept 21st - https://phabricator.wikimedia.org/T264081 (10elukey) 05Open→03Resolved a:03elukey It seems way more stable now, closing for the moment :)
[06:45:25] 10Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (10elukey)
[06:46:33] 10Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (10elukey)
[06:46:54] 10Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (10elukey)
[06:48:01] https://issues.apache.org/jira/browse/BIGTOP-3434 - Hadoop-3.3.0 deb packaging support
[06:55:54] elukey: Hm, I had the same directories under /srv/published/datasets on stat1005 and stat1007, could that be the origin of the problem? I have just removed the directories from stat1007.
[06:56:10] elukey: As for your question: https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/wdcm/etl, https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/wdcm/ml, https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/wdcm/geo
[06:58:22] GoranSM: having dirs on multiple nodes may cause issues, does it work now?
[07:00:22] 10Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T264268 (10elukey) All stat100x home dirs purged, only hdfs/hive left!
[07:03:25] 10Analytics: Check home/HDFS leftovers of rush - https://phabricator.wikimedia.org/T265121 (10elukey) Sent an email to John to get a final confirmation.
[07:05:18] 10Analytics: Check home/HDFS leftovers of joewalsh - https://phabricator.wikimedia.org/T265447 (10elukey) 05Open→03Resolved a:03elukey All stat100x home dirs removed!
[07:06:37] * elukey bbiab
[08:32:53] (03PS6) 10Elukey: Add oozie webrequest test bundle [analytics/refinery] - 10https://gerrit.wikimedia.org/r/491791 (https://phabricator.wikimedia.org/T212259)
[08:43:33] elukey: It works. Mea culpa: switched some ML operations to stat1005 and forgot to remove /srv/published/datasets related things from stat1007. Thx.
[08:45:46] np! glad that it is fixed :)
[09:00:54] ebernhardson: re integration environment - can you give us a little bit more details? :)
[09:49:05] ok the analytics-test-hive.eqiad.wmnet trick seems to work in hadoop test
[09:49:42] the main downside though is that it will require a restart/update of all clients when we change the settings on the hive server/metastore (since they cannot run two service principals)
[09:52:16] --
[09:52:33] while trying to run refine on hadoop test (with bigtop) I got
[09:52:34] org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NoSuchMethodError: com.maxmind.geoip2.DatabaseReader
[09:52:40] (webrequest_load's refine)
[09:53:43] refinery_jar_version = 0.0.137
[09:55:01] mforns: o/ is it related to the work that you are doing by any chance?
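A NoSuchMethodError like the one pasted above usually means the JVM resolved com.maxmind.geoip2.DatabaseReader from a different geoip2 jar than the one refinery-source was compiled against. A minimal Scala sketch of one way to check which jar actually won; the object name is made up for illustration, and it would need to run with the same classpath as the failing Refine job:

```scala
// Hedged diagnostic sketch (not run in this log): print the jar that supplied
// DatabaseReader, to spot a clash between refinery's bundled geoip2 version
// and whatever Hive 2.x puts on the classpath.
object WhichGeoip2Jar {
  def main(args: Array[String]): Unit = {
    val cls = Class.forName("com.maxmind.geoip2.DatabaseReader")
    val location = Option(cls.getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)
      .getOrElse("(no code source: likely loaded by the bootstrap classloader)")
    println(s"DatabaseReader loaded from: $location")
  }
}
```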
[10:02:19] mmm no last change seems to be https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/588715
[10:04:03] I think it is an issue with hive 2.x
[10:23:28] 10Analytics: Possible between Maxmind and Hive 2.x libs in Refinery source - https://phabricator.wikimedia.org/T266322 (10elukey)
[10:23:41] 10Analytics: Possible issue between Maxmind and Hive 2.x libs in Refinery source - https://phabricator.wikimedia.org/T266322 (10elukey)
[10:26:55] 10Analytics: Check home/HDFS leftovers of shiladsen - https://phabricator.wikimedia.org/T264269 (10mforns) I deleted HDFS and HIVE files. Resolving!
[10:27:19] 10Analytics: Check home/HDFS leftovers of shiladsen - https://phabricator.wikimedia.org/T264269 (10mforns) 05Open→03Resolved a:03mforns
[10:45:25] hey elukey
[10:45:30] just joined
[10:46:19] elukey: I believe those might be related to changes to remove unused fields derived from maxmind
[10:46:28] I deployed those on tuesday
[10:48:54] ah!
[10:49:13] I am going afk for lunch + errand, let's check later on if you have time
[10:49:18] nothing super urgent :)
[10:50:01] yes ok
[11:06:41] 10Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T264268 (10mforns) 05Open→03Resolved a:03mforns Deleted both HDFS and HIVE directories, plus the corresponding database in HIVE. Marking this as resolved!
[11:07:00] 10Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T256356 (10mforns)
[13:03:26] mforns: o/
[13:03:46] was the last refinery that you deployed 0.0.137?
[13:04:13] if so I can try 0.0.136, but I am afraid that the issue is more subtle (like hive 2.x dependent)
[13:04:27] it is my bad that I kept running oozie with an old version of refinery, and never saw issues
[13:11:13] 10Analytics: Check home/HDFS leftovers of rush - https://phabricator.wikimedia.org/T265121 (10JBennett) Nothing we need to keep, good to cleanup, thanks!
[13:17:29] 10Analytics: Check home/HDFS leftovers of rush - https://phabricator.wikimedia.org/T265121 (10elukey) 05Open→03Resolved a:03elukey All stat100x homes cleaned up, HDFS home also cleaned up!
[13:23:09] mforns: checked with 0.136, same error
[13:45:16] 10Analytics-Clusters, 10Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (10elukey) Summary of actions done: * created a dns CNAME analytics-test-hive.eqiad.wmnet -> an-test-coord1001.eqiad.wmnet * created the kerberos principal `hive/analytics-test-h...
[14:23:29] 10Analytics: Check home/HDFS leftovers of leila - https://phabricator.wikimedia.org/T264994 (10elukey) stat100x homes done (content moved under `/home/leizi`) For HDFS: ` ======= HDFS ======== Found 6 items drwx------ - leila leila 0 2018-06-27 00:37 /user/leila/.staging drwxr-xr-x - leila bma...
[14:24:00] elukey: hi, sorry was having lunch
[14:24:12] mforns: how dare you marcel to eat?
[14:24:16] :D
[14:24:18] xD
[14:24:27] please don't say sorry :)
[14:24:31] hehehehe
[14:25:00] I haven't understood the error, is there a task or log I can look at?
[14:25:05] or alarm?
[14:25:05] yep!
[14:25:08] I opened one
[14:25:16] https://phabricator.wikimedia.org/T266322
[14:25:21] it is on the Test cluster
[14:25:23] with hive 2.x
[14:25:37] so no alarm, it is just me trying to run webrequest-load in there
[14:25:50] it used to work, but with an older version of refinery, stuff might have changed
[14:26:02] but if it doesn't work in test we cannot really migrate to Bigtop :(
[14:26:02] aha
[14:28:00] it is ops week so don't spend time on it, I pinged you in case you had any idea since I recalled maxmind changes during the last deployment
[14:28:11] but it seems a more complicated issue
[14:30:36] elukey: I worked with maxmind, but didn't change it...
[14:31:03] but... IIUC maxmind is not shipped with BigTop, no? It is imported externally, right?
[14:34:53] elukey: I think there's only one place in refinery-source where we use a DatabaseReader constructor: refinery-core...maxmind/AbstractDatabaseReader.java
[14:35:07] yep yep
[14:35:32] I added a stackoverflow link, they seem to have had the same issue
[14:35:38] ah
[14:35:46] but they solved it by changing the dependencies
[14:37:18] elukey: we can check in the mvn tree that the maxmind version is the same in prod and in test
[14:38:52] we can yes, what changes (I think) is hive 1.x vs hive 2.x libs
[14:43:30] in prod: [INFO] +- com.maxmind.geoip2:geoip2:jar:2.1.0:compile
[14:44:07] mforns: does DataFrameToHive automatically write _SUCCESS flags here somehow? https://github.com/wikimedia/analytics-refinery-source/blob/fed14e6dbad3eb5a65069d80182c6070e203dbe6/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/Refine.scala#L521
[14:48:42] oh my bad that's in our codebase, looking
[14:49:53] milimetric: don't remember...
[14:50:38] it's very weird! The _REFINED and _SUCCESS flags are written at the same time, so they must be written around there somewhere, but I search for _SUCCESS and it doesn't show up anywhere
[14:50:56] (this is for refined event streams like mediawiki_page_move)
[14:56:06] AHA!!!
[14:56:08] Spark does it
[14:56:30] out of outputDf.df.write.parquet(...)
[14:56:40] oof, I'm gonna add a comment
[14:57:14] ok, ottomata, I figured it out, Spark writes _SUCCESS at the same time as DataFrameToHive writes _REFINED by calling the callback from Refine.scala
[14:58:09] the dataset definition uses _SUCCESS, so when data is coming into codfw, it can't use the canaries from eqiad, because those don't represent the availability of the data in codfw
[15:01:26] interesting
[15:04:52] (03PS1) 10Milimetric: Make explicit that _SUCCESS flag is written [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/636051
[15:05:17] (03CR) 10Milimetric: [C: 03+2] Make explicit that _SUCCESS flag is written [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/636051 (owner: 10Milimetric)
[15:11:01] (03Merged) 10jenkins-bot: Make explicit that _SUCCESS flag is written [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/636051 (owner: 10Milimetric)
[15:37:53] 10Analytics, 10Analytics-Wikistats: pagecounts-ez uploads stopped after 9/24 - https://phabricator.wikimedia.org/T265378 (10Danilo) I didn't find the total per month in those files, will it not be provided anymore? I have some tools that use the total pagecounts per month, that is the only data I need from the...
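For reference, a minimal sketch of the _SUCCESS behavior milimetric tracked down above: the marker is written by Spark's Hadoop output committer on job commit, not by DataFrameToHive, and it is controlled by the standard Hadoop property mapreduce.fileoutputcommitter.marksuccessfuljobs. Paths and session setup here are illustrative, not taken from the log:

```scala
import org.apache.spark.sql.SparkSession

object SuccessFlagDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("success-flag-demo")
      .getOrCreate()

    // Any DataFrame write goes through Hadoop's FileOutputCommitter, which
    // drops an empty _SUCCESS marker into the output directory on job commit.
    // This is why it appears "at the same time" as Refine's _REFINED flag.
    spark.range(10).write.mode("overwrite").parquet("/tmp/success_flag_demo")

    // To suppress the marker (relevant when, as above, a dataset definition
    // should not read an eqiad _SUCCESS as proof of codfw availability),
    // flip the property on the Hadoop configuration before writing:
    spark.sparkContext.hadoopConfiguration
      .set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
    spark.range(10).write.mode("overwrite").parquet("/tmp/no_success_flag_demo")
  }
}
```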
[15:40:16] hi team, looking into the wikipediapreview alerts
[15:56:33] 10Analytics-Radar, 10Release-Engineering-Team, 10observability, 10serviceops, and 2 others: Create a separate 'mwdebug' cluster - https://phabricator.wikimedia.org/T262202 (10jijiki)
[15:57:49] 10Analytics-Radar, 10Release-Engineering-Team, 10observability, 10serviceops, and 2 others: Create a separate 'mwdebug' cluster - https://phabricator.wikimedia.org/T262202 (10jijiki)
[16:06:07] mforns: yt?
[16:06:16] hey nuria yes
[16:06:28] elukey: in terms of an integration environment, i'm trying to setup a string of docker containers in docker-compose that runs airflow and all the things it talks to, such that i can trigger a task in the environment and see the results in elasticsearch at the end
[16:06:31] mforns: ahem, question about the data_quality_stats job
[16:06:39] yep
[16:06:54] jobs are running ok but the update steps
[16:07:09] that move data to my local data_quality_stats table
[16:07:16] are succeeding
[16:07:20] but no data is present
[16:07:25] ebernhardson: ah okok that is more clear :)
[16:07:59] mforns: so hdfs://analytics-hadoop/user/nuria/data/data_quality_stats is empty (no partitions)
[16:08:04] ebernhardson: we can work on specific issues if you want, but I am afraid that all our configs are in puppet (there are guidelines but it is fairly complicated to make all the pieces work together)
[16:08:37] nuria: can you paste the command that you're using?
[16:08:42] mforns: is there anything i am forgetting
[16:08:45] https://www.irccloud.com/pastebin/slk5vC8u/
[16:11:28] elukey: for the moment i'm leaning on a cloudera quickstart image for hadoop (but it's only cdh5.9, uses java 7. Basically i can't submit anything to the cluster, can only access hive/hdfs via apis).
[16:11:42] i might get around to trying to replace that with 5.14 on debian ... but not today :)
[16:11:58] ebernhardson: keep in mind that we are moving to apache bigtop, so don't invest too much in cloudera
[16:12:16] ok, then it's certainly not worth doing anything beyond the quickstart image they are providing
[16:12:22] (they offer a lot of docker images to use)
[16:13:00] is the timeline next fiscal? Or still in early planning?
[16:13:25] (or maybe much closer than i expect :)
[16:13:52] mforns: will keep on looking, i think data is being moved to a diff location, i am executing this as 'nuria' so no prod data is overridden
[16:15:49] ebernhardson: should be this quarter or the next :)
[16:15:59] elukey: awesome!
[16:16:10] we are going to upgrade hdfs to 2.8.5, hive to 2.3.3, etc..
[16:25:50] mforns: do not look at this deeply, i can bypass this issue
[16:25:59] mforns: really
[16:26:07] just looking if I find something eviden
[16:26:10] evident
[16:27:14] mforns: do not worry, i will just do away with the updater step
[16:27:24] nuria is there source data for 2020-05?
[16:27:52] yea yea, pageview_hourly right, of course...
[16:27:57] mforns: right
[16:28:03] mforns: really, do not worry
[16:28:09] mforns: will shortcut
[16:38:42] mforns: got it
[16:38:51] nuria: oh, what was it?
[16:38:52] mforns: *i think*
[16:38:56] mforns: wait
[16:41:17] mforns: i think the query_name is missing
[16:45:23] nuria: the query_name should be in the bundle file, in the coordinator snippet, no?
[16:45:40] mforns: yes, but i must have a snafu somewhere
[16:47:19] nuria: could it be that the query is not in /home/nuria/workplace/refinery/refinery_main/ ?
[16:47:44] oh, it's there
[16:47:46] mforns: as in the hql file?
[16:47:51] yes
[16:48:14] mforns: this smells so much of one of my famous STUPID TYPOS
[16:48:26] ains
[16:48:40] mforns: nvm will continue later
[17:14:17] * elukey afk!
[17:30:09] 10Analytics: Check home/HDFS leftovers of leila - https://phabricator.wikimedia.org/T264994 (10leila) @elukey thanks. Just drop the Hive tables, please. No need to move them.
[17:49:27] 10Analytics, 10Product-Analytics, 10Structured-Data-Backlog: Add image table to monthly sqoop list - https://phabricator.wikimedia.org/T266077 (10mpopov) The team will review and prioritize this during our next board review meeting (October 26th).
[18:47:32] 10Analytics-Radar, 10Product-Analytics, 10Structured Data Engineering, 10Patch-For-Review, and 2 others: Develop a new schema for MediaSearch analytics or adapt an existing one - https://phabricator.wikimedia.org/T263875 (10CBogen)
[18:47:34] 10Analytics, 10Product-Analytics, 10Structured Data Engineering, 10Patch-For-Review, and 2 others: [L] Instrument MediaSearch results page - https://phabricator.wikimedia.org/T258183 (10CBogen)
[22:08:53] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to Production Shell Access (analytics-privatedata-users) for Rmaung - https://phabricator.wikimedia.org/T266250 (10KFrancis) >>! In T266250#6573604, @Marostegui wrote: > @KFrancis can you confirm if @Rmaung has a valid NDA signed? I cannot s...
[22:31:43] 10Analytics, 10Product-Analytics: Analyze differences between checksum-based and revert-tag based reverts in mediawiki_history - https://phabricator.wikimedia.org/T266374 (10nettrom_WMF)
[22:36:06] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to Production Shell Access (analytics-privatedata-users) for Rmaung - https://phabricator.wikimedia.org/T266250 (10Nuria) @Rmaung: can you describe what data you are looking to access? This is so we can see what is the appropriate level of acces...
[22:38:02] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to Production Shell Access (analytics-privatedata-users) for Rmaung - https://phabricator.wikimedia.org/T266250 (10Nuria) Also, @Rmaung please take a look at https://wikitech.wikimedia.org/wiki/Analytics/Data_Access_Guidelines and ask any qu...
[22:42:23] 10Analytics, 10Operations, 10SRE-Access-Requests: Nuria's volunteer account - https://phabricator.wikimedia.org/T266086 (10Nuria)
[22:43:17] 10Analytics, 10Operations, 10SRE-Access-Requests: Nuria's volunteer account - https://phabricator.wikimedia.org/T266086 (10Nuria) NDA signed now but I do not have access to https://phabricator.wikimedia.org/L2?
[22:45:24] 10Analytics, 10Product-Analytics: Add timestamps of important revision events to mediawiki_history - https://phabricator.wikimedia.org/T266375 (10nettrom_WMF)
[22:48:45] 10Analytics, 10Product-Analytics: Add timestamps of important revision events to mediawiki_history - https://phabricator.wikimedia.org/T266375 (10nettrom_WMF) @Isaac : you wanted me to tag you when I filed the task for getting information about revision tag changes into MediaWiki history. Here's said tag. I do...
[22:49:43] 10Analytics, 10Operations, 10SRE-Access-Requests: Nuria's volunteer account - https://phabricator.wikimedia.org/T266086 (10Dzahn) @Nuria Try again now, I just added you to the project called "WMF-NDA-Requests" (https://phabricator.wikimedia.org/project/profile/974/) which seems like it's needed to allow you...
[22:52:37] 10Analytics, 10Operations, 10SRE-Access-Requests: Nuria's volunteer account - https://phabricator.wikimedia.org/T266086 (10Nuria) done!
[22:58:29] 10Analytics, 10Operations, 10SRE-Access-Requests: Nuria's volunteer account - https://phabricator.wikimedia.org/T266086 (10Dzahn) >>! In T266086#6575705, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://sal.toolforge.org/log/5BKtV3UBpU87LSFJgL3r} [2020-10-23T22:5...
[23:14:19] 10Quarry: Quarry down for logged in users - https://phabricator.wikimedia.org/T265997 (10Framawiki) From my records it was down for 9 hours and 20 minutes. Logs on that day are full of: ` sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'quarry-db-01.quarry...
[23:19:40] 10Analytics, 10Operations, 10SRE-Access-Requests: Nuria's volunteer account - https://phabricator.wikimedia.org/T266086 (10KFrancis) @Dzahn because they are an employee of the WMF, the NDA is kept on file by T&C.
[23:25:33] mforns: all my issues are just permission issues, somewhere the table that script is trying to update is under analytics
[23:25:52] mforns: diagnostics: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user=nuria, access=WRITE, inode="/tmp/analytics/data_quality_stats_updater":analytics:hdfs:drwxr-xr-x
[23:26:46] mforns: but funny how the job succeeds, this is the spark problem of errors not being surfaced, i think
[23:35:50] mforns: ok, got it, the temp directory needs to be overridden or it will default to what spark has, which is "/tmp/analytics/"; this can be fixed with docs or a small change in the workflow
[23:50:07] 10Quarry, 10cloud-services-team (Kanban): Quarry down for logged in users - https://phabricator.wikimedia.org/T265997 (10bd808) a:03Bstorm The database had crashed as I remember. @Bstorm did things to get it back up and running. She may have a better memory of what was broken and why.
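A minimal sketch of the temp-directory fix nuria describes at 23:35: derive the updater's scratch path from the submitting user instead of the shared /tmp/analytics/ default, so a run as 'nuria' does not hit AccessControlException. The helper object and path layout below are assumptions for illustration; the real updater's mechanism for overriding its temp directory may be named differently:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper: create a per-user HDFS temp directory for the
// data_quality_stats updater. Because the directory is created by the
// calling user, writes to it do not require WRITE on /tmp/analytics/.
object PerUserTmpDir {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val user = UserGroupInformation.getCurrentUser.getShortUserName
    val tmp = new Path(s"/tmp/$user/data_quality_stats_updater")
    if (!fs.exists(tmp)) fs.mkdirs(tmp) // owned by the caller, not analytics
    println(s"Using temp dir: $tmp")
  }
}
```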