[07:02:14] good morning
[07:19:21] 10Analytics, 10Event-Platform, 10Patch-For-Review: WikimediaEventUtilities and produce_canary_events job should use api-ro.discovery.wmnet instead of meta.wikimedia.org to get stream config - https://phabricator.wikimedia.org/T274951 (10elukey) ` elukey@puppetmaster1001:~$ sudo -i confctl --quiet --object-t...
[08:11:52] 10Analytics-Radar, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10Product-Analytics, 10Structured-Data-Backlog (Current Work): Set up generation of JSON dumps for Wikimedia Commons - https://phabricator.wikimedia.org/T259067 (10ArielGlenn) Well, the json ones are being generated with the name "all"...
[08:20:40] 10Analytics-Radar: Presto error in Superset - only when grouping - https://phabricator.wikimedia.org/T270503 (10elukey) >>! In T270503#6847080, @EYener wrote: > Thanks again for helping with the authentication issue @elukey. I'm sure it's all fixed now, but I also wanted to note that it looks like my email addre...
[08:31:19] (03CR) 10Jdrewniak: SearchSatisfaction: Add editBucketCount (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/665317 (https://phabricator.wikimedia.org/T272991) (owner: 10Jdrewniak)
[08:44:24] * elukey bbiab
[09:01:07] !log reboot stat1005/stat1008 for kernel upgrades
[09:01:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:03:12] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10elukey) A high level script to change the user `hdfs` could be: ` #!/bin/bash set -ex HDFS_UID=$(id -u hdfs) HDFS_GID=$(id -g hdfs) usermod -u 200 hdfs groupmod -g 200 hdfs find / \( -path /proc...
[10:03:22] the standardization of users/groups is a little more brutal than what I imagined :D
[10:42:15] 10Analytics, 10WMDE-Analytics-Engineering, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 - https://phabricator.wikimedia.org/T274866 (10elukey) @GoranSMilovanovic yep I will ping you when a stable solution is fine, I am still trying to check how to fix it properly but I fear that a task...
[11:03:22] (03CR) 10Phuedx: [C: 03+1] SearchSatisfaction: Add editBucketCount (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/665317 (https://phabricator.wikimedia.org/T272991) (owner: 10Jdrewniak)
[11:04:23] 10Analytics, 10WMDE-Analytics-Engineering, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 - https://phabricator.wikimedia.org/T274866 (10elukey) https://issues.apache.org/jira/browse/BIGTOP-3508
[11:28:36] Morning!
[11:29:00] joal: https://gerrit.wikimedia.org/r/c/operations/puppet/+/665321 Any opinions/comments?
[11:33:05] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10MoritzMuehlenhoff) >>! In T231067#6848008, @elukey wrote: > * Adding system users to data.yaml (to reserve uid/gid ) means that our users will be deployed fleetwide, that seems to be too muc...
[11:40:16] * elukey afk! lunch!
[13:02:05] * klausman lunch (bbiab)
[13:03:06] (03CR) 10Phuedx: [C: 03+2] "Being **bold**." [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/665317 (https://phabricator.wikimedia.org/T272991) (owner: 10Jdrewniak)
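
A note on elukey's uid/gid standardization script quoted above (10:03): `UID` is a readonly variable in bash, so with `set -e` an assignment like `UID=$(id -u hdfs)` would abort the script immediately; the id lookups need their own variable names (e.g. HDFS_UID/HDFS_GID). The truncated `find` presumably re-owns files still carrying the old ids. A minimal, hypothetical Python sketch of that sweep, where the old/new ids and the skip list are assumptions, not elukey's actual script:

    #!/usr/bin/env python3
    # Hypothetical sketch of the truncated `find` step: re-own everything still
    # carrying the hdfs user's old uid/gid after usermod/groupmod moved it to 200.
    import os

    OLD_UID, OLD_GID = 899, 899       # assumed; the real values come from `id hdfs` before the change
    NEW_UID, NEW_GID = 200, 200       # the standardized ids used in the script above
    SKIP = ('/proc', '/sys', '/dev')  # pseudo-filesystems the find excluded

    for root, dirs, files in os.walk('/'):
        # don't descend into pseudo-filesystems
        dirs[:] = [d for d in dirs if os.path.join(root, d) not in SKIP]
        for name in dirs + files:
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if st.st_uid == OLD_UID or st.st_gid == OLD_GID:
                os.lchown(path,
                          NEW_UID if st.st_uid == OLD_UID else st.st_uid,
                          NEW_GID if st.st_gid == OLD_GID else st.st_gid)
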
[13:03:42] (03CR) 10jerkins-bot: [V: 04-1] SearchSatisfaction: Add editBucketCount [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/665317 (https://phabricator.wikimedia.org/T272991) (owner: 10Jdrewniak)
[13:47:00] (03PS2) 10Phuedx: SearchSatisfaction: Add editBucketCount [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/665317 (https://phabricator.wikimedia.org/T272991) (owner: 10Jdrewniak)
[13:58:15] (03CR) 10Jdrewniak: "@Phuedx, thanks for the fix. I wasn't sure where this minimum/maximum error was coming from. I guess it's for validation." [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/665317 (https://phabricator.wikimedia.org/T272991) (owner: 10Jdrewniak)
[14:03:25] (03CR) 10Ottomata: "Ok!" [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/665317 (https://phabricator.wikimedia.org/T272991) (owner: 10Jdrewniak)
[14:07:36] !log upgrade spark2 on stat1004 to 2.4.4-bin-hadoop2.6-5~wmf0 (hadoop jars removed)
[14:07:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:07:44] (03CR) 10Phuedx: [C: 03+2] "Per Jdrewniak and Ottomata's comments above." [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/665317 (https://phabricator.wikimedia.org/T272991) (owner: 10Jdrewniak)
[14:12:21] !log upgrade spark2 on an-coord1001 to 2.4.4-bin-hadoop2.6-5~wmf0 (hadoop jars removed), will remove and auto-re-add spark-2.4.4-assembly.zip in hdfs after running puppet here
[14:12:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:13:42] 10Analytics-Radar: Presto error in Superset - only when grouping - https://phabricator.wikimedia.org/T270503 (10EYener) Thanks @elukey! I also wanted to ask about caching; it appears that caching is no longer working for this dashboard - do you know of a possible cause for this?
[14:15:44] 10Analytics-Radar: Presto error in Superset - only when grouping - https://phabricator.wikimedia.org/T270503 (10elukey) >>! In T270503#6848789, @EYener wrote: > Thanks @elukey! I also wanted to ask about caching; it appears that caching is no longer working for this dashboard - do you know of a possible cause fo...
[14:15:51] 10Analytics, 10Patch-For-Review: Repackage spark without hadoop, use provided hadoop jars - https://phabricator.wikimedia.org/T274384 (10Ottomata) ` sudo -u hdfs hdfs dfs -mv /user/spark/share/lib/spark-2.4.4-assembly.zip /user/spark/share/lib/spark-2.4.4-hadoop2.6-assembly.zip ` I upgraded spark on an-coord1...
[14:16:50] ottomata: o/
[14:16:54] elukey: i'm about to upgrade to spark2 with no hadoop on the whole cluster
[14:16:57] s'ok?
[14:17:07] do we also need to upload spark2 jars in the oozie sharelib? (and restart oozie)
[14:17:08] am doing an-launcher1002 right now, running a refine with it first
[14:17:11] yes i think we do
[14:17:30] although if it isn't affecting things right now i'm not sure what difference it will make, but ya
[14:17:36] should def do
[14:17:39] yeah let's do that now
[14:17:39] sure +1
[14:20:32] actually, let's do the spark2 upgrade first, the new spark-env.conf automatically includes the provided bigtop hadoop jars on the spark classpath
[14:20:41] probably good to get that before we change oozie assembly
[14:20:53] hm i didn't do that on test cluster elukey
[14:22:32] ok so on an-test-coord1001, i'm going to move away the spark-2.4.4 dir in the current oozie sharelib and then run puppet
[14:25:42] ack
[14:26:05] ok looks like it did the right thing there
[14:34:10] ottomata: \O
[14:34:14] heyooo
[14:34:39] So I got the camus thingy submitted. Now I need to query the actual data
[14:35:41] klausman: cool ok, so the easiest thing to do is probably to use spark
[14:35:44] with spark.read.json
[14:36:02] you can do it in python or scala
[14:36:13] wow and also spark2-sql works!
[14:36:20] https://spark.apache.org/docs/2.4.4/sql-data-sources-json.html
[14:36:32] e.g. on a stat box
[14:36:36] spark2-shell
[14:36:45] (you might need --master yarn if your data is biggish)
[14:36:47] or
[14:36:49] pyspark
[14:36:49] then
[14:36:57] spark.read.json("/path/to/dir/in/hdfs")
[14:37:02] cool
[14:38:46] !log upgrade spark2 on analytics cluster to 2.4.4-bin-hadoop2.6-5~wmf0 (hadoop jars removed) - T274384
[14:38:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:38:56] T274384: Repackage spark without hadoop, use provided hadoop jars - https://phabricator.wikimedia.org/T274384
[14:39:49] ottomata: how do I use an HDFS data source with Spark?
[14:40:10] klausman: on a stat box if you just launch spark2 it'll be auto-configured to do so
[14:40:21] it'll assume a protocol-less path is an hdfs path
[14:40:25] to be precise
[14:40:27] you can prepend
[14:40:30] hdfs:///
[14:40:30] or
[14:40:34] hdfs://analytics-hadoop/
[14:40:43] Right, thanks
[14:40:43] but
[14:40:44] for you
[14:40:52] /wmf/data/raw/...
[14:40:55] or wherever your stuff is
[14:40:56] should do
[14:48:26] 10Analytics, 10Patch-For-Review: Repackage spark without hadoop, use provided hadoop jars - https://phabricator.wikimedia.org/T274384 (10Ottomata) ` sudo -u hdfs hdfs dfs -mv /user/oozie/share/lib/lib_20210210190411/spark-2.4.4 /tmp/oozie-sharelib-spark-2.4.4-hadoop.2.6 ` After upgrading spark2 everywhere, r...
[14:49:27] ok elukey spark2 should be upgraded everywhere to hadoopless
[14:49:52] spark-env.conf everywhere now does SPARK_DIST_CLASSPATH=$(hadoop classpath)
[14:49:55] Hrm. I don't have access to the relevant HDFS files, it seems.
[14:50:07] oh
[14:50:15] I do have a valid Kerberos ticket.
[14:50:52] ottomata: nice!
[14:50:55] hmmm it should be readable by analytics-privatedata-users
[14:51:32] ah hm
[14:51:38] the data itself is not, but all the directories are.
[14:51:42] that's strange
[14:51:43] hm
[14:52:16] Is this something I missed doing on the camus job?
[14:52:18] elukey: i can never remember this stuff, but shouldn't files inherit ownership of their parent dirs? the dirs all look like they have the correct perms
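
Pulling ottomata's spark.read.json pointers (14:35-14:40) into one place: a minimal pyspark2 sketch, assuming a directory of plain JSON text files under a hypothetical path (as the conversation below works out, Camus output is actually sequence files and needs one extra step first):

    # pyspark2 REPL on a stat box; `spark` is predefined by the shell.
    # All three paths point at the same hypothetical HDFS directory:
    df = spark.read.json("/wmf/data/raw/some_dataset")                         # protocol-less path -> HDFS
    df = spark.read.json("hdfs:///wmf/data/raw/some_dataset")                  # explicit scheme
    df = spark.read.json("hdfs://analytics-hadoop/wmf/data/raw/some_dataset")  # named cluster

    df.printSchema()  # schema is inferred from the JSON
    df.show(5)
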
[14:53:05] no klausman i don't think so
[14:53:10] same for webrequest data
[14:53:19] we rarely query the raw data with our users
[14:53:26] we used to be able to do though
[14:53:38] i think this must have changed with the default umask change we did a few months ago
[14:53:44] hmmm
[14:53:55] ottomata: yes the files should get group ownership from the parent dir, but it depends where the files are created... if they are eventually copied to a target dir but created in another one, this might explain
[14:54:09] oh hm, yes. camus does that
[14:54:21] yep then this is the issue
[14:54:23] it first writes to a working dir and then when it finishes it moves them in place
[14:54:37] the working dir needs to have privatedata-users perms
[14:54:50] otherwise files will be created with wrong perms
[14:54:58] yup basically everything in /wmf/camus
[14:55:00] maybe it is only a matter of chowning the working dirs correctly?
[14:55:02] it is still world readable in there though
[14:55:12] yeah, i don't feel comfortable doing that for everything yet, but i'll do it for atskafka now
[14:55:23] good starting point
[14:56:26] oh nice, actually the atskafka working dir (in /wmf/camus), since it is a new camus job, got created as not world readable, since it was created after the umask change
[14:56:29] it just needs proper group perms
[14:56:34] Btw, is there something similar to the Spark shell, but with Python as the language? My Scala skills are basically on the "I can copy and paste code" level.
[14:56:37] i'll also fix the final data dir
[14:57:10] klausman: yes yes
[14:57:11] pyspark2
[14:57:44] Thanks a lot
[14:58:02] ok klausman hopefully perms all fixed up for atskafka
[14:58:03] raw
[14:58:03] try now
[14:58:14] i think new data imported will also have proper perms
[14:58:16] we will see
[14:58:22] i'll file a ticket to fix the rest of raw
[14:59:00] I am getting a different error now (no schema), so I guess it could at least access the data
[15:00:24] 10Analytics-Clusters: /wmf/data/raw should be readable by analytics-privatedata-users - https://phabricator.wikimedia.org/T275396 (10Ottomata)
[15:00:24] hmm
[15:00:30] with spark.read.json
[15:00:31] ?
[15:00:34] it should infer the schema
[15:01:12] https://phabricator.wikimedia.org/P14442
[15:10:11] hi team!
[15:10:36] Heyooo
[15:18:11] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (10MoritzMuehlenhoff) >>! In T257412#6605740, @elukey wrote: > The `fallback` value seems not accepted on 1.17 (hue complains if set), so I am wondering if...
[15:23:10] I've managed to create a schema manually
[15:23:33] But now I get `Not a file: hdfs://analytics-hadoop/wmf/data/raw/atskafka_test_webrequest_text/atskafka_test_webrequest_text`
[15:23:56] Oh, because it's a dir, obviously
[15:26:40] aaaand I only get empty results :D
[15:29:07] klausman: I am super n00b in spark but I have some snippets in https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Spark
[15:29:30] if the data is in sequence file format then the last snippet may help
[15:29:44] Roger, will take a look
[15:30:37] OHOHOH right, forgot about the sequence file bit
[15:31:25] klausman: we have some code to help with that
[15:31:33] oh but... python
[15:32:09] here's the scala code
[15:32:14] you could use that method if you were in scala
[15:32:24] there might be some way to call it from python but meh
[15:32:28] might as well just do it in python
[15:33:16] sc.sequenceFile exists in the Python API as well
[15:37:32] Ok, I've got an rdd with my data in it. Now I "just" need to process it
[15:37:48] nice, klausman if you make it into a dataframe using read json it will be easier
[15:37:49] than an rdd
[15:37:54] val df = spark.read.json(rdd.map(x => x._2))
[15:38:10] that's scala i guess
[15:38:17] df = spark.read.json(rdd)
[15:38:18] ?
[15:38:23] or maybe it is create dataset?
[15:38:26] dataframes are awesome
[15:39:16] yup, df = spark.read.json(rdd) just works
[15:39:19] nice!
[15:39:21] great
[15:40:14] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Repackage spark without hadoop, use provided hadoop jars - https://phabricator.wikimedia.org/T274384 (10Ottomata)
[15:40:36] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Repackage spark without hadoop, use provided hadoop jars - https://phabricator.wikimedia.org/T274384 (10Ottomata) p:05Triage→03High
[15:41:55] Hurm. Corrupt records.
[15:44:00] malformed json?
[15:44:16] Not sure. I just tried a different hour, and I'm still getting errors
[15:44:25] pyspark.sql.utils.AnalysisException: "cannot resolve '`uri_path`' given input columns: [global_temp.requests._corrupt_record]; line 1 pos 7;\n'Project ['uri_path]\n+- SubqueryAlias `global_temp`.`requests`\n +- LogicalRDD [_corrupt_record#12], false\n"
[15:47:06] trying again with mode="DROPMALFORMED"
[15:47:53] that's strange
[15:48:00] Looks like it doesn't understand any of the JSON. With dropping malformed records, the column list is empty
[15:48:05] huh
[15:49:31] klausman: maybe giving it an explicit schema will help?
[15:49:34] does it have the same schema as webrequest?
[15:49:39] Yes
[15:49:45] try
[15:51:26] ...
[15:51:56] ok we need the jsonserde jar to use the wmf_raw.webrequest table
[15:51:56] so
[15:52:05] pyspark2 --jars /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
[15:52:13] then when you get to your spark.read.json step
[15:52:14] do
[15:52:33] webrequest_schema = spark.table("wmf_raw.webrequest").schema
[15:52:45] spark.read.schema(webrequest_schema).json("path to json")
[15:52:49] or sorry
[15:52:56] spark.read.schema(webrequest_schema).json(rdd)
[15:53:13] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest#wmf_raw.webrequest
[15:53:35] mforns yt? got a few mins before standup to brain bounce event data sanitization?
[15:54:23] I can. 21/02/22 15:53:21 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
[15:54:30] ^^^ this something to worry about?
[15:54:47] i don't think so, i think that is maybe spark trying to show you some logs about what it is about to do
[15:54:52] but it is truncating them
[15:54:57] Okay
[15:55:08] >>> df2.show()
[15:55:10] [Stage 1:> (0 + 1) / 1]21/02/22 15:53:56 WARN BlockReaderFactory: I/O error constructing remote block reader.
And ten NULLs
[15:55:22] huh
[15:56:09] gonna try in scala real quick since i know it better and to make sure it works for me
[15:56:35] Right
[15:57:52] hm ok works in spark for me, let me try json
[15:57:56] s/json/python/ :p
[15:58:00] 10Analytics, 10Event-Platform, 10Release-Engineering-Team: Stop using puppet + git pull for auto deployment of schema repos - https://phabricator.wikimedia.org/T274901 (10thcipriani) >>! In T274901#6834014, @Ottomata wrote: > I tagged RelEng here for advice. > > I want a merge in gerrit to trigger a deplo...
[15:58:18] 10Analytics, 10Event-Platform, 10Release-Engineering-Team-TODO: Stop using puppet + git pull for auto deployment of schema repos - https://phabricator.wikimedia.org/T274901 (10thcipriani)
[15:58:30] klausman: can you paste your code?
[15:59:32] sec
[16:01:39] hm klausman yeah it works for me just fine
[16:02:32] https://phabricator.wikimedia.org/P14444
[16:02:46] Oddly enough, I can't reproduce the I/O error
[16:02:50] klausman: standup?
[16:02:53] omw
[16:03:15] klausman: https://gist.github.com/ottomata/fce64dc1650935abce62142b3ac3175f
[16:06:59] df.show() only gives me a bunch of rows with all values being null
[16:07:34] (I specified no exact file in my paste, but even using the same exact path you did, all nulls)
[16:08:24] ahhh klausman
[16:08:28] i think you are not mapping out the sequence file value
[16:08:33] rdd0 = spark.sparkContext.sequenceFile(p).map(lambda x: x[1])
[16:09:19] in luca's example
[16:09:19] https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Spark
[16:09:30] the rdd.map(x => x._2) part is important
[16:10:42] But with rdd0=spark.sparkContext.sequenceFile("/wmf...").map(lambda x: x[1]); df = spark.read.schema(webrequest_schema).json(rdd0); df.show() I still get only nulls
[16:11:00] let's skip the webrequest schema part
[16:11:03] that should not be needed
[16:12:00] success!
[16:12:04] nice!
[16:12:52] and sql selecting also works now.
[16:24:36] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Make hive temporary-tables storage format explicit [analytics/refinery] - 10https://gerrit.wikimedia.org/r/665425 (https://phabricator.wikimedia.org/T168554) (owner: 10Joal)
[16:27:50] (03PS2) 10Milimetric: Fix inconsistent Hive query fail [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/665406
[16:39:34] razzi: we moved here https://meet.google.com/qdx-cxwy-feo?authuser=1
[16:42:20] 10Analytics, 10Analytics-Kanban, 10Growth-Scaling, 10Growth-Team, 10Product-Analytics: Growth: remove deletion timers for Growth's sanitized EL tables - https://phabricator.wikimedia.org/T274297 (10Milimetric) p:05Triage→03High a:03mforns
[16:44:48] 10Analytics, 10Analytics-Visualization, 10Project-Admins: Archive #Analytics-Visualization (which seems to be about Limn)? - https://phabricator.wikimedia.org/T274647 (10Milimetric) Hi @Aklapper, yes, definitely please archive #analytics-visualization, decline all the tasks in bulk if you could. Let me know...
[16:45:18] 10Analytics, 10Analytics-Visualization, 10Project-Admins: Archive #Analytics-Visualization (which seems to be about Limn)? - https://phabricator.wikimedia.org/T274647 (10Milimetric) a:03Milimetric
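
For reference, the sequence-file flow that finally worked above (15:29-16:12), condensed into one pyspark2 sketch. Camus writes Hadoop sequence files, so the JSON lives in the value half of each (key, value) pair; the hour partition in the path is an assumed example of the layout:

    # pyspark2 REPL; `spark` is predefined by the shell.
    p = "/wmf/data/raw/atskafka_test_webrequest_text/atskafka_test_webrequest_text/hourly/2021/02/22/14"  # assumed layout
    rdd0 = spark.sparkContext.sequenceFile(p).map(lambda x: x[1])  # keep only the value: the JSON string
    df = spark.read.json(rdd0)  # schema inference works once the RDD holds plain JSON strings

    # sql selecting, as above; uri_path is a webrequest field
    df.createOrReplaceTempView("requests")
    spark.sql("SELECT uri_path, COUNT(*) AS n FROM requests GROUP BY uri_path ORDER BY n DESC").show(10)
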
[16:50:15] 10Analytics: Purge deprecated reportupdater outputs - https://phabricator.wikimedia.org/T274986 (10Milimetric) a:03mforns
[16:52:22] 10Analytics, 10Analytics-Kanban, 10Growth-Scaling, 10Growth-Team, 10Product-Analytics: Growth: update welcome survey aggregation schedule - https://phabricator.wikimedia.org/T275172 (10Milimetric) p:05Triage→03High a:03mforns
[16:53:35] 10Analytics, 10Analytics-Kanban, 10Growth-Scaling, 10Growth-Team, 10Product-Analytics: Growth: shorten welcome survey retention to 90 days - https://phabricator.wikimedia.org/T275171 (10Milimetric) p:05Triage→03High a:03mforns
[17:20:48] (03PS2) 10Ottomata: Migrate TranslationRecommendation from metawiki [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/661399 (https://phabricator.wikimedia.org/T271163)
[17:22:06] (03CR) 10Ottomata: [C: 03+2] Migrate TranslationRecommendation from metawiki [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/661399 (https://phabricator.wikimedia.org/T271163) (owner: 10Ottomata)
[17:26:06] 10Analytics, 10Event-Platform, 10Research, 10Patch-For-Review: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (10Ottomata)
[17:26:21] (03PS3) 10Ebernhardson: refinery-drop-hive-partitions: Ensure verbose logging goes somewhere [analytics/refinery] - 10https://gerrit.wikimedia.org/r/661799
[17:59:27] ottomata: wanna do an event migration sync today?
[18:00:04] yes maybe in :50 mins?
[18:00:10] :50 after this hour?
[18:03:06] 10Analytics-Radar, 10Better Use Of Data, 10Product-Analytics, 10Product-Data-Infrastructure, and 2 others: prefUpdate schema contains multiple identical events for the same preference update - https://phabricator.wikimedia.org/T218835 (10phuedx) a:03phuedx
[18:19:49] ottomata: ok!
[18:31:48] * elukey afk!
[18:49:14] mforns: 5 more mins plz am chowin
[18:49:22] ottomata: no prob!
[18:55:33] mforns in bc
[18:55:36] oh you are there!
[19:15:06] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Repackage spark without hadoop, use provided hadoop jars - https://phabricator.wikimedia.org/T274384 (10Milimetric) a:03Ottomata
[19:18:33] 10Analytics, 10Analytics-Kanban: Filter out webrequest where debug=1 from pageview - https://phabricator.wikimedia.org/T273083 (10Milimetric) @jijiki & @ema: just following up on this, everything was deployed on our side and looks to be working. If you've seen the actual data, let us know if it doesn't look r...
[19:22:02] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Update Image usage metric - https://phabricator.wikimedia.org/T271571 (10Milimetric) This was deployed and runs monthly with a week delay. So the next run, around March 6th, should reflect the new logic.
[19:25:49] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Product-Analytics, and 2 others: Document how ad blockers / tracking blockers interact with EventLogging - https://phabricator.wikimedia.org/T263503 (10Mholloway) I added a new section https://wikitech.wikimedia.org/wiki/Analytics/Systems/Event...
[19:27:37] !log restart oozie on an-coord1001 to pick up new spark share lib without hadoop jars - T274384
[19:27:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:27:41] T274384: Repackage spark without hadoop, use provided hadoop jars - https://phabricator.wikimedia.org/T274384
[19:40:58] * razzi afk for lunch
[19:50:37] 10Analytics-Clusters, 10Analytics-Kanban: Update sqoop to work with multi-instance clouddb1021 mariadb host - https://phabricator.wikimedia.org/T274690 (10Milimetric) p:05Triage→03High a:05razzi→03Milimetric
[20:25:32] (03PS1) 10Awight: Restore templatewizard queries [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/666203
[20:26:14] (03CR) 10Awight: [C: 03+1] "Sorry for the confusion, we're just putting the final touches on these queries and will enable puppet jobs soon!" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/666203 (owner: 10Awight)
[20:26:50] (03PS6) 10Awight: Use edit count bucket sent by TemplateWizard [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/657634 (https://phabricator.wikimedia.org/T273475)
[20:27:24] (03CR) 10Awight: "PS 6: rebase over accidental deletion" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/657634 (https://phabricator.wikimedia.org/T273475) (owner: 10Awight)
[20:31:06] (03PS4) 10Awight: Use the edit count bucket sent by TemplateData [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/659227 (https://phabricator.wikimedia.org/T272569) (owner: 10Andrew-WMDE)
[20:31:40] (03CR) 10Awight: "PS 4: minor, manual rebase" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/659227 (https://phabricator.wikimedia.org/T272569) (owner: 10Andrew-WMDE)
[20:33:04] (03PS7) 10Awight: Segment CodeMirror metrics by user edit count [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/656210 (https://phabricator.wikimedia.org/T273471)
[20:33:22] (03CR) 10Awight: "PS 7: minor, manual rebase" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/656210 (https://phabricator.wikimedia.org/T273471) (owner: 10Awight)
[20:41:38] ottomata: from the web
[20:41:43] again
[20:42:19] ottomata: from the webrequest errors, it seems the camus changes you made are impacting webrequest-production
[20:43:00] ottomata: Both text and upload have timed out for hours 14 and 15, due to the _IMPORTED flag not being present
[20:43:30] Will leave you with that, I'm supposedly off today (and tomorrow)
[20:49:43] Aren't you supposed to be *off work*, joal?
[20:49:55] And yet...
[20:55:57] hehe
[21:13:04] ???
[21:13:06] looking
[21:17:17] but i didn't change the prod job... just the atskafka one...????
[21:17:20] still looking
[21:22:15] AH NO! it is because of the way the classpath is set for the camus checker jar
[21:22:27] it is set explicitly
[21:22:31] when i upgraded spark it lost the hadoop jars in /usr/lib/spark/jars
[21:22:39] so i need to add the hadoop jars to the classpath in the job
[21:22:40] on it
[21:27:14] do you need help ottomata?
[21:28:51] (03PS1) 10Milimetric: [WIP] Update mysql resolver to work with cloud db replicas [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666209 (https://phabricator.wikimedia.org/T274690)
[21:29:04] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Update Image usage metric - https://phabricator.wikimedia.org/T271571 (10Isaac) Thanks @Milimetric!
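
The gist of the fix ottomata describes at 21:22: the checker used to pick up hadoop's FileSystem classes from /usr/lib/spark/jars, which the repackaged spark2 no longer ships, so the wrapper has to build the java classpath from `hadoop classpath` itself. A sketch of the idea in Python (not the literal https://gerrit.wikimedia.org/r/666214 patch; the jar path, class name, and flags are assumptions):

    from subprocess import check_output, check_call

    # `hadoop classpath` prints the locally configured hadoop jar/conf paths.
    hadoop_classpath = check_output(["hadoop", "classpath"]).decode().strip()
    checker_jar = "/srv/deployment/analytics/refinery/artifacts/refinery-camus.jar"  # hypothetical path
    classpath = checker_jar + ":" + hadoop_classpath

    check_call([
        "java", "-cp", classpath,
        "org.wikimedia.analytics.refinery.camus.CamusPartitionChecker",   # assumed class name
        "--camus-properties-file", "/etc/camus.d/webrequest.properties",  # hypothetical flag/path
    ])
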
[21:39:24] mforns: hm, i think i know what's wrong, might need a review in a sec
[21:48:10] ottomata: as I continue with kafka partition rebalancing, do we have a plan for rolling out the new alarm "Kafka Broker Replica Max Lag is increasing" and deprecating the old one?
[21:49:34] razzi: ah yeah we were going to talk about that today
[21:49:52] i think we can remove the old one
[21:49:59] it hasn't been firing during your partition moves yet, has it?
[21:50:02] sorry
[21:50:06] the new one hasn't been firing, right?
[21:50:13] sorry am trying to fix a camus problem quickly...
[21:50:32] heh, I just saw the webrequest pipeline stopped, and y'all are way ahead of me, including people on vacation!!! /me feels slow
[21:50:52] milimetric: mforns i have to go afk from 5-6
[21:50:56] i know why this isn't working
[21:51:04] i have a fix, but haven't been able to verify it yet
[21:51:08] i'll push it up
[21:51:20] k, we can take a look
[21:53:59] (03PS1) 10Ottomata: bin/camus - use hadoop classpath when running checker jar. [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666214 (https://phabricator.wikimedia.org/T274384)
[21:54:06] milimetric: mforns ^
[21:54:13] i'd expect the test cluster camus is also failing
[21:54:28] well, hm, it probably isn't writing _IMPORTED flags
[21:54:39] aha
[21:54:42] indeed
[21:54:42] Feb 22 21:50:26 an-test-coord1001 camus-webrequest[81083]: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path
[21:55:05] so, i'm not sure if that works yet, i've been having trouble getting it to actually run, but i think i'm doing something wrong
[21:55:15] you should be able to test it on an-test-coord1001 though
[21:55:23] if not no worries, i'll be back on after 6
[21:55:24] to look
[21:55:41] k
[21:56:05] (03PS2) 10Ottomata: bin/camus - use hadoop classpath when running checker jar. [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666214 (https://phabricator.wikimedia.org/T274384)
[21:56:16] ottomata: sounds good, I haven't kicked off the next migration yet, was looking into scheduling downtime. Let me leave your alarm on, and temporarily disable the old one, then if yours is working we can consider disabling the old one
[21:56:25] perfect
[21:56:27] +1
[21:58:26] milimetric: wanna pair on this? I feel it would take me definitely more than 1 hour to look into it by myself
[22:32:00] what I'm seeing in an-test-coord1001 is that flags have been missing since we did the migration to BigTop, not just today
[22:33:41] wait, no sorry
[22:33:59] they have been missing since: 2021-02-19T15
[22:36:47] maybe the initial change was tested in an-test-coord1001 on friday and only deployed today
[22:37:05] ok, re-running in test with new code
[22:42:19] (03Abandoned) 10Bearloga: [WIP] Create mediawiki_page fragment [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/615562 (https://phabricator.wikimedia.org/T255302) (owner: 10Bearloga)
[22:42:36] (03Abandoned) 10Bearloga: [WIP] Create mediawiki_common fragment [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/622183 (https://phabricator.wikimedia.org/T255302) (owner: 10Bearloga)
[22:42:41] (03Abandoned) 10Bearloga: [WIP] Create mediawiki_user fragment [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/615561 (https://phabricator.wikimedia.org/T255302) (owner: 10Bearloga)
[22:51:04] sorry mforns, I am just now back
[22:51:19] heya milimetric no problem, I was trying stuff
[22:51:36] I think I got it to run, but cannot see any changes
[22:51:50] I can batcave if you're still around, or work with Andrew when he's back
[22:51:55] yea :]
[22:52:03] omw
[23:01:48] (03PS1) 10Erin Yener: WikipediaPortal schema whitelist request [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666223
[23:01:50] (03CR) 10Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666223 (owner: 10Erin Yener)
[23:02:52] (03CR) 10Erin Yener: "Thank you for reviewing!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666223 (owner: 10Erin Yener)
[23:18:58] (03CR) 10Milimetric: [V: 03+2 C: 03+2] bin/camus - use hadoop classpath when running checker jar. [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666214 (https://phabricator.wikimedia.org/T274384) (owner: 10Ottomata)
[23:29:49] (03PS1) 10Erin Yener: MobileWikiAppiOSFeed Whitelist Request [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666227
[23:30:44] (03CR) 10Erin Yener: "Thanks for reviewing! This is the second of three schemas we in Fundraising will be requesting for whitelisting." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666227 (owner: 10Erin Yener)
[23:49:12] (03PS1) 10Erin Yener: MobileWikiAppFeed Whitelist Request [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666229
[23:57:03] (03PS2) 10Erin Yener: MobileWikiAppiOSFeed Whitelist Request [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666227
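
And a hypothetical helper for the kind of check mforns and milimetric were doing above, listing hour partitions that never got their _IMPORTED flag. This is not a team tool, and the base path layout is an assumption:

    from subprocess import check_output, CalledProcessError

    BASE = "/wmf/data/raw/webrequest/webrequest_text/hourly/2021/02/22"  # assumed layout

    # `hdfs dfs -ls -C` prints only the child paths, one per line
    for hour_dir in check_output(["hdfs", "dfs", "-ls", "-C", BASE]).decode().split():
        try:
            # `hdfs dfs -test -e` exits non-zero when the path does not exist
            check_output(["hdfs", "dfs", "-test", "-e", hour_dir + "/_IMPORTED"])
        except CalledProcessError:
            print("missing _IMPORTED:", hour_dir)
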