[06:28:24] Analytics, WMF-NDA-Requests: Check PPI leftovers - awight - https://phabricator.wikimedia.org/T220377 (elukey) Open→Resolved a:elukey All removed thanks!
[08:01:07] Analytics: Decide: start_timestamp for mediawiki history - https://phabricator.wikimedia.org/T220507 (JAllemandou) I did a quick analysis using Spark on user data after @nettrom_WMF comment: ` import com.databricks.spark.avro._ // user table data val u = spark.read.avro("/wmf/data/raw/mediawiki/tables/user/s...
[09:08:33] (PS8) Joal: Update mw user-history timestamps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/497604 (https://phabricator.wikimedia.org/T218463)
[09:27:55] joal: bonjour! Going to send now the email for cdh upgrade (wed 15:00 CET)
[09:42:47] Analytics-Volunteering, Developer-Advocacy, Project-Admins, patch-welcome: Analytics-Volunteering and Wikidata's Need-Volunteer tags; "New contributors" vs "volunteers" terms - https://phabricator.wikimedia.org/T88266 (Aklapper) For the records, `#need-volunteer` got renamed to `#patch-welcome` b...
[09:53:40] brb
[10:41:21] * elukey lunch!
[12:56:02] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Run critical Analytics Hadoop jobs and make sure that they work with the new auth settings. - https://phabricator.wikimedia.org/T212259 (elukey)
[13:19:56] (PS9) Joal: Update mw user-history timestamps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/497604 (https://phabricator.wikimedia.org/T218463)
[13:26:51] (PS8) Joal: Correct names in mediawiki-history sql package [analytics/refinery/source] - https://gerrit.wikimedia.org/r/498861
[13:28:45] (PS5) Joal: Fix mediawiki-history-checker after field renamed [analytics/refinery/source] - https://gerrit.wikimedia.org/r/499527 (https://phabricator.wikimedia.org/T219484)
[13:32:40] Analytics, EventBus, Services (watching): EventGate service runner worker occasionally killed, usually during higher load - https://phabricator.wikimedia.org/T220661 (Ottomata) 2 workers might be an ok number, but I'd still like to figure out why it is dying in the first place. I can reproduce on sta...
[13:38:01] (PS2) Joal: Fix for null-timestamps in checker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/499914
[13:38:35] ottomata: o/ - whenever you have time can you tell me if https://etherpad.wikimedia.org/p/analytics-cdh5.16.1 is ok?
[13:43:07] milimetric: Hello - would you by any chance have a few minutes for me?
[13:43:27] yep, one sec
[13:44:20] joal: ok, omw cave
[13:56:11] Analytics, EventBus, Services (watching): EventGate service runner worker occasionally killed, usually during higher load - https://phabricator.wikimedia.org/T220661 (akosiaris) >>! In T220661#5107329, @Pchelolo wrote: > @akosiaris no, spanning up a new worker takes no time, the problem here is actua...
[14:02:43] Analytics, Analytics-Wikistats: Add user_is_bot_by_group to MediaWiki history - https://phabricator.wikimedia.org/T219177 (JAllemandou) Here is the definition we agreed on with @Milimetric: removal of `user_is_bot_by_name` boolean field and addition of 2 new fields: `user_is_bot_by: Array[String]` and `us...
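Note on the T219177 message above: the comment is cut off in the log, so only the first new field, `user_is_bot_by: Array[String]`, is known. As a rough sketch of what replacing a single boolean with an array of reasons could look like, assuming the array holds one label per way a user was identified as a bot: the labels `"name"` and `"group"`, the name pattern, and the function itself are illustrative assumptions, not the definition from the task.

```python
import re
from typing import List, Sequence

# Hypothetical rule: the log does not say how bot-like names are detected.
BOT_NAME_PATTERN = re.compile(r"bot\b", re.IGNORECASE)


def user_is_bot_by(user_name: str, user_groups: Sequence[str]) -> List[str]:
    """Return every reason a user looks like a bot (empty list if none).

    The reason labels "name" and "group" are assumptions for illustration.
    """
    reasons: List[str] = []
    if BOT_NAME_PATTERN.search(user_name):
        reasons.append("name")
    if "bot" in user_groups:
        reasons.append("group")
    return reasons


if __name__ == "__main__":
    print(user_is_bot_by("ExampleBot", ["bot"]))
    print(user_is_bot_by("Alice", ["sysop"]))
```

An array keeps the old information available (the user was a bot by name if the array contains the name label) while leaving room for further detection methods without another schema change.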
[14:03:02] Analytics, Analytics-Kanban, Analytics-Wikistats: Add user_is_bot_by_group to MediaWiki history - https://phabricator.wikimedia.org/T219177 (JAllemandou) a:JAllemandou
[14:03:30] Analytics, Analytics-Kanban, Analytics-Wikistats: Add user_is_bot_by to MediaWiki history - https://phabricator.wikimedia.org/T219177 (JAllemandou)
[14:25:19] Analytics, EventBus, Patch-For-Review, Services (watching): EventGate service runner worker occasionally killed, usually during higher load - https://phabricator.wikimedia.org/T220661 (Ottomata) > unless I am misunderstanding something, this sounds just like a capacity problem. I can reproduce u...
[14:44:26] mforns, to encourage your work on T189475, people at the translate-a-thon I went to last week were running into the issue mentioned - blocked by edit filters. So this is important and good work!
[14:44:27] T189475: Identify common abuse filters that affect translations - https://phabricator.wikimedia.org/T189475
[14:47:11] (CR) Milimetric: [C: +2] "hey so nobody seems to care about this, I'm just gonna self-merge. Feel free to revert anyone" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/502623 (https://phabricator.wikimedia.org/T220088) (owner: Milimetric)
[14:48:20] Analytics, Fundraising-Backlog: CentralNoticeImpression refined impressionEventSampleRate is int instead of double - https://phabricator.wikimedia.org/T217109 (Milimetric) ping @AndyRussG, can you confirm that it's ok to delete the old data?
[14:51:25] Analytics, Analytics-Kanban: Wikistats: Change Mercator Projection to Eckert IV - https://phabricator.wikimedia.org/T218045 (Milimetric)
[14:53:22] mforns: who's Jsc and are they going to work on T200070?
[14:53:22] T200070: Wikistats2: Values in map view show unnecessary decimal digits - https://phabricator.wikimedia.org/T200070
[14:54:23] milimetric, Jsc?
[14:54:54] oh, was looking at wrong task
[14:56:04] milimetric, I don't know Jsc, I believe they are a community developer
[14:56:33] but it seems, since I pinged them, they didn't respond, so...
[14:56:45] aha, ok
[14:58:47] Analytics, EventBus, Operations, Core Platform Team (Modern Event Platform (TEC2)), and 3 others: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019. - https://phabricator.wikimedia.org/T217359 (herron) >>! In T217359#5032835, @elukey wrote: > In the SRE spreadsheet I can see that the s...
[15:02:37] ping ottomata
[15:03:51] trying...
[16:27:08] elukey left a couple of comments on cdh upgrade etherpad, +1 otherwise
[16:30:10] ottomata: thanks! In theory we could go one step further and automate everything in a spicerack script
[16:30:19] but I preferred to leave that for the next time :D
[16:30:23] aye
[16:32:04] ottomata: the spark-assembly thing is something that Joseph had to do after me upgrading the testing cluster, I thought we had a cdh::exec for that but I didn't find it.. BUT I still need to check
[16:32:36] about sqoop - is the package only a client or something more that needs to stay on workers?
[16:32:54] just curious, don't really care if it stays or not :)
[16:55:08] FYI all dates in dashiki are broken: https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os/os-family-timeseries
[16:56:10] fdans: assigning you this bug fix since i believe it is your ops week: https://phabricator.wikimedia.org/T221018
[16:56:52] nuria: qq - do you remember the request about restarting jupyter on notebooks?
[16:57:00] elukey: yes!
[16:57:07] was it the jupyterhub service or your unit?
[16:57:11] the -nuria- one
[16:57:19] elukey: yes
[16:57:26] elukey: my unit
[16:58:27] nuria: so I thought jupyterhub (the main service) since you mentioned clean up and restarting it is the only way to proper clean up old notebooks
[16:58:39] in theory people should be able to restart their stuff
[16:59:32] (the doc that I am referring to is https://wikitech.wikimedia.org/wiki/SWAP#Resetting_user_virtualenvs)
[16:59:55] so probably I have misunderstood your request and granted access to something else
[16:59:59] that is valuable anyway
[17:00:07] but not what you wanted :D
[17:00:12] so I'll need to file another request
[17:00:31] elukey: no, i could not restart my stuff either
[17:00:50] elukey: i can look into this later , maybe i am mixing things here ...
[17:01:03] nono I think you are right, going to check now
[17:01:09] my bad sorry!
[17:15:16] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504067/
[17:15:19] this should fix it
[17:47:03] * elukey off!
[17:47:04] Analytics, EventBus, Operations, Core Platform Team (Modern Event Platform (TEC2)), and 3 others: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019. - https://phabricator.wikimedia.org/T217359 (elukey) 128GB sounds good, the more page cache Kafka has the better :) I'd also think about...
[18:05:03] Analytics, Analytics-Kanban, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Watching / External), and 2 others: [Post-mortem] Kafka Jumbo cluster cannot accept connections - https://phabricator.wikimedia.org/T219842 (debt)
[18:16:11] Analytics, Analytics-EventLogging, QuickSurveys: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (Nuria) >My other current theory is the missing 10% is possibly browsers that don't support sendBeacon Seems unlikely rather makes sense that if you hav...
[19:23:23] when logging into SWAP on notebook1003 i only get 500 errors :(
[19:28:20] killed the jupyterhub-singleuser from bash, using the "start my server" button also fails: Spawner failed to start [status=3].
[20:01:00] ebernhar|lunch: really strange, I now see that the unit returns 127 as return code after restart
[20:07:33] we could try to reset the venv
[20:08:03] https://wikitech.wikimedia.org/wiki/SWAP#Resetting_user_virtualenvs
[20:28:56] Analytics, EventBus, Operations, Core Platform Team (Modern Event Platform (TEC2)), and 3 others: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019. - https://phabricator.wikimedia.org/T217359 (mobrovac) +1 for SSDs. We will be directly exposing time-based messages to consumers, so I i...
[20:30:28] Analytics, EventBus, Patch-For-Review, Services (watching): EventGate service runner worker occasionally killed, usually during higher load - https://phabricator.wikimedia.org/T220661 (Ottomata) More data: https://gist.github.com/ottomata/608d19a7cdb4385170564521e23c4316 This time also stracing...
[20:39:13] Analytics, Analytics-EventLogging, QuickSurveys: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (Isaac) > My other current theory is the missing 10% is possibly browsers that don't support sendBeacon >> Seems unlikely rather makes sense that if you...
[20:44:26] elukey: sure i'll try that. For the moment i just copied the notebook i wanted over to 1004
[20:48:44] ebernhardson: an ops will be needed to restart the jupyterhub service, if you re-create the venv I'll do the rest later on :)
[20:55:41] elukey: ok, anything in particular when recreating, or just a bare virtualenv?
[21:02:29] ebernhardson: IIRC just a new venv
[21:02:35] it should suffice
[21:04:16] alright, will do
[21:06:54] ack, going afk again, will work on it tomorrow morning :)
[21:09:26] sure
[21:16:22] ottomata: can you ssh to eventlogging machine?
[21:16:42] ottomata: ah wait bast2001.wikimedia.org timed out, did we change bastions?
[21:24:09] Analytics, Analytics-EventLogging, QuickSurveys: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (Nuria) >Yes, both mobile web and desktop but not the app. The final proportion ends up being ~50% mobile and ~50% desktop. Ok, makes sense that loading...