[08:54:05] Analytics, Analytics-Kanban: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943 (elukey) p:Triage>High
[08:54:08] joal: --^ :(
[08:56:00] High elukey - This is not nice
[08:57:43] joal: I just restarted the failed job
[08:57:58] missed to log it since nobody was online :)
[08:58:06] np elukey - many thanks for that
[08:58:31] elukey: I'd be interested in investigating more on this issue with you if you'd like
[08:58:33] so I basically modified spark 2.3.0 with spark 2.3.1
[08:58:44] fixed manually and restarted the coordinator
[08:58:48] elukey: I'm not so good in JVM tuning, and learning with you is great
[08:59:08] it is weird since I don't see stressful signs in the GC logs
[08:59:11] sigh
[08:59:15] meh :(
[08:59:26] and the fact that it happens right after changing the masters is not good
[08:59:43] elukey: the spark default was not correct in properties file?
[09:00:02] agreed elukey - Could be material issue (RAM or something?)
[09:00:11] was 2.3.0, I think that Andrew cleaned up some oozie dirs on hdfs since I can find only "spark" and spark2.3.1
[09:00:21] (among the oozie libs)
[09:00:35] how are you feeling btw?
[09:01:16] elukey: better!! Still some leftover, but I can now understand reasonably what I'm looking at :)
[09:02:46] elukey: I'm very sorry, the mobile_apps_session_metrics spark-lib param should have been updated in my previous patch, but I missed it
[09:02:50] changing it now
[09:03:33] joal: I hope that I've restarted the coordinator with the right timing
[09:03:40] this one is 7 days and it was a bit weird
[09:04:15] (PS1) Joal: Correct spark-lib version in apps-session-metrics [analytics/refinery] - https://gerrit.wikimedia.org/r/467112
[09:04:33] elukey: --^
[09:04:38] elukey: will self merge that now
[09:05:11] (CR) Elukey: [C: 2] Correct spark-lib version in apps-session-metrics [analytics/refinery] - https://gerrit.wikimedia.org/r/467112 (owner: Joal)
[09:05:33] (CR) Joal: [V: 2 C: 2] "Self merging bug-correction" [analytics/refinery] - https://gerrit.wikimedia.org/r/467112 (owner: Joal)
[09:08:00] elukey: From what I can see, the job you restarted has correct params :)
[09:08:06] Many thanks again
[09:09:37] \o/
[09:15:00] !log restart apps-session-metrics with spark 2.3.1 oozie libs (modified the coordinator.properties file manually on disk)
[09:15:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:15:24] !log restart yarn resource manager on an-coord1002 (failover happened due to jvm issues)
[09:15:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:15:54] Oh by the way elukey - You could have restarted with "-Doozie_spark_lib=spark2.3.1" instead of modifying file on disk ;)
[09:17:11] ah snap you are right!
[09:17:15] way easier sigh
[09:17:27] elukey: no big deal - just saying
[09:21:04] Analytics, Analytics-Wikistats, Russian-Sites: Wikistats summary for Report Card has English/Russian Wikipedia-Zero traffic as separate entries instead of added to English/Russian Wikipedia mobile traffic - https://phabricator.wikimedia.org/T146777 (Liuxinyu970226)
[09:26:38] joal: have a nice sunday! Talk with you tomorrow :)
[09:26:46] Bye elukey :)
[15:09:11] Quarry: Provide a way to add hyperlink in Quarry results/output - https://phabricator.wikimedia.org/T74874 (Seb35) I like Tgr’s both proposals, even if the implementation is nontrivial for the first comment. Perhaps for the format page_title it can be used the MediaWiki API with cached results for e.g. 7 da...
[15:16:34] Quarry: Provide a way to add hyperlink in Quarry results/output - https://phabricator.wikimedia.org/T74874 (Seb35) A more advanced suggestion (but could be another task) would be to have a dictionary about the datatype of common field names. E.g. default formatting for `rev_user_text` is `@quarry:format link...
[15:53:57] Analytics, EventBus, Core Platform Team Current (Doing), Patch-For-Review, Services (doing): Revision visibility change event sets a wrong performer - https://phabricator.wikimedia.org/T206277 (CCicalese_WMF)
[16:28:16] PROBLEM - Webrequests Varnishkafka log producer on cp5007 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf
[16:28:17] PROBLEM - eventlogging Varnishkafka log producer on cp5007 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/eventlogging.conf
[16:28:18] PROBLEM - statsv Varnishkafka log producer on cp5007 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf
[16:33:37] RECOVERY - Webrequests Varnishkafka log producer on cp5007 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf
[16:33:47] RECOVERY - eventlogging Varnishkafka log producer on cp5007 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/eventlogging.conf
[16:33:47] RECOVERY - statsv Varnishkafka log producer on cp5007 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf
[17:14:49] Quarry: Provide a way to add hyperlink in Quarry results/output - https://phabricator.wikimedia.org/T74874 (Seb35) I would have a first fix in JavaScript for the URLs, which is a simpler case, but I’m not sure it is really useful given there are not so much URLs in MediaWiki databases. Probably we should ins...
[19:42:07] Analytics, Analytics-EventLogging: It should be possible to see the number of server-side validation errors for a schema in Grafana - https://phabricator.wikimedia.org/T206959 (phuedx)
[21:57:07] I just got an oozie job error report (mediawiki-load-wf-wmf_raw.ApiAction-2018-10-14-19) that I don't understand (probably not the report's fault, I just don't know anything about oozie)
[21:57:16] am I supposed to do something about it?
[22:07:11] Quarry: Provide a way to add hyperlink in Quarry results/output - https://phabricator.wikimedia.org/T74874 (Tgr) >>! In T74874#4664521, @Seb35 wrote: > I would have a first fix in JavaScript for the URLs, which is a simpler case, but I’m not sure it is really useful given there are not so much URLs in MediaW...