[00:50:24] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): On the "Git" dashboard, filtering on one organization still lists authors who are with another organization - https://phabricator.wikimedia.org/T157709#3266196 (Aklapper) Open>Resolved Thanks a lot! All looks correct now, hence...
[01:07:24] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): Git code repository is listed but not all recent activity in it is shown on wikimedia.biterg.io - https://phabricator.wikimedia.org/T161211#3266201 (Aklapper) stalled>Open
[01:07:54] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): Git code repository is listed but not all recent activity in it is shown on wikimedia.biterg.io - https://phabricator.wikimedia.org/T161211#3124726 (Aklapper) Open>Resolved https://wikimedia.biterg.io:443/goto/0bb30596c109ed5ef...
[08:05:53] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install replacement to stat1002 (stat1004 or misc name?) - https://phabricator.wikimedia.org/T165368#3266433 (Ottomata) stat1004 please! :)
[08:06:15] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install replacement to stat1003 (stat1005 or misc name?) - https://phabricator.wikimedia.org/T165366#3266434 (Ottomata) stat1005 please! :)
[08:07:01] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install replacement stat1006 (stat1003 replacement) - https://phabricator.wikimedia.org/T165366#3266436 (Ottomata)
[08:07:14] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install replacement to stat1005 (stat1002 replacement?) - https://phabricator.wikimedia.org/T165368#3266437 (Ottomata)
[08:07:22] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install replacement to stat1005 (stat1002 replacement) - https://phabricator.wikimedia.org/T165368#3264256 (Ottomata)
[08:07:51] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install replacement to stat1005 (stat1002 replacement) - https://phabricator.wikimedia.org/T165368#3264256 (Ottomata)
[08:08:38] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install replacement stat1006 (stat1003 replacement) - https://phabricator.wikimedia.org/T165366#3264224 (Ottomata)
[08:08:48] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install replacement to stat1005 (stat1002 replacement) - https://phabricator.wikimedia.org/T165368#3264256 (Ottomata)
[08:09:24] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install replacement to stat1005 (stat1002 replacement) - https://phabricator.wikimedia.org/T165368#3266446 (Ottomata) I modified the description for the host name, and I also moved the blurb about GPU from T165366 to this ticket, sin...
[09:09:13] https://etherpad.wikimedia.org/p/analytics-goals
[09:10:47] https://www.mediawiki.org/wiki/Wikimedia_Engineering/2016-17_Q4_Goals#Analytics_Engineering
[09:57:05] https://etherpad.wikimedia.org/p/analytics-goals
[10:32:02] https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Draft/Programs/Product
[10:32:19] Analytics, DBA, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3266941 (Ottomata) Update on this. @luca is working on T156933, and in talking, we realized that if we get rid of the second slave (db1047), we will only have one copy of E...
[10:34:44] Analytics, DBA, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3266955 (jcrespo) If redundancy is the main reason, and not load balancing, I would suggest having the redundant server on codfw. But there is now no analytics server on co...
[11:55:04] (CR) XXN: [C: 1] Add a space between href attribute and class attribute [analytics/quarry/web] - https://gerrit.wikimedia.org/r/343548 (owner: MZMcBride)
[12:08:54] Analytics, DBA, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3267305 (Ottomata) +1, that sounds like a good idea to me!
[12:22:22] when looking at eventlogging data through mysql on stat1002, is it expected for the clientIp to be 'NULL'? I'm looking at log.QuickSurveyInitiation_15278946
[12:47:18] elukey: let me know if you want to look at the issue with the jobqueue stuff!
[12:48:44] addshore: we are still in meetings, maybe around 16:30/17:00?
[12:48:52] (thanks :)
[12:49:04] I have a meeting at 16:00 ;) maybe till 18:00!
[12:49:22] It may finish early though! so lets see! if not, tomorrow!
[12:49:41] Analytics, Analytics-Kanban, Patch-For-Review: Create tagging udf - https://phabricator.wikimedia.org/T164021#3267405 (Milimetric) p:Triage>High
[12:49:44] Analytics, Analytics-Kanban, Patch-For-Review: Create tagging udf - https://phabricator.wikimedia.org/T164021#3218558 (Milimetric) p:High>Triage
[12:50:03] Analytics: Spike, test idea on spark job that reads tags and produces different outputs - https://phabricator.wikimedia.org/T164020#3218541 (Milimetric) p:Triage>High
[12:50:05] Analytics: Spike, test idea on spark job that reads tags and produces different outputs - https://phabricator.wikimedia.org/T164020#3267407 (Milimetric) p:Triage>High
[12:50:07] Analytics: Spike, test idea on spark job that reads tags and produces different outputs - https://phabricator.wikimedia.org/T164020#3218541 (Milimetric) p:High>Triage
[12:50:24] Analytics, Analytics-Kanban, Patch-For-Review: Create tagging udf - https://phabricator.wikimedia.org/T164021#3218558 (Milimetric) p:Triage>High
[12:50:37] Analytics: Spike, test idea on spark job that reads tags and produces different outputs - https://phabricator.wikimedia.org/T164020#3218541 (Milimetric) p:Triage>High
[12:53:55] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#2144836 (Nuria) Ping @luca, I think maps partition is going to disappear
[12:55:44] Analytics: User History: Add history of annonymous users to history reconstruction - https://phabricator.wikimedia.org/T139760#3267435 (JAllemandou) Open>declined
[13:03:17] nuria_: https://phabricator.wikimedia.org/T164259
[13:04:58] Analytics, Operations, Traffic, User-Elukey: Add VSL error counters to Varnishkafka stats - https://phabricator.wikimedia.org/T164259#3267476 (elukey)
[13:16:01] Analytics, Operations, Traffic, User-Elukey: Add VSL error counters to Varnishkafka stats - https://phabricator.wikimedia.org/T164259#3227497 (Nuria) Let's (as a first step) send these errors to graphite.
[13:43:09] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#3267590 (elukey) Maps is going to upload soon T164608
[14:34:08] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#3267732 (Nuria) Should we decline?
[15:34:18] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#3267973 (elukey) Open>declined
[19:44:17] Analytics-Kanban, Operations, ops-eqiad: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3269102 (Cmjohnson) Open>Resolved The system board has been replaced and the idrac failure has been corrected but now we have a raid bbu issue...creating a new tic...
[21:11:27] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3269263 (Cmjohnson)
[21:17:10] PROBLEM - Hadoop NodeManager on analytics1030 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[22:40:13] Analytics, Operations, ops-eqiad: SATA errors for stat1004 in the dmesg - https://phabricator.wikimedia.org/T162770#3269545 (Cmjohnson) @elukey we will need to coordinate a time to try and replace the sata cable and/or check settings.
[22:54:24] !log analytics1040 back to the hadoop worker nodes after maintenance
[22:54:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:54:56] !log disabled puppet and hadoop daemons again on analytics1030 (still need hw maintenance but motherboard replaced)
[22:54:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:56:10] PROBLEM - Hadoop DataNode on analytics1030 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[23:00:54] Analytics-Kanban, Operations, ops-eqiad: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3269575 (elukey) @Cmjohnson I just disabled all hadoop services and puppet on the host, from what I can read we'd need more hw maintenance right?
[23:07:05] Analytics-Kanban, Operations, ops-eqiad: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3269591 (Cmjohnson) @elukey yes we need to replace the bbu on the raid controller