[00:13:01] (PS1) Milimetric: Add more logging to validation errors [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371212
[00:13:18] (CR) Milimetric: [V: 2 C: 2] Add more logging to validation errors [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371212 (owner: Milimetric)
[00:21:36] (PS1) Milimetric: Fix versions of celery [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371213
[00:21:38] (CR) Milimetric: [V: 2 C: 2] Fix versions of celery [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371213 (owner: Milimetric)
[00:23:31] (PS1) Milimetric: Fix versions of celery [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371214
[00:23:33] (CR) Milimetric: [V: 2 C: 2] Fix versions of celery [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371214 (owner: Milimetric)
[00:34:06] (PS1) Milimetric: Fix sqlalchemy version [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371216
[00:34:46] (CR) Milimetric: [V: 2 C: 2] Fix sqlalchemy version [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371216 (owner: Milimetric)
[00:44:41] Analytics, DBA: Wikimetrics is trying to use more than 10 connections at the same time - https://phabricator.wikimedia.org/T173062#3517627 (Milimetric)
[00:44:54] Analytics-Kanban, DBA: Wikimetrics is trying to use more than 10 connections at the same time - https://phabricator.wikimedia.org/T173062#3517640 (Milimetric)
[00:47:39] (PS1) Milimetric: Try to close the sessions while validating [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371218
[00:47:58] (CR) Milimetric: [V: 2 C: 2] Try to close the sessions while validating [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/371218 (owner: Milimetric)
[00:50:21] Analytics-Kanban, DBA: Wikimetrics is trying to use more than 10 connections at the same time - https://phabricator.wikimedia.org/T173062#3517671 (Milimetric)
Well, that was easy. https://gerrit.wikimedia.org/r/#/c/371218/ fixed it. Sorry for the noise.
[00:50:37] Analytics-Kanban: Wikimetrics is trying to use more than 10 connections at the same time - https://phabricator.wikimedia.org/T173062#3517627 (Milimetric)
[01:10:59] Analytics-Kanban: Wikimetrics is trying to use more than 10 connections at the same time - https://phabricator.wikimedia.org/T173062#3517724 (Marostegui) Thanks @Milimetric for spending time on fixing this and for understanding the implications of going from 10 to 100 as max allowed connections :-) cc @madhu...
[08:36:31] Analytics-Kanban, Operations, ops-eqiad, User-Elukey: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633#3517961 (elukey) Looking good: ``` [Tue Aug 8 13:45:05 2017] tg3 0000:01:00.0 eth0: Link is up at 1000 Mbps, full duplex [Tue Aug...
[09:11:30] Analytics, Analytics-Cluster, Operations, ops-eqiad, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3352337 (elukey) Hello people, any timeline for these hosts? Don't mean to pressure, just knowing the timings to organize/schedule...
[11:09:57] * elukey lunch!
[11:10:11] (CR) Zhuyifei1999: [C: 2] Drop redundant Bootstrap CSS [analytics/quarry/web] - https://gerrit.wikimedia.org/r/337214 (owner: Ricordisamoa)
[11:10:28] (Merged) jenkins-bot: Drop redundant Bootstrap CSS [analytics/quarry/web] - https://gerrit.wikimedia.org/r/337214 (owner: Ricordisamoa)
[13:48:49] (PS37) Ottomata: JsonRefine: refine arbitrary JSON datasets into Parquet backed hive tables [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[13:52:28] Analytics-Kanban, EventBus, Scap, Patch-For-Review, User-Elukey: eventlogging-service-eventbus scap deployments should depool/pool during deployment - https://phabricator.wikimedia.org/T171506#3518410 (elukey) Marko just deployed to all the nodes, all good! (task will be closed soon as part o...
[13:56:19] Analytics: upgrade druid and pivot - https://phabricator.wikimedia.org/T157977#3518430 (Ottomata)
[13:56:21] Analytics-Kanban, Patch-For-Review: Upgrade Druid to 0.9.2 as a temporary measure - https://phabricator.wikimedia.org/T170590#3518428 (Ottomata) Open>Resolved Alright! 0.9.2 pushed to gerrit (in branch debian-0.9.2) and added to apt: https://apt.wikimedia.org/wikimedia/pool/main/d/druid/
[14:10:00] Analytics, Operations, ops-eqiad: Remove stat1002 - https://phabricator.wikimedia.org/T173094#3518454 (elukey)
[14:37:12] Analytics, Operations, hardware-requests: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097#3518505 (elukey)
[14:50:11] milimetric: so, what was the issue with wikimetrics?
[15:01:39] Analytics, Operations, hardware-requests: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097#3518544 (elukey)
[15:02:04] Analytics-Kanban: Wikimetrics is trying to use more than 10 connections at the same time - https://phabricator.wikimedia.org/T173062#3518547 (Nuria)
[15:11:01] Analytics-Kanban, User-Elukey: Archive PageContentSaveComplete in hdfs while we continue collecting data - https://phabricator.wikimedia.org/T170720#3518580 (Nuria) Still having problems importing tinyint fields: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_mysql_import_of_tinyint_1_from_mysq...
[15:11:39] Analytics-Data-Quality, Analytics-Kanban, Readers-Web-Backlog (Tracking): Pageview drop in ro.wikipedia hu.wikipedia and fr.wikipedia - https://phabricator.wikimedia.org/T170845#3518584 (Nuria)
[15:11:52] Analytics, Operations, hardware-requests: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097#3518586 (elukey)
[15:13:48] Analytics, Operations, hardware-requests: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097#3518505 (elukey) Just removed all the puppet references of stat1002 and disabled alarms. Please sync with Chris and check https://phabricator.wikimedia.org/T173094 before proceeding...
[15:28:02] Analytics-Kanban, DBA, Patch-For-Review: Inconsistent default charset for analytics slaves - https://phabricator.wikimedia.org/T170952#3518622 (elukey) From db1046: ``` MariaDB [log]> SELECT count(table_name) FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='log' and not DATA_TYPE='binary' and COL...
[15:31:08] mforns: the only difference i see is that in mw fields are not nullable and tinyint(1) unsigned
[15:31:11] Analytics-Kanban, DBA, Patch-For-Review: Inconsistent default charset for analytics slaves - https://phabricator.wikimedia.org/T170952#3518626 (jcrespo) Yes, I assume if not given, it will take the default from the database- changing it on the db everywhere should work for new creations. Do we need t...
[15:31:28] mforns: in eventlogging db the fields are nullable and just tiny int
[15:31:32] nuria_, and in EL?
[15:31:36] ok ok
[15:32:04] mforns: and mw scoop code maps those to boolean: https://github.com/wikimedia/analytics-refinery/blob/master/bin/sqoop-mediawiki-tables#L297
[15:32:14] mforns: and i guess it works right?
[15:32:33] yes
[15:32:50] maybe them being signed is the problem
[15:32:53] mforns: let me look at connection args
[15:33:03] mforns: in mw they are signed
[15:33:08] mforns: and they work
[15:33:16] mforns: in eventlogging they are not signed
[15:33:30] oh I understood from what you said it was the opposite
[15:33:35] sorru
[15:33:41] Sorry,
[15:34:12] so the EL fields are nullable and unsigned, right?
[15:34:38] wait no,
[15:34:43] xD
[15:36:31] ya
[15:36:33] mforns: jaja
[15:37:00] ok, I looked at EL: unsigned nullable
[15:37:47] ok, and MW: signed non-nullable
[15:38:23] no!
[15:38:31] MW: unsigned non-nullable
[15:39:12] and EL signed nullable
[15:39:29] so, maybe the fact that EL tinyints are signed makes the import fail?
[15:40:50] mforns: ya, must be cause they are bigger than 1 byte in that case right?
[15:41:02] mforns: let me read the spec
[15:43:20] nuria_, in mysql cast(someSignedTinyIntField to unsigned) works, maybe we can try that
[15:45:07] mforns: Yes!
[16:02:01] Analytics-Kanban, DBA, Patch-For-Review: Inconsistent default charset for analytics slaves - https://phabricator.wikimedia.org/T170952#3518660 (Marostegui) >>!
In T170952#3518622, @elukey wrote: > From db1046: > > ``` > MariaDB [log]> SELECT count(table_name) FROM INFORMATION_SCHEMA.COLUMNS WHERE TA...
[16:07:24] * elukey going offline
[16:07:26] byyeee
[16:15:34] hm problems with the fan
[16:25:04] (PS1) Ottomata: Add simple shell script to check if a yarn app is running [analytics/refinery] - https://gerrit.wikimedia.org/r/371496
[16:52:14] mforns: in laptop?
[16:52:42] nuria_, yes, happened to me as well with the old one, had to install some packages
[16:52:52] mforns: boy ...
[16:52:57] gotta love linux
[16:53:11] it's not a big problem, it just gets stuck at full speed eventually
[16:53:24] hehe
[17:05:31] mforns: looks like someone ahem is showing wikistats 2.0 at wikimania: https://twitter.com/WikiResearch
[17:05:38] cc fdans milimetric
[17:08:15] O.o
[17:08:40] mforns: sorry https://twitter.com/WikiResearch/status/896030184125718529 cc milimetric , fdans
[17:11:09] nuria_: wasn't shown in talks afaik
[17:58:41] Pchelolo: forgot to answer! Thanks a lot for checking the librdkafka issue, we can wait for 0.11 to be released and then try it with eventstreams
[18:42:11] (PS1) Ricordisamoa: Load Open Sans from fontcdn [analytics/quarry/web] - https://gerrit.wikimedia.org/r/371514
[19:29:19] Analytics-Kanban, User-Elukey: Archive PageContentSaveComplete in hdfs while we continue collecting data - https://phabricator.wikimedia.org/T170720#3519330 (Nuria) Ok, so the avro import is working fine, just opened avro files to verify: { "id" : { "int" : 1 }, "uuid" : { "bytes" : "f2bec...
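Note on the tinyint thread (15:31 to 15:45): the idea floated there was to cast the signed EventLogging TINYINT fields to unsigned in the import query. The log does not show whether that was actually tried, so the following is only a minimal sketch of what such a query could look like. MySQL spells the cast as CAST(expr AS UNSIGNED); the helper name build_cast_select and the column event_isAPI are hypothetical and used purely for illustration.

```python
def build_cast_select(table, columns, signed_tinyints):
    """Build a SELECT that casts the listed signed TINYINT columns to UNSIGNED.

    Hypothetical helper, not part of refinery or Sqoop. It only assembles
    a query string; running it against a database is left to the caller.
    """
    parts = []
    for col in columns:
        if col in signed_tinyints:
            # MySQL syntax: CAST(expr AS UNSIGNED); keep the original column name.
            parts.append(f"CAST(`{col}` AS UNSIGNED) AS `{col}`")
        else:
            parts.append(f"`{col}`")
    return f"SELECT {', '.join(parts)} FROM `{table}`"


# `event_isAPI` is an assumed column name, chosen only for this example.
query = build_cast_select(
    "PageContentSaveComplete_5588433",
    ["id", "uuid", "event_isAPI"],
    {"event_isAPI"},
)
print(query)
```

One caveat if this route is taken: in MySQL, casting a negative value to UNSIGNED wraps around to a very large integer, so the cast is only safe when the column really holds 0/1 flags.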
[19:36:33] (PS38) Ottomata: JsonRefine: refine arbitrary JSON datasets into Parquet backed hive tables [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[20:12:11] Analytics-Kanban, User-Elukey: Archive PageContentSaveComplete in hdfs while we continue collecting data - https://phabricator.wikimedia.org/T170720#3519499 (Nuria) In fact, creating avro table from schema, like: CREATE EXTERNAL TABLE `PageContentSaveComplete_5588433` ROW FORMAT SERDE 'org.apache.ha...
[20:15:57] (CR) Nuria: Add simple shell script to check if a yarn app is running (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/371496 (owner: Ottomata)
[20:22:39] Analytics-Kanban, User-Elukey: Calculate how much Popups events EL databases can host - https://phabricator.wikimedia.org/T172322#3519526 (Nuria) >OK, so for the avoidance of confusion, I assume that this archiving process with Scoop will be separate from the ongoing hourly import via Camus as described...
[20:26:38] (CR) Ottomata: "this would be done with PYTHONPATH=/srv/deployment/analytics/refinery/python" [analytics/refinery] - https://gerrit.wikimedia.org/r/371496 (owner: Ottomata)
[20:27:13] (PS2) Ottomata: Add simple shell script to check if a yarn app is running [analytics/refinery] - https://gerrit.wikimedia.org/r/371496
[20:28:40] (CR) Nuria: Add simple shell script to check if a yarn app is running (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/371496 (owner: Ottomata)
[20:40:25] Analytics, Operations, hardware-requests: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097#3519593 (RobH)
[20:42:00] Analytics, Operations, hardware-requests: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097#3518505 (RobH)
[20:43:00] (CR) Ottomata: Add simple shell script to check if a yarn app is running (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/371496 (owner: Ottomata)
[20:56:23] Analytics-Kanban, Operations, ops-eqiad: Degraded RAID on analytics1055 - https://phabricator.wikimedia.org/T172809#3510136 (RobH) So the raid0 device to disk is not a 1:1 mapping, so while VD2 (raid0 of a single disk) has failed, its actually the HDD is slot 1: ``` Enclosure Device ID: 32 Slot Num...
[20:58:09] Analytics-Kanban, Operations, ops-eqiad: Degraded RAID on analytics1055 - https://phabricator.wikimedia.org/T172809#3510136 (RobH) a:Cmjohnson