[00:52:46] Analytics, ExternalGuidance, Product-Analytics, MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (atgo) a:kzimmerman→chelsyx Moving to @chelsyx per @kzimmerman's request. @... [07:05:46] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) https://rocm.github.io/hardware.html is very handy. As said previously, I'd stick with a GFX9 card, I'd say an RX Vega 64 or Radeon VII... [07:45:41] * elukey errand for a bit! [08:26:30] back :) [08:38:23] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) Opened T216226 to discuss hw requirements for the new GPU (everybody interested please subscribe/chime in!), let's use this task to deb... [08:38:37] Analytics, Operations, hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (elukey) [08:38:48] ta daaan! ---^ [08:38:51] let's see how it goes [09:06:37] trying to merge my patch for Yarn [09:10:38] elukey: Heya! Here I am for help if needed :) [09:11:02] thanks! [09:15:24] joal: as an FYI, I had to create https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/490826/ as well [09:15:36] because of how the defaults are set in the cdh module in puppet [09:15:46] (it ends up being /rmstore-cdh every time) [09:15:53] this is something that I am planning to fix [09:15:55] I'll open a task [09:16:25] Ah crap - maybe we shouldn't set a variable default in CDH? [09:16:30] ok [09:16:47] yeah, I am working with Marcelo to get rid of the old default class model etc. [09:16:58] but it will probably take a bit of time :D [09:17:12] I can imagine :) Thanks for that :) [09:20:29] ok, yarn RM test up [09:20:58] I don't see anything weird in the RM prod logs [09:20:59] elukey: yarn prod UI looks ok [09:21:35] bringing up the test workers' NM [09:23:26] [zk: localhost:2181(CONNECTED) 1] ls /rmstore [09:23:27] rmstore-analytics-test-hadoop rmstore [09:23:55] [zk: localhost:2181(CONNECTED) 3] ls /rmstore-analytics-test-hadoop/ZKRMStateRoot/RMAppRoot [09:23:58] [application_1550222299355_0001] [09:24:12] camus-webrequest_test [09:24:14] \o/ [09:24:20] all right, it seems to be working :) [09:25:31] \o/ !!! [09:25:34] Awesome :) [09:27:14] ok, I can finally add the missing bits for the testing cluster for webrequest-test [09:27:21] (refine and druid webrequest drop) [09:27:30] then we'll be ready for testing security [09:28:00] joal: when you have time I'd also want to ask you a question about https://phabricator.wikimedia.org/T215589#4955062 [09:28:23] Neil asked if it is possible to wait a bit for the dbstore1002 staging migration to allow change_tags to be deployed in mw history [09:28:45] but IIUC it will probably take more than 1-2 weeks, right? [09:29:41] elukey: change_tags will be present in the next snapshot, but not yet integrated in mediawiki-history [09:36:25] joal: so basically not usable by Neil?
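
An aside on the zkCli session above: the same check of the per-cluster Yarn ResourceManager state-store znodes can be scripted. A minimal sketch, assuming the kazoo client library is available and using the znode paths shown in the session; the ZooKeeper host/port is illustrative.

    # Minimal sketch: verify the Yarn RM state-store znodes that the
    # zkCli session above inspected by hand. Assumes kazoo is installed
    # and that the znode paths/host match your cluster.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="localhost:2181")
    zk.start()

    # Each cluster should now get its own root, instead of the shared
    # /rmstore-cdh default that the puppet patch above works around.
    print(zk.get_children("/"))  # expect rmstore, rmstore-analytics-test-hadoop, ...

    apps = zk.get_children(
        "/rmstore-analytics-test-hadoop/ZKRMStateRoot/RMAppRoot"
    )
    print(apps)  # e.g. ['application_1550222299355_0001'] for the camus test job

    zk.stop()
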
[09:37:40] if you could comment in the task about timings it would be great (whenever you have time); if what Neil needs will only be present in 2 snapshots' time, we cannot wait that long for the dbstore migration [09:40:00] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (JAllemandou) @ArielGlenn: Could we decide on regular day-in-month patterns for the various entity-dumps that need to be generate... [09:43:03] Analytics, Analytics-Kanban, User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (JAllemandou) @Neil_P._Quinn_WMF Hi ! We have planned to release the `change_tags` raw table next month (February snapshot, released at the beginning of March). The data will however pro... [09:43:43] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) What I'd prefer to do, if we are changing things around, is to do one right after another, so: all, truthy, lexemes b... [09:45:58] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (JAllemandou) Works for me :) I assume the system would work similarly to the existing XML dumps, meaning that dumps would be gene... [09:48:26] joal: <3 [09:50:49] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) The directories and any links or status files would be as they are now, but the date could I presume be passed into t... [09:53:25] Analytics, Analytics-Kanban, User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (elukey) @Neil_P._Quinn_WMF if the answer to the above question is no I'd proceed anyway with the staging migration, but probably Monday is too soon for you to figure all this ou... [09:56:41] (PS1) Joal: Remove old change_tag definition from sqoop script [analytics/refinery] - https://gerrit.wikimedia.org/r/490828 (https://phabricator.wikimedia.org/T205940) [09:56:49] elukey: if you don't mind --^ [10:01:14] elukey: interesting! When trying to use refinery on an-coord1001 I get ImportError: No module named 'dns' [10:02:09] (CR) Elukey: [C: +1] "It looks consistent with the previous patch from Dan, but I have to admit that I am fairly ignorant about the subject :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/490828 (https://phabricator.wikimedia.org/T205940) (owner: Joal) [10:02:22] Thanks elukey :) [10:02:50] joal: ah snap, my bad [10:03:11] elukey: I assume we need a little puppet tweak to install the missing dep? [10:03:22] basically I disabled puppet on an-coord1001 to prevent the mediawiki-geoeditors timer from being restored [10:03:29] and the package wasn't installed [10:03:30] hehe - of course [10:03:38] Sorry for that elukey :( [10:03:46] should be ok now [10:03:51] <3 [10:03:52] can you check?
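
For the record, the missing module joal hit above ships with the dnspython package, which installs as `dns` (python3-dnspython on Debian). A tiny sketch of a preflight check a refinery script could run before doing any work; the error message text is mine:

    # Preflight check for the dependency behind
    # "ImportError: No module named 'dns'": the 'dns' module comes from
    # dnspython, which puppet installs when it is enabled on the host.
    try:
        import dns.resolver  # noqa: F401
    except ImportError:
        raise SystemExit(
            "dnspython is missing - re-enable puppet or install "
            "python3-dnspython before running the refinery sqoop scripts"
        )
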
[10:04:04] \o/ [10:04:10] I think that I'll set the timer to 'absent' in puppet for the time being [10:04:10] Thanks mate :) [10:04:13] then we'll re-enable it [10:08:00] joal: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/490829/ [10:13:06] (PS2) Joal: Remove old change_tag definition from sqoop script [analytics/refinery] - https://gerrit.wikimedia.org/r/490828 (https://phabricator.wikimedia.org/T205940) [10:15:56] (PS1) Elukey: Add analytics1030 (test coord) to the targets [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/490830 [10:16:08] (CR) Elukey: [V: +2 C: +2] Add analytics1030 (test coord) to the targets [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/490830 (owner: Elukey) [10:16:25] added the test coordinator to the refinery scap targets --^ [10:57:28] (PS3) Joal: Correct sqoop script for change_tag [analytics/refinery] - https://gerrit.wikimedia.org/r/490828 (https://phabricator.wikimedia.org/T205940) [10:59:58] Analytics: Set up a Kerberos KDC service in production with minimal puppet automation - https://phabricator.wikimedia.org/T212257 (elukey) [11:00:56] Analytics: Set up a Kerberos KDC service in production with minimal puppet automation - https://phabricator.wikimedia.org/T212257 (elukey) Created a subtask to request a Ganeti VM to deploy a bare minimum kerberos service on it. The goal is to use it only to test it with the Hadoop Test cluster, the final pr... [11:10:18] systemctl status hadoop-yarn-nodemanager -> all good! [11:10:35] tail -f etc../...log -> shutdown completed [11:10:40] -.- [11:11:28] I was trying to find out why camus-test was still in the ACCEPTED state.. [11:12:43] * elukey rants something about the state of the cdh packages [11:14:22] ( [11:14:25] :( [11:57:52] I am wondering if we should try the fix for https://github.com/cloudera/hue/issues/746 [12:18:33] Analytics, Operations, hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (Shilad) I've been lurking on this issue, and just wanted to chime in with one bit of information I learned through experience. I'm not sure what model of Dell server you are using, but there a... [12:54:01] * elukey lunch! [13:44:13] (PS8) Fdans: Change email send workflow to notify of completed jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) [13:44:20] (CR) Fdans: Change email send workflow to notify of completed jobs (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: Fdans) [13:59:10] Analytics, Operations, Research, serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (akosiaris) >>! In T213566#4934895, @Ottomata wrote: >> they will also not allow them to send the SYN/ACK packet required for the secon... [14:37:48] (CR) Joal: "Still one nit: need to add the sub-workflow file definition in mediawiki/history/reduced/coord.properties files. 
I also added a comment ab" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: Fdans) [14:38:15] Hi fdans --^ I'm super sorry, there are 2 things that were not present in my previous review [14:40:00] (PS9) Joal: Add spark code for wikidata json dumps parsing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726 [14:42:25] joal: sorry, you're right, I missed that since I didn't test it with the reduced job [14:42:47] fdans: the comment about the comment will also be nice, for us to remember why [14:43:01] fdans: please <3 [14:43:27] yessir [14:48:23] elukey: you around? [14:48:29] neilpquinn: I am :) [14:48:38] morning! [14:48:51] evening! ;) [14:49:02] It's 20:18 where I am :D [14:49:21] anyway, I *can* use the raw data or the EventBus logs for my script [14:49:24] ah! [14:49:36] But to be honest I would rather not [14:49:57] So the question is, would it derail your work significantly if you waited a month to migrate? [14:50:07] Or to lock staging [14:50:31] If so, I can do the extra work—but I wanted to make sure first :) [14:50:56] neilpquinn: if I got it correctly, you don't need the data in staging, but only a scratch pad on dbstore1002 [14:51:02] (And the reason I would rather not is because then I would later rewrite the script again to use mediawiki_history when that's incorporated) [14:51:12] yeah, that's right [14:51:22] so we could do the following [14:51:23] no reason it has to be staging per se [14:51:42] (need to verify with the data persistence team) [14:51:47] 1) we do the migration on Monday [14:52:02] 2) we don't set the staging database on dbstore1002 as read-only [14:52:19] 3) we alert people that we will not import data to the new staging [14:52:30] so in theory you'll be able to keep using dbstore1002 [14:52:45] ah, yeah, that would work perfectly [14:52:47] but we will import a fresh snapshot of staging into 1005 [14:53:02] the only risk is that people will need to avoid using staging from now on [14:53:08] and it might not happen [14:53:18] but I don't see any other easy solution [14:53:31] elukey: would it be hard to create a second staging database on dbstore1002? [14:53:33] the main motivation for the read-only is that we wanted to avoid people relying on data on dbstore1002 [14:53:38] yeah, that makes sense [14:53:55] if there was a second, I could switch to that, but no one else would take the time [14:54:05] this is a good point [14:54:09] we could name it after your username [14:54:21] sure, that would work perfectly [14:54:34] lemme verify with the dbas [14:54:39] sounds good! [14:55:14] one caveat - data on dbstore1002 is starting to be not very reliable, we have some "holes" due to corruption after crashes, and the slaves are constantly lagging [14:55:24] so that host is slowly dying [14:55:35] hmm [14:55:38] that's true [14:55:45] I hope that it doesn't happen, but it might crash badly before we have the change tags [14:55:50] I've seen all those headaches cropping up on the ticket [14:55:55] yeah :( [14:56:19] well, in that case, maybe it's worth the extra work just to get everything onto the Data Lake earlier rather than later [14:57:10] we are trying to add the support asap but we have a ton of things to do, very busy weeks :) [14:58:01] elukey: okay, I'm decided. 
I'll cut ties with dbstore1002 now rather than waiting :) [14:58:11] so you can go ahead with the migration as you originally planned [14:58:52] elukey: I will only have to repeat like 5-10% of the migration work so it's really not that big a deal [14:59:10] thanks for talking it through with me :) [15:00:52] neilpquinn: sure? I think that having the extra database is not a big deal for the moment [15:01:07] elukey: yeah, sure [15:01:41] neilpquinn: all right :) Is Monday a good time for you for the migration or do you prefer to postpone? [15:02:22] elukey: we will still be able to read after the migration, right? [15:02:26] neilpquinn: I'm currently sqooping change_tag and change_tag_def as a test for next month - you'll be able to test requests with them if you want [15:03:00] elukey: if that's the case, I have no problem with Monday [15:03:24] joal: ah, cool! I will definitely look at that [15:04:00] Analytics, Scoring-platform-team: [Discuss] ORES model development and deployment processes - https://phabricator.wikimedia.org/T216246 (Halfak) [15:04:26] elukey: thanks for being willing to put in a workaround for me—but I decided that it's just not worth the effort to squeeze another month out of dbstore1002 :) [15:04:49] Analytics, Scoring-platform-team: [Discuss] ORES model development and deployment processes - https://phabricator.wikimedia.org/T216246 (Halfak) In response to T214089#4954811: I think that we build models in hadoop is an excellent proposal. Regretfully, it's very painful as a developer to do something... [15:05:08] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Update git lfs on stat1006/7 - https://phabricator.wikimedia.org/T214089 (Halfak) Moving to {T216246} [15:07:14] neilpquinn: np! Yes, all the tables will be readable [15:07:21] the only change is staging read-only [15:07:28] Analytics, Scoring-platform-team: [Discuss] ORES model development and deployment processes - https://phabricator.wikimedia.org/T216246 (Halfak) [15:07:34] Amir1: o/ [15:07:46] Cc addshore [15:07:52] ok for Monday's migration plan? [15:08:01] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Update git lfs on stat1006/7 - https://phabricator.wikimedia.org/T214089 (Halfak) With that, I think we can resolve this task. [15:09:08] elukey: I sent managers an email and didn't hear anything back from them so far. So it's good to go IMO [15:10:05] Amir1: super - so I'll notify when we start and when the staging db is ready on dbstore1005 [15:10:19] likely Tuesday morning EU time [15:10:52] Okay thanks! [15:11:00] thank you! [15:12:55] Yup, all sounds good to me [15:13:09] elukey: ^^ [15:13:14] <3 [15:13:23] joal: can you tell me if hue is a bit faster now? 
[15:14:41] Analytics, EventBus, Research, Services, The-Wikipedia-Library: page-links-stream doesn't caputre links on page deletion - https://phabricator.wikimedia.org/T216249 (bmansurov) p: Triage→Normal [15:16:48] (PS9) Fdans: Change email send workflow to notify of completed jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) [15:17:31] Analytics, EventBus, Research, Services, The-Wikipedia-Library: page-links-change stream doesn't caputre links on page deletion - https://phabricator.wikimedia.org/T216249 (bmansurov) [15:18:57] Analytics, EventBus, Research, Services, The-Wikipedia-Library: page-links-change stream doesn't capture links on page deletion - https://phabricator.wikimedia.org/T216249 (Samwalton9) [15:24:22] elukey: I don't use hue a lot - on oozie stuff, it feels relatively as usual, maybe a bit faster, but I'm not the right person to answer your question :) [15:24:52] joal: need to tune it a bit more, but basically now httpd serves the static files directly [15:24:59] before that they were served by python [15:25:03] a huge bottleneck [15:25:09] I didn't add caching headers [15:25:11] Wow - makes sense! Thanks for that [15:25:17] but probably with those it will be even faster [15:26:16] (CR) Joal: [C: +1] "Tested on cluster :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/490828 (https://phabricator.wikimedia.org/T205940) (owner: Joal) [15:26:44] hi team :] [15:26:59] joal hiiiii! [15:27:58] is it possible to insert-select a whole partitioned hive table into another table? I know you can insert-select a given partition, but all in one command? [15:29:44] going to the vet, ttl! [15:34:19] Hi mforns - I don't understand what you're trying to do :) [15:34:27] joal, hehehe [15:35:22] I'm trying to dump the contents of event.navigationtiming into mforns.navigationtiming. The first has event.deviceMemory as bigint, the latter has event.deviceMemory as double [15:36:14] mforns: I can't think of any other way than doing it partition by partition [15:36:16] I know how to dump 1 single partition: insert into table partition (blah) select blah from blah; [15:36:20] ok [15:36:26] yea, that was the question [15:36:49] mforns: Going for dynamic partitions will put a lot of overhead --> every input part bundled altogether, then split again [15:36:58] aha... [15:37:25] joal, yesterday I tried the alter table to change the type [15:37:42] yes? [15:37:45] it works well for bigints to doubles, IF the field is simple [15:37:57] but if the field to change is a subfield of a struct... [15:38:06] changing the type breaks the json serde... [15:38:07] yeah - a lot more complicated [15:38:18] and the whole event field is broken for all time [15:38:34] mforns: Ahhh! It's not event serde, it's json [15:38:35] even if we backfill the last 3 months, the previous data will be broken [15:38:38] ah [15:38:47] arf sorry - it's not the json serde, it's parquet IIRC [15:38:58] ok [15:39:00] NOT json, but parquet [15:39:06] I see [15:39:22] so, my idea was to insert-select the whole table [15:39:25] And I think it's because parquet doesn't accept to read bigint when you tell it it should be double [15:39:33] but if we need to do it partition by partition... 
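
A small repro of the incompatibility being discussed here, as far as I understand it: Parquet physically stores the subfield as INT64 and will not reinterpret it as double just because the table metadata changed. A sketch, with paths illustrative and field names matching the NavigationTiming case:

    # Repro sketch of the struct-subfield problem: data written with
    # event.deviceMemory as bigint cannot simply be re-read with the
    # field declared as double; Parquet refuses the conversion instead
    # of casting. Paths are illustrative.
    from pyspark.sql import SparkSession, Row
    from pyspark.sql.types import StructType, StructField, DoubleType

    spark = SparkSession.builder.appName("parquet-type-repro").getOrCreate()

    df = spark.createDataFrame([Row(event=Row(deviceMemory=8))])  # inferred bigint
    df.write.mode("overwrite").parquet("/tmp/navtiming_repro")

    forced = StructType([
        StructField("event", StructType([
            StructField("deviceMemory", DoubleType(), True)
        ]), True)
    ])

    # Fails at execution time with a Parquet type mismatch (INT64 vs
    # double), which is why ALTER TABLE alone breaks reads of the old
    # partitions.
    spark.read.schema(forced).parquet("/tmp/navtiming_repro").show()
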
[15:39:48] joal, yes, the representation seems incompatible [15:40:01] You'll need in any case to do a conversion job to move bigint to double [15:40:07] aha [15:40:21] If you backfill into a double from real data, then newly inserted data should be ok [15:40:23] you mean spark? [15:40:33] I mean spark or hive [15:40:57] but in hive I need to execute a command for each partition? [15:41:10] yes mforns [15:41:12] there are around 10000 partitions [15:41:40] In spark as well, but you'll be able to automate the loop in spark (while you'll need to do it in bash if you use hive) [15:41:53] mforns: Are you planning to backfill from real data? [15:42:39] mforns: If no, then a job converting the table from bigint to the same other one except for double is the way to go [15:42:47] the idea was to create a temp table in the mforns database with field type=double [15:42:53] then dump all data into it [15:43:10] then delete the original table, and recreate it with the correct type [15:43:22] and then move the data dir to the original place [15:43:37] No backfill - ok - I think it's a good idea [15:44:03] joal, well, we can backfill, but we need to fix the historical data first, no? [15:44:14] the other way to do it is to move existing data into another place, drop and recreate the table for new data to already have the correct type, and gently insert from old into new-correct [15:44:27] mforns: Nope - if you backfill, it fixes historical data [15:44:29] yes, same thing, but reverse [15:44:42] joal, we only have 3 months of data [15:44:48] we can not backfill 2017 [15:44:52] mforns: it changes in that currently loaded data already benefits from being double (less not-correct data) [15:45:05] correct mforns, we cannot backfill older data [15:45:28] mforns: but we don't need to transform the last 3 months [15:45:34] as they'll be backfilled [15:45:37] anyway [15:45:50] ah! ok [15:45:59] yes, makes sense, 3 months less to dump [15:46:22] mforns: The fact that date changes etc makes it difficult to handle as well [15:46:39] date changes? [15:47:00] mforns: something else this use-case makes me think about: are we planning to store hourly partitions for more than 100 schemas until the end of time? [15:47:22] that's a good question [15:47:24] mforns: if so, we need to think about a way of reducing the number of partitions to handle (bundle daily or even monthly) [15:47:58] mforns: When you backfill, data continues to flow in, and the 3-month old-kept data also moves [15:48:18] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (mpopov) @elukey: is there a recommendation for how to `sqoop` with the shards?... [15:48:44] oh, I see [15:49:30] but I think Refine will handle that ok, no? if we pass to it absolute datetimes [15:50:32] mforns: refine will do fine as long as data doesn't get deleted as it runs [15:50:56] joal, aha, I can leave a margin of a couple days, no prob [15:51:02] That's my point :) [15:51:18] cool [15:52:27] mforns: So the idea would be: stop refine, move existing old data (not to be backfilled), drop and recreate the table with the double field, restart refine (prod data flows in with the correct type) [15:53:03] mforns: Then backfill data using refine, and at the same time convert old data to the new schema [15:53:10] makes sense mforns ? 
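
To make joal's "automate the loop in spark" suggestion concrete, here is a minimal sketch of the conversion job: enumerate the source partitions, rebuild the event struct with deviceMemory cast to double, and insert into a target table pre-created with the corrected schema. The target table name matches the chat; the year/month/day/hour partition layout is an assumption based on the hourly partitioning discussed above.

    # Sketch of the per-partition conversion: loop over source
    # partitions, cast event.deviceMemory bigint -> double, and write
    # into a pre-created table with the corrected schema
    # (mforns.navigationtiming, as in the chat).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("navtiming-bigint-to-double")
             .enableHiveSupport()
             .getOrCreate())
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    partitions = [r.partition for r in
                  spark.sql("SHOW PARTITIONS event.navigationtiming").collect()]

    for part in partitions:
        # 'year=2019/month=2/day=15/hour=3' -> SQL predicate
        where = part.replace("/", " AND ")
        df = spark.sql("SELECT * FROM event.navigationtiming WHERE " + where)

        # Rebuild the struct, casting only the offending subfield.
        names = [f.name for f in df.schema["event"].dataType.fields]
        rebuilt = F.struct(*[
            F.col("event." + n).cast("double").alias(n) if n == "deviceMemory"
            else F.col("event." + n).alias(n)
            for n in names
        ])
        # insertInto matches columns by position, so the target table
        # must have been created with the same column order.
        df.withColumn("event", rebuilt) \
          .write.insertInto("mforns.navigationtiming")
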
[15:53:34] joal, yes [15:53:37] :] [15:53:55] Cool mforns :) I hope I'm not derailing your plan too much ;) [15:54:13] I'm just not confident about doing insert-select for all partitions [15:54:26] joal, in spark it seems a lot easier, no? [15:55:05] mforns: Very much, yes [15:55:07] I could generate an hql file with all the insert-select statements for all partitions and then pass it to hive [15:55:14] Please don't ;) [15:55:18] but it would take forever! [15:55:19] yes [15:55:33] k will look into it [15:55:36] thaaaanks! [15:55:41] Thank you :) [16:07:06] joal, I can just delete /wmf/data/event/NavigationTiming, and then reinsert from the trash folder, no? is that too risky? [16:20:59] Analytics, Analytics-Kanban, Patch-For-Review, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (mforns) a: mforns [16:25:32] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (elukey) @mpopov yes exactly, instead of using `analytics-store.eqiad.wmnet` and... [16:26:02] Analytics, Analytics-Kanban, Patch-For-Review, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (mforns) The plan is to: 1) Stop refine (ensure => absent in puppet?) 2) Move /wmf/data/event/Navigati... [17:01:09] a-team? standup? [17:01:14] I feel alone [17:01:40] mforns: i cannot connect [17:02:13] ping milimetric [17:02:48] I'm feeling sick nuria, just going to keep sitting in bed daydreaming [17:02:57] milimetric: sounds good [17:03:41] ping joal [17:03:48] joal: standdduppp [17:04:10] elukey, I finished the maintenance: we can re-enable: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/490887/ [17:05:14] mforns: I have a failure for eventlogging_to_druid_navigationtiming_hourly.service [17:05:21] I guess it should have been disabled as well? [17:05:43] elukey, yes... hm, my fault [17:05:57] will backfill that [17:06:33] mforns: do you want me to proceed with the patch? [17:06:38] yes please :] [17:06:41] ack :) [17:09:06] PROBLEM - Check the last execution of eventlogging_to_druid_navigationtiming_hourly on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly [17:12:55] done :) [17:14:30] thanks [17:15:46] (CR) Joal: [V: +2 C: +2] "Thank you Francisco :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: Fdans) [17:20:49] mforns: do you want to restart the failed job above? [17:20:57] otherwise we can systemctl reset-failed [17:21:02] and wait for the next execution [17:21:16] elukey, I think we can wait for the next execution [17:42:35] Analytics, Analytics-Kanban, User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (Nuria) Per irc conversation we are good to proceed with the migration Monday the 18th. As @JAllemandou mentioned, the next sqooping should include the change tag table even if the ch... 
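
On the move-aside step in the T214384 plan above (with refine stopped first): this is just an HDFS rename, which is also a safer answer to mforns' "delete and reinsert from the trash folder" question, since a rename is metadata-only and trivially reversible. A minimal sketch; the destination path is my assumption:

    # Sketch of the "move existing data aside" step from the T214384
    # plan: with refine stopped, rename the directory so the table can
    # be dropped and recreated with the double field. The destination
    # path below is hypothetical.
    import subprocess

    SRC = "/wmf/data/event/NavigationTiming"
    DST = "/wmf/data/event/NavigationTiming_bigint_backup"  # hypothetical

    # 'hdfs dfs -mv' is a metadata-only rename: fast and reversible,
    # unlike deleting and restoring from the HDFS trash.
    subprocess.run(["hdfs", "dfs", "-mv", SRC, DST], check=True)
    subprocess.run(["hdfs", "dfs", "-ls", "/wmf/data/event/"], check=True)
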
[17:54:42] (CR) Nuria: [C: +2] Correct sqoop script for change_tag [analytics/refinery] - https://gerrit.wikimedia.org/r/490828 (https://phabricator.wikimedia.org/T205940) (owner: Joal) [17:57:29] Analytics, Analytics-Kanban, Contributors-Analysis, Product-Analytics, Patch-For-Review: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Nuria) This change should get deployed by next week, jobs need to... [17:57:37] Analytics, Operations, hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (RobH) Thanks for the input @shilad, it's much appreciated! That info is EXACTLY the kind of info we need (and why this task exists!) To echo what some of us discussed about this in irc ye... [18:06:35] mforns: will restart turnilo after the reloading of data I did and removal of dimensions [18:17:41] !Log restarted turnilo in analytics-tool1002 [18:18:36] !log restarted turnilo in analytics-tool1002 [18:18:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:27:39] Analytics, EventBus, Research, The-Wikipedia-Library, Services (watching): page-links-change stream doesn't capture links on page deletion - https://phabricator.wikimedia.org/T216249 (Pchelolo) Indeed, haven't thought about that. On page deletion no `LinksUpdate` is scheduled, `LinksDeletionU... [18:50:17] mforns: I checked the status of eventlogging_to_druid_navigationtiming_hourly [18:50:20] it failed again [18:50:26] Exception in thread "main" org.apache.spark.SparkException: Application application_1550134620574_5526 finished with failed status [18:50:31] hm [18:50:34] diagnostics: User class threw exception: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('a' (code 97)): [18:50:42] at [Source: {"type":"struct","fields":[{"name":"dt","type":"string","nullable":true,"metadata":{}},{.. [18:50:43] elukey, I can not see data flowing into event.navigationtiming [18:51:49] elukey, I think it's because event.navigationtiming is kinda broken... [18:53:37] elukey, it is normal that eventlogging_to_druid fails, because it has a lag of 4 hours IIRC [18:53:54] and I deleted all data, so it will take 4 hours until it starts processing data, and thus not failing [18:54:37] but is that json parser error related to that? [18:55:03] but for some reason, there's no data yet in event.navigationtiming [18:55:15] hmmmm [18:55:34] app is application_1550134620574_5526 [18:55:40] if you want to check the logs [18:55:46] yea [18:55:48] thanks [18:56:10] https://yarn.wikimedia.org/cluster/app/application_1550134620574_5526 [18:56:39] https://yarn.wikimedia.org/cluster/app/application_1550134620574_5526 [18:56:44] Unexpected character ('a' (code 97)): was expecting double-quote to start field name [19:00:27] RECOVERY - Check the last execution of eventlogging_to_druid_navigationtiming_hourly on an-coord1001 is OK: OK: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly [19:00:34] yes elukey, I think it has to do with the way I recreated the table [19:05:28] mforns: so it will be solved? 
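
A quick way to localize a JsonParseException like the one above: feed the schema string from the stack trace to Python's stdlib json parser, which reports the exact offset of the unexpected character. The input filename is hypothetical (a copy-paste of the blob from the Yarn application logs):

    # Locate the bad character in the schema blob from the stack trace.
    # 'navtiming_schema_blob.json' is a hypothetical file holding the
    # string that starts with {"type":"struct","fields":[...
    import json

    blob = open("navtiming_schema_blob.json").read()
    try:
        json.loads(blob)
        print("schema blob parses fine")
    except json.JSONDecodeError as e:
        print(e.msg, "at line", e.lineno, "column", e.colno)
        print(repr(blob[max(0, e.pos - 40):e.pos + 40]))  # context window
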
[19:05:41] I'm trying [19:05:45] I can downtime the alarm for a while if this is the case [19:06:01] ah no no, sorry, I thought you meant that it would have been solved in, say, x hours [19:07:21] Analytics, Discovery-Search, Multimedia, Reading-Admin, and 3 others: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (Fuzheado) FYI, some developments in the area of using image classification in the Wikiverse: We now have a Wikidata Distributed Game - Depicts tha... [19:10:07] elukey, refine failed, and there's no data [19:10:21] yes, I'd have thought that it would be solved in 4 hours, but no. [19:10:51] I will drop the table, and let Refine re-create it as if it did not exist yet [19:11:53] PROBLEM - Check the last execution of eventlogging_to_druid_navigationtiming_hourly on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly [19:15:55] ack :) [19:16:20] going to dinner but will check later [19:16:30] mforns: if you need me, ping me on my phone [19:16:33] will re-join [19:16:47] elukey, thanks, don't worry! [19:16:57] nice weekend! [19:16:59] * elukey off! [19:17:03] you too! [19:35:49] Analytics, Operations, hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (colewhite) p: Triage→Normal [19:42:12] Analytics, EventBus, Research, The-Wikipedia-Library, Services (watching): page-links-change stream doesn't capture links on page deletion - https://phabricator.wikimedia.org/T216249 (bmansurov) @Samwalton9 > Undeleting the page counts as a page creation and tracks the link addition(s). I... [19:52:54] Analytics, Scoring-platform-team: [Discuss] ORES model development and deployment processes - https://phabricator.wikimedia.org/T216246 (Halfak) [20:02:14] RECOVERY - Check the last execution of eventlogging_to_druid_navigationtiming_hourly on an-coord1001 is OK: OK: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly [20:08:16] Analytics, Operations, hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (EBernhardson) >>! In T216226#4957729, @RobH wrote: > Thanks for the input @shilad, it's much appreciated! That info is EXACTLY the kind of info we need (and why this task exists!) > > > T... [20:36:58] Heya mforns - Need help trying to debug the NavigationTiming stuff? [20:38:51] mforns: I'm thinking of the effect of having changed the hive schema to double instead of bigint, and I actually think it'll be of no help [20:43:01] Hm - having given it some more thought, I might be completely wrong in thinking it'll not work - please discard that previous message :) [20:44:01] heh joal [20:44:19] I had some problems, but now it works [20:44:30] am about to launch Refine backfilling [20:53:35] \o/ mforns :) [20:54:00] I must say I'm happier that it works :) [20:54:52] Thanks for taking care of this data mforns - don't work too late, it's Friday evening [20:54:57] Have a good weekend :) [20:54:59] no prob :] [20:55:03] you too! [21:13:06] Analytics, Analytics-Kanban, User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (Neil_P._Quinn_WMF) >>! In T215589#4957707, @Nuria wrote: > Per irc conversation we are good to proceed with migration Monday the 18th. As @JAllemandou mentioned the next sqoopin... 
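
Once the Refine backfill mforns launches above has been running for a while, a quick sanity check that events are landing again might look like the following; the table name is from the chat, the day filter is illustrative, and `spark` is assumed to come from an existing pyspark session:

    # Post-backfill sanity check: confirm event.navigationtiming has
    # rows again, hour by hour. Assumes a pyspark shell session where
    # 'spark' is already defined.
    counts = spark.sql("""
        SELECT year, month, day, hour, COUNT(*) AS events
        FROM event.navigationtiming
        WHERE year = 2019 AND month = 2 AND day = 15   -- illustrative day
        GROUP BY year, month, day, hour
        ORDER BY hour
    """)
    counts.show(24, truncate=False)
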
[21:21:46] Analytics, Analytics-Kanban, User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (Marostegui) Another staging database where? Just to clarify: dbstore1002 will be fully read-only after the migration (MySQL doesn't allow setting read-only at the database level, it... [21:23:31] Analytics, Analytics-Kanban, User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (Marostegui) Ah, never mind my comment, you decided to completely move away from dbstore1002 :-) Thanks! [22:44:52] Analytics: Hive log4j logging is misconfigured - https://phabricator.wikimedia.org/T216294 (Neil_P._Quinn_WMF) [22:45:47] Analytics: Hive log4j logging is misconfigured - https://phabricator.wikimedia.org/T216294 (Neil_P._Quinn_WMF) p: Normal→Triage [23:37:10] Analytics, Data-release, Research, Privacy: An expert panel to produce recommendations on open data sharing for public good - https://phabricator.wikimedia.org/T189339 (DarTar) a: DarTar→Nuria