[00:02:43] PROBLEM - Check the last execution of refinery-sqoop-mediawiki-production on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-sqoop-mediawiki-production [00:20:24] 10Analytics, 10Analytics-EventLogging, 10Discovery, 10EventBus, and 2 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Nuria) +1 to @Tgr specially to "log schemas are fluid while webhook schemas need to b... [00:30:48] I think our sqooping failed because it could not write to /wmf/data/raw/mediawiki/tables/logging/snapshot=2019-01 as it already existed on hdfs [00:31:55] milimetric: yt? [00:37:14] I am going to delete /wmf/data/raw/mediawiki/tables/logging/snapshot=2019-01 and restart sqooping? ping ottomata or milimetric to see if it sounds ok [00:43:29] hopefully ahem ... this checks out, if it does not the 2019-01 will be in trash [00:45:41] errors: [00:45:51] https://www.irccloud.com/pastebin/2QRbV9Qy/ [00:48:46] will wait for milimetric a bit, can do same later [01:06:57] nuria: those errors are from today, the sqoop had already finished. I can take a closer look after bathtime [01:07:46] but we figured the sqoop ran fine, we miscommunicated on the oozie job restarting [01:08:05] it did seem fast so maybe something’s wrong [01:23:34] milimetric: see alarm: "Check the last execution of refinery-sqoop-mediawiki-production on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-sqoop-mediawiki-production" [01:23:47] milimetric: and dates of errors 2019-02-07T00:00:56 [01:24:32] milimetric: and directories on hdfs 2019-02-05 22:22 /wmf/data/raw/mediawiki/tables/logging/snapshot=2019-01/wiki_db=tawikisource [01:24:55] milimetric: are created before the errors on the sqoop log, right?
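The failure mode discussed above (sqoop refusing to write because the target snapshot directory already exists on HDFS) can be sketched as a pre-flight check. This is a toy illustration in plain Python, not the actual refinery code; a real job would shell out to `hdfs dfs` instead of consulting an in-memory set:

```python
# Toy illustration of a pre-flight check for an existing sqoop target
# directory, mirroring the failure described above. A real job would
# run `hdfs dfs -test -e <path>` / `hdfs dfs -rm -r <path>` instead.

def preflight_check(target, existing_paths, overwrite=False):
    """Return True if the sqoop may proceed; raise if the target
    already exists and overwrite was not requested."""
    if target in existing_paths:
        if not overwrite:
            raise FileExistsError(
                f"{target} already exists on HDFS; "
                "delete it or request overwrite")
        existing_paths.discard(target)  # stands in for `hdfs dfs -rm -r`
    return True

existing = {"/wmf/data/raw/mediawiki/tables/logging/snapshot=2019-01"}
# A fresh snapshot path passes the check:
assert preflight_check(
    "/wmf/data/raw/mediawiki/tables/logging/snapshot=2019-02", existing)
```

With `overwrite=True` the stale `snapshot=2019-01` entry would be dropped first, which is effectively what deleting the directory by hand and restarting the sqoop achieved.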
[01:54:38] nuria: ok, so looking at this now [01:54:50] it seems maybe that sqoop job is wrong, but we all looked at it, I'm surprised [01:54:53] there are 3 jobs now [01:55:08] there's mediawiki, mediawiki-production, and mediawiki-private [01:55:21] the production one is just supposed to sqoop actor and comment, but I guess it's not [01:57:09] the normal one, that sqoops most tables, is good. I checked revision, archive, logging, etc. and they're all fine [01:57:17] on the other hand, actor and comment haven't been touched [01:57:24] checking logs now [01:59:39] hm, something seems wrong with the systemd timer. It says pretty clearly "-t actor,comment" [01:59:46] but it must be calling the wrong one or something [02:07:58] ok nuria, this patch is what was wrong: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/488670/ [02:08:15] nuria: I'll trigger the correct sqoop job manually, it's only a little late, should be fine [02:18:36] found another bug, added to change, sqoop is running now [03:31:34] 10Analytics, 10User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (10Tbayer) >>! In T215379#4930611, @elukey wrote: > I think that the first step, if this job is important, could be to restore the cronjob under a user's crontab that will actively maintain it. T... [06:10:33] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) @JAllemandou dbstore1002 crashed, let's start the same thing but with less jobs I would suggest. 
[06:10:51] 10Analytics, 10Analytics-Kanban, 10Operations, 10Product-Analytics, 10Patch-For-Review: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) dbstore1002 crashed, possibly due to {T215450} [06:42:29] RECOVERY - Check the last execution of refinery-sqoop-mediawiki-production on an-coord1001 is OK: OK: Status of the systemd unit refinery-sqoop-mediawiki-production [06:43:09] I manually reset the status of --^ since dan is executing the job in screen [07:19:09] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10JAllemandou) @Marostegui : Sqoop finished yesterday at 23:31 UTC with expected number of rows. Maybe the db crash was unrelated? @Halfa... [07:22:15] \o/ - manual production-sqooping is done, mediawiki-history job started as expected :) [07:23:31] Something else to change in the production-sqoop config: use a different log file than the private one [07:23:35] milimetric: --^ [07:24:05] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) @JAllemandou at what time did you start the job? From what I can see it crashed at around 18:32, so it crashed before then... [07:26:36] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10JAllemandou) @Marostegui : Indeed I started the job later (comment time is almost synchronous). [07:37:45] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) Great - so the crash happened before. Thanks a lot for helping out here :) Let's wait for @Halfak to verify he's got everyt...
[08:05:31] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Allow Erik Bernhardson to have root access on stat1005 for GPU testing - https://phabricator.wikimedia.org/T215384 (10Joe) I second the idea, and I see @Nuria has given +1 to the patch which I assume can count as manager approval. Given... [08:42:53] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Allow Erik Bernhardson to have root access on stat1005 for GPU testing - https://phabricator.wikimedia.org/T215384 (10Joe) a:03Joe [08:50:43] * elukey afk for a bit, cat to the vet [09:47:52] spent more than one hour fixing dbstore1002's replication [09:47:55] what a mess [09:48:03] now I really feel for Manuel and Jaime [10:15:26] still broken [10:28:28] joal: o/ [10:43:51] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) I am also mysqldumping that table, just in case. [10:52:43] 10Analytics, 10User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (10elukey) This is jdcc's crontab (I thought it was not an active user anymore, I was wrong): ` 0 15 * * * USER=jdcc /home/jdcc/anaconda3/bin/python /home/jdcc/project_monitoring/scripts/check_p... [10:53:34] addshore: o/ [10:53:38] HI [10:53:50] would you like to be the first one to move to the new dbstores? :D [10:54:04] i could certainly think about trying it out today [10:55:03] I have updated https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas with some notes [10:55:12] about the new cnames etc.. [10:55:17] thanks!
[10:55:24] dbstore1002 is getting more and more broken [10:56:28] :D [11:59:27] 10Analytics, 10Analytics-Kanban, 10Operations, 10Product-Analytics, 10Patch-For-Review: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) @elukey @jcrespo Any objection to put dbstore1002 as IDEMPOTENT? This host crashes every single day, the data is already drifts a lot... [12:00:17] 10Analytics, 10Discovery-Search, 10Multimedia, 10Research, and 2 others: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (10Gilles) There is an image classifier worth building that probably wouldn't fall into preexisting bias, which is determining whether an image is a photog... [12:04:44] 10Analytics, 10Analytics-Kanban, 10Operations, 10Product-Analytics, 10Patch-For-Review: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10jcrespo) ok to me, data is already garbage, more garbage would not be a problem :-) [12:07:16] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) @elukey the .sql file is at: `dbstore1003:/srv/tmp.staging/staging.sql` Can you grab it and store it somewhere else, so we... [12:14:30] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Joe) [12:18:42] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10JAllemandou) I suggest using `hdfs://wmf/data/archive/sqldumps` as a base for sql-dumps, with content-oriented subfolders, leading to `...
[12:34:31] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10elukey) I am moving the dump to stat1004, then I'll move it to the HDFS dir that Joseph proposed. Will update the task once done :) [12:53:48] * elukey lunch! [13:05:22] ciao analytics! I have a question: I recall someone mentioned that ores revision scores were going to be available on hive? Or is it just something I really hoped for and I modified my memories accordingly? [13:05:26] :) [13:07:23] Hi miriam_ - Indeed you can get ores revision scores in hive (or even better, Spark) [13:08:04] joal: ohh this is amazing news :) which tables should I look at? [13:10:11] miriam_: You'll have only new stuff as data is flowing in through events (revisions are flowing in from 2018-12) [13:10:55] miriam_: actually, the reason for which data is available back for 3 months only is because of the data-retention policy I guess [13:12:52] miriam_: the table you should use is event.mediawiki_revision_score [13:13:44] miriam_: Also, the data being not that big (12G), I really suggest using spark :) [13:13:47] joal oh, I see, my data is from september, so I won't be able to use it this time but this is very good to know for the future! [13:14:53] yes makes sense to use spark :) [13:15:03] thanks joal! [13:15:09] np miriam_ [13:15:15] merci :) [13:15:23] hehe [13:18:32] 10Analytics, 10Analytics-Kanban, 10Operations, 10Product-Analytics, 10Patch-For-Review: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10JAllemandou) sqoop for actor and comment tables just finished and we should use the new hardware next month, so no problem for me either :) [13:26:14] Hey, I'm having problems running queries in Hue...
[13:26:46] "Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask" [13:34:34] Hi edsanders - unfortunately error messages from hue are not informative most of the time :( [13:34:44] edsanders: con you tell me more about what you;re doing? [13:37:41] (03PS1) 10Joal: Add change_tag and change_tag_def to sqoop script [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488927 (https://phabricator.wikimedia.org/T205940) [13:40:49] joal: aaah, can't believe we didn't catch all of these the first time, I must be more tired than I thought [13:42:10] milimetric: too many breaking stuff at one - we're getting back on track, let's even try to have stuff cleaner than it was before ;) [13:42:39] oh, at least the private log file I think was on purpose [13:42:48] Ah ok :) [13:42:57] I remember thinking it was ok to have it in the private one, I didn't make variables for a different log [13:43:02] do you prefer it different? [13:45:28] (03CR) 10Milimetric: "one small syntax thing, then merging" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488927 (https://phabricator.wikimedia.org/T205940) (owner: 10Joal) [13:45:46] joal: do you prefer different log files? 
[13:46:16] milimetric: given the 2 jobs (private and production) are different, I think it makes sense to have 2 different files [13:46:39] also milimetric, commented on T154370 [13:46:40] T154370: Create script for moving orphaned revisions to the archive table - https://phabricator.wikimedia.org/T154370 [13:47:24] (03PS2) 10Joal: Add change_tag and change_tag_def to sqoop script [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488927 (https://phabricator.wikimedia.org/T205940) [13:47:46] (03CR) 10Joal: Add change_tag and change_tag_def to sqoop script (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488927 (https://phabricator.wikimedia.org/T205940) (owner: 10Joal) [13:48:35] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Add change_tag and change_tag_def to sqoop script [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488927 (https://phabricator.wikimedia.org/T205940) (owner: 10Joal) [13:48:51] thanks milimetric --^ [13:49:17] joal: nice!!! How did you find that task?! [13:49:33] milimetric: phabricator-search [13:49:44] milimetric: I must say I have been lucky ;) [13:50:15] you are a genius kind of lucky, jo [13:50:28] mwarf ;) [13:51:50] milimetric: we'll be settled on mediawiki-history soon - only 2 stages to go [13:52:27] cool. I mean, it would be weird if something happened, I tested it multiple times [13:52:54] milimetric: I'm just eager to tick the line and deploy the new snapshot ;) [13:56:38] for sure [13:56:44] ok, logfile split is up for review [13:56:49] gonna go have breakfast and stuff [14:03:43] merged and deployed [14:07:16] thanks elukey :) [14:08:49] joal: I am adding the .sql file to /wmf/data/archive/backup/misc/dbstore1002_backup/staging_dbstore1002_07022019.sql since all the others are already there [14:08:59] but I just realized that naming is not optimal [14:09:04] too many "backup" etc.. [14:09:19] so after the copy we can rename as we think it's best [14:09:28] is it ok?
[14:09:37] elukey: if that file is only for the mep_word_persistence table, I bet we should rename it :) [14:10:24] yes yes it is similar to how manuel named it, but I realized after starting the copy [14:10:32] shouldn't be a big deal to rename [14:10:51] elukey: for sure no, but let's do it now while we still remember what it's about :) [14:12:09] joal: I think it would be fun to discover it after some months [14:12:13] you ruin all the fun [14:12:19] Mwahahahaha :D [14:12:53] elukey: knowing it might be part of the *I* could be doing, I'm in the mood to ask for a rename, now :) [14:12:57] joal: do you have a minute today for the hive db drop stuff? [14:13:29] elukey: Yessir, now? [14:13:47] sure! [14:25:16] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10akosiaris) How is the data going to make it from Hadoop, which resides in the analytics cluster and is firewalled at the router level... [14:29:01] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Ottomata) > How is the data going to make it from Hadoop, which resides in the analytics cluster and is firewalled at the router level... [14:33:44] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov) >>! In T213566#4934832, @akosiaris wrote: > Is it just a `LOAD DATA INFILE "something.tsv"` or is it something more complex... [14:34:41] 10Analytics, 10Analytics-EventLogging, 10Discovery, 10EventBus, and 2 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Ottomata) > If we were to use Monolog, we'd likely want to do it with the aim of conve...
[14:36:07] 10Analytics, 10Discovery: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10Ottomata) @fgiunchedi ping on this...or can you ping someone who might know more about using Swift for this kind of thing? Should we consider th... [14:36:58] 10Analytics, 10Discovery: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10Ottomata) [14:41:17] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10akosiaris) >>! In T213566#4934835, @Ottomata wrote: >> How is the data going to make it from Hadoop, which resides in the analytics cl... [14:44:32] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov) > That does look simple enough and not resource expensive on mwmaint1002. I guess it can fit in there as well? But a VM is... [14:56:44] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Ottomata) > they will also not allow them to send the SYN/ACK packet required for the second (of the three) phase of the TCP handshake... [14:57:14] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10elukey) Naming is not great (yet) but the backup is on HDFS: ` elukey@stat1004:~$ sudo -u hdfs hdfs dfs -ls /wmf/data/archive/backup/m... 
[14:58:15] joal: just trying to run any query with a "group by" clause [14:58:27] I can run simple queries such as "select * from foo limit 1" [14:58:54] edsanders: might depend on the table as well [14:59:34] so this query which I copied off a phab task: [14:59:34] select [14:59:34] to_date(dt) as date, [14:59:34] count(*) as events [14:59:34] from editattemptstep [14:59:34] where year = 2018 and month=12 [14:59:34] group by to_date(dt) [14:59:50] in event_sanitized [15:00:20] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) Thanks @elukey - ok to delete the .sql file I created from dbstore1003? [15:00:57] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10elukey) >>! In T215450#4934900, @Marostegui wrote: > Thanks @elukey - ok to delete the .sql file I created from dbstore1003? +1 [15:02:55] edsanders: the query worked for me in CLI - Might be related to reading rights [15:02:56] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) Done - thank you [15:03:48] yeah - my googling suggested it was a rights issue [15:03:50] edsanders: I need to run an errand for ~2h, I'll be back after that - something else: you should change the query month to use 2019-01, it contains data while 2018-12 doesn't :) [15:04:13] elukey: if you have a minute, can you check with edsanders his LDAP groups and all? [15:04:21] gone for kids team - see you at standup [15:04:56] sure [15:05:28] what is needed from LDAP? [15:11:51] 10Analytics: Check home leftovers of user imarlier (Ian Marlier) - https://phabricator.wikimedia.org/T213702 (10elukey) Everything cleaned up, thanks all for the feedback!
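The query pasted above simply counts events per day. Its aggregation can be reproduced in miniature with sqlite3; the rows below are made up for illustration (the real table is event_sanitized.editattemptstep, and Hive's to_date corresponds to sqlite's date function here):

```python
import sqlite3

# Miniature reproduction of the daily-count aggregation from the
# pasted Hive query, with made-up rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE editattemptstep (dt TEXT)")
conn.executemany(
    "INSERT INTO editattemptstep VALUES (?)",
    [("2019-01-01 10:00:00",),
     ("2019-01-01 11:30:00",),
     ("2019-01-02 09:15:00",)])
rows = conn.execute("""
    SELECT date(dt) AS day, COUNT(*) AS events
    FROM editattemptstep
    GROUP BY date(dt)
    ORDER BY day
""").fetchall()
print(rows)  # [('2019-01-01', 2), ('2019-01-02', 1)]
```

The Hue failure was not about the SQL itself (it worked from the CLI), which is why the discussion turns to read permissions on the underlying data.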
[15:12:02] 10Analytics, 10Analytics-Kanban: Check home leftovers of user imarlier (Ian Marlier) - https://phabricator.wikimedia.org/T213702 (10elukey) [15:12:23] 10Analytics, 10Analytics-Kanban: Check home leftovers of user imarlier (Ian Marlier) - https://phabricator.wikimedia.org/T213702 (10elukey) a:03elukey [15:25:51] updated https://wikitech.wikimedia.org/wiki/Analytics/Ops_week#Have_any_users_left_the_Foundation? with info about how to drop a hive database [15:25:55] (credits to Joseph) [15:28:23] NICE elukey and joal! :) [15:28:42] btw mforns wanted to say thanks for all the recent wikitech docs too (HiveToDruid, etc.) those are great!@ [15:37:45] 10Analytics, 10User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (10Jdcc-berkman) I only setup the reports to be dumped to disk. I didn't know they were getting emailed out, so that must have been handled somewhere else. [15:56:27] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Halfak) Confirmed that they have the same number of rows (SELECT COUNT(*) FROM mep_word_persistence;). I confirmed that I can run a us... [15:57:14] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) Thanks @Halfak, can we drop the table from dbstore1002 then? [15:57:15] joal: yup, just noticed that, although still fails with the correct month [16:09:15] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Halfak) Yes. Thanks for your patience. 
[16:29:29] 10Analytics, 10EventBus, 10Parsoid, 10Research, and 5 others: Surface link changes as a stream - https://phabricator.wikimedia.org/T214706 (10bmansurov) [16:50:42] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Convert Aria/Tokudb tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10Marostegui) I finished converting `mep_word_persistence` to InnoDB on dbstore1003: ` (1 day 9 hours 52 min 45.49 sec) ` This was obviously done before we knew it... [16:52:38] 10Analytics, 10EventBus, 10Research, 10Wikidata, and 4 others: Surface link changes as a stream - https://phabricator.wikimedia.org/T214706 (10bmansurov) [16:53:57] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10JAllemandou) Thanks @Halfak for the double check :) Also, If you like hive speed, try Spark :D Massive thanks again to @Marostegui for... [16:56:48] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) >>! In T215450#4935181, @Halfak wrote: > Yes. Thanks for your patience. Thank you! I will kill this table tomorrow and c... [16:59:32] ottomata: I confirm the cast fails from struct to map, even with correctly typed structs [16:59:43] ottomata: Double-reading is the only way I can think of [17:02:03] does double reading even work though? [17:02:05] 10Analytics, 10EventBus, 10Research, 10Wikidata, and 4 others: Surface link changes as a stream - https://phabricator.wikimedia.org/T214706 (10bmansurov) p:05Triage→03Normal [17:02:24] ottomata: not even sure no, but worth a try :) [17:02:40] elukey: standup?
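The struct-to-map cast problem mentioned above boils down to heterogeneity: a struct carries a separate type per field, while a map needs one value type for everything. A toy illustration in plain Python (not Spark, and not the actual Refine code):

```python
# Toy illustration (plain Python, not Spark) of why a struct with
# mixed field types cannot be cast to a map with a single value type.

def struct_to_map(struct, value_type):
    """Convert a struct-like dict to a homogeneous map, failing the
    way a typed cast would when a field's type disagrees."""
    for field, value in struct.items():
        if not isinstance(value, value_type):
            raise TypeError(
                f"field {field!r} has type {type(value).__name__}, "
                f"expected {value_type.__name__}")
    return dict(struct)

# Homogeneous fields convert cleanly:
assert struct_to_map({"a": 1.0, "b": 2.5}, float) == {"a": 1.0, "b": 2.5}
```

A struct like `{"a": 1.0, "b": "x"}` would raise instead of converting, which is roughly why re-reading the raw data with the desired schema ("double-reading") comes up as the workaround.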
[17:04:44] \o/ - mediawiki-history-check worked :) [17:05:12] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reportupdater queries jobs failing - https://phabricator.wikimedia.org/T213219 (10Nuria) [17:05:14] 10Analytics, 10Analytics-Kanban: Reportupdater should alert if it fails over and over - https://phabricator.wikimedia.org/T213309 (10Nuria) 05Open→03Resolved [17:05:34] dependent jobs have kicked in, seems it's working :) [17:10:45] edsanders: we're in standup, we'll discuss what could be wrong with your access to the data - Will report after [17:25:58] thanks! [17:36:49] edsanders: you need to be in the analytics-privatedata group, and for that you need to file a task with Ops-Access-Requests [17:41:45] milimetric: thanks, is there a template? [17:42:03] 10Analytics, 10Discovery-Search, 10Multimedia, 10Research, and 2 others: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (10Isaac) If we go down that pathway of trying to identify what images are photographs, we should look into work by a former colleague of mine on detecting... [17:42:20] edsanders: I'll try to find an example, I don't think there's a template, it's just like "please give me access to this group, this is why, here's my manager CC-ed for approval" [17:42:58] I think I already did something similar that gave me access to the stats servers [17:44:42] I can access the old mysql databases on stat1005.eqiad.wmnet [17:56:38] edsanders: you shouldn't be able to log into stat1005, is that the case? [17:56:50] (it has been deprecated in favor of stat1007 a while ago) [18:00:25] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10elukey) Since this host is important for the Analytics team, I'd be up to take over from the OS install perspective to remove some work from...
[18:03:20] 10Analytics, 10Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (10fdans) p:05Triage→03Low [18:06:53] 10Analytics: Aggregate pageviews to Wikidata entities - https://phabricator.wikimedia.org/T215438 (10fdans) p:05Triage→03Low [18:08:25] 10Analytics, 10Growth-Team, 10Product-Analytics: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (10fdans) p:05Triage→03High [18:10:42] 10Analytics, 10Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (10Milimetric) @Krinkle: refined data was down-cast and is going to remain incorrect unless we re-ingest. We have raw data going back 90 days [1], s... [18:12:37] 10Analytics, 10Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (10Ottomata) Eeee I don't know if that is a good idea... to treat all data as doubles. The right solution is to create the table from the JSONSchema... 
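The down-casting bug discussed in T214384 above comes from inferring a column's type from early records and then coercing later, wider values to it. A toy sketch in plain Python (not the actual Refine code; the field name is just an example from NavigationTiming-style data):

```python
# Toy sketch (not the actual Refine code) of how inferring a column
# type from the first record down-casts later, wider values.

def infer_and_coerce(records, column):
    inferred = type(records[0][column])      # e.g. int from record one
    return [inferred(r[column]) for r in records]

records = [{"loadEventEnd": 120}, {"loadEventEnd": 250.7}]
coerced = infer_and_coerce(records, "loadEventEnd")
print(coerced)  # [120, 250] -- the fractional part of 250.7 is lost
```

This is why the discussion favors deriving the table schema from the JSONSchema up front rather than from whichever data happens to arrive first.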
[18:14:03] 10Analytics: Replace analytics mailto link in analytics.wikimedia.org - https://phabricator.wikimedia.org/T215362 (10fdans) p:05Triage→03Normal [18:14:07] 10Analytics, 10Analytics-Kanban: update mw scooping to be able to scoop from new db cluster - https://phabricator.wikimedia.org/T215290 (10fdans) p:05Triage→03High [18:15:53] 10Analytics, 10Analytics-Kanban: update mw scooping to be able to scoop from new db cluster - https://phabricator.wikimedia.org/T215290 (10fdans) [18:15:58] 10Analytics, 10Analytics-Kanban: Update reportupdater to be able to query the new db cluster that will substitute 1002 - https://phabricator.wikimedia.org/T215289 (10fdans) p:05Triage→03High [18:16:04] 10Analytics, 10Analytics-Wikistats: There are two entries for Cantonese language - https://phabricator.wikimedia.org/T215139 (10fdans) p:05Triage→03High [18:16:38] 10Analytics, 10Wikimedia-Stream: EventStreams returns 502 errors from outside the WMF network - https://phabricator.wikimedia.org/T215013 (10fdans) @Ottomata are there any actionables for us here? [18:17:56] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Convert Aria/Tokudb tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10fdans) [18:17:58] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10fdans) 05Open→03Resolved [18:19:08] ottomata: I forgot to ask yesterday about Spark 1.x deprecation! :) [18:19:14] I think that we should be ready right? [18:19:17] elukey: yes, I can log into stat1007, not stat1005 [18:19:23] ack thanks :) [18:19:44] for a moment I was like "did I mess with access for that host?" :D [18:19:57] do I still need to request more access? [18:20:27] for analytics-privatedata? 
I think so, I didn't follow the conversation but it shouldn't change [18:23:02] 10Analytics, 10EventBus: Spike: Can Refine handle map types if Hive Schema already exists with map fields? - https://phabricator.wikimedia.org/T215442 (10fdans) a:03JAllemandou [18:23:13] 10Analytics, 10Analytics-Kanban, 10EventBus: Spike: Can Refine handle map types if Hive Schema already exists with map fields? - https://phabricator.wikimedia.org/T215442 (10fdans) [18:26:49] ottomata: agreed that eventually refine should be schema-aware, I always liked that better. But in the meantime, we need to solve this problem, we're butchering data [18:32:44] 10Analytics, 10User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (10Nuria) @Slaporte: if these scripts are important let's please find a person responsible for maintaining them. They are very sparsely documented and as you can see it is not even clear who setu... [18:33:41] * elukey ofF! [18:59:04] milimetric: we're butchering data only in a few cases, right? [18:59:12] we could precreate schemas for ones we're worried about [18:59:26] its only if the schema gets created incorretly during the first refine (or addition of a field) [19:00:09] joal: would it be possible to move anlaytics systems hangtime to before standup on thursdays? [19:00:12] i see you have a busy block there... [19:09:58] ottomata: I have the kids before standup, I won't make it :( [19:11:56] aye ok hmmm [19:12:31] leila: so yaa if we want joseph to come (which we do!) we can't do it until later [19:15:45] ottomata, leila: could we do it on fridays, after analytics standup? [19:16:32] There seems to be a free slot for both of you [19:17:19] neither of us like friday meetings :) [19:17:30] i am off every other friday too so sometimes it might not work [19:17:36] makes sense - i don't either, but he [19:17:42] Ah right forgot about that [19:17:45] hm [19:18:03] Could be on wednesdays?
[19:21:13] I think not before standup, others have meetings then [19:30:42] ottomata: I'm out of suggestions :( [19:32:06] Nettrom: Good evening - Are you by any chance still around? [19:33:36] joal: I am, but about to have a meeting :( [19:34:16] Nettrom: no problem, no emergency - Let's see if I'll still have a question tomorrow ;) [19:34:34] joal: sounds good, I should be around tomorrow morning PST, so feel free to ping me then! [19:34:44] Thanks Nettrom :) [19:37:38] ottomata: a question for you - Have we closed new artifacts mirroring on archiva? [19:41:51] new artifacts mirroring? [19:41:56] yessir [19:42:02] not sure what that means [19:42:30] meaning, if I add a new dependency in a pom depending on archiva, will it be mirrored automagically? [19:46:29] i think so yes [19:46:31] it should be iirc [19:46:37] hm [19:46:46] ok - I have an issue with maven repo maybe then :) [19:47:56] Ah ! Found it [19:48:49] ottomata: I'd need to mirror a package located in a different repo than maven-core - namely: [19:49:03] https://dl.bintray.com/spark-packages/maven/ [19:49:39] where reside a bunch of spark packages, among which graphframes, that I'd like to update to 0.7.0 (at least 0.6.0) [19:51:33] (03PS1) 10Joal: Update graphframes to 0.7.0 in refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/489004 [19:51:36] ottomata: here is the pull request --^ [19:51:49] ottomata: the build fails because the jars are not handled by the main repo [19:51:53] ah [19:51:59] yeah we only mirror from maven central and cdh [19:52:05] makes sense [19:52:09] do you know which packages you need? [19:52:12] can you upload them manually?
[19:52:31] yessir - details in the PR :) [19:52:47] https://wikitech.wikimedia.org/wiki/Archiva#Uploading_dependency_artifacts [19:53:45] (03CR) 10jerkins-bot: [V: 04-1] Update graphframes to 0.7.0 in refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/489004 (owner: 10Joal) [19:53:48] Will follow ottomata - Thanks! [19:55:21] yup! [19:55:23] lemme know if it works [19:57:36] ottomata: worked like a charm - Many thanks :) [19:58:53] 10Analytics-Kanban: Bump graphframes version to 0.6.0+ - https://phabricator.wikimedia.org/T215547 (10JAllemandou) [19:59:01] 10Analytics-Kanban: Bump graphframes version to 0.6.0+ - https://phabricator.wikimedia.org/T215547 (10JAllemandou) a:03JAllemandou [19:59:39] (03PS2) 10Joal: Update graphframes to 0.7.0 in refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/489004 (https://phabricator.wikimedia.org/T215547) [20:02:14] joal: I'm trying to keep my Friday's free from meetings for focus time. [20:02:30] leila: I definitely understand :) [20:02:38] leila: We'll find a time :) [20:02:56] joal: I'd really appreciate it. Friday is the one day I can fully focus and deep dive. [20:03:44] leila: Please keep it this way, I'm sure there'll be a time that we can accomodate without too much disturbance [20:03:57] joal: thanks! :) [20:07:03] ottomata: added the line you asked for in https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark#Spark_tuning_for_big_jobs [20:07:24] I'll call it done and merge the code patches (mforns +1ed them already) [20:08:22] :] [20:11:27] (03PS3) 10Joal: Update big spark jobs settings [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/482661 (https://phabricator.wikimedia.org/T213525) [20:11:52] thanks joal! [20:11:58] (03CR) 10Joal: [C: 03+2] "Merging after providing doc."
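Manual uploads like the one documented at the Archiva link above land at a path derived from the standard Maven repository layout, where dots in the groupId become path segments. A small helper as illustration; the graphframes version string below is illustrative, not the exact spark-packages coordinate that was deployed:

```python
# Illustration of the standard Maven repository layout used when
# uploading a dependency artifact manually (groupId dots map to
# directory separators).

def artifact_path(group_id, artifact_id, version, packaging="jar"):
    group_path = group_id.replace(".", "/")
    return (f"{group_path}/{artifact_id}/{version}/"
            f"{artifact_id}-{version}.{packaging}")

# Illustrative coordinates only:
print(artifact_path("graphframes", "graphframes", "0.7.0"))
# graphframes/graphframes/0.7.0/graphframes-0.7.0.jar
```

This layout is why a repository can only serve artifacts it actually mirrors or hosts: the build resolves each coordinate to exactly this path, and spark-packages jars were not under the mirrored maven-central tree.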
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/482661 (https://phabricator.wikimedia.org/T213525) (owner: Joal) [20:12:04] Thanks for reading ;) [20:13:14] I'd also happily merge https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/489004 if any of you gives me a +1 :) [20:21:52] (CR) Ottomata: [C: +1] Update graphframes to 0.7.0 in refinery-spark [analytics/refinery/source] - https://gerrit.wikimedia.org/r/489004 (https://phabricator.wikimedia.org/T215547) (owner: Joal) [20:22:30] (Merged) jenkins-bot: Update big spark jobs settings [analytics/refinery/source] - https://gerrit.wikimedia.org/r/482661 (https://phabricator.wikimedia.org/T213525) (owner: Joal) [20:29:55] Analytics, Wikimedia-Stream: EventStreams returns 502 errors from outside the WMF network - https://phabricator.wikimedia.org/T215013 (Ottomata) Yes, at the least we need to make the aliveness check somehow check from outside prod networks. [20:30:01] (CR) Joal: [C: +2] "Merging for bug-fix. Thanks ottomata" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/489004 (https://phabricator.wikimedia.org/T215547) (owner: Joal) [20:34:07] Analytics, Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, and 3 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (JAllemandou) I confirm the fix :) Closing this task. [20:35:55] (Merged) jenkins-bot: Update graphframes to 0.7.0 in refinery-spark [analytics/refinery/source] - https://gerrit.wikimedia.org/r/489004 (https://phabricator.wikimedia.org/T215547) (owner: Joal) [20:36:18] Analytics, Operations, Research, serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (Nuria) Ideally I would prefer that stats machines are completely out of the workflow of pushing data to machines like mwmaint1002.eqia...
[20:38:17] nuria: fyi in that ticket ^ stat1007 was just an example of an rsync pull from the analytics vlan [20:38:26] not a suggestion that we would use it [20:38:43] T213976 is where we will figure out how to do this [20:38:44] T213976: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 [20:40:26] ottomata: ah k, let me look at that one [20:42:27] Analytics, Discovery, Operations, Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (Nuria) [20:43:17] Analytics, Operations, Research, serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (Nuria) As @Ottomata pointed out, a more generic discussion about this topic can be found here: https://phabricator.wikimedia.org/T213976 [20:49:19] Analytics, Analytics-Kanban: Update sqoop base script for new analytics-db infra - https://phabricator.wikimedia.org/T215549 (JAllemandou) [20:49:50] milimetric, nuria - I just created that --^ (analytics board + kanban) [20:51:09] joal: i think it is a duplicate of https://phabricator.wikimedia.org/T215290? [20:51:37] It is nuria ! Didn't know about the original one - will close [20:52:14] Analytics, Analytics-Kanban: update mw sqooping to be able to sqoop from new db cluster - https://phabricator.wikimedia.org/T215290 (JAllemandou) [20:52:16] Analytics, Analytics-Kanban: Update sqoop base script for new analytics-db infra - https://phabricator.wikimedia.org/T215549 (JAllemandou) [20:52:55] Analytics, Analytics-Kanban: Test sqooping from the new dedicated labsdb host - https://phabricator.wikimedia.org/T215550 (JAllemandou) [20:53:03] nuria: is that one also a dup --^ [20:53:05] ?
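The sqoop tickets above (T215549/T215550) are about pointing the base script at the new analytics-db/labsdb hosts. For context, a single-table import looks roughly like the sketch below — the hostname, credentials path, and target directory are placeholders, not the actual production values:

```shell
# Sketch: import one MediaWiki table from a (hypothetical) dedicated labsdb
# host into HDFS. Host, password file, and paths are illustrative only.
sqoop import \
  --connect jdbc:mysql://labsdb-analytics.eqiad.wmnet:3306/enwiki \
  --username research \
  --password-file /user/analytics/mysql-password.txt \
  --table actor \
  --target-dir /wmf/data/raw/mediawiki/tables/actor/snapshot=2019-01/wiki_db=enwiki \
  --as-avrodatafile \
  --num-mappers 4
```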
[20:55:47] Analytics, EventBus, Research, Wikidata, and 4 others: Surface link changes as a stream - https://phabricator.wikimedia.org/T214706 (Pchelolo) We're getting closer. The last part of the puzzle would be to emit the event publicly via event streams. In order to do that, we need to add the `mediaw... [21:01:58] Gone for today team - see you tomorrow [21:03:44] byeeeee [21:23:52] Analytics, User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (Tbayer) >>! In T215379#4934230, @elukey wrote: > This is jdcc's crontab (I thought it was not an active user anymore, I was wrong): Me too (based on T183291) ... > > ` > 0 15 * * * USER=jdcc... [21:40:12] Analytics, Product-Analytics: Superset's rolling average feature results in error message - https://phabricator.wikimedia.org/T213488 (jlinehan) Since the Superset 0.28.1 upgrade ([[ https://phabricator.wikimedia.org/T211605 | T211605 ]]) has run aground for now, I talked with @Nuria about the possibilit... [22:35:28] (PS1) Ladsgroup: Introduce WikimediaDbSectionMapper based on db-eqiad.php config [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/489097 (https://phabricator.wikimedia.org/T213894) [22:56:16] Analytics, Product-Analytics: Superset's rolling average feature results in error message - https://phabricator.wikimedia.org/T213488 (Nuria) p: High→Triage [22:57:02] Analytics, Performance-Team: Plan navtiming data release - https://phabricator.wikimedia.org/T214925 (Nuria) p: Low→Normal [22:59:36] Analytics, Analytics-EventLogging, EventBus, Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (sbassett) !!**Security Review Summary - February 2019**!! Overall, this looks pretty good to me. Issues d...
[23:05:05] Analytics, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (Krinkle) @Milimetric If I understand correctly, you're proposing to change the schema and replace records for which we still have the raw data. Ol...