[01:17:41] (03CR) 10Milimetric: [C: 03+2] Fix mediawiki_history_reduced checker [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507254 (https://phabricator.wikimedia.org/T222141) (owner: 10Joal) [01:27:30] (03CR) 10Milimetric: Fix mediawiki_page_history undefined userId (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507258 (https://phabricator.wikimedia.org/T222141) (owner: 10Joal) [01:37:19] (03CR) 10Milimetric: "Please ping me tomorrow morning on IRC so we can talk about this, as early as you want. I'll be up really early." (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507260 (https://phabricator.wikimedia.org/T222141) (owner: 10Joal) [01:41:48] (03CR) 10Milimetric: Fix mediawiki-history user event join (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/504834 (owner: 10Joal) [04:27:24] so next I want to look up geocodes for pages. Does wikidata seem like a reasonable way to do this or should I regex the page texts? [04:54:09] 10Analytics, 10Analytics-Cluster: Upgrade Spark to 2.4.2 - https://phabricator.wikimedia.org/T222253 (10Groceryheist) [05:14:19] I'm getting a weird error in Spark in Swap [05:14:37] I'm using pyspark [05:14:45] ImportError: no module named 'pyarrow' [05:14:55] I think it may be related to using a udf [05:15:05] and the problem might be that pyarrow isn't installed on worker nodes. [05:15:15] It is available in the notebook [05:15:28] https://stackoverflow.com/questions/51084514/apply-function-per-group-in-pyspark-pandas-udf-no-module-named-pyarrow [06:02:12] groceryheist: hi! Today is a bank holiday for most Europeans, would you mind opening a task tagging "Analytics" ? [06:02:31] I'll make sure to work on it asap (or Andrew) [06:05:41] * elukey afk! [06:42:24] ok elukey [06:48:33] 10Analytics, 10Analytics-Cluster: Pyspark on SWAP: Py4JJavaError: Import Error: no module named pyarrow - https://phabricator.wikimedia.org/T222254 (10Groceryheist) [10:51:10] 10Analytics, 10Analytics-Cluster: Pyspark on SWAP: Py4JJavaError: Import Error: no module named pyarrow - https://phabricator.wikimedia.org/T222254 (10elukey) https://phabricator.wikimedia.org/T202812 is related to a previous request, maybe it helps! Going to check tomorrow :) [12:08:56] (03PS1) 10Joal: Fix HdfsUtil forgotten rename [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507549 [12:10:08] Heya team - When somebody gets online, Would you mind reviewing --^ ? [12:10:23] sqoop is gently failing because of that :( [12:35:05] 10Analytics, 10Performance-Team: Plan navtiming data release - https://phabricator.wikimedia.org/T214925 (10Gilles) 05Open→03Stalled p:05Normal→03Low [12:51:17] joal: howdy [12:52:33] (03CR) 10Milimetric: [V: 03+2 C: 03+2] "Sorry I missed this the first time! I did spot check manually but it's hard to find stuff like this." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507549 (owner: 10Joal) [12:53:22] I'm here to help with the history stuff, priority 1 for me [12:54:05] oh no! it's May Day, forgot, sorry, enjoy! [13:05:52] 10Analytics, 10Research, 10Article-Recommendation, 10Patch-For-Review: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (10bmansurov) Thanks, @Ottomata!
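For context on the pyarrow error above: in Spark 2.x, a pandas (vectorized) UDF is shipped to the executors and serialized with Arrow, so pyarrow has to be importable on every worker node, not just in the SWAP notebook where the driver runs. A minimal sketch of the failing pattern, with a made-up DataFrame and column names:

```python
# Minimal sketch, assuming Spark 2.3/2.4 and a SparkSession already wired
# to YARN. The grouped-map pandas UDF below forces Arrow serialization,
# so it raises "no module named 'pyarrow'" on executors whose nodes lack
# the package, even when the notebook itself can import it.
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 2.0), (1, 4.0), (2, 10.0)], ["group_id", "value"])

@pandas_udf("group_id long, value double", PandasUDFType.GROUPED_MAP)
def center(pdf):
    # Runs on the workers; this is where the import actually happens.
    return pdf.assign(value=pdf.value - pdf.value.mean())

df.groupby("group_id").apply(center).show()
```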
[13:28:10] joal: I can't tell if you're working or not, I was going to deploy https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/507549/ and restart sqooping [13:28:18] but don't want to conflict with you [13:28:24] Hi milimetric [13:28:40] please excuse me, I missed the previous ping :) [13:29:02] milimetric: I'm currently preparing the patch for mediawiki-history - Found the issues with reduced [13:29:29] If you don't mind, I'll finish this (I'll need some input from you), and we can review, merge and deploy after? [13:29:39] joal: I'm here in any way you need me [13:29:54] let me know if you need anything like food delivery :) [13:29:55] Thanks milimetric :) [13:30:32] milimetric: let's batcave for a minute so that I tell you my wonders - It'll give you some time to think about them before I need an answer :) [13:31:02] ottomata: Actually, could you manually kill the currently running sqoop job? [13:31:04] omw [13:31:24] eh? [13:31:43] sqoop-mediawiki-monthly-2019-04-frwiki.pagelinks ? [13:31:52] joal: ? [13:31:59] ottomata: the main python job on an-coord1001 [13:32:30] oook, just kill it? [13:32:38] yes please - it fails [13:34:06] joal the one that was writing to your user dir? [13:34:14] i hope so cause i killed it :) [13:34:36] ottomata: MEEEH? [13:38:41] ? [13:39:53] I was not supposed to write in my folder ( [13:40:19] ottomata: it was supposed to be the real prod one [13:40:39] https://github.com/wikimedia/puppet/blob/c6a9b608f33f77e38aaedee3a4c36989b7fdf78f/modules/profile/templates/analytics/refinery/job/refinery-sqoop-mediawiki-private.sh.erb [13:40:43] https://github.com/wikimedia/puppet/blob/c6a9b608f33f77e38aaedee3a4c36989b7fdf78f/modules/profile/templates/analytics/refinery/job/refinery-sqoop-mediawiki-production.sh.erb [13:40:48] https://github.com/wikimedia/puppet/blob/c6a9b608f33f77e38aaedee3a4c36989b7fdf78f/modules/profile/templates/analytics/refinery/job/refinery-sqoop-mediawiki.sh.erb [13:40:52] thanks milimetric [13:40:53] those three ottomata [13:41:43] hmm maybe i killed a few [13:41:52] but there were a bunch that were running since march 08? [13:42:02] e.g. [13:42:09] /usr/bin/python3 /home/joal/code/refinery/bin/sqoop-mediawiki-tables --job-name sqoop-mediawiki-monthly-2019-02 --labsdb --output-dir /user/joal/wmf/data/raw/mediawiki/tables --wiki-file /mnt/hdfs/wmf/refinery/current/static_data/mediawiki/grouped_wikis/labs_grouped_wikis.csv --tables archive,change_tag,change_tag_def,ipblocks,ipblocks_restrictions,logging,page,pagelinks,redirect,revision,user,user_groups --user [13:42:09] s53272 --password-file /user/hdfs/mysql-analytics-labsdb-client-pw.txt --from-timestamp 20010101000000 --to-timestamp 20190301000000 --partition-name snapshot --partition-value 2019-02 --mappers 64 --processors 16 --output-format parquet --log-file /var/log/refinery/sqoop-mediawiki.test-labsdb1012.20190308.log [13:42:37] there are no current sqoop jobs running [13:42:50] processes* [13:42:52] on an-coord1001 [13:45:12] ottomata: no sqoop maybe, but the python running the thing I think so [13:45:44] ? [13:46:41] ottomata: https://github.com/wikimedia/puppet/blob/c6a9b608f33f77e38aaedee3a4c36989b7fdf78f/modules/profile/templates/analytics/refinery/job/refinery-sqoop-mediawiki.sh.erb should be running [13:47:14] i killed it all [13:47:15] ? [13:47:22] there's no sqoop running on an-coord1001 [13:47:23] yes please [13:47:34] i just killed all the sqoop there I saw [13:47:40] including a bunch that were writing to your user dir [13:47:46] looked like very old processes?
[13:48:01] probably ottomata - seems weird those very old [14:13:23] chelsyx: Hi :) [14:13:53] joal: I can go to Scrum of Scrums if you like [14:14:21] milimetric: That would help a lot - I usually don't go because wednesday is kids' day [14:14:34] np, I'll go [14:40:32] (03PS1) 10Joal: Update mediawiki_history after quality phase 1 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507587 (https://phabricator.wikimedia.org/T221824) [14:41:46] milimetric: --^ please :) [14:41:56] * milimetric looks [14:44:29] (03CR) 10Joal: Fix mediawiki-history user event join (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/504834 (owner: 10Joal) [14:44:57] joal: what's the plan with the old table? rename it and point it to an archive of the old snapshots, start fresh with the new schema? [14:45:20] correct milimetric - Keep old, we'll drop later [14:45:34] milimetric: I can add the plan to the commit message if you want [14:45:37] but it won't be accessible with the new schema [14:45:52] correct milimetric - I'll move the tables [14:46:40] that's what I was thinking - move the old snapshots to mediawiki_archive or something like that, rename the table and point it to that archive folder [14:46:51] yessir [14:47:16] milimetric: actually might be easier to drop table, move files, recreate and repair table [14:47:19] but that's details [14:47:29] gotcha [14:51:12] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Ottomata) [14:52:10] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request - https://phabricator.wikimedia.org/T222268 (10Ottomata) [14:52:32] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis, and 3 others: Port usage of mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request - https://phabricator.wikimedia.org/T222268 (10Ottomata) [14:53:05] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Ottomata) [14:53:08] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis, and 3 others: Port usage of mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request - https://phabricator.wikimedia.org/T222268 (10Ottomata) [14:53:11] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Discovery, and 4 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Ottomata) [14:54:56] (03CR) 10Milimetric: Update mediawiki_history after quality phase 1 (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507587 (https://phabricator.wikimedia.org/T221824) (owner: 10Joal) [14:55:14] joal: only one actual question/comment ^ [14:55:45] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis, and 3 others: Port usage of mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request - https://phabricator.wikimedia.org/T222268 (10EBernhardson) The primary user of this data is the oozie job in `wikimedia/discovery/analytics` r... 
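A hedged sketch of the drop/move/recreate/repair sequence Joseph and Dan settle on above. The column list is abbreviated and illustrative (the real mediawiki_history schema is much wider), and the exact paths should be checked against the production layout:

```sql
-- 1. Drop the old external table definition; the files stay on HDFS.
DROP TABLE IF EXISTS wmf.mediawiki_history;

-- 2. Move the old snapshots aside (from a shell, not Hive):
--    hdfs dfs -mv /wmf/data/wmf/mediawiki/history \
--                 /wmf/data/wmf/mediawiki/archive/history

-- 3. Recreate the table with the new schema at the original location.
CREATE EXTERNAL TABLE wmf.mediawiki_history (
  wiki_db      STRING,
  event_entity STRING,
  event_type   STRING
  -- ... many more fields in the real schema
)
PARTITIONED BY (snapshot STRING)
STORED AS PARQUET
LOCATION '/wmf/data/wmf/mediawiki/history';

-- 4. Rediscover any snapshot partitions already present under LOCATION.
MSCK REPAIR TABLE wmf.mediawiki_history;
```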
[14:56:06] (03CR) 10Milimetric: [C: 03+2] Fix mediawiki-history user event join (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/504834 (owner: 10Joal) [14:56:34] (03CR) 10Milimetric: [C: 03+2] "(will merge for now but Joseph and Nuria may still want to sync up on this)" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/504834 (owner: 10Joal) [14:56:44] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Fix mediawiki-history user event join [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/504834 (owner: 10Joal) [14:57:45] (03CR) 10Milimetric: Update mediawiki_history after quality phase 1 (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507587 (https://phabricator.wikimedia.org/T221824) (owner: 10Joal) [15:06:55] (03Merged) 10jenkins-bot: Fix mediawiki_history_reduced checker [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507254 (https://phabricator.wikimedia.org/T222141) (owner: 10Joal) [15:07:18] (03PS2) 10Joal: Update mediawiki_history after quality phase 1 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507587 (https://phabricator.wikimedia.org/T221824) [15:18:17] (03Abandoned) 10Joal: Fix mediawiki_history eventUserIsAnonymous [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507260 (https://phabricator.wikimedia.org/T222141) (owner: 10Joal) [15:20:54] (03PS3) 10Joal: Fix mediawiki_page_history userId and anonymous [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507258 (https://phabricator.wikimedia.org/T222141) [15:22:27] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Ottomata) I just tried a version of one of @bd808's [[ https://github.com/bd808/action-api-analytics/blob/master/load-action_ua_hourly.sql | queries ]] to com... [15:22:44] (03CR) 10Joal: Fix mediawiki_page_history userId and anonymous (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507258 (https://phabricator.wikimedia.org/T222141) (owner: 10Joal) [15:22:56] milimetric: I hope we're good to go with that :) [15:26:48] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Ottomata) ` select count(*) FROM wmf_raw.ApiAction WHERE year = ${year} AND month = ${month} AND day = ${day} AND hour = ${hour} 9809748 select coun... [15:27:09] joal: event_user_text is null if the user is not anonymous? [15:27:15] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/507258/3/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/denormalized/MediawikiEvent.scala [15:28:44] ahah :) no milimetric - userTextHistorical is always defined, and we set current userText to userTextHistorical if user is anonymous (so that the IP is both in historical and current) [15:29:18] milimetric: https://phabricator.wikimedia.org/T206883 [15:29:23] right, I know that [15:29:41] but how does event_user_text get set if it's null here... oh! this is just the event [15:29:47] it gets set by the state? 
[15:29:58] right [15:30:53] (03CR) 10Milimetric: [C: 03+2] Fix mediawiki_page_history userId and anonymous [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507258 (https://phabricator.wikimedia.org/T222141) (owner: 10Joal) [15:32:11] joal: for this https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/507587/1..2/oozie/mediawiki/history/reduced/generate_mediawiki_history_reduced.hql we should talk when you're back [15:33:54] milimetric: here! [15:34:13] milimetric: cave? [15:34:38] joal: at sos, but basically shouldn't historical be the default and coalesce to current? because the event is considered as of that moment in history, no? [15:34:49] it wasn't like that before, so it would be yet another change, sorry to pile it on [15:34:56] and we can keep it like it is for now, change it later [15:35:27] milimetric: to me, this decision is also based on the wikistats-1 mimic [15:35:52] milimetric: we prefer to use use current value to provide results similar to WKS1 [15:36:08] hm, in this case that was just a limitation of the system, and this is new functionality (lookups in user_groups and such) [15:36:45] Yessir - IMO we should move toward using historical rather than current, but this would be a big change [15:37:15] also milimetric, can't be done before page refactor (too many page unlinks as of now I think) [15:38:11] ok joal, will merge as is, and we can revisit later, this one's definitely big enough for a task [15:38:20] \o/ !!! [15:38:56] (03CR) 10Milimetric: [V: 03+2 C: 03+2] "definitely need to update reduced to consider historical values as primary, but we can isolate that change in a later round and consider i" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507587 (https://phabricator.wikimedia.org/T221824) (owner: 10Joal) [15:40:54] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Mediawiki-History fixes before deploy - https://phabricator.wikimedia.org/T222141 (10JAllemandou) [15:41:13] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Mediawiki-History fixes before deploy - https://phabricator.wikimedia.org/T222141 (10JAllemandou) a:03JAllemandou [15:41:42] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: Mediawiki History Release - 2019-04 snapshot - https://phabricator.wikimedia.org/T221824 (10JAllemandou) [15:42:59] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: Mediawiki History Release - 2019-04 snapshot - https://phabricator.wikimedia.org/T221824 (10JAllemandou) [15:47:04] (03PS1) 10Joal: Update changelog.md to version 0.0.88 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507603 [15:48:11] joal: I'm gonna jump in the cave early if you need any help with release or anything before standup [15:48:24] milimetric: OMW :) [15:48:48] milimetric: if you can double check the changelog patch above--^ [15:48:48] :) [15:49:30] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Update changelog.md to version 0.0.88 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507603 (owner: 10Joal) [15:50:19] ottomata: there are 2 tasks in ready-to-deploy - is there anything for me to do with them? [15:54:30] 10Analytics: Use historical fields in history reduced dataset - https://phabricator.wikimedia.org/T222278 (10Milimetric) [15:55:26] 10Analytics, 10Analytics-Cluster: Pyspark on SWAP: Py4JJavaError: Import Error: no module named pyarrow - https://phabricator.wikimedia.org/T222254 (10Groceryheist) @elukey, thanks. It seems like I'm experiencing a regression then. 
I can work around it for now. See you tomorrow! [15:55:58] joal nope [15:56:06] k - thanks ottomata :) [16:03:31] nuria: you around today? [16:08:05] !log refinery-source v0.0.88 released on archiva [16:08:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:35:55] bd808: do the api logs also go to udp2log? [16:37:16] answer: yes. [16:37:19] found it. [16:43:19] Hi joal :) [16:43:46] chelsyx: I have questions about your job mimicking the uniques-for-mobiles [16:44:27] joal: yes [16:45:10] chelsyx: two things - I wonder about its usage (by-country job should make the job?), and I'd also ask for a change in date - 1st of month is a very busy day for the cluster, moving the job to the 2nd or 3rd would help a lot :) [16:46:58] joal: I think you're talking about 'mobile_apps-uniques-by_country-monthly' right? [16:47:14] chelsyx: yes [16:48:43] !log Kill oozie mediawiki-history-druid-coord for true (replaced by edit_hourly job) [16:48:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:49:00] joal: we don't use it very frequently now -- we're working on switching to another method similar to the 'last-access' cookie, but using eventlogging, for more accurate unique device counts for the apps. After that's done, I will stop this job [16:49:22] joal: And of course, I can move it to later days in the month :) [16:49:39] Thanks for that chelsyx :) [16:50:01] joal: As we're speaking, can I ask you a question? [16:50:07] chelsyx: I'd be interested if there is a measured diff between the per-country and the global result [16:50:43] chelsyx: For uniques using last-access, we assumed global = Sum(per-country) - This is why I ask :) [16:50:47] chelsyx: please ask :) [16:51:02] joal: I haven [16:51:27] joal: that I haven't checked whether global = Sum(per-country) [16:51:40] joal: that's a good question :) [16:52:39] joal: I'm running a query to get pageviews by page_id on enwiki in March, and I also want to get the latest rev_id for each page. But this query seems to take forever to run: [16:52:53] https://www.irccloud.com/pastebin/EXANzA7L/ [16:53:17] do you have any suggestion about how to optimize it? [16:55:16] chelsyx: looking at all pageviews is huge - putting a threshold and cutting the longtail would help a lot [16:56:11] joal: I see, I'm thinking about that too [16:56:39] the other thing I'm not sure of is how the system (hive or spark?) deals with the mediawiki_page partition filter in the join query [16:57:11] wmf_raw.mediawiki_page is partitioned by wiki_db, so it's important to take advantage of that [16:57:54] joal: ah, got it. I missed that [17:00:30] joal: Thanks! [17:00:37] np chelsyx :) [17:00:54] Thank you for moving the big job a bit later :) [17:02:40] !log Deploying refinery using scap [17:02:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:03:56] joal: No problem!
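Chelsyx's pastebin isn't preserved here, but a hedged sketch of what Joseph is suggesting (column names, the snapshot, and the threshold are assumptions, not her actual query): filter the wiki_db partition of wmf_raw.mediawiki_page in its own subquery so Hive can prune the table before the join, and put a floor on view counts to drop the long tail. Dan posts his own restructured version a bit further down.

```sql
WITH pages AS (
  -- Partition predicates here let Hive prune mediawiki_page pre-join.
  SELECT page_id, page_latest  -- page_latest is the latest rev_id
  FROM wmf_raw.mediawiki_page
  WHERE wiki_db = 'enwiki'
    AND snapshot = '2019-03'
),
views AS (
  SELECT page_id, SUM(view_count) AS march_views
  FROM wmf.pageview_hourly
  WHERE project = 'en.wikipedia'
    AND year = 2019 AND month = 3
  GROUP BY page_id
  -- Cut the long tail, per the advice above (threshold is illustrative).
  HAVING SUM(view_count) >= 100
)
SELECT v.page_id, p.page_latest, v.march_views
FROM views v
JOIN pages p ON v.page_id = p.page_id;
```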
[17:08:20] !log Killing oozie jobs for new deploy: mediawiki-history-denormalize-coord, mediawiki-history-check_denormalize-coord, mediawiki-history-reduced-coord [17:08:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:12:10] !log Moving existing mediawiki-history to /wmf/data/wmf/mediawiki/archive folder [17:12:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:18:11] ottomata: Hellooooo :( [17:18:19] hiyyaaa [17:18:23] ottomata: I'm having an issue with scap deploy :( [17:18:31] ottomata: 2 hosts failed [17:19:05] chelsyx: I'm curious about your query for my own reasons, I'm playing with it to see if I can make it faster, will post my attempt in a second [17:19:41] ottomata: https://gist.github.com/jobar/47edfadd421beac4c53ed38c0064efb2 [17:20:05] ottomata: looks like some git-fat paths have not been changed to analytics_deploy user I guess :( [17:20:17] hmmm ohhh [17:20:33] yeah, luca just did the analytics_deploy stuff this morning [17:20:42] before he realized he was on vacation :) [17:21:27] ottomata: I'm assuming I should rollback - correct? [17:22:31] hang on [17:22:35] did it succeed elsewhere? [17:23:21] ottomata: seems to have succeeded in db1107, an-coord1001, stat1004 and stat1006 [17:23:37] hm [17:23:51] joal: don't rollback then. [17:24:08] i'd say continue and then try to do it again. i was just able to run the git fat pull command as analytics-deploy on notebook1003. [17:24:16] so, it might be temporary [17:24:19] hm [17:24:22] or you could rollback and try again [17:24:32] if you rollback, I can try if you like [17:24:37] ottomata: Which do you prefer? [17:24:52] chelsyx: ok, it seems a little bit faster, but no big gain, maybe it's easier to write separated out like this? [17:24:55] https://www.irccloud.com/pastebin/TETQNogW/ [17:25:50] milimetric: a lot more readable, and enforces partition usage on mediawiki_page --^ [17:25:53] Nice [17:26:10] milimetric: Thanks for the suggestion! I will try [17:26:14] her original was using the partition in the where clause, that's what I was curious about, if it makes a difference doing it directly [17:26:16] doesn't seem to [17:26:37] not sure :S [17:26:44] Hive seems to do the same amount of work, yeah, I wasn't sure either [17:26:47] ottomata: ping ? [17:27:08] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Ottomata) Hm ok, I don't fully know what is going on here, but I found some api requests that are missing from ApiAction. I focused on one specifically, and... [17:27:16] sorry was finishing writing ^ [17:27:25] joal go ahead and rollback, i'll tyr [17:27:27] try [17:27:28] sorry ottomata :) Was wondering :) [17:27:35] ok, rollbacking [17:28:02] ottomata: No space left on device [17:28:16] notebook1003 and 4 [17:28:22] oh ho ho [17:28:25] that is probably the problem then [17:28:27] grrr [17:28:30] :S [17:28:45] I thought using a small number of cached revs should have solved it [17:28:52] 31G ./dsaez [17:29:19] 9.2G ./neilpquinn-wmf [17:29:23] 16G ./piccardi [17:29:43] ottomata: same partition? [17:29:50] ya [17:29:58] mwarf [17:30:04] notebook1004 [17:30:05] 11G ./ebernhardson [17:30:15] 41G ./nathante [17:32:13] :( [17:32:17] lemme see what i'm doing there... [17:32:42] virtualenv's :P [17:33:03] Wow - 11G of virtualenvs !!
/me has a love-hate relation with a python [17:33:31] * joal has a hate-hate relationship with maven though [17:33:45] spacy is 376MB, tensorflow is 371MB, scipy is 126MB, numpy is 71MB. I have these installed in multiple venv's [17:34:18] i can probably delete a few though, some are simply backups before installing other stuff in case it all breaks [17:35:13] my local maven repo is only 3.6G :) [17:35:33] funny thing that probably is all java bytecode, whereas i'm sure there are datasets for nlp and whatnot in my python virtualenv's [17:36:13] agreed ebernhardson [17:37:50] !log Recreate wmf.mediawiki_history (+page and user) and wmf.mediawiki_history_archive (with old data) [17:37:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:39:29] ok joal i freed some space [17:39:34] i'm assuming that was your problem [17:39:36] so try the deploy again [17:39:40] Many thanks ottomata - Trying again [17:40:23] !log deploy refinery using scap after failed attempt [17:40:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:41:04] actually worried about notebook1003, there is only 11G there atm [17:41:08] free [17:42:03] joal do you have anything there you can free up [17:42:06] you are using 2.9G [17:42:12] Checking ottomata [17:45:38] ottomata: deploy successful - Plenty thanks! [17:48:29] k great [17:48:34] ottomata: almost everything in my home comes from the venv folder [17:48:40] k [17:48:47] ottomata: is it the one used by jupyter? [17:48:51] yes [17:48:51] i shed mine from 11G to 2.5G, hopefully ok? [17:48:55] thanks ebernhardson ! [17:48:59] much appreciated [17:49:02] np [17:49:02] thanks ebernhardson :) [17:49:14] ottomata: ok - I guess I can't do a lot :S [17:49:35] ya no worries [17:49:47] yours is fine, was just crossing fingers for deploy [17:50:04] if we had modern shared versions of spacy, tensorflow, scipy and numpy maybe each user wouldn't need to install them :) unfortunately the deb packages are ancient [17:50:45] ottomata: quick question - do we on purpose not deploy jars on notebook1003? [17:50:53] no? [17:51:14] just checked out of curiosity, and the size of refinery-job-0.0.88.jar is 74 bytes [17:51:25] then git fat pull failed [17:51:31] right [17:51:37] but scap didn't [17:51:46] maybe because of re-deploy, but seems weird [17:52:07] will double check on an-coord1001 (refine), and on stat1004 before deploying to hdfs [17:52:11] ottomata: --^ [17:52:13] k [17:52:28] hm, running git fat pull manually on notebook1003 is pulling them down... [17:52:40] joal we should probably do a jar version prune again soon [17:52:52] yessir [17:52:56] very much agreed [17:53:00] ok on an-coord1001 [17:53:08] k [17:53:14] yeah maybe it got stuck because of deploy fail [17:53:55] I assume so, not the first time it happens - when deploy fails because of git-fat (space), redeploy succeeds immediately since code is already there [17:54:00] But no git-fat [17:55:05] !log deploying refinery onto HDFS [17:55:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:57:10] ottomata: asking permission to manually start sqoop [17:57:17] sure!
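A quick way to spot the stub symptom Joseph found above (the deploy path is an assumption about where scap puts refinery on these hosts): a jar that git-fat has not materialized is a tiny placeholder instead of a multi-megabyte artifact.

```bash
# On the affected host; path is illustrative.
ls -l /srv/deployment/analytics/refinery/artifacts/refinery-job-0.0.88.jar
# ~74 bytes  => still a git-fat stub
# tens of MB => real artifact

# Materialize stubs from the fat store:
cd /srv/deployment/analytics/refinery && git fat pull
```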
[17:57:22] thanks :) [17:59:02] !log Starting a manual run of sqoop [17:59:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:01:51] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Nuria) >I think mediawiki_api_request is correct. Is it possible that LOTS of data is somehow missing from ApiAction, and has been? We can verify this by coun... [18:03:15] ottomata: re: api requests [18:03:48] nuria: good idea w webrequest [18:03:50] will look [18:08:13] !log Manually killed sqoop-production (comment and actor) to have it done after the current manual labs run [18:08:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:21:27] ottomata: have you silenced a systemd timer alarm? [18:21:49] 10Analytics-Cluster, 10Analytics-Kanban: refinery-sqoop-mediawiki-production on an-coord1001 needed restart - https://phabricator.wikimedia.org/T222294 (10crusnov) [18:22:20] !log Confirming that sqoop-private (cu_changes) will run automatically tonight (2nd of the month at 00:00) - nothing needed [18:22:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:23:51] ottomata: ok, i am going to try the backfilling in a bit now that i have a couple hours free, let me know if you want help looking ta apiaction [18:23:55] *at apiaction [18:25:39] !log restarted oozie jobs mediawiki-history-denormalize-coord, mediawiki-history-check_denormalize-coord and mediawiki-history-reduced-coord [18:25:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:32:50] Ok team, all in place so far - sqoop-labsdb running, will launch manual run of sqoop-prod (actor/comment) tomorrow morning after having double checked labsdb finished ok [18:34:04] Oozie jobs restarted, old data moved to _archive folders/tables, comparison snapshot for checker currently being computed (I'll put it in production folder once done and vetted manually) [18:34:21] :) [18:34:23] All of that is noted here: https://etherpad.wikimedia.org/p/analytics-weekly-train [18:35:12] Gone for dinner, will be back checking jobs after [18:41:18] nuria: thanks, am poking... [18:41:35] the request i am investigating with is in webrequest [18:41:50] would be nice to know if it was in Kafka, but reading avro out of kafka is not fun [18:47:22] ottomata: mmm, does camus import this data at all? [18:47:50] yes [18:48:04] the data that camus imports is the ApiAction table [18:48:15] it'd be really nice to have a reliable x-request-id!!! [18:48:23] ottomata: I see, kafka -> api action table directly [18:48:30] ottomata: a session? [18:48:32] via camus yes, since it's avro [18:48:34] ottomata: jajajaja [18:48:40] i'm looking for a good join field [18:49:19] ottomata: md5(ip+raw ua) i think would be your best bet [18:49:26] hmmm [18:49:34] not a bad idea [18:49:34] ok [18:49:50] i'll just join on both of those [18:50:11] ottomata: and if does not work add date up to the hour [18:55:31] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Tgr) AIUI the webrequest table only contains GET requests but includes requests that Varnish serves from cache (Varnish caching is not typical for the Action... [19:13:24] ottomata: i am getting pertinent files for backfilling from hadoop, but after there is no way to process them other than in the EL box right?
*the eventlogging box [19:14:31] you can probably do it wherever, as long as eventlogging can run. [19:15:01] ottomata: but that is the thing, it can only run on EL machine cause that is where its deps are installed right? [19:15:11] hmmmm [19:15:14] yes. [19:16:03] ottomata: o/ - whenever you have time, I think I found a way to fix the log4j logging https://gerrit.wikimedia.org/r/#/c/operations/puppet/cdh/+/507630/ [19:16:14] let me know if it makes sense [19:16:35] +1! [19:16:40] nuria yeah hm. [19:16:55] we could probably include eventlogging class on stat1007 or something [19:17:10] going to roll it out now so we stop getting broken outputs [19:18:26] cool elukey thanks! [19:18:36] nuria: there is an eventlogging::dependencies class. [19:18:39] sorry for the spam in the commands :( [19:18:52] elukey: yay [19:19:04] elukey: kind of late eh? this CAN WAIT [19:19:49] nuria: I knowww :P [19:20:29] ottomata: installing a running version of EL in the stats machines to backfill seems kind of involved....maybe i am chicken [19:21:00] i think it's pretty easy actually! i'm not going to include eventlogging codebase itself [19:21:07] you can clone that in your home dir [19:21:08] but [19:21:21] the deps are easy, since they are all debs [19:21:23] and in their own class [19:22:19] fixed! just tested on stat1007.. puppet is rolling it out everywhere [19:22:28] ok now I feel better :D [19:22:34] great! [19:22:36] o/ [19:22:46] (going afk otherwise Marika will probably kill me :P) [19:31:17] nuria: if you like [19:31:20] on stat1004 [19:31:26] export PYTHONPATH=/home/otto/eventlogging [19:31:37] export PATH="$PATH:/home/otto/eventlogging/bin" [19:31:54] then [19:31:55] eventlogging-processor --help [19:31:56] :) [19:35:39] ottomata: WOW [19:35:56] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Ottomata) Haven't yet counted totals (a bit hard and expensive to join properly here), but as expected the request I'm looking at is also in webrequest: `lan... [19:36:40] ottomata: same for 1007 right? (minus your checkout) [19:37:40] you know it didn't install them on stat1007... [19:38:08] i think maybe they were already there [19:39:23] grr stat1005 is buster and is busting this up [19:39:31] since deps aren't available for buster. [19:40:23] ottomata: we can fake upload an empty deb, i know, you love this idea [19:40:35] na we don't need on stat1005 [19:40:37] milimetric: I'm about to leave for tonight - Does the state of affair look ok for you? [19:40:40] we've already got a conditional for buster [19:42:44] joal, all jobs make sense, if somehow sqoop finishes early before I go to sleep I'll launch comment/actor [19:42:59] milimetric: ack [19:43:22] milimetric: it is supposed to finish in ~10 hours - possibly too late for you, but just in time for my morning ;) [19:43:33] Ok gone for tonight team - see you tomorrow [19:46:20] :) nite [19:47:36] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Nuria) I think there is no need to join, plain counts in webrequest and apiAction (old table) should match closely for api requests with MISS status in webreq... [19:48:14] ottomata: so, to clarify, we DO have these deps on stat1007, correct?
[19:49:06] 10Analytics: Upgrade pandas in spark SWAP notebooks - https://phabricator.wikimedia.org/T222301 (10Groceryheist) [19:49:23] i think so yes nuria [19:50:08] nuria I have eventlogging cloned there in same place [19:50:14] same exports should work there too [19:50:36] ottomata: k, just clone it but will try, let me clean up files a bit cause they have a bunch of whitelines [19:52:28] ottomata: sqoop-production got restarted after I killed it manually [19:52:32] ottomata: any idea? [19:52:41] hm, no haven't touched it [19:52:46] :( [19:52:48] ok [19:53:12] ottomata: I'm assuming it'll raise an alarm in ops chan - is there a way for us to silence it? [19:53:13] joal where are you looking? [19:53:16] joal: the timer would have restarted it if you did not disable it right? [19:53:34] ottomata: I noticed jobs on the cluster, then checked on an-coord1001 [19:53:40] I just manually killed it again [19:53:41] ottomata: does this look good? [19:53:45] https://www.irccloud.com/pastebin/yF9aO4Ju/ [19:53:56] joal: if it is a systemd timer i think it will start again [19:54:05] nuria: really ? [19:54:14] Arf - not good [19:54:26] joal: ya, you need to disable the timer no? [19:54:55] Yeah you're right [19:54:57] trying so [19:55:31] joal: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers#Disable_a_Systemd_timer [19:55:39] nuria: I was reading that [19:56:49] Looks like I can't - need root powa [19:56:54] ottomata: can you help please? [19:57:11] ottomata: from an-coord1001: sudo systemctl disable refinery-sqoop-mediawiki-production [19:58:15] PROBLEM - Check the last execution of refinery-sqoop-mediawiki-production on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-sqoop-mediawiki-production [19:58:40] !log sudo systemctl disable refinery-sqoop-mediawiki-production [19:58:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:58:42] joal: done [19:58:49] Thanks ottomata [19:59:17] joal: i think we should all have permits to disable/enable these, let's talk to luca tomorrow [19:59:23] yes [19:59:59] nuria: i don't think you want to quote all of the args in one [20:00:19] e.g. stdin:// should be separate from the kafka ones [20:00:37] ottomata: let me look in el box [20:00:40] also [20:00:44] ottomata: so i understand what you mena [20:00:46] the el one uses just kafka:// [20:00:49] *mean [20:00:53] not kafka-confluent:// [20:01:02] nuria i mean [20:01:09] you have single quotes starting at 'stdin:// [20:01:18] and ending after the kafka output uri [20:01:41] that will feed it to python as a single arg [20:01:46] since it uses e.g. & chars [20:02:28] I'm very sorry ottomata, reading the doc again it looks like I forgot a step - sudo systemctl stop refinery-sqoop-mediawiki-production [20:02:34] Then I promise I stop bothering [20:02:57] !log sudo systemctl stop refinery-sqoop-mediawiki-production [20:02:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [20:03:15] ACKNOWLEDGEMENT - Check the last execution of refinery-sqoop-mediawiki-production on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-sqoop-mediawiki-production ottomata systemd timer stopped manually. [20:04:08] ottomata: [20:04:12] https://www.irccloud.com/pastebin/UxQlZe20/ [20:04:54] should be good I think nuria [20:05:04] nuria btw, you won't be backfilling into eventlogging-valid-mixed [20:05:05] is that ok? [20:05:13] ottomata: ya [20:05:18] k [20:05:20] ottomata: i thought about that [20:05:40] ottomata: but as i was saying i think we should start thinking about it being kaput [20:06:05] ottomata: none of the salt+sanitization work marcel did is applying there for example [20:06:17] ottomata: i mean some of it is but mostly not [20:07:07] :) [20:10:18] hm nuria i must be selecting from webrequest wrong [20:10:32] maybe it's this miss. [20:10:32] hm [20:11:04] ottomata: batcave? [20:11:21] naw hang on [20:14:12] nuria: i selected some distinct cache_statuses for an hour of %api.php% [20:14:15] hit-remote [20:14:15] hit-local [20:14:15] miss [20:14:15] int-local [20:14:15] - [20:14:15] pass [20:14:15] hit-front [20:14:16] int-front [20:14:24] so maybe counting where NOT LIKE 'hit%' ? [20:16:10] ottomata: right I forgot about this, what is "int-local?" [20:16:19] ottomata: and "int-front?" [20:16:19] dunno! [20:16:26] ottomata: one sec [20:19:37] ottomata: ok, read all docs but this int-front does not appear [20:20:03] ottomata: kind of seems like internal-front [20:20:34] but still missing some [20:20:46] ottomata: so i would go for pass and expect those match api action (old) [20:20:46] and with no cache_status filter [20:20:48] too many [20:20:55] hmm [20:20:57] checking that [20:25:33] nuria: with just pass [20:25:36] still more than ApiAction [20:25:54] ottomata: super interesting [20:26:45] ottomata: now i would just group by method or similar and see (for an hour only using pass data) if a pattern emerges [20:26:49] ottomata: also, the reverse [20:26:58] ottomata: do all apiaction have pass? [20:27:18] i'm grouping by cache_status now [20:27:22] will do method next [20:39:27] ottomata: k let me know [20:39:48] was querying a day, should have done an hour [20:40:16] ottomata: ya, couple hours for two different days are enough [21:02:37] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Ottomata) Gathered some counts: ` set hivevar:year=2019; set hivevar:month=4; set hivevar:day=30; set hivevar:hour=0; ` **wmf_raw.ApiAction** `9809748` **e...
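A hedged reconstruction of the per-cache_status count feeding the comparison above, using the same hour as the Phabricator comment (the webrequest_source value and the uri_path filter are assumptions about how the "api.php" traffic was selected, not the exact query that was run):

```sql
SELECT cache_status, COUNT(*) AS requests
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2019 AND month = 4 AND day = 30 AND hour = 0
  AND uri_path LIKE '%api.php%'
GROUP BY cache_status
ORDER BY requests DESC;
```

Comparing the non-hit buckets against the wmf_raw.ApiAction total for the same hour is the plain-counts check Nuria proposes.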
[20:05:13] ottomata: ya [20:05:18] k [20:05:20] ottomata: i thought about that [20:05:40] ottomata: but as i was saying i think we should start thinking about it being kaput [20:06:05] ottomata: none of the salt+sanitization work marcel did is applying there for example [20:06:17] ottomata: i mean some of it is but mostly not [20:07:07] :) [20:10:18] hm nuria i must be selecting from webrequest wrong [20:10:32] maybe its this miss. [20:10:32] hm [20:11:04] ottomata: batcave? [20:11:21] naw hang on [20:14:12] nuria: i selected some distinct cache_statuses for an hour of %api.php% [20:14:15] hit-remote [20:14:15] hit-local [20:14:15] miss [20:14:15] int-local [20:14:15] - [20:14:15] pass [20:14:15] hit-front [20:14:16] int-front [20:14:24] so maybe coungint where NOT LIKE 'hit%' ? [20:16:10] ottomata: right I forgot about this , what is "int-local?" [20:16:19] ottomata: and "int-front?" [20:16:19] dunno! [20:16:26] ottomata: one sec [20:19:37] ottomata: ok, read all docs but this int-front does not appear [20:20:03] ottomata: kind of seems like internal-front [20:20:34] but still missing some [20:20:46] ottomata: so i would go for pass and expect those match api action (old) [20:20:46] and with no cache_status filter [20:20:48] too many [20:20:55] hmm [20:20:57] checking that [20:25:33] nuria: with just pass [20:25:36] still more than ApiAction [20:25:54] ottomata: super interesting interesting [20:26:45] ottomata: now i would just group by method or similar and see (for an hour only using pass data) if apttern emerges [20:26:49] ottomata: also, the reverse [20:26:58] ottomata: do all apiaction have pass? [20:27:18] i'm grouping by cache_status now [20:27:22] will do method next [20:39:27] ottomata: k let me know [20:39:48] was querying a daya, should have done an hour [20:40:16] ottomata: ya, couple hours for two different days are enough [21:02:37] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Ottomata) Gathered some counts: ` set hivevar:year=2019; set hivevar:month=4; set hivevar:day=30; set hivevar:hour=0; ` **wmf_raw.ApiAction** `9809748` **e... [22:46:55] 10Analytics, 10Analytics-Kanban, 10EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (10Tgr) Not counting all the hit-* requests is expected since those never reach the appservers and ApiAction is logged by the appservers. (Not sure what int-* is...