[01:18:16] 10Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014 (10Graham87) >>! In T155014#5371668, @John_M_Wolfson wrote: > I just created a Wikipedia page on this topic (https://en.wikipedia.org/wiki/Wikipedia:Starling_archive_imports) and was just about to bring it somewhere when I stumbl...
[06:31:08] !log restart the hadoop workers' jvms via spicerack cookbook
[06:31:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:19:25] 2019-07-29 07:18:28,294 [INFO log.py:155 in log_task_end] END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
[07:19:29] all done!
[07:25:42] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Roll restart all openjdk-8 jvms in Analytics - https://phabricator.wikimedia.org/T229003 (10elukey)
[07:26:23] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Roll restart all openjdk-8 jvms in Analytics - https://phabricator.wikimedia.org/T229003 (10elukey) All 54 Hadoop worker nodes roll-restarted with the sre.hadoop.roll-restart-workers.py cookbook!
[08:52:20] 10Analytics, 10EventBus, 10serviceops, 10Patch-For-Review: Allow eventgate-analytics service to reach schema.svc.{eqiad,codfw}.wmnet:8190 - https://phabricator.wikimedia.org/T229051 (10fsero) 05Open→03Resolved merged and applied
[08:52:26] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Legacy (Watching / External), 10Services (watching): Modern Event Platform: Stream Intake Service - https://phabricator.wikimedia.org/T201068 (10fsero)
[08:56:09] !log roll restart druid jvms on druid100[1-3] via spicerack cookbook
[08:56:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:36:02] * elukey lunch!
[13:01:36] !log roll restart druid jvms on druid100[4-6] via spicerack cookbook
[13:01:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:35:10] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Roll restart all openjdk-8 jvms in Analytics - https://phabricator.wikimedia.org/T229003 (10elukey)
[13:47:58] 10Analytics, 10EventBus, 10serviceops: Allow eventgate-analytics service to reach schema.svc.{eqiad,codfw}.wmnet:8190 - https://phabricator.wikimedia.org/T229051 (10Ottomata) Thank you!
[13:51:41] ottomata: o/
[13:51:47] o/
[13:51:51] all hadoop workers and both druid clusters restarted with spicerack!
[13:52:00] 10Analytics: Map doesn't redraw when returning from table view - https://phabricator.wikimedia.org/T226514 (10fdans) I don't know why I put this in Wikistats Production but it most def should be in beta. Pretty urgent bug.
[13:52:54] wowwww
[13:52:59] 99cents wow!
[13:56:34] ahahahah
[13:56:54] I am working on a kafka one
[13:57:13] https://gerrit.wikimedia.org/r/#/c/operations/cookbooks/+/525804/
[13:57:22] that is basically what you and I discussed a lot of times
[13:57:36] one broker restarted at a time, wait for 10 mins, another one, etc..
[13:57:39] very simple
[13:58:17] Riccardo told me that there are ways to pull data from prometheus to check metrics too
[13:58:29] but for the first version I wanted something simple
[13:58:35] let me know your thoughts :)
[14:00:17] looking
[14:05:04] elukey, sorry, got confused with my kitchen clock! meeting?
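(Editor's note: a rough sketch of the restart loop elukey describes above — one broker at a time, then a fixed cool-down. This is not the actual spicerack cookbook under review in gerrit change 525804; the host list and the restart command are placeholders.)

```python
import subprocess
import time

BROKERS = ["kafka-jumbo1001.eqiad.wmnet", "kafka-jumbo1002.eqiad.wmnet"]  # placeholder host list
COOLDOWN = 10 * 60  # seconds between brokers, per "wait for 10 mins" above


def restart_broker(host):
    """Stand-in for the cookbook's remote-execution helper."""
    subprocess.run(["ssh", host, "sudo", "systemctl", "restart", "kafka"], check=True)


for host in BROKERS:
    restart_broker(host)
    time.sleep(COOLDOWN)  # let replicas catch back up before touching the next broker
```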
[14:57:30] 10Analytics: DCausse: can't log into Superset - https://phabricator.wikimedia.org/T229233 (10dcausse)
[14:57:44] 10Analytics: superset login failure: AttributeError: 'bool' object has no attribute 'login_count' - https://phabricator.wikimedia.org/T229234 (10EBernhardson)
[14:58:34] 10Analytics: superset login failure: AttributeError: 'bool' object has no attribute 'login_count' - https://phabricator.wikimedia.org/T229234 (10EBernhardson)
[14:58:36] 10Analytics: DCausse: can't log into Superset - https://phabricator.wikimedia.org/T229233 (10EBernhardson)
[15:19:14] (03PS1) 10Fdans: Restore agent type in page views metric [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/526166 (https://phabricator.wikimedia.org/T228937)
[15:20:54] 10Analytics, 10Analytics-Kanban: Map doesn't redraw when returning from table view - https://phabricator.wikimedia.org/T226514 (10Milimetric) a:03Milimetric
[15:23:35] 10Analytics: DCausse: can't log into Superset - https://phabricator.wikimedia.org/T229233 (10fdans) a:03Nuria
[15:24:09] (03CR) 10Milimetric: [C: 03+2] Restore agent type in page views metric [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/526166 (https://phabricator.wikimedia.org/T228937) (owner: 10Fdans)
[15:24:46] 10Analytics, 10Analytics-Kanban: Add agent type split to wikistats pageviews - https://phabricator.wikimedia.org/T228937 (10Milimetric)
[15:28:48] 10Analytics, 10Operations, 10SRE-Access-Requests: Access to HUE for Mayakpwiki - https://phabricator.wikimedia.org/T229143 (10fdans) @Mayakp.wiki just a comment on hue. It might not be the best tool for querying the data lake. We (as in the analytics team) prefer using either hive/beeline directly or jupyter...
[15:29:17] 10Analytics, 10Operations, 10SRE-Access-Requests: Access to HUE for Mayakpwiki - https://phabricator.wikimedia.org/T229143 (10fdans) p:05Triage→03High
[15:29:43] 10Analytics, 10Analytics-Wikistats: Wikistats Beta Localization on the UI - https://phabricator.wikimedia.org/T186122 (10Milimetric)
[15:29:46] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats Beta - https://phabricator.wikimedia.org/T186120 (10Milimetric)
[15:32:31] mforns: how does e.g. self.get_tables = MagicMock(return_value=tables)
[15:32:33] just work?
[15:32:43] Somehow it replaces any method called get_tables?
[15:32:47] I don't see you using FakeHive anywhere
[15:32:57] OH
[15:32:59] you are.
[15:33:00] but
[15:33:01] ottomata, it replaces get_tables in self
[15:33:05] only when testing directly
[15:33:17] but
[15:33:37] how does your e.g. run_main stuff work?
[15:33:41] won't that call out to the Hive class?
[15:33:55] lookin
[15:34:00] 10Analytics, 10Product-Analytics, 10Readers-Web-Backlog: Reading_depth remove eventlogging instrumentation? - https://phabricator.wikimedia.org/T229042 (10fdans) @Groceryheist storage is not the problem for us, but the amount of events that are coming through. This affects performance for the rest of EL user...
[15:34:56] !log roll restart kafka brokers on Jumbo with spicerack
[15:34:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:36:20] 10Analytics, 10Product-Analytics, 10Readers-Web-Backlog: Reading_depth remove eventlogging instrumentation? - https://phabricator.wikimedia.org/T229042 (10Ottomata) Or, it can be migrated to EventGate (sometime in Q2 or Q3) and we don't have to worry about it affecting other EL stuff. :)
[15:38:24] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic, 10Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (10colewhite) 05Open→03Resolved
[15:45:44] ottomata, sorry we're discussing mediacounts, you're right, the tests that use run_main *would* use the real Hive, but actually the tests are carefully passing parameters to prevent Hive from being called, either because they check thrown exceptions, or because None's are passed to avoid deletion.
[15:46:49] ottomata, the other tests that use the MagicMock, they don't call run_main, they call the method directly and pass the FakeHive, or FakePartition, etc.
[15:47:41] ok so MagicMock isn't that magic... it's just a helper that creates methods for ya without declaring them yourself
[15:47:59] i can work with that though, use MagicMock to create a FakeSwift etc.
[15:48:01] thanks
[15:50:33] 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10CPT Initiatives (Modern Event Platform (TEC2)), and 4 others: Modern Event Platform: Stream Intake Service: Migrate eventlogging-service-eventbus events to eventgate-main - https://phabricator.wikimedia.org/T211248 (10Pchelolo)
[15:51:01] ottomata, yes it's not magic :C
[15:51:50] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic, 10Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (10elukey) @colewhite awesome work, thanks! Should I start working on creating a new varnishkafka dashboard...
[15:52:21] ottomata, but it makes it easier to specify the return values and do the assertions, etc.
[15:52:39] aye
[16:30:57] ottomata: I stopped the roll restart of kafka jumbo as a precautionary measure after 4 brokers
[16:31:08] https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&from=now-1h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-kafka_cluster=jumbo-eqiad&var-cluster=kafka_jumbo&var-kafka_broker=All&var-disk_device=All
[16:31:13] see kafka-jumbo1005?
[16:31:33] the traffic recovered only when I issued a manual kafka preferred-replica-election
[16:32:13] even if in theory the other metrics looked good
[16:33:18] IIRC a while ago we came up with an explanation about ending up in this situation
[16:33:24] because it already happened
[16:35:00] it was something related to the auto-rebalancing of partition leaders and some setting that we have for kafka
[16:35:07] I'll dig into the config after the meeting
[16:36:16] hmm
[16:36:20] don't remember
[16:36:34] elukey: you could sneak in a preferred-replica-election in there
[16:36:37] it doesn't hurt to run it
[16:36:46] yep yep definitely
[16:37:03] reboot; wait for kafka... some minutes? 15? election; proceed
[16:37:04] but I'd say that we could stick in, for the moment, a specific ask for confirmation before proceeding with a restart
[16:37:14] it would be good to check the under-replicated partitions, that would help
[16:37:22] the script could prompt you with it to proceed
[16:37:30] yep but in this case I don't see them changing after my command
[16:37:44] yes exactly this is my idea
[16:37:59] and eventually the cookbook should be smart enough to ask you only if any metric looks weird
[16:38:08] (ask to proceed I mean)
[16:43:01] lunching, back later!
[16:54:43] * elukey off!
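(Editor's note: for readers following the MagicMock exchange above, a minimal self-contained illustration of the pattern — class and method names are invented for the example, not taken from the refinery code. Assigning a MagicMock to a method pins its return value and records calls, so the test never touches a real Hive connection.)

```python
from unittest.mock import MagicMock


class Cleaner:
    """Toy stand-in for a class that talks to Hive."""
    def __init__(self, hive):
        self.hive = hive

    def tmp_tables(self, database):
        return [t for t in self.hive.get_tables(database) if t.startswith("tmp_")]


fake_hive = MagicMock()
fake_hive.get_tables = MagicMock(return_value=["tmp_a", "mediacounts", "tmp_b"])

cleaner = Cleaner(fake_hive)
assert cleaner.tmp_tables("wmf") == ["tmp_a", "tmp_b"]
fake_hive.get_tables.assert_called_once_with("wmf")  # the "asserting" part
```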
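(Editor's note: a sketch of the "ask before proceeding" idea discussed above — between restarts, query Prometheus for under-replicated partitions and prompt the operator when the count is non-zero. The endpoint and metric name are assumptions, not the real production configuration.)

```python
import requests

PROMETHEUS = "http://prometheus.example.wmnet/api/v1/query"  # placeholder endpoint


def under_replicated_partitions(cluster):
    # Metric name is a guess at the JMX-exporter naming; adjust to the real one.
    query = ('sum(kafka_server_ReplicaManager_UnderReplicatedPartitions'
             '{kafka_cluster="%s"})' % cluster)
    resp = requests.get(PROMETHEUS, params={"query": query}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


def confirm_if_unhealthy(cluster):
    urp = under_replicated_partitions(cluster)
    if urp > 0 and input("%d under-replicated partitions, proceed? [y/N] " % int(urp)) != "y":
        raise SystemExit("aborting roll restart")
```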
[17:00:15] 10Analytics: DCausse: can't log into Superset - https://phabricator.wikimedia.org/T229233 (10Nuria) 05Open→03Resolved
[17:23:16] 10Analytics, 10Analytics-Kanban: Request for unique devices for all wikipedia families is only showing data up to November 2018 - https://phabricator.wikimedia.org/T229254 (10Nuria)
[17:24:16] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic, 10Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (10colewhite) Sounds good to me! Happy to help however I can, just let me know.
[17:31:16] 10Analytics, 10Analytics-Kanban: Request for unique devices for all wikipedia families is only showing data up to November 2018 - https://phabricator.wikimedia.org/T229254 (10Nuria) The table unique_devices_project_wide_daily is reading data from: hdfs://analytics-hadoop/wmf/data/wmf/unique_devices/project_wi...
[17:31:27] 10Analytics, 10Analytics-Kanban: API Request for unique devices for all wikipedia families is only showing data up to November 2018 - https://phabricator.wikimedia.org/T229254 (10Nuria)
[17:31:40] mforns: this is a pretty big deal we need to fix: https://phabricator.wikimedia.org/T229254
[17:32:33] nuria, wow
[17:32:40] mforns: ya
[17:32:46] mforns: wow indeed
[17:32:55] k, will leave kerberos testing and do this
[17:33:33] 10Analytics: cassandra loading jobs for unique devices data need an SLA alarm - https://phabricator.wikimedia.org/T229256 (10Nuria)
[17:34:25] 10Analytics, 10Analytics-Kanban: API Request for unique devices for all wikipedia families is only showing data up to November 2018 - https://phabricator.wikimedia.org/T229254 (10Nuria)
[17:34:27] 10Analytics: cassandra loading jobs for unique devices data need an SLA alarm - https://phabricator.wikimedia.org/T229256 (10Nuria)
[17:37:37] 10Analytics, 10Analytics-Kanban: API Request for unique devices for all wikipedia families is only showing data up to November 2018 - https://phabricator.wikimedia.org/T229254 (10Nuria) This select returns no data ` > select * from unique_devices_project_wide_daily where year=2019 and month=07 limit 1; ` C...
[17:46:42] 10Analytics, 10Analytics-Kanban: API Request for unique devices for all wikipedia families is only showing data up to November 2018 - https://phabricator.wikimedia.org/T229254 (10Nuria) Once we fix the issue so there is data in the table, the loading jobs will work: https://github.com/wikimedia/analytics-refinery/blo...
[17:47:53] mforns: ok, i added all the info to the ticket, the basic root cause is that the setup of the jobs is suboptimal *i think*, now the issue should be easily fixable by reloading the jobs once the partitions are changed.
[17:54:29] mforns: let me know if you want to work on this together, i think we just need to alter the table location
[17:59:50] nuria, looking into it now
[18:08:08] nuria, yes, agree that we only need to alter the table location, and then backfill
[18:08:31] by running a temporary oozie coordinator I guess
[18:08:47] I can do that, if I have problems, I'll ping you for help :]
[18:08:53] mforns: forgot to ask you if it was ok to merge the first delete script patch tomorrow morning
[18:09:00] but it sounds like you are busy :)
[18:09:25] elukey, no problemo, yes, we can do that tomorrow morning, because it's hourly, so I'll be able to test it during the day
[18:09:34] ack super
[18:09:34] thanks for pinging :]
[18:09:42] sorry for leaving without pinging, I forgot :(
[18:09:45] (long day)
[18:09:53] no prob
[18:10:00] byeeee :]
[18:25:22] 10Analytics, 10Product-Analytics, 10Research, 10WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (10Neil_P._Quinn_WMF)
[18:25:29] nuria, I think the table used for that missing data is unique_devices_per_project_family, and not unique_devices_project_wide
[18:25:39] nuria, and that one points to the correct location...
[18:25:51] and the data is there, must be sth else, looking at the logs
[18:34:48] remember that the table location is just a hint!
[18:34:55] for external tables, the data could be anywhere
[18:34:58] it's the partition locations that matter
[18:37:14] mforns: ah ya,
[18:39:41] i suppose we don't yet have that service that resolves jsonschemas? my last version of swift upload events had 3 fields and could be embedded, but the schema is now half the size of my script :)
[18:40:17] ebernhardson: one that returns a schema?
[18:40:31] nuria: yes, so i can embed a schema lookup instead of the actual schema
[18:40:38] ebernhardson: ya, we do
[18:41:02] docs on wikitech?
[18:41:18] ebernhardson: i am not sure however if it is accessible from hadoop, it might not be because it is being developed now
[18:41:27] nuria: actually this daemon is in prod
[18:41:30] cc ottomata
[18:41:46] it is!
[18:41:53] ...
[18:42:44] ebernhardson: then it should work.
[18:42:44] ebernhardson: it isn't widely used yet, but you should be able to request from it internally
[18:42:52] schema.svc.{eqiad,codfw}.wmnet
[18:42:57] ok, i'll check it out
[18:43:03] it is just a json http fileserver
[18:43:06] so you can explore it
[18:43:07] also
[18:43:12] http://schema-beta.wmflabs.org/#!/
[18:43:18] for a gui :p
[18:43:28] "gui" :P
[18:43:42] indeed :p
[18:43:43] https://www.irccloud.com/pastebin/1qD5kk5r/
[18:43:45] should work though
[18:43:45] ebernhardson: i don't think i've updated to the schema we discussed on friday
[18:43:49] sorry cc mforns
[18:44:13] ottomata: as long as it still has swift_container and swift_object_prefix it should work, those are the only two fields i access
[18:45:04] nuria, I get good results from querying the table:
[18:45:24] mforns: ah wait, i have year 2017, duh
[18:45:34] nuria, also look at the -1
[18:45:46] ebernhardson: am changing it.
[18:45:47] mforns: ya that was cut & paste ahem... badness
[18:45:48] https://gerrit.wikimedia.org/r/c/analytics/refinery/+/525435/4/oozie/util/swift/upload/swift-upload.py#558
[18:45:57] swift_prefix_uri
[18:46:21] the difference being that the uri there will return the objects for that event.
[18:46:32] in the per object event case, ?prefix=full path
[18:47:05] comments welcome!
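(Editor's note: ottomata's point above — "the table location is just a hint" — is worth spelling out. For an external Hive table, each partition records its own location, so pointing the table at a new path does not move partitions added under the old one. A sketch with a made-up table name and connection string, using real Hive DDL:)

```python
import subprocess


def run_hql(statement):
    # beeline -e runs a single statement; the JDBC URL below is a placeholder.
    subprocess.run(
        ["beeline", "-u", "jdbc:hive2://analytics-hive.example:10000/default",
         "-e", statement],
        check=True,
    )


# Repointing the table only affects where NEW partitions land by default...
run_hql("ALTER TABLE mydb.unique_devices_daily "
        "SET LOCATION 'hdfs://analytics-hadoop/new/base/path'")

# ...each EXISTING partition keeps its own recorded location until repaired:
run_hql("ALTER TABLE mydb.unique_devices_daily PARTITION (year=2019, month=7, day=28) "
        "SET LOCATION 'hdfs://analytics-hadoop/new/base/path/year=2019/month=7/day=28'")
```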
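(Editor's note: since the schema service is, as ottomata says above, "just a json http fileserver", resolving a schema instead of embedding it is a single GET. A sketch — the schema path is illustrative (browse http://schema-beta.wmflabs.org/ for real ones), and the port comes from T229051:)

```python
import requests

SCHEMA_BASE = "http://schema.svc.eqiad.wmnet:8190"  # port per T229051


def resolve_schema(schema_uri):
    """Fetch a JSONSchema by relative URI instead of embedding it in the script."""
    resp = requests.get(SCHEMA_BASE + "/" + schema_uri.lstrip("/"), timeout=10)
    resp.raise_for_status()
    return resp.json()


# e.g. an event could carry only a "$schema" pointer (hypothetical path):
schema = resolve_schema("swift/upload/complete/1.0.0")
```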
[18:47:10] nuria, plus the data looks good in turnilo
[18:47:28] mforns: ah, there i did not look
[18:47:46] mforns: wait, maybe the data is in cassandra, did you check? i did not
[18:47:53] nuria, https://tinyurl.com/y5rypsx3
[18:48:02] I did not check in cassandra, no
[18:48:11] I was still executing queries
[18:57:02] hmm, ebernhardson I will keep swift_object_prefix in; it is useful to differentiate the event per object from the event per upload
[19:00:20] ottomata: looks reasonable to me
[19:12:28] 10Analytics, 10Analytics-Kanban, 10Research-Backlog: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (10leila) @Milimetric nice to see that we're here. :) I did one pass over the wikitech page. What kind of input is most helpful for you?
[19:16:15] 10Analytics, 10Analytics-Kanban, 10Research-Backlog: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (10Nuria) @leila Best would be use cases and how you would expect to use this data, which informs our ideas as to how to release it.
[19:38:52] nuria the last day I can see per_project_family in cassandra is 2018-12-05
[19:42:13] perhaps a random question, do we have a nice way to define superset datasources in a git repo? I ended up needing to define a good bit of aggregation cols to actually make a dashboard and it feels odd to have it outside revision control
[19:42:37] mforns: same here
[19:43:45] ebernhardson: aggregations should be in oozie/hive data (if parquet storage) but the datasource is in druid
[19:44:01] ebernhardson: I think I misunderstood, can you explain again?
[19:44:03] nuria: not that kind of aggregation. Things like postaggs that calculate percentages
[19:44:30] ebernhardson: defined in the superset dashboard config?
[19:44:31] nuria: or even simple aggregations like {"type": "longSum", "fieldName": "search_count"}
[19:45:18] nuria: everything here on the "list druid metric" tab: https://superset.wikimedia.org/druiddatasourcemodelview/edit/442
[19:45:26] so weird
[19:46:09] ebernhardson: the way we have been doing that is to supply aggregations in the druid json ingestion spec when possible
[19:46:20] nuria: those don't define columns in superset though
[19:46:33] nuria: it's slightly different defining columns in druid and superset
[19:47:02] at least, in my testing ingestion-time aggregations result in exact numbers
[19:47:18] these aren't exact numbers, these are summing up across druid dimensions
[19:48:12] i suppose you could put these definitions directly into charts, but then they aren't reusable and you can't set the verbose name
[19:52:00] 10Analytics, 10Analytics-Kanban, 10Research-Backlog: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (10leila) @Milimetric @Nuria: Ok. And by when do you need this input?
[19:52:08] ebernhardson: I see, we have been putting our aggregation logic in druid (depending on what aggregation it is it would be approximate or exact) and/or hive but not superset
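(Editor's note: to make the percentage discussion above concrete, here is roughly what such a Superset/Druid datasource metric amounts to, expressed as the underlying Druid JSON in Python dict form — field names are illustrative. The percentage can only be computed after the sums, hence a post-aggregation:)

```python
# Plain-sum aggregations over the ingested columns:
aggregations = [
    {"type": "longSum", "name": "search_count", "fieldName": "search_count"},
    {"type": "longSum", "name": "search_with_click",
     "fieldName": "search_with_click"},  # the pre-computed "if X" counter
]

# An arithmetic post-aggregation divides the summed counts at query time:
click_percentage = {
    "type": "arithmetic",
    "name": "click_pct",
    "fn": "/",
    "fields": [
        {"type": "fieldAccess", "fieldName": "search_with_click"},
        {"type": "fieldAccess", "fieldName": "search_count"},
    ],
}
```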
[19:52:45] 10Analytics, 10Analytics-Kanban, 10Research-Backlog: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (10Nuria) @leila we are working on this for the next couple of weeks, so the sooner the better
[19:53:27] mforns: I downloaded all keys to the table just in case it was a key problem
[19:54:03] mforns: like lowercase versus uppercase, but no, the data is just not there: https://etherpad.wikimedia.org/p/nuria
[19:54:13] ebernhardson: not sure if i answered at all though
[19:55:13] nuria: no, but it's ok :) I've looked around but i'm pretty sure the only way to get percentages out of superset is to do the aggregations in druid. It's probably fine just defined in superset
[19:55:32] (03PS5) 10Ottomata: swift-upload.py to handle upload and event emitting [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525435 (https://phabricator.wikimedia.org/T227896)
[19:55:44] ebernhardson: ok, ya, i think what you mean is that this aggregation would be an exact one: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/mediawiki/geoeditors/druid/load_geoeditors_monthly.json.template#L37
[19:56:02] ebernhardson: but the one you are doing on superset is computed on already aggregated data
[19:56:37] nuria: right, basically it's just taking the `SUM(view_count)` you have in dashboards and defining them explicitly with names
[19:56:56] except some are more complicated, because to get a percentage you need SUM(view_count) and SUM(view_count if X)
[19:57:26] i guess i could store a bunch of extra numbers to pre-compute the if-x
[19:57:34] ebernhardson: ya, this we do in turnilo, see pageview_hourly and bot percentage
[19:58:37] ebernhardson: see https://turnilo.wikimedia.org/#pageviews_hourly and "percentage of ..."
[19:59:00] ebernhardson: but you are right that we have not done it elsewhere than there that i can think of
[19:59:32] nuria, if I execute refinery/oozie/cassandra/daily/unique_devices.hql by hand (https://pastebin.com/3grKXcmE) it returns data that includes per_project_family records!
[20:00:05] mforns: can you make it keep the data as json so we can look at it?
[20:00:14] mforns: as in not delete the data?
[20:01:10] nuria, you can look at the results of my query in hdfs:/user/mforns/unique_device_test/000000_0.gz
[20:01:12] mforns: that file's last commit is from dec 2018...
[20:01:23] hmmm
[20:02:19] mforns: this commit broke things, i am pretty sure: https://github.com/wikimedia/analytics-refinery/commit/526c8f13d3cdcb0e91e1eb064b4a0cd394e1d9e1#diff-9cb3fa69b247266f6501fba3f476c11b
[20:03:48] nuria, but wait, before that, there were no per_project_family uniques!
[20:03:50] mforns: but if that broke that day and that is the commit that adds this functionality
[20:03:57] mforns: how come
[20:03:59] mforns: RIGHT
[20:04:03] it must have been broken since the beginning
[20:04:04] mforns: was about to say the same
[20:04:17] I think the data that is there comes from backfilling
[20:04:37] mforns: i bet we are running the old job
[20:04:40] mforns: man ...
[20:04:45] ooooooh
[20:06:43] mforns: wait, did you try to run it by hand to see if data got loaded into cassandra?
[20:07:17] mforns: the job has a start time of june https://hue.wikimedia.org/oozie/list_oozie_coordinator/0053242-190417151359684-oozie-oozi-C/
[20:07:19] no, I executed just the query file as indicated in the comments, pointing to the path above
[20:07:27] and the per_project_family data is there!
[20:07:35] yea, I saw
[20:07:45] weird
[20:08:20] I was looking for an old refinery_jar_version in that job, but found none
[20:08:46] which makes no sense, because this is pure oozie :[
[20:12:16] 10Analytics, 10Analytics-Kanban: API Request for unique devices for all wikipedia families is only showing data up to November 2018 - https://phabricator.wikimedia.org/T229254 (10Nuria) No data in cassandra after 20181205 ` cassandra@cqlsh> select * from "local_group_default_T_unique_devices".data where time...
[20:14:08] 10Analytics, 10Analytics-Kanban, 10Discovery-Search (Current work): Spike. Load search data into turnilo to test whether exploratory data can do away with some of the dashboards - https://phabricator.wikimedia.org/T216058 (10EBernhardson) a:05Nuria→03EBernhardson Evaluated with respect to our did you mea...
[20:17:07] 10Analytics, 10Analytics-Kanban, 10Research-Backlog: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (10Milimetric) @leila: we can of course iterate on the format in the future. Eventually we'll have a public API to query the whole dataset....
[20:19:44] mforns: mmm the data does not look good
[20:19:46] mforns:
[20:19:50] https://www.irccloud.com/pastebin/xuhvqsz9/
[20:21:22] nuria, ha! it's duplicated?
[20:22:35] mforns: ya, which means that the insert of the 2nd record will fail in cassandra (i think)
[20:22:49] mforns: cause cassandra is a key-value store
[20:22:56] mforns: and that data has the same key
[20:23:39] yes, ok, it's a possible explanation of why it won't return the data
[20:24:15] looking if it's duplicated in the source Hive table as well
[20:24:23] mforns: it is not
[20:24:28] oh
[20:24:31] mforns: the sql is wrong
[20:24:35] mforns: i mean check it out
[20:24:39] mforns: please
[20:24:46] mforns: two eyes are better than one!
[20:25:05] nuria, I think I know... the grouping sets are coalescing the access-method to 'all-sites'
[20:25:13] mforns: ya
[20:25:36] but the per_project_family records already have (only) the all-sites access-method
[20:25:42] mforns: this sql never worked and i do not understand how there is data in cassandra
[20:26:13] so when grouping by the grouping set without access-method, access-method is null, and gets coalesced to all-sites, and is thus duplicated
[20:28:17] we can add an extra step and apply the grouping sets separately before applying the union all
[20:28:38] that would work I think
[20:34:08] mforns: and what sql was used to backfill this data?
[20:34:15] mforns: it must have been different
[20:34:25] mforns: cause this one did not work
[20:34:51] aha
[20:35:31] I guess, because it would only backfill the per_project_family data, and the grouping sets explosion was not needed
[20:35:58] mforns: i guess, let's not do that gain
[20:36:00] *again
[20:36:20] mforns: if you modify the sql we can do a tryout, i bet it would work
[20:36:22] k
[20:36:31] ya on it
[20:46:06] mforns: i am totally wrong that cassandra will fail trying to insert two identical rows, it should actually override ....
[20:52:06] nuria, hmmm
[20:52:23] mforns: yess?
[20:53:19] no nothing, just thinking it's interesting, so how come the data is not there?
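(Editor's note: a toy Python re-enactment of the bug mforns pins down above, using invented sample data. The HQL grouped by the sets (project, access-method, dt) and (project, dt), then COALESCEd a NULL access-method to 'all-sites'; per-project-family rows only ever carry 'all-sites', so both grouping sets collapse to the same key and the row comes out twice.)

```python
from collections import defaultdict

rows = [  # (project, access_method, dt, uniques) -- invented sample data
    ("en.wikipedia", "desktop-site", "2019-07-28", 100),
    ("en.wikipedia", "mobile-site", "2019-07-28", 50),
    ("all-wikipedia-projects", "all-sites", "2019-07-28", 140),  # family row
]

emitted = []
for keep_access in (True, False):  # the two grouping sets
    groups = defaultdict(int)
    for project, access, dt, n in rows:
        groups[(project, access if keep_access else None, dt)] += n
    for (project, access, dt), n in groups.items():
        # COALESCE(access_method, 'all-sites') -- the colliding step
        emitted.append((project, access or "all-sites", dt, n))

for row in sorted(emitted):
    print(row)
# ('all-wikipedia-projects', 'all-sites', '2019-07-28', 140) prints twice,
# once per grouping set -- the duplication visible in the paste above.
```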
[20:54:20] mforns: i also do not know if our cassandra code does inserts in a particular way
[20:54:39] mforns: let's try to remove the duplicated data and see, *i think* it would work
[20:56:05] mforns: which we do, cause we enter data in batch mmm
[21:00:31] nuria, maybe we're generating cassandra tombstones when inserting duplicates, I don't know what those are, but it seems they can be created by means other than deletions and can affect the presence of data
[21:00:46] https://www.beyondthelines.net/databases/cassandra-tombstones/
[21:02:08] mforns: i think those are an issue for deletions
[21:08:55] 10Analytics, 10EventBus, 10MassMessage, 10Operations, and 2 others: Write incident report for jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Pchelolo) I've made a preliminary incident report at https://wikitech.wikimedia.org/wiki/Incident_documentation/20190619-JobQ...
[21:12:28] (03PS1) 10Mforns: Remove duplicates from unique_devices_per_project_family [analytics/refinery] - 10https://gerrit.wikimedia.org/r/526252 (https://phabricator.wikimedia.org/T229254)
[21:15:50] mforns: https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/526252/1/oozie/cassandra/daily/unique_devices.hql but this sql erases access-site "desktop-site" and "mobile-site", right?
[21:20:00] nuria, no no, those are still there, under unique_devices_per_domain
[21:20:38] the original code was using grouping sets to generate 2 kinds of groupings: 1) (project, access-method, dt) and 2) (project, dt)
[21:20:59] the problem is the second grouping was not appropriate for unique_devices_per_project_family
[21:21:48] so, I'm just generating the extra (project, dt) grouping for the unique_devices_per_domain data, and unioning it to the end result
[21:22:12] I tested it and it just removes the duplicates
[21:23:51] mforns: i believe you but i do not understand why we need (project, dt) data
[21:25:03] 10Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014 (10John_M_Wolfson) >>! In T155014#5371733, @Graham87 wrote: >>>! In T155014#5371668, @John_M_Wolfson wrote: >> I just created a Wikipedia page on this topic (https://en.wikipedia.org/wiki/Wikipedia:Starling_archive_imports) and w...
[21:26:31] nuria, to have all-sites numbers for unique_devices_per_domain
[21:27:17] mforns: ooohhh, i see, but the numbers are not additive, nvm, it is done for wikistats
[21:27:41] (03CR) 10Nuria: [C: 03+1] Remove duplicates from unique_devices_per_project_family [analytics/refinery] - 10https://gerrit.wikimedia.org/r/526252 (https://phabricator.wikimedia.org/T229254) (owner: 10Mforns)
[21:27:50] nuria, yes that's right! not additive...
[21:27:50] mforns: let's try to reaload!
[21:27:57] *reload
[21:29:45] mmmmm, if our hypothesis is true, overwriting won't work, right?
[21:30:00] mforns: it will, we backfill all the time
[21:30:19] but we should delete the data beforehand, no?
[21:30:32] mforns: no deletion needed
[21:30:44] mforns: it is cassandra, so it will override the records
[21:30:55] mforns: also no data from 2018-12 onwards
[21:31:09] nuria, it has data for per_domain
[21:31:21] what if we break that?
[21:31:24] mforns: ah yes, true, both jobs are the same,
[21:31:43] mforns: no, it will not, the data being correct it will override
[21:32:26] mforns: let's re-load data for the last day
[21:32:36] ok
[21:47:24] mforns: after 20 attempts i finally got to the logs of the task that loads into cassandra
[21:51:36] nuria, the reloading is running
[21:52:05] did you see sth in the logs?
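(Editor's note: on the overwrite question settled above — in Cassandra an INSERT is an upsert, so writing the same primary key twice silently replaces the row rather than failing. A minimal demonstration with the DataStax Python driver; the contact point and toy schema are placeholders, not the production unique-devices keyspace.)

```python
from cassandra.cluster import Cluster

session = Cluster(["cassandra.example.wmnet"]).connect("test_keyspace")
session.execute("""
    CREATE TABLE IF NOT EXISTS uniques (
        project text, dt text, devices bigint,
        PRIMARY KEY (project, dt))
""")

insert = "INSERT INTO uniques (project, dt, devices) VALUES (%s, %s, %s)"
session.execute(insert, ("all-wikipedia-projects", "2019-07-28", 140))
session.execute(insert, ("all-wikipedia-projects", "2019-07-28", 140))  # no error raised

row = session.execute(
    "SELECT devices FROM uniques WHERE project=%s AND dt=%s",
    ("all-wikipedia-projects", "2019-07-28")).one()
assert row.devices == 140  # still one row: last write wins
```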
[21:52:36] ok, the loading finished, I did it for July 28
[21:53:11] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: API Request for unique devices for all wikipedia families is only showing data up to November 2018 - https://phabricator.wikimedia.org/T229254 (10Nuria) Looked at the latest job and the task that handles the insert into cassandra: > sudo -u blah mapred job -...
[21:53:32] mforns: ta-ta-channn
[21:53:35] https://www.irccloud.com/pastebin/L8lY3J3g/
[21:53:44] mforns: fixed
[21:53:49] yayayaya
[21:54:17] hehe
[21:56:02] mforns: i'd say backfill since 2018-12 up to today and let's stop the job in prod, no?
[21:56:08] mforns: i can stop the job
[21:56:19] ok
[21:57:54] I will launch a temp coordinator with no stop_time, and on wednesday we can deploy this and restart a permanent coord
[22:00:52] mforns: ok, i killed the daily and monthly coords and wrote it on the train etherpad: https://etherpad.wikimedia.org/p/analytics-weekly-train
[22:01:03] (03CR) 10Nuria: [C: 03+2] Remove duplicates from unique_devices_per_project_family [analytics/refinery] - 10https://gerrit.wikimedia.org/r/526252 (https://phabricator.wikimedia.org/T229254) (owner: 10Mforns)
[22:01:16] mforns: super thanks, just merged your code
[22:03:30] nuria, mmmh! I forgot to change the monthly query
[22:03:55] mforns: we can do that tomorrow, no?
[22:04:01] ok, will do