Jump to content

Wikipedia:Request a query

From Wikipedia, the free encyclopedia

This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of article titles that meet certain criteria, consider using PetScan (user manual) or the default search. Petscan can generate list of articles in subcategories, articles which transclude some template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.

Ref count

[edit]

Can anyone create/link me to a modified version of User:Bunnypranav/Reports#Probable "draftify because of no sources" candidates which has 1-5 refs (inline citations) only. I would appreciate if you could just add another section below the above link and create/test the query. Thanks in advance! ~/Bunnypranav:<ping> 13:50, 24 January 2025 (UTC)[reply]

References aren't stored in the database at all except in the page text, which isn't copied to the toolforge replicas. —Cryptic 14:37, 24 January 2025 (UTC)[reply]
Hmm. How is the no refs query made? Is there any way to change it from the existing no refs exists to x number of refs exist. From my very novice understanding of the existing query, it checks no of ext links, can we modify it to find a specific number of them for the requested query? Thanks for your help. ~/Bunnypranav:<ping> 16:32, 24 January 2025 (UTC)[reply]
It's looking at whether there are any external links, yes, and I'll go ahead and change the query. But it's going to get hardly any results either way - the only pages it finds even without limiting external links at all are Battle of Nalos (3 links) and Operation Qazanchi (6), I suspect by its requirement of a no-citations tag, and one specifically added through page-triage at that. —Cryptic 16:50, 24 January 2025 (UTC)[reply]
Thanks Cryptic for that change, that will help catch some more possibilities eventually. @DreamRimmer, since I see you have expertise in python scripts, could a py script help find the no. of citations tags for pages younger than 90 days and list them somewhere? ~/Bunnypranav:<ping> 06:28, 25 January 2025 (UTC)[reply]
I would say this is technically doable, but it's not advisable due to the large number of pages involved, roughly 50,000 to 60,000 pages created in the last 90 days. Checking the number of references for these pages would require making the same number of API requests, which would significantly waste resources. A more efficient approach would be to download the relevant database dumps and generate your list from that data. – DreamRimmer (talk) 07:40, 25 January 2025 (UTC)[reply]
Is there any way to limit the number of pages queried from the beginning? In any case, I am ready to atleast try to process the data dumps from my end, could you possibly help me with a py script since I do not know much to start from scratch. Up to you though and thanks for the reply! ~/Bunnypranav:<ping> 07:43, 25 January 2025 (UTC)[reply]
There's a tool in WP:AWB called the database scanner that may be helpful. –Novem Linguae (talk) 13:35, 25 January 2025 (UTC)[reply]
Noted, thanks! ~/Bunnypranav:<ping> 13:36, 25 January 2025 (UTC)[reply]

Website Infobox parameter

[edit]

Hi Cryptic, could you query me a list of articles that: Are part of {{WikiProject Academic Journals}}, have a {{Infobox journal}}, don't have a |website parameter? Thanks Nobody (talk) 18:00, 24 January 2025 (UTC)[reply]

There's no way to see whether they have a parameter. The usual solution is to add a hidden tracking category in the template. —Cryptic 18:51, 24 January 2025 (UTC)[reply]
Thanks for the reply, I've asked for it at Template talk:Infobox journal#Template-protected edit request on 24 January 2025, which was the reason for this request. Nobody (talk) 19:45, 24 January 2025 (UTC)[reply]

Twenty Year Society of Wikipedia editors

[edit]

I'm curious as to how accurate a reflection Category:Members of the Twenty Year Society of Wikipedia editors is, of the number of editors still active who have been here for 20 years or more?

Can Quarry be used to measure editors who either:

  1. Made edits at least 20 years apart
  2. First edited over 20 years ago and are still (for some value, say: edited in the last three months) active?

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:27, 4 February 2025 (UTC)[reply]

The second is at quarry:query/90085. —Cryptic 19:21, 4 February 2025 (UTC)[reply]

What groups create what percentages of new pages?

[edit]

Could I get some help generating the following queries? I plan to make some pie charts with the data.

  • Data for pie chart 1 - 2024 - how many new articles (mainspace, non-redirects) created (and what percent overall are created)...
    • by autopatrolled users
    • by non autopatrolled, non admin users
    • by non autopatrolled, admin users
  • Data for pie chart 2 - 2024 - how many new articles (mainspace, non-redirects) created (and what percent overall are created)...
    • by autopatrolled users
    • by non autopatrolled editors with less than 10,000 edits
    • by non autopatrolled editors with greater than or equal to 10,000 edits
  • Data for pie chart 3 - 2024 - how many new articles (mainspace, non-redirects) created (and what percent overall are created)...
    • by autopatrolled users?
    • by non autopatrolled editors who are not new page patrollers
    • by non autopatrolled editors who are new page patrollers

Thanks. –Novem Linguae (talk) 21:53, 5 February 2025 (UTC)[reply]

@Novem Linguae I had to use my staff account to run queries against MediaWiki history, so I can't link the queries directly, and these should be taken with "Sam isn't an expert and might have missed something" caveats, but I found the following:
  • In 2024, 678,142 pages were created in the article namespace by users (not including bots). Of those, 170,684 are not currently redirects (they may have changed after initial creation, so this isn't precisely a reflection of how many non-redirects were created). The rest of the numbers will be non-redirects and also compared to this number. These numbers all exclude bots, too.
  • 101,987 (60%) pages were created by non-autopatrolled, non-admin users.
  • 756 (0.4%) pages were created by non-autopatrolled, admin users.
  • 67,941 (40%) pages were created by autopatrolled users.
  • 77,412 (45%) pages were created by non-autopatrolled editors with less than 10,000 edits.
  • 25,331 (15%) pages were created by non-autopatrolled editors with greater than or equal to 10,000 edits.
  • 100,929 (59%) pages were created by non-autopatrolled editors who do not have the patroller user group.
  • 1,814 (1%) pages were created by non-autopatrolled editors who do have the patroller user group.
Sam Walton (talk) 23:25, 5 February 2025 (UTC)[reply]
Public version at quarry:query/90509 quarry:history/90509/976754/947548. Can't figure out why my numbers disagree - I'm seeing about twice as many pages total - even considering I included pages created by non-users (either ips, or where the first revision had the user revdelled/suppressed, or conceivably bad imports). Percentages excluding those are broadly similar though. —Cryptic 00:11, 6 February 2025 (UTC)[reply]
Oh, it's because I didn't see the "non-redirects" added later. —Cryptic 00:13, 6 February 2025 (UTC)[reply]
Thank you both for your hard work. This data makes it clear that advocating for something like autopatrolling admins would have a negligible effect on reducing the WP:NPP queue (0.4% reduction). This is good data that will help inform some NPP-related decisions. –Novem Linguae (talk) 00:22, 6 February 2025 (UTC)[reply]
Frown. Now I'm seeing fewer pages than you - 195800 total currently-non-redirects, 107941 currently-non-redirects by users. And that's without excluding bots. Are you including currently-deleted pages? —Cryptic 00:25, 6 February 2025 (UTC) Nmind, I see what I did, duh. quarry:query/90509 again, now with much-less-disagreeable numbers - all somewhat higher than Sam's, since it still includes bots. Interesting that the percentages turned out close to right even though I was looking at the wrong (essentially random) users' groups. —Cryptic 00:46, 6 February 2025 (UTC)[reply]
I am actually kind of curious as to why an admin would ever not be autopatrolled. Are there admins who make poor quality new articles? BD2412 T 02:46, 6 February 2025 (UTC)[reply]
Something needed to be done to get more administrators, removing autopatrol was something, so removing autopatrol needed to be done. —Cryptic 03:00, 6 February 2025 (UTC)[reply]
I agree and I did not support the rfc, but I was in the minority. –Novem Linguae (talk) 06:28, 6 February 2025 (UTC)[reply]
@Cryptic Good to hear we're on the same track :) MediaWiki history includes since-deleted pages, so that will be another minor source of disagreement. Sam Walton (talk) 09:03, 6 February 2025 (UTC)[reply]
Then my numbers should be lower than yours, instead of ranging from 10-30% higher across the board. Including bots isn't it; there were only four bot "creations" in my dataset, Kansas City shooting, Pedro Campos (disambiguation), 10 Lives (disambiguation), and Chris Sheppard (disambiguation), all by User:RussBot; and two are bad data anyway, showing up only because they were history merged. (History merges, despite being half of that very small sample, aren't going to come anywhere close to accounting for the difference. Even considering that the other two of these pages were cut-and-paste moves and could stand to have their histories fiddled with too.) Is your data source looking at where pages were initially created, or where they are now? —Cryptic 11:25, 6 February 2025 (UTC)[reply]
@Cryptic Where they were initially created. The only caveat is the redirect issue I noted - the MW History dataset can say whether a page is currently a redirect, but doesn't store data on whether it was a redirect at the time of the edit. That would mean I'm undercounting based on pages which were created and then turned into a redirect some time between then and now, which might account for the difference? Sam Walton (talk) 15:37, 6 February 2025 (UTC)[reply]
No, my query also only sees the current redirect status. The difference is that mine also uses the current page location; while there's going to be some pages created in mainspace and moved to draft, there'll be many, many more created in draft or userspace first. These are probably the more relevant numbers, since those need to be patrolled too.
I suppose I could add in deleted pages that were last in mainspace, but it's going to be slow - there's no usable index on timestamp, and the one on namespace isn't very selective - and there's no way to see redirect status. —Cryptic 20:16, 6 February 2025 (UTC)[reply]

Again about New Pages

[edit]

Following a January backlog drive, for a further research into improving NPP would it be possible to generate the following queries over a 1 month sample 1 to 31 Jan 2025? :

  1. Number and % of new articles created in the article space by non-autopatrolled users, excluding redirects and dab pages
  2. Number and % of new articles created by accounts that existed for less than 30 days and made less than 500 edits
  3. Number and % of new articles excluding redirects and dab pages, that were still unpatrolled after the sample period
  4. Number and % of new articles excluding redirects and dab pages, that got deleted within the sample period.
  5. Number and % of new articles that were recreated from previously deleted titles.
  6. and if possible, Number and % of new articles that were created from redirects.

Many thanks, Kudpung กุดผึ้ง (talk) 23:32, 6 February 2025 (UTC)[reply]

#4 (I've converted your requests from bullet points), at least, is going to be a problem - we can only see whether a currently-existing page is currently a redirect or dab. There's no access to the wikitext of either existing or deleted pages; for existing ones, there's a field that says whether it's currently a redirect, and dabs can be detected either through their categories or from a page property that's set on disambigs proper (though not set indices), but none of those exist for deleted ones.
The queries that don't mention excluding redirects and dabs, #2 #5 and #6 - was this omission deliberate or not?
For #2, I can compare article creation time directly to its creator's registration time, to see if the user had registered within 30 days of creation; but checking editcount can only reasonably look at the current editcount. Given the latter, should I look at current account age for consistency, or account age at creation for the partially increased accuracy? (The former's a lot easier for these particular numbers, since they happen to match up with the extendedconfirmed group.) —Cryptic 00:20, 7 February 2025 (UTC)[reply]
The queries that don't mention excluding redirects and dabs, #2 #5 and #6 - was this omission deliberate or not? That is correct. It was deliberate. For #4, Just the number and % of new mainspace articles - any kind - that were deleted, or flagged for CSD or AFD, woud be fine. For #2 whaterver is easiest for you. I'm looking to see by this to extrapolate a hypothesis of what it would have looked like if mainspace articles were onol created by ExtendedConfirmed editors.Thanks. Kudpung กุดผึ้ง (talk) 11:43, 8 February 2025 (UTC)[reply]
quarry:query/90585. —Cryptic 13:10, 8 February 2025 (UTC)[reply]