User:Sannita/Sandbox

From Wikispore

This is a draft for a potential "sports results" project. At the moment it's just rambling, but I'm confident that one day it'll turn into a real project.

Sports results are a very mundane topic compared to the wealth of information that we can share across projects. It is a fact, though, that sports attract many users to our projects, and that the related WikiProjects have evolved into very complex ecosystems, maintaining a huge number of pages and using a considerable number of templates and, in the last couple of years, modules.

It didn't take long for Wikidata to set up its own WikiProjects about sports, and even one specifically about sports results, to define how to manage a large number of items about tournaments, athletes, single matches or contests, etc. It didn't take long either to set up a working data model - nor for some of the users to realise how complex this issue is, and how deep and far we can go in trying to address the sum of all sports championships.

Despite the presence of a reasonably functional data model, and despite the fact that sports results share the basic characteristic of data (i.e. they remain the same regardless of the Wikimedia project on which they are shown), we're still a long way from the desired result. This is due to the inherent difficulty of the issue and to the lack of a suitable cross-wiki infrastructure that could help users overcome some of those difficulties.

Starting from scratch

Every major international or national tournament, regardless of which IOC-recognised sport we're talking about, has (or should have) an article on every Wikipedia. Most, if not almost all, of these major tournaments also have an article for every single edition (e.g. 2016 Summer Olympics, 1930 Ice Hockey World Championships, 1975 World Series, and so on), and all of those tournaments and editions should have an item on Wikidata, since they meet Wikidata's first notability criterion (i.e. the item contains «at least one valid sitelink» to a Wikimedia project page). Nothing wrong or strange so far.

Not all projects have articles about all tournaments and/or editions we're talking about. This means that, if you want to make an educated guess about how many items we're talking about, the final number will be higher than the sum of all articles on your home wiki (yes, even if we're talking about English Wikipedia). Still nothing wrong or strange about this, though. On the contrary, it means there's room for improvement for your project, which is rather good news.

Let's get to the fun part now, and hypothesise that you want to create some of those missing articles on your Wikipedia. The safest assumption is that you're going to copy-paste and then adapt some other wiki's content into your own wiki - as we all did and still do, most of the time.

As the Global templates proposal has already pointed out, this is going to take some time, because you're not just going to translate every single line of text into your language: you'll also need to replace every single template in there in order to bring the article in line with your home wiki's WikiProject guidelines (if present), or just with your home wiki's general guidelines. This starts with the infobox at the top of the article, and continues with all the citation templates and with those you use to properly format all matches.

You might also want to add or correct some data about a match or a table. This is fine, and this is actually what we're here for - but should you also add or correct those data on the project you took them from? The answer is «obviously yes», especially if we're talking about correcting a tiny mistake, such as a single wrong digit in a table. But what if you want to add formations in a Footballbox? If there's a documentation page that can help you, maybe you can do it, but what if there's no documentation page? You'll probably end up expanding just your version, and leave the original one incomplete.

Why not use Wikidata then?

Someone will definitely think or say that the solution (or at least part of it) is Wikidata. This is only partially correct.

Reusing Wikidata data about a tournament or a single edition of a tournament is, all in all, quite easy: it only requires that the data be present in the item and that the template be able to call it.

But what if we want to call data for a particular match and/or phase of that tournament? In order to do so, Wikidata should be hosting data about single phases and matches too. This could prove to be, to put it euphemistically, challenging.

The assumption about how to cover these is simple, yet quite radical: every match or race or equivalent should have an item of its own. This allows for better and deeper storage of information and statistics about a match in a single item, while also allowing matches to be grouped into items about the match-day (or turn, or phase, or round, or whatever the name is) they are part of. Such items, in turn, can be included in other “grouping items”, and so on, up to the item about the single edition of the tournament.
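The grouping described above can be pictured as a simple tree in which every node points to the item it is part of. The following is only a minimal sketch: the `Item` class, its `add` and `path` methods, and the labels are hypothetical names chosen for illustration, not an existing Wikidata or Wikibase API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Item:
    """Hypothetical stand-in for one item (edition, match-day, match...)."""
    label: str
    # The item this one is "part of"; None for the edition at the top.
    parent: Optional["Item"] = field(default=None, repr=False)
    children: List["Item"] = field(default_factory=list, repr=False)

    def add(self, label: str) -> "Item":
        """Create a child item grouped under this one and return it."""
        child = Item(label, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Labels from the edition down to this item."""
        labels = []
        node: Optional[Item] = self
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels[::-1]


# Illustrative labels only.
edition = Item("2018-2019 Serie A")
match_day = edition.add("Match-day 1")
match = match_day.add("Match 1 of match-day 1")
print(" > ".join(match.path()))
```

Walking up the `parent` links is what lets a single match be traced back to its match-day and, eventually, to the edition of the tournament.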

This is as easy to say as it is difficult to manage: 99% of all matches do not have the same importance as Olympic, World Cup or championship finals, or as Super Bowls or World Series; even fewer made it into actual history, such as the “Game of the Century” (1970 FIFA World Cup's Italy v West Germany), the “Miracle on Ice” (1980 Winter Olympics USSR v USA ice hockey match) or the infamous “Blood in the Water match” (1956 Summer Olympics Hungary v USSR water polo match). In other words, they usually do not have a Wikipedia article, and therefore they do not meet Wikidata notability criterion #1.

They may meet criterion #2 («an instance of a clearly identifiable conceptual or material entity» that «can be described using serious and publicly available references») and/or criterion #3 (an item that «fulfills a structural need»), but this could turn out to be one of the slipperiest of slippery slopes ever.

An “easy” case

Let's take the example of the 2018-2019 edition of Serie A (Italy's main men's association football tournament): the championship is divided into 38 match-days, and each match-day consists of 10 matches. This means that one needs to create:

  1. 1 item about the edition itself;
  2. 38 items for the edition's match-days;
  3. (38*10)=380 items for the edition's matches.

This means that a user has to create, and fill in appropriately, (1+38+380)=419 items in order to freely share and reuse data about the 2018-2019 edition of Serie A. This may not seem a high number, given that Wikidata hosts 60 million items and counting. But you have to multiply this number by all Serie A editions (which, as of now, are around 120, with an evolving formula), then by all first-division women's and men's association football championships all around the world, then add all continental, world, and Olympic tournaments, cups, and championships for clubs and national teams... and then repeat all of this for all the IOC-sanctioned sports.
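The single-edition count above can be reproduced with a few lines of code. This is only a sketch of the arithmetic; the function name `items_for_edition` and its parameters are made up for illustration, and it assumes a plain format in which every match-day has the same number of matches.

```python
def items_for_edition(match_days: int, matches_per_day: int) -> int:
    """Items needed for one edition: the edition, its match-days, its matches."""
    edition_items = 1
    match_day_items = match_days
    match_items = match_days * matches_per_day
    return edition_items + match_day_items + match_items


# 2018-2019 Serie A: 38 match-days of 10 matches each.
serie_a_2018_19 = items_for_edition(match_days=38, matches_per_day=10)
print(serie_a_2018_19)  # 1 + 38 + 380 = 419
```

Since the formula of the championship changed over time, the result for one edition cannot simply be multiplied by the number of editions; each edition would need its own values.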

But we can go further than that: what about second-, third- or fourth-level divisions? What about high-level friendly tournaments, such as the International Champions Cup, or other kinds of high-level exhibition games or All-star games? What about championships for amateur selections (such as the UEFA Regions' Cup) or for unrecognised States (such as the CONIFA World Football Cup)? What about players, referees and match officials, venues, coaches, etc.? How many items are we going to need? Where do we draw the line on Wikidata, considering that every user might have a different take on this?

A complicated case

And what about complicated cases? Let's take another example, this time Italy's 1921-1922 CCI First Division (one of the two main association football championships for that season), which was structured as follows:

  • 1921-1922 CCI First Division
      • Preliminary phase (1 item)
          • League North (1 item)
              • Group stage (1 item)
                  • Group A (1 item)
                      • 22 match-days (22 items)
                          • 6 matches each match-day ([6*22] = 132 items)
                  • Group B (1 item)
                      • 22 match-days (22 items)
                          • 6 matches each match-day ([6*22] = 132 items)
              • Final (1 item)
                  • First leg (1 item)
                  • Second leg (1 item)
          • League South (1 item)
              • Group stage (1 item)
                  • Marche (1 item)
                      • Group stage (1 item)
                          • Group A (1 item)
                              • 6 matches (6 items)
                          • Group B (1 item)
                              • 6 matches (6 items)
                      • Final group (1 item)
                          • 2 match-days (2 items)
                              • 2 matches each match-day ([2*2] = 4 items)
                  • Lazio (1 item)
                      • 20 match-days (20 items)
                          • 68 matches + 8 annulled matches[1] + 1 relegation play-out ([68+8+1] = 77 items)
                  • Campania (1 item)
                      • Preliminary play-off (1 item)
                          • First leg (1 item)
                          • Second leg (1 item)
                      • Group stage (1 item)
                          • 14 match-days (14 items)
                              • 3 matches each match-day ([3*14] = 42 items)
                  • Apulia (1 item)
                      • Group stage (1 item)
                          • 3 match-days (3 items)
                              • 2 matches each match-day ([2*3] = 6 items)
                      • Final group (1 item)
                          • 6 match-days (6 items)
                              • 2 matches each match-day ([2*6] = 12 items)
                  • Sicily (1 item)
                      • 10 match-days (10 items)
                          • 3 matches each match-day ([3*10] = 30 items)
              • Final phase (1 item)
                  • First round (1 item)
                      • 2 matches + 1 walkover (3 items)
                  • Semifinals (1 item)
                      • 1 match + 1 walkover (2 items)
                  • Final (1 item)
      • Grand final (1 item)
          • First leg (1 item)
          • Second leg (1 item)
  1. Two teams were disqualified after the championship started and their results were annulled as a consequence.

Considering the original assumption (one item for every match, match-day, and tournament subdivision), what's interesting in this case is that 108 of the 570 items needed overall (18.95%, almost one in five) to cover the whole championship will be about its structure, not about the matches. (Oh, and remember: this was one of Italy's two main association football championships that season, due to an internal split that was healed in time for the next one.)
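The share quoted above can be checked with a couple of lines. This sketch takes the two figures given in the text (108 structural items out of 570) as its only inputs; the variable names are illustrative.

```python
# Figures as stated in the text above.
structural_items = 108
total_items = 570

# Whatever is not structure is an item about a match.
match_items = total_items - structural_items

share = structural_items / total_items * 100
print(f"{share:.2f}% of the items describe structure")  # 18.95%
print(f"{match_items} items describe matches")
```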

Looking for a solution

As we have already seen, it is practically impossible to store all these data on Wikidata alone. The best solution would be a separate project, with an underlying Wikibase instance similar to the one used for Structured Data.

This way, the original assumption for covering all this data would be respected, and data across language versions would finally be standardised (including their references, which would be stored in the items). By transcluding this data, big projects would get help in cleaning up articles and templates, and small projects in creating and maintaining new articles.

The problems of this solution are that:

  • it is currently extremely improbable that such a project can be established in the short term by WMF;
  • even if a new project is green-lighted, it would need a working Wikibase federation - which comes with a separate load of problems;
  • it is currently impossible to transclude templates (or even data) from this new project to the existing projects.

So, as an intermediate step towards this solution, we could take advantage of the Global templates proposal, basically by turning every match and table of every tournament into a template, and then transcluding these templates into the relevant articles. The templates would rely in turn on a common, globally developed set of templates and modules to guarantee standardised data and graphics, such as w:en:Module:Sports table for tables, or Template:Wl for creating links to Wikidata items: such a link automatically checks whether your project already has an article and, if not, provides you with an alternative (which can be really useful with athletes or venues).
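The linking behaviour described for Template:Wl can be illustrated with a small sketch. This is not the template's actual implementation: the function `resolve_link`, its parameters, the choice of a link to the Wikidata item as the fallback, and the item ID used in the example are all assumptions made for illustration. Only the sitelink keys (such as `itwiki`) and the `d:` interwiki prefix follow existing Wikimedia conventions.

```python
from typing import Dict


def resolve_link(qid: str, label: str, sitelinks: Dict[str, str], home_wiki: str) -> str:
    """Return wikitext linking to the local article if it exists,
    otherwise to the Wikidata item (assumed fallback)."""
    title = sitelinks.get(home_wiki)
    if title:
        # The home wiki already has an article: link to it directly.
        return f"[[{title}|{label}]]"
    # No local article: fall back to the item on Wikidata.
    return f"[[d:{qid}|{label}]]"


# "Q1234567" is a made-up placeholder, not the real ID of this venue.
sitelinks = {"itwiki": "Stadio Giuseppe Meazza"}
print(resolve_link("Q1234567", "San Siro", sitelinks, "itwiki"))
print(resolve_link("Q1234567", "San Siro", sitelinks, "scwiki"))
```

A wiki that later creates the missing article would get the local link without anyone having to edit the shared template.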

This will not be the end of all problems, but it can be a step in the right direction: it can strengthen collaboration and data exchange between projects, it can help minor projects to import this kind of data easily (since it would be as easy as "copy and paste the name of the template"), and it can provide a first stepping stone towards the real solution, which would be the Wikibase instance.

There are, of course, several things still to discuss, such as sources and graphics, but these should be worked out by the community of template and module developers, together with those interested in this kind of thing.