Among the operators and users of commercial Usenet news servers, common concerns are the continually increasing storage and network capacity requirements and their effects. Completion (the ability of a server to successfully receive all traffic), retention (the amount of time articles are made available to readers) and overall system performance are the topics of frequent discussion. With the increasing demands, it is common for the transit and reader server roles to be subdivided further into numbering, storage and front end systems. These server farms are continually monitored by both insiders and outsiders, and measurements of these characteristics are often used by consumers when choosing a commercial news service.
End users often use the term "posting" to refer to a single message or file posted to Usenet. For articles containing plain text, this is synonymous with an article. For binary content such as pictures and files, it is often necessary to split the content among multiple articles. Typically through the use of numbered Subject: headers, the multiple-article postings are automatically reassembled into a single unit by the newsreader. Most servers do not distinguish between single and multiple-part postings, dealing only at the level of the individual component articles.
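As a rough illustration of how a newsreader might group such parts, the sketch below parses a trailing "(part/total)" marker out of overview subjects; the subjects, message IDs, and regular expression are all hypothetical, since real posters vary the convention.

```python
import re
from collections import defaultdict

# Hypothetical overview data (message-id, subject) as a newsreader might see it.
overviews = [
    ("<a1@example.invalid>", "holiday photos - IMG_0001.jpg (1/3)"),
    ("<a2@example.invalid>", "holiday photos - IMG_0001.jpg (2/3)"),
    ("<a3@example.invalid>", "holiday photos - IMG_0001.jpg (3/3)"),
    ("<b1@example.invalid>", "trip report"),           # ordinary single-part text article
]

PART_RE = re.compile(r"\((\d+)\s*/\s*(\d+)\)\s*$")     # trailing "(part/total)" marker

postings = defaultdict(dict)                           # base subject -> {part number: message-id}
totals = {}
for message_id, subject in overviews:
    m = PART_RE.search(subject)
    if m:
        part, total = int(m.group(1)), int(m.group(2))
        base = PART_RE.sub("", subject).rstrip()
    else:
        part, total, base = 1, 1, subject              # treat as a one-part posting
    postings[base][part] = message_id
    totals[base] = total

for base, parts in postings.items():
    status = "complete" if len(parts) == totals[base] else "incomplete"
    print(f"{base!r}: {len(parts)}/{totals[base]} parts ({status})")
```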
Each news article contains a complete set of header lines, but in common use the term "headers" is also applied to the News Overview database. The overview is a list of the most frequently used headers plus additional information such as article sizes, typically retrieved by client software using the NNTP XOVER command. Overviews make reading a newsgroup faster for both the client and the server by eliminating the need to open each individual article just to present a list.
If non-overview headers are required, such as when using a kill file, it may still be necessary to fall back to the slower method of reading the complete headers of every article. Many clients are unable to do this and limit filtering to what is available in the summaries.
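A minimal sketch of a client using the overview database, and filtering on it in the style of a kill file, might look like the following; the server name, newsgroup, and kill pattern are placeholders, and it uses Python's nntplib module, which was part of the standard library through Python 3.12.

```python
import re
from nntplib import NNTP

KILL_FROM = re.compile(r"spammer@example\.invalid", re.I)    # hypothetical kill-file rule

with NNTP("news.example.com") as server:                     # placeholder server
    resp, count, first, last, name = server.group("comp.lang.python")
    # OVER/XOVER returns one summary line per article instead of the full headers.
    resp, overviews = server.over((max(first, last - 99), last))
    for number, fields in overviews:
        if KILL_FROM.search(fields.get("from", "")):
            continue                                          # drop killed articles
        print(number, fields.get("subject", ""), fields.get(":bytes", "?"), "bytes")
```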
When the server stores the body of an article, it places it in a disk storage area generically called a "spool". There are several common ways in which the spool may be organized: as one file per article, as cyclic storage in which the newest articles overwrite the oldest, or as records in a database.
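One traditional organization keeps one file per article in a directory tree that mirrors the group name. The sketch below shows that mapping; the spool location is an assumption and real servers differ in naming details.

```python
from pathlib import Path

SPOOL_ROOT = Path("/var/spool/news/articles")    # assumed spool location

def article_path(group: str, number: int) -> Path:
    """Map a (newsgroup, article number) pair to a one-file-per-article spool path."""
    return SPOOL_ROOT.joinpath(*group.split(".")) / str(number)

print(article_path("comp.lang.python", 123456))
# /var/spool/news/articles/comp/lang/python/123456
```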
Speed, for the purpose of this article, is how quickly a server can deliver an article to the user. The server that the user connects to is typically part of a server farm that has many servers dedicated to multiple tasks. How fast the data can move in this farm is the first thing that affects the speed of delivery.
The speed of data travelling through the farm can be severely bottlenecked by hard drive operations, since retrieving article bodies and overview information places a heavy load on the drives. To combat this, caching technology and cyclic file storage systems have been developed.
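The idea behind cyclic storage can be sketched as a fixed-size buffer in which the newest articles overwrite the oldest, so the server never has to delete files individually. The toy version below keeps articles in memory and is not how any particular server lays data out on disk.

```python
class CyclicSpool:
    """Toy circular buffer: a fixed number of slots, oldest articles overwritten first."""

    def __init__(self, slots: int):
        self.slots = [None] * slots      # each slot holds (message_id, body) or None
        self.next = 0                    # index of the slot to overwrite next

    def store(self, message_id: str, body: bytes) -> None:
        self.slots[self.next] = (message_id, body)
        self.next = (self.next + 1) % len(self.slots)   # wrap around, evicting the oldest

    def find(self, message_id: str):
        for slot in self.slots:
            if slot and slot[0] == message_id:
                return slot[1]
        return None                      # already overwritten: the article has expired

spool = CyclicSpool(slots=3)
for i in range(5):
    spool.store(f"<{i}@example.invalid>", b"article body")
print(spool.find("<0@example.invalid>"))   # None: overwritten by newer traffic
print(spool.find("<4@example.invalid>"))   # b'article body'
```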
Once the farm is able to deliver the data to the network, then the provider has limited control over the speed to the user. Since the network path to each user is different, some users will have good routes and the data will flow quickly. Other users will have overloaded routers between them and the provider which will cause delays. About all a provider can do in that case is try moving the traffic through a different route. If the ISP has limited connectivity to the network, routing changes may have little effect.
Frequently a user can reduce the impact of network problems by using multiple connections. Some servers allow as many as 60 simultaneous connections, but this varies widely; newsreaders, conversely, are often limited to only two or four connections.
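A newsreader that uses several connections simply opens independent NNTP sessions and splits the work between them, roughly as in this sketch; the server, group, and connection count are placeholders, and nntplib is again assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from nntplib import NNTP, NNTPError

SERVER, GROUP, CONNECTIONS = "news.example.com", "alt.binaries.example", 4   # placeholders

def fetch(numbers):
    """Fetch a slice of article bodies over one dedicated connection."""
    sizes = []
    with NNTP(SERVER) as conn:
        conn.group(GROUP)
        for number in numbers:
            try:
                resp, info = conn.body(str(number))
            except NNTPError:
                continue                                 # article missing on this server
            sizes.append(sum(len(line) for line in info.lines))
    return sizes

# One short-lived connection just to learn the current article number range.
probe = NNTP(SERVER)
_, count, first, last, _ = probe.group(GROUP)
probe.quit()

wanted = list(range(max(first, last - 19), last + 1))    # the last 20 articles
chunks = [wanted[i::CONNECTIONS] for i in range(CONNECTIONS)]
with ThreadPoolExecutor(max_workers=CONNECTIONS) as pool:
    downloaded = sum(sum(sizes) for sizes in pool.map(fetch, chunks))
print(downloaded, "bytes fetched over", CONNECTIONS, "connections")
```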
Article sizes are limited to what the servers will accept. For text newsgroups this is generally not a problem. For binary newsgroups this can be a problem since the maximum article size varies from site to site.
The larger the articles, the fewer of them a server has to track for a given volume of traffic, which reduces per-article processing overhead and makes the server more efficient. However, larger articles also propagate to fewer servers, because the maximum accepted size varies from site to site. Most text servers accept articles only up to about 64 kB, while news.neva.ru stores articles as large as 1 MB.
Users frequently call their service a server, but in many cases this is far from accurate. While each service is different, a provider's server farm typically combines several roles: transit servers that exchange articles with peers, numbering servers that assign article numbers within each group, storage (back end) servers that hold the article spool, and front end servers to which readers connect. Roles can be mixed at a given site; for example, numbering and transit may be provided by the same system.
Large server farms typically also place load balancers between the front end servers and the network.
Retention is simply defined as how long the server keeps articles. Most users want retention to be long enough so that they don't need to access the server every day. Conversely, overly long retention can overwhelm users with slow computers or network connections by making the article lists inordinately large.
Retention is generally quoted separately for text and binary articles, though it also varies between different newsgroups within those categories. The times vary greatly according to the amount of storage available on the servers and the continually increasing traffic. As of 2009, it is common for average news providers to offer text retention of over 1000 days and binary retention of over 200 days, while large providers offer text retention of up to 2480 days and binary retention of 850 days or more. Omicron's HW Media currently has the deepest binary retention of any Usenet server, while Google has the deepest text retention.
It can be difficult for end users to accurately measure the retention of a server. One common method is to examine the oldest articles in a group and examine the Date: headers, but this is not always accurate. Some articles in a group may be retained for longer than others, articles from remote servers do not always arrive promptly, and at times the date headers are simply incorrect. A sampling of many or all articles, preferably in more than one newsgroup, is required to detect such anomalies.
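A rough version of that sampling approach, again assuming a placeholder server and group and using nntplib, reads the Date: headers of the oldest articles still present and reports their age; the result is only as trustworthy as the posters' clocks.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from nntplib import NNTP

with NNTP("news.example.com") as server:                 # placeholder server
    resp, count, first, last, name = server.group("alt.binaries.example")
    # Sample a batch of the oldest articles rather than trusting a single Date: header.
    resp, overviews = server.over((first, min(last, first + 199)))

ages = []
for number, fields in overviews:
    try:
        posted = parsedate_to_datetime(fields["date"])
        ages.append((datetime.now(timezone.utc) - posted).days)
    except (KeyError, ValueError, TypeError):
        continue                                          # malformed or missing Date: header

if ages:
    print(f"apparent retention: about {max(ages)} days "
          f"(median of sample: {sorted(ages)[len(ages) // 2]} days)")
```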
News servers do not have unlimited storage, so they can hold posts only for a limited time before deleting them to make room for new ones. This is a particular problem for binary newsgroups, which carry large volumes of articles.
For news servers provided by Internet Service Providers as part of a user's subscription package, retention is usually only 2–4 days. For premium news servers, this often rises to 6–12 months, and many commercial providers now offer 520 or more days of retention.
To deal with the increase in Usenet traffic, many providers turn to a hybrid system, in which requests for old articles not found on the provider's own servers are passed on to another server with deeper retention.
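In practice such a hybrid can be as simple as retrying a failed article fetch against a second, deeper-retention server, sketched here with placeholder host names and nntplib.

```python
from nntplib import NNTP, NNTPTemporaryError

PRIMARY, BACKFILL = "news.example.com", "deep-retention.example.net"   # placeholders

def fetch_article(message_id: str):
    """Try the provider's own spool first, then fall back to a deep-retention peer."""
    for host in (PRIMARY, BACKFILL):
        try:
            with NNTP(host) as server:
                resp, info = server.article(message_id)
                return info.lines                      # raw article lines as bytes
        except NNTPTemporaryError:
            continue                                   # 430 "no such article": try the next host
    return None

print(fetch_article("<example-id@posting.example.invalid>"))
```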
Given the large number of articles transferred between servers and the large size of individual articles, their complete propagation to any one server farm is not guaranteed. The term "completion" is used to describe how well a service is keeping up with the traffic.
The primary obstacle to calculating a completion percentage is knowing how many articles were posted in the first place. Looking at only one server, one cannot tell how many articles were actually inserted throughout the network: articles may never leave the originating server, or may fail to find their way out to the transit cloud. Very large articles are frequently dropped and tend to propagate less well than smaller ones.
One way to measure completion is to access multiple servers and retrieve lists of articles. Because Message-ID: headers are nominally unique throughout the network, comparison of the lists is mostly a straightforward task. Practical limitations to this type of measurement include the impossibility of obtaining lists from all servers worldwide, the fact that many servers filter out spam or employ Usenet Death Penalties, and that some servers mask incompletion by hiding multipart binary sets with missing articles. It is also necessary to take into account propagation times and retention; an article may simply have not yet arrived at a given server, or it may have been present but already expired.
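One crude form of this comparison, using Message-IDs from the overview data of two servers for the same group, might look like the sketch below; the host names and group are placeholders, and the caveats above about propagation delay and expiry still apply.

```python
from nntplib import NNTP

GROUP = "alt.binaries.example"                          # placeholder newsgroup

def message_ids(host: str) -> set:
    """Collect the Message-IDs a server currently carries for recent articles in GROUP."""
    with NNTP(host) as server:
        resp, count, first, last, name = server.group(GROUP)
        resp, overviews = server.over((max(first, last - 999), last))
    return {fields["message-id"] for number, fields in overviews}

a = message_ids("news1.example.com")                    # placeholder hosts
b = message_ids("news2.example.com")
combined = a | b
print(f"server 1 holds {len(a) / len(combined):.1%} of the combined view")
print(f"server 2 holds {len(b) / len(combined):.1%} of the combined view")
```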
All Usenet servers peer with one or more other servers in order to exchange articles. Occasionally, new servers appear. Although there are several web resources which may aid in finding peers, a better resource is the newsgroup news.admin.peering (Google Groups portal).
As of 2020, text feeds can usually be obtained for free, while full binary feeds can be free or paid (depending on how many articles each server sends to the other). Due to the large amount of data in a full binary+text Usenet feed (which can be as high as 30 terabytes a day) and the high cost of transmitting that data through an IP transit provider like Cogent, Telia, or Zayo, most Usenet providers will only engage in binary peering when they are interconnected at an Internet exchange like AMS-IX, SIX, or DeCIX.
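For a sense of scale, 30 terabytes per day corresponds to a sustained rate of roughly 2.8 gigabits per second, before allowing any headroom for peak hours:

```python
TB_PER_DAY = 30
bits = TB_PER_DAY * 1e12 * 8                             # decimal terabytes to bits
seconds = 24 * 60 * 60
print(f"{bits / seconds / 1e9:.2f} Gbit/s sustained")    # ~2.78 Gbit/s
```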