The software and computing industries change rapidly as new technologies and platforms are introduced.
The ecosystem of programming languages, methodologies and developers is highly complex, with tools and behaviours continually adapting to internal and external forces. Whereas once an individual might seek an answer in a text book or from an office colleague, it has become the norm to seek help online.
Question-and-answer websites such as Stack Overflow are widely used by programmers and researchers, forming a large repository of valuable knowledge related to the software development, computing, and data science industries. Software developers rely on such websites to acquire knowledge, solve problems, seek snippets of code for reuse, improve their own code, and discuss technical concepts. Stack Overflow also helps individuals gain visibility to establish professional standing and reputation.
Stack Overflow is written by many and read by many. To help ensure good content is easily visible, Stack Overflow provides a voting system where users that contribute high-quality answers or interesting questions are assigned positive votes and thematic tags.
The reputation score is also used to give privileged roles to high-ranked users, such as up-voting, editing and moderating the community.
These features ensure that content is of high quality, but also provide a rich resource for social analysis of the platform. The social dimensions to online programming platforms such as GitHub and Stack Overflow are an intrinsic part of their function.
Various studies have tried to understand social factors, for example, seeking to identify influential users [ 1 , 2 ] or understand their general properties as socio-technical systems [ 3 ]. The social component also provides useful information about wider trends in the software industry, with user activity reflecting the shifting popularity of different technologies.
Here we analyse the evolution of the Stack Overflow user community over a relatively long period. By tracking the usage of different tags by individual users, we are able to provide insight into the clusters of topics that are the focus of clusters of users, and observe trends in the adoption of new programming languages and technologies. It is reasonable to assume that trends on Stack Overflow, revealed by analysis of users, tags and various platform metrics, are reflective of wider trends in the software industry.
Thus we use Stack Overflow as a lens with which to study attention to different technologies, reveal technology clusters defined by the user groups that utilise them, and observe the movement of people between different technological clusters over time.
The core of our methodology is the construction of networks that link users to each other based on the tags that define their shared expertise. Within these networks we use community detection algorithms to identify sub-communities representing groups of users focused on particular technology clusters, using the set of tags associated with users to characterise each sub-community. By analysing a temporal sequence of such networks we are able to explore the concurrent evolution of the programming community and underpinning technologies over time.
We examine how the various sub-communities relate to each other and identify different technologies with common applications. The rise and fall of different technologies, revealed by the number of users who are interested in them and the way technologies are clustered, provides insight into the dynamics of the tech industry during a period of rapid change.
The next section describes some relevant Background, including a brief description of the operation of the Stack Overflow platform and some related work using similar data.
The Discussion section concludes the paper.
Stack Overflow questions are generally hard, requiring expertise and domain knowledge to provide a good answer. Content is heavily curated by the community. Duplicate or similar questions are quickly identified as such and merged with existing questions. Posts (questions or answers) that are unhelpful or irrelevant are removed. As a result of this self-regulation, content on Stack Overflow tends to be of high quality.
The quality of each post is collaboratively evaluated using a voting system. Each question or answer can receive up-votes or down-votes from users, with the sum of votes (up-votes minus down-votes) acting as its overall voting score. Reputation brings moderation privileges. Each user gains the ability to up-vote a post when their reputation score reaches 15 points, and the ability to down-vote a post at a higher reputation threshold.
In Table 1 we present all the available privileges on the platform, together with the reputation required for each one and the percentage of all users who possess them. Another key mechanism on the Stack Overflow site is the use of tags to identify the content or theme of each post.
When a user asks a question, the platform prompts them to add a small number of content tags (at least one and at most five). These studies can be loosely grouped into three categories: studies of network structure, studies of content, and studies of information retrieval. We cover these in turn below. Studies of network structure explore the relationships between entities, such as users, posts or tags, that are associated with question-and-answer websites.
Communities can be identified from network structure and analysed to detect key actors, as well as the main interests and typical behaviours of the users. Silvestri et al [ 4 ] describe a methodology for linking user accounts between platforms across Stack Overflow, Github and Twitter based on user attributes and platform specific services, examining different account matching strategies.
Beyer and Pinzger [ 6 ] introduce an approach to group tag synonyms to meaningful topics by creating networks, and investigating community detection algorithms to build meaningful groups of tags. Studies of content on online question-and-answer communities typically analyse the content and metadata of questions and answers. Calefato et al [ 11 ] investigate how Stack Overflow users can increase the chance of getting their answer accepted when writing an answer or making comments, finding that information presentation, timing and affect all have an impact on the success of a post.
They find that Europe and North America are the principal and roughly equal contributors, with Asia as a distant third (mainly India), followed by Oceania, which even in fourth position still contributes more than South America and Africa combined.
Vasilescu et al. find that Stack Overflow activity rates correlate with activity in GitHub. They identify query clarity, query-to-question match, and answer quality, as key factors in predicting searcher satisfaction. Xu et al [ 18 ] use Stack Overflow as their source of question-and-answer threads, achieving good results with an attention-based model that predicts which answer will be preferred by the user posting the original question.
Zhang et al [ 1 ] propose a methodology for duplicate question detection in question-and-answer communities, adopting a classification approach based on text vectorisation and neural networks. Our methodology uses interactions between users and tags on Stack Overflow to explore trends in software development and technology usage over time. The assumption is that the tags attached to posts by a user, and the reputation score they acquire from posts using those tags, form a profile for each user that defines their interests and expertise.
These profiles can be used to link pairs of users based on the similarity of their expertise. Pairwise links can be aggregated to form a network of users within which community structure reflects groupings of users and technologies. These networks can be studied over time to explore trends.
Then we describe the main parts of the network creation and analysis methods, including how each community was characterised based on its dominant tags. Overall, this analysis pipeline permits the Stack Overflow developer community to be studied over time and to thereby reveal trends in software development and technology usage. All the data were retrieved from the Archive. This is an anonymized dump of all user-contributed content on the Stack Exchange network. Each site is formatted as a separate archive consisting of XML files zipped via 7-zip using bzip2 compression and updated every three months.
All user content contributed to the Stack Exchange network is licensed under CC BY-SA 4.0. The acquisition, processing and presentation of these data fully comply with the terms and conditions of the Stack Exchange network.
The XML files were quite large. If an answer gets accepted the author of the question gets 5 score points and the author of the answer 15 points. If a question gets up-voted the author gets 10 points and if an answer gets up-voted the author gets 10 points. If a post gets down-voted the author gets -2 points, as also does the user who cast the down-vote.
The first step of our analysis was to parse the data files to group together posts that were created in each month. For some analyses, we used a monthly time unit, for others yearly, making a 1-month bin size suitable as input for both.
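The monthly binning described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each post is a dictionary carrying a `CreationDate` ISO-8601 timestamp (the attribute name used in the Stack Exchange dump), and the helper names `month_key`, `bin_posts_by_month` and `merge_to_years` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def month_key(creation_date: str) -> str:
    """Return a 'YYYY-MM' bin label from an ISO-8601 timestamp string."""
    return datetime.fromisoformat(creation_date).strftime("%Y-%m")


def bin_posts_by_month(posts):
    """Group post records (dicts with a 'CreationDate' field) into monthly bins."""
    bins = defaultdict(list)
    for post in posts:
        bins[month_key(post["CreationDate"])].append(post)
    return dict(bins)


def merge_to_years(monthly_bins):
    """Combine monthly bins into yearly bins, keeping chronological month order."""
    yearly = defaultdict(list)
    for month, posts in sorted(monthly_bins.items()):
        yearly[month[:4]].extend(posts)  # first four characters are the year
    return dict(yearly)
```

Because yearly bins are built by concatenating monthly ones, the same parsed output can feed both the monthly and the yearly analyses.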
Before user networks could be created, we first associated each user with a set of tags reflecting their interests and expertise. The up-votes each user receives on a post are also associated with the tags assigned to the post.
For example, if a user receives reputation on a post, the corresponding tags are assigned to the user as well, and each tag receives the same reputation.
This creates a ranking of related tags for each user. Our goal is to utilise those tags to form relations between users based on how similar their tags are. Those tags are based on the reputation score the users acquired from posts containing those tags. The data archive provides the list of tags associated with each question post, but does not provide this information for answer posts. Therefore, we annotated each answer post with the same tags as its parent question post, using the ParentId attribute to recover the corresponding question record.
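A possible way to propagate question tags to answers is sketched below. It is an illustration under stated assumptions rather than the pipeline actually used: it assumes a Posts XML file made of `<row>` elements with `Id`, `PostTypeId` (1 for questions, 2 for answers), `ParentId` and a `Tags` attribute encoded as `<tag1><tag2>`; the function names and the sample data are hypothetical. Streaming with `iterparse` is one way to cope with large files.

```python
import io
import re
import xml.etree.ElementTree as ET

TAG_PATTERN = re.compile(r"<([^<>]+)>")  # assumes tags are stored as "<a><b>"


def parse_tags(raw):
    """Split a raw Tags attribute such as '<python><pandas>' into a list."""
    return TAG_PATTERN.findall(raw or "")


def tag_posts(xml_source):
    """Stream <row> elements and return {post_id: tags}.

    Answers carry no tags of their own, so each answer inherits the tags
    of the question referenced by its ParentId.
    """
    question_tags = {}
    pending_answers = []  # (answer_id, parent_id) pairs resolved after the scan
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        post_id = elem.get("Id")
        if elem.get("PostTypeId") == "1":
            question_tags[post_id] = parse_tags(elem.get("Tags"))
        elif elem.get("PostTypeId") == "2":
            pending_answers.append((post_id, elem.get("ParentId")))
        elem.clear()  # drop the row's attributes once it has been read
    tagged = dict(question_tags)
    for answer_id, parent_id in pending_answers:
        # An answer whose parent is missing from the file gets no tags.
        tagged[answer_id] = list(question_tags.get(parent_id, []))
    return tagged


# Hypothetical sample data, for illustration only.
SAMPLE = (
    b'<posts>'
    b'<row Id="1" PostTypeId="1" Tags="&lt;python&gt;&lt;pandas&gt;" />'
    b'<row Id="2" PostTypeId="2" ParentId="1" />'
    b'<row Id="3" PostTypeId="2" ParentId="99" />'
    b'</posts>'
)

if __name__ == "__main__":
    print(tag_posts(io.BytesIO(SAMPLE)))
```

Answers are resolved in a second pass so that the result does not depend on whether a question appears before or after its answers in the file.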
At the end of processing, the product was a data dictionary for each month with the user IDs as keys and a ranked list of their top tags based on the reputation score they acquired. For our analysis, we created graphs where the nodes are the users and the edges between them represent how related two users are in their technology focus, based on how similar their tags are.
This way we get 0 for identical users and 2 for users with completely different tags. We convert the distance into similarity by subtracting it from two. Our similarity metric determines which tags best represent each user, and by comparing the users' vectors we can calculate how similar two users are.
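The per-month dictionary of ranked tags could be assembled roughly as follows. This is a sketch only: the input format (tuples of user ID, tag list and reputation gained), the function name `build_user_profiles` and the `top_n` cut-off are assumptions made for illustration, since the text does not state how many top tags are kept.

```python
from collections import defaultdict


def build_user_profiles(score_events, top_n=10):
    """Build {user_id: [(tag, score), ...]} ranked by accumulated reputation.

    score_events is an iterable of (user_id, tags, reputation) tuples for one
    month. Every tag on a post is credited with the full reputation amount.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for user_id, tags, reputation in score_events:
        for tag in tags:
            totals[user_id][tag] += reputation
    return {
        # Sort by descending score, breaking ties alphabetically for stability.
        user_id: sorted(tag_scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]
        for user_id, tag_scores in totals.items()
    }
```

For instance, a user who gains 10 points on a post tagged `python` and `pandas` and 15 points on a post tagged `python` ends up with `python` ranked above `pandas`.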
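The surviving text does not define the distance itself, so the sketch below substitutes one metric with the stated range as an assumption: the L1 (Manhattan) distance between tag-score vectors normalised to sum to one. For non-negative scores it gives 0 for identical profiles and 2 for profiles with no tags in common. The function names are hypothetical.

```python
def normalise(profile):
    """Scale a {tag: score} profile so its non-negative scores sum to 1."""
    total = sum(profile.values())
    return {tag: score / total for tag, score in profile.items()} if total else {}


def distance(profile_a, profile_b):
    """L1 distance between normalised profiles (an assumed metric, range 0 to 2)."""
    a, b = normalise(profile_a), normalise(profile_b)
    return sum(abs(a.get(tag, 0.0) - b.get(tag, 0.0)) for tag in set(a) | set(b))


def similarity(profile_a, profile_b):
    """Convert distance to similarity by subtracting it from two."""
    return 2.0 - distance(profile_a, profile_b)
```

Under this assumption, two users whose reputation is split across the same tags in the same proportions are maximally similar regardless of their absolute scores.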
The motivation behind this graph creation approach is to link users that have similar tags. Using this similarity-based edge creation approach we cluster users that have similar technology interests.
Alternatively, we might have linked users that interacted directly on the Stack Overflow platform. This kind of user interaction is subsumed within our similarity-based edges, since both questions and answers are associated with the same tags. Our approach additionally allows users with similar interests to be linked, even if they have never interacted directly; this aspect serves to provide a more coherent network structure.
We then check whether our methodology produces a meaningful structure, detect user communities and annotate them based on the tags of the users belonging to each community. Our priority here is to reveal which tags are grouped together in each community and, by annotating each community, to understand which technologies are used in each software development field.
The approach we took for dynamic community discovery was the Instant Optimal Community Detection approach [ 19 ], which considers that the detected communities on every time step (one year in our case) are independent.
This approach consists of two stages: the Identify stage, where communities are detected on each step of the evolution, and the Match stage, where the detected communities on each step are aligned with the communities of the other steps. We then applied the Infomap [ 20 ] community detection algorithm to identify community structure within these graphs. Infomap is a flow-based community detection algorithm built on the patterns of random walks among the nodes of a network.
The main intuition of this method is that a community can be defined as a group of nodes in which a random walker is likely to become trapped.
This concept can be treated as an increased flow circulation pattern between the nodes of the same community. We chose this algorithm based on its performance on networks with sizes around thousand nodes [ 21–23 ]. To characterise each community of users, we utilised their reputation scores on the tags that were used to create its edges.
This was done by summing the scores for tags shared by the users within each community. The summed scores were used to create a ranking of tags based on the total associated reputation score acquired by the community users. Using the top tags of each community ranking we are able to characterise the community by its core topics and technologies, and identify the part of the software industry that it represents. For each year, we detected a number of communities in the user-tag graph.
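The tag-based characterisation of a community can be illustrated with a short sketch. The data layout (a mapping from user ID to `{tag: score}`), the function name `characterise_community` and the `top_n` parameter are assumptions for illustration; the text does not say how many top tags were used.

```python
from collections import Counter


def characterise_community(member_ids, user_tag_scores, top_n=5):
    """Return the top tags of a community, ranked by summed member reputation.

    user_tag_scores maps each user ID to a {tag: reputation score} dictionary.
    """
    totals = Counter()
    for user_id in member_ids:
        totals.update(user_tag_scores.get(user_id, {}))  # adds scores per tag
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _score in ranked[:top_n]]
```

A community whose members score mostly on `python`, `numpy` and `pandas` would then be labelled by those tags, suggesting a data-science focus.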
Characterisation using tags suggested that many of the bigger communities were present in almost every year, while smaller communities tended to be more volatile, appearing and disappearing. For the bigger communities we wanted to investigate their consistency in terms of the users that constitute them in each year. Our intuition is that a persistent community should appear as a pair of communities in consecutive years that have a large number of users in common and a low number of users in common with any other community.
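One way to operationalise this intuition is sketched below. It is not the authors' matching procedure: the measure used here (the fraction of a community's surviving users that land in its best-matching successor), the `min_share` threshold and the function name `match_communities` are all assumptions, and communities within a year are assumed not to overlap.

```python
def match_communities(earlier, later, min_share=0.5):
    """Match communities across two consecutive years by shared users.

    earlier and later map community labels to sets of user IDs. Returns
    {earlier_label: (later_label, share)}, where share is the fraction of
    the earlier community's surviving users found in the matched community.
    """
    matches = {}
    for label_a, members_a in earlier.items():
        overlaps = {
            label_b: len(members_a & members_b)
            for label_b, members_b in later.items()
        }
        shared_total = sum(overlaps.values())
        if shared_total == 0:
            continue  # no member of this community is active in the later year
        best = max(overlaps, key=overlaps.get)
        share = overlaps[best] / shared_total
        if share >= min_share:
            matches[label_a] = (best, share)
    return matches
```

A share close to 1 means nearly all surviving users stayed together, while a low share means the community dispersed across several successors.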
Persistence was also assessed qualitatively by the similarity of dominant tags over time. Fig 1 displays the number of questions and answers posted on Stack Overflow for every month since August until December , as well as the number of active users, defined as those users that created a post (question or answer) or received scores from up-votes on a historic post.
The amount of posts increased from until , with more answers than questions perhaps to be expected since each question can have more than one answer. We observe a significant drop in the number of answers and questions posted after The number of active users making posts stabilises after while the number of users receiving up-votes continues to increase. During the same period, there is a decrease in the number of answers, while the number of questions reaches an equilibrium.
Plot displays the number of questions and answers posted (left axis), and the number of users creating posts or receiving score via up-votes (right axis).
Stack Overflow Meta is a forum where users discuss the workings and policies of Stack Overflow, rather than discussing programming itself. It is separated from the main question-and-answer site to reduce noise, while providing a legitimate space for people to ask how and why the Stack Overflow site works the way it does. A lot of questions more suitable for the Meta community were moved away from the main platform by the moderators, giving a plausible explanation for the significant drop. What is probably happening is that the community can handle only a limited number of questions each month.
If the number of questions exceeds a threshold (about thousand questions), the community becomes saturated and many questions go unanswered. This discourages users, especially new ones, from asking new questions and answering other people's questions, resulting in a decrease in answers and a stable number of questions.
The number of users creating new posts increases until and then seems to stabilise at around thousand users. The number of users receiving score has been increasing since and reaches about thousand users. This means that older posts continue to receive votes, so posts can remain useful to the community for a long period of time, even though the software development industry is rapidly evolving.
To observe trends in the popularity of different tools and technologies in the software industry, we tracked the accumulated monthly votes given by Stack Overflow users to posts labelled with three categories of tags. Each time a question or answer received a vote, we incremented the score for every tag associated with the question. Scores for each tag were aggregated over monthly time periods.