Background & Summary
Gryzzly (https://www.gryzzly.io/en/) is a time-tracking and project management software that integrates with popular communication platforms such as Slack and Microsoft Teams, allowing users (86% of whom work for French companies) to log work hours directly within their messaging platform. It does so by means of both a chatbot and an application that engage with users through daily interactions, prompting them to record their work hours and reminding them to ensure that all daily activities are recorded. In doing so, they collect various data points, including the amount of time spent per user on specific projects or tasks, the nature of the activities performed, and the allocation of resources across different projects. The data is aggregated in a dashboard for real-time information on project progress and budget use. Accurate time-tracking facilitates precise project planning and scheduling, reducing the probability of budget overruns and missed deadlines, which are known drivers of project failure (occurring when a project does not meet its defined objectives, exceeds its budget, or fails to adhere to its scheduled timeline)1.
Projects, or well-defined units of work with specific aims, are foundational drivers of economic growth, both in the public and private sectors, as the World Bank estimates that 22% of the world’s gross domestic product is attributed to project-based delivery mechanisms2. As the economy increasingly relies on large-scale projects to develop infrastructure, drive technological advancements, and fulfill social objectives, the successful completion of these projects becomes essential to maintaining progress3. However, despite their importance, projects tend to fail quite often, particularly in complex, high-stakes environments like IT and public infrastructure, where up to 98% of all projects exceed their schedules or their allotted budget2,4. Moreover, despite the increased attention project failure has attracted over time, some authors estimate that project execution delays remain at levels similar to those seen 15 years ago5, further emphasizing the need for new datasets and frameworks to better understand the mechanisms by which projects fail.
In light of this, recent research in the field of science of science6 has expanded our understanding of project failure dynamics by leveraging large datasets to analyze the factors that influence project success or failure. In a recent study7, Klug and Bagrow analyzed the team dynamics of 150,000 GitHub projects, showing that successful teams are often larger and also exhibit a more uneven workload distribution than unsuccessful ones, with most tasks performed by a core subset of contributors. Analyses of other datasets, such as the one provided by Schueller et al.8, have drawn similar conclusions. Rather than focusing solely on team dynamics, other authors have investigated how the structural composition of projects significantly contributes to failure. Santolini et al.9 analyzed 14 large-scale projects to demonstrate how dependencies within activity networks create vulnerabilities to cascading failures, where a delay in one task at the project’s foundation can propagate and ultimately disrupt the entire project. Moran et al.10 recently provided insights into this process, identifying a phase transition in a model of delay propagation in temporal networks, where insufficient temporal buffers lead to cascading delay avalanches, while sufficient buffers prevent disruption. In doing so, the authors demonstrated how just-in-time strategies, commonly used in supply chains and activity networks to enhance efficiency, exacerbate vulnerabilities to localized disruptions11,12. Lastly, rather than examining the relationship between project structure and failure, some authors have shifted their focus to studying the dynamics of failure itself. Yin et al.13 examined NIH grants, startup ventures, and terrorist attacks as case studies to explore these dynamics, illustrating how repeated attempts at success may lead to either progressive improvement or stagnation, depending on the underlying learning mechanisms.
Despite significant advancements in project management research, understanding the interplay between task dependencies, team dynamics, and failure patterns remains a challenge. Our dataset bridges this gap by capturing granular task-level and temporal data. Indeed, unlike previously discussed datasets, which typically support only isolated analyses of team dynamics due to a lack of detail on task failure7 or on user involvement in specific tasks9, our dataset is uniquely positioned to facilitate the combined study of project structure, team composition, and project failure. To support this, each project is modeled as a bipartite temporal network, where users collaborate in teams across a series of tasks that collectively form the project. Moreover, because task failure and project failure are decoupled, that is, a project can fail without all or any of its individual tasks failing and vice versa, our dataset provides a nuanced view of how task dependencies within the project structure influence its chances of failing. Given its size and time span, we believe this dataset is among the first of its kind to facilitate the identification of specific behaviors and work patterns associated with higher productivity and successful project outcomes, thereby supporting evidence-based best practices in project management. Additionally, the unique requirement for users to manually log their completed tasks in the system also creates a rich secondary layer of behavioral data that may reveal implicit patterns of user engagement and task completion strategies.
Methods
The time-tracking dataset introduced in this study was compiled from the Gryzzly time-tracking platform, a tool used by multiple companies to monitor and manage work on projects, tasks, and team activities. The dataset primarily consists of user-reported time declarations indicating the amount of time employees dedicated daily to specific tasks. Each record includes detailed information on user contributions to projects, including the specific tasks performed and how their workload was distributed among team members and projects.
Data collection began when a project was created on the Gryzzly platform, with users providing consent at registration through agreed terms and conditions specifying that the data could be processed for research purposes. Company administrators defined the project scope, listed the tasks required for its completion, set deadlines, allocated budgets, and assigned team members. Once established, tasks were assigned to users, who were further organized into teams to facilitate collaboration. Each team member had a designated role, with team leaders overseeing progress and productivity. Users interacted with Gryzzly via the web interface or chatbot to declare work hours for specific tasks. Figure 1 illustrates the user interface within Slack, highlighting the way declarations were submitted and organized.
Fig. 1 Gryzzly app. Interface of the Gryzzly bot within the Slack workspace of a customer. The bot interacts with users daily, prompting them to declare the tasks they have worked on. Users can input task details such as project, activity type, duration, and additional descriptions through a guided interface, facilitating accurate and structured activity tracking. The bot also provides shortcuts for repeating previous declarations to streamline the process.
Collected data underwent processing to ensure quality, privacy, and usability. Initially, the raw data contained some inconsistencies, including incomplete entries, aberrant declaration durations, and duplicates. These inconsistencies were addressed using automated scripts, ensuring that only high-quality data was retained in the final dataset. To protect the privacy of individual users and companies, the dataset underwent an anonymization process. All personally identifiable information (PII), including user names and specific company identifiers, was pseudonymized to safeguard sensitive data. The anonymization procedure was automatically applied when a company stopped using our software or when a research dataset was exported from our production database. Internal database identifiers were replaced with public identifiers suitable for sharing. These public identifiers were generated by hashing internal identifiers using SHA-256 and converting the first 16 bytes into a UUIDv4 format, ensuring stability across dataset versions. Furthermore, any sensitive information that could potentially enable the re-identification of individuals was redacted according to Gryzzly’s anonymization guidelines. This approach adhered to industry best practices for privacy protection and complied with regulations such as GDPR, ensuring that the dataset could not be used for unauthorized user or company identification.
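As an illustration of this step, the snippet below sketches the described hashing scheme (SHA-256 over an internal identifier, with the first 16 bytes of the digest rendered as a UUIDv4). It is a minimal reconstruction for readers, not the released export pipeline, and it omits any salting or key management that the production procedure may apply.

```python
import hashlib
import uuid

def pseudonymize(internal_id: str) -> str:
    """Derive a stable public identifier from an internal database identifier.

    Minimal sketch of the described scheme: hash the internal id with SHA-256
    and format the first 16 bytes of the digest as a UUIDv4 string.
    """
    digest = hashlib.sha256(internal_id.encode("utf-8")).digest()
    return str(uuid.UUID(bytes=digest[:16], version=4))

# The mapping is deterministic, so the same internal identifier always yields
# the same public identifier across dataset versions.
print(pseudonymize("user-42"))
```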
Data Records
The dataset is available through Figshare14 under the Creative Commons International License 4.0 (CC BY 4.0). It includes comprehensive records detailing users, tasks, projects, teams, and work activities, as visualized in Fig. 2, which illustrates the growth of our database over time along with the log-binned degree distribution within the Gryzzly network. Note that, in our bipartite interaction network, nodes represent entities, namely users, tasks, projects (defined by the tasks they encompass), and teams (defined by the users they include), while edges represent declarations from users to tasks. All node degree distributions exhibit heavy-tailed characteristics. This information is stored in multiple .csv files, each corresponding to specific data categories as outlined below:
- users.csv: This file contains anonymized identifiers for each of the 12,447 users (id), information about their team assignments (team_id), account creation (created_at) and deletion dates (deleted_at). It provides a basis for understanding individual participation in various projects, offering insights into user-level engagement and activity patterns.
- projects.csv and projects_computed.csv: These files include metadata for the 50,759 projects in the dataset, such as project IDs (id), project creation dates (created_at), initially allotted budgets in terms of planned hours (planned_duration), budget consumed so far (elapsed_duration), and associated teams (team_id). Note that when projects and tasks are created, they can be allocated an initial budget, either as a planned duration (in hours) or as a planned monetary cost (in euros). Our internal analysis reveals a strong correlation (R² = 0.894) between planned hours and planned cost, as well as between spent hours and spent costs. This relationship enables the analysis of budget consumption in terms of either time or money. Due to the sensitive nature of financial data, this dataset release includes only planned and spent hours as the primary variables for evaluating project management. Not all projects have allocated budgets; only 18,612 projects are subject to limited resource allocation. For these projects, success or failure is defined as follows: a project is deemed a failure if the consumed budget exceeds the initial allocation, while success requires staying within budget with no new declarations for three months after the last recorded declaration, a threshold beyond which the probability of new declarations drops below 5%, marking the project as complete. Projects that remain under budget but have received recent declarations within three months of the data collection end date (October 17, 2024) are not classified as successful, as they may still exceed their budget. This provides conclusive failure information for 16,032 projects (a minimal loading and classification sketch is shown after the file list). Figure 3 presents the typical evolution of one of these projects within our dataset, showing how declarations accumulate for each task over time and illustrating one of the many common cases where the project’s initial budget is exceeded.
- tasks.csv and tasks_computed.csv: These files describe the 173,323 tasks and include unique identifiers (id), task creation dates (created_at), associated project identifiers (project_id), task completion goals (planned_duration) and budget consumption at the time of the data export (elapsed_duration). Note that some tasks act as containers for subtasks (indicated by is_container), and those subtasks identify their parent container through the parent_id field. The tasks represent the subactivities that make up each project and serve as the fundamental building blocks that break down the overall goal of the project into manageable units of work. Like projects, 45,416 tasks contain budget information, and 40,947 can be categorized as failed or successful based on budget adherence, with their budget and cost determined by the planned and actual hours spent on completion.
- teams.csv and subscriptions.csv: These files provide details about the 2,839 companies employing the users, including their creation (created_at) and deletion dates (deleted_at) from the database, as well as subscription (subscribed_at) and churn dates (disabled_at). They also indicate whether the companies are still actively using our product (status) and whether they participated in a trial period, with the corresponding starting (trial_start_at) and ending (trial_end_at) dates. Additionally, the dataset includes the type of offer the company subscribed to (offer), which can be budget, time, or freemium, as described on our website. These files enable the analysis of team collaboration and productivity, offering information on the dynamics of teamwork and how team composition affects project outcomes.
- declarations.csv: This file provides details on the 4,446,670 declarations (identified by id) made by users across 811,924 user-days on project tasks, including information such as the user source (user_id), task target (task_id), declaration date (date), creation date (created_at, i.e., the date when the declaration was created in the database), duration (duration), and platform source of the declaration (source): either the Gryzzly app (app), the Gryzzly bot through Slack or Microsoft Teams (bot), the Gryzzly API (api), manual CSV imports (import), or other sources (papi).
Fig. 2 Network Statistics. (a) Cumulative number of teams, users, projects, and tasks in the Gryzzly dataset over time (2018-2025). (b) Log-binned probability distribution function (PDF) of the number of tasks, projects and teams per user. Note that each user belongs only to a single team (red curve collapsed). (c) PDF of the number of users, projects and teams per task. Note that each task belongs to a single project and a single team (red and green curves collapsed). (d) PDF of the number of users, tasks, and teams per project. Each project is linked to one team (red curve collapsed). (e) PDF of the number of users, tasks, and projects per team.
Fig. 3 Timeline of Project Evolution. The shared x-axis represents the timeline of a particular project, detailing the progression from January 2021 to July 2024. (a) The y-axis lists individual tasks within the project. Dots along each line indicate declarations made on a specific day for the corresponding task. Dots are colored blue if the total declared cost of the task at that point remains within the budget. Dots turn red once the budget for that task is surpassed. Tasks that did not exceed their budget or do not have an associated budget contain only blue dots. (b) The y-axis represents the cumulative number of hours declared for the entire project over time. The cumulative curve is also color-coded (blue before the overall project budget is exceeded, red thereafter), indicating periods of budget adherence and overrun.
These .csv files allow researchers to perform a detailed analysis on different aspects of workplace activities, such as individual productivity, team collaboration, project management, and financial expenditure.
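As a starting point, the sketch below shows how the tables fit together and how the success/failure rule described above can be applied with pandas. It is a minimal illustration that assumes the column names listed in this section and that projects without an allocated budget have an empty planned_duration; the exact layout should be checked against the Figshare release.

```python
import pandas as pd

EXPORT_DATE = pd.Timestamp("2024-10-17")          # end of data collection
INACTIVITY = pd.Timedelta(days=90)                # ~3-month completion threshold

projects = pd.read_csv("projects_computed.csv")   # id, planned_duration, elapsed_duration, ...
tasks = pd.read_csv("tasks.csv")                  # id, project_id, ...
declarations = pd.read_csv("declarations.csv", parse_dates=["date"])

# Date of the last declaration made on each project (declarations attach to tasks).
decl = declarations.merge(tasks[["id", "project_id"]],
                          left_on="task_id", right_on="id", suffixes=("", "_task"))
last_activity = decl.groupby("project_id")["date"].max()

# Assumption: a missing planned_duration marks a project without an allocated budget.
budgeted = projects.dropna(subset=["planned_duration"]).copy()
budgeted["last_activity"] = budgeted["id"].map(last_activity)

over_budget = budgeted["elapsed_duration"] > budgeted["planned_duration"]
inactive = budgeted["last_activity"] <= EXPORT_DATE - INACTIVITY

budgeted["outcome"] = "undetermined"              # within budget but still recently active
budgeted.loc[over_budget, "outcome"] = "failed"
budgeted.loc[~over_budget & inactive, "outcome"] = "successful"
print(budgeted["outcome"].value_counts())
```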
Technical Validation
In this section, we validate our dataset across multiple dimensions, demonstrating its reliability in capturing well-documented temporal and network patterns, project failures, and team dynamics, as well as its capacity to recover known correlations between team performance, structure, composition, and overall project outcomes. This systematic validation aims to provide researchers with confidence in the dataset’s capacity to support nuanced analyses of organizational behavior and project management, thereby contributing to the advancing field of the science of science6,15. We begin by evaluating the temporal properties of the dataset, including the dynamics of user declarations and project activities. We also validate the network structure, observing that it exhibits properties typical of real-world networks, such as the small-world and scale-free characteristics typically associated with activity networks9. Furthermore, we assess project failure rates, exploring how well our data aligns with known predictors of project outcomes13. Then, we examine the extent to which the dataset represents real-world project trajectories, including task completion and budget adherence3,4. Finally, we examine team dynamics to verify that the dataset accurately captures how known variations in team performance, structure and composition correlate with overall project performance7,16,17.
Temporal and Network Structure of Declarations
We first focus on the dataset’s ability to accurately display temporal and network structures. It effectively captures the circadian rhythms that characterize workplace productivity. As shown in Fig. 4a, the average share of declarations per hour of the day reveals distinct peaks in user activity, particularly during the morning and late afternoon hours. These activity patterns align with established observations of human productivity cycles18,19, where individuals exhibit peak engagement during early work hours and again later in the day as they work to complete their tasks.
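This hourly profile can be reproduced with a few lines of pandas, assuming that the created_at timestamps in declarations.csv carry a time-of-day component (the date field itself is recorded at day resolution):

```python
import pandas as pd

declarations = pd.read_csv("declarations.csv", parse_dates=["created_at"])

# Share (in percent) of declarations created at each hour of the day.
hourly_share = (declarations["created_at"].dt.hour
                .value_counts(normalize=True)
                .sort_index() * 100)
print(hourly_share)
```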
Fig. 4 Temporal Statistics. Statistical characteristics of the temporal data. (a) Daily circadian variation in the timing of declarations, showing the percentage of declarations made at each hour of the day with peaks in the early morning and afternoon. The solid line represents the average declaration density per hour, faint lines represent the declaration density per hour of the day for each year in the dataset. (b) Complementary cumulative distribution function (CCDF) of declaration durations for edges, users, tasks, projects and teams. (c) CCDF of inter-event times, depicting the time intervals between successive declarations for edges, users, tasks, projects and teams.
Further validation is achieved by analyzing the temporal dynamics of workplace interactions through the distribution of inter-declaration times (see Fig. 4c). This metric measures the time interval between consecutive declarations for each edge (i.e., a user-task pair), as well as for individual users, tasks, and projects. The observed scaling of inter-declaration times follows a heavy-tailed distribution, indicative of bursty activity patterns characterized by short, intense periods of activity followed by longer intervals of inactivity.
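The edge-level inter-event times can be computed directly from declarations.csv; the sketch below builds the gaps and the corresponding CCDF, and the same grouping can be repeated per user, task, project, or team (column names as described in the Data Records section):

```python
import numpy as np
import pandas as pd

declarations = pd.read_csv("declarations.csv", parse_dates=["date"])

# Inter-event times (in days) between consecutive declarations on each user-task edge.
ordered = declarations.sort_values("date")
gaps = (ordered.groupby(["user_id", "task_id"])["date"]
        .diff().dt.days.dropna())
gaps = gaps[gaps > 0]  # drop same-day repeats

# Complementary cumulative distribution function (CCDF) of the inter-event times.
values = np.sort(gaps.to_numpy())
ccdf = 1.0 - np.arange(1, values.size + 1) / values.size
```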
Additionally, the observed patterns of declaration durations (see Fig. 4b) and inter-declaration times mirror those found in other forms of human interaction, such as face-to-face interactions among children20 and telephone communications19. The heavy-tailed node degree distributions within the dataset’s network structure also align with those documented in network descriptions of similar processes.
Consistent Observations with Failure Dynamics
As previously noted, project failure is a pervasive issue, with estimated failure rates ranging from 50% to 98%, depending on the industry4. Our dataset reflects similar trends, with only 42% of tasks and 48% of projects classified as successfully completed. Previous studies on failure dynamics suggest that in processes involving iterative attempts, success often emerges through two primary mechanisms: chance, where one attempt in the sequence succeeds randomly, and learning, where each failure reduces the likelihood of future failure by providing insights into the process. Under the assumption that chance was the primary mechanism for success, Yin et al.13 demonstrated that if each attempt had a fixed probability of success, the likelihood of multiple consecutive failures would decrease exponentially with each trial. The researchers then disproved this theory by revealing that the chance hypothesis could not fully account for the failure dynamics observed experimentally. In fact, in sequences of failed attempts, or failure streaks, the penultimate attempt showed systematically better performance than the initial attempt, reinforcing the role of learning in achieving success through repeated failure. Despite this, the authors also argued that if learning were the primary driver of success, it should reduce the number of failures required and produce failure streaks following a narrower distribution than the exponential one predicted by the chance hypothesis. This, however, was rarely observed in real-world datasets, where the length of failure streaks typically followed a heavy-tailed distribution. This pattern indicated that, despite performance improvements, failures were often characterized by longer-than-expected streaks before success was achieved, suggesting that neither chance nor learning alone could fully explain the observed data. To assess whether these findings are reflected in our dataset, we analyze the distribution of failure streak lengths within it.
To evaluate our observations against predictions from a chance model, we replicate the methodology of Yin et al. by generating a randomized sequence of successes and failures, with each attempt randomly assigned to agents. This simulation reveals patterns consistent with the existing literature, as illustrated in Fig. 5. Specifically, the distribution of task and project failure streaks for both users and teams exhibits greater variability than would be expected if project success were determined solely by learning or random chance. We also examine the change in task performance from the first to the last attempt. To achieve this, we treat sequential tasks within projects as proxies for consecutive attempts by users or teams employing different management strategies. An attempt is considered failed if the corresponding task fails, while overall project failure is determined by the project’s failure status. Additionally, instead of categorizing performance as a binary outcome, we quantify task performance (S_t) and project performance (S_p) by evaluating the extent to which the actual cost (C_F) deviates from the initial cost estimate (C_0). For tasks, this is computed as:
$$S_t = \frac{C_F - C_0}{C_0}$$
As illustrated in Fig. 6a, successful projects generally exhibit improved performance from the first to the last attempt, unlike failed projects. This pattern mirrors findings reported in the literature13, thereby reinforcing the notion that both chance and learning contribute to the observed outcomes.
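The chance-hypothesis baseline used in Fig. 5 can be approximated by shuffling outcomes uniformly at random across attempts while preserving the overall failure rate, and then recomputing failure streak lengths. The sketch below assumes a hypothetical intermediate table of chronologically ordered task attempts per user with a boolean failed flag; it illustrates the procedure rather than reproducing the exact code used in the paper.

```python
import numpy as np
import pandas as pd

def failure_streaks(outcomes) -> list:
    """Lengths of consecutive runs of failures in a chronologically ordered sequence."""
    streaks, run = [], 0
    for failed in outcomes:
        if failed:
            run += 1
        elif run:
            streaks.append(run)
            run = 0
    if run:
        streaks.append(run)
    return streaks

# attempts: one row per task attempt, with columns user_id, date and a boolean
# 'failed' flag (a hypothetical intermediate table built from tasks_computed.csv
# and declarations.csv).
attempts = pd.read_csv("attempts.csv", parse_dates=["date"]).sort_values("date")

empirical = attempts.groupby("user_id")["failed"].apply(failure_streaks)

# Chance model: shuffle the same outcomes uniformly at random across attempts,
# preserving the global failure rate, and recompute the streak lengths.
shuffled = attempts.assign(failed=np.random.permutation(attempts["failed"].to_numpy()))
simulated = shuffled.groupby("user_id")["failed"].apply(failure_streaks)
```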
Fig. 5 Failure Streak Analysis. Chance hypothesis (hyp) versus empirical measurements (emp): The plots compare theoretical (lighter curves) predictions under the chance model for success and empirical (darker curves) measurements for the distribution of failure streak lengths. (a) shows the CCDF of failure streak length aggregated at the user level, while (b) shows the CCDF aggregated at the team level. Red lines represent task failure streaks, and blue lines represent project failure streaks. The observed distributions of failure streaks exhibit heavy-tailed behavior, indicating that real failure streaks tend to be significantly longer than what is expected under the chance hypothesis.
Fig. 6 Failure Dynamics. (a) Average task performance of the first and last tasks. Bar heights represent the mean values, and error bars indicate the standard error of the mean of task scores. Red bars denote failed projects, while blue bars denote successful ones. Successful projects demonstrate an improvement in task performance from the first to the last task, while failed projects do not exhibit such an improvement, highlighting the disparity in learning and adaptation between successful and unsuccessful endeavors. (b) Average inter-event time (T_n) between two failed tasks as a function of the number of tasks attempted. The ratio T_n ≡ t_n/t_1 measures the relative timing between consecutive failures. Dots represent mean values, and shaded areas indicate the standard error of the mean. The results show that successful projects (blue) tend to have shorter inter-event times between failures, indicating more rapid iterations compared to failed projects (red).
Finally, Yin et al. found that successful projects initiated new attempts more rapidly than unsuccessful ones, attributable to their greater efficiency and higher quality of efforts. This was demonstrated by analyzing the average inter-event time between consecutive failures (T_n) as a function of the number of failures (n). The authors showed that in successful groups, T_n decreased with increasing n, in contrast to the pattern observed in unsuccessful groups. In our dataset, we observe a similar trend: as shown in Fig. 6b, users involved in ultimately successful projects tend to shorten the time between failed tasks more than users involved in failed projects, particularly in the early stages of the project. These findings indicate that our dataset aligns well with the failure dynamics observed in recent studies, supporting the notion that successful projects are characterized by an adaptive reduction in the time between consecutive attempts following failure.
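Under the same assumptions as above, the relative timing statistic T_n ≡ t_n/t_1 shown in Fig. 6b can be sketched as follows, where failed_tasks.csv stands for a hypothetical intermediate table listing the failure date of each failed task per project:

```python
import pandas as pd

# failed_tasks: one row per failed task, with columns project_id and date
# (a hypothetical intermediate table derived from tasks_computed.csv and
# declarations.csv).
failed_tasks = pd.read_csv("failed_tasks.csv", parse_dates=["date"])

def relative_timing(dates: pd.Series) -> pd.Series:
    """T_n = t_n / t_1, where t_n is the time between consecutive failed tasks."""
    gaps = dates.sort_values().diff().dt.days.dropna().reset_index(drop=True)
    if gaps.empty or gaps.iloc[0] == 0:
        return pd.Series(dtype=float)
    return gaps / gaps.iloc[0]

t_n = failed_tasks.groupby("project_id")["date"].apply(relative_timing)
```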
Consistent Observations with Team Dynamics
Finally, we validate our dataset against existing research on the relationship between team structure and performance. Klug et al.7 found that successful teams were typically larger, concentrated their workload on a few key members, and comprised individuals with more diverse project experiences than unsuccessful ones. The “group size hypothesis”, which posits that larger teams enhance performance, has since been refined to account for factors such as task complexity16 and team hierarchy21. Our objective is to evaluate whether our dataset replicates these findings regarding team structure and its impact on project outcomes. To achieve this, we examine the correlations between various project metrics (namely project activity, team composition, and indicators of past team experience) and project failure by means of a logistic regression model applied to 5,409 projects with at least five individual days of activity. For each user u involved in a project P, we define S_u as the set of projects in which the user participated prior to starting work on P. In accordance with the methodology proposed by Klug et al.7, we estimate the experience E of a team of size n by calculating the average number of projects that team members have completed before starting work on P:
$$E=\frac{1}{n}\sum_{u\in P}|S_u| - 1$$
and the diversity D as the ratio of the number of distinct projects previously worked on by members of P to the total number of their prior project memberships:
$$D=\frac{\left|\bigcup_{u\in P} S_u\right|}{\sum_{u\in P}|S_u|}$$
We also define the project’s time span as the number of days elapsed since its creation. To address the skewed distributions of these variables, we apply a log-transformation to those with heavy-tailed distributions and standardize the input features using z-scores. Additionally, we remove extreme outliers by excluding data points below the 2.5% and above the 97.5% quantiles of the data. The results are presented in Fig. 7.
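A minimal sketch of this regression setup is shown below. It assumes a per-project feature table has already been assembled from the files described above (the column names used here are illustrative, not part of the released files) and follows the preprocessing described in the text: log-transformation of heavy-tailed variables, trimming of the 2.5%/97.5% tails, z-scoring, and a logistic fit with statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# features: one row per project with a binary 'failed' outcome (0/1) and the
# covariates described in the text (hypothetical intermediate table).
features = pd.read_csv("project_features.csv")

heavy_tailed = ["n_users", "n_tasks", "n_declarations", "time_span"]  # illustrative names
for col in heavy_tailed:
    features[col] = np.log1p(features[col])

covariates = heavy_tailed + ["experience", "diversity"]

# Trim extreme outliers, keeping the central 95% of each covariate.
mask = pd.Series(True, index=features.index)
for col in covariates:
    low, high = features[col].quantile([0.025, 0.975])
    mask &= features[col].between(low, high)
features = features[mask]

# Standardize the covariates (z-scores) and fit the logistic regression.
X = (features[covariates] - features[covariates].mean()) / features[covariates].std()
X = sm.add_constant(X)
model = sm.Logit(features["failed"], X).fit()
print(np.exp(model.params))  # odds ratios, as reported in Fig. 7
```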
Fig. 7 Independent Predictors of Project Failure. Independent predictors of project failure are derived from a multivariate logistic regression model. The analysis includes variables grouped across several key domains: team composition, past experience, and activity levels. Team composition includes the number of users and the number of tasks in the project, while past experience groups the team’s diversity and experience. Activity encompasses the number of individual declarations in the project, the average number of hours declared per day per task of the project, the time span of activity in the project, the average number of hours declared in the project per day, and the number of declared days in the project. Each variable is presented with its estimated odds ratio, corresponding 95% confidence interval, and p-value.
The model closely mirrors the observations made by Klug et al.7, demonstrating that increased team size (measured by the number of team members; coefficient: −0.202, p<0.001) and team diversity (coefficient: −0.209, p<0.001) are significantly associated with project success. Unsurprisingly, higher total work input, estimated by the number of individual declarations (coefficient: 0.799, p<0.001) and the average number of hours declared per day per task of the project (coefficient: 0.465, p<0.01), is associated with an increased likelihood of project failure, as these variables are strong indicators of project budget consumption. Interestingly, dividing a project’s goals into a greater number of tasks correlates with project success (coefficient: −0.157, p<0.005). This effect may result from an improved understanding of project complexity and the actual work required, leading to more effective project management, as evidenced by previous studies22,23. Team experience and other measures of project timelines, such as the time span of activity, the number of active days, and the average number of hours per declaration, do not show a significant association with project success (p>0.05). Therefore, the technical validation confirms that the Gryzzly dataset reliably captures expected patterns in user behavior, temporal collaboration dynamics, and project failure trends, ensuring its value for further analysis of work processes and productivity studies.
Usage Notes
To demonstrate how researchers can use our dataset, we provide a Jupyter notebook in the accompanying GitHub repository that shows how to load the dataset and reproduce the analyses conducted in this work. The dataset is structured to facilitate easy integration with commonly used data analysis tools, including popular Python libraries such as pandas and polars. Researchers can import the dataset and conduct exploratory data analysis, statistical modeling, and machine learning tasks to derive insights relevant to their specific research questions. Furthermore, the inclusion of temporal and budgetary data dimensions enables time-series analysis and performance modeling. Researchers are advised to focus on aggregated patterns to draw meaningful conclusions. Those aiming to correlate behavioral interventions with productivity metrics can take advantage of the failure variables to assess the impact of a project’s collaboration structure on its success.
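For instance, a monthly time series of declared work per team can be assembled in a few lines with polars. The sketch below is a minimal example assuming the column names described in the Data Records section, with the join from declarations to teams going through users.csv; the duration column is aggregated in whatever unit the export records.

```python
import polars as pl

declarations = pl.read_csv("declarations.csv", try_parse_dates=True)
users = pl.read_csv("users.csv")

# Total declared duration per team and per calendar month.
monthly_activity = (
    declarations
    .join(users.select(["id", "team_id"]), left_on="user_id", right_on="id")
    .group_by([
        "team_id",
        pl.col("date").dt.year().alias("year"),
        pl.col("date").dt.month().alias("month"),
    ])
    .agg(pl.col("duration").sum().alias("total_duration"))
    .sort(["team_id", "year", "month"])
)
print(monthly_activity.head())
```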
Code availability
Code developed for this research is made freely available for non-commercial uses under an MIT License and shared through an open repository at https://github.com/jaklevab/gryzzly. The software is written in Python. The repository includes comprehensive documentation to ensure reproducibility of the analyses. Dependencies are managed via a requirements.txt file, and installation instructions are provided in the README file. The dataset is openly accessible online at https://doi.org/10.6084/m9.figshare.28114247.v2. For further questions, please contact the corresponding author.
References
Hughes, D. L., Rana, N. P. & Simintiras, A. C. The changing landscape of is project failure: an examination of the key factors. J. Enterp. Inf. Manag. 30, 142–165 (2017).
Vazquez, A., Pozzana, I., Kalogridis, G. & Ellinas, C. Activity networks determine project performance. Sci. Rep. 13, 509 (2023).
Dick-Sagoe, C., Lee, K. Y., Odoom, D. & Boateng, P. O. Stakeholder perceptions on causes and effects of public project failures in Ghana. Humanit. Soc. Sci. Commun. 10, 1–9 (2023).
Balka, K., Breanna, H. & Risse-Tenk, S. Unlocking the potential of public-sector IT projects. McKinsey & Company https://www.mckinsey.com/industries/public-sector/our-insights/unlocking-the-potential-of-public-sector-it-projects (2022).
Park, J. E. Schedule delays of major projects: what should we do about it? Transp. Rev. 41, 814–832 (2021).
Fortunato, S. et al. Science of science. Science 359 (2018).
Klug, M. & Bagrow, J. P. Understanding the group dynamics and success of teams. Royal Soc. Open Sci. 3 (2016).
Schueller, W., Wachs, J., Servedio, V. D., Thurner, S. & Loreto, V. Evolving collaboration, dependencies, and use in the rust open source software ecosystem. Sci. Data 9, 703 (2022).
Santolini, M., Ellinas, C. & Nicolaides, C. Uncovering the fragility of large-scale engineering projects. EPJ Data Sci. 10, 36 (2021).
Moran, J. et al. Timeliness criticality in complex systems. Nat. Phys. 1–7 (2024).
Saramäki, J. Critical delay accumulation. Nat. Phys. 1–2 (2024).
Panja, T. How global supply chains became so vulnerable to local events. The Academic (2024). Accessed: 2025-01-17.
Yin, Y., Wang, Y., Evans, J. A. & Wang, D. Quantifying the dynamics of failure across science, startups and security. Nature 575, 190–194 (2019).
Levy Abitbol, J. & Arod, L. Gryzzly dataset, https://doi.org/10.6084/m9.figshare.28114247.v2 (2024).
Lazer, D. et al. Computational social science. Science 323, 721–723 (2009).
Almaatouq, A., Alsobay, M., Yin, M. & Watts, D. J. Task complexity moderates group synergy. Proc. Natl. Acad. Sci. 118 (36) (2021).
Muric, G., Abeliuk, A., Lerman, K. & Ferrara, E. Collaboration drives individual productivity. Proc. ACM Hum.-Comput. Interact. 3, 1–24 (2019).
Jo, H.-H. et al. Circadian pattern and burstiness in human communication activity. New J. Phys. 14 (2012).
Leo, Y., Busson, A., Sarraute, C. & Fleury, E. Call detail records to characterize usages and mobility events of phone users. Comput. Commun. 95, 43–53 (2016).
Dai, S. et al. Longitudinal data collection to follow social network and language development dynamics at preschool. Sci. Data 9, 777 (2022).
Xu, F., Wu, L. & Evans, J. Flat teams drive scientific innovation. Proc. Natl. Acad. Sci. 119 (2022).
Ellinas, C., Allan, N. & Johansson, A. Toward project complexity evaluation: A structural perspective. IEEE Syst. J. 12, 228–239 (2016).
Ellinas, C., Avraam, D. & Nicolaides, C. Neglecting complex network structures underestimates delays in a large-capital project. J. Physics: Complex. 4 (2023).
Acknowledgements
The authors thank the Gryzzly team members for supporting this research and for providing us with the opportunity to present this dataset.
Author information
Authors and Affiliations
Gryzzly, Lyon, France
Jacob Levy Abitbol & Louis Arod
Authors
- Jacob Levy Abitbol
- Louis Arod
Contributions
J.L.A. and L.A. conceived and designed the project. L.A. collected the dataset. J.L.A. performed the data aggregation, the validation of the data, and wrote the manuscript. All authors reviewed the manuscript. All authors have read and approved the submitted version.
Corresponding author
Correspondence to Jacob Levy Abitbol.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Abitbol, J.L., Arod, L. Seven years of time-tracking data capturing collaboration and failure dynamics: the Gryzzly dataset. Sci Data 12, 578 (2025). https://doi.org/10.1038/s41597-025-04903-2