// PROJECT CASE STUDY

ANONYMISED ACCESS DATA FORECASTING

Data Analyst Apprenticeship EPA

Locker Room Access Patterns & Forecasting

An anonymised analytics case study showing how access-control event data can reveal demand concentration, recurring usage patterns, and the strengths and limits of short-term forecasting for shared facilities.

Source: access-control event logs
Scope: two shared facilities
Tools: Python, Excel, ARIMA
Focus: pattern analysis + forecasting

Key Outcomes

59,834

Access events in the extended extract used for the main pattern analysis, covering 14 April 2024 to 14 October 2024.

32.4%

Of all events fell between 05:00 and 08:59, making the early-shift window the clearest operational pressure point.

371.8

Average Tuesday daily events in the extended extract, the highest weekday average in the source data.

ARIMA

Delivered lower MAE than SARIMA in both saved modelling passes, making it the more reliable baseline from the files provided.

Context

This work explored whether access-control events from shared locker room areas could be used as a practical proxy for operational demand. The goal was not to infer occupancy minute by minute, but to show where access clustered, how patterns repeated through the week, and whether forecasting daily volumes could support planning decisions.

Scope

The source files cover two iterations of the analysis: an initial combined extract with 12,858 events from 28 November 2023 to 31 January 2024, and a later combined extract with 59,834 events from 14 April 2024 to 14 October 2024. This page uses the extended extract for the main access-pattern visuals, while the notebooks provide the modelling approach and saved forecast metrics.

Tools & Techniques

The notebooks use a simple, maintainable Python workflow built around event cleaning, time aggregation, exploratory plotting, and hold-out forecasting tests on daily totals.

Python

Core environment for cleaning, aggregation, exploratory plotting, and time-series modelling.

Pandas

Used in the notebooks for column cleaning, date parsing, concatenation, and daily resampling.

Seaborn / Matplotlib

Used for histograms, weekday counts, box plots, daily trend charts, and model-output plots.

ARIMA

Applied as the main baseline model with an order of (5,1,0) on daily activity counts.

SARIMA

Tested with weekly seasonality using a seasonal order of (1,1,0,7) to check whether recurring weekly rhythm improved fit.

Prophet

Explored in the modified notebook for trend and seasonality review, with forecast and component plots saved in the notebook output.

Time-Series Forecasting

Daily event counts were split chronologically into training and test windows using an 80:20 hold-out approach.

Data Cleaning

Event files were combined, parsed, checked for validity flags, and prepared for repeatable descriptive analysis.

Timestamp Rounding

Raw access times were rounded down into 10-minute blocks to make pattern concentration easier to compare.
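As an illustration of the rounding step, the sketch below floors timestamps to 10-minute blocks with pandas. The column names and sample timestamps are hypothetical stand-ins, not values from the source extracts.

```python
import pandas as pd

# Hypothetical raw access times; the real extracts are not reproduced here.
events = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2024-04-15 06:58:41",
        "2024-04-15 07:03:12",
        "2024-04-15 07:09:59",
        "2024-04-15 19:10:00",
    ])
})

# Round each timestamp DOWN to the start of its 10-minute block.
events["block_10min"] = events["event_time"].dt.floor("10min")

# Count events per block so concentration can be compared across the day.
block_counts = events.groupby("block_10min").size()
print(block_counts)
```

Flooring rather than rounding to the nearest block keeps every event inside the interval in which it actually occurred.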

Visuals

The charts below are redrawn from anonymised aggregates in the workbook extracts and the saved notebook metrics. They preserve the analytical shape of the work without publishing room labels, names, card references, or raw event trails.

Anonymised aggregate

Access Activity by Hour of Day

The extended extract, with both rooms combined, shows a clear twin-peak profile, with the strongest concentration around 07:00 to 08:00 and a second surge around 19:00 to 20:00. Midday use remains active, but materially lower than the shift-edge peaks.
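An hour-of-day profile of this kind can be derived roughly as follows. This is a minimal sketch on made-up timestamps; the variable names are illustrative and the resulting shares are not the case-study figures.

```python
import pandas as pd

# Hypothetical event timestamps standing in for the anonymised log.
timestamps = pd.Series(pd.to_datetime([
    "2024-05-01 05:30", "2024-05-01 07:15", "2024-05-01 07:45",
    "2024-05-01 07:50", "2024-05-01 08:59", "2024-05-01 12:00",
    "2024-05-01 19:05", "2024-05-01 19:40", "2024-05-01 22:10",
    "2024-05-01 23:00",
]))

hours = timestamps.dt.hour

# Share of all events falling in each clock hour.
hourly_share = hours.value_counts(normalize=True).sort_index()

# Share of events in the 05:00 to 08:59 window (hours 5, 6, 7 and 8).
early_share = hours.between(5, 8).mean()

# Clock hour with the largest share of events.
busiest_hour = hourly_share.idxmax()

print(hourly_share)
print(f"Early-window share: {early_share:.1%}, busiest hour: {busiest_hour:02d}:00")
```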

Average Daily Activity by Weekday

Tuesday and Wednesday are the busiest average days in the extended extract. Weekend volumes are lower, but still substantial enough to matter for shared-facility planning.
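One way to produce a weekday average is to average the daily totals for each weekday, rather than counting raw events per weekday. The sketch below assumes a hypothetical two-week series of daily totals; the numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical daily totals for two full weeks, starting on a Monday.
daily = pd.Series(
    [300, 380, 360, 340, 320, 150, 140,
     310, 370, 350, 330, 310, 160, 130],
    index=pd.date_range("2024-04-15", periods=14, freq="D"),
)

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Mean of the daily totals for each weekday, in calendar order.
weekday_avg = daily.groupby(daily.index.day_name()).mean().reindex(weekday_order)

# Share of all events that fell on Saturday or Sunday.
weekend_share = daily[daily.index.dayofweek >= 5].sum() / daily.sum()

print(weekday_avg)
print(f"Weekend share of events: {weekend_share:.1%}")
```

Averaging daily totals avoids overstating a weekday that simply appears more often in the extract window.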

Daily Activity Trend Across the Extended Extract

Average daily activity increased materially from spring into late summer and early autumn. October remains high, although the source extract only runs to 14 October, so that month's average covers a partial month.
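A monthly view of average daily activity can be sketched as below, alongside a count of observed days per month so that partial months are visible. The dates and values are hypothetical.

```python
import pandas as pd

# Hypothetical daily totals spanning the end of one month and the start of the next.
daily = pd.Series(
    [190, 210, 240, 260, 250],
    index=pd.to_datetime([
        "2024-04-29", "2024-04-30",
        "2024-05-01", "2024-05-02", "2024-05-03",
    ]),
)

month = daily.index.to_period("M")

# Average of the daily totals within each calendar month.
monthly_avg = daily.groupby(month).mean()

# Number of days observed per month, to flag months that are only partly covered.
days_observed = daily.groupby(month).size()

print(pd.DataFrame({"avg_daily_events": monthly_avg, "days_observed": days_observed}))
```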

Notebook accuracy output

Forecast Model Comparison

The saved notebooks provide comparable error metrics for ARIMA and SARIMA. In both the shorter and longer extracts, the simpler ARIMA baseline produced lower MAE and RMSE than the tested SARIMA configuration. Prophet was fitted and plotted, but the saved output does not include matching error metrics, so it is treated here as exploratory rather than scored.

Constraints

Access events are not the same as occupancy, unique users, or dwell time. A person can scan multiple times in a day, and the source files only describe entry events for two rooms rather than end-to-end facility flow. The notebooks also model daily counts without linked rota, holiday, or closure variables, so the forecasts should be treated as directional rather than operationally complete.

Method

Combined the two room extracts into a shared time-series view while removing person-level and room-level detail from the portfolio output.
Parsed event dates and times, derived weekday labels, and rounded raw timestamps into 10-minute blocks.
Explored patterns by hour of day, weekday, and daily totals across both notebook iterations.
Resampled the event stream to daily counts for forecasting.
Tested ARIMA(5,1,0), SARIMA(5,1,0)x(1,1,0,7), and an exploratory Prophet model using an 80:20 chronological split.
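The combine, parse, and resample steps above can be sketched in pandas as follows. The two frames, their column name, and the timestamps are hypothetical; the real extracts carry additional fields that are not shown here.

```python
import pandas as pd

# Hypothetical stand-ins for the two room extracts.
room_a = pd.DataFrame({"event_time": [
    "2024-04-14 06:55:10", "2024-04-14 07:02:00", "2024-04-16 19:05:30",
]})
room_b = pd.DataFrame({"event_time": [
    "2024-04-14 07:10:45", "2024-04-16 07:20:00",
]})

# Combine both rooms into one event stream and parse the timestamps.
combined = pd.concat([room_a, room_b], ignore_index=True)
combined["event_time"] = pd.to_datetime(combined["event_time"])
combined = combined.sort_values("event_time")

# Derive a weekday label for descriptive analysis.
combined["weekday"] = combined["event_time"].dt.day_name()

# Resample to daily counts; days with no events appear as zero.
daily_counts = combined.set_index("event_time").resample("D").size()
print(daily_counts)
```

Resampling, rather than grouping by date, keeps empty days in the series as zeros, which matters for time-series models that expect a regular daily index.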

Findings

The strongest access concentration sat around shift-edge periods rather than being spread evenly through the day. The 05:00 to 08:59 window alone accounted for 32.4% of all events in the extended extract.
07:00 was the busiest single hour, representing 13.6% of all recorded events. A second recurring concentration appeared at 19:00.
Tuesday and Wednesday showed the highest average daily activity, while weekends were quieter but still accounted for 22.1% of total events.
Average daily volumes rose from 197 events in April to 421 in September, indicating that baseline demand was not static across the observation period.
97.45% of records in the extended extract were marked as valid events, with smaller invalid and void categories that would need separate operational treatment if used in live monitoring.
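The validity share in the last finding is a simple category proportion. The sketch below shows one way to compute it, assuming a hypothetical `status` column with labels `valid`, `invalid`, and `void`; the actual field name and labels in the extracts may differ.

```python
import pandas as pd

# Hypothetical status labels; real extract values may be named differently.
events = pd.DataFrame({"status": ["valid"] * 8 + ["invalid", "void"]})

# Proportion of records in each category.
status_share = events["status"].value_counts(normalize=True)

# Keep only valid events for demand measures, leaving exceptions for separate review.
valid_events = events[events["status"] == "valid"]

print(status_share)
print(f"Valid events retained: {len(valid_events)} of {len(events)}")
```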

Forecasting Approach

The notebooks convert the event stream into daily totals and hold back the final 20% of observations as a test window. ARIMA and SARIMA are both fitted on the training segment and evaluated against the hold-out period with MAE, RMSE, and MSE. Prophet is then fitted in the modified notebook for an additional trend and seasonality view, but the saved output does not publish equivalent error scores.

Forecasting Findings

The source material supports a cautious but clear forecasting story. The simpler non-seasonal ARIMA model performed better than the tested weekly-seasonal SARIMA configuration, while Prophet remained exploratory in the saved notebooks.

ARIMA Baseline

On the extended extract, ARIMA recorded MAE 52.5 and RMSE 71.4. On the earlier shorter extract, it recorded MAE 25.6 and RMSE 33.7.

Supported directly by saved notebook output.

SARIMA Test

The weekly seasonal specification underperformed in both saved runs, with MAE 89.7 and RMSE 106.3 on the extended extract, and MAE 30.1 and RMSE 39.6 on the shorter extract.

Useful as a comparison, but not the best performer from the files provided.

Prophet Exploration

The modified notebook fits a Prophet model and saves both forecast and component plots. Because no matching MAE or RMSE is printed, this page treats Prophet as explored rather than ranked.

Flagged explicitly to avoid overstating the evidence.

Potential Operational Opportunities

The case study suggests practical uses for access data beyond simple reporting, especially where a shared facility has repeatable surge periods and linked operational dependencies.

Target Peak Windows

The strongest demand sits around early-shift and evening windows, which gives a clearer basis for staffing, support cover, or cleaning availability.

Plan for Repeatable Weekday Pressure

Tuesdays and Wednesdays appear consistently heavier, so those days are the natural starting point for capacity checks or local timetable review.

Monitor Growth in Baseline Demand

The rise from spring into late summer suggests that static assumptions about facility pressure would miss real operational change.

Use Forecasts for Short-Term Readiness

Even a simple daily model can provide an early-warning baseline for unusual spikes when paired with local operational context.

Improve Shared-Facility Planning

Linked rota, calendar, or closure data would make it easier to move from descriptive analysis into more decision-ready forecasting.

Privacy and Anonymisation

The raw workbook extracts contain personal names, card-validation messages, room labels, workgroup labels, and card-related identifiers. None of that detail is reproduced here. This page removes or generalises personal and site-specific fields, combines the two source rooms into a shared anonymised view, and redraws charts from aggregated counts so the analytical structure remains representative without exposing sensitive operational detail.
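A minimal sketch of the drop-then-aggregate idea is shown below. The column names and values are hypothetical placeholders rather than the real field names, and the actual anonymisation process may involve further steps.

```python
import pandas as pd

# Hypothetical raw rows; names, IDs and labels here are invented placeholders.
raw = pd.DataFrame({
    "person_name": ["Person A", "Person B", "Person A"],
    "card_id": ["card-1", "card-2", "card-1"],
    "room_label": ["Room 1", "Room 2", "Room 1"],
    "event_time": pd.to_datetime([
        "2024-04-15 07:05", "2024-04-15 07:40", "2024-04-15 19:10",
    ]),
})

# Fields treated as sensitive in this sketch (an assumption, not the real schema).
SENSITIVE_COLUMNS = ["person_name", "card_id", "room_label"]

# Drop identifying fields, which also merges both rooms into one shared view.
safe = raw.drop(columns=SENSITIVE_COLUMNS)

# Publish only aggregated counts, here events per hour of day.
hourly_counts = safe["event_time"].dt.hour.value_counts().sort_index()
print(hourly_counts)
```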

Recommendations

Treat access data as a demand signal, then pair it with rota, calendar, and closure data before using forecasts for operational decisions.
Build routine monitoring around the recurring 07:00 to 08:00 and 19:00 to 20:00 peaks, where pressure is most concentrated.
Keep a simple ARIMA baseline in place for daily totals, and only add more complex seasonality once more explanatory operational inputs are available.
Separate valid, invalid, and void event categories in future reporting so exception activity does not distort demand measures.
Maintain publication-safe reporting through aggregation and anonymised redraws if the work is shared outside the operational team.