Feature Analysis of Beijing Subway Passenger Flow Data
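Feature analysis and extraction mine latent information from two perspectives and distinguish targets along two key dimensions, which together yield four categories of features.
First, from the station perspective, days of the same date type show similar passenger flow patterns, which raises the question of what drives this fluctuation. Second, from the user behavior perspective, users' travel trajectories are highly consistent; that is, passenger flow is stable.
In the time dimension, holidays and working days have different characteristics; in the spatial dimension, stations are classified by geographic location and functional attributes.
This leads to the following four typical types of passenger flow features:
- Basic features
- Time features (regular time-related effects)
- Historical passenger flow features (regular effects of past periods)
- Geographic information features (spatial classification)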
To construct features for machine-learning-based passenger flow prediction, it is necessary to examine how passenger flow changes and which factors influence it, starting from a thorough understanding of the passenger flow data. Deriving physically meaningful features from the raw, accessible information allows the subway passenger flow prediction model to use them effectively.
Feature analysis overview
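Overall, feature construction has no unified theory or standard, so it requires a deep understanding of the problem. Its purpose is to build features that are relevant to the prediction target. Because the constructed features form the candidate set for later feature selection, this stage should build as many relevant features as possible and should include the factors that significantly affect the prediction result, leaving enough room for the subsequent selection.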
(1) Basic features: general-purpose features used for passenger flow prediction under non-special conditions. The basic features in this paper follow those commonly used in current passenger flow prediction research and are chosen on the basis of experience, with emphasis on quantitative indicators relevant to practical application, complemented by qualitative assessment of the data.
(2) Time features: the commuting regularities of regular passenger flow, covering date types and peak periods. Date types are weekdays, weekends, and holidays; peak periods are divided into normal peaks, significant peaks, and special peaks.
(3) Historical passenger flow features: changes in passenger flow are sequentially related. The variation is a continuous process over time, and the current state reflects the preceding change.
(4) Geographic information features: the elements of geographic information that can influence passenger flow.
Basic feature
The basic features are built on the data produced by the earlier preprocessing step. This dataset contains the following variables: station ID (station identifier), day (date), hour (hour of the day), minute (minute within the hour), week (day of the week), time cut (cutoff time of the time slot), incoming numbers (inbound count), and outgoing numbers (outbound count). The incoming numbers variable is the number of inbound events within the time slot, and the outgoing numbers variable is the number of outbound events within the same slot.
(1) Basic data
The basic data comprise the station and the time components: week, day, hour, and minute.
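(2) Time period
The variable t is the index of the time period obtained by dividing a day into 10-minute periods. The target output of this paper is the predicted passenger flow in each period. A 10-minute period reflects how the flow changes over time while leaving station operations enough time to respond, so it is a suitable time granularity for passenger flow prediction; a day is thus divided into 144 time periods (24 × 60 / 10 = 144).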
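As a minimal sketch of the period index t, assuming that the 144 periods per day correspond to 10-minute slots, a timestamp can be mapped to its slot as follows. The names `time_slot` and `SLOT_MINUTES` are illustrative and do not come from the source.

```python
from datetime import datetime

SLOT_MINUTES = 10                        # assumed slot length (not stated as code in the source)
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 144 slots per day


def time_slot(ts: datetime) -> int:
    """Return the index t (0 to SLOTS_PER_DAY - 1) of the slot containing ts."""
    return (ts.hour * 60 + ts.minute) // SLOT_MINUTES
```

With this convention, 00:00 to 00:09 falls in slot 0 and 23:50 to 23:59 falls in slot 143.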

(3) Week
W(t) denotes the day of the week on which period t falls.

Time feature
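In models for predicting urban rail transit passenger flow, the flow is also affected by regular time factors. It is therefore necessary to analyze the time factors that influence regular passenger flow and to construct time features from them. The time features are built on the following: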
(1) Date type: Dates can be divided into three types: weekdays, weekends, and holidays (Bai et al., 2017). As Figure 1 shows, weekend passenger flow differs markedly from weekday passenger flow. Weekdays and weekends should therefore be treated as distinct features so that the extracted features match real-world observations more closely.

In Figure 1, Day 1 is a holiday. Passenger flow on holidays clearly differs from that on ordinary weekends and weekdays.
Therefore, weekdays, weekends, and holidays are all used as values of the date type feature.

Refer to Formula (10) for the definition. The date type assigns each day to one of three categories: working day, weekend, or holiday. The category is denoted by the feature variable F(t).
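The date type feature F(t) could be computed along the following lines. This is a sketch under stated assumptions: the integer codes and the holiday calendar are hypothetical, since the values used in Formula (10) are not reproduced here.

```python
from datetime import date

# Hypothetical encoding of F(t); the source does not give the actual codes.
WORKDAY, WEEKEND, HOLIDAY = 0, 1, 2

# Hypothetical holiday calendar; a real one must come from the official schedule.
HOLIDAYS = frozenset({date(2019, 1, 1)})


def date_type(d: date, holidays: frozenset = HOLIDAYS) -> int:
    """Classify a day as holiday, weekend, or working day (holiday takes priority)."""
    if d in holidays:
        return HOLIDAY
    if d.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return WEEKEND
    return WORKDAY
```

Checking the holiday set first means that a holiday falling on a weekend is still labeled as a holiday, which matches the observation that holiday flow differs from ordinary weekend flow.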

(2) Peak time
Passenger flow within a typical day fluctuates with distinct peaks and troughs. Figure 1 shows two prominent peaks each day: one in the morning and one in the late afternoon or evening. A smaller fluctuation is also noticeable between these two main peaks, which this paper classifies as notable.
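Because the exact peak hours are not listed here, one way to locate them is to read them off the daily profile itself. The sketch below finds the strongest local maxima in a list of per-slot counts; the function name, the `top_k` and `min_gap` parameters, and the local-maximum rule are illustrative assumptions rather than the paper's method.

```python
from typing import List


def find_peaks(flow: List[int], top_k: int = 2, min_gap: int = 6) -> List[int]:
    """Return up to top_k local maxima of a daily profile, strongest first.

    min_gap is the minimum distance, in slots, between two reported peaks.
    """
    # Interior slots that rise from the left and do not rise to the right.
    candidates = [
        i for i in range(1, len(flow) - 1)
        if flow[i] > flow[i - 1] and flow[i] >= flow[i + 1]
    ]
    candidates.sort(key=lambda i: flow[i], reverse=True)

    peaks: List[int] = []
    for i in candidates:
        # Keep a candidate only if it is far enough from every accepted peak.
        if all(abs(i - p) >= min_gap for p in peaks):
            peaks.append(i)
        if len(peaks) == top_k:
            break
    return peaks
```

Asking for `top_k=3` would also surface a smaller fluctuation between the morning and evening peaks when it is separated from them by at least `min_gap` slots.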

Historical passenger flow feature
The historical passenger flow feature is a kind of time-window feature. It is a higher-level feature because it is closely related to the values observed over a preceding period. For sequential data, it is essential to construct aggregated values over short windows. By sliding the window along the series, features are produced step by step for each time point.
(1) Passenger flow in adjacent periods: passenger flow in a period is temporally correlated with the periods immediately before and after it. The two preceding and the two following periods are therefore selected to represent this pattern, giving the features inNums_before1, inNums_before2, inNums_after1, inNums_after2, outNums_before1, outNums_before2, outNums_after1, and outNums_after2.
(2) Passenger flow on adjacent days: the maximum, mean, and minimum of the adjacent day's passenger flow carry characteristic information (Liu et al., 2017). Similarly, the maximum, mean, and minimum of inbound and outbound flow within the same week are included as features.
(3) Passenger flow in the previous week: the change and distribution of passenger flow can also be observed from the inbound and outbound flow at the same station in the same time period of the previous week (Luo, 2017).
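The historical features described above can be sketched as simple lookups on a flow table. The table layout (a dictionary keyed by station, integer day index, and slot), the helper names, and the `_prev_day_*` feature names are assumptions made for illustration; only the `inNums_*`/`outNums_*` before/after names come from the text.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

Key = Tuple[int, int, int]         # (station_id, day, slot); assumed layout
Flow = Dict[Key, Tuple[int, int]]  # value is (inNums, outNums)


def adjacent_period_features(flow: Flow, station: int, day: int,
                             slot: int) -> Dict[str, Optional[int]]:
    """Counts of the two slots before and the two slots after `slot`."""
    feats: Dict[str, Optional[int]] = {}
    offsets = (("before1", -1), ("before2", -2), ("after1", 1), ("after2", 2))
    for name, offset in offsets:
        record = flow.get((station, day, slot + offset))  # None if missing
        feats[f"inNums_{name}"] = record[0] if record is not None else None
        feats[f"outNums_{name}"] = record[1] if record is not None else None
    return feats


def previous_day_stats(flow: Flow, station: int, day: int) -> Dict[str, float]:
    """Maximum, mean, and minimum of the previous day's counts at a station."""
    records = [v for (s, d, _), v in flow.items()
               if s == station and d == day - 1]
    feats: Dict[str, float] = {}
    if not records:
        return feats
    for idx, name in enumerate(("inNums", "outNums")):
        values = [r[idx] for r in records]
        feats[f"{name}_prev_day_max"] = max(values)
        feats[f"{name}_prev_day_mean"] = mean(values)
        feats[f"{name}_prev_day_min"] = min(values)
    return feats


def previous_week_same_slot(flow: Flow, station: int, day: int,
                            slot: int) -> Optional[Tuple[int, int]]:
    """Counts for the same station and slot seven days earlier, if present."""
    return flow.get((station, day - 7, slot))
```

Missing neighbours, for example the slots before the first slot of a day, are returned as `None` so that the caller can decide how to impute them.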
Geographic information feature
The geographic information features mainly cover station type, station attributes, station equipment, and passenger flow stability. They are designed to take into account each station's geographic location, its functional role, and the stability of its passenger flow.
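(1) Station type
Stations differ in type; this classification distinguishes terminal stations, transfer stations, and ordinary stations. The number of adjacent stations serves as the quantitative indicator of station type. The number of neighbours of each station can be counted from the network map and used to assign the station type: a terminal station has only one adjacent station, an ordinary station has two, and a transfer station has more than two.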
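The neighbour-count rule above can be sketched as follows, assuming the network map is available as a list of undirected station-to-station links. The function name, the edge-list input, and the English type labels are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def station_types(edges: Iterable[Tuple[int, int]]) -> Dict[int, str]:
    """Label each station by its number of distinct adjacent stations."""
    neighbours: Dict[int, Set[int]] = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    types: Dict[int, str] = {}
    for station, adjacent in neighbours.items():
        if len(adjacent) == 1:
            types[station] = "terminal"   # one adjacent station
        elif len(adjacent) == 2:
            types[station] = "ordinary"   # two adjacent stations
        else:
            types[station] = "transfer"   # more than two adjacent stations
    return types
```

For two lines 1-2-3-4 and 5-3-6 that cross at station 3, the function labels station 3 as a transfer station, station 2 as ordinary, and stations 1, 4, 5, and 6 as terminal.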
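(2) Station attributes
Each station also has its own distinctive characteristics. Depending on these, stations can be divided into commercial, work, and residential stations. As Figure 17 shows, passenger flow patterns differ markedly between stations. At some stations, weekend passenger flow is significantly higher than weekday passenger flow, so these stations can be classified as commercial.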


Figure 1 shows that some subway stations have significantly higher passenger flow on weekdays than on weekends; these stations can be classified as residential or work stations. Beyond the weekday/weekend difference, it is also worth examining how inbound and outbound flow are distributed over the day. As shown in Figures 2 and 3, some stations have pronounced inbound flow in the early morning and pronounced outbound flow in the late evening; these are classified as work areas. Conversely (Ni, 2016), stations with pronounced outbound flow in the morning and pronounced inbound flow at night are classified as residential areas.
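A rough sketch of these station attribute indicators is given below. It computes the weekend-to-weekday flow ratio and the inbound share of a period; the threshold `COMMERCIAL_RATIO`, the function names, and the coarse labels are hypothetical, since no numeric cut-offs are given in the text.

```python
from statistics import mean
from typing import Dict, Set

COMMERCIAL_RATIO = 1.2  # hypothetical threshold; the source gives no cut-off


def weekend_weekday_ratio(daily_total: Dict[int, int],
                          weekend_days: Set[int]) -> float:
    """Mean daily flow on weekend days divided by mean daily flow on weekdays."""
    weekend = [v for d, v in daily_total.items() if d in weekend_days]
    weekday = [v for d, v in daily_total.items() if d not in weekend_days]
    return mean(weekend) / mean(weekday)


def inbound_share(in_nums: int, out_nums: int) -> float:
    """Share of inbound passengers in a period; 0.5 means balanced flow."""
    total = in_nums + out_nums
    return in_nums / total if total else 0.5


def coarse_attribute(ratio: float) -> str:
    """Separate commercial stations from the work/residential group."""
    return "commercial" if ratio > COMMERCIAL_RATIO else "work_or_residential"
```

The inbound share of the morning and evening periods can then be used to split the work/residential group according to the rule stated in the text.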


(3) Station equipment
Regarding the card-swipe devices, the number of devices at a station is related to the passenger flow at that station, and the number of devices in use during a given time period is likewise roughly proportional to the passenger flow.
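(4) Passenger flow stability
Passengers who both enter and exit at the same station on the same day account for a high proportion of that station's daily entries. This proportion is relatively stable at each station, reaches 70% at some stations, and fluctuates fairly regularly from day to day. It can be inferred that residents living nearby are the main group among rail transit users; that is, most of the people entering and exiting at the same station each day belong to the same group. The indicator "passenger flow stability" is used to measure this property.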
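One possible reading of this indicator is the share of a station's entering users who also exit at the same station on the same day. The sketch below computes it from one day of swipe records; the record layout `(user_id, station_id, status)` and the status codes (1 for entry, 0 for exit) are assumptions, and the exact definition used in the paper may differ.

```python
from typing import Iterable, Tuple

ENTRY, EXIT = 1, 0  # assumed status codes for a swipe record


def passenger_flow_stability(records: Iterable[Tuple[str, int, int]],
                             station: int) -> float:
    """Share of users entering `station` who also exit there the same day.

    `records` holds one day of (user_id, station_id, status) swipes.
    """
    records = list(records)  # allow generators; the data is scanned twice
    entered = {u for u, s, status in records if s == station and status == ENTRY}
    exited = {u for u, s, status in records if s == station and status == EXIT}
    if not entered:
        return 0.0
    return len(entered & exited) / len(entered)
```

A value of 0.7 would correspond to the 70% level mentioned above.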

Summary
In feature analysis and extraction, we extract features from two perspectives and distinguish targets along two dimensions, which yields four types of features.
The two perspectives are the station and the user. From the station perspective, days of the same type show similar passenger flow patterns, which raises the question of what drives these variations. From the user behavior perspective, most users have consistent travel patterns; in other words, passenger flow is stable. The two dimensions are time and space. In the time dimension, holidays and peak hours shape the passenger flow pattern; in the spatial dimension, geographic location and functional attributes shape its distribution. Combining these perspectives and dimensions, we classify the features into four types: basic features, which cover the fundamental data; time features, which reflect temporal regularities; historical passenger flow features, which capture past trends; and geographic information features, which capture spatial differences between stations.
