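Overview: This article walks through a Python implementation of multi-factor quantitative stock selection, covering factor design, data processing, model construction, and backtest optimization. It provides a reusable code framework along with suggestions for improving the strategy.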
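Multi-factor quantitative stock selection builds a model from several fundamental, market, or technical indicators (factors) and uses it to pick stocks whose expected return exceeds that of the market portfolio. The core logic is a quantitative decomposition of risk and return: a stock's excess return can be explained by its exposure to a specific combination of factors. For example, the Fama-French three-factor model (market, size, and value factors) documented that small-cap stocks and stocks with a high book-to-market (BM) ratio have historically earned higher average returns than the broad market.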
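To make the "excess return explained by factors" idea concrete, the three-factor model is usually written as the regression R_i - R_f = α + β_m (R_m - R_f) + β_s SMB + β_h HML + ε. The sketch below estimates those loadings by ordinary least squares. It runs on synthetic data: the factor returns, `true_betas`, and `true_alpha` are made-up values chosen for illustration, not real market figures.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 2000

# Synthetic daily factor returns (hypothetical, NOT real market data).
# Columns: market excess return, SMB (size), HML (value).
factors = rng.normal(0.0, 0.01, size=(n_days, 3))
true_betas = np.array([1.1, 0.4, 0.3])   # assumed factor loadings
true_alpha = 0.0002                      # assumed daily alpha
noise = rng.normal(0.0, 0.005, size=n_days)

# Stock excess return generated by the linear factor model
excess_returns = true_alpha + factors @ true_betas + noise

# OLS: prepend an intercept column and solve the least-squares problem
X = np.column_stack([np.ones(n_days), factors])
coef, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]

print("estimated alpha:", alpha_hat)
print("estimated betas:", betas_hat)
```

With enough observations the estimated betas land close to the assumed ones. On real data, the intercept is the part of the return that the factors do not explain.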
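```python
import pandas as pd
import numpy as np
import yfinance as yf  # example data source; Tushare, Wind, etc. can be used in practice

# Fetch price data (example: CSI 300 constituents)
def fetch_stock_data(tickers, start_date, end_date):
    frames = []
    for ticker in tickers:
        df = yf.download(ticker, start=start_date, end=end_date)
        # Some yfinance versions return MultiIndex columns; keep the field names only
        if isinstance(df.columns, pd.MultiIndex):
            df.columns = df.columns.get_level_values(0)
        df = df.reset_index()  # expose 'Date' as a column for later merges
        df['Ticker'] = ticker
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# Compute a fundamental factor (example: price-to-earnings ratio, PE)
def calculate_pe_ratio(price_data, fundamental_data):
    merged = pd.merge(price_data, fundamental_data, on=['Ticker', 'Date'])
    merged['PE'] = merged['MarketCap'] / merged['NetIncome']  # simplified calculation
    # PE is not meaningful for loss-making or zero-profit companies
    merged.loc[merged['NetIncome'] <= 0, 'PE'] = np.nan
    return merged
```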
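```python
# Example: momentum factor (return over roughly the past 6 months)
def calculate_momentum(price_data, window=120):
    # Rows must be in date order within each ticker for shift() to be meaningful
    price_data = price_data.sort_values(['Ticker', 'Date']).copy()
    price_data['Momentum'] = price_data.groupby('Ticker')['Close'].transform(
        lambda x: x.shift(1) / x.shift(window + 1) - 1  # skips the latest day
    )
    return price_data

# Factor standardization (cross-sectional z-score on each date)
def standardize_factors(df, factors):
    for factor in factors:
        by_date = df.groupby('Date')[factor]
        df[f'{factor}_Z'] = (df[factor] - by_date.transform('mean')) / by_date.transform('std')
    return df
```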
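Whether the z-score is computed over the whole panel or within each date changes the result noticeably. The following sketch uses made-up momentum values for three hypothetical tickers on two dates to contrast the two: a pooled z-score mostly reflects which date a row belongs to, while a per-date (cross-sectional) z-score compares stocks with their peers on the same day.

```python
import pandas as pd

# Made-up panel: three hypothetical tickers on two dates
df = pd.DataFrame({
    'Date': ['2024-01-02'] * 3 + ['2024-01-03'] * 3,
    'Ticker': ['A', 'B', 'C'] * 2,
    'Momentum': [0.10, 0.20, 0.30, 1.10, 1.20, 1.30],
})

# Pooled z-score: one mean and one standard deviation for the whole panel
pooled_z = (df['Momentum'] - df['Momentum'].mean()) / df['Momentum'].std()

# Cross-sectional z-score: mean and standard deviation computed per date
by_date = df.groupby('Date')['Momentum']
cross_z = (df['Momentum'] - by_date.transform('mean')) / by_date.transform('std')

print(df.assign(Pooled_Z=pooled_z.round(2), Cross_Z=cross_z.round(2)))
```

In the pooled version every stock on the first date scores below zero simply because the second date has higher values overall. In the per-date version the ordering within each day is what drives the score.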
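```python
# Equal-weighted composite factor score
def composite_factor_score(df, factors, weights=None):
    if weights is None:
        weights = {f: 1 / len(factors) for f in factors}
    # Factors where a lower value is better (e.g. PE) need a negative weight
    df['Composite_Score'] = sum(df[f'{f}_Z'] * weights[f] for f in factors)
    return df.sort_values('Composite_Score', ascending=False)

# Select the top 20 stocks on each date
def select_top_stocks(df, n=20):
    ranked = df.sort_values(['Date', 'Composite_Score'], ascending=[True, False])
    return ranked.groupby('Date').head(n)
```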
```python
# Backtest: compute portfolio returns and summary metrics
def backtest_portfolio(selected_stocks, benchmark_returns=None):
    # Assumes a 'Next_Return' column holding each stock's return over the
    # day after selection; the data frames built earlier do not define it.
    # Equal weighting: the portfolio return is the mean across holdings.
    portfolio_returns = selected_stocks.groupby('Date')['Next_Return'].mean()
    nav = (1 + portfolio_returns).cumprod()  # net asset value curve
    annualized_return = (1 + portfolio_returns.mean()) ** 252 - 1
    sharpe_ratio = portfolio_returns.mean() / portfolio_returns.std() * np.sqrt(252)
    result = {
        'Annualized_Return': annualized_return,
        'Sharpe_Ratio': sharpe_ratio,
        'Max_Drawdown': (1 - nav / nav.cummax()).max(),  # (peak - trough) / peak
    }
    if benchmark_returns is not None:
        # Annualized return in excess of the benchmark over the same dates
        bench = benchmark_returns.reindex(portfolio_returns.index)
        result['Excess_Return'] = annualized_return - ((1 + bench.mean()) ** 252 - 1)
    return result
```
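Maximum drawdown is the largest peak-to-trough decline of the portfolio's net value curve, so it has to be measured on compounded value against a running peak. A minimal worked example, using four made-up daily returns:

```python
import pandas as pd

# Four made-up daily returns
daily_returns = pd.Series([0.10, -0.20, 0.05, 0.10])

nav = (1 + daily_returns).cumprod()   # net value: 1.10, 0.88, 0.924, 1.0164
running_peak = nav.cummax()           # highest value reached so far
drawdown = 1 - nav / running_peak     # fractional decline from that peak
max_drawdown = drawdown.max()

print(round(max_drawdown, 4))  # 0.2
```

The peak of 1.10 is followed by a trough of 0.88, so the maximum drawdown is (1.10 - 0.88) / 1.10 = 20%.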
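```python
from sklearn.ensemble import GradientBoostingRegressor

def train_factor_model(X, y):
    model = GradientBoostingRegressor(n_estimators=100)
    model.fit(X, y)  # y is the return over the following month
    return model.feature_importances_  # used as factor weights
```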
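As a rough illustration of what such importance-based weights look like, the sketch below fits a `GradientBoostingRegressor` on synthetic data. The factor names and the coefficients used to generate the target are assumptions made for this example, not results from real stocks.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n_samples = 500
factor_names = ['Momentum_Z', 'PE_Z', 'Noise_Z']  # hypothetical factor columns

# Synthetic standardized factor exposures
X = rng.normal(size=(n_samples, 3))
# Synthetic target: driven mostly by the first factor, negatively by the
# second, and not at all by the third
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

model = GradientBoostingRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# feature_importances_ is non-negative and sums to 1
weights = dict(zip(factor_names, model.feature_importances_))
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Note that importances are non-negative, so they show how much a factor matters but not its direction. Here `PE_Z` receives a positive importance even though its relation to the target is negative, which means the sign has to be determined separately before the values are used as weights in a composite score.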
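#### 3.2 Industry and Style Neutralization

- **Industry neutral**: compute factor scores within each industry group to avoid unintended industry exposure.
- **Size neutral**: split stocks into large-, mid-, and small-cap groups and rank factors within each group.

```python
# Industry neutralization example
def industry_neutralize(df, industry_map):
    df = df.copy()  # avoid modifying the caller's frame
    df['Industry'] = df['Ticker'].map(industry_map)
    # Subtract the industry mean on each date so industry averages cancel out
    group_mean = df.groupby(['Date', 'Industry'])['Composite_Score'].transform('mean')
    df['Neutral_Score'] = df['Composite_Score'] - group_mean
    return df
```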
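Size neutralization can be sketched along the same lines. The function below is one possible approach, not the only one: it assumes the frame has `Date`, `MarketCap`, and `Composite_Score` columns, splits each date's stocks into three market-cap buckets with `pd.qcut`, and ranks the score within each bucket. The function name and the toy data are hypothetical.

```python
import pandas as pd

def size_neutral_rank(df, score_col='Composite_Score', cap_col='MarketCap'):
    out = df.copy()
    # 0 = small cap, 1 = mid cap, 2 = large cap, assigned separately per date
    out['Size_Bucket'] = out.groupby('Date')[cap_col].transform(
        lambda s: pd.qcut(s, 3, labels=False)
    )
    # Percentile rank of the score among stocks in the same date and bucket
    out['Size_Neutral_Rank'] = (
        out.groupby(['Date', 'Size_Bucket'])[score_col].rank(pct=True)
    )
    return out

# Toy data: six hypothetical tickers on a single date
toy = pd.DataFrame({
    'Date': ['2024-01-02'] * 6,
    'Ticker': list('ABCDEF'),
    'MarketCap': [10, 20, 30, 40, 50, 60],
    'Composite_Score': [0.5, 0.1, 0.9, 0.3, 0.2, 0.8],
})
result = size_neutral_rank(toy)
print(result[['Ticker', 'Size_Bucket', 'Size_Neutral_Rank']])
```

Because ranks are computed inside each bucket, a small-cap stock only competes with other small caps, so the selection no longer tilts toward whichever size group happens to score higher on average.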
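```python
# Slippage simulation (simplified, buy side)
def simulate_slippage(order_price, volume, avg_volume):
    # The lower the volume relative to its average, the higher the slippage
    slippage = 0.001 * (1 - min(volume / avg_volume, 1))
    return order_price * (1 + slippage)
```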
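A slippage model of this kind only raises the price, which fits buy orders. A sell order would be filled at a lower price instead. The variant below is a sketch under that assumption: the function name `fill_price` and the `side` and `max_slippage` parameters are introduced here for illustration, and the 0.1% cap is the same placeholder figure rather than a calibrated value.

```python
def fill_price(order_price, volume, avg_volume, side='buy', max_slippage=0.001):
    """Estimate the fill price after slippage for a buy or sell order."""
    if avg_volume <= 0:
        raise ValueError("avg_volume must be positive")
    if side not in ('buy', 'sell'):
        raise ValueError("side must be 'buy' or 'sell'")
    # Liquidity is capped at 1: volume at or above average means no slippage
    liquidity = min(volume / avg_volume, 1.0)
    slippage = max_slippage * (1.0 - liquidity)
    # Buyers pay more, sellers receive less
    direction = 1.0 if side == 'buy' else -1.0
    return order_price * (1.0 + direction * slippage)

buy_fill = fill_price(100.0, volume=500, avg_volume=1000, side='buy')
sell_fill = fill_price(100.0, volume=500, avg_volume=1000, side='sell')
print(buy_fill, sell_fill)
```

With volume at half its average, the slippage is 0.05%, so the buy fills slightly above 100 and the sell slightly below.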
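| Metric | Formula | Meaning |
|---|---|---|
| Annualized return | (1 + R)^252 - 1, where R is the mean daily return | Long-run return capacity |
| Sharpe ratio | (Rp - Rf)/σp * √252 | Risk-adjusted return |
| Maximum drawdown | (Peak - Trough)/Peak | Exposure to extreme losses |
| Information ratio | (Rp - Rb)/Tracking_Error | Ability to generate excess return over the benchmark |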
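The two benchmark-relative and risk-free-relative ratios in the table can be computed from daily return series as sketched below. The 2% annual risk-free rate, the simple division used to convert it to a daily rate, and the synthetic return series are all assumptions made for this example.

```python
import numpy as np

def sharpe_ratio(daily_returns, annual_rf=0.02, periods=252):
    # Simple (non-compounded) conversion of the annual risk-free rate
    daily_rf = annual_rf / periods
    excess = np.asarray(daily_returns) - daily_rf
    return excess.mean() / excess.std(ddof=1) * np.sqrt(periods)

def information_ratio(daily_returns, benchmark_returns, periods=252):
    # Active return: portfolio minus benchmark on the same days
    active = np.asarray(daily_returns) - np.asarray(benchmark_returns)
    tracking_error = active.std(ddof=1)  # volatility of the active return
    return active.mean() / tracking_error * np.sqrt(periods)

# Synthetic daily returns (NOT real market data)
rng = np.random.default_rng(1)
bench = rng.normal(0.0003, 0.01, size=500)
active = rng.normal(0.0005, 0.002, size=500)
port = bench + active

print("Sharpe ratio:", sharpe_ratio(port))
print("Information ratio:", information_ratio(port, bench))
```

Both ratios scale the daily mean-to-volatility ratio by √252. The Sharpe ratio measures return against a risk-free rate, while the information ratio measures it against the benchmark, so a strategy that closely tracks its benchmark can have a modest Sharpe ratio and still a high information ratio.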
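The core of multi-factor quantitative stock selection is continuous iteration and strict risk control. Automating factor calculation, portfolio construction, and backtest evaluation in Python can significantly speed up strategy research and development. A live deployment additionally needs a trading interface (such as Huatai Securities' PTrade) and a risk management module to form a complete quantitative trading system.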